GPPI Clustering Tool

This tool uses the DBSCAN algorithm to automatically cluster release groups into tiers based on their Golden Popcorn Performance Index (GPPI) values. It generates a structured JSON output that can be used with Radarr custom format creation scripts.

Overview

The Golden Popcorn Performance Index (GPPI) measures how likely a release group is to produce high-quality "Golden Popcorn" encodes. This script:

- Takes a JSON file containing release group GPPI values
- Uses DBSCAN to find natural clusters/tiers in the data
- Outputs a JSON file with tiered group assignments

You do not need to assign tiers by hand or choose the number of clusters in advance: the algorithm discovers the natural tiers in your data.
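
To make the idea of "natural tiers" concrete, here is a minimal, dependency-free sketch. It is not the code in dbscan.py. It relies on one property of DBSCAN: on one-dimensional data with min_samples set to 1 (the value shown in the sample metadata), the clusters are exactly the runs of sorted values whose neighbouring gaps do not exceed eps. The function name `cluster_1d` and the eps of 70.0 are illustrative assumptions.

```python
def cluster_1d(gppi, eps):
    """Assign tiers by splitting the ranked GPPI values at every gap > eps.

    Equivalent to DBSCAN on 1-D data when min_samples == 1.
    Tier 1 holds the highest GPPI values.
    """
    ranked = sorted(gppi.items(), key=lambda item: item[1], reverse=True)
    tiers = {}
    tier = 0
    previous = None
    for name, value in ranked:
        # Start a new tier at the first group and after every large gap.
        if previous is None or previous - value > eps:
            tier += 1
        tiers[name] = tier
        previous = value
    return tiers


# The three values shown in the Input Format example; eps is hypothetical.
sample = {"EbP": 412.45, "DON": 350.55, "HiDt": 227.22}
print(cluster_1d(sample, eps=70.0))
```

With only these three groups and this eps, EbP and DON share a tier because they are about 62 points apart, while HiDt sits more than 120 points lower and starts a new one.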

Requirements

Python 3.6+
Required packages:
- numpy
- pandas
- scikit-learn

Install dependencies with:

pip install numpy pandas scikit-learn

Usage

Basic Usage

./dbscan.py input_file.json --resolution 1080p --type Quality

This will:

- Read GPPI values from input_file.json
- Cluster the groups using DBSCAN with default parameters
- Save the results to 1080p Quality.json
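
The output name in this example appears to be built from the resolution and type. A small sketch of that naming, assuming the pattern "<resolution> <type>.json" inferred from the example above and a current-directory default that the README does not confirm (`output_path` is a hypothetical helper):

```python
from pathlib import Path


def output_path(resolution, group_type, output_dir="."):
    """Build the output file path as '<resolution> <type>.json' (assumed pattern)."""
    return Path(output_dir) / f"{resolution} {group_type}.json"


print(output_path("1080p", "Quality").name)
```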

Input Format

The input file should be a JSON object with release group names as keys and their GPPI values as numeric values:

{
  "EbP": 412.45,
  "DON": 350.55,
  "HiDt": 227.22,
  ...
}
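
If you generate the input file yourself, it may help to check it before clustering. The following is a sketch of such a check, not part of dbscan.py; `parse_gppi` is a hypothetical name, and rejecting non-numeric values is an assumption about what the script expects.

```python
import json


def parse_gppi(text):
    """Parse JSON text into a {group name: GPPI} dict, rejecting bad values."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object mapping group names to GPPI values")
    cleaned = {}
    for name, value in data.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"GPPI for {name!r} is not numeric: {value!r}")
        cleaned[name] = float(value)
    return cleaned


print(parse_gppi('{"EbP": 412.45, "DON": 350.55}'))
```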

Output Format

The output file contains:

- Metadata about the clustering
- Statistics for each tier
- A tiered_groups array with objects containing name and tier properties

{
  "metadata": {
    "total_groups": 37,
    "total_tiers": 5,
    "resolution": "1080p",
    "type": "Quality",
    "algorithm": "DBSCAN",
    "eps_value": 49.37,
    "eps_factor": 0.12,
    "min_samples": 1
  },
  "tier_statistics": {
    "tier_1": {
      "count": 2,
      "min_gppi": 350.55,
      "max_gppi": 412.45,
      "avg_gppi": 381.5
    },
    ...
  },
  "tiered_groups": [
    {
      "name": "EbP",
      "tier": 1
    },
    {
      "name": "DON",
      "tier": 1
    },
    ...
  ]
}
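
The tier_statistics block can be reproduced from the GPPI values and tier assignments. The sketch below shows one way to do that. The key names follow the example above; the function name `tier_statistics` and the rounding of the average to two decimals are assumptions, since the README does not say how the script rounds.

```python
def tier_statistics(gppi, tiers):
    """Summarise each tier: group count plus min, max and average GPPI."""
    buckets = {}
    for name, tier in tiers.items():
        buckets.setdefault(tier, []).append(gppi[name])
    stats = {}
    for tier in sorted(buckets):
        values = buckets[tier]
        stats[f"tier_{tier}"] = {
            "count": len(values),
            "min_gppi": min(values),
            "max_gppi": max(values),
            # Rounding to two decimals is an assumption.
            "avg_gppi": round(sum(values) / len(values), 2),
        }
    return stats


gppi = {"EbP": 412.45, "DON": 350.55}
print(tier_statistics(gppi, {"EbP": 1, "DON": 1}))
```

For the two tier 1 groups shown above, this gives the same count, minimum, maximum and average as the example output.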

Command Line Options

positional arguments:
  input_file            Input JSON file with GPPI values

required arguments:
  --resolution {SD,720p,1080p,2160p}
                        Resolution for the output
  --type {Quality,Efficient}
                        Type of release groups

optional arguments:
  --output-dir OUTPUT_DIR
                        Directory for output JSON file
  --eps EPS             Epsilon factor (as proportion of data range) for DBSCAN
  --min-samples MIN_SAMPLES
                        Minimum samples parameter for DBSCAN
  --optimize            Automatically optimize epsilon to get 3-8 clusters
  --verbose             Print detailed information about clusters
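
For reference, the help text above can be expressed as an argparse parser. This is a reconstruction from the help output only, not the parser in dbscan.py. In particular, the defaults for --eps, --min-samples and --output-dir are not documented here, so the sketch leaves them unset.

```python
import argparse


def build_parser():
    """Rebuild the documented command line interface (defaults are unknown)."""
    parser = argparse.ArgumentParser(description="Cluster release groups into tiers by GPPI")
    parser.add_argument("input_file", help="Input JSON file with GPPI values")
    parser.add_argument("--resolution", required=True,
                        choices=["SD", "720p", "1080p", "2160p"],
                        help="Resolution for the output")
    parser.add_argument("--type", required=True, choices=["Quality", "Efficient"],
                        help="Type of release groups")
    parser.add_argument("--output-dir", help="Directory for output JSON file")
    parser.add_argument("--eps", type=float,
                        help="Epsilon factor (as proportion of data range) for DBSCAN")
    parser.add_argument("--min-samples", type=int,
                        help="Minimum samples parameter for DBSCAN")
    parser.add_argument("--optimize", action="store_true",
                        help="Automatically optimize epsilon to get 3-8 clusters")
    parser.add_argument("--verbose", action="store_true",
                        help="Print detailed information about clusters")
    return parser


args = build_parser().parse_args(
    ["input.json", "--resolution", "1080p", "--type", "Quality", "--eps", "0.15"]
)
print(args.input_file, args.resolution, args.type, args.eps)
```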

Advanced Usage

Automatic Optimization

To automatically find an epsilon value that gives between 3 and 8 clusters:

./dbscan.py input.json --resolution 1080p --type Quality --optimize --verbose
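
The README does not describe how --optimize searches for epsilon. One plausible strategy, shown purely as an illustration, is to try a range of epsilon factors and keep the first one whose cluster count falls between 3 and 8. The helper names, the candidate range of 0.05 to 0.25, and the "smallest factor first" order are all assumptions. Cluster counting uses the gap rule that matches DBSCAN on 1-D data with min_samples set to 1.

```python
def count_clusters(values, eps):
    """Count 1-D clusters: one more than the number of sorted gaps above eps."""
    ordered = sorted(values)
    if not ordered:
        return 0
    return 1 + sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > eps)


def find_eps_factor(values, low=3, high=8, candidates=None):
    """Return the first candidate factor giving low..high clusters, else None."""
    if candidates is None:
        # Hypothetical search grid: 0.05, 0.06, ..., 0.25
        candidates = [round(0.05 + 0.01 * i, 2) for i in range(21)]
    data_range = max(values) - min(values)
    for factor in candidates:
        if low <= count_clusters(values, factor * data_range) <= high:
            return factor
    return None


# Made-up GPPI values for illustration only.
values = [0, 10, 20, 100, 110, 300, 310, 320, 600]
print(find_eps_factor(values))
```

Trying the smallest factors first favours more tiers. Reversing the candidate list would favour fewer.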

Manual Epsilon Control

You can control the clustering sensitivity with the --eps parameter:

./dbscan.py input.json --resolution 1080p --type Quality --eps 0.15

The epsilon value is specified as a proportion of the data range:

- Lower values (0.05-0.10): More tiers, finer granularity
- Higher values (0.15-0.25): Fewer tiers, broader categories
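
The sketch below shows how a proportional epsilon could translate into an absolute distance and a tier count. It assumes the conversion is eps = factor × (max GPPI − min GPPI), which is consistent with the sample metadata (eps_factor 0.12 with eps_value 49.37 implies a data range of about 411.4) but is not confirmed by the script. The function names are hypothetical. The first three values come from the input example and the rest are made up.

```python
def eps_from_factor(values, factor):
    """Convert a proportional epsilon into an absolute GPPI distance (assumed formula)."""
    return factor * (max(values) - min(values))


def tier_count(values, factor):
    """Count tiers for a given factor using the 1-D gap rule (min_samples == 1)."""
    eps = eps_from_factor(values, factor)
    ordered = sorted(values)
    return 1 + sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > eps)


values = [412.45, 350.55, 227.22, 180.0, 95.5, 60.0, 12.0]
for factor in (0.05, 0.15, 0.25):
    print(factor, round(eps_from_factor(values, factor), 2), tier_count(values, factor))
```

On this sample the tier count falls as the factor grows, which matches the guidance above.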

Verbose Output

Add the --verbose flag to see detailed information about the discovered tiers:

./dbscan.py input.json --resolution 1080p --type Quality --verbose

Tips for Getting Good Clusters

- Start with --optimize --verbose to see what the automatic optimization discovers
- If you want more tiers, use a smaller epsilon (e.g., --eps 0.08)
- If you want fewer tiers, use a larger epsilon (e.g., --eps 0.2)
- Look at the GPPI distribution to understand natural groupings in your data

Integration with Custom Format Scripts

The output of this script is designed to work directly with Radarr custom format creation scripts. The JSON structure provides a tiered_groups array that can be used to create custom formats for different quality tiers.
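
A downstream script will usually want the group names collected per tier. A minimal sketch of that step, assuming only the tiered_groups structure shown in Output Format (`names_by_tier` is a hypothetical helper, and "ExampleGroup" is a placeholder rather than a real release group):

```python
import json
from collections import defaultdict


def names_by_tier(result):
    """Map each tier number to the list of group names assigned to it."""
    grouped = defaultdict(list)
    for entry in result["tiered_groups"]:
        grouped[entry["tier"]].append(entry["name"])
    return {tier: grouped[tier] for tier in sorted(grouped)}


# In practice this would be loaded from the script's output file.
result = json.loads("""
{
  "tiered_groups": [
    {"name": "EbP", "tier": 1},
    {"name": "DON", "tier": 1},
    {"name": "ExampleGroup", "tier": 2}
  ]
}
""")
print(names_by_tier(result))
```

Each list can then feed one custom format per tier.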

Example Workflow

- Generate GPPI values for release groups at a specific resolution
- Run this script to automatically cluster them into tiers
- Use the resulting JSON with your custom format creation script
- Import the custom formats into Radarr

This approach ensures that your quality tiers are based on natural groupings in the GPPI data rather than arbitrary assignments.
