Query-Doctor/analyzer

analyzer checks the SQL queries your tests run to see if they'll be slow in production, and gives index recommendations by analyzing your Postgres logs. It works with all languages, ORMs, and query builders!

We make a few assumptions about your CI pipeline for this to work.

  1. Your pipeline runs database queries against a real Postgres database. The source is not important: it could be e2e, load, or integration tests. The queries can run in a rolled-back transaction and it will still work fine.
  2. The final schema (after migrations run) is available for analyzer to introspect, and every table has at least 1 row in it as part of your db seed. We use the database to do extra work by testing your queries against different index configurations with your production stats, but all of that work is done in a transaction that is always rolled back. Data is never modified.
  3. Your postgresql.conf is configured with at least the following options.
shared_preload_libraries='auto_explain'
auto_explain.log_min_duration=0
auto_explain.log_analyze=true
auto_explain.log_verbose=true
auto_explain.log_buffers=true
auto_explain.log_format='json'
logging_collector=on
log_directory='/var/log/postgresql' # or any path you like
log_filename='postgres.log' # or any name you like
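
With these settings, every query executed during your tests is written to /var/log/postgresql/postgres.log as an auto_explain entry with a JSON plan. This is what analyzer reads; a heavily trimmed sketch of one entry (exact formatting depends on your log_line_prefix) looks roughly like:

2024-01-01 00:00:00.000 UTC [1234] LOG:  duration: 0.052 ms  plan:
{
  "Query Text": "select * from users where email = $1",
  "Plan": {
    "Node Type": "Seq Scan",
    "Relation Name": "users",
    "Actual Rows": 1,
    "Actual Loops": 1
  }
}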

Optional

You have a production database you can pull statistics from (using a query we provide; see Statistics synchronization below).


Steps for setup

Currently we only support GitHub Actions, but it would not be difficult to add support for other CI platforms like Azure Pipelines.

GitHub Actions

Ubuntu runners in GitHub already ship with Postgres as part of the default image, so we try to leverage that. Because GitHub workflows do not support passing arguments to services (actions/runner#1152), we can't run Postgres as a container. Running Postgres in Docker directly causes network problems, because --network=host also appears to be unsupported in dind (Docker-in-Docker). So we instead apply an explicit setup to the preinstalled Postgres, which boots up 10x faster than Docker anyway.

  1. Copy the setup script:
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Postgres
        run: |
          sudo tee -a /etc/postgresql/16/main/postgresql.conf <<EOF
            shared_preload_libraries = 'auto_explain'
            auto_explain.log_min_duration = 0
            auto_explain.log_analyze = true
            auto_explain.log_verbose = true
            auto_explain.log_buffers = true
            auto_explain.log_format = 'json'
            logging_collector = on
            log_directory = '/var/log/postgresql'
            log_filename = 'postgres.log'
          EOF
          sudo tee /etc/postgresql/16/main/pg_hba.conf > /dev/null <<EOF
            host all all 127.0.0.1/32 trust
            host all all ::1/128 trust
            local all all peer
          EOF
          sudo systemctl start postgresql.service
          sudo -u postgres createuser -s -d -r -w me
          sudo -u postgres createdb testing
          sudo chmod 666 /var/log/postgresql/postgres.log

You can change sudo -u postgres createuser -s -d -r -w me to create a user with a name of your choosing, and sudo -u postgres createdb testing to create a database with a different name.
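
Before running migrations or tests, you can optionally sanity-check that the configuration took effect. This verification step is our suggestion, not something analyzer requires:

      - name: Verify auto_explain
        run: |
          psql -h 127.0.0.1 -U me -d testing -c 'show shared_preload_libraries'
          test -f /var/log/postgresql/postgres.log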

  2. Run your migrations and seed scripts. This is just an example showing that the migrations should target the Postgres instance set up in the previous step; a minimal seed sketch follows the example below.
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      # ... previous steps ...
      - name: Migrate
        run: pnpm run migrate && pnpm run seed
        env:
          POSTGRES_URL: postgres://me@localhost/testing
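
The seed does not need to be elaborate: a single row per table satisfies assumption 2 above. As a sketch, assuming a hypothetical users table, a seed statement could be as small as:

insert into users (email) values ('seed@example.com');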
  3. Run your test suite against the same database. You can do this with any tool and use any query builder or ORM you like.
  4. Run the analyzer. GITHUB_TOKEN is needed to post a comment to your PR reviewing the indexes found in your database.
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      # ... previous steps ...
      - name: Migrate
        run: pnpm run migrate && pnpm run seed
        env:
          POSTGRES_URL: postgres://me@localhost/testing
      - name: Run integration tests
        run: pnpm run test:integration
        env:
          POSTGRES_URL: postgres://me@localhost/testing
      - name: Run query-doctor/analyzer
        uses: query-doctor/analyzer@v0
        env:
          GITHUB_TOKEN: ${{ github.token }}
          POSTGRES_URL: postgres://me@localhost/testing
  5. Add pull-requests: write permissions to your job to allow the analyzer to post its PR comment.
jobs:
  run:
    permissions:
      contents: read
      pull-requests: write
    runs-on: ubuntu-latest
...

Statistics synchronization

To emulate your production database as accurately as possible, we need access to its stats.

You can use the following function to dump the stats for your database. Copy and paste it into psql to create the function.

Since Postgres keeps track of things like the most common values in each table, part of the statistics includes copies of cell values from your tables. The include_sensitive_info parameter controls whether those values (the stavalues slots below) are included in the dump.

CREATE OR REPLACE FUNCTION _qd_dump_stats(include_sensitive_info boolean) RETURNS jsonb AS $$
SELECT
  jsonb_agg(t)
FROM (
  SELECT
    c.table_name as "tableName",
    c.table_schema as "schemaName",
    cl.reltuples,
    cl.relpages,
    cl.relallvisible,
    json_agg(
      json_build_object(
        'columnName', c.column_name,
        'dataType', c.data_type,
        'isNullable', (c.is_nullable = 'YES')::boolean,
        'characterMaximumLength', c.character_maximum_length,
        'numericPrecision', c.numeric_precision,
        'numericScale', c.numeric_scale,
        'columnDefault', c.column_default,
        'stats', (
          select json_build_object(
            'starelid', s.starelid,
            'staattnum', s.staattnum,
            'stainherit', s.stainherit,
            'stanullfrac', s.stanullfrac,
            'stawidth', s.stawidth,
            'stadistinct', s.stadistinct,
            -- slot 1
            'stakind1', s.stakind1,
            'staop1', s.staop1,
            'stacoll1', s.stacoll1,
            'stanumbers1', s.stanumbers1,
            -- slot 2
            'stakind2', s.stakind2,
            'staop2', s.staop2,
            'stacoll2', s.stacoll2,
            'stanumbers2', s.stanumbers2,
            -- slot 3
            'stakind3', s.stakind3,
            'staop3', s.staop3,
            'stacoll3', s.stacoll3,
            'stanumbers3', s.stanumbers3,
            -- slot 4
            'stakind4', s.stakind4,
            'staop4', s.staop4,
            'stacoll4', s.stacoll4,
            'stanumbers4', s.stanumbers4,
            -- slot 5
            'stakind5', s.stakind5,
            'staop5', s.staop5,
            'stacoll5', s.stacoll5,
            'stanumbers5', s.stanumbers5,
            -- non-anonymous stats: only included when include_sensitive_info is true
            'stavalues1', case when $1 then s.stavalues1 else null end,
            'stavalues2', case when $1 then s.stavalues2 else null end,
            'stavalues3', case when $1 then s.stavalues3 else null end,
            'stavalues4', case when $1 then s.stavalues4 else null end,
            'stavalues5', case when $1 then s.stavalues5 else null end
          )
          from pg_statistic s
          where
            s.starelid = a.attrelid
            and s.staattnum = a.attnum
        )
      )
    ORDER BY c.ordinal_position) as columns
  FROM
    information_schema.columns c
  JOIN
    pg_attribute a
    ON a.attrelid = (quote_ident(c.table_schema) || '.' || quote_ident(c.table_name))::regclass
    AND a.attname = c.column_name
  JOIN
    pg_class cl
    ON cl.oid = a.attrelid
  JOIN
    pg_namespace n
    ON n.oid = cl.relnamespace
  WHERE
    c.table_name not like 'pg_%'
    and n.nspname <> 'information_schema'
    and c.table_name not in ('pg_stat_statements', 'pg_stat_statements_info')
  GROUP BY
    c.table_name, c.table_schema, cl.reltuples, cl.relpages, cl.relallvisible /* @qd_introspection */
) t;
$$ LANGUAGE sql STABLE SECURITY DEFINER;

Note: The function is defined with SECURITY DEFINER so that it can be called either manually or automatically by the analyzer if you set up the stats pull integration.
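
For reference, the dump this function produces is a JSON array with one entry per table. The field names follow directly from the query above; the values here are made up for illustration:

[
  {
    "tableName": "users",
    "schemaName": "public",
    "reltuples": 1500,
    "relpages": 25,
    "relallvisible": 25,
    "columns": [
      {
        "columnName": "email",
        "dataType": "text",
        "isNullable": false,
        "stats": { "stanullfrac": 0, "stawidth": 32, "stadistinct": -1, "stavalues1": null }
      }
    ]
  }
]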

Regular psql

psql -d <yourdb> -At -F "" -c "select _qd_dump_stats(false)" > stats.json
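
If you are comfortable including the sampled cell values (the stavalues slots, e.g. most-common values and histogram bounds), pass true instead:

psql -d <yourdb> -At -F "" -c "select _qd_dump_stats(true)" > stats.json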

Postgres in Kubernetes

This example uses cloudnative-pg, but it applies to any pod that has access to psql as a superuser.

kubectl exec <podname> -n cnpg-system -c postgres -- psql -d <yourdb> -At -F "" -c "select _qd_dump_stats(false)" > stats.json
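
If kubectl exec is not an option, a port-forward plus a local psql works too. The service name below follows cloudnative-pg's <cluster>-rw convention and is an assumption; adjust it to your setup:

kubectl port-forward svc/<cluster>-rw 5432:5432 -n cnpg-system
psql -h 127.0.0.1 -U postgres -d <yourdb> -At -F "" -c "select _qd_dump_stats(false)" > stats.json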
