@scholtzan scholtzan commented Nov 18, 2025

Description

This PR speeds up the DAG generation process by:

  1. caching the stable schemas that are downloaded during the generate-sql step and reusing them during DAG generation (saves around 1.5 minutes). If the generate-sql step is skipped, DAG generation still downloads the schemas itself.
  2. parallelizing the extraction of the table references needed to generate DAG task dependencies (saves around 1 minute)
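
For illustration only, the two ideas above can be sketched roughly as follows. This is not the code from this PR: `load_schemas`, `extract_table_references`, `extract_all_references`, the JSON cache file, and the backtick-matching regex are all hypothetical stand-ins, and a thread pool is assumed here purely to keep the example self-contained (a process pool may suit CPU-bound SQL parsing better).

```python
import json
import re
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def load_schemas(cache_file: Path, download: Callable[[], dict]) -> dict:
    """Reuse cached schemas if an earlier step wrote them; otherwise download.

    Hypothetical helper: `download` stands in for whatever fetches the
    stable schemas, so a skipped earlier step still ends up with schemas.
    """
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    schemas = download()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(schemas))
    return schemas


# Simplified assumption: references are fully qualified and backtick-quoted,
# e.g. `project.dataset.table`. Real reference extraction is more involved.
TABLE_REF = re.compile(r"`([\w-]+\.\w+\.\w+)`")


def extract_table_references(sql: str) -> list[str]:
    """Return the sorted, de-duplicated table references found in one query."""
    return sorted(set(TABLE_REF.findall(sql)))


def extract_all_references(
    queries: dict[str, str], max_workers: int = 8
) -> dict[str, list[str]]:
    """Extract references for every query concurrently, keyed by query name."""
    names = list(queries)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with `names`.
        refs = pool.map(extract_table_references, (queries[n] for n in names))
        return dict(zip(names, refs))
```

In this sketch the second call to `load_schemas` with the same `cache_file` reads from disk instead of calling `download` again, which is where the time saving would come from.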



@scholtzan scholtzan changed the title from "Speed up DAG generation" to "[DENG-10167] Speed up DAG generation" Nov 19, 2025
--ignore derived_view_schemas \
--output-dir /tmp/workspace/generated-sql/sql/ \
--target-project moz-fx-data-shared-prod
PATH="venv/bin:$PATH" script/bqetl format /tmp/workspace/generated-sql/sql/
Collaborator Author


We don't need to run the formatting if we use the existing generated-sql branch, since anything on there is already properly formatted. The formatting takes 2-3 minutes to run.

Contributor


IIRC a lot (most?) of that is due to qualifying table references. That could be another thing to optimize, but it seems hard to do.

@scholtzan scholtzan marked this pull request as ready for review November 19, 2025 19:18
@scholtzan scholtzan requested a review from a team as a code owner November 19, 2025 19:18

@scholtzan scholtzan force-pushed the cache-stable-schemas branch from cc1b347 to 99d0511 Compare November 19, 2025 19:43
@dataops-ci-bot

Integration report for "Use cached stable schemas for stage deploys"

sql.diff

No content detected.

@scholtzan scholtzan added this pull request to the merge queue Nov 19, 2025
Merged via the queue into main with commit c7e6d5c Nov 19, 2025
22 checks passed
@scholtzan scholtzan deleted the cache-stable-schemas branch November 19, 2025 21:15