Skip to content

Conversation

@jangjodi
Copy link
Contributor

Move iteration over query result into _make_postgres_call_with_filter function so that the query actually runs in this function
This will allow the retry logic to actually work and for us to remove unneeded count call
Return batch end group id from _make_postgres_call_with_filter since it could be one of the groups that we filtered out

@jangjodi jangjodi requested a review from a team as a code owner August 13, 2024 17:43
@jangjodi jangjodi requested a review from wedamija August 13, 2024 17:43
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Aug 13, 2024
@jangjodi jangjodi merged commit 10847d4 into master Aug 13, 2024
@jangjodi jangjodi deleted the jodi/similarity-backfill-fix-query branch August 13, 2024 18:20
@sentry
Copy link

sentry bot commented Aug 13, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ OperationalError: QueryCanceled('canceling statement due to user request\n') sentry.tasks.backfill_seer_grouping_records View Issue

Did you find this useful? React with a 👍 or 👎

@wedamija
Copy link
Member

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ OperationalError: QueryCanceled('canceling statement due to user request\n') sentry.tasks.backfill_seer_grouping_records View Issue

Did you find this useful? React with a 👍 or 👎

@jangjodi I don't think the retry will help, here's the query plan (I can send you the query on slack if you want to run it yourself)


Limit (cost=0.70..194.54 rows=45 width=567) (actual time=5814.282..82962.850 rows=26 loops=1) 
-> Index Scan Backward using sentry_grou_project_41a5ce_idx on sentry_groupedmessage (cost=0.70..20918.00 rows=4856 width=567) (actual time=5814.280..82962.816 rows=26 loops=1) |  
Index Cond: (project_id = '<snipped>'::bigint) 
Filter: ((times_seen > 1) AND (type = 1)) 
Rows Removed by Filter: 177544  
Planning Time: 9.216 ms 
Execution Time: 82962.982 ms


You can see that the filter (ie times_seen and type=1) removes 177k rows. That's the reason this is timing out. I would recommend moving these filters into memory. One side effect of doing this is that you will sometimes end up with empty batches, which will cause you to exit early here:

if total_groups_to_backfill_length == 0:
logger.info(
"backfill_seer_grouping_records.no_more_groups",
extra={"project_id": project.id},
)
if enable_ingestion:
logger.info(
"backfill_seer_grouping_records.enable_ingestion",
extra={"project_id": project.id},
)
project.update_option("sentry:similarity_backfill_completed", int(time.time()))
return (
groups_to_backfill_batch,
None,
)

Instead, you should probably check for batch_end_group_id > last_processed_group_id or something along those lines.

ameliahsu pushed a commit that referenced this pull request Aug 13, 2024
Move iteration over query result into retry function
so that the query can be retried
Remove unneeded count call
Return batch end group id from retry function
@jangjodi
Copy link
Contributor Author

Thank you @wedamija! I've opened a PR to address the changes. I only added the manual filter for times_seen, as running explain analyze with only the type filter was quick, but please let me know if you think I should filter that out manually as well.

@wedamija
Copy link
Member

Thank you @wedamija! I've opened a PR to address the changes. I only added the manual filter for times_seen, as running explain analyze with only the type filter was quick, but please let me know if you think I should filter that out manually as well.

I think it's ok to experiment with just the times_seen changes, if you see other problems you can move the other filtering out later

@github-actions github-actions bot locked and limited conversation to collaborators Aug 30, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants