fix(similarity): Use count instead of length in backfill query #76008
Conversation
total_groups_to_backfill_length = groups_to_backfill_batch_raw.count()
batch_end_group_id = (
    groups_to_backfill_batch_raw[total_groups_to_backfill_length - 1][0]
    if total_groups_to_backfill_length
You're now making two queries here: one to get the count, and one to get the end group id for the batch, in addition to fetching the batch on line 183.
Is there any reason we can't just perform this logic after line 183? Any reason we can't move it until after we filter down to groups_to_backfill_batch?
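A sketch of what that reordering could look like, assuming groups_to_backfill_batch is the filtered list of (id, data, status, last_seen) tuples the comment refers to (the surrounding task code is not shown in this diff):

```python
# Sketch only: once the batch has been materialized and filtered,
# the length and the last group id can be read off the list directly,
# without issuing additional queries.
total_groups_to_backfill_length = len(groups_to_backfill_batch)
batch_end_group_id = (
    groups_to_backfill_batch[-1][0] if groups_to_backfill_batch else None
)
```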
group_id_filter = Q(id__lt=last_processed_group_id)

(groups_to_backfill_batch_raw, batch_size) = _make_postgres_call_with_retry(
    group_id_filter, project.id, batch_size
_make_postgres_call_with_retry doesn't work properly, afaict. This just constructs the queryset:
sentry/src/sentry/tasks/embeddings_grouping/utils.py
Lines 124 to 133 in 2376850
groups_to_backfill_batch_raw = (
    Group.objects.filter(
        group_id_filter,
        project_id=project_id,
        type=ErrorGroupType.type_id,
        times_seen__gt=1,
    )
    .values_list("id", "data", "status", "last_seen")
    .order_by("-id")[:batch_size]
)
That means you'll never enter your retry loop here:
sentry/src/sentry/tasks/embeddings_grouping/utils.py
Lines 143 to 149 in 2376850
except OperationalError:
    batch_size = batch_size // 2
    try:
        logger.info(
            "tasks.backfill_seer_grouping_records.postgres_query_retry",
            extra={"project_id": project_id, "batch_size": batch_size},
        )
You should coerce the result to a list in the _make_postgres_call. That also saves you the trouble of using .count() here, since you won't have a queryset.
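A hedged sketch of that suggestion, assuming the helper keeps its (group_id_filter, project_id, batch_size) signature and that the intent is to halve the batch size and retry once, as in the quoted retry block (Group, ErrorGroupType, and logger are assumed to come from the surrounding module):

```python
from django.db import OperationalError


def _make_postgres_call(group_id_filter, project_id, batch_size):
    # Coercing to a list forces the query to run inside this helper,
    # so an OperationalError is raised here and can be caught by the caller.
    return list(
        Group.objects.filter(
            group_id_filter,
            project_id=project_id,
            type=ErrorGroupType.type_id,
            times_seen__gt=1,
        )
        .values_list("id", "data", "status", "last_seen")
        .order_by("-id")[:batch_size]
    )


def _make_postgres_call_with_retry(group_id_filter, project_id, batch_size):
    try:
        batch = _make_postgres_call(group_id_filter, project_id, batch_size)
    except OperationalError:
        batch_size = batch_size // 2
        logger.info(
            "tasks.backfill_seer_grouping_records.postgres_query_retry",
            extra={"project_id": project_id, "batch_size": batch_size},
        )
        batch = _make_postgres_call(group_id_filter, project_id, batch_size)
    return batch, batch_size
```

With a plain list, the call site can use len(batch) and batch[-1][0] instead of .count() and an index computed from it.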
Suspect Issues

This pull request was deployed and Sentry observed the following issues:
Use .count() instead of len in postgres query