@jangjodi
Contributor

Use .count() instead of len in postgres query

@jangjodi jangjodi requested a review from a team as a code owner August 12, 2024 20:45
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Aug 12, 2024
@jangjodi jangjodi merged commit 2376850 into master Aug 12, 2024
@jangjodi jangjodi deleted the jodi/similarity-backfill-query-count branch August 12, 2024 21:16
Comment on lines +175 to 178
total_groups_to_backfill_length = groups_to_backfill_batch_raw.count()
batch_end_group_id = (
    groups_to_backfill_batch_raw[total_groups_to_backfill_length - 1][0]
    if total_groups_to_backfill_length
Member

You're now making two queries here: one to get the count and one to get the end group id for the batch, in addition to fetching the batch on line 183.

Is there any reason we can't just perform this logic after 183?

Any reason we can't move this logic after we filter to groups_to_backfill_batch?
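
A minimal sketch of the point above, not the code in this PR. `FakeQuerySet` is a hypothetical stand-in that mimics how an unevaluated Django queryset issues a separate query for `.count()`, for index access, and for iteration. It only counts round trips, so the numbers illustrate the pattern rather than measure anything real.

```python
class FakeQuerySet:
    """Hypothetical stand-in for an unevaluated queryset; counts round trips."""

    def __init__(self, rows):
        self._rows = rows
        self.queries = 0

    def count(self):
        self.queries += 1  # would be a SELECT COUNT(*)
        return len(self._rows)

    def __getitem__(self, index):
        self.queries += 1  # would be a SELECT with LIMIT/OFFSET
        return self._rows[index]

    def __iter__(self):
        self.queries += 1  # would be the full SELECT
        return iter(self._rows)


rows = [(30, "data"), (20, "data"), (10, "data")]  # ordered by -id

# Pattern from the diff: count, index, then fetch the batch.
qs = FakeQuerySet(rows)
n = qs.count()
end_id_lazy = qs[n - 1][0] if n else None
batch_lazy = list(qs)
three_trip_queries = qs.queries

# Alternative: fetch once, then work on the resulting list.
qs = FakeQuerySet(rows)
batch = list(qs)
end_id = batch[-1][0] if batch else None
one_trip_queries = qs.queries
```

Under these assumptions, the length and the end group id both come from the list that was already fetched, so no extra queries are needed.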

group_id_filter = Q(id__lt=last_processed_group_id)

(groups_to_backfill_batch_raw, batch_size) = _make_postgres_call_with_retry(
    group_id_filter, project.id, batch_size
Member

@wedamija wedamija Aug 12, 2024

_make_postgres_call_with_retry doesn't work properly afaict. It just constructs the queryset:

groups_to_backfill_batch_raw = (
    Group.objects.filter(
        group_id_filter,
        project_id=project_id,
        type=ErrorGroupType.type_id,
        times_seen__gt=1,
    )
    .values_list("id", "data", "status", "last_seen")
    .order_by("-id")[:batch_size]
)

But since we don't convert it to a list or iterate over it, the query never runs at that point.
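
To illustrate the lazy-evaluation issue, here is a small sketch in plain Python. `LazyQuery`, `build_query`, and the local `OperationalError` class are hypothetical stand-ins (the last one for `django.db.OperationalError`), not Sentry or Django code. The point is only that an error raised at evaluation time is not seen by a `try` block that wraps construction alone.

```python
class OperationalError(Exception):
    """Stand-in for django.db.OperationalError."""


class LazyQuery:
    """Hypothetical lazy query: building it is free, iterating it runs it."""

    def __init__(self, fail):
        self._fail = fail

    def __iter__(self):
        if self._fail:
            raise OperationalError("canceling statement due to user request")
        return iter([(3,), (2,), (1,)])


def build_query(fail):
    # Mirrors a function that only constructs the query and returns it.
    return LazyQuery(fail)


caught_at_build = False
try:
    query = build_query(fail=True)  # nothing has executed yet
except OperationalError:
    caught_at_build = True  # not reached: construction cannot fail here

caught_at_iteration = False
try:
    rows = list(query)  # the query runs here, outside the first try
except OperationalError:
    caught_at_iteration = True
```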

That means you'll never enter your retry loop here:

except OperationalError:
    batch_size = batch_size // 2
    try:
        logger.info(
            "tasks.backfill_seer_grouping_records.postgres_query_retry",
            extra={"project_id": project_id, "batch_size": batch_size},
        )

You should coerce to a list in the _make_postgres_call. That also saves you the trouble of using .count here since you won't have a queryset.
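
A rough sketch of that suggestion, under stated assumptions: `run_query` is a hypothetical lazy query (a generator that fails above an arbitrary size), `fetch_batch_with_retry` is an illustrative name, and the loop that keeps halving until `min_batch_size` is an assumption, not the retry policy in the actual task. What matters is that `list()` is called inside the `try`, so the failure is raised where the `except` can see it.

```python
class OperationalError(Exception):
    """Stand-in for django.db.OperationalError."""


def run_query(batch_size, max_ok=250):
    """Hypothetical lazy query: a generator, so nothing runs until iterated."""
    if batch_size > max_ok:
        raise OperationalError("canceling statement due to user request")
    yield from range(batch_size, 0, -1)  # ids in descending order


def fetch_batch_with_retry(batch_size, min_batch_size=1):
    """Evaluate the query inside the try block, halving the batch on failure."""
    while True:
        try:
            # list() forces evaluation here, so a failure is caught below.
            return list(run_query(batch_size)), batch_size
        except OperationalError:
            if batch_size // 2 < min_batch_size:
                raise
            batch_size //= 2


batch, used_batch_size = fetch_batch_with_retry(1000)
total = len(batch)  # plain list, so no .count() is needed
batch_end_id = batch[-1] if batch else None
```

Because the function returns a list, the caller can use `len()` and ordinary indexing without triggering further queries.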

Contributor Author

Thanks for the review @wedamija! I've opened a follow-up PR to address these.

@sentry

sentry bot commented Aug 13, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ OperationalError: QueryCanceled('canceling statement due to user request\n') in sentry.tasks.backfill_seer_grouping_records

@github-actions github-actions bot locked and limited conversation to collaborators Aug 29, 2024