-
-
Notifications
You must be signed in to change notification settings - Fork 4.5k
ref(ecosystem): refactor link_all_repos to bulk update repositories #95494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look mostly good, just a question on partial failure handling.
| lifecycle.record_halt(str(LinkAllReposHaltReason.RATE_LIMITED)) | ||
| return | ||
|
|
||
| metrics.incr(f"{integration_key}.link_all_repos.api_error") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any dashboards we need to update that use this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not aware of any monitoring we have on this, plus we should be using the lifecycle metrics anyway
| repo_config=config, organization=rpc_org | ||
| ) | ||
| success = True | ||
| repo_configs.append(get_repo_config(repo, integration_id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏🏼 Much cleaner
| lifecycle.record_halt( | ||
| str(LinkAllReposHaltReason.REPOSITORY_NOT_CREATED), | ||
| {"missing_repos": missing_repos, "integration_id": integration_id}, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any other partial failures cases we need to worry about?
What does integration_repo_provider.create_repositories do if any of these repo creations fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only case is where some fail and some succeed. Previously we would actually mark it as success if any repo was created, but here we only mark it as success if ALL repos are created/updated successfully.
If repo creation fails, we now raise an exception with the failed repos and mark it as a halt
Codecov ReportAttention: Patch coverage is ✅ All tests successful. No failed tests found. Additional details and impacted files@@ Coverage Diff @@
## master #95494 +/- ##
==========================================
+ Coverage 77.79% 82.71% +4.91%
==========================================
Files 10535 10535
Lines 607265 607335 +70
Branches 23794 23794
==========================================
+ Hits 472431 502333 +29902
+ Misses 134449 104617 -29832
Partials 385 385 |
d77e54a to
57e95a4
Compare
57e95a4 to
c06c9c7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Duplicate Halt Events in Repo Linking
The link_all_repos task can record a halt event twice for the same execution. This occurs when create_repositories() raises a RepoExistsError and the missing_repos list is also non-empty. The RepoExistsError handler records a halt but continues execution, allowing a second halt to be recorded by the missing_repos check. This leads to duplicate lifecycle events and inaccurate metrics.
src/sentry/integrations/github/tasks/link_all_repos.py#L92-L107
sentry/src/sentry/integrations/github/tasks/link_all_repos.py
Lines 92 to 107 in c06c9c7
| try: | |
| integration_repo_provider.create_repositories( | |
| configs=repo_configs, organization=rpc_org | |
| ) | |
| except RepoExistsError as e: | |
| lifecycle.record_halt( | |
| str(LinkAllReposHaltReason.REPOSITORY_NOT_CREATED), | |
| {"missing_repos": e.repos, "integration_id": integration_id}, | |
| ) | |
| if missing_repos: | |
| lifecycle.record_halt( | |
| str(LinkAllReposHaltReason.REPOSITORY_NOT_CREATED), | |
| {"missing_repos": missing_repos, "integration_id": integration_id}, | |
| ) |
Bug: Repository Update Fails to Set Provider
The _update_repository method fails to update the provider field for existing repositories. Unlike the previous create_repository logic which explicitly set provider: self.id during updates, the new method only applies fields from RepositoryConfig, which does not include provider. This omission causes inconsistencies and prevents the "migration" of repositories to a new provider.
src/sentry/plugins/providers/integration_repository.py#L206-L212
sentry/src/sentry/plugins/providers/integration_repository.py
Lines 206 to 212 in c06c9c7
| def _update_repository(self, repo: RpcRepository, config: RepositoryConfig): | |
| repo.status = ObjectStatus.ACTIVE | |
| for field_name, field_value in config.items(): | |
| setattr(repo, field_name, field_value) | |
| return repo |
Was this report helpful? Give feedback by reacting with 👍 or 👎
We are hitting the processing deadline of 60 seconds for
link_all_repos. We can improve the performance by making more bulk queries.Changes
RepositoryConfigtypeIntegrationRepositoryProvider.create_repository) with possibly many RPC calls insideUnchanged
Repositoryindividually. This is because we may need to take actions for eachRepositorythat fails creation. Django'sbulk_createwill fail and rollback all changes if there is an error. Alternatively, we can passignore_conflicts=Truetobulk_create, but then we wouldn't be able to see which ones failed.Fixes ECO-801