Improve error message for when a public resource is unavailable #388
This looks good to me. I just had one question on resource failure and if it makes sense to add functionality to load from gnomAD if a resource is not found. That is likely another piece of work/PR though.
    isinstance(resource_source, GnomadPublicResourceSource)
    and resource_source != GnomadPublicResourceSource.GNOMAD
):
    if not self.is_resource_available():
Do you think it makes sense to check whether the gnomAD resource exists here (given how the repo occasionally lags behind the buckets)? I'm wondering if it makes sense to add functionality to load from the gnomAD source on failure, or if that gets more complicated as we add other cloud providers' public buckets.
This was assuming that resources will always be present in the gnomAD bucket, since they are copied from there to the cloud provider buckets.
I did not consider the case where an older version of this code points to a resource that no longer exists in the gnomAD buckets. Ideally, those URLs would not change but, practically, they do. That's a good point. We could catch that and suggest users update to the most recent version of gnomad_methods.
I'll claim that we should not automatically load from the gnomAD source if the resource doesn't exist. Most Hail resources are in the requester pays bucket, so if a user has configured their pipeline to use one of the cloud provider buckets, automatically falling back to the gnomAD buckets could result in unexpected egress/operation costs for that user. Also, they may not have Hail configured to allow requester pays access. I think the best we can do is point them towards the solution.
I did not think of the cost associated with the requester-pays bucket. It absolutely makes sense not to automatically load from the gnomAD source.
Filed #390 for this.
Some resources may not be available from all sources due to delays in syncing, etc. In those cases, it would be preferable to provide a helpful error message (directing the user to try reading the resource from the gnomAD buckets) instead of simply letting the read fail.
This wraps all methods of GnomadPublicResource subclasses that read the resource (GnomadPublicTableResource.ht, GnomadPublicMatrixTableResource.mt, etc.) and checks whether the resource is available from the selected source. If the resource is not available, an exception with a helpful message is raised.
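
As a rough illustration of the wrapping approach described above, here is a minimal, self-contained sketch. It does not reproduce the code in this PR: `ResourceSource`, `ResourceUnavailableError`, `require_available`, and `DemoTableResource` are hypothetical stand-ins, and availability is simulated with an in-memory set instead of a real bucket lookup. Only `is_resource_available` and the "skip the check for the gnomAD source" condition are taken from the diff excerpt in the conversation.

```python
import functools
from enum import Enum


class ResourceSource(Enum):
    """Hypothetical stand-in for GnomadPublicResourceSource."""

    GNOMAD = "gnomAD"
    GOOGLE_CLOUD_PUBLIC_DATASETS = "Google Cloud Public Datasets"


class ResourceUnavailableError(Exception):
    """Hypothetical exception raised when a resource is missing from a source."""


def require_available(read_method):
    """Wrap a read method so it fails early with a helpful message."""

    @functools.wraps(read_method)
    def wrapper(self, *args, **kwargs):
        # Only non-gnomAD sources are checked, mirroring the condition in the diff.
        if self.source != ResourceSource.GNOMAD and not self.is_resource_available():
            raise ResourceUnavailableError(
                f"{self.path} is not available from {self.source.value}. "
                "Try reading it from the gnomAD buckets, or update to the "
                "most recent version of gnomad_methods."
            )
        return read_method(self, *args, **kwargs)

    return wrapper


class DemoTableResource:
    """Toy resource; `available` simulates what each source currently hosts."""

    def __init__(self, path, source, available):
        self.path = path
        self.source = source
        self.available = available  # set of (source, path) pairs

    def is_resource_available(self):
        return (self.source, self.path) in self.available

    @require_available
    def ht(self):
        # A real implementation would read a Hail Table here.
        return f"table read from {self.path}"
```

Note that the wrapper only raises; it never silently retries against the gnomAD source. That matches the conclusion reached in the review thread, where an automatic fallback was rejected because it could incur unexpected requester-pays costs.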