Skip to content

Conversation

@ketan96-m
Copy link
Collaborator

Resolves #468

Problem
When the client submits the request but encounters client failure, there is no way of resuming the transforms from the last point. The client is forced to rerun the request which triggers the backend again.

Fix
Single Database instance -- Added flag
This PR fixes the broken client problem. The client will be able to resume the request from the last point. Given the client submits the same request again (important since the request hash is calculated and compared with any pending requests)

@codecov
Copy link

codecov bot commented Oct 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.65%. Comparing base (0224b4f) to head (1dc2035).
Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #497      +/-   ##
==========================================
+ Coverage   83.27%   83.65%   +0.37%     
==========================================
  Files          26       26              
  Lines        1429     1462      +33     
==========================================
+ Hits         1190     1223      +33     
  Misses        239      239              
Flag Coverage Δ
unittests 83.65% <100.00%> (+0.37%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ketan96-m ketan96-m requested a review from BenGalewsky October 15, 2024 15:31
@ketan96-m
Copy link
Collaborator Author

@BenGalewsky, @ponyisi, @gordonwatts ready for review!

@kyungeonchoi kyungeonchoi self-requested a review October 15, 2024 16:17
@kyungeonchoi
Copy link
Contributor

Just had a chat with @ketan96-m that we need to check how download behaves when resume (or rerun deliver()) ServiceX request. We need to make sure it recognizes the data_dir of original request and resume download from where it lost connection.

Copy link
Collaborator

@ponyisi ponyisi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comments following discussion -

  • only add to cache when request ID is obtained
  • do not worry about edge case where two requests of the same hash are made in the same Spec (we could instead raise an error if this happens, but it is almost certainly a user mistake)
  • ensure that if the transform fails, we wipe the in-progress cache record
  • things are bit copy-and-paste in a few places, reduce this

Otherwise the overall structure looks fine to me

@ponyisi
Copy link
Collaborator

ponyisi commented Oct 16, 2024

Just had a chat with @ketan96-m that we need to check how download behaves when resume (or rerun deliver()) ServiceX request. We need to make sure it recognizes the data_dir of original request and resume download from where it lost connection.

Hi @kyungeonchoi if the user changes the spec of the request and therefore the target file area, I think we should just go ahead and deliver there. I believe with this PR we will just redownload all files from the incomplete requests (regardless of what has already been downloaded) which is a bit brute force but can be improved in the future - I don't think we need to include "avoid re-downloading" yet, since that really will require us to do an integrity check on the files that we have already.

Copy link
Collaborator

@gordonwatts gordonwatts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. I think my questions are just because I don't understand something deeper going on. Please take a look and then decide!

@gordonwatts
Copy link
Collaborator

It seems like the transform_complete has become a very large method with many many lines. It might be time to add it to the list of things that need to be refactored.

@ketan96-m ketan96-m requested a review from ponyisi October 22, 2024 20:52
Copy link
Collaborator

@ponyisi ponyisi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a few comments. In principle we could go ahead with this but I think for long-term maintenance a few tweaks would improve things.

transform = Query()
with self.lock:
records = self.db.search((transform.hash == hash_value)
& (transform.status.exists()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mild preference to do this check after the query, i.e. look up the request first (because having the request missing should be an error!) and then check if the field is there.

It's also true that you basically never use this function except to check if the result is SUBMITTED. Perhaps better to make the function get_transform_request_submitted and return a bool?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the function to a bool. However, the request missing shouldn't raise an error since the the request is still not submitted in the case and it should return False rather than a error.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I see. Somewhat unfortunate naming of the function I suppose. It would be good to document this contract (it returns False if the transformation is not in the cache at all) in the docstring, since you rely on this behavior.

@ketan96-m ketan96-m requested a review from ponyisi October 23, 2024 17:20
Copy link
Collaborator

@ponyisi ponyisi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See one line to update, otherwise I approve

python_dataset.download_files = AsyncMock()
python_dataset.download_files.return_value = []
python_dataset.cache.update_transform_request_id = Mock()
# python_dataset.cache.update_transform_status = Mock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@ketan96-m ketan96-m merged commit 92d1d84 into master Oct 25, 2024
36 checks passed
@ketan96-m ketan96-m deleted the cache_single_db branch October 25, 2024 00:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Frontend crashes while working transform is in progress are not saved

6 participants