Single Database instance -- Resume request when client fails fixes #468 #497

ketan96-m · 2024-10-11T18:13:01Z

Resolves #468

Problem
When the client submits a request and then fails, there is no way to resume the transforms from the point they had reached. The client is forced to rerun the request, which triggers the backend again.

Fix
Single Database instance -- Added flag
This PR fixes the broken client problem. The client will be able to resume the request from the last point, provided it submits the same request again. This is important because the request hash is calculated and compared with any pending requests.
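As a rough illustration of the hash comparison described above, here is a minimal sketch. It is not the code in this PR: the names `pending`, `request_hash` and `submit_or_resume` are hypothetical, the hash is computed over a JSON dump as an assumption, and a plain dict stands in for the client's local cache.

```python
import hashlib
import json
from typing import Callable, Dict, Tuple

# Hypothetical in-memory stand-in for the local cache, keyed by request hash.
pending: Dict[str, dict] = {}


def request_hash(request: dict) -> str:
    """Return a stable hash: identical requests always give the same value."""
    canonical = json.dumps(request, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def submit_or_resume(request: dict,
                     submit: Callable[[dict], str]) -> Tuple[str, bool]:
    """Return (request_id, resumed).

    If the same request is already recorded as SUBMITTED, reuse its
    request ID instead of triggering the backend again.
    """
    key = request_hash(request)
    record = pending.get(key)
    if record is not None and record["status"] == "SUBMITTED":
        return record["request_id"], True

    # No pending record for this hash: submit and remember the new ID.
    request_id = submit(request)
    pending[key] = {"request_id": request_id, "status": "SUBMITTED"}
    return request_id, False
```

The point of the sketch is only that resuming depends on the second submission hashing to the same value as the first; any change to the request produces a new hash and therefore a new backend request.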

servicex/query_core.py

codecov · 2024-10-12T01:48:21Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.65%. Comparing base (0224b4f) to head (1dc2035).
Report is 3 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #497      +/-   ##
==========================================
+ Coverage   83.27%   83.65%   +0.37%     
==========================================
  Files          26       26              
  Lines        1429     1462      +33     
==========================================
+ Hits         1190     1223      +33     
  Misses        239      239

Flag	Coverage Δ
unittests	`83.65% <100.00%> (+0.37%)`	⬆️

ketan96-m · 2024-10-15T15:36:37Z

@BenGalewsky, @ponyisi, @gordonwatts ready for review!

kyungeonchoi · 2024-10-15T16:55:21Z

Just had a chat with @ketan96-m: we need to check how the download behaves when a ServiceX request is resumed (or deliver() is rerun). We need to make sure it recognizes the data_dir of the original request and resumes the download from where it lost the connection.

ponyisi

General comments following discussion:

only add to the cache when the request ID is obtained
do not worry about the edge case where two requests with the same hash are made in the same Spec (we could instead raise an error if this happens, but it is almost certainly a user mistake)
ensure that if the transform fails, we wipe the in-progress cache record
things are a bit copy-and-paste in a few places; reduce this

Otherwise the overall structure looks fine to me
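The first and third points above can be sketched as follows. This is a hedged illustration under stated assumptions, not the PR's implementation: `InProgressCache`, `TransformFailed` and `run_transform` are hypothetical names, and a dict stands in for the real cache.

```python
from typing import Callable, Dict, Optional


class TransformFailed(Exception):
    """Hypothetical error meaning the backend reported a failed transform."""


class InProgressCache:
    """Hypothetical stand-in for the in-progress records of the query cache."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}

    def add(self, hash_value: str, request_id: str) -> None:
        self._records[hash_value] = {
            "request_id": request_id,
            "status": "SUBMITTED",
        }

    def remove(self, hash_value: str) -> None:
        self._records.pop(hash_value, None)

    def get(self, hash_value: str) -> Optional[dict]:
        return self._records.get(hash_value)


def run_transform(cache: InProgressCache,
                  hash_value: str,
                  submit: Callable[[], str],
                  wait_for_completion: Callable[[str], object]) -> object:
    # If submit() raises, nothing has been cached yet.
    request_id = submit()
    # Cache only once the request ID is known.
    cache.add(hash_value, request_id)
    try:
        return wait_for_completion(request_id)
    except TransformFailed:
        # A failed transform cannot be resumed, so wipe the record.
        cache.remove(hash_value)
        raise
```

Any other exception, such as a lost connection on the client side, leaves the record in place, which is what makes a later resume possible. What happens to the record on success is left out of the sketch.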

ponyisi · 2024-10-16T16:43:13Z

> Just had a chat with @ketan96-m that we need to check how download behaves when resume (or rerun deliver()) ServiceX request. We need to make sure it recognizes the data_dir of original request and resume download from where it lost connection.

Hi @kyungeonchoi, if the user changes the spec of the request and therefore the target file area, I think we should just go ahead and deliver there. I believe that with this PR we will just re-download all files from the incomplete requests (regardless of what has already been downloaded), which is a bit brute force but can be improved in the future. I don't think we need to include "avoid re-downloading" yet, since that will really require us to do an integrity check on the files that we already have.

gordonwatts

This looks great. I think my questions are just because I don't understand something deeper going on. Please take a look and then decide!

servicex/query_cache.py

gordonwatts · 2024-10-17T12:03:43Z

It seems like transform_complete has become a very large method with many, many lines. It might be time to add it to the list of things that need to be refactored.

ponyisi

I made a few comments. In principle we could go ahead with this but I think for long-term maintenance a few tweaks would improve things.

servicex/query_cache.py

ponyisi · 2024-10-22T22:08:10Z

+        transform = Query()
+        with self.lock:
+            records = self.db.search((transform.hash == hash_value)
+                                     & (transform.status.exists()))


Mild preference to do this check after the query, i.e. look up the request first (because having the request missing should be an error!) and then check if the field is there.

It's also true that you basically never use this function except to check if the result is SUBMITTED. Perhaps better to make the function get_transform_request_submitted and return a bool?

I have changed the function to return a bool. However, a missing request shouldn't raise an error: in that case the request has not been submitted yet, so the function should return False rather than raise an error.

Right, I see. Somewhat unfortunate naming of the function I suppose. It would be good to document this contract (it returns False if the transformation is not in the cache at all) in the docstring, since you rely on this behavior.
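The contract discussed in this thread could be documented along these lines. This is a sketch only: the class `QueryCacheSketch` and the helper `add_record` are hypothetical, a list of dicts stands in for the database queried in the hunk above, and the method name is the one suggested in the review, which may differ from the merged code.

```python
import threading
from typing import List


class QueryCacheSketch:
    """Hypothetical stand-in for the query cache discussed in this review."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self._records: List[dict] = []  # stand-in for the database table

    def add_record(self, record: dict) -> None:
        with self.lock:
            self._records.append(record)

    def get_transform_request_submitted(self, hash_value: str) -> bool:
        """Return True if the cached request is in the SUBMITTED state.

        Contract: returns False, and does not raise, when there is no
        record for ``hash_value`` at all or when the record has no
        ``status`` field. Callers rely on this to treat "not in the
        cache" as "not yet submitted".
        """
        with self.lock:
            records = [r for r in self._records
                       if r.get("hash") == hash_value]
        if not records:
            return False
        return records[0].get("status") == "SUBMITTED"
```

Writing the "missing means False" behavior into the docstring keeps a later refactor from turning the missing-record case into an exception by accident.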

servicex/query_cache.py

ponyisi

See one line to update, otherwise I approve

ponyisi · 2024-10-23T21:20:05Z

tests/test_dataset.py

        python_dataset.download_files = AsyncMock()
        python_dataset.download_files.return_value = []
+        python_dataset.cache.update_transform_request_id = Mock()
+        # python_dataset.cache.update_transform_status = Mock()


Remove commented code?

created a flag in db instance to make sure that a transform request is SUBMITTED or COMPLETE

51f012b

BenGalewsky reviewed Oct 11, 2024

ketan96-m added 4 commits October 11, 2024 20:10

code clean up

ecdff0e

code clean up

0d76192

submit download test cases

4bd1a7e

unit tests

ecd9c30

ketan96-m requested a review from BenGalewsky October 15, 2024 15:31

kyungeonchoi self-requested a review October 15, 2024 16:17

ponyisi reviewed Oct 16, 2024

gordonwatts reviewed Oct 17, 2024

ketan96-m added 3 commits October 22, 2024 11:55

refactored functions for getting the hash and request id of the requests

08ed537

unit tests

d401095

comments for query cache functions

1dd8a46

ketan96-m requested a review from ponyisi October 22, 2024 20:52

ponyisi reviewed Oct 22, 2024

ketan96-m added 2 commits October 23, 2024 12:14

submitted flag conditions

1cc3b38

change in test cases

8398eb6

ketan96-m requested a review from ponyisi October 23, 2024 17:20

ponyisi approved these changes Oct 23, 2024

ponyisi mentioned this pull request Oct 23, 2024

Resume request when client fails fixes #468 #494

Closed

ketan96-m added 2 commits October 23, 2024 16:48

comment fix

0e0bcc2

transform request submitted docstring

1dc2035

ketan96-m merged commit 92d1d84 into master Oct 25, 2024
36 checks passed

ketan96-m deleted the cache_single_db branch October 25, 2024 00:25
