Skip to content

Conversation

THardy98
Copy link
Contributor

@THardy98 THardy98 commented May 1, 2025

What was changed

Adds support to interrupt heartbeating activities from pause requests.

Why?

Functionally support users pausing their activities via the CLI

  1. Closes Heartbeating activities should be interrupted when the activities are paused. #812

  2. How was this tested:
    Simple integration test

  3. Any docs updates needed?
    Unsure

@THardy98 THardy98 requested a review from a team as a code owner May 1, 2025 07:38
)
if resp_by_id.cancel_requested:
raise AsyncActivityCancelledError()
if resp_by_id.activity_paused:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrmm, looks like we never added "raises" docs to AsyncActivityHandle.heartbeat, wonder if we should now

@THardy98 THardy98 force-pushed the support_activity_pause branch from d5107b0 to 9763505 Compare May 3, 2025 06:35
@THardy98 THardy98 requested a review from cretz May 3, 2025 06:39
@THardy98 THardy98 force-pushed the support_activity_pause branch 2 times, most recently from 5c8299a to 4451e77 Compare May 3, 2025 18:07
Comment on lines 99 to 100
if cancellation_details:
self.cancellation_details.set_details(cancellation_details)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to always be set, even if not provided. May want to change param to be non-option w/ a default of temporalio.activity.ActivityCancellationDetails(). Usually using a default param of a full expression is bad in Python because it isn't called for each call, but since the details are immutable, sharing the default for everyone is safe.

Copy link
Contributor Author

@THardy98 THardy98 May 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to always be set, even if not provided.

Why?

May want to change param to be non-option w/ a default of temporalio.activity.ActivityCancellationDetails()

Sort of defeats the purpose of being able to return None without checking all fields are False, in which case, maybe the holder indirection isn't so useful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logic needs to be "you can guarantee activity.cancellation_details() is not None when the activity is being canceled", but this breaks that guarantee.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unresolving, I think this is still an issue. We should always set an instance of this on cancel in test environment. So the parameter should default to an empty instance with no optional accepted.

Comment on lines 7438 to 7648
except (CancelledError, asyncio.CancelledError):
return activity.cancellation_details()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need some tests to confirm what happens when this is not caught

Copy link
Contributor Author

@THardy98 THardy98 May 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test for pausing/unpausing activities, we don't catch cancel errors.

Added the unpause activity method to the bridge client - it was missing.

I made it such that a cancelled error on activity pause returns an activity task failure w/ application failure:

  • When going through the default failure converter, a canceled error populates cancel failure info, but core expects application failure info because we are populating the failed field of the completion instead of cancel.
  • A bit of a weird situation, because we can't populate cancel for the reason you mentioned above (we tell server to cancel and server complains that a cancel hasn't been requested yet - like you mention above)

I can revert this change if you prefer, I don't think it causes any notable server effect anyway (state of paused activities are suspended server-side, iiuc), but probably a good thing to have failures sent properly just as a general rule

@THardy98 THardy98 requested a review from cretz May 6, 2025 18:00
@THardy98 THardy98 force-pushed the support_activity_pause branch 2 times, most recently from c5ca36e to c3634fe Compare May 6, 2025 20:58
@THardy98 THardy98 force-pushed the support_activity_pause branch 2 times, most recently from 4977651 to e055766 Compare May 16, 2025 16:32
@THardy98 THardy98 requested a review from cretz May 16, 2025 17:27
@THardy98 THardy98 force-pushed the support_activity_pause branch from 8257841 to ab65714 Compare May 16, 2025 22:18
Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, only minor blocking concern is to see if we can get rid of required sleeps in tests

)
if resp_by_id.cancel_requested:
if resp_by_id.cancel_requested or resp_by_id.activity_paused:
raise AsyncActivityCancelledError()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do this in a separate issue if you want, but I think we need to put cancellation details on this exception now that there is more than one reason why it may get canceled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Comment on lines 7807 to 7809
# Wait for next heartbeat to propagate the cancellation. Unpausing before the heartbeat
# will show activity as unpaused to core. Consequently, it will *not* issue an activity cancel.
time.sleep(0.3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to get rid of potentially flaky sleeps and wait for situations that are needed? For instance in temporalio/sdk-dotnet#482 I used the presence of the final heartbeat details to confirm activity completed (kinda).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced with a helper that waits for the next heartbeat to elapse

@THardy98 THardy98 requested a review from cretz May 27, 2025 22:38
@THardy98 THardy98 force-pushed the support_activity_pause branch from 7977621 to 56913c4 Compare May 27, 2025 22:46
raise AssertionError(
f"Activity with ID {activity_id} not found in pending activities"
)
if current_info.last_heartbeat_time == initial_heartbeat_time:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we did in .NET was just check the final heartbeat details (i.e. we call heartbeat in a finally with certain details). I wonder if this is technically racy because what if the final heartbeat happened and the activity returned before wait_for_next_heartbeat_cycle was called? Also can move this logic to inside the test it's used in since it's not a general helper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the race condition you propose is possible, in which case, we'd be stuck on the assertion because the activity wouldn't emit a subsequent heartbeat (the activity is paused so we wouldn't get a heartbeat from a retry).

I've changed it to just check if heartbeat details exists (not the exact details payload). This is enough to verify that the activity has indeed finished executing and is ready to be unpaused.

@THardy98 THardy98 requested a review from cretz May 29, 2025 12:06
@THardy98
Copy link
Contributor Author

thanks for the reviews :)

@THardy98 THardy98 merged commit 27cc67f into main May 29, 2025
18 checks passed
@THardy98 THardy98 deleted the support_activity_pause branch May 29, 2025 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Heartbeating activities should be interrupted when the activities are paused.

3 participants