Description
I’m encountering an issue when resuming a workflow from the latest checkpoint.
Steps to reproduce
- I create a workflow with four function executors.
- I pass a checkpoint storage during the workflow build.
- After all four executors finish, four checkpoints are created as expected.
- I then retrieve the checkpoints, select the one with the latest timestamp, and resume from it using:

      checkpoints = await checkpoint_storage.list_checkpoints()
      if checkpoints:
          latest = max(checkpoints, key=lambda cp: cp.timestamp)
          logger.info(f"Resuming from: {latest.checkpoint_id}")
          await workflow.run_from_checkpoint(latest.checkpoint_id)

- When resuming from the latest checkpoint, I expect no executors to run, since all have already completed.
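To rule out the possibility that the selection step itself picks a checkpoint other than the final one, here is a minimal sketch of a stricter "latest" selection. It assumes the `timestamp` field is an ISO 8601 string, which may not match the framework's actual type; `FakeCheckpoint` and `pick_latest` are hypothetical stand-ins, not part of the library.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FakeCheckpoint:
    """Hypothetical stand-in for a stored checkpoint (not the framework's class)."""
    checkpoint_id: str
    timestamp: str  # ASSUMPTION: ISO 8601 string with a UTC offset


def pick_latest(checkpoints):
    """Return the checkpoint with the latest parsed timestamp, or None if empty."""
    if not checkpoints:
        return None
    # Parse before comparing: plain string comparison is only chronological
    # when every timestamp uses the same format and the same UTC offset.
    return max(checkpoints, key=lambda cp: datetime.fromisoformat(cp.timestamp))


checkpoints = [
    FakeCheckpoint("a", "2025-01-01T09:59:59+00:00"),
    FakeCheckpoint("b", "2025-01-01T11:30:00+02:00"),  # 09:30 UTC, so earlier than "a"
]

by_string = max(checkpoints, key=lambda cp: cp.timestamp)
by_time = pick_latest(checkpoints)
print(by_string.checkpoint_id, by_time.checkpoint_id)
```

If both selections agree on the real data, the checkpoint choice is not the cause and the problem lies in the restore path.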
Observed behavior
- The latest checkpoint loads successfully.
- However, the workflow unexpectedly starts executing the first executor again, even though all executors were completed before the checkpoint.
Expected behavior
- When resuming from the latest checkpoint (where all executors are done), the workflow should detect that no remaining nodes need to run and should simply finish.
Possible cause
There may be a bug in how the remaining nodes are determined when resuming from a checkpoint, possibly in the logic that identifies which parts of the graph still need to execute.
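To make the suspected failure mode concrete, here is a toy model of it. This is purely illustrative and is not the framework's actual code: `ToyCheckpoint`, `pending`, and both functions are hypothetical names. It only shows how treating "no pending work" as "fresh run" would reproduce the symptom described above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCheckpoint:
    """Hypothetical saved workflow state (not the framework's type)."""
    checkpoint_id: str
    # Ids of executors that still have undelivered messages waiting for them.
    pending: list[str] = field(default_factory=list)


def remaining_executors(cp: ToyCheckpoint) -> list[str]:
    """Expected behavior: schedule only executors that have pending work."""
    return list(cp.pending)


def remaining_executors_buggy(cp: ToyCheckpoint, start: str) -> list[str]:
    """One hypothetical way to get the reported symptom: an empty pending
    list is mistaken for a fresh run, so the start executor runs again."""
    return list(cp.pending) or [start]


mid_run = ToyCheckpoint("cp-2", pending=["executor_3"])
final = ToyCheckpoint("cp-4")  # taken after all four executors finished

print(remaining_executors(mid_run))
print(remaining_executors(final))
print(remaining_executors_buggy(final, start="executor_1"))
```

In this model the correct function returns an empty list for the final checkpoint, so the workflow would finish immediately, while the buggy variant schedules `executor_1` again.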
Request
Please check the code paths responsible for finding remaining nodes when restoring from a checkpoint.