
Python: [Bug] Workflow resumes from latest checkpoint but re-runs first executor #1695

@droideronline

Description


I’m encountering an issue when resuming a workflow from the latest checkpoint.

Steps to reproduce

  1. I create a workflow with four function executors.

  2. I pass a checkpoint storage instance when building the workflow.

  3. After all four executors finish, four checkpoints are created as expected.

  4. I then list the checkpoints, select the most recent one by timestamp, and resume from it:

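    import logging
    logger = logging.getLogger(__name__)

    # Runs inside an async function, after the initial run has completed.
    checkpoints = await checkpoint_storage.list_checkpoints()
    if checkpoints:
        latest = max(checkpoints, key=lambda cp: cp.timestamp)  # most recent checkpoint
        logger.info(f"Resuming from: {latest.checkpoint_id}")
        await workflow.run_from_checkpoint(latest.checkpoint_id)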

  5. Because all executors have already completed at the latest checkpoint, I expect no executors to run when resuming from it.
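The reproduction steps above can be modelled with a small, self-contained toy runner. This is not the agent-framework API: `ToyCheckpoint`, `InMemoryToyStorage`, `run_workflow`, and the executor names are hypothetical stand-ins, and the sketch assumes one checkpoint is saved after each executor finishes and that each checkpoint records which executors have completed. Under those assumptions, resuming from the latest checkpoint runs nothing.

```python
from __future__ import annotations

import asyncio
import itertools
from dataclasses import dataclass, field


@dataclass
class ToyCheckpoint:
    """Hypothetical checkpoint: records which executors have completed."""
    checkpoint_id: str
    timestamp: int
    completed: tuple[str, ...]


@dataclass
class InMemoryToyStorage:
    """Hypothetical in-memory checkpoint storage (not the library's class)."""
    checkpoints: list[ToyCheckpoint] = field(default_factory=list)

    async def save(self, cp: ToyCheckpoint) -> None:
        self.checkpoints.append(cp)

    async def list_checkpoints(self) -> list[ToyCheckpoint]:
        return list(self.checkpoints)


EXECUTORS = ["step1", "step2", "step3", "step4"]  # four hypothetical function executors
_clock = itertools.count()  # monotonic stand-in for real timestamps


async def run_workflow(storage: InMemoryToyStorage, resume_from: ToyCheckpoint | None = None) -> list[str]:
    """Run executors in order, skipping those the checkpoint marks as completed.

    Returns the names of the executors that actually ran.
    """
    done = list(resume_from.completed) if resume_from else []
    ran = []
    for name in EXECUTORS:
        if name in done:
            continue  # already finished before the checkpoint was taken
        ran.append(name)
        done.append(name)
        # Save a checkpoint after each executor finishes.
        await storage.save(ToyCheckpoint(f"cp-{len(done)}", next(_clock), tuple(done)))
    return ran


async def main() -> tuple[int, list[str]]:
    storage = InMemoryToyStorage()
    await run_workflow(storage)  # initial run: creates four checkpoints
    checkpoints = await storage.list_checkpoints()
    latest = max(checkpoints, key=lambda cp: cp.timestamp)
    rerun = await run_workflow(storage, resume_from=latest)
    return len(checkpoints), rerun


print(asyncio.run(main()))  # (4, [])
```

In this model, resuming from an earlier checkpoint runs only the executors after it; for example, resuming from the second checkpoint runs `step3` and `step4` only.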

Observed behavior

  • The latest checkpoint loads successfully.
  • However, the workflow then starts executing the first executor again, even though all four executors had completed before the checkpoint was saved.

Expected behavior

  • When resuming from the latest checkpoint (at which point all executors are done), the workflow should detect that no nodes remain to run and simply finish.

Possible cause

There may be a bug in how the remaining nodes are determined when resuming from a checkpoint, possibly in the logic that identifies which parts of the graph still need to execute.
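One plausible shape for the remaining-node logic described above, sketched under the assumption (not confirmed by this issue) that the runner works in supersteps and that a checkpoint stores the messages still queued for delivery. In that model, the executors to run on resume are exactly the targets of non-empty pending queues, so a checkpoint with nothing queued should run nothing. A fallback that re-enters the start executor whenever the queue is empty would reproduce the observed behavior. `SuperstepCheckpoint`, `executors_to_resume`, and `executors_to_resume_buggy` are hypothetical names, not agent-framework code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SuperstepCheckpoint:
    """Hypothetical checkpoint taken at a superstep boundary."""
    # Messages queued for the next superstep, keyed by target executor.
    pending: dict[str, list[object]] = field(default_factory=dict)


def executors_to_resume(cp: SuperstepCheckpoint) -> list[str]:
    """Executors that still have work: the targets of non-empty pending queues."""
    return sorted(target for target, msgs in cp.pending.items() if msgs)


def executors_to_resume_buggy(cp: SuperstepCheckpoint, start: str) -> list[str]:
    """Hypothetical faulty variant: treats 'no pending work' as 'start from scratch'."""
    return executors_to_resume(cp) or [start]


final = SuperstepCheckpoint()  # all executors finished, nothing queued
mid_run = SuperstepCheckpoint(pending={"step3": ["output of step2"]})

print(executors_to_resume(final))                 # []
print(executors_to_resume_buggy(final, "step1"))  # ['step1']
print(executors_to_resume(mid_run))               # ['step3']
```

The key design point is that an empty pending set should be treated as "finished", not as "not yet started".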

Request

Please review the code paths responsible for determining the remaining nodes when restoring from a checkpoint.

Metadata

Labels: python, workflows (Related to Workflows in agent-framework)
