@cretz cretz commented Nov 1, 2022

What was changed

  • Added scripts/run_bench.py and a GH workflow that runs it nightly or can be manually triggered
  • Updated README to clarify the cost of third-party module imports
  • Fixed issue for workflows defined in __main__ module

Results

(if not wanting horizontal scrollbars on table, open dev console and remove overflow CSS property for .markdown-body table class...I don't feel like reworking the tables)

Sandboxed

workflow_count sandbox max_cached_workflows max_concurrent max_mem_mib start_seconds result_seconds workflows_per_second os
100 true 100 100 64.4 0.2 3.3 29.9 linux
100 true 100 100 60.4 0.3 2.7 37.2 linux
100 true 100 100 60.5 0.2 5.2 19.1 linux
1000 true 1000 1000 139.4 2.4 11.7 85.6 linux
1000 true 1000 1000 135.2 2.4 11.7 85.3 linux
1000 true 1000 1000 137.5 2.4 17.9 55.9 linux
1000 true 100 100 91.8 2.4 15.8 63.4 linux
1000 true 100 100 92 2.4 19.5 51.4 linux
1000 true 100 100 91.3 2.4 19.5 51.2 linux
10000 true 10000 10000 894 23.5 130.2 76.8 linux
10000 true 10000 10000 892.4 23.7 136.1 73.5 linux
10000 true 1000 1000 231.6 23.6 125.4 79.7 linux
10000 true 1000 1000 229.8 25.3 117.6 85 linux
100 true 100 100 54.3 0.2 2.9 34.4 windows
100 true 100 100 51.2 0.4 3.6 27.9 windows
100 true 100 100 51.2 0.3 4.9 20.5 windows
1000 true 1000 1000 133.6 2.4 19.6 51 windows
1000 true 1000 1000 131.8 2.4 17.6 56.7 windows
1000 true 1000 1000 132.9 2.4 13.5 74 windows
1000 true 100 100 81.3 2.4 17.2 58.1 windows
1000 true 100 100 77.7 2.4 14 71.7 windows
1000 true 100 100 78 2.4 17.9 56 windows
10000 true 10000 10000 927.2 23.4 143.8 69.6 windows
10000 true 10000 10000 928.2 23.4 157 63.7 windows
10000 true 1000 1000 220.8 23.8 135.8 73.6 windows
10000 true 1000 1000 220.2 23.6 131.1 76.3 windows

Unsandboxed

workflow_count sandbox max_cached_workflows max_concurrent max_mem_mib start_seconds result_seconds workflows_per_second os
100 false 100 100 63.3 0.3 4.7 21.5 linux
100 false 100 100 58.3 0.2 3.3 30.5 linux
100 false 100 100 60.1 0.3 2.8 36.3 linux
1000 false 1000 1000 116.2 2.5 13.6 73.6 linux
1000 false 1000 1000 114.9 2.5 13.8 72.6 linux
1000 false 1000 1000 113.2 2.5 11.6 86 linux
1000 false 100 100 76.9 2.5 19.9 50.4 linux
1000 false 100 100 78.2 2.5 19.8 50.5 linux
1000 false 100 100 71.6 2.5 14 71.5 linux
10000 false 10000 10000 678.1 24.7 110.2 90.8 linux
10000 false 10000 10000 678.1 23.9 137.1 72.9 linux
10000 false 1000 1000 207 23.8 119.9 83.4 linux
10000 false 1000 1000 210.2 23.7 112.8 88.7 linux
100 false 100 100 51.6 0.3 6.2 16 windows
100 false 100 100 48.5 0.3 3.2 30.8 windows
100 false 100 100 48.5 0.3 3.2 31.5 windows
1000 false 1000 1000 108.5 3.4 19 52.8 windows
1000 false 1000 1000 108.5 3.3 16.7 59.9 windows
1000 false 1000 1000 109 3.3 18.9 52.9 windows
1000 false 100 100 65.1 3.4 22.2 45.1 windows
1000 false 100 100 60.4 3.3 19.8 50.6 windows
1000 false 100 100 61.7 3.5 20 50.1 windows
10000 false 10000 10000 694.7 33.9 179.3 55.8 windows
10000 false 10000 10000 693.3 32.5 176.3 56.7 windows
10000 false 1000 1000 195.3 34.6 176.1 56.8 windows
10000 false 1000 1000 199.2 34.3 185.4 53.9 windows
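
One way to compare the two tables is to average the repeated runs of a single scenario. The sketch below is only an illustration: the rows are copied from the Linux runs with workflow_count=10000 and max_cached_workflows=10000 above, and the `summarize` helper is a hypothetical name, not part of scripts/run_bench.py.

```python
from statistics import mean

# (max_mem_mib, workflows_per_second) pairs copied from the tables above:
# Linux, workflow_count=10000, max_cached_workflows=10000, max_concurrent=10000
sandboxed = [(894.0, 76.8), (892.4, 73.5)]
unsandboxed = [(678.1, 90.8), (678.1, 72.9)]


def summarize(rows):
    """Return (mean memory in MiB, mean workflows per second) for repeated runs."""
    return mean(r[0] for r in rows), mean(r[1] for r in rows)


sb_mem, sb_wps = summarize(sandboxed)
un_mem, un_wps = summarize(unsandboxed)

# Ratio of sandboxed to unsandboxed peak memory for this one scenario
mem_ratio = sb_mem / un_mem

print(f"sandboxed:   {sb_mem:.1f} MiB, {sb_wps:.2f} workflows/s")
print(f"unsandboxed: {un_mem:.1f} MiB, {un_wps:.2f} workflows/s")
print(f"memory ratio (sandboxed / unsandboxed): {mem_ratio:.2f}")
```

With only two or three runs per scenario, and the throughput spread noted below, averages like these are a rough guide rather than a firm comparison.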

Notes

Notes:

  • The workflow tested is a simple one that accepts a string, invokes an activity with that string, and returns the activity's response as its own result
  • The linux runner is our 4-core org-level one and the windows runner is the GH-provided one
  • max_concurrent in the tables above applies to both max_concurrent_workflow_tasks and max_concurrent_activities (set to the same number for now)
  • Due to the nature of Python, single-worker, single-process benchmarks won't show the true power of the system. Python is inherently single-threaded on CPU-bound tasks such as these. It is quite likely that performance scales roughly linearly with the number of worker processes.
  • Many of the larger tests competed with Temporalite for resources.
  • Note how the workflows-per-second number varies wildly in some scenarios. This may be a product of Temporalite running alongside the worker and its unpredictable performance.
  • Lots of Linux warnings and errors happen when stopping Temporalite via the Rust ephemeral server shutdown. We are probably not doing this right.
  • Adding an import for even a single third-party library at least tripled memory usage. This is because each import is reloaded and isolated per workflow. The README was updated to discourage importing non-passthrough, non-standard-library modules in the same file the workflow is defined in. The effect won't show up much for small workflow counts.
  • The workflows-per-second figures above include the time the client spends waiting to fetch each result from the server (and converting the response, etc).
  • By default, max_cached_workflows is 1000, max_concurrent_workflow_tasks is 100, and max_concurrent_activities is 100. In light of the numbers above, should we increase those? Also note that the activities setting affects sync activities too.
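
In the tables, workflows_per_second is close to workflow_count / result_seconds (for example, 10000 / 130.2 ≈ 76.8). A minimal sketch of a harness with that shape follows. It assumes the start phase and the result phase are timed separately, which is not confirmed here, and `fake_workflow` and `run_bench` are hypothetical stand-ins that use plain asyncio rather than the Temporal client.

```python
import asyncio
import time


async def fake_workflow(name: str) -> str:
    # Stand-in for a workflow round trip; NOT the Temporal client API
    await asyncio.sleep(0.001)
    return f"Hello, {name}!"


async def run_bench(workflow_count: int) -> dict:
    # "Start" phase: schedule every workflow
    begin = time.monotonic()
    tasks = [
        asyncio.create_task(fake_workflow(f"user-{i}"))
        for i in range(workflow_count)
    ]
    start_seconds = time.monotonic() - begin

    # "Result" phase: wait for every result, so the time the client spends
    # fetching results is part of the measurement
    begin = time.monotonic()
    results = await asyncio.gather(*tasks)
    result_seconds = time.monotonic() - begin

    return {
        "workflow_count": len(results),
        "start_seconds": start_seconds,
        "result_seconds": result_seconds,
        "workflows_per_second": workflow_count / result_seconds,
    }


stats = asyncio.run(run_bench(50))
print(stats)
```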

Things that could be added but weren't:

  • Measurement of CPU
  • Multiple workers in separate processes
  • Better result output for trending
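
Of the items above, CPU measurement is the simplest to sketch with the standard library alone. The `measure_cpu` helper below is a hypothetical name, not something in this PR. It records CPU time for the current process only, so a Temporalite server running as a separate process would not be counted.

```python
import time
from contextlib import contextmanager


@contextmanager
def measure_cpu(stats: dict):
    """Record CPU and wall-clock seconds spent inside the with-block."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    try:
        yield stats
    finally:
        stats["cpu_seconds"] = time.process_time() - cpu_start
        stats["wall_seconds"] = time.perf_counter() - wall_start


stats = {}
with measure_cpu(stats):
    # Placeholder CPU-bound work standing in for a benchmark run
    total = sum(i * i for i in range(200_000))

print(f"cpu={stats['cpu_seconds']:.3f}s wall={stats['wall_seconds']:.3f}s")
```

A CPU time close to the wall time would suggest the worker process is CPU-bound, which is the case where adding worker processes should help most.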

The goal of this project was just to ensure the SDK was good enough performance-wise. We'll need to spend more time optimizing.

Checklist

  1. Closes Stress tests #23

@cretz cretz force-pushed the benchmarks branch 9 times, most recently from a490673 to 4e63da4 Compare November 1, 2022 20:03
@cretz cretz requested a review from a team November 1, 2022 20:43
@cretz cretz marked this pull request as ready for review November 1, 2022 20:46
try:
    yield None
finally:
    report_mem_task.cancel()
Member

Is it worth waiting for the task to be cancelled here?

Member Author

It just interrupts a sleep, so I don't think there's any value in it, but I guess I could
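
For reference, waiting for the cancelled task would look roughly like the sketch below. It is a self-contained illustration, not the code in this PR: `track_samples`, its `sample` callable, and `demo` are hypothetical names, and the real reporting task may do something different inside its loop.

```python
import asyncio
import contextlib
from typing import AsyncIterator, Callable, List


@contextlib.asynccontextmanager
async def track_samples(
    sample: Callable[[], float], interval: float, out: List[float]
) -> AsyncIterator[None]:
    async def report() -> None:
        # Periodically record a sample until cancelled
        while True:
            out.append(sample())
            await asyncio.sleep(interval)

    report_task = asyncio.create_task(report())
    try:
        yield None
    finally:
        report_task.cancel()
        # Awaiting the task lets the cancellation finish unwinding before
        # the context manager returns
        with contextlib.suppress(asyncio.CancelledError):
            await report_task


async def demo() -> List[float]:
    samples: List[float] = []
    async with track_samples(lambda: 1.0, 0.01, samples):
        await asyncio.sleep(0.05)
    return samples


samples = asyncio.run(demo())
print(len(samples), "samples collected")
```

When the task only sleeps between samples, as described above, the extra await mostly buys tidiness: the task is guaranteed to be finished by the time the block exits.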

@cretz cretz merged commit 656b77b into temporalio:main Nov 1, 2022
@cretz cretz deleted the benchmarks branch November 1, 2022 22:32