Add simple benchmark script #180

cretz · 2022-11-01T17:22:47Z

What was changed

Added scripts/run_bench.py and a GitHub workflow that runs it nightly or on manual trigger
Updated the README to clarify the cost of third-party module imports
Fixed an issue with workflows defined in the __main__ module

Results

Sandboxed

workflow_count	sandbox	max_cached_workflows	max_concurrent	max_mem_mib	start_seconds	result_seconds	workflows_per_second	os
100	true	100	100	64.4	0.2	3.3	29.9	linux
100	true	100	100	60.4	0.3	2.7	37.2	linux
100	true	100	100	60.5	0.2	5.2	19.1	linux
1000	true	1000	1000	139.4	2.4	11.7	85.6	linux
1000	true	1000	1000	135.2	2.4	11.7	85.3	linux
1000	true	1000	1000	137.5	2.4	17.9	55.9	linux
1000	true	100	100	91.8	2.4	15.8	63.4	linux
1000	true	100	100	92	2.4	19.5	51.4	linux
1000	true	100	100	91.3	2.4	19.5	51.2	linux
10000	true	10000	10000	894	23.5	130.2	76.8	linux
10000	true	10000	10000	892.4	23.7	136.1	73.5	linux
10000	true	1000	1000	231.6	23.6	125.4	79.7	linux
10000	true	1000	1000	229.8	25.3	117.6	85	linux
100	true	100	100	54.3	0.2	2.9	34.4	windows
100	true	100	100	51.2	0.4	3.6	27.9	windows
100	true	100	100	51.2	0.3	4.9	20.5	windows
1000	true	1000	1000	133.6	2.4	19.6	51	windows
1000	true	1000	1000	131.8	2.4	17.6	56.7	windows
1000	true	1000	1000	132.9	2.4	13.5	74	windows
1000	true	100	100	81.3	2.4	17.2	58.1	windows
1000	true	100	100	77.7	2.4	14	71.7	windows
1000	true	100	100	78	2.4	17.9	56	windows
10000	true	10000	10000	927.2	23.4	143.8	69.6	windows
10000	true	10000	10000	928.2	23.4	157	63.7	windows
10000	true	1000	1000	220.8	23.8	135.8	73.6	windows
10000	true	1000	1000	220.2	23.6	131.1	76.3	windows

Unsandboxed

workflow_count	sandbox	max_cached_workflows	max_concurrent	max_mem_mib	start_seconds	result_seconds	workflows_per_second	os
100	false	100	100	63.3	0.3	4.7	21.5	linux
100	false	100	100	58.3	0.2	3.3	30.5	linux
100	false	100	100	60.1	0.3	2.8	36.3	linux
1000	false	1000	1000	116.2	2.5	13.6	73.6	linux
1000	false	1000	1000	114.9	2.5	13.8	72.6	linux
1000	false	1000	1000	113.2	2.5	11.6	86	linux
1000	false	100	100	76.9	2.5	19.9	50.4	linux
1000	false	100	100	78.2	2.5	19.8	50.5	linux
1000	false	100	100	71.6	2.5	14	71.5	linux
10000	false	10000	10000	678.1	24.7	110.2	90.8	linux
10000	false	10000	10000	678.1	23.9	137.1	72.9	linux
10000	false	1000	1000	207	23.8	119.9	83.4	linux
10000	false	1000	1000	210.2	23.7	112.8	88.7	linux
100	false	100	100	51.6	0.3	6.2	16	windows
100	false	100	100	48.5	0.3	3.2	30.8	windows
100	false	100	100	48.5	0.3	3.2	31.5	windows
1000	false	1000	1000	108.5	3.4	19	52.8	windows
1000	false	1000	1000	108.5	3.3	16.7	59.9	windows
1000	false	1000	1000	109	3.3	18.9	52.9	windows
1000	false	100	100	65.1	3.4	22.2	45.1	windows
1000	false	100	100	60.4	3.3	19.8	50.6	windows
1000	false	100	100	61.7	3.5	20	50.1	windows
10000	false	10000	10000	694.7	33.9	179.3	55.8	windows
10000	false	10000	10000	693.3	32.5	176.3	56.7	windows
10000	false	1000	1000	195.3	34.6	176.1	56.8	windows
10000	false	1000	1000	199.2	34.3	185.4	53.9	windows
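The workflows_per_second column appears to equal workflow_count / result_seconds, with every value rounded to one decimal place. For example, the first 10000-workflow sandboxed Linux row gives 10000 / 130.2 ≈ 76.8. The sketch below checks that relationship against a few rows copied from the tables. It is an inference from the published numbers, not taken from run_bench.py, and the names ROWS and consistent are hypothetical.

```python
# Rows copied from the tables above: (workflow_count, result_seconds, workflows_per_second)
ROWS = [
    (100, 3.3, 29.9),      # sandboxed, linux
    (1000, 11.7, 85.6),    # sandboxed, linux
    (10000, 130.2, 76.8),  # sandboxed, linux
    (100, 6.2, 16.0),      # unsandboxed, windows
    (1000, 19.0, 52.8),    # unsandboxed, windows
    (10000, 185.4, 53.9),  # unsandboxed, windows
]

# Every value in the tables is rounded to one decimal place.
HALF_STEP = 0.05


def consistent(count: int, result_s: float, wps: float) -> bool:
    """True if wps could equal count / result_s before both were rounded."""
    lo = count / (result_s + HALF_STEP) - HALF_STEP
    hi = count / (result_s - HALF_STEP) + HALF_STEP
    return lo <= wps <= hi


for count, result_s, wps in ROWS:
    print(f"{count:>6} / {result_s:>6} = {count / result_s:6.1f}  (reported {wps})")
```

The rounding tolerance matters for small runs: 100 / 3.3 is about 30.3, but any result_seconds between 3.25 and 3.35 rounds to 3.3, which covers the reported 29.9.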

Notes

The workflow tested is a simple workflow that accepts a string, invokes an activity with that string, and returns the activity's result as its own.
The Linux runner is our 4-core org-level runner; the Windows runner is the GitHub-provided one.
max_concurrent in the tables above applies to both max_concurrent_workflow_tasks and max_concurrent_activities (set to the same value for now).
Due to the nature of Python, single-worker, single-process benchmarks won't show the system's full capacity: Python effectively runs CPU-bound work like this on one thread at a time. Performance may well scale roughly linearly with the number of worker processes, though that was not measured here.
Many of the larger tests competed with Temporalite for resources.
Note how workflows_per_second varies widely in some scenarios. This may be caused by Temporalite running alongside the benchmark and its unpredictable performance.
Many warnings and errors are logged on Linux when stopping Temporalite via the Rust ephemeral server shutdown. We are probably not doing this correctly.
Importing even a single third-party library tripled the memory usage or worse, because each import is reloaded and isolated per workflow. The README now discourages importing non-passthrough, non-standard-library modules in the same file the workflow is defined in. The effect is barely visible at small workflow counts.
The workflows-per-second figures above include the time the client spends waiting for each result, fetching it from the server, converting it, etc.
By default, max_cached_workflows is 1000, max_concurrent_workflow_tasks is 100, and max_concurrent_activities is 100. Given the numbers above, should we increase these? Note that the max_concurrent_activities limit applies to sync activities too.
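To make the relationship between the table columns and the worker options concrete, here is a small sketch that expands one benchmark row into the three worker settings and reports which ones differ from the defaults listed above. BenchScenario and DEFAULT_WORKER_SETTINGS are hypothetical names for illustration; only the option names and default values come from the notes.

```python
from dataclasses import dataclass
from typing import Dict

# Worker defaults as stated in the notes above.
DEFAULT_WORKER_SETTINGS: Dict[str, int] = {
    "max_cached_workflows": 1000,
    "max_concurrent_workflow_tasks": 100,
    "max_concurrent_activities": 100,
}


@dataclass(frozen=True)
class BenchScenario:
    """One row of the results tables (hypothetical type, for illustration)."""

    workflow_count: int
    sandbox: bool
    max_cached_workflows: int
    max_concurrent: int

    def worker_settings(self) -> Dict[str, int]:
        # The single max_concurrent column drives both concurrency limits.
        return {
            "max_cached_workflows": self.max_cached_workflows,
            "max_concurrent_workflow_tasks": self.max_concurrent,
            "max_concurrent_activities": self.max_concurrent,
        }

    def differs_from_defaults(self) -> Dict[str, int]:
        """Settings in this scenario that override the defaults."""
        return {
            key: value
            for key, value in self.worker_settings().items()
            if DEFAULT_WORKER_SETTINGS[key] != value
        }


scenario = BenchScenario(workflow_count=10000, sandbox=True,
                         max_cached_workflows=10000, max_concurrent=10000)
print(scenario.differs_from_defaults())
```

The option names match keyword arguments of the Python SDK's Worker constructor, so the dict could plausibly be splatted into it (Worker(client, ..., **scenario.worker_settings())); that wiring is an assumption, not shown in this PR.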

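To illustrate why reloading imports per workflow multiplies memory, here is a standard-library-only sketch contrasting a normal import (cached in sys.modules and shared) with executing a fresh copy of a module for each caller. load_isolated is a hypothetical helper, and textwrap merely stands in for a third-party library; this mimics the effect described in the notes, not how the Temporal sandbox is actually implemented.

```python
import importlib
import importlib.util
from types import ModuleType


def load_isolated(name: str) -> ModuleType:
    """Execute a fresh, uncached copy of module `name` (hypothetical helper).

    Each call yields a new module object with its own globals, roughly what
    per-workflow re-importing costs. It is not the Temporal sandbox mechanism.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot find module {name!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the module's top-level code again
    return module


# A normal import is cached in sys.modules: every importer shares one object.
shared_a = importlib.import_module("textwrap")
shared_b = importlib.import_module("textwrap")

# Isolated loads create one module object per "workflow", so memory grows with count.
per_workflow = [load_isolated("textwrap") for _ in range(3)]
```

With a large third-party package, every isolated copy carries that package's whole module graph, which is consistent with memory growing with workflow count.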
Things that could be added but weren't:

Measurement of CPU
Multiple workers in separate processes
Better result output for trending

The goal of this work was just to ensure the SDK performs well enough. We'll need to spend more time optimizing.

Checklist

Closes Stress tests #23

bergundy · 2022-11-01T21:17:30Z

scripts/run_bench.py

+        try:
+            yield None
+        finally:
+            report_mem_task.cancel()


Is it worth waiting for the task to be cancelled here?

cretz: It just interrupts a sleep, so I don't think there's any value, but I guess I could.
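For reference, the option the reviewer raised (cancel the background task, then await it) could look like the sketch below. The PR's actual report_mem_task body is not shown, so sample_while_running, the sample callable, and the counter-based sampler are hypothetical stand-ins.

```python
import asyncio
import contextlib
from typing import AsyncIterator, Callable, List


@contextlib.asynccontextmanager
async def sample_while_running(
    sample: Callable[[], float], interval: float, out: List[float]
) -> AsyncIterator[None]:
    """Append sample() to `out` every `interval` seconds while the body runs."""

    async def report() -> None:
        while True:
            out.append(sample())
            await asyncio.sleep(interval)

    task = asyncio.create_task(report())
    try:
        yield None
    finally:
        task.cancel()
        # Waiting is optional: it only lets the interrupted sleep unwind
        # before we return, so the task is not left pending.
        with contextlib.suppress(asyncio.CancelledError):
            await task


async def demo() -> List[float]:
    readings: List[float] = []
    counter = iter(range(1_000_000))
    async with sample_while_running(lambda: float(next(counter)), 0.01, readings):
        await asyncio.sleep(0.05)  # stand-in for running the benchmark
    return readings


readings = asyncio.run(demo())
print(f"{len(readings)} samples, max {max(readings)}")
```

As the reply notes, awaiting buys little when the task is only sleeping. One caveat of this shape: suppressing CancelledError also swallows a cancellation of the enclosing task if it arrives during the await. On Unix, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss is one possible sampler (kilobytes on Linux, bytes on macOS).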

cretz force-pushed the benchmarks branch 9 times, most recently from a490673 to 4e63da4, November 1, 2022 20:03

Add simple benchmark script (commit 9a9fc81)

cretz force-pushed the benchmarks branch from 4e63da4 to 9a9fc81, November 1, 2022 20:40

cretz requested a review from a team November 1, 2022 20:43

cretz marked this pull request as ready for review November 1, 2022 20:46

bergundy reviewed Nov 1, 2022

bergundy approved these changes Nov 1, 2022

Merge branch 'main' into benchmarks (commit 7d5f038)

cretz merged commit 656b77b into temporalio:main Nov 1, 2022

cretz deleted the benchmarks branch November 1, 2022 22:32
