@iritkatriel
Member

This emits "opcode, oparg, 0, 0" for each instruction.
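To make the layout concrete, here is a minimal Python sketch of what "opcode, oparg, 0, 0" per instruction could look like as a byte stream. The real change lives in the C compiler; the function name `emit_wide`, the `(opcode, oparg)` input shape, and the opcode numbers in the usage note are all hypothetical, and opargs that do not fit in one byte (which would need `EXTENDED_ARG`) are simply rejected here.

```python
def emit_wide(instructions):
    """Pack (opcode, oparg) pairs into 4-byte units: opcode, oparg, 0, 0.

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    out = bytearray()
    for opcode, oparg in instructions:
        # Assumption: both values fit in a single byte. Larger opargs
        # are out of scope for this sketch.
        if not (0 <= opcode <= 255 and 0 <= oparg <= 255):
            raise ValueError(f"out of range: opcode={opcode}, oparg={oparg}")
        # Two trailing zero bytes pad each instruction to four bytes.
        out += bytes((opcode, oparg, 0, 0))
    return bytes(out)
```

For example, `emit_wide([(100, 1), (83, 0)])` yields eight bytes, twice the size of the usual two-byte-per-instruction encoding for the same input.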

Still debugging some test failures related to line numbers and tracing, but this works well enough to benchmark with pyperformance:

+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| Benchmark               | /home/benchmarking/BENCH/REQUESTS/req-compile-bench-1670439089-iritkatriel-linux/pyperformance-results.json.gz | /home/benchmarking/BENCH/REQUESTS/req-compile-bench-1670428040-iritkatriel-linux/pyperformance-results.json.gz |
+=========================+================================================================================================================+================================================================================================================+
| 2to3                    | 247 ms                                                                                                         | 255 ms: 1.03x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| async_generators        | 356 ms                                                                                                         | 360 ms: 1.01x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| async_tree_none         | 533 ms                                                                                                         | 541 ms: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| async_tree_cpu_io_mixed | 741 ms                                                                                                         | 762 ms: 1.03x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| async_tree_io           | 1.33 sec                                                                                                       | 1.34 sec: 1.01x slower                                                                                         |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| async_tree_memoization  | 636 ms                                                                                                         | 677 ms: 1.06x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| chameleon               | 6.57 ms                                                                                                        | 6.30 ms: 1.04x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| chaos                   | 67.3 ms                                                                                                        | 69.4 ms: 1.03x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| bench_thread_pool       | 769 us                                                                                                         | 785 us: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| coroutines              | 25.2 ms                                                                                                        | 25.9 ms: 1.03x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| crypto_pyaes            | 77.0 ms                                                                                                        | 74.9 ms: 1.03x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| deepcopy                | 329 us                                                                                                         | 335 us: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| deepcopy_reduce         | 2.86 us                                                                                                        | 2.95 us: 1.03x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| deepcopy_memo           | 34.3 us                                                                                                        | 34.9 us: 1.02x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| deltablue               | 3.24 ms                                                                                                        | 3.44 ms: 1.06x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| django_template         | 32.7 ms                                                                                                        | 33.3 ms: 1.02x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| docutils                | 2.49 sec                                                                                                       | 2.52 sec: 1.01x slower                                                                                         |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| dulwich_log             | 61.0 ms                                                                                                        | 61.9 ms: 1.02x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| fannkuch                | 380 ms                                                                                                         | 387 ms: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| float                   | 72.8 ms                                                                                                        | 76.6 ms: 1.05x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| genshi_text             | 20.6 ms                                                                                                        | 20.7 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| genshi_xml              | 47.9 ms                                                                                                        | 47.4 ms: 1.01x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| go                      | 137 ms                                                                                                         | 143 ms: 1.05x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| hexiom                  | 6.11 ms                                                                                                        | 6.35 ms: 1.04x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| html5lib                | 59.0 ms                                                                                                        | 62.1 ms: 1.05x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| json_dumps              | 9.29 ms                                                                                                        | 9.34 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| logging_format          | 6.27 us                                                                                                        | 6.43 us: 1.03x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| logging_silent          | 91.6 ns                                                                                                        | 94.8 ns: 1.03x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| logging_simple          | 5.71 us                                                                                                        | 5.81 us: 1.02x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| mako                    | 9.73 ms                                                                                                        | 9.62 ms: 1.01x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| mdp                     | 2.51 sec                                                                                                       | 2.59 sec: 1.03x slower                                                                                         |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| nbody                   | 94.3 ms                                                                                                        | 90.2 ms: 1.05x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| nqueens                 | 83.3 ms                                                                                                        | 81.1 ms: 1.03x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| pickle                  | 10.1 us                                                                                                        | 10.2 us: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| pickle_dict             | 30.9 us                                                                                                        | 31.1 us: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| pickle_list             | 4.16 us                                                                                                        | 4.06 us: 1.02x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| pickle_pure_python      | 280 us                                                                                                         | 290 us: 1.04x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| pycparser               | 1.13 sec                                                                                                       | 1.12 sec: 1.02x faster                                                                                         |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| pyflate                 | 405 ms                                                                                                         | 425 ms: 1.05x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| python_startup          | 8.56 ms                                                                                                        | 8.59 ms: 1.00x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| python_startup_no_site  | 6.28 ms                                                                                                        | 6.31 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| raytrace                | 278 ms                                                                                                         | 284 ms: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| regex_compile           | 130 ms                                                                                                         | 133 ms: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| regex_dna               | 206 ms                                                                                                         | 202 ms: 1.02x faster                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| regex_effbot            | 3.76 ms                                                                                                        | 3.62 ms: 1.04x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| regex_v8                | 22.2 ms                                                                                                        | 21.9 ms: 1.02x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| richards                | 42.3 ms                                                                                                        | 43.4 ms: 1.03x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| scimark_fft             | 315 ms                                                                                                         | 310 ms: 1.02x faster                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| scimark_lu              | 106 ms                                                                                                         | 109 ms: 1.03x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| scimark_monte_carlo     | 68.3 ms                                                                                                        | 69.2 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| scimark_sor             | 105 ms                                                                                                         | 119 ms: 1.13x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| scimark_sparse_mat_mult | 4.24 ms                                                                                                        | 3.99 ms: 1.06x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| spectral_norm           | 99.4 ms                                                                                                        | 95.8 ms: 1.04x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sqlglot_parse           | 1.34 ms                                                                                                        | 1.36 ms: 1.02x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sqlglot_transpile       | 1.63 ms                                                                                                        | 1.65 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sqlglot_optimize        | 50.9 ms                                                                                                        | 51.3 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sqlglot_normalize       | 105 ms                                                                                                         | 106 ms: 1.01x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sqlite_synth            | 2.59 us                                                                                                        | 2.64 us: 1.02x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sympy_expand            | 454 ms                                                                                                         | 463 ms: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sympy_integrate         | 20.4 ms                                                                                                        | 20.9 ms: 1.02x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sympy_sum               | 163 ms                                                                                                         | 165 ms: 1.01x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sympy_str               | 281 ms                                                                                                         | 287 ms: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| telco                   | 6.32 ms                                                                                                        | 6.58 ms: 1.04x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| thrift                  | 763 us                                                                                                         | 750 us: 1.02x faster                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| unpack_sequence         | 42.1 ns                                                                                                        | 43.8 ns: 1.04x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| unpickle_list           | 4.93 us                                                                                                        | 4.98 us: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| unpickle_pure_python    | 202 us                                                                                                         | 214 us: 1.06x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| xml_etree_iterparse     | 106 ms                                                                                                         | 103 ms: 1.03x faster                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| xml_etree_generate      | 76.7 ms                                                                                                        | 77.2 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| xml_etree_process       | 53.1 ms                                                                                                        | 53.8 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| Geometric mean          | (ref)                                                                                                          | 1.01x slower                                                                                                   |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+

Benchmark hidden because not significant (13): bench_mp_pool, coverage, generators, json, json_loads, meteor_contest, mypy, pathlib, pidigits, pprint_safe_repr, pprint_pformat, unpickle, xml_etree_parse
Ignored benchmarks (3) of /home/benchmarking/BENCH/REQUESTS/req-compile-bench-1670428040-iritkatriel-linux/pyperformance-results.json.gz: aiohttp, gunicorn, tornado_http
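
For readers unfamiliar with the "Geometric mean" row, here is a minimal sketch of how such a summary can be derived from per-benchmark ratios. It uses only a handful of rows copied from the table above, so its result is not expected to match the 1.01x figure, and the exact procedure pyperf follows is assumed rather than checked here. The names `rows` and `geometric_mean` are illustrative.

```python
import math

# (reference, changed) timings copied from a few rows of the table above.
# Units cancel in each ratio, so rows may use different units.
rows = {
    "telco": (6.32, 6.58),
    "thrift": (763, 750),
    "unpack_sequence": (42.1, 43.8),
    "unpickle_pure_python": (202, 214),
    "xml_etree_iterparse": (106, 103),
}

def geometric_mean(ratios):
    """Return the n-th root of the product, computed via logarithms."""
    ratios = list(ratios)
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# A ratio above 1.0 means the changed build was slower on that benchmark.
ratios = [changed / ref for ref, changed in rows.values()]
gmean = geometric_mean(ratios)
print(f"geometric mean over {len(ratios)} rows: {gmean:.3f}x")
```

A geometric mean is used rather than an arithmetic one so that a 2x slowdown and a 2x speedup cancel out exactly.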

@iritkatriel iritkatriel marked this pull request as draft December 8, 2022 10:45
Member

@gvanrossum gvanrossum left a comment

Cool work. So the doubling of the instruction size only costs us 1%. That means if we can realize the removal of LOAD/STORE_FAST and LOAD_CONST we should be able to gain quite a bit.
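
To get a rough feel for how much of a function's bytecode consists of the LOAD/STORE_FAST and LOAD_CONST moves mentioned above, one can count them with the standard `dis` module. This is only a sketch: `sample` is a made-up function, the prefix match is a heuristic chosen so that version-specific variants of these opcodes are also counted, and the numbers differ between Python versions.

```python
import dis
from collections import Counter

def sample(a, b):
    # Made-up function used only as something to disassemble.
    c = a + b
    d = -c
    return d * 2

# Opcode name prefixes treated as "moves" between locals/constants and the stack.
MOVE_PREFIXES = ("LOAD_FAST", "STORE_FAST", "LOAD_CONST")

def count_moves(func):
    """Return (number of move instructions, total instructions) for func."""
    names = Counter(ins.opname for ins in dis.get_instructions(func))
    total = sum(names.values())
    moves = sum(n for name, n in names.items() if name.startswith(MOVE_PREFIXES))
    return moves, total

moves, total = count_moves(sample)
print(f"{moves} of {total} instructions are local/constant moves")
```

The share of such instructions is what a register-based encoding could, in principle, fold into the operands of the instructions that do the actual work.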

Do you envision we could do a gradual transition to the register world, where some instructions use registers and others still use the stack?

@iritkatriel
Member Author

> Do you envision we could do a gradual transition to the register world, where some instructions use registers and others still use the stack?

I think so. A register can be an index into the stack, and some opcodes can just push and pop as before. This makes the transition incremental.
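
A toy model of that idea, as a sketch only: locals and the value stack share one array of slots, a "register" operand is simply an index into that array, and stack-style and register-style instructions run side by side. The opcode names and the `run` function are hypothetical and do not correspond to CPython's real opcodes or frame layout.

```python
# Hypothetical opcodes for illustration; these are not CPython's real opcodes.
PUSH_SLOT, NEGATE_TOP, POP_TO_SLOT, NEGATE_REG = range(4)

def run(code, local_values, stack_depth=4):
    """Execute (opcode, arg1, arg2) triples and return the final locals."""
    nlocals = len(local_values)
    # One flat array: slots [0, nlocals) are locals, the rest is the value stack.
    slots = list(local_values) + [None] * stack_depth
    sp = nlocals  # index of the next free stack slot
    for op, arg1, arg2 in code:
        if op == PUSH_SLOT:        # stack style: push a copy of slots[arg1]
            slots[sp] = slots[arg1]
            sp += 1
        elif op == NEGATE_TOP:     # stack style: negate the top of the stack
            slots[sp - 1] = -slots[sp - 1]
        elif op == POP_TO_SLOT:    # stack style: pop into slots[arg1]
            sp -= 1
            slots[arg1] = slots[sp]
        elif op == NEGATE_REG:     # register style: slots[arg2] = -slots[arg1]
            slots[arg2] = -slots[arg1]
        else:
            raise ValueError(f"unknown opcode {op}")
    return slots[:nlocals]

# Both programs compute local 1 = -(local 0).
stack_style = [(PUSH_SLOT, 0, 0), (NEGATE_TOP, 0, 0), (POP_TO_SLOT, 1, 0)]
register_style = [(NEGATE_REG, 0, 1)]
print(run(stack_style, [5, None]))
print(run(register_style, [5, None]))
```

Both calls print `[5, -5]`. Because the two styles address the same array, a single program can mix them, which is what would allow opcodes to be converted one at a time.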

@gvanrossum
Member

> I think so. A register can be an index into the stack, and some opcodes can just push and pop as before. This makes the transition incremental.

Sounds good. Maybe we should add that to faster-cpython/ideas#485 (or one of the other issues about registers?)

Member

@gvanrossum gvanrossum left a comment

Time to start making one simple instruction use an extra oparg? Without even optimizing LOAD/STORE -- we could just tackle UNARY_NEGATIVE and give it a second oparg that designates the destination, and make the compiler write the bytecode like that.

#define NB_INPLACE_XOR 25

/* number of codewords for opcode+oparg(s) */
#define OPSIZE 2
Member

I guess for now we're not contemplating the size depending on the opcode. Probably just as well.

Member Author

Yeah, it won’t be hard to change this macro if we decide to do that.
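
As a rough illustration of what a fixed `OPSIZE` implies for the instruction stream, the sketch below packs each instruction as "opcode, oparg, 0, 0" (the layout given in the PR description) and walks the stream one whole instruction at a time. It assumes a codeword is two bytes; `CODEUNIT_BYTES`, `encode`, `decode`, and the opcode numbers in the usage lines are illustrative, not taken from the patch.

```python
CODEUNIT_BYTES = 2  # assumption: one codeword is an opcode byte plus an oparg byte
OPSIZE = 2          # codewords per instruction, mirroring the #define under review

def encode(instructions):
    """Pack (opcode, oparg) pairs, zero-padding each one to OPSIZE codewords."""
    padding = bytes(CODEUNIT_BYTES * (OPSIZE - 1))
    out = bytearray()
    for opcode, oparg in instructions:
        out += bytes([opcode, oparg]) + padding
    return bytes(out)

def decode(raw):
    """Read (opcode, oparg) back, stepping by one full instruction each time."""
    step = CODEUNIT_BYTES * OPSIZE
    return [(raw[i], raw[i + 1]) for i in range(0, len(raw), step)]

program = [(100, 1), (11, 0)]  # arbitrary example opcode numbers
raw = encode(program)
print(list(raw))     # [100, 1, 0, 0, 11, 0, 0, 0]
print(decode(raw))   # [(100, 1), (11, 0)]
```

With the size held in a single constant, every place that steps through the bytecode scales by the same factor; making the size depend on the opcode would instead require `decode` to look up a per-opcode length before advancing.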

@iritkatriel
Member Author

I made a new PR with this stuff on today's version of main: #100276.
