Performance regressions porting Jetscii from inline assembly to intrinsics #401

@shepmaster

I ported Jetscii to use stdsimd with the belief that it will be stabilized sooner 😜.

There's a stdsimd branch in case you are interested in following along at home.

The initial port runs at roughly 60% of the original speed in the affected benchmarks; the rest are essentially unchanged:

 name                                      inline-asm ns/iter     intrinsics ns/iter     diff ns/iter  diff %  speedup
 bench::space_asciichars                   1,023,795 (5121 MB/s)  1,643,905 (3189 MB/s)       620,110  60.57%   x 0.62
 bench::space_asciichars_as_pattern        1,044,517 (5019 MB/s)  1,716,374 (3054 MB/s)       671,857  64.32%   x 0.61
 bench::space_asciichars_macro             993,105 (5279 MB/s)    1,658,466 (3161 MB/s)       665,361  67.00%   x 0.60
 bench::space_find_byte                    3,610,758 (1452 MB/s)  3,526,808 (1486 MB/s)       -83,950  -2.32%   x 1.02
 bench::space_find_char                    633,608 (8274 MB/s)    636,607 (8235 MB/s)           2,999   0.47%   x 1.00
 bench::space_find_char_set                10,600,525 (494 MB/s)  10,561,106 (496 MB/s)       -39,419  -0.37%   x 1.00
 bench::space_find_closure                 10,156,759 (516 MB/s)  10,072,882 (520 MB/s)       -83,877  -0.83%   x 1.01
 bench::space_find_string                  7,506,830 (698 MB/s)   7,507,111 (698 MB/s)            281   0.00%   x 1.00
 bench::substring_as_pattern               1,082,652 (4842 MB/s)  1,496,699 (3502 MB/s)       414,047  38.24%   x 0.72
 bench::substring_find                     1,670,638 (3138 MB/s)  1,687,034 (3107 MB/s)        16,396   0.98%   x 0.99
 bench::substring_with_cached_searcher     997,570 (5255 MB/s)    1,520,424 (3448 MB/s)       522,854  52.41%   x 0.66
 bench::substring_with_created_searcher    1,007,291 (5204 MB/s)  1,533,745 (3418 MB/s)       526,454  52.26%   x 0.66
 bench::xml_delim_3_asciichars             1,014,110 (5169 MB/s)  1,637,181 (3202 MB/s)       623,071  61.44%   x 0.62
 bench::xml_delim_3_asciichars_as_pattern  984,594 (5324 MB/s)    1,628,740 (3218 MB/s)       644,146  65.42%   x 0.60
 bench::xml_delim_3_asciichars_macro       1,023,173 (5124 MB/s)  1,623,991 (3228 MB/s)       600,818  58.72%   x 0.63
 bench::xml_delim_3_find_byte_closure      2,237,287 (2343 MB/s)  2,211,426 (2370 MB/s)       -25,861  -1.16%   x 1.01
 bench::xml_delim_3_find_char_closure      14,359,362 (365 MB/s)  14,204,971 (369 MB/s)      -154,391  -1.08%   x 1.01
 bench::xml_delim_3_find_char_set          17,588,694 (298 MB/s)  17,769,736 (295 MB/s)       181,042   1.03%   x 0.99
 bench::xml_delim_5_asciichars             1,032,586 (5077 MB/s)  1,790,343 (2928 MB/s)       757,757  73.38%   x 0.58
 bench::xml_delim_5_asciichars_as_pattern  1,034,084 (5070 MB/s)  1,612,350 (3251 MB/s)       578,266  55.92%   x 0.64
 bench::xml_delim_5_asciichars_macro       986,644 (5313 MB/s)    1,666,725 (3145 MB/s)       680,081  68.93%   x 0.59
 bench::xml_delim_5_find_byte_closure      2,257,573 (2322 MB/s)  2,408,606 (2176 MB/s)       151,033   6.69%   x 0.94
 bench::xml_delim_5_find_char_closure      8,009,474 (654 MB/s)   7,453,402 (703 MB/s)       -556,072  -6.94%   x 1.07
 bench::xml_delim_5_find_char_set          23,184,513 (226 MB/s)  23,272,996 (225 MB/s)        88,483   0.38%   x 1.00
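
A note on reading the table, since "60.57%" and "x 0.62" are easy to conflate: the numbers appear consistent with `diff %` being the change in time per iteration relative to the inline-asm run, and `speedup` being the ratio of old time to new time. The sketch below reproduces the first row under that assumption; the function names are made up for illustration.

```rust
/// Ratio of the old time to the new time.
/// Values below 1.0 mean the new code is slower.
fn speedup(old_ns: u64, new_ns: u64) -> f64 {
    old_ns as f64 / new_ns as f64
}

/// Change in time per iteration, as a percentage of the old time.
/// Positive values mean the new code takes longer.
fn diff_percent(old_ns: u64, new_ns: u64) -> f64 {
    (new_ns as f64 - old_ns as f64) / old_ns as f64 * 100.0
}
```

For `bench::space_asciichars`, `speedup(1_023_795, 1_643_905)` comes out near 0.62 and `diff_percent(1_023_795, 1_643_905)` near 60.57, matching the row above. So the intrinsics version takes about 60% more time per iteration, which is the same thing as running at about 62% of the original speed.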

Takeaways

  • Make sure to use #[target_feature] (and/or -C target-feature)?
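
A minimal sketch of what that takeaway might look like in practice. It assumes the intrinsics as exposed through `std::arch` (where the stdsimd work later landed) rather than the stdsimd crate itself, and `find_any` / `find_any_sse42` are hypothetical names, not Jetscii's actual API. The likely reason it matters: when the enclosing function does not have the feature enabled, the compiler generally cannot inline the intrinsics into it, so each one may end up as a real function call.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128i, _mm_cmpestri, _mm_loadu_si128, _SIDD_CMP_EQUAL_ANY};

/// Hypothetical SSE4.2 search: index of the first byte of `haystack`
/// that equals any byte of `needles` (at most 16 needles).
///
/// Safety: the caller must ensure the CPU supports SSE4.2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.2")]
unsafe fn find_any_sse42(needles: &[u8], haystack: &[u8]) -> Option<usize> {
    assert!(needles.len() <= 16);

    // Copy into a fixed 16-byte buffer so the unaligned load never
    // reads past the end of a short slice.
    let mut needle_buf = [0u8; 16];
    needle_buf[..needles.len()].copy_from_slice(needles);
    let needle_vec = _mm_loadu_si128(needle_buf.as_ptr() as *const __m128i);

    for (chunk_idx, chunk) in haystack.chunks(16).enumerate() {
        let mut buf = [0u8; 16];
        buf[..chunk.len()].copy_from_slice(chunk);
        let hay_vec = _mm_loadu_si128(buf.as_ptr() as *const __m128i);

        // PCMPESTRI with explicit lengths; returns 16 when nothing matches.
        let idx = _mm_cmpestri::<_SIDD_CMP_EQUAL_ANY>(
            needle_vec,
            needles.len() as i32,
            hay_vec,
            chunk.len() as i32,
        );
        if (idx as usize) < chunk.len() {
            return Some(chunk_idx * 16 + idx as usize);
        }
    }
    None
}

/// Safe entry point: uses the SSE4.2 path when the CPU supports it,
/// otherwise falls back to a plain scalar scan.
fn find_any(needles: &[u8], haystack: &[u8]) -> Option<usize> {
    #[cfg(target_arch = "x86_64")]
    {
        if needles.len() <= 16 && is_x86_feature_detected!("sse4.2") {
            // Sound because the feature was just detected at runtime.
            return unsafe { find_any_sse42(needles, haystack) };
        }
    }
    haystack.iter().position(|b| needles.contains(b))
}
```

Calling `find_any(b" ", b"hello world")` should find the space at index 5. The alternative from the takeaway is to skip runtime detection and compile everything with `-C target-feature=+sse4.2`, which enables the feature for every function but produces a binary that requires SSE4.2.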
