33[![Build Status](https://github.com/aklomp/base64/actions/workflows/test.yml/badge.svg)](https://github.com/aklomp/base64/actions/workflows/test.yml)
44
55This is an implementation of a base64 stream encoding/decoding library in C99
6- with SIMD (AVX2, NEON, AArch64/NEON, SSSE3, SSE4.1, SSE4.2, AVX) and
6+ with SIMD (AVX2, AVX512, NEON, AArch64/NEON, SSSE3, SSE4.1, SSE4.2, AVX) and
77[OpenMP](http://www.openmp.org) acceleration. It also contains wrapper functions
88to encode/decode simple length-delimited strings. This library aims to be:
99
@@ -19,6 +19,10 @@ will pick an optimized codec that lets it encode/decode 12 or 24 bytes at a
1919time, which gives a speedup of four or more times compared to the "plain"
2020bytewise codec.
2121
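For comparison with the SIMD codecs, here is a minimal sketch of a scalar base64 encoder that consumes 3 input bytes per step. It is a textbook version written for illustration, not the library's own "plain" codec; the names `demo_b64_table` and `demo_plain_b64_encode` are hypothetical.

```c
#include <stddef.h>

/* Standard base64 alphabet (RFC 4648). */
static const char demo_b64_table[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper, not part of the library.
 * Encodes srclen bytes from src into out (no NUL terminator is written).
 * out must have room for 4 * ((srclen + 2) / 3) bytes.
 * Returns the number of bytes written. */
static size_t demo_plain_b64_encode(const unsigned char *src, size_t srclen, char *out)
{
    size_t i = 0;
    size_t o = 0;

    /* Each full 3-byte group becomes 4 output characters. */
    for (; i + 3 <= srclen; i += 3) {
        unsigned v = (unsigned)src[i] << 16 | (unsigned)src[i + 1] << 8 | src[i + 2];
        out[o++] = demo_b64_table[(v >> 18) & 0x3F];
        out[o++] = demo_b64_table[(v >> 12) & 0x3F];
        out[o++] = demo_b64_table[(v >> 6) & 0x3F];
        out[o++] = demo_b64_table[v & 0x3F];
    }

    /* A 1- or 2-byte tail is zero-padded and marked with '='. */
    if (srclen - i == 1) {
        unsigned v = (unsigned)src[i] << 16;
        out[o++] = demo_b64_table[(v >> 18) & 0x3F];
        out[o++] = demo_b64_table[(v >> 12) & 0x3F];
        out[o++] = '=';
        out[o++] = '=';
    } else if (srclen - i == 2) {
        unsigned v = (unsigned)src[i] << 16 | (unsigned)src[i + 1] << 8;
        out[o++] = demo_b64_table[(v >> 18) & 0x3F];
        out[o++] = demo_b64_table[(v >> 12) & 0x3F];
        out[o++] = demo_b64_table[(v >> 6) & 0x3F];
        out[o++] = '=';
    }
    return o;
}
```

A SIMD codec performs the same bit regrouping on a whole vector register of input per iteration, which is where the speedup over a loop like this one comes from.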
22+ AVX512 support is currently limited to encoding, which uses the AVX512 VL and VBMI
23+ instructions. Decoding reuses the AVX2 implementation. CPUs from
24+ Cannonlake (manufactured in 2018) onward support these instructions.
25+
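As a rough illustration of the runtime requirement, the sketch below checks for both AVX512 VL and AVX512 VBMI using the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`. This is an assumption made for the example: the library has its own detection code, which may work differently, and `demo_cpu_has_avx512_vl_vbmi` is a hypothetical name.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of the library.
 * Returns true only if the running CPU reports both AVX512 VL and
 * AVX512 VBMI. On non-x86 targets or other compilers it returns false. */
static bool demo_cpu_has_avx512_vl_vbmi(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    /* Initialise the compiler's CPU feature cache before querying it. */
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512vl")
        && __builtin_cpu_supports("avx512vbmi");
#else
    return false;
#endif
}
```

Both features have to be present; a CPU with only AVX512F or only AVX512 VL would not qualify under this check.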
2226NEON support is hardcoded to on or off at compile time, because portable
2327runtime feature detection is unavailable on ARM.
2428
@@ -59,6 +63,9 @@ optimizations described by Wojciech Muła in a
5963[articles](http://0x80.pl/notesen/2016-01-17-sse-base64-decoding.html).
6064His own code is [here](https://github.com/WojciechMula/toys/tree/master/base64).
6165
66+ The AVX512 encoder is based on code from Wojciech Muła's
67+ [base64simd](https://github.com/WojciechMula/base64simd) library.
68+
6269The OpenMP implementation was added by Ferry Toth (@htot) from [Exalon Delft](http://www.exalondelft.nl).
6370
6471## Building
@@ -76,8 +83,8 @@ To compile just the "plain" library without SIMD codecs, type:
7683make lib/libbase64.o
7784```
7885
79- Optional SIMD codecs can be included by specifying the `AVX2_CFLAGS`, `NEON32_CFLAGS`, `NEON64_CFLAGS`,
80- `SSSE3_CFLAGS`, `SSE41_CFLAGS`, `SSE42_CFLAGS` and/or `AVX_CFLAGS` environment variables.
86+ Optional SIMD codecs can be included by specifying the `AVX2_CFLAGS`, `AVX512_CFLAGS`,
87+ `NEON32_CFLAGS`, `NEON64_CFLAGS`, `SSSE3_CFLAGS`, `SSE41_CFLAGS`, `SSE42_CFLAGS` and/or `AVX_CFLAGS` environment variables.
8188A typical build invocation on x86 looks like this:
8289
8390```sh
@@ -93,6 +100,15 @@ Example:
93100AVX2_CFLAGS=-mavx2 make
94101```
95102
103+ ### AVX512
104+
105+ To build and include the AVX512 codec, set the `AVX512_CFLAGS` environment variable to a value that will turn on AVX512 support in your compiler, typically `-mavx512vl -mavx512vbmi`.
106+ Example:
107+
108+ ```sh
109+ AVX512_CFLAGS="-mavx512vl -mavx512vbmi" make
110+ ```
111+
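To confirm that the flags actually reached the compiler, a translation unit can test the predefined macros `__AVX512VL__` and `__AVX512VBMI__`, which GCC and Clang define when `-mavx512vl` and `-mavx512vbmi` are in effect. The helper below is a hypothetical sketch for that check and is not part of the library.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of the library.
 * Reports whether this file was compiled with both AVX512 VL and
 * AVX512 VBMI enabled, based on the compiler's predefined macros. */
static bool demo_built_with_avx512_vl_vbmi(void)
{
#if defined(__AVX512VL__) && defined(__AVX512VBMI__)
    return true;
#else
    return false;
#endif
}
```

This only reflects how the file was compiled. Whether the build machine or the target machine can execute the instructions is a separate, runtime question.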
96112The codec will only be used if runtime feature detection shows that the target machine supports AVX2.
97113
98114### SSSE3
@@ -208,6 +224,7 @@ Mainly there for testing purposes, this is also useful on ARM where the only way
208224The following constants can be used:
209225
210226- `BASE64_FORCE_AVX2`
227+ - `BASE64_FORCE_AVX512`
211228- `BASE64_FORCE_NEON32`
212229- `BASE64_FORCE_NEON64`
213230- `BASE64_FORCE_PLAIN`
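The idea behind these flags can be sketched as a small selection function in which a force flag takes priority over runtime detection. Everything here is hypothetical and for illustration only: the `DEMO_FORCE_*` values, the function `demo_choose_codec`, and the priority order are assumptions, and the real constants and selection logic live in the library itself.

```c
/* Hypothetical flag values; the library's real constants may differ. */
enum {
    DEMO_FORCE_AVX512 = 1 << 0,
    DEMO_FORCE_AVX2   = 1 << 1,
    DEMO_FORCE_PLAIN  = 1 << 2
};

/* Hypothetical selector, not the library's code.
 * flags:      bitwise OR of DEMO_FORCE_* values, or 0 for auto-detect.
 * cpu_avx512: nonzero if runtime detection found AVX512 VL + VBMI.
 * cpu_avx2:   nonzero if runtime detection found AVX2.
 * Returns the name of the codec that would be used. */
static const char *demo_choose_codec(int flags, int cpu_avx512, int cpu_avx2)
{
    /* A force flag bypasses runtime detection entirely. */
    if (flags & DEMO_FORCE_AVX512) return "avx512";
    if (flags & DEMO_FORCE_AVX2)   return "avx2";
    if (flags & DEMO_FORCE_PLAIN)  return "plain";

    /* No force flag: pick the widest codec the CPU reports. */
    if (cpu_avx512) return "avx512";
    if (cpu_avx2)   return "avx2";
    return "plain";
}
```

In this sketch, forcing a codec on a CPU that lacks the instructions is the caller's responsibility, which matches the flags' stated purpose as a testing aid.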
@@ -434,7 +451,7 @@ x86 processors
434451| i7-4770 @ 3.4 GHz DDR1600 OPENMP 4 thread | 4884\* | 7099\* | 4917\* | 7057\* | 4799\* | 7143\* | 4902\* | 7219\* |
435452| i7-4770 @ 3.4 GHz DDR1600 OPENMP 8 thread | 5212\* | 8849\* | 5284\* | 9099\* | 5289\* | 9220\* | 4849\* | 9200\* |
436453| i7-4870HQ @ 2.5 GHz | 1471\* | 3066\* | 6721\* | 6962\* | 7015\* | 8267\* | 8328\* | 11576\* |
437- | i5-4590S @ 3.0 GHz | 3356 | 3197 | 4363 | 6104 | 4243 | 6233 | 4160 | 6344 |
454+ | i5-4590S @ 3.0 GHz | 3356 | 3197 | 4363 | 6104 | 4243\* | 6233 | 4160\* | 6344 |
438455| Xeon X5570 @ 2.93 GHz | 2161 | 1508 | 3160 | 3915 | - | - | - | - |
439456| Pentium4 @ 3.4 GHz | 896 | 740 | - | - | - | - | - | - |
440457| Atom N270 | 243 | 266 | 508 | 387 | - | - | - | - |