Set of tools for compressing netCDF files with Zarr.
The tools use the following compression libraries:
- Numcodecs: Zarr native library [documentation]
- numcodecs-wasm: Compression codecs compiled to WebAssembly [documentation]
- EBCC: Error Bounded Climate Compressor [documentation]
System Prerequisites
- C/C++ compiler toolchain (required to build mpi4py)
- MPI implementation (required for mpi4py)
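A quick way to confirm these prerequisites before installing is to check that a compiler and an MPI wrapper are on the PATH. The sketch below is not part of the toolkit: `check_tool` is a hypothetical helper, and the tool names (`cc`, `c++`, `mpicc`, `mpiexec`) are assumptions based on common compiler and MPI installations, so they may differ on your system.

```shell
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Assumed tool names; adjust to match your compiler and MPI installation.
for tool in cc c++ mpicc mpiexec; do
    check_tool "$tool" || true
done
```

Any line reported as missing suggests the mpi4py build is likely to fail until that tool is installed or loaded.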
On Santis@ALPS:
export UENV_NAME="prgenv-gnu/25.06:rc5"
uenv image pull $UENV_NAME
uenv start --view=default $UENV_NAME
Once the above is complete (the uenv steps are only needed on Santis, not locally):
git clone [email protected]:C2SM/data-compression.git
python -m venv venv
source venv/bin/activate
bash install_data_compression.sh
--------------------------------------------------------------------------------
Usage: dc_toolkit --help # List of available commands
Usage: dc_toolkit COMMAND --help # Documentation per command
Example:
dc_toolkit \ # CLI-tool
evaluate_combos \ # command
netCDF_files/tigge_pl_t_q_dx=2_2024_08_02.nc \ # netCDF file to compress
./dump \ # where to write the compressed file(s)
--field-to-compress t # field of netCDF to compress
--------------------------------------------------------------------------------
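To evaluate several fields of the same file, the example command can be wrapped in a loop. This is a minimal sketch: `evaluate_fields` is a hypothetical helper, not a toolkit command, and it only reuses the `evaluate_combos` command and `--field-to-compress` option shown above. The field name `q` in the usage comment is an assumption inferred from the example file name.

```shell
# Hypothetical helper: run evaluate_combos once per field name.
# Arguments: netCDF file, output directory, then one or more field names.
evaluate_fields() {
    nc_file="$1"
    out_dir="$2"
    shift 2
    for field in "$@"; do
        # Stop at the first failing field so errors are not hidden.
        dc_toolkit evaluate_combos "$nc_file" "$out_dir" \
            --field-to-compress "$field" || return 1
    done
}

# Usage (field q is assumed to exist in the file):
# evaluate_fields "netCDF_files/tigge_pl_t_q_dx=2_2024_08_02.nc" ./dump t q
```

Whether results for different fields can share one output directory is not stated here, so a separate directory per field may be the safer choice.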
Two user interfaces (UIs) have been implemented to make file compression more user-friendly. Both provide functionality for computing compressor similarity metrics and for compressing files.
compression_analysis_ui_web.py is the web app. Beyond the functionality shared by both UIs, it lets users download similarity-metric plots and adjust parameters more dynamically, though it is somewhat slower.
streamlit run ./src/dc_toolkit/compression_analysis_ui_web.py [OPTIONAL] --server.maxUploadSize=FILE_SIZE_MB --server.maxMessageSize=FILE_SIZE_MB
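The FILE_SIZE_MB placeholder can be derived from the file you plan to upload. The sketch below assumes a hypothetical helper named `size_mb` that rounds the file size up to whole mebibytes, so the resulting limit is never smaller than the file itself.

```shell
# Hypothetical helper: print a file's size in MB, rounded up.
size_mb() {
    bytes=$(wc -c < "$1") || return 1
    echo $(( (bytes + 1048575) / 1048576 ))
}

# Usage: pass the computed size to both optional Streamlit settings.
# FILE_SIZE_MB=$(size_mb "netCDF_files/tigge_pl_t_q_dx=2_2024_08_02.nc")
# streamlit run ./src/dc_toolkit/compression_analysis_ui_web.py \
#     --server.maxUploadSize="$FILE_SIZE_MB" --server.maxMessageSize="$FILE_SIZE_MB"
```

Rounding up is a deliberate choice here, since a limit rounded down would reject the file. Adding some headroom on top may be sensible for the message size.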
If launching from Santis, first forward port 8501 over SSH, then start the web UI with run_web_ui_santis:
ssh -L 8501:localhost:8501 santis
dc_toolkit run_web_ui_santis --user_account "d75" --uploaded_file "./netCDF_files/tigge_pl_t_q_dx=2_2024_08_02.nc" --time "00:15:00" --nodes "1" --ntasks-per-node "72"
Local versions, both non-web and web, are also available:
dc_toolkit run_local_ui
dc_toolkit run_web_ui