
C2SM/data-compression


Data Compression Project

Set of tools for compressing netCDF files with Zarr.

The tools use the following compression libraries:

Installation

System Prerequisites

  • C/C++ compiler toolchain (required to build mpi4py)
  • MPI implementation (required for mpi4py)

In Santis@ALPS:

export UENV_NAME="prgenv-gnu/25.06:rc5"
uenv image pull $UENV_NAME
uenv start --view=default $UENV_NAME

Once the above is complete (these uenv steps are only needed on Santis, not locally), clone the repository and install:

git clone git@github.com:C2SM/data-compression.git
cd data-compression
python -m venv venv
source venv/bin/activate
bash install_data_compression.sh

Usage

--------------------------------------------------------------------------------

Usage: dc_toolkit --help # List of available commands

Usage: dc_toolkit COMMAND --help # Documentation per command

Example:

# dc_toolkit            CLI tool
# evaluate_combos       command
# first argument        netCDF file to compress
# second argument       where to write the compressed file(s)
# --field-to-compress   field of the netCDF file to compress
dc_toolkit evaluate_combos \
  netCDF_files/tigge_pl_t_q_dx=2_2024_08_02.nc \
  ./dump \
  --field-to-compress t
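To run the same evaluation for several fields of one file, the command above can be scripted. This is a minimal sketch, assuming dc_toolkit is on PATH in the activated venv; the helpers build_evaluate_cmd and evaluate_fields are hypothetical and only pass the arguments shown in the example.

```python
import os
import shutil
import subprocess


def build_evaluate_cmd(nc_file: str, out_dir: str, field: str) -> list[str]:
    """Assemble the evaluate_combos call from the example (hypothetical helper)."""
    return [
        "dc_toolkit", "evaluate_combos",
        nc_file,  # netCDF file to compress
        out_dir,  # where to write the compressed file(s)
        "--field-to-compress", field,  # field of the netCDF file to compress
    ]


def evaluate_fields(nc_file: str, out_dir: str, fields: list[str]) -> None:
    """Run evaluate_combos once per field, each into its own subdirectory of out_dir."""
    if shutil.which("dc_toolkit") is None:
        raise RuntimeError("dc_toolkit not found on PATH; activate the venv first")
    for field in fields:
        field_dir = os.path.join(out_dir, field)
        os.makedirs(field_dir, exist_ok=True)
        # check=True raises CalledProcessError and stops at the first failing run.
        subprocess.run(build_evaluate_cmd(nc_file, field_dir, field), check=True)
```

For example, evaluate_fields("netCDF_files/tigge_pl_t_q_dx=2_2024_08_02.nc", "./dump", ["t", "q"]), assuming the file also contains a field q, as its name suggests. Each field gets its own output subdirectory because this README does not say whether output file names include the field name, so separate directories avoid one run overwriting another.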

--------------------------------------------------------------------------------

UI implementation

Two user interfaces make the file compression process more user-friendly. Both provide functionality for computing similarity metrics across compressors and for compressing files.
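This README does not spell out which similarity metrics the UIs report. Purely to illustrate the kind of comparison involved, the sketch below applies a crude lossy step (zeroing low-order float64 mantissa bits, a simplified truncating variant of bit rounding) followed by lossless zlib, then compares the decoded values with the originals. The function names, the keepbits parameter, and the choice of metrics (compression ratio, maximum absolute error, RMSE) are assumptions for illustration, not the toolkit's API.

```python
import math
import struct
import zlib


def truncate_mantissa(x: float, keepbits: int) -> float:
    """Zero all but the leading `keepbits` of the 52 explicit float64 mantissa bits."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    mask = ~((1 << (52 - keepbits)) - 1) & 0xFFFFFFFFFFFFFFFF
    (y,) = struct.unpack("<d", struct.pack("<Q", bits & mask))
    return y


def roundtrip_metrics(values: list[float], keepbits: int) -> dict[str, float]:
    """Lossy-compress `values` (mantissa truncation + zlib) and compare the round trip."""
    if not values or not 0 <= keepbits <= 52:
        raise ValueError("need non-empty values and 0 <= keepbits <= 52")
    n = len(values)
    raw = struct.pack(f"<{n}d", *values)
    # Lossy step: the zeroed low-order bytes are what lets zlib compress well.
    quantized = [truncate_mantissa(v, keepbits) for v in values]
    compressed = zlib.compress(struct.pack(f"<{n}d", *quantized), 9)
    # Decode the compressed bytes, as a reader of the compressed data would.
    decoded = struct.unpack(f"<{n}d", zlib.decompress(compressed))
    errors = [abs(a - b) for a, b in zip(values, decoded)]
    return {
        "compression_ratio": len(raw) / len(compressed),
        "max_abs_error": max(errors),
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
    }
```

Lowering keepbits raises the compression ratio at the cost of larger errors; with keepbits=52 nothing is discarded and both error metrics are exactly zero.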

compression_analysis_ui_web.py is the web app. In addition to the shared functionality, it lets users download similarity-metric plots and tweak parameters more interactively, at the cost of being somewhat slower.

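streamlit run ./src/dc_toolkit/compression_analysis_ui_web.py --server.maxUploadSize=FILE_SIZE_MB --server.maxMessageSize=FILE_SIZE_MB

Both --server.* options are optional. FILE_SIZE_MB is a size limit in megabytes, so set it at least as large as the file you want to upload.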
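Both Streamlit limits are given in megabytes. One way to choose FILE_SIZE_MB is to round the input file's size up to whole megabytes and add a little headroom, as in this sketch; size_limit_mb, streamlit_cmd, and the 1 MB default headroom are hypothetical choices, not part of dc_toolkit.

```python
import math
import os


def size_limit_mb(num_bytes: int, headroom_mb: int = 1) -> int:
    """Round a byte count up to whole megabytes (10**6 bytes) and add headroom."""
    # Counting 10**6 bytes per MB yields an equal or larger limit even if
    # the server interprets a megabyte as 2**20 bytes.
    return math.ceil(num_bytes / 10**6) + headroom_mb


def streamlit_cmd(path: str) -> list[str]:
    """Build the streamlit command above with both limits sized for `path`."""
    mb = size_limit_mb(os.path.getsize(path))
    return [
        "streamlit", "run", "./src/dc_toolkit/compression_analysis_ui_web.py",
        f"--server.maxUploadSize={mb}",
        f"--server.maxMessageSize={mb}",
    ]
```

For example, print(" ".join(streamlit_cmd("netCDF_files/tigge_pl_t_q_dx=2_2024_08_02.nc"))) prints a command line that can be pasted into the shell.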

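If launched on Santis, open the SSH connection with local port forwarding so that Streamlit's default port (8501) is reachable from your machine, then start the web UI: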

ssh -L 8501:localhost:8501 santis
dc_toolkit run_web_ui_santis --user_account "d75" --uploaded_file "./netCDF_files/tigge_pl_t_q_dx=2_2024_08_02.nc" --time "00:15:00" --nodes "1" --ntasks-per-node "72"
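Once the tunnel and the web UI are up, the forwarded port should accept connections on your local machine. A small standard-library sketch to check this before opening the browser; port_is_open is a hypothetical helper, and port 8501 is assumed because it is the port forwarded above.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8501, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

If port_is_open() returns False, the SSH tunnel is not running or the web UI has not started yet.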

The non-web UI and the web UI can also be launched locally:

dc_toolkit run_local_ui
dc_toolkit run_web_ui

About

Utilities to facilitate testing of different data compression algorithms

Contributors 3
