[Bug]: Inconsistent jax/jaxlib versioning leads to crashes #253

@sssangha

Checked for duplicates

Yes - I've already checked

Describe the bug

When running dolphin (e.g. `dolphin run dolphin_config.yaml`), the following error is raised through jaxlib:

Traceback (most recent call last):
  File "/u/leffe-data/ssangha/conda_installation/stable_july30_2020/envs/dolphin-env/bin/dolphin", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/u/leffe-data/ssangha/conda_installation/stable_july30_2020/envs/dolphin-env/lib/python3.11/site-packages/dolphin/cli.py", line 28, in main
    run_func(**arg_dict)
  File "/u/leffe-data/ssangha/conda_installation/stable_july30_2020/envs/dolphin-env/lib/python3.11/site-packages/dolphin/workflows/_cli_run.py", line 30, in run
    displacement.run(cfg, debug=debug)
  File "/u/leffe-data/ssangha/conda_installation/stable_july30_2020/envs/dolphin-env/lib/python3.11/site-packages/dolphin/_log.py", line 118, in wrapper
    result = f(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^
  File "/u/leffe-data/ssangha/conda_installation/stable_july30_2020/envs/dolphin-env/lib/python3.11/site-packages/dolphin/workflows/displacement.py", line 64, in run
    utils.disable_gpu()
  File "/u/leffe-data/ssangha/conda_installation/stable_july30_2020/envs/dolphin-env/lib/python3.11/site-packages/dolphin/utils.py", line 137, in disable_gpu
    jax.config.update("jax_platform_name", "cpu")

Upon closer inspection, it looks like my jax/jaxlib modules are out of sync:

(dolphin-env) [ssangha@leffe standard_dolphin]$ pip list | grep jax
jax                0.4.25
jaxlib             0.4.23.dev20240125

The jax documentation advises keeping the two packages compatible, but it does not say how to automate this (i.e. an alternative to manually pinning the versions of both packages).

However, a separate discussion thread advised installing the most recent version through pip, which resolved the issue on my end.
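
On the automation question, here is a minimal sketch of a check that could run before dolphin starts. It is not part of dolphin or jax: the helper names `release_tuple`, `versions_in_sync`, and `check_installed` are hypothetical, and treating "in sync" as an identical major.minor.patch is my own simplifying assumption, which is stricter than whatever compatibility range jax actually accepts.

```python
import re
from importlib import metadata


def release_tuple(version: str) -> tuple[int, int, int]:
    """Return the leading (major, minor, patch) numbers of a version string.

    Suffixes such as ".dev20240125" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def versions_in_sync(jax_version: str, jaxlib_version: str) -> bool:
    """Assumption: 'in sync' means identical major.minor.patch numbers."""
    return release_tuple(jax_version) == release_tuple(jaxlib_version)


def check_installed() -> str:
    """Report the jax/jaxlib versions installed in the current environment."""
    try:
        jax_version = metadata.version("jax")
        jaxlib_version = metadata.version("jaxlib")
    except metadata.PackageNotFoundError as err:
        # One of the two packages is missing from this environment.
        return f"package not installed: {err}"
    state = "in sync" if versions_in_sync(jax_version, jaxlib_version) else "MISMATCHED"
    return f"jax {jax_version} / jaxlib {jaxlib_version}: {state}"


if __name__ == "__main__":
    print(check_installed())
```

With the versions from my `pip list` output above, `versions_in_sync("0.4.25", "0.4.23.dev20240125")` returns `False`.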

Please refer to PR #252, which I've posted.

*Note: although I have shown that this issue is not caused by the config/run parameters themselves, I've attached the config I used here anyway.
dolphin_config_yaml.txt

What did you expect?

I expected jax/jaxlib to work without leading to a crash.

Reproducible steps

1. dolphin config step: `dolphin config --slc-files CSLCs/*.h5 -sds /data/VV --mask-file /u/trappist-r0/bato/work/OPERA_Applications/CSLC/Discover/BigIsland/WBD/sub_waterMask_utm.tif --threads-per-worker 8 -s 6 3 --unwrap-method phass --work-directory dolphin_config_phass_stride63 --n-parallel-unwrap 2 -o dolphin_config.yaml`
2. dolphin run step: `dolphin run dolphin_config.yaml`

Environment

dolphin/isce info:
     dolphin: 0.16.0
 opera_utils: 0.3.0
       isce3: 0.19.1
       tophu: None

System:
      python: 3.11.7 | packaged by conda-forge | (main, Dec 23 2023, 14:43:09) [GCC 12.3.0]
  executable: /u/leffe-data/ssangha/conda_installation/stable_july30_2020/envs/dolphin-env/bin/python
     machine: Linux-3.10.0-1160.59.1.el7.x86_64-x86_64-with-glibc2.17

Python deps:
       numpy: 1.26.4
       numba: 0.58.1
         jax: 0.4.23
  osgeo.gdal: 3.8.3
        h5py: 3.10.0
 ruamel_yaml: None
    pydantic: 2.6.0
  setuptools: 69.0.3
optional GPU info:
gpu_is_available() = True
         jax: 0.4.23
gpu_is_available: True
None

    Labels

    bug: Something isn't working
