find_usable_cuda_devices default value raises RuntimeError #16816

@connesy

Bug description

When calling find_usable_cuda_devices with the default value num_devices=-1, the function raises RuntimeError even though the documentation states that it should work.

Example

>>> from lightning_fabric.accelerators import find_usable_cuda_devices
>>> find_usable_cuda_devices()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../site-packages/lightning_fabric/accelerators/cuda.py", line 122, in find_usable_cuda_devices
    raise RuntimeError(
RuntimeError: You requested to find -1 devices but only 1 are currently available. The devices [] are occupied by other processes and can't be used at the moment.

Expected:
The function should return a list of all available CUDA devices.
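
To make the expected semantics concrete, here is a minimal sketch in plain Python. It is not Lightning's implementation: `pick_usable_devices`, `visible`, and `is_usable` are hypothetical names chosen for illustration, and the usability probe is passed in as a callable so the example runs without a GPU. The error message above (1 device available, none occupied, yet an error) suggests the final count check compares against -1 literally; the sketch shows one way that check could treat -1 as "all usable devices".

```python
from typing import Callable, List, Sequence


def pick_usable_devices(
    num_devices: int,
    visible: Sequence[int],
    is_usable: Callable[[int], bool],
) -> List[int]:
    """Hypothetical stand-in for the device selection logic (not Lightning's code)."""
    if num_devices == 0:
        return []

    usable: List[int] = []
    occupied: List[int] = []
    for idx in visible:
        if is_usable(idx):
            usable.append(idx)
        else:
            occupied.append(idx)
        # Stop early only when a specific positive count was requested.
        if num_devices != -1 and len(usable) == num_devices:
            break

    # -1 means "all usable devices", so the exact-count check is skipped for it.
    if num_devices != -1 and len(usable) != num_devices:
        raise RuntimeError(
            f"You requested to find {num_devices} devices but only {len(usable)} "
            f"are currently available. The devices {occupied} are occupied."
        )
    return usable


# One visible, free device and the default -1: returns every usable index.
print(pick_usable_devices(-1, visible=[0], is_usable=lambda i: True))
```

With this handling, the single-GPU case from this report returns `[0]` instead of raising, while an explicit request for more devices than are usable still fails.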

How to reproduce the bug

>>> from lightning_fabric.accelerators import find_usable_cuda_devices
>>> find_usable_cuda_devices()

Error messages and logs

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../site-packages/lightning_fabric/accelerators/cuda.py", line 122, in find_usable_cuda_devices
    raise RuntimeError(
RuntimeError: You requested to find -1 devices but only 1 are currently available. The devices [] are occupied by other processes and can't be used at the moment.

Environment

Current environment
* CUDA:
	- GPU:
		- Quadro RTX 5000
	- available:         True
	- version:           11.7
* Lightning:
	- lightning-utilities: 0.6.0.post0
	- pytorch-lightning: 1.9.2
	- torch:             1.13.1
	- torchmetrics:      0.11.1
* Packages:
	- aiohttp:           3.8.4
	- aiosignal:         1.3.1
	- alembic:           1.9.4
	- anyio:             3.6.2
	- argon2-cffi:       21.3.0
	- argon2-cffi-bindings: 21.2.0
	- arrow:             1.2.3
	- asn1crypto:        1.5.1
	- astor:             0.8.1
	- asttokens:         2.2.1
	- async-timeout:     4.0.2
	- attrs:             22.2.0
	- azure-common:      1.1.28
	- azure-core:        1.26.3
	- azure-identity:    1.12.0
	- azure-keyvault-secrets: 4.6.0
	- azure-storage-blob: 12.14.1
	- backcall:          0.2.0
	- beautifulsoup4:    4.11.2
	- black:             23.1.0
	- bleach:            6.0.0
	- cachetools:        5.3.0
	- catboost:          1.1.1
	- certifi:           2022.12.7
	- cffi:              1.15.1
	- charset-normalizer: 2.1.1
	- click:             8.1.3
	- cloudpickle:       2.2.1
	- cmaes:             0.9.1
	- cmdstanpy:         1.1.0
	- colorlog:          6.7.0
	- comm:              0.1.2
	- contourpy:         1.0.7
	- convertdate:       2.4.0
	- coverage:          7.1.0
	- cryptography:      38.0.4
	- cycler:            0.11.0
	- cython:            0.29.33
	- darts:             0.23.0
	- darts-tools:       0.3.1
	- debugpy:           1.6.6
	- decorator:         5.1.1
	- defusedxml:        0.7.1
	- ephem:             4.1.4
	- et-xmlfile:        1.1.0
	- exceptiongroup:    1.1.0
	- executing:         1.2.0
	- fastapi:           0.92.0
	- fastapi-utils:     0.2.1
	- fastjsonschema:    2.16.2
	- filelock:          3.9.0
	- flynt:             0.77
	- fonttools:         4.38.0
	- fqdn:              1.5.1
	- frozenlist:        1.3.3
	- fsspec:            2023.1.0
	- graphviz:          0.20.1
	- greenlet:          2.0.2
	- gunicorn:          20.1.0
	- h11:               0.14.0
	- hijri-converter:   2.2.4
	- holidays:          0.19
	- httplib2:          0.21.0
	- idna:              3.4
	- iniconfig:         2.0.0
	- ipykernel:         6.21.2
	- ipympl:            0.9.3
	- ipython:           8.10.0
	- ipython-genutils:  0.2.0
	- ipywidgets:        8.0.4
	- isodate:           0.6.1
	- isoduration:       20.11.0
	- isort:             5.12.0
	- jedi:              0.18.2
	- jinja2:            3.1.2
	- joblib:            1.2.0
	- jsonpointer:       2.3
	- jsonschema:        4.17.3
	- jupyter-client:    8.0.3
	- jupyter-core:      5.2.0
	- jupyter-events:    0.6.3
	- jupyter-server:    2.3.0
	- jupyter-server-terminals: 0.4.4
	- jupyterlab-pygments: 0.2.2
	- jupyterlab-widgets: 3.0.5
	- kiwisolver:        1.4.4
	- korean-lunar-calendar: 0.3.1
	- lightgbm:          3.3.5
	- lightning-utilities: 0.6.0.post0
	- llvmlite:          0.39.1
	- lunarcalendar:     0.0.9
	- mako:              1.2.4
	- markupsafe:        2.1.2
	- matplotlib:        3.7.0
	- matplotlib-inline: 0.1.6
	- mistune:           2.0.5
	- ml-framework:      0.9.16
	- msal:              1.21.0
	- msal-extensions:   1.0.0
	- msrest:            0.7.1
	- multidict:         6.0.4
	- mypy:              1.0.0
	- mypy-extensions:   1.0.0
	- nbclassic:         0.5.1
	- nbclient:          0.7.2
	- nbconvert:         7.2.9
	- nbformat:          5.7.3
	- nest-asyncio:      1.5.6
	- nfoursid:          1.0.1
	- notebook:          6.5.2
	- notebook-shim:     0.2.2
	- numba:             0.56.4
	- numpy:             1.23.5
	- nvidia-cublas-cu11: 11.10.3.66
	- nvidia-cuda-nvrtc-cu11: 11.7.99
	- nvidia-cuda-runtime-cu11: 11.7.99
	- nvidia-cudnn-cu11: 8.5.0.96
	- oauth2client:      4.1.3
	- oauthlib:          3.2.2
	- openpyxl:          3.1.1
	- optuna:            3.1.0
	- oscrypto:          1.3.0
	- packaging:         23.0
	- pandas:            1.5.3
	- pandocfilters:     1.5.0
	- parso:             0.8.3
	- pathspec:          0.11.0
	- patsy:             0.5.3
	- pexpect:           4.8.0
	- pickleshare:       0.7.5
	- pillow:            9.4.0
	- pip:               23.0
	- platformdirs:      3.0.0
	- plotly:            5.13.0
	- pluggy:            1.0.0
	- pmdarima:          2.0.2
	- portalocker:       2.7.0
	- prometheus-client: 0.16.0
	- prompt-toolkit:    3.0.36
	- prophet:           1.1.2
	- psutil:            5.9.4
	- ptyprocess:        0.7.0
	- pure-eval:         0.2.2
	- py:                1.11.0
	- pyarrow:           8.0.0
	- pyasn1:            0.4.8
	- pyasn1-modules:    0.2.8
	- pycparser:         2.21
	- pycryptodomex:     3.17
	- pydantic:          1.10.5
	- pygments:          2.14.0
	- pyjwt:             2.6.0
	- pymeeus:           0.5.12
	- pymysql:           1.0.2
	- pyod:              1.0.7
	- pyopenssl:         22.1.0
	- pyparsing:         3.0.9
	- pyrsistent:        0.19.3
	- pytest:            7.2.1
	- pytest-cov:        4.0.0
	- pytest-html:       3.2.0
	- pytest-metadata:   2.0.4
	- python-dateutil:   2.8.2
	- python-json-logger: 2.0.6
	- pytorch-lightning: 1.9.2
	- pytz:              2022.7.1
	- pyyaml:            6.0
	- pyzmq:             25.0.0
	- requests:          2.28.2
	- requests-oauthlib: 1.3.1
	- rfc3339-validator: 0.1.4
	- rfc3986-validator: 0.1.1
	- rsa:               4.9
	- scikit-learn:      1.2.1
	- scipy:             1.10.0
	- seaborn:           0.12.2
	- send2trash:        1.8.0
	- setuptools:        67.3.2
	- shap:              0.41.0
	- six:               1.16.0
	- slicer:            0.0.7
	- sniffio:           1.3.0
	- snowflake-connector-python: 2.9.0
	- snowflake-sqlalchemy: 1.4.6
	- soupsieve:         2.4
	- sqlalchemy:        1.4.46
	- stack-data:        0.6.2
	- starlette:         0.25.0
	- statsforecast:     1.4.0
	- statsmodels:       0.13.5
	- tbats:             1.1.2
	- tenacity:          8.2.1
	- terminado:         0.17.1
	- threadpoolctl:     3.1.0
	- tinycss2:          1.2.1
	- toml:              0.10.2
	- tomli:             2.0.1
	- torch:             1.13.1
	- torchmetrics:      0.11.1
	- tornado:           6.2
	- tqdm:              4.64.1
	- traitlets:         5.9.0
	- typing-extensions: 4.5.0
	- uri-template:      1.2.0
	- urllib3:           1.26.14
	- uvicorn:           0.20.0
	- vulture:           2.7
	- wcwidth:           0.2.6
	- webcolors:         1.12
	- webencodings:      0.5.1
	- websocket-client:  1.5.1
	- wheel:             0.38.4
	- widgetsnbextension: 4.0.5
	- xarray:            2023.2.0
	- xgboost:           1.7.4
	- yarl:              1.8.2
* System:
	- OS:                Linux
	- architecture:
		- 64bit
		- ELF
	- processor:         x86_64
	- python:            3.10.9
	- version:           #58~20.04.1-Ubuntu SMP Thu Oct 13 13:09:46 UTC 2022

More info

No response

cc @Borda @justusschock @awaelchli

Labels

accelerator: cuda, bug, good first issue, help wanted
