Closed
Labels
accelerator: cuda, bug, good first issue, help wanted
Milestone
Description
Bug description
When calling find_usable_cuda_devices with the default value num_devices=-1, the function raises RuntimeError even though the documentation states that it should work.
Example
>>> from lightning_fabric.accelerators import find_usable_cuda_devices
>>> find_usable_cuda_devices()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../site-packages/lightning_fabric/accelerators/cuda.py", line 122, in find_usable_cuda_devices
raise RuntimeError(
RuntimeError: You requested to find -1 devices but only 1 are currently available. The devices [] are occupied by other processes and can't be used at the moment.
Expected:
The function should return a list of all available CUDA devices.
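The expected semantics can be sketched without a GPU. The helper below is hypothetical and is not Lightning's implementation: `select_devices`, `visible`, and `is_usable` are illustrative names, and the usability probe is passed in as a callable so the example runs on any machine.

```python
from typing import Callable, List, Sequence


def select_devices(
    visible: Sequence[int],
    is_usable: Callable[[int], bool],
    num_devices: int = -1,
) -> List[int]:
    """Hypothetical sketch: -1 means "return every usable device"."""
    if num_devices == 0:
        return []
    usable: List[int] = []
    for idx in visible:
        if is_usable(idx):
            usable.append(idx)
        # Stop early only when a specific positive count was requested.
        if num_devices > 0 and len(usable) == num_devices:
            break
    # A shortfall is an error only for an explicit positive request.
    if num_devices > 0 and len(usable) < num_devices:
        raise RuntimeError(
            f"Requested {num_devices} devices but only {len(usable)} are usable."
        )
    return usable
```

With one visible, free GPU, `select_devices([0], lambda idx: True)` returns `[0]`, which matches the behavior the documentation describes for the default argument.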
How to reproduce the bug
>>> from lightning_fabric.accelerators import find_usable_cuda_devices
>>> find_usable_cuda_devices()
Error messages and logs
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../site-packages/lightning_fabric/accelerators/cuda.py", line 122, in find_usable_cuda_devices
raise RuntimeError(
RuntimeError: You requested to find -1 devices but only 1 are currently available. The devices [] are occupied by other processes and can't be used at the moment.
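The wording of the message suggests that the final count check compares the number of devices found against the raw `num_devices` value, so the default `-1` can never match. This is an inference from the traceback, not a confirmed reading of `cuda.py`. The function below is a hypothetical stand-in (`check_count` is not a Lightning name) that reproduces only that comparison.

```python
from typing import List


def check_count(available: List[int], num_devices: int = -1) -> List[int]:
    """Hypothetical model of a count check that does not special-case -1."""
    if len(available) != num_devices:
        raise RuntimeError(
            f"You requested to find {num_devices} devices "
            f"but only {len(available)} are currently available."
        )
    return available


# One free GPU and the default argument: 1 != -1, so the check raises.
try:
    check_count([0])
except RuntimeError as err:
    message = str(err)
```

Under this assumption, passing an explicit count that equals the number of free devices (here `check_count([0], 1)`) passes the check, while the documented default does not.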
Environment
Current environment
* CUDA:
- GPU: Quadro RTX 5000
- available: True
- version: 11.7
* Lightning:
- lightning-utilities: 0.6.0.post0
- pytorch-lightning: 1.9.2
- torch: 1.13.1
- torchmetrics: 0.11.1
* Packages:
- aiohttp: 3.8.4
- aiosignal: 1.3.1
- alembic: 1.9.4
- anyio: 3.6.2
- argon2-cffi: 21.3.0
- argon2-cffi-bindings: 21.2.0
- arrow: 1.2.3
- asn1crypto: 1.5.1
- astor: 0.8.1
- asttokens: 2.2.1
- async-timeout: 4.0.2
- attrs: 22.2.0
- azure-common: 1.1.28
- azure-core: 1.26.3
- azure-identity: 1.12.0
- azure-keyvault-secrets: 4.6.0
- azure-storage-blob: 12.14.1
- backcall: 0.2.0
- beautifulsoup4: 4.11.2
- black: 23.1.0
- bleach: 6.0.0
- cachetools: 5.3.0
- catboost: 1.1.1
- certifi: 2022.12.7
- cffi: 1.15.1
- charset-normalizer: 2.1.1
- click: 8.1.3
- cloudpickle: 2.2.1
- cmaes: 0.9.1
- cmdstanpy: 1.1.0
- colorlog: 6.7.0
- comm: 0.1.2
- contourpy: 1.0.7
- convertdate: 2.4.0
- coverage: 7.1.0
- cryptography: 38.0.4
- cycler: 0.11.0
- cython: 0.29.33
- darts: 0.23.0
- darts-tools: 0.3.1
- debugpy: 1.6.6
- decorator: 5.1.1
- defusedxml: 0.7.1
- ephem: 4.1.4
- et-xmlfile: 1.1.0
- exceptiongroup: 1.1.0
- executing: 1.2.0
- fastapi: 0.92.0
- fastapi-utils: 0.2.1
- fastjsonschema: 2.16.2
- filelock: 3.9.0
- flynt: 0.77
- fonttools: 4.38.0
- fqdn: 1.5.1
- frozenlist: 1.3.3
- fsspec: 2023.1.0
- graphviz: 0.20.1
- greenlet: 2.0.2
- gunicorn: 20.1.0
- h11: 0.14.0
- hijri-converter: 2.2.4
- holidays: 0.19
- httplib2: 0.21.0
- idna: 3.4
- iniconfig: 2.0.0
- ipykernel: 6.21.2
- ipympl: 0.9.3
- ipython: 8.10.0
- ipython-genutils: 0.2.0
- ipywidgets: 8.0.4
- isodate: 0.6.1
- isoduration: 20.11.0
- isort: 5.12.0
- jedi: 0.18.2
- jinja2: 3.1.2
- joblib: 1.2.0
- jsonpointer: 2.3
- jsonschema: 4.17.3
- jupyter-client: 8.0.3
- jupyter-core: 5.2.0
- jupyter-events: 0.6.3
- jupyter-server: 2.3.0
- jupyter-server-terminals: 0.4.4
- jupyterlab-pygments: 0.2.2
- jupyterlab-widgets: 3.0.5
- kiwisolver: 1.4.4
- korean-lunar-calendar: 0.3.1
- lightgbm: 3.3.5
- lightning-utilities: 0.6.0.post0
- llvmlite: 0.39.1
- lunarcalendar: 0.0.9
- mako: 1.2.4
- markupsafe: 2.1.2
- matplotlib: 3.7.0
- matplotlib-inline: 0.1.6
- mistune: 2.0.5
- ml-framework: 0.9.16
- msal: 1.21.0
- msal-extensions: 1.0.0
- msrest: 0.7.1
- multidict: 6.0.4
- mypy: 1.0.0
- mypy-extensions: 1.0.0
- nbclassic: 0.5.1
- nbclient: 0.7.2
- nbconvert: 7.2.9
- nbformat: 5.7.3
- nest-asyncio: 1.5.6
- nfoursid: 1.0.1
- notebook: 6.5.2
- notebook-shim: 0.2.2
- numba: 0.56.4
- numpy: 1.23.5
- nvidia-cublas-cu11: 11.10.3.66
- nvidia-cuda-nvrtc-cu11: 11.7.99
- nvidia-cuda-runtime-cu11: 11.7.99
- nvidia-cudnn-cu11: 8.5.0.96
- oauth2client: 4.1.3
- oauthlib: 3.2.2
- openpyxl: 3.1.1
- optuna: 3.1.0
- oscrypto: 1.3.0
- packaging: 23.0
- pandas: 1.5.3
- pandocfilters: 1.5.0
- parso: 0.8.3
- pathspec: 0.11.0
- patsy: 0.5.3
- pexpect: 4.8.0
- pickleshare: 0.7.5
- pillow: 9.4.0
- pip: 23.0
- platformdirs: 3.0.0
- plotly: 5.13.0
- pluggy: 1.0.0
- pmdarima: 2.0.2
- portalocker: 2.7.0
- prometheus-client: 0.16.0
- prompt-toolkit: 3.0.36
- prophet: 1.1.2
- psutil: 5.9.4
- ptyprocess: 0.7.0
- pure-eval: 0.2.2
- py: 1.11.0
- pyarrow: 8.0.0
- pyasn1: 0.4.8
- pyasn1-modules: 0.2.8
- pycparser: 2.21
- pycryptodomex: 3.17
- pydantic: 1.10.5
- pygments: 2.14.0
- pyjwt: 2.6.0
- pymeeus: 0.5.12
- pymysql: 1.0.2
- pyod: 1.0.7
- pyopenssl: 22.1.0
- pyparsing: 3.0.9
- pyrsistent: 0.19.3
- pytest: 7.2.1
- pytest-cov: 4.0.0
- pytest-html: 3.2.0
- pytest-metadata: 2.0.4
- python-dateutil: 2.8.2
- python-json-logger: 2.0.6
- pytorch-lightning: 1.9.2
- pytz: 2022.7.1
- pyyaml: 6.0
- pyzmq: 25.0.0
- requests: 2.28.2
- requests-oauthlib: 1.3.1
- rfc3339-validator: 0.1.4
- rfc3986-validator: 0.1.1
- rsa: 4.9
- scikit-learn: 1.2.1
- scipy: 1.10.0
- seaborn: 0.12.2
- send2trash: 1.8.0
- setuptools: 67.3.2
- shap: 0.41.0
- six: 1.16.0
- slicer: 0.0.7
- sniffio: 1.3.0
- snowflake-connector-python: 2.9.0
- snowflake-sqlalchemy: 1.4.6
- soupsieve: 2.4
- sqlalchemy: 1.4.46
- stack-data: 0.6.2
- starlette: 0.25.0
- statsforecast: 1.4.0
- statsmodels: 0.13.5
- tbats: 1.1.2
- tenacity: 8.2.1
- terminado: 0.17.1
- threadpoolctl: 3.1.0
- tinycss2: 1.2.1
- toml: 0.10.2
- tomli: 2.0.1
- torch: 1.13.1
- torchmetrics: 0.11.1
- tornado: 6.2
- tqdm: 4.64.1
- traitlets: 5.9.0
- typing-extensions: 4.5.0
- uri-template: 1.2.0
- urllib3: 1.26.14
- uvicorn: 0.20.0
- vulture: 2.7
- wcwidth: 0.2.6
- webcolors: 1.12
- webencodings: 0.5.1
- websocket-client: 1.5.1
- wheel: 0.38.4
- widgetsnbextension: 4.0.5
- xarray: 2023.2.0
- xgboost: 1.7.4
- yarl: 1.8.2
* System:
- OS: Linux
- architecture: 64bit, ELF
- processor: x86_64
- python: 3.10.9
- version: #58~20.04.1-Ubuntu SMP Thu Oct 13 13:09:46 UTC 2022
More info
No response
awaelchli