cuDF

cuDF (pronounced "KOO-dee-eff") is an Apache 2.0 licensed, GPU-accelerated DataFrame library for tabular data processing. The cuDF library is one part of the RAPIDS GPU Accelerated Data Science suite of libraries.

About

cuDF is composed of multiple libraries including:

libcudf: A CUDA C++ library with Apache Arrow compliant data structures and fundamental algorithms for tabular data.
pylibcudf: A Python library providing Cython bindings for libcudf.
cudf: A Python library providing
- A DataFrame library mirroring the pandas API
- A zero-code change accelerator, cudf.pandas, for existing pandas code.
cudf-polars: A Python library providing a GPU engine for Polars
dask-cudf: A Python library providing a GPU backend for Dask DataFrames

Notable projects that use cuDF include:

Spark RAPIDS: A GPU accelerator plugin for Apache Spark
Velox-cuDF: A Velox extension module to execute Velox plans on the GPU
Sirius: A GPU-native SQL engine providing extensions for libraries like DuckDB

Installation

System Requirements

Operating system, GPU driver, and supported CUDA version information can be found in the RAPIDS Installation Guide.

pip

A stable release of each cudf library is available on PyPI. You will need to match the major version number of your installed CUDA version with a -cu## suffix when installing from PyPI.

A development version of each library is available as a nightly release by including the -i https://pypi.anaconda.org/rapidsai-wheels-nightly/simple index.

# CUDA 13
pip install libcudf-cu13
pip install pylibcudf-cu13
pip install cudf-cu13
pip install cudf-polars-cu13
pip install dask-cudf-cu13

# CUDA 12
pip install libcudf-cu12
pip install pylibcudf-cu12
pip install cudf-cu12
pip install cudf-polars-cu12
pip install dask-cudf-cu12
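
The suffix rule above can be sketched as a small helper. This is only an illustration: the function names `wheel_name` and `pip_command` are hypothetical and not part of cuDF. It derives the `-cu##` suffix from the major part of a CUDA version string and optionally adds the nightly index mentioned above.

```python
# Hypothetical helper: builds pip commands for the cuDF wheels listed above.
NIGHTLY_INDEX = "https://pypi.anaconda.org/rapidsai-wheels-nightly/simple"
LIBRARIES = ("libcudf", "pylibcudf", "cudf", "cudf-polars", "dask-cudf")


def wheel_name(library: str, cuda_version: str) -> str:
    """Return the PyPI name for a library, e.g. 'cudf' + '12.4' -> 'cudf-cu12'."""
    major = cuda_version.split(".")[0]
    if not major.isdigit():
        raise ValueError(f"cannot read a CUDA major version from {cuda_version!r}")
    return f"{library}-cu{major}"


def pip_command(library: str, cuda_version: str, nightly: bool = False) -> str:
    """Return the pip command line for a stable or nightly wheel."""
    parts = ["pip", "install"]
    if nightly:
        # Nightly wheels are served from the separate index named in this section.
        parts += ["-i", NIGHTLY_INDEX]
    parts.append(wheel_name(library, cuda_version))
    return " ".join(parts)


# Print the stable install command for every library on a CUDA 12.x system.
for lib in LIBRARIES:
    print(pip_command(lib, "12.4"))
```

Only the major version matters here, so "12.0" and "12.4" both map to the -cu12 wheels.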

conda

A stable release of each cudf library is available to be installed with the conda package manager by specifying the -c rapidsai channel.

A development version of each library is available as a nightly release by specifying the -c rapidsai-nightly channel instead.

conda install -c rapidsai libcudf
conda install -c rapidsai pylibcudf
conda install -c rapidsai cudf
conda install -c rapidsai cudf-polars
conda install -c rapidsai dask-cudf

source

To install cuDF from source, please follow the contribution guide, which details how to set up the build environment.
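
Whichever installation route is used, a quick way to see which of the Python libraries are visible to the current interpreter is to look them up without importing them. The sketch below assumes the import names are `pylibcudf`, `cudf`, `cudf_polars`, and `dask_cudf` (the package names with hyphens replaced by underscores); the helper `installed_modules` is hypothetical.

```python
import importlib.util

# Assumed import names for the Python libraries described in this README.
MODULES = ("pylibcudf", "cudf", "cudf_polars", "dask_cudf")


def installed_modules(names=MODULES):
    """Map each top-level module name to True if it can be located."""
    # find_spec locates a top-level module without importing it,
    # so this check does not touch the GPU.
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in installed_modules().items():
    print(f"{name}: {'installed' if found else 'not found'}")
```

This only confirms that the packages are installed; it does not verify that a compatible GPU and driver are present.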

Examples

The following examples show how to read a Parquet file, drop rows that contain null values, and perform a groupby aggregation on the data.

cudf

Import cudf and use it much as you would pandas; the APIs are largely similar.

import cudf

df = cudf.read_parquet("data.parquet")
df.dropna().groupby(["A", "B"]).mean()
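
For readers unfamiliar with the pandas idiom, the plain-Python sketch below spells out what the pipeline computes: rows containing any null are dropped, the rest are grouped by the key columns, and each remaining column is averaged per group. It does not use cuDF or a GPU; the sample rows and the helper `dropna_groupby_mean` are hypothetical stand-ins for the contents of data.parquet.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical sample rows; None marks a null value.
rows = [
    {"A": "x", "B": 1, "C": 10.0},
    {"A": "x", "B": 1, "C": 20.0},
    {"A": "x", "B": 2, "C": None},  # dropped: contains a null
    {"A": "y", "B": 1, "C": 5.0},
    {"A": None, "B": 1, "C": 7.0},  # dropped: contains a null
]


def dropna_groupby_mean(rows, keys):
    """Drop rows with any null, group by `keys`, and average the other columns."""
    complete = [r for r in rows if all(v is not None for v in r.values())]
    groups = defaultdict(list)
    for r in complete:
        groups[tuple(r[k] for k in keys)].append(r)
    result = {}
    for key, members in groups.items():
        value_cols = [c for c in members[0] if c not in keys]
        result[key] = {c: fmean([m[c] for m in members]) for c in value_cols}
    return result


means = dropna_groupby_mean(rows, ["A", "B"])
print(means)
```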

cudf.pandas

With a Python file containing pandas code:

import pandas as pd

df = pd.read_parquet("data.parquet")
df.dropna().groupby(["A", "B"]).mean()

Use cudf.pandas by invoking python with -m cudf.pandas:

$ python -m cudf.pandas script.py

If running the pandas code in an interactive Jupyter environment, call %load_ext cudf.pandas before importing pandas.

In [1]: %load_ext cudf.pandas

In [2]: import pandas as pd

In [3]: df = pd.read_parquet("data.parquet")

In [4]: df.dropna().groupby(["A", "B"]).mean()
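
Besides the command-line flag and the Jupyter magic, the accelerator can, to our understanding, also be enabled from inside a script by calling cudf.pandas.install() before pandas is imported. The sketch below wraps that call in a hypothetical helper, `enable_gpu_pandas`, which leaves plain pandas in place when cudf is not installed; treat it as an illustration and check the cudf.pandas documentation for the exact behavior.

```python
import importlib.util


def enable_gpu_pandas() -> bool:
    """Try to enable cudf.pandas; return True if it was installed, else False."""
    if importlib.util.find_spec("cudf") is None:
        # cudf is not installed, so later pandas imports stay on the CPU.
        return False
    import cudf.pandas

    # Must run before the first `import pandas` for the acceleration to apply.
    cudf.pandas.install()
    return True


accelerated = enable_gpu_pandas()
print("cudf.pandas enabled" if accelerated else "cudf not found; using plain pandas")

# Import pandas only after the call above, for example:
# import pandas as pd
```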

cudf-polars

Using Polars' lazy API, call collect with engine="gpu" to run the operation on the GPU:

import polars as pl

lf = pl.scan_parquet("data.parquet")
lf.drop_nulls().group_by(["A", "B"]).mean().collect(engine="gpu")

Questions and Discussion

For bug reports or feature requests, please file an issue on the GitHub issue tracker.

For questions or discussion about cuDF and GPU data processing, feel free to post in the RAPIDS Slack workspace.

Contributing

cuDF is open to contributions from the community! Please see our guide for contributing to cuDF for more information.
