Caution
This release is an early-access software technology preview. Running production workloads is not recommended.
Note
This README is derived from the original RAPIDSAI project's README. Parts that apply only to the original version may remain and still need to be removed or modified.
Note
This repository will eventually be moved to the ROCm-DS GitHub organization.
Note
This ROCm™ port is a derived work based on the NVIDIA RAPIDS® cuDF project (version 23.10). It follows that project's directory structure and API naming as closely as possible to minimize porting friction for users who want to use both projects.
- cuDF Reference Documentation: Python API reference, tutorials, and topic guides.
- libcudf Reference Documentation: C/C++ GPU library API reference.
- Getting Started: Instructions for installing cuDF.
- RAPIDS Community: Get help, contribute, and collaborate.
- GitHub repository: Download the cuDF source code.
- Issue tracker: Report issues or request features.
Built on the Apache Arrow columnar memory format, hipDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.
hipDF provides a pandas-like API that will be familiar to data engineers and data scientists, so they can accelerate their workflows without learning the details of HIP programming.
For example, the following snippet downloads a CSV, then uses the GPU to parse it into rows and columns and run calculations:
import cudf
import requests
from io import StringIO
# download the CSV and decode it to text
url = "https://github.com/plotly/datasets/raw/master/tips.csv"
content = requests.get(url).content.decode('utf-8')
# parse the CSV text on the GPU into a DataFrame
tips_df = cudf.read_csv(StringIO(content))
# express each tip as a percentage of the total bill
tips_df['tip_percentage'] = tips_df['tip'] / tips_df['total_bill'] * 100
# display average tip by dining party size
print(tips_df.groupby('size').tip_percentage.mean())
Output:
size
1 21.729201548727808
2 16.571919173482897
3 15.215685473711837
4 14.594900639351332
5 14.149548965142023
6 15.622920072028379
Name: tip_percentage, dtype: float64
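Because the API is pandas-like, the same calculation can be written so that it runs on pandas when no GPU library is installed. The following is a minimal sketch, not part of the original project: the `xdf` alias, the `tip_percentage_by_size` helper, and the small in-memory `sample` data are all hypothetical, and the fallback import is an assumption about how a user might structure portable code.

```python
# Prefer the GPU library when it is installed; otherwise fall back to pandas.
# The module is imported as `cudf`, as in the snippet above.
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf


def tip_percentage_by_size(records):
    """Return the mean tip percentage per party size as a plain dict."""
    df = xdf.DataFrame(records)
    df["tip_percentage"] = df["tip"] / df["total_bill"] * 100
    means = df.groupby("size")["tip_percentage"].mean()
    # A GPU-backed Series is copied to the host before iterating over it.
    if hasattr(means, "to_pandas"):
        means = means.to_pandas()
    return {int(size): float(pct) for size, pct in means.items()}


# Hypothetical sample data, used here instead of downloading the CSV.
sample = {
    "total_bill": [8.0, 16.0, 32.0, 40.0],
    "tip": [2.0, 2.0, 8.0, 10.0],
    "size": [1, 2, 2, 4],
}

result = tip_percentage_by_size(sample)
print(result)
```

Only the import line differs between the two backends in this sketch; the DataFrame construction, column arithmetic, and groupby call are identical.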
For additional examples, browse the complete cuDF API documentation, or check out the more detailed cuDF notebooks.
Note
Currently, a docker image is not available for AMD GPUs.
Caution
Incompatibility notice: Mixing RAPIDS and ROCm-DS packages or installations is not supported. If both RAPIDS and ROCm-DS packages must be installed on the same system, keep them in strictly separate, isolated environments to avoid conflicts.
Note
We support only AMD GPUs. Use the RAPIDS package for NVIDIA GPUs.
- ROCm HIP SDK compilers version 6.4+
- Build requirements: rocthrust-dev, rocm-llvm-dev, hipcub (Ubuntu)
- Runtime requirements: rocm-llvm-dev (Ubuntu)
- Officially supported architectures: gfx90a, gfx942
- Ubuntu 22.04+
See build instructions.
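Before building, it can help to confirm that the machine has one of the officially supported architectures listed above. The sketch below is an assumption-laden illustration, not a project tool: it assumes the ROCm `rocminfo` utility is on the PATH and that its output mentions architecture names such as `gfx90a`. The helper names and the `sample_text` string are hypothetical, and the sample is illustrative text, not output captured from a real system.

```python
import re
import shutil
import subprocess

# Architectures listed as officially supported in the requirements above.
SUPPORTED_ARCHS = {"gfx90a", "gfx942"}


def find_gfx_archs(text):
    """Return the set of gfx architecture names mentioned in the text."""
    return set(re.findall(r"\bgfx[0-9a-f]+\b", text))


def supported_archs_present(text):
    """Return the detected architectures that are officially supported."""
    return find_gfx_archs(text) & SUPPORTED_ARCHS


def query_rocminfo():
    """Run rocminfo if it is on PATH and return its output, else None."""
    exe = shutil.which("rocminfo")
    if exe is None:
        return None
    completed = subprocess.run([exe], capture_output=True, text=True, check=False)
    return completed.stdout


# Illustrative text only; real rocminfo output contains many more fields.
sample_text = """
  Name:                    gfx90a
  Name:                    amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
  Name:                    gfx1030
"""

detected = find_gfx_archs(sample_text)
usable = supported_archs_present(sample_text)
print(sorted(detected), sorted(usable))
```

On a real system, the text returned by `query_rocminfo()` would be passed to `supported_archs_present` in place of `sample_text`; an empty result would suggest that no officially supported GPU was found.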
The ROCm-DS suite of open source software libraries aims to enable execution of end-to-end data science and analytics pipelines entirely on AMD GPUs. It relies on ROCm HIP primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
The GPU version of Apache Arrow is a common API that enables efficient interchange of tabular data between processes running on the GPU. End-to-end computation on the GPU avoids unnecessary copying and converting of data off the GPU, reducing compute time and cost for the high-performance analytics common in artificial intelligence workloads. As the name implies, hipDF uses the Apache Arrow columnar data format on the GPU. Currently, only a subset of Apache Arrow features is supported.