An Awesome list of oneAPI projects
A curated list of awesome oneAPI and SYCL projects for solutions across industry and community. Inspired by awesome-machine-learning.
oneAPI is an open, cross-industry, standards-based, unified, multiarchitecture, multi-vendor programming model that delivers a common developer experience across accelerator architectures – for faster application performance, more productivity, and greater innovation. See https://oneapi.io/ for more information.
- AI - Computer Vision
- AI - Data Science
- AI - Machine Learning
- AI - Natural Language Processing
- AI - Frameworks and Toolkits
- Autonomous Systems
- Data Visualization and Rendering
- Energy
- Gaming
- Manufacturing
- Mathematics and Science
- Tools & Development
- Tutorials
- DPCPP-image-Blurring-with-SYCL - A program developed with DPC++ SYCL for parallelizing the Image Blurring process.
- daal4py - A simplified API to Intel® DAAL that allows for fast usage of the framework suited for Data Scientists or Machine Learning users. Built to help provide an abstraction to Intel® DAAL for either direct usage or integration into one's own framework.
- Performance and Portability Evaluation of the K-Means Algorithm on SYCL with CPU-GPU architectures - This work uses the k-means algorithm to assess the performance portability of one of the most advanced implementations in the literature, He-Vialle, over different programming models (DPC++, CUDA, OpenMP) and multi-vendor CPU-GPU architectures.
- dpcpp-svm - A DPC++ version of ThunderSVM. The mission of ThunderSVM is to help users easily and efficiently apply SVMs to solve problems. ThunderSVM exploits GPU and multi-core CPUs to achieve high efficiency.
- PLSSVM - Implementation of a parallel least squares support vector machine using multiple backends for different GPU vendors.
- HETU - Hetu is a high-performance distributed deep learning system targeting the training of DL models with trillions of parameters, developed by DAIR Lab at Peking University. It takes account of both high availability in industry and innovation in academia.
- lc0 - Lc0 is a UCI-compliant chess engine designed to play chess via neural network, specifically those of the LeelaChessZero project.
- Singa - Apache SINGA is an Apache Top Level Project, focusing on distributed training of deep learning and machine learning models.
- PaddlePaddle - PaddlePaddle, as the first independent R&D deep learning platform in China, has been officially open-sourced to professional communities since 2016. It is an industrial platform with advanced technologies and rich features that cover core deep learning frameworks, basic model libraries, end-to-end development kits, tools & components as well as service platforms.
- XLA - XLA (Accelerated Linear Algebra) is an open-source machine learning (ML) compiler for GPUs, CPUs, and ML accelerators. The XLA compiler takes models from popular ML frameworks such as PyTorch, TensorFlow, and JAX, and optimizes them for high-performance execution across different hardware platforms including GPUs, CPUs, and ML accelerators.
- TPU-MLIR - TPU-MLIR is an open-source machine-learning compiler based on MLIR for TPU. This project provides a complete toolchain, which can convert pre-trained neural networks from different frameworks into binary files bmodel that can be efficiently operated on TPUs.
- Px0 - Px0 is a UCI-compliant xiangqi engine designed to play xiangqi via neural network, specifically those of the PikaXiangqiZero project.
- OAP MLlib - OAP MLlib is an optimized package to accelerate machine learning algorithms in Apache Spark MLlib. It is compatible with Spark MLlib and leverages the open source Intel® oneAPI Data Analytics Library (oneDAL) to provide highly optimized algorithms and get the most out of CPU and GPU capabilities. It also takes advantage of the open source Intel® oneAPI Collective Communications Library (oneCCL) to provide efficient communication patterns in multi-node multi-GPU clusters.
- CTranslate2 - CTranslate2 is a C++ and Python library that optimizes inference with transformer models, supporting models trained in various frameworks. It implements various performance optimization techniques such as weights quantization, layers fusion, batch reordering, and more for benchmarks of transformer models on CPU and GPU.
- hachi - Hachi is a locally hosted web app that enables natural language search for videos and images, using an AI-based machine learning model powered by OpenAI CLIP.
- ik_llama.cpp - This repository is a fork of llama.cpp with better CPU and hybrid GPU/CPU performance, new SOTA quantization types, first-class Bitnet support, better DeepSeek performance via MLA, FlashMLA, fused MoE operations and tensor overrides for hybrid GPU/CPU inference, row-interleaved quant packing, etc.
- deeplearning4j - The Eclipse DeepLearning4J ecosystem supports all the needs for JVM-based deep learning applications with various libraries.
- DeepRec - DeepRec is a recommendation deep learning framework based on TensorFlow, which has been developed since 2016 and supports core businesses such as Taobao search recommendation and advertising.
- dlstreamer - The Intel Deep Learning Streamer is an open source streaming media analytics framework based on the GStreamer multimedia framework. It is optimized for performance and functional interoperability between GStreamer plugins built on various backend libraries, with support for over 70 pre-trained models for various use cases.
- flashlight - Flashlight is a machine learning library written in C++ and created by Facebook AI Research. It features internal APIs for tensor computation, high performance defaults using just-in-time kernel compilation, and scalability.
- intel-extension-for-tensorflow - Intel Extension for TensorFlow is a plugin based on TensorFlow PluggableDevice, which aims to bring devices such as Intel XPU, GPU, and CPU into TensorFlow.
- intel-extension-for-transformers - Intel Extension for Transformers is a toolkit designed to efficiently accelerate transformer-based models on Intel platforms, optimized for 4th gen Intel Xeon Scalable Processor (codename Sapphire Rapids).
- intel-extension-for-pytorch - Intel Extension for PyTorch provides feature optimizations for an extra performance boost on Intel hardware, including CPUs and discrete GPUs, and offers easy GPU acceleration for Intel discrete GPUs with PyTorch.
- KernelAbstractions.jl - KernelAbstractions (KA) is a package that enables you to write GPU-like kernels targeting different execution backends.
- neural-compressor - Intel Neural Compressor is an open-source Python library for applying popular model compression techniques, such as pruning, quantization, sparsity, and distillation, on all mainstream deep learning frameworks and Intel extensions.
- optimum-intel - Optimum Intel is an interface between the Transformers and Diffusers libraries and Intel's different tools and libraries that help accelerate end-to-end pipelines on Intel architectures.
- portDNN - portDNN is a library implementing neural network algorithms written using SYCL.
- PPLNN - PPLNN, which is short for "PPLNN is a Primitive Library for Neural Network", is a high-performance deep-learning inference engine for efficient AI inferencing. It can run various ONNX models and has better support for OpenMMLab.
- pynufft - The pynufft library is a Python package for non-uniform fast Fourier transform, based on a min-max interpolator, with experimental support for CuPy, PyTorch, and TensorFlow Eager mode
- scikit-learn-intelex - Intel® Extension for Scikit-learn is a free AI accelerator that can accelerate existing scikit-learn code without the need to change it. It offers patching and replacing the stock scikit-learn algorithms with their optimized versions provided by the extension, which results in 10-100x acceleration across a variety of applications.
- shumai - The Shumai project is a differentiable tensor library for TypeScript and JavaScript built with Bun and Flashlight. It provides standard array utilities, gradients, and supported operators.
- webnn-native - WebNN Native is an implementation of the Web Neural Network API, providing building blocks, headers, and backends for ML platforms including DirectML, OpenVINO, and XNNPACK.
- ZenDNN - The Zen Deep Neural Network Library (ZenDNN) is a powerful library for deep learning inference applications on AMD CPUs. It includes APIs for basic neural network building blocks and is optimized for AMD CPUs.
- InfiniTensor - InfiniTensor is a high-performance inference engine tailored for GPUs and AI accelerators. Its design focuses on effective deployment and swift academic validation.
- ThundeRiNG - ThundeRiNG is a high performance and high quality FPGA-based pseudo random number generator (PRNG) that can concurrently generate a massive number of independent sequences of random numbers.
- AmgT - AmgT is a new AMG solver that utilizes the tensor cores and mixed-precision capabilities of the latest GPUs during multiple phases of the AMG algorithm.
- nndeploy - nndeploy is an easy-to-use, high-performance, multi-platform AI inference deployment framework.
- Neurenix - Neurenix is an AI framework optimized for embedded devices (Edge AI), with support for multiple GPUs and distributed clusters. The framework specializes in AI agents, with native support for multi-agent, reinforcement learning, and autonomous AI.
- LBANN: Livermore Big Artificial Neural Network Toolkit - The Livermore Big Artificial Neural Network toolkit (LBANN) is an open-source, HPC-centric, deep learning training framework that is optimized to compose multiple levels of parallelism. LBANN provides model-parallel acceleration through domain decomposition to optimize for strong scaling of network training. It also allows for composition of model-parallelism with both data parallelism and ensemble training methods for training large neural networks with massive amounts of data. LBANN is able to take advantage of tightly-coupled accelerators, low-latency high-bandwidth networking, and high-bandwidth parallel file systems.
- DiffKt - A Differentiable Programming Framework for Kotlin - DiffKt is a general-purpose, functional, differentiable programming framework for Kotlin. It can automatically differentiate through functions of tensors, scalars, and user-defined types. It supports forward-mode and reverse-mode differentiation including Jacobian-vector and vector-Jacobian products, which can be composed for higher-order differentiation.
- MegEngine - MegEngine is a fast, scalable and easy-to-use deep learning framework, with auto-differentiation.
- OneFlow - OneFlow is a performance-centered and open-source deep learning framework.
- MagmaDNN - A neural network library in C++ aimed at providing a simple, modularized framework for deep learning that is accelerated for heterogeneous architectures.
- FastChat - FastChat is an open platform for training, serving, and evaluating large language model based chatbots.
- Blender - Blender is the free and open source 3D creation suite. It supports the entirety of the 3D pipeline: modeling, rigging, animation, simulation, rendering, compositing, motion tracking and video editing.
- Brayns - Brayns is a large scientific visualization platform based on CPU ray tracing, using an extension plugin architecture. It comes with several pre-made plugins, such as CircuitExplorer and MoleculeExplorer, and requires several dependencies to build.
- ChameleonRT - ChameleonRT is an example path tracer that runs on multiple ray tracing backends including Embree, SYCL, DXR, Optix, Vulkan, Metal, and Ospray.
- embree - Embree is a high performance ray tracing library developed by Intel that targets graphics application developers to improve the performance of photo-realistic rendering applications. It includes various primitive types such as triangles, quads, grids, and curve primitives, and supports dynamic scenes. Embree also offers support for both CPUs and GPUs, while maintaining one code base to improve productivity and eliminate inconsistencies between the two versions of the renderer.
- f3d - F3D is a fast and minimalist 3D viewer that supports multiple file formats and can show animations, supporting thumbnails and many rendering and texturing options including real-time physically based rendering and raytracing.
- hdospray - The ospray for hydra is an open-source plugin for Pixar's USD to extend the hydra rendering framework with Intel Ospray. It is highly optimized for Intel CPU architectures ranging from laptops to large-scale distributed HPC systems.
- LightWave Explorer - Lightwave explorer is an open source nonlinear optics simulator, intended to be fast, visual, and flexible for students and researchers to play with ultrashort laser pulses and nonlinear optics without having to buy a laser first.
- oidn - Intel Open Image Denoise is an open-source library for image denoising in ray tracing rendering applications with high quality and performance, thanks to efficient deep learning-based filters that can be trained using the included toolkit and user-provided image datasets.
- openpgl - The Intel Open Path Guiding Library (Open PGL) implements path guiding into a renderer, offering implementations of current state-of-the-art path guiding methods which increase the sampling quality and renderer efficiency.
- ospray - Ospray is an open source, scalable and portable ray tracing engine designed for high fidelity visualization on Intel architecture CPUs. It allows users to easily build interactive applications using ray-tracing based rendering for both surface and volume-based visualizations.
- ospray_studio - Ospray Studio is an open-source, interactive visualization and ray tracing application that utilizes Intel Ospray as its core rendering engine. Users can create scene graphs to render complex scenes with high-fidelity or very large scenes requiring supercomputing resources.
- vistle - Vistle is a modular data-parallel visualization system. It requires a C++14 compatible compiler that supports ISO/IEC 14882:2014, alongside compiling requirements of Boost, CMake and MPI. Additionally, it supports Covise, OpenCover, OpenSceneGraph and Qt 5 libraries, and also provides support code, rendering libraries, controlling code for Vistle session and visualization algorithm modules.
- Accelerating 3D Gaussian Splatting Rendering through Level-of-Detail Structure - The 3D Gaussian Splatting method for 3D environment reconstruction from images brought significant advancements to photorealistic novel-view synthesis. It combines the advantages of primitive-based rendering with a differentiable renderer, thus obtaining state-of-the-art image quality and surpassing neural methods for scene representation in optimization and rendering speed.
- Hyperspectral imaging parallelization - Hyperspectral imaging parallelization with different programming models such as OpenMP, SYCL or Kokkos.
- A DPC++ Backend for the OCCA Portability Framework - OCCA—an open source portable and vendor neutral framework for parallel programming on heterogeneous platforms—is used by mission critical computational science and engineering applications of public and private sector organizations including the U.S. Department of Energy and Shell.
- NovelRT - NovelRT is a cross-platform game engine for visual novels and 2D games. It is still in the early alpha stage, but currently supports graphics and audio.
- S3_DeformFDM - The S3 Slicer is a framework for achieving support-free strength reinforcement and surface quality in multi-axis 3D printing by computing the rotation-driven deformation for the input model.
- SYCL-samples - A collection of samples written using the SYCL standard for C++.
- FIRESTARTER - FIRESTARTER, a processor stress test utility, maximizes the energy consumption of 64-bit x86 processors by generating heavy load on the execution units as well as transferring data between the cores and multiple levels of the memory hierarchy.
- 1D Heat Transfer Simulation - (C++ based, from Intel) This 1D-Heat-Transfer sample is an application that simulates the heat propagation on a one-dimensional isotropic and homogeneous medium. The code sample includes both parallel and serial calculations of heat propagation.
- 3D Wave Simulation - (C++ based, from Intel) The ISO3DFD sample refers to Three-Dimensional Finite-Difference Wave Propagation in Isotropic Media; it is a three-dimensional stencil to simulate a wave propagating in a 3D isotropic medium. Starts with a simple serial implementation and shows how to use SYCL to offload to the GPU. Then shows how to optimize.
- ACTS GPU Ramp - Demonstrator tracking chain on accelerators
- arpack-ng - ARPACK-NG is a collection of Fortran77 subroutines designed to solve large scale eigenvalue problems. It is a community project maintained by volunteers.
- Amber - Amber is a high-performance molecular dynamics (MD) code used by thousands of scientists in academia, national labs, and industry for computational drug discovery and related research.
- ATLAS Charged Particle Seed Finding with DPC++ - The ATLAS Experiment is one of the general-purpose particle physics experiments built at the Large Hadron Collider (LHC) at CERN in Geneva. Its goal is to study the behavior of elementary particles at the highest energies ever produced in a laboratory to help us better understand the universe.
- bfs-sycl-fpga - The Breadth-First Search algorithm implementations memoryBFS and streamingBFS using Intel oneAPI (SYCL 2020) on Intel FPGAs.
- dedekind-MKL - Selected BLAS and LAPACK Java bindings for Intel's oneAPI Math Kernel Library (oneMKL) on Windows and Linux.
- Discrete Cosine Transform Image Compression - (C++ based, from Intel) The Discrete Cosine Transform (DCT) sample demonstrates how DCT and Quantizing stages can be implemented to run faster using SYCL by offloading image processing work to a GPU or other device.
- GinkgoOneAPI - In this project we want to explore the potential of having an Intel oneAPI backend for the Ginkgo software package: https://ginkgo-project.github.io/
- GROMACS - A free and open-source software suite for high-performance molecular dynamics and output analysis.
- Grid - Data parallel C++ mathematical object library.
- gtensor - gtensor is a multi-dimensional array C++14 header-only library for hybrid GPU development. It was inspired by xtensor, and designed to support the GPU port of the GENE fusion code.
- Jacobi Iterative Solver for Multi-GPU - (C++ based, from Intel) Illustrates how to use the Jacobi Iterative method to solve linear equations. This sample starts with a CPU-oriented application and shows how to use SYCL to offload regions of the code to a GPU. The sample walks through developing an optimization strategy by iteratively optimizing the code and ultimately targeting multiple GPUs if available.
- LAMMPS - LAMMPS is a classical molecular dynamics simulation code designed to run efficiently on parallel computers. It was developed at Sandia National Laboratories, a US Department of Energy facility, with funding from the DOE. It is an open-source code, distributed freely under the terms of the GNU Public License (GPL) version 2.
- mapmap_cpu - MapMap CPU is a massively parallel generic MRF map solver with minimal input assumptions, capable of solving a large class of MRF problems.
- MF-LBM - This is a lattice Boltzmann code designed for direct numerical simulation of flow in porous media. It is written in Fortran 90 and optimized for vectorization and parallel programming. code to SYCL.
- Monte Carlo Based Financial Simulation for Multi-GPU - (C++ based, from Intel) Evaluates fair call price for a given set of European options using the Monte Carlo approach. Monte Carlo simulation is one of the most important algorithms in quantitative finance. This sample uses a single CPU thread to control multiple GPUs. Shows how to migrate CUDA based code to SYCL.
- mt-kahypar - MT-KaHyPar is a multi-threaded algorithm for partitioning graphs and hypergraphs. It aims to minimize an objective function defined on the hyperedges while balancing block sizes and optimizing connectivity. It can partition extremely large graphs and hypergraphs with comparable solution quality to the best sequential graph partitioners while being more than an order of magnitude faster with only ten threads.
- NAMD - NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems.
- NWGraph - The Northwest Graph Library (NWGraph) is a high-performance header-only generic C++ graph library based on C++20 concepts and ranges. It includes multiple graph algorithms for well-known graph kernels and supporting data structures.
- Odd Even Merge and Sorting - (C++ based, from Intel) Demonstrates how to use the odd-even mergesort algorithm (also known as "Batcher's odd–even mergesort"), which may be beneficial when working with batches of short-sized to mid-sized (key, value) array pairs. Shows how to migrate CUDA based code to SYCL.
- Optical Flow Method - (C++ based, from Intel) The HSOpticalFlow sample is a computation of per-pixel motion estimation between two consecutive image frames caused by movement of object or camera. Shows how to migrate CUDA based code to SYCL.
- portBLAS - An implementation of BLAS using the SYCL open standard.
- PyPardisoProject - Pypardiso is a Python package for solving large sparse linear systems of equations using the Intel oneAPI Math Kernel Library Pardiso solver. It provides the same functionality as Scipy's spsolve but is faster in many cases.
- qmckl_sycl - SYCL GPU port of the QMCkl: Quantum Monte Carlo Kernel Library.
- repulsive-surfaces - A numerical framework for optimization of surface geometry while avoiding (self-)collision.
- SPHinXsys - SPHinXsys provides C++ APIs for physically accurate simulation and optimization. It aims to handle coupled industrial dynamic systems including fluid, solid, multi-body dynamics and beyond. The multi-physics library is based on a unique and unified computational framework by which strong couplings have been achieved for all involved physics.
- suanPan - suanPan is a finite element method (FEM) simulation platform for applications in fields such as solid mechanics and civil/structural/seismic engineering. The name suanPan (in some places, such as suffixes, it is also abbreviated as suPan) comes from the term Suan Pan (算盤), which is the Chinese abacus.
- sycl-collision-sim - Demo 3D simulation of rigid body physics with different shapes bouncing off each other confined in a box. Two implementations are provided, one sequential with standard C++ code compiled for CPU, and parallel SYCL implementation which can be compiled for any target device (e.g. a GPU) supported by a SYCL compiler.
- PW-DFT - Plane-Wave density-functional theory (DFT) development for NWChemEx electronic structure software. An easy way to generate input decks, check your output decks against a large database of calculations, perform simple thermochemistry calculations, calculate the NMR and IR spectra of modest size molecule using NWChem.
- xpm - xpm (Extensive Pore Modelling) is a software for predicting flow properties in multi-scale porous media. It uses a pore network model derived from image data, specifically using Pnextract to extract this network.
- stan-dev - The Stan Math Library is a C++, reverse-mode automatic differentiation library designed to be usable, extensive and extensible, efficient, scalable, stable, portable, and redistributable in order to facilitate the construction and utilization of algorithms that utilize derivatives.
- COGENT - COGENT is a continuum (Eulerian) plasma simulation code. It is primarily focused on tokamak edge plasma geometries, but includes options for, and is extensible to, other configurations. This repository contains the COGENT code (COGENT/) as well as Chombo (Chombo/), the adaptive mesh refinement application framework from Lawrence Berkeley National Laboratory upon which COGENT is built.
- Trilinos - The Trilinos Project is an effort to develop algorithms and enabling technologies within an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific problems. A unique design feature of Trilinos is its focus on packages.
- LAPACK - The Linear Algebra PACKage (LAPACK) is a standard software library for numerical linear algebra. It provides routines for solving systems of linear equations and linear least squares problems, eigenvalue problems, and singular value decomposition. It also includes routines to implement the associated matrix factorizations such as LU, QR, Cholesky, etc. LAPACK was originally written in FORTRAN 77, and moved to Fortran 90 in version 3.2 (2008). LAPACK provides routines for handling both real and complex matrices in both single and double precision.
- elpa - The computation of selected or all eigenvalues and eigenvectors of a symmetric (Hermitian) matrix has high relevance for various scientific disciplines. For the calculation of a significant part of the eigensystem, direct eigensolvers are typically used. The ELPA project was initiated with the aim to develop and implement an efficient eigenvalue solver for petaflop applications.
- esys-escript - esys-escript is a module for implementing mathematical models in Python using the finite element method (FEM). As users do not access the underlying data structures it is very easy to use and scripts can run on desktop computers as well as massive parallel supercomputers without changes. Application areas for esys-escript include geophysical inversion, earthquakes, porous media flow, reactive transport, plate subduction, erosion, earth mantle convection, and tsunamis.
- code_saturne - The basic capabilities of code_saturne enable the handling of either incompressible or expandable flows with or without heat transfer and turbulence. Dedicated modules are available for specific physics such as radiative heat transfer, combustion (gas, coal, heavy fuel oil, ...), magneto-hydrodynamics, compressible flows, two-phase flows (Euler-Lagrange approach with two-way coupling), or atmospheric flows.
- TASMANIAN - The Toolkit for Adaptive Stochastic Modeling and Non-Intrusive ApproximatioN is a collection of robust libraries for high dimensional integration and interpolation as well as parameter calibration.
- COSMA - COSMA is a parallel, high-performance, GPU-accelerated, matrix-matrix multiplication algorithm that is communication-optimal for all combinations of matrix dimensions, number of processors and memory sizes, without the need for any parameter tuning.
- Apache MXNet - Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity. At its core, MXNet contains a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient. MXNet is portable and lightweight, scalable to many GPUs and machines.
- HeavyDB - HeavyDB is an open source SQL-based, relational, columnar database engine that leverages the full performance and parallelism of modern hardware (both CPUs and GPUs) to enable querying of multi-billion row datasets in milliseconds, without the need for indexing, pre-aggregation, or downsampling.
- MFLib - A Matched Filtering Library. In principle, the algorithm is quite simple in that it computes a Pearson correlation coefficient at every sample in a time series corresponding to a template. However, the actual implementation in a compiled language is tedious.
- RMGDFT - RMG is an Open Source code for electronic structure calculations and modeling of materials and molecules. It is based on density functional theory and uses a real space basis and pseudopotentials.
- ITPP - IT++ is a C++ library of mathematical, signal processing and communication classes and functions. Its main use is in simulation of communication systems and for performing research in the area of communications. The kernel of the library consists of generic vector and matrix classes, and a set of accompanying routines. Such a kernel makes IT++ similar to MATLAB or GNU Octave.
- WarpX - WarpX is an advanced electromagnetic & electrostatic Particle-In-Cell code. It supports many features including Perfectly-Matched Layers (PML), mesh refinement, and the boosted-frame technique.
- Highly Efficient FFT for Exascale - The Highly Efficient FFT for Exascale (HeFFTe) library is being developed as part of the Exascale Computing Project (ECP), which is a joint project of the U.S. Department of Energy's Office of Science and National Nuclear Security Administration (NNSA). HeFFTe delivers algorithms for distributed fast-Fourier transforms on heterogeneous systems, targeting the upcoming exascale machines.
- Lightwave Explorer - Lightwave explorer is an open source nonlinear optics simulator, intended to be fast, visual, and flexible for students and researchers to play with ultrashort laser pulses and nonlinear optics without having to buy a laser first.
- NESO (Neptune Exploratory SOftware) - This is a work-in-progress repository for exploring the implementation of a series of tokamak exhaust relevant models combining high order finite elements with particles, written in C++ and SYCL.
- Ewald-Splitting-with-Prolates - This fork includes custom modifications for the ESP (Ewald summation with prolate spheroidal wave functions) method.
- CUTLASS - CUTLASS is a collection of CUDA C++ template abstractions for implementing high-performance matrix-matrix multiplication (GEMM) and related computations at all levels and scales within CUDA. It incorporates strategies for hierarchical decomposition and data movement similar to those used to implement cuBLAS and cuDNN.
- ArrayFire - oneAPI Backend - ArrayFire is a general-purpose tensor library that simplifies the process of software development for the parallel architectures found in CPUs, GPUs, and other hardware acceleration devices. This project is to develop a oneAPI backend to the library, which currently supports CUDA, OpenCL, and x86.
- chip-spv - The chip-spv project allows for the portability of HIP and CUDA applications to platforms supporting SPIR-V. Currently, it offers support for OpenCL and Level-Zero as low-level runtime alternatives.
- dedekind-MKL - Selected BLAS and LAPACK Java bindings for Intel's oneAPI Math Kernel Library (oneMKL) on Windows and Linux.
- dpctl - Python SYCL bindings and SYCL-based Python Array API library.
- formulog - Formulog is a logic programming language that supports Datalog, SMT queries, and first-order functional programming. It requires JRE 11 and a supported SMT solver, such as Z3, Boolector, CVC4, or Yices.
- HeCBench - The HeCBench repository contains a collection of benchmarks for studying performance portability and productivity with various heterogeneous computing languages. The benchmarks are divided into categories like computer vision, bioinformatics, and finance.
- HPCToolKit - HPCToolkit is an open-source performance tool that is in some respects similar to VTune, though it also works on Power and ARM architectures. It also works on NVIDIA and AMD GPUs. Our aim is to also use it for performance analysis of Intel GPUs with Intel’s OpenCL to our targets as a prelude to A0
- kharma - Kokkos-based High-Accuracy Relativistic Magnetohydrodynamics with AMR. KHARMA is an implementation of the HARM scheme for general relativistic magnetohydrodynamics (GRMHD) in C++. It is based on the Parthenon AMR infrastructure, using Kokkos for parallelism and GPU support.
- Kokkos - Kokkos Core implements a programming model in C++ for writing performance portable applications targeting all major HPC platforms. For that purpose it provides abstractions for both parallel execution of code and data management. Kokkos is designed to target complex node architectures with N-level memory hierarchies and multiple types of execution resources. It currently can use CUDA, HIP, SYCL, HPX, OpenMP and C++ threads as backend programming models with several other backends in development.
- levelzero-jni - Intel LevelZero JNI library for TornadoVM. This project is a Java Native Interface (JNI) binding for Intel's Level Zero. The library is designed to stay as close as possible to the Level Zero API for C++.
- libxsmm - LIBXSMM is a library for specialized dense and sparse matrix operations as well as for deep learning primitives such as small convolutions.
- mixbench - A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)
- numba-dpex - Numba dpex is an extension for the Numba Python JIT compiler that provides a kernel programming API and an offload feature. It supports devices including Intel CPUs, integrated GPUs, and discrete GPUs.
- oneapi-asp - Intel® oneAPI Accelerator Support Package (ASP) for Open FPGA Stack (OFS)
- oneapi-containers - The Intel oneAPI Containers simplify programming by delivering the tools to deploy applications and solutions on various architectures. These containers allow developers to set up and distribute environments for profiling and executing applications built with oneAPI toolkits.
- oneAPI.jl - The oneAPI.jl GitHub project provides support for working with the oneAPI unified programming model and offers low-level wrappers for the Level Zero library, kernel programming, and high-level array programming capabilities.
- Open-source Scientific Applications and Benchmarks - This repository contains a collection of data-parallel programs for evaluating oneAPI direct programming. Each program is written with CUDA, SYCL, and OpenMP target offloading. Intel® DPC++ Compatibility Tool (DPCT) can convert a CUDA program to a SYCL program.
- p2rng - A modern header-only C++ library for parallel algorithmic (pseudo) random number generation supporting OpenMP, CUDA, ROCm and oneAPI.
- PTXprofiler - A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.
- RayBNN_Raytrace - Ray tracing library using GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI
- RcppParallel - The RcppParallel project provides high-level functions for parallel programming with Rcpp and supports using Intel TBB for performance on Windows, macOS, and Linux systems.
- Spyker - High-performance Spiking Neural Networks Library Written From Scratch with C++ and Python Interfaces.
- SYCLomatic - The SYCLomatic project helps developers migrate code to the SYCL heterogeneous programming model. Daily builds are available, but they are not rigorously tested for production quality.
- SYnergy - Energy Measurement and Frequency Scaling for SYCL applications.
- SYCLops - A SYCL-specific LLVM-to-MLIR converter.
- TAU Performance System - The TAU Performance System® supports profiling and tracing of programs written using Intel oneAPI. Intel oneAPI provides two interfaces for programming CPUs and GPUs: OpenCL and DPC++/SYCL. TAU supports both, using the OpenCL profiling interface and the Intel Level Zero API to observe performance.
- TornadoVM - TornadoVM is an open-source software technology that automatically accelerates Java programs on multi-core CPUs, GPUs, and FPGAs.
- toyBrot - toyBrot is a raymarching fractal generator that is used both as a simple benchmarking tool and a study tool for parallelisation. The code is implemented with over 10 different technologies including Intel TBB, ISPC, and SYCL (with support for oneAPI).
- XFluids - A unified cross-architecture heterogeneous CFD solver that supports NVIDIA, AMD, and Intel GPUs.
- Hypre - Parallel solvers for sparse linear systems featuring multigrid methods.
- compadre - The Compadre Toolkit provides a performance portable solution for the parallel evaluation of computationally dense kernels. The toolkit specifically targets the Generalized Moving Least Squares (GMLS) approach, which requires the inversion of small dense matrices. The result is a set of weights that provide the information needed for remap or entries that constitute the rows of some globally sparse matrix.
- TESSE - This is the C++ API for the Template Task Graph (TTG) programming model for flowgraph-based composition of high-performance algorithms executable on distributed heterogeneous computer platforms. The TTG API abstracts out the details of the underlying task and data flow runtime; the current realization is implemented using MADNESS and PaRSEC runtimes as backends.
- DistributedFFT - A Highly Efficient GPU Framework for Fast Fourier Transform across Mixed Nodes. This library is being developed to support highly efficient distributed FFT computation on multicore CPU and GPU architectures.
- HPAT - High Performance Analytics Toolkit (HPAT) scales analytics/ML codes in Python to bare-metal cluster/cloud performance automatically. It compiles a subset of Python (Pandas/NumPy) to efficient parallel binaries with MPI, requiring only minimal code changes.
- XJoin - Portable, parallel hash join across diverse XPU architectures with oneAPI. XJoin implements the hash join database operator. It is the DPC++ implementation of the hash join operator originally written in CUDA.
- Quinoa - Quinoa is a set of computational tools that enables research and numerical analysis in fluid dynamics. Using the Charm++ runtime system, it employs asynchronous (or non-blocking) parallel programming and decomposes computational problems into a large number of work units (that may be more than the available number of processors), enabling arbitrary overlap of parallel computation, communication, input, and output.
- Altis-SYCL - Altis-SYCL is a SYCL-based implementation of the Altis GPGPU benchmark suite (originally written in CUDA) for CPUs, GPUs, and FPGAs.
- OneAPI Crystal - This library is the DPC++ port (and extension) of the Crystal library, originally written in CUDA. It implements a collection of block-wide device functions that can be used to build high-performance implementations of SQL queries on CPUs and GPUs.
- data-parallel-CPP - The Data Parallel C++ Book Source Samples repository contains code that accompanies the book Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL.
- Jurassic - Hunting Dinosaur bones using AI
- syclacademy - SYCL Academy, a set of learning materials for SYCL heterogeneous programming
- Molecular Dynamics tutorial using GROMACS and PACKMOL - A step-by-step tutorial for running a classical Molecular Dynamics simulation of a protein using GROMACS and building the ion box with PACKMOL.
- saxpy - Parallel saxpy done in several ways as an exercise for understanding the differences between CUDA and SYCL.