This repository contains functions to aggregate statistics for groups of cells or sets of genes from a gene-by-cell matrix of expression values. It was primarily developed for computing pseudo-bulk expression profiles for clusters of cells, which can then be used for differential expression analysis with packages like edgeR. The code itself was originally derived from the scran R package, factored out into a separate C++ library for easier re-use.
Given a tatami::Matrix and an array of group assignments, the aggregate_across_cells() function will compute the aggregate statistics across all genes for each group.
#include "scran_aggregate/scran_aggregate.hpp"
const tatami::Matrix<double, int>& mat = some_data_source();
// Array of groupings should contain integer assignments to groups 0, 1, 2, etc.
std::vector<int> groupings = some_groupings();
scran_aggregate::AggregateAcrossCellsOptions opt;
auto res = scran_aggregate::aggregate_across_cells(mat, groupings.data(), opt);
res.sums; // vector of vectors of per-group sums across genes.
res.sums[0]; // vector of sums for the first group across genes.
res.detected; // vector of vectors of the per-group number of cells in which each gene is detected.
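To make the layout of these results concrete, the following is a minimal sketch of the same per-group aggregation on a plain dense matrix, without tatami. The aggregate_dense() function and the DenseAggregates struct are hypothetical names used only for illustration and are not part of scran_aggregate. The sketch assumes a row-major gene-by-cell array and treats a gene as "detected" in a cell when its value is strictly positive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container mirroring the shape of the results described above.
struct DenseAggregates {
    std::vector<std::vector<double> > sums;  // sums[g][i]: sum of gene i over cells in group g.
    std::vector<std::vector<int> > detected; // detected[g][i]: cells in group g where gene i is > 0.
};

// Hypothetical helper (not the library's implementation).
// 'values' is a row-major gene-by-cell matrix, so gene i in cell c is values[i * ncells + c].
// 'groupings' holds one group index per cell, each in [0, ngroups).
DenseAggregates aggregate_dense(
    const std::vector<double>& values,
    std::size_t ngenes,
    std::size_t ncells,
    const std::vector<int>& groupings,
    std::size_t ngroups)
{
    assert(values.size() == ngenes * ncells);
    assert(groupings.size() == ncells);

    DenseAggregates out;
    out.sums.assign(ngroups, std::vector<double>(ngenes, 0.0));
    out.detected.assign(ngroups, std::vector<int>(ngenes, 0));

    for (std::size_t i = 0; i < ngenes; ++i) {
        for (std::size_t c = 0; c < ncells; ++c) {
            const double v = values[i * ncells + c];
            const std::size_t g = static_cast<std::size_t>(groupings[c]);
            assert(g < ngroups);
            out.sums[g][i] += v;
            if (v > 0) { // assumption: "detected" means strictly positive expression.
                ++out.detected[g][i];
            }
        }
    }
    return out;
}
```

As in the snippet above, the outer vector is indexed by group and the inner vector by gene, so the first entry holds the pseudo-bulk profile of group 0.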
We can also use the aggregate_across_genes() function to sum expression values across gene sets, e.g., to compute the activity of a gene signature.
This can be done with any number of gene sets, possibly with a different weight for each gene in each set.
std::vector<std::tuple<size_t, const int*, const double*> > gene_sets;
std::vector<int> set1 { 0, 5, 10, 20 };
gene_sets.emplace_back(set1.size(), set1.data(), static_cast<double*>(NULL)); // no weight
std::vector<int> set2 { 0, 2, 4, 6, 8, 10 };
std::vector<double> weight2 { 0.1, 0.3, 0.3, 0.2, 0.1, 0.05 };
gene_sets.emplace_back(set2.size(), set2.data(), weight2.data()); // weighted
scran_aggregate::AggregateAcrossGenesOptions g_opt;
auto g_res = scran_aggregate::aggregate_across_genes(
mat,
gene_sets,
g_opt
);
g_res.sum[0]; // vector of sums for set 1 in each cell.
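The weighted sum for each gene set can be sketched on a plain dense matrix as follows. This is an illustration under stated assumptions rather than the library's implementation: sum_gene_sets_dense() is a hypothetical helper, the matrix is assumed to be a row-major gene-by-cell array, and a null weight pointer is assumed to mean that every gene in the set has a weight of 1.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical helper (not part of scran_aggregate).
// Each gene set is a tuple of (number of genes, row indices, optional weights),
// matching the layout of 'gene_sets' in the example above.
std::vector<std::vector<double> > sum_gene_sets_dense(
    const std::vector<double>& values,
    std::size_t ngenes,
    std::size_t ncells,
    const std::vector<std::tuple<std::size_t, const int*, const double*> >& gene_sets)
{
    assert(values.size() == ngenes * ncells);

    std::vector<std::vector<double> > sums;
    sums.reserve(gene_sets.size());

    for (const auto& set : gene_sets) {
        const std::size_t n = std::get<0>(set);
        const int* indices = std::get<1>(set);
        const double* weights = std::get<2>(set);

        std::vector<double> current(ncells, 0.0);
        for (std::size_t k = 0; k < n; ++k) {
            const std::size_t gene = static_cast<std::size_t>(indices[k]);
            assert(gene < ngenes);
            // Assumption: a null weight pointer means an unweighted sum.
            const double w = (weights == nullptr ? 1.0 : weights[k]);
            const double* row = values.data() + gene * ncells;
            for (std::size_t c = 0; c < ncells; ++c) {
                current[c] += w * row[c];
            }
        }
        sums.push_back(std::move(current));
    }
    return sums;
}
```

The output has one vector per gene set, each with one entry per cell, which matches the shape of g_res.sum in the example above.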
Check out the reference documentation for more details.
If you're using CMake, you just need to add something like this to your CMakeLists.txt:
include(FetchContent)
FetchContent_Declare(
scran_aggregate
GIT_REPOSITORY https://github.com/libscran/scran_aggregate
GIT_TAG master # or any version of interest
)
FetchContent_MakeAvailable(scran_aggregate)
Then you can link to scran_aggregate to make the headers available during compilation:
# For executables:
target_link_libraries(myexe libscran::scran_aggregate)
# For libraries:
target_link_libraries(mylib INTERFACE libscran::scran_aggregate)
Alternatively, if the library is already installed, it can be located with find_package():
find_package(libscran_scran_aggregate CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE libscran::scran_aggregate)
To install the library, use:
mkdir build && cd build
cmake .. -DSCRAN_AGGREGATE_TESTS=OFF
cmake --build . --target install
By default, this will use FetchContent to fetch all external dependencies.
If you want to install them manually, use -DSCRAN_AGGREGATE_FETCH_EXTERN=OFF.
See the tags in extern/CMakeLists.txt to find compatible versions of each dependency.
If you're not using CMake, the simple approach is to just copy the files in include/ - either directly or with Git submodules - and include their path during compilation with, e.g., GCC's -I.
This also requires the external dependencies listed in extern/CMakeLists.txt.