Google Summer of Code is an initiative that helps students learn about and contribute to open-source software projects while getting paid. The R community proposed a project on the matrixStats package, and as a student I am interested in working on it.
I completed all the tasks proposed in the Skill Tests:
- 1. Easy: Installing R packages with C code
- 2. Easy: Git and R package testing
- 3. Easy: Prototyping in R
- 4. Medium: Simple support for name attributes
- 5. Medium: A related, slightly different case
- 6. Medium/Hard: Implement in C code
- 7. Hard: Begin to work on the project.
  - Move naming support down to C code for functions that return a matrix: AngelPn/matrixStats.
The package passes R CMD check with all OKs.
The matrixStats package provides highly optimized functions for computing common summaries over rows and columns of matrices, e.g. rowQuantiles(). There are also functions that operate on vectors, e.g. logSumExp(). Their implementations strive to minimize both memory usage and processing time, and they are often remarkably faster than the equivalent apply() solutions. The calculations are mostly implemented in C, which allows optimizations beyond what is possible in plain R. The package installs out of the box on all common operating systems, including Linux, macOS and Windows.
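As a quick illustration of the two kinds of functions mentioned above, the sketch below applies rowQuantiles() to a random matrix and compares logSumExp() with the naive log(sum(exp(x))). It assumes matrixStats is installed; the data are arbitrary random numbers, not a benchmark.

```r
library(matrixStats)

# Row-wise summaries: quartiles of each row of a random 20 x 500 matrix
set.seed(42)
x <- matrix(rnorm(20 * 500), nrow = 20, ncol = 500)
q <- rowQuantiles(x, probs = c(0.25, 0.50, 0.75))
dim(q)  # one row per row of x, one column per probability: 20 3

# Vector function: log(sum(exp(lx))) computed without overflow
lx <- c(1000, 1000)
naive  <- log(sum(exp(lx)))  # exp(1000) overflows to Inf
stable <- logSumExp(lx)      # 1000 + log(2), about 1000.693
```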
With a matrix
> x <- matrix(rnorm(20 * 500), nrow = 20, ncol = 500)
it is many times faster to calculate medians column by column using
> mu <- matrixStats::colMedians(x)
than using
> mu <- apply(x, MARGIN = 2, FUN = median)
Moreover, if performing calculations on a subset of rows and/or columns, using
> mu <- colMedians(x, rows = 5:15, cols = 101:300)
is much faster and more memory efficient than
> mu <- apply(x[5:15, 101:300], MARGIN = 2, FUN = median)
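The claims above can be checked with a small sketch like the following, which confirms that colMedians() and apply() return the same medians (on the full matrix and on a subset) and gives a rough timing. The subset indices are chosen to fit within the 20 x 500 example matrix; timings depend on the machine and are not a formal benchmark.

```r
library(matrixStats)

set.seed(42)
x <- matrix(rnorm(20 * 500), nrow = 20, ncol = 500)

# Same column medians, computed two ways
mu_fast  <- colMedians(x)
mu_apply <- apply(x, MARGIN = 2, FUN = median)
all.equal(mu_fast, mu_apply)  # TRUE

# Subset version: only rows 5:15 and columns 101:300 are used,
# without first copying x[5:15, 101:300] into a new matrix
mu_sub <- colMedians(x, rows = 5:15, cols = 101:300)
mu_sub_apply <- apply(x[5:15, 101:300], MARGIN = 2, FUN = median)
all.equal(mu_sub, mu_sub_apply)  # TRUE

# Rough timing comparison; results vary by machine
system.time(for (i in 1:100) colMedians(x))
system.time(for (i in 1:100) apply(x, MARGIN = 2, FUN = median))
```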
For formal benchmarking of matrixStats functions relative to alternatives, see the Benchmark reports.
R package matrixStats is available on CRAN and can be installed in R as:
install.packages("matrixStats")
To install the pre-release version that is available in the Git branch develop on GitHub, use:
remotes::install_github("AngelPn/matrixStats", ref = "develop")
This will install the package from source. Because of this, and because the package also compiles native code, Windows users need to have Rtools installed and macOS users need to have Xcode installed.
To contribute to this package, please see CONTRIBUTING.md.