diff --git a/README.md b/README.md
index de2d11372..9b19937c8 100644
--- a/README.md
+++ b/README.md
@@ -17,24 +17,28 @@ The aim is to make the API as model-agnostic as possible while still being user-friendly.
## Examples

```julia
-  X = reshape(collect(range(-3.0,3.0,length=100)),:,1)
-  # Set simple scaling of the data
-  k₁ = SqExponentialKernel()
-  K₁ = kernelmatrix(k₁,X,obsdim=1)
+x = range(-3.0, 3.0; length=100)

-  # Set a function transformation on the data
-  k₂ = TransformedKernel(Matern32Kernel(),FunctionTransform(x->sin.(x)))
-  K₂ = kernelmatrix(k₂,X,obsdim=1)
+# A simple standardised squared-exponential / exponentiated-quadratic kernel.
+k₁ = SqExponentialKernel()
+K₁ = kernelmatrix(k₁, x)

-  # Set a matrix premultiplication on the data
-  k₃ = transform(PolynomialKernel(c=2.0,d=2.0),LowRankTransform(randn(4,1)))
-  K₃ = kernelmatrix(k₃,X,obsdim=1)
+# Set a function transformation on the data
+k₂ = Matern32Kernel() ∘ FunctionTransform(sin)
+K₂ = kernelmatrix(k₂, x)

-  # Add and sum kernels
-  k₄ = 0.5*SqExponentialKernel()*LinearKernel(c=0.5) + 0.4*k₂
-  K₄ = kernelmatrix(k₄,X,obsdim=1)
+# Set a matrix premultiplication on the data
+k₃ = PolynomialKernel(; c=2.0, degree=2) ∘ LinearTransform(randn(4, 1))
+K₃ = kernelmatrix(k₃, x)

-  plot(heatmap.([K₁,K₂,K₃,K₄],yflip=true,colorbar=false)...,layout=(2,2),title=["K₁" "K₂" "K₃" "K₄"])
+# Add and sum kernels
+k₄ = 0.5 * SqExponentialKernel() * LinearKernel(; c=0.5) + 0.4 * k₂
+K₄ = kernelmatrix(k₄, x)
+
+plot(
+    heatmap.([K₁, K₂, K₃, K₄]; yflip=true, colorbar=false)...;
+    layout=(2, 2), title=["K₁" "K₂" "K₃" "K₄"],
+)
```
@@ -43,10 +47,9 @@ The aim is to make the API as model-agnostic as possible while still being user-friendly.
## Package goals (by priority)
- Ensure AD Compatibility (already the case for Zygote, ForwardDiff)
- Toeplitz Matrices compatibility
-- BLAS backend
Directly inspired by the [MLKernels](https://github.com/trthatcher/MLKernels.jl) package.
## Issues/Contributing
-If you notice a problem or would like to contribute by adding more kernel functions or features please [submit an issue](https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/issues).
+If you notice a problem or would like to contribute by adding more kernel functions or features, please [submit an issue](https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/issues) or open a PR (see the [ColPrac](https://github.com/SciML/ColPrac) contribution guidelines).
diff --git a/docs/make.jl b/docs/make.jl
index 2949cb7e1..c2c14f8f9 100644
--- a/docs/make.jl
+++ b/docs/make.jl
@@ -33,9 +33,15 @@ makedocs(;
"create_kernel.md",
"API" => "api.md",
"Examples" => "example.md",
+ "Design" => "design.md",
],
strict=true,
checkdocs=:exports,
+ doctestfilters=[
+ r"{([a-zA-Z0-9]+,\s?)+[a-zA-Z0-9]+}",
+ r"(Array{[a-zA-Z0-9]+,\s?1}|Vector{[a-zA-Z0-9]+})",
+ r"(Array{[a-zA-Z0-9]+,\s?2}|Matrix{[a-zA-Z0-9]+})",
+ ],
)
deploydocs(;
diff --git a/docs/src/api.md b/docs/src/api.md
index e2579ff5c..2891f2aa7 100644
--- a/docs/src/api.md
+++ b/docs/src/api.md
@@ -1,38 +1,78 @@
# API Library
----
-```@contents
-Pages = ["api.md"]
-```
-
```@meta
CurrentModule = KernelFunctions
```
## Functions
+The KernelFunctions API comprises the following four functions.
```@docs
kernelmatrix
kernelmatrix!
kernelmatrix_diag
kernelmatrix_diag!
-kernelpdmat
-nystrom
```
-## Utilities
+## Input Types
+
+The above API operates on collections of inputs.
+All collections of inputs in KernelFunctions.jl are represented as `AbstractVector`s.
+To understand this choice, please see the [design notes on collections of inputs](@ref why_abstract_vectors).
+The length of any such `AbstractVector` is equal to the number of inputs in the collection.
+For example, this means that
+```julia
+size(kernelmatrix(k, x)) == (length(x), length(x))
+```
+is always true for any `Kernel` `k` and `AbstractVector` `x`.
+
+### Univariate Inputs
+
+If each input to your kernel is `Real`-valued, then any `AbstractVector{<:Real}` is a valid
+representation for a collection of inputs.
+More generally, it's completely fine to represent a collection of inputs of type `T` as, for
+example, a `Vector{T}`.
+However, this may not be the most efficient way to represent a collection of inputs.
+See [Vector-Valued Inputs](@ref) for an example.
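+
+For instance, a minimal sketch of the `Real`-valued case (the kernel choice here is purely illustrative):
+```julia
+k = SqExponentialKernel()
+x = rand(5)           # a collection of 5 Real-valued inputs
+kernelmatrix(k, x)    # returns a 5×5 matrix
+```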
+
+### Vector-Valued Inputs
+
+We recommend that collections of vector-valued inputs are stored in an
+`AbstractMatrix{<:Real}` when possible, and wrapped inside a `ColVecs` or `RowVecs` to make
+their interpretation clear:
```@docs
ColVecs
RowVecs
+```
+These types are specialised upon to ensure good performance, e.g. when computing Euclidean distances between pairs of elements.
+The benefit of using this representation, rather than a `Vector{Vector{<:Real}}`, is that
+optimised matrix-matrix multiplication functionality can be utilised when computing
+pairwise distances between inputs, which are needed for `kernelmatrix` computation.
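+
+As a short sketch of both wrappers in use (sizes chosen arbitrarily):
+```julia
+k = SqExponentialKernel()
+X = randn(3, 10)                 # a 3×10 matrix of Reals
+kernelmatrix(k, ColVecs(X))      # 10×10 matrix: each of the 10 columns is an input
+kernelmatrix(k, RowVecs(X))      # 3×3 matrix: each of the 3 rows is an input
+```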
+
+### Inputs for Multiple Outputs
+
+KernelFunctions.jl views multi-output GPs as GPs on an extended input domain.
+For an explanation of this design choice, see [the design notes on multi-output GPs](@ref inputs_for_multiple_outputs).
+
+An input to a multi-output `Kernel` should be a `Tuple{T, Int}`, whose first element specifies a location in the domain of the multi-output GP, and whose second element specifies which output the input corresponds to.
+The type of collections of inputs for multi-output GPs is therefore `AbstractVector{<:Tuple{T, Int}}`.
+
+KernelFunctions.jl provides the following type for situations in which all outputs are observed all of the time:
+```@docs
MOInput
-NystromFact
```
+As with [`ColVecs`](@ref) and [`RowVecs`](@ref) for vector-valued input spaces, this
+type enables specialised implementations of e.g. [`kernelmatrix`](@ref) for
+[`MOInput`](@ref)s in some situations.
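+
+For example, a small sketch of `MOInput` in use (input values arbitrary; `IndependentMOKernel` is one choice of multi-output kernel):
+```julia
+x = [0.1, 0.2, 0.3]                            # 3 locations in the input domain
+xMO = MOInput(x, 2)                            # inputs for a 2-output GP; length(xMO) == 6
+k = IndependentMOKernel(SqExponentialKernel())
+kernelmatrix(k, xMO)                           # returns a 6×6 matrix
+```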
-## Index
+To find out more about the background, read this [review of kernels for vector-valued functions](https://arxiv.org/pdf/1106.6251.pdf).
-```@index
-Pages = ["api.md"]
-Module = ["KernelFunctions"]
-Order = [:type, :function]
+## Utilities
+
+KernelFunctions also provides miscellaneous utility functions.
+```@docs
+kernelpdmat
+nystrom
+NystromFact
```
diff --git a/docs/src/assets/heatmap_combination.png b/docs/src/assets/heatmap_combination.png
index 4a0daf139..06d7a8590 100644
Binary files a/docs/src/assets/heatmap_combination.png and b/docs/src/assets/heatmap_combination.png differ
diff --git a/docs/src/design.md b/docs/src/design.md
new file mode 100644
index 000000000..f7faba3c0
--- /dev/null
+++ b/docs/src/design.md
@@ -0,0 +1,172 @@
+# Design
+
+## [Why AbstractVectors Everywhere?](@id why_abstract_vectors)
+
+To understand the advantages of using `AbstractVector`s everywhere to represent collections of inputs, first consider the following properties that a collection of inputs should ideally satisfy.
+
+#### Unique Ordering
+
+There must be a clearly-defined first, second, etc. element of an input collection.
+If this were not the case, it would not be possible to determine a unique mapping between a collection of inputs and the output of `kernelmatrix`, as it would not be clear what order the rows and columns of the output should appear in.
+
+Moreover, a unique ordering guarantees that if you permute the collection of inputs, the rows and columns of the `kernelmatrix` are correspondingly permuted.
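+
+A small sketch of this property (assuming some kernel `k`):
+```julia
+x = [0.3, 0.1, 0.2]
+p = [2, 3, 1]                                       # a permutation of the inputs
+kernelmatrix(k, x[p]) == kernelmatrix(k, x)[p, p]   # true
+```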
+
+#### Generality
+
+There must be no restriction on the domain of the input.
+Collections of `Real`s, vectors, graphs, finite-dimensional domains, or really anything else that you fancy should be straightforwardly representable.
+Moreover, whichever input class is chosen should not prevent optimal performance from being obtained.
+
+#### Unambiguously-Defined Length
+
+Knowing the length of a collection of inputs is important.
+For example, a well-defined length guarantees that the size of the output of `kernelmatrix`,
+and of related functions, is predictable.
+It also makes it possible to perform internal error-checking that ensures that e.g. there
+are the same number of inputs in two collections of inputs.
+
+
+
+### AbstractMatrices Do Not Cut It
+
+Notably, while `AbstractMatrix` objects are often used to represent collections of vector-valued
+inputs, they do _not_ immediately satisfy these properties as it is unclear whether a matrix
+of size `P x Q` represents a collection of `P` `Q`-dimensional inputs (each row is an
+input), or `Q` `P`-dimensional inputs (each column is an input).
+
+Moreover, they occasionally add some aesthetic inconvenience.
+For example, a collection of `Real`-valued inputs, which might be straightforwardly
+represented as an `AbstractVector{<:Real}`, must be reshaped into a matrix.
+
+There are two commonly used ways to partly resolve these shortcomings:
+
+#### Resolution 1: Specify a Convention
+
+One way that these shortcomings can be partly resolved is by specifying a convention that
+everyone adheres to regarding the interpretation of rows vs columns.
+However, opinions about the choice of convention are often surprisingly strongly held, and
+users regularly have to remind themselves _which_ convention has been chosen.
+While this resolves the ordering problem, and in principle defines the "length" of a
+collection of inputs, `AbstractMatrix`s already have a `length` defined in Julia, which
+would generally disagree with our internal notion of `length`.
+This isn't a show-stopper, but it isn't an especially clean situation.
+
+There is also the opportunity for some kinds of silent bugs.
+For example, if an input matrix happens to be square because the number of input dimensions
+is the same as the number of inputs, it would be hard to know whether the correct
+`kernelmatrix` has been computed.
+This kind of bug seems unlikely, but it exists regardless.
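+
+As a concrete sketch (hypothetical code, assuming an `obsdim`-style API):
+```julia
+X = randn(4, 4)                 # number of inputs == number of input dimensions
+kernelmatrix(k, X; obsdim=1)    # rows treated as inputs
+kernelmatrix(k, X; obsdim=2)    # columns treated as inputs
+# Both calls return a 4×4 matrix, so passing the wrong `obsdim` goes unnoticed.
+```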
+
+Finally, suppose that your inputs are some type `T` that is not simply a vector of real
+numbers, say a graph.
+In this situation, how should a collection of inputs be represented?
+An `N x 1` or `1 x N` matrix is the only obvious candidate, but the additional singleton
+dimension seems somewhat redundant.
+
+#### Resolution 2: Always Specify An `obsdim` Argument
+
+Another way to partly resolve these problems is to not commit to a convention, and instead
+to propagate some additional information through the codebase that specifies how the input
+data is to be interpreted.
+For example, a kernel `k` that represents the sum of two other kernels might implement
+`kernelmatrix` as follows:
+```julia
+function kernelmatrix(k::KernelSum, x::AbstractMatrix; obsdim=1)
+ return kernelmatrix(k.kernels[1], x; obsdim=obsdim) +
+ kernelmatrix(k.kernels[2], x; obsdim=obsdim)
+end
+```
+While this prevents the package from having to pre-specify a convention, it doesn't resolve
+the `length` issue, or the issue of representing collections of inputs which aren't
+immediately represented as vectors.
+Moreover, it complicates the internals; in contrast, consider what this function looks like
+with an `AbstractVector`:
+```julia
+function kernelmatrix(k::KernelSum, x::AbstractVector)
+ return kernelmatrix(k.kernels[1], x) + kernelmatrix(k.kernels[2], x)
+end
+```
+This code is clearer (less visual noise), and has removed a possible bug -- if the
+implementer of `kernelmatrix` forgets to pass the `obsdim` kwarg into each subsequent
+`kernelmatrix` call, it's possible to get the wrong answer.
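+
+As an illustration of that failure mode, a hypothetical buggy implementation:
+```julia
+function kernelmatrix(k::KernelSum, x::AbstractMatrix; obsdim=1)
+    # Bug: `obsdim` is accepted but not forwarded, so the inner calls
+    # silently fall back to their default convention.
+    return kernelmatrix(k.kernels[1], x) + kernelmatrix(k.kernels[2], x)
+end
+```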
+
+This being said, we do support matrix-valued inputs -- see
+[Why We Have Support for Both](@ref).
+
+
+### AbstractVectors
+
+Requiring all collections of inputs to be `AbstractVector`s resolves all of these problems,
+and ensures that the data is self-describing to the extent that KernelFunctions.jl requires.
+
+Firstly, the question of how to interpret the columns and rows of a matrix of inputs is
+resolved.
+Users _must_ wrap matrices that represent collections of inputs in either a `ColVecs` or
+`RowVecs`, both of which have clearly-defined semantics that are hard to confuse.
+
+By design, there is also no discrepancy between the number of inputs in the collection, and
+the `length` function -- the `length` of a `ColVecs`, `RowVecs`, or `Vector{<:Real}` is
+equal to the number of inputs.
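+
+For example (a sketch; sizes chosen arbitrarily):
+```julia
+X = randn(2, 5)
+length(ColVecs(X)) == 5    # 5 column-vector inputs, even though length(X) == 10
+length(RowVecs(X)) == 2    # 2 row-vector inputs
+```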
+
+There is no loss of performance.
+
+A collection of `N` `Real`-valued inputs can be represented by an
+`AbstractVector{<:Real}` of `length` `N`, rather than needing to use an
+`AbstractMatrix{<:Real}` of size either `N x 1` or `1 x N`.
+The same can be said for any other input type `T`, and new subtypes of `AbstractVector` can
+be added if particularly efficient ways exist to store collections of inputs of type `T`.
+A good example of this in practice is using `Tuple{S, Int}`, for some input type `S`, as the
+[Inputs for Multiple Outputs](@ref).
+
+This approach can also lead to clearer user code.
+A user need only wrap their inputs in a `ColVecs` or `RowVecs` once, and this
+specification is automatically re-used _everywhere_ in their code.
+In this sense, it is straightforward to write code in such a way that there is one unique
+source of "truth" about the way in which a particular data set should be interpreted.
+Conversely, the `obsdim` resolution requires that the `obsdim` keyword argument is passed
+around with the data _every_ _single_ _time_ that you use it.
+
+The benefits of the `AbstractVector` approach are likely most strongly felt when writing a
+substantial amount of code on top of KernelFunctions.jl: just as using `AbstractVector`s
+inside KernelFunctions.jl removes the need for large amounts of keyword-argument
+propagation, the same will be true of code built on top of it.
+
+
+
+
+### Why We Have Support for Both
+
+In short: many people like matrices, and are familiar with `obsdim`-style keyword
+arguments.
+
+All internals are implemented using `AbstractVector`s, though, and the `obsdim` interface
+is just a thin layer of utility functionality that sits on top of this.
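+
+For intuition, the matrix-based methods can be thought of as something like the following
+sketch (not the package's exact code):
+```julia
+vecs(X::AbstractMatrix; obsdim=2) = obsdim == 2 ? ColVecs(X) : RowVecs(X)
+
+kernelmatrix(k::Kernel, X::AbstractMatrix; obsdim=2) = kernelmatrix(k, vecs(X; obsdim=obsdim))
+```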
+
+
+
+
+
+## [Kernels for Multiple-Outputs](@id inputs_for_multiple_outputs)
+
+There are two equally-valid perspectives on multi-output kernels: they can either be treated
+as matrix-valued kernels, or standard kernels on an extended input domain.
+Each of these perspectives is convenient in different circumstances, but the latter
+greatly simplifies the incorporation of multi-output kernels in KernelFunctions.
+
+More concretely, let `k_mat` be a matrix-valued kernel, mapping pairs of inputs of type `T` to matrices of size `P x P` to describe the covariance between `P` outputs.
+Given inputs `x` and `y` of type `T`, and integers `p` and `q`, we can always find an
+equivalent standard kernel `k` mapping from pairs of inputs of type `Tuple{T, Int}` to the
+`Real`s as follows:
+```julia
+k((x, p), (y, q)) = k_mat(x, y)[p, q]
+```
+This ability to treat multi-output kernels as single-output kernels is very helpful, as it
+means that there is no need to introduce additional concepts into the API of
+KernelFunctions.jl, just additional kernels!
+This in turn simplifies downstream code, which doesn't need to "know" about the existence of
+multi-output kernels in addition to standard kernels. For example, GP libraries built on
+top of KernelFunctions.jl just need to know about `Kernel`s, and they get multi-output
+kernels, and hence multi-output GPs, for free.
+
+Where there is the need to specialise _implementations_ for multi-output kernels, this is
+done in an encapsulated manner -- parts of KernelFunctions that have nothing to do with
+multi-output kernels know _nothing_ about the existence of multi-output kernels.
diff --git a/docs/src/userguide.md b/docs/src/userguide.md
index a8e633ea4..b3539dc64 100644
--- a/docs/src/userguide.md
+++ b/docs/src/userguide.md
@@ -1,6 +1,6 @@
# User guide
-## Kernel creation
+## Kernel Creation
To create a kernel object, choose one of the pre-implemented kernels, see [Kernel Functions](@ref), or create your own, see [Creating your own kernel](@ref).
For example, a squared exponential kernel is created by
@@ -10,10 +10,10 @@ For example, a squared exponential kernel is created by
!!! tip "How do I set the lengthscale?"
Instead of having lengthscale(s) for each kernel we use [`Transform`](@ref) objects which act on the inputs before passing them to the kernel. Note that the transforms such as [`ScaleTransform`](@ref) and [`ARDTransform`](@ref) _multiply_ the input by a scale factor, which corresponds to the _inverse_ of the lengthscale.
- For example, a lengthscale of 0.5 is equivalent to premultiplying the input by 2.0, and you can create the corresponding kernel as follows:
+ For example, a lengthscale of 0.5 is equivalent to premultiplying the input by 2.0, and you can create the corresponding kernel in either of the following equivalent ways:
```julia
- k = transform(SqExponentialKernel(), ScaleTransform(2.0))
- k = transform(SqExponentialKernel(), 2.0) # implicitly constructs a ScaleTransform(2.0)
+ k = SqExponentialKernel() ∘ ScaleTransform(2.0)
+ k = compose(SqExponentialKernel(), ScaleTransform(2.0))
```
Check the [Input Transforms](@ref input_transforms) page for more details.
@@ -32,74 +32,97 @@ For example, a squared exponential kernel is created by
use a squared exponential kernel together with a [`LinearTransform`](@ref) of
the inputs:
```julia
- k = transform(SqExponentialKernel(), LinearTransform(sqrt(2) .* Q))
+ k = SqExponentialKernel() ∘ LinearTransform(sqrt(2) .* Q)
```
Analogously, you can combine other kernels such as the
[`PiecewisePolynomialKernel`](@ref) with a [`LinearTransform`](@ref) of the
inputs to obtain a kernel that is a function of the Mahalanobis distance
between inputs.
-## Using a kernel function
+## Using a Kernel Function
To evaluate the kernel function on two vectors you simply call the kernel object:
```julia
- k = SqExponentialKernel()
- x1 = rand(3)
- x2 = rand(3)
- k(x1, x2)
+k = SqExponentialKernel()
+x1 = rand(3)
+x2 = rand(3)
+k(x1, x2)
```
-## Creating a kernel matrix
+## Creating a Kernel Matrix
Kernel matrices can be created via the `kernelmatrix` function or `kernelmatrix_diag` for only the diagonal.
-An important argument to give is the data layout of the input `obsdim`. It specifies whether the number of observed data points is along the first dimension (`obsdim=1`, i.e. the matrix shape is number of samples times number of features) or along the second dimension (`obsdim=2`, i.e. the matrix shape is number of features times number of samples), similarly to [Distances.jl](https://github.com/JuliaStats/Distances.jl). If not given explicitly, `obsdim` defaults to `2`.
-For example:
+For example, for a collection of 10 `Real`-valued inputs:
```julia
- k = SqExponentialKernel()
- A = rand(10, 5)
- kernelmatrix(k, A, obsdim=1) # returns a 10x10 matrix
- kernelmatrix(k, A, obsdim=2) # returns a 5x5 matrix
- k(A, obsdim=1) # Syntactic sugar
+k = SqExponentialKernel()
+x = rand(10)
+kernelmatrix(k, x) # 10x10 matrix
+```
+If your inputs are multi-dimensional, it is common to represent them as a matrix.
+For example:
+```julia
+X = rand(10, 5)
+```
+However, it is ambiguous whether this represents a collection of 10 5-dimensional row-vectors, or 5 10-dimensional column-vectors.
+Therefore, we require users to provide some more information.
+
+You can write `RowVecs(X)` to declare that `X` contains 10 5-dimensional row-vectors, or `ColVecs(X)` to declare that `X` contains 5 10-dimensional column-vectors:
+```julia
+kernelmatrix(k, RowVecs(X)) # returns a 10×10 matrix -- each row of X treated as input
+kernelmatrix(k, ColVecs(X)) # returns a 5×5 matrix -- each column of X treated as input
```
+This is the mechanism used throughout KernelFunctions.jl to handle multi-dimensional inputs.
+
+You can also utilise the `obsdim` keyword argument if you prefer:
+```julia
+kernelmatrix(k, X; obsdim=1) # same as RowVecs(X)
+kernelmatrix(k, X; obsdim=2) # same as ColVecs(X)
+```
+This is similar to the convention used in [Distances.jl](https://github.com/JuliaStats/Distances.jl).
+
+See [Input Types](@ref) for a more thorough discussion of these two approaches.
+
+
We also support specific kernel matrix outputs:
-- For a positive-definite matrix object`PDMat` from [`PDMats.jl`](https://github.com/JuliaStats/PDMats.jl), you can call the following:
+- For a positive-definite matrix object of type `PDMat` from [`PDMats.jl`](https://github.com/JuliaStats/PDMats.jl), you can call the following:
```julia
- using PDMats
- k = SqExponentialKernel()
- K = kernelpdmat(k, A, obsdim=1) # PDMat
+using PDMats
+k = SqExponentialKernel()
+K = kernelpdmat(k, RowVecs(X)) # PDMat
+K = kernelpdmat(k, X; obsdim=1) # PDMat
```
It will create a matrix and in case of bad conditioning will add some diagonal noise until the matrix is considered positive-definite; it will then return a `PDMat` object. For this method to work in your code you need to include `using PDMats` first.
- For a Kronecker matrix, we rely on [`Kronecker.jl`](https://github.com/MichielStock/Kronecker.jl). Here are two examples:
```julia
using Kronecker
-x = range(0, 1, length=10)
-y = range(0, 1, length=50)
+x = range(0, 1; length=10)
+y = range(0, 1; length=50)
K = kernelkronmat(k, [x, y]) # Kronecker matrix
K = kernelkronmat(k, x, 5) # Kronecker matrix
```
Make sure that `k` is a kernel compatible with such constructions (with `iskroncompatible(k)`). Both methods will return a Kronecker matrix. For those methods to work in your code you need to include `using Kronecker` first.
- For a Nystrom approximation: `kernelmatrix(nystrom(k, X, ρ, obsdim=1))` where `ρ` is the fraction of data samples used in the approximation.
-## Composite kernels
+## Composite Kernels
Sums and products of kernels are also valid kernels. They can be created via `KernelSum` and `KernelProduct` or using simple operators `+` and `*`.
For example:
```julia
- k1 = SqExponentialKernel()
- k2 = Matern32Kernel()
- k = 0.5 * k1 + 0.2 * k2 # KernelSum
- k = k1 * k2 # KernelProduct
+k1 = SqExponentialKernel()
+k2 = Matern32Kernel()
+k = 0.5 * k1 + 0.2 * k2 # KernelSum
+k = k1 * k2 # KernelProduct
```
-## Kernel parameters
+## Kernel Parameters
What if you want to differentiate through the kernel parameters? This is easy even in a highly nested structure such as:
```julia
- k = transform(
- 0.5 * SqExponentialKernel() * Matern12Kernel()
- + 0.2 * (transform(LinearKernel(), 2.0) + PolynomialKernel()),
- [0.1, 0.5])
+k = (
+ 0.5 * SqExponentialKernel() * Matern12Kernel() +
+ 0.2 * (LinearKernel() ∘ ScaleTransform(2.0) + PolynomialKernel())
+) ∘ ARDTransform([0.1, 0.5])
```
One can access the named tuple of trainable parameters via `Functors.functor` from `Functors.jl`.
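+For example, a small sketch (kernel choice illustrative):
+```julia
+using Functors
+k = (2.0 * SqExponentialKernel()) ∘ ScaleTransform(0.5)
+params, re = Functors.functor(k)   # named tuple of parameters and a reconstructor
+```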
This means that in practice you can implicitly optimize the kernel parameters by calling:
diff --git a/src/matrix/kernelmatrix.jl b/src/matrix/kernelmatrix.jl
index 6824db66a..da42bec37 100644
--- a/src/matrix/kernelmatrix.jl
+++ b/src/matrix/kernelmatrix.jl
@@ -1,41 +1,84 @@
"""
- kernelmatrix!(K::AbstractMatrix, κ::Kernel, X; obsdim::Integer = 2)
- kernelmatrix!(K::AbstractMatrix, κ::Kernel, X, Y; obsdim::Integer = 2)
+ kernelmatrix!(K::AbstractMatrix, κ::Kernel, x::AbstractVector)
+ kernelmatrix!(K::AbstractMatrix, κ::Kernel, x::AbstractVector, y::AbstractVector)
In-place version of [`kernelmatrix`](@ref) where pre-allocated matrix `K` will be
overwritten with the kernel matrix.
+
+ kernelmatrix!(K::AbstractMatrix, κ::Kernel, X::AbstractMatrix; obsdim::Integer=2)
+ kernelmatrix!(
+ K::AbstractMatrix,
+ κ::Kernel,
+ X::AbstractMatrix,
+ Y::AbstractMatrix;
+ obsdim::Integer=2,
+ )
+
+Equivalent to `kernelmatrix!(K, κ, ColVecs(X))` and
+`kernelmatrix!(K, κ, ColVecs(X), ColVecs(Y))` respectively.
+Set `obsdim=1` to get `RowVecs`.
+
+See also: [`ColVecs`](@ref), [`RowVecs`](@ref)
"""
kernelmatrix!
"""
- kernelmatrix(κ::Kernel, X; obsdim::Int = 2)
- kernelmatrix(κ::Kernel, X, Y; obsdim::Int = 2)
+ kernelmatrix(κ::Kernel, x::AbstractVector)
+ kernelmatrix(κ::Kernel, x::AbstractVector, y::AbstractVector)
+
+Calculate the kernel matrix of `x` (and `y`) with respect to kernel `κ`.
+
+ kernelmatrix(κ::Kernel, X::AbstractMatrix; obsdim::Int=2)
+ kernelmatrix(κ::Kernel, X::AbstractMatrix, Y::AbstractMatrix; obsdim::Int=2)
-Calculate the kernel matrix of `X` (and `Y`) with respect to kernel `κ`.
-`obsdim = 1` means the matrix `X` (and `Y`) has size #samples x #dimension
-`obsdim = 2` means the matrix `X` (and `Y`) has size #dimension x #samples
+Equivalent to `kernelmatrix(κ, ColVecs(X))` and `kernelmatrix(κ, ColVecs(X), ColVecs(Y))`
+respectively.
+Set `obsdim=1` to get `RowVecs`.
+
+See also: [`ColVecs`](@ref), [`RowVecs`](@ref)
"""
kernelmatrix
"""
- kernelmatrix_diag!(K::AbstractVector, κ::Kernel, X; obsdim::Int = 2)
- kernelmatrix_diag!(K::AbstractVector, κ::Kernel, X, Y; obsdim::Int = 2)
+ kernelmatrix_diag!(K::AbstractVector, κ::Kernel, x::AbstractVector)
+ kernelmatrix_diag!(K::AbstractVector, κ::Kernel, x::AbstractVector, y::AbstractVector)
+
+In-place version of [`kernelmatrix_diag`](@ref).
+
+ kernelmatrix_diag!(K::AbstractVector, κ::Kernel, X::AbstractMatrix; obsdim::Int=2)
+ kernelmatrix_diag!(
+ K::AbstractVector,
+ κ::Kernel,
+ X::AbstractMatrix,
+ Y::AbstractMatrix;
+ obsdim::Int=2,
+ )
+
+Equivalent to `kernelmatrix_diag!(K, κ, ColVecs(X))` and
+`kernelmatrix_diag!(K, κ, ColVecs(X), ColVecs(Y))` respectively.
+Set `obsdim=1` to get `RowVecs`.
-In place version of [`kernelmatrix_diag`](@ref)
+See also: [`ColVecs`](@ref), [`RowVecs`](@ref)
"""
kernelmatrix_diag!
"""
- kernelmatrix_diag(κ::Kernel, X; obsdim::Int = 2)
+ kernelmatrix_diag(κ::Kernel, x::AbstractVector)
+
+Calculate the diagonal of `kernelmatrix(κ, x)` efficiently.
+
+ kernelmatrix_diag(κ::Kernel, x::AbstractVector, y::AbstractVector)
+
+Calculate the diagonal of `kernelmatrix(κ, x, y)` efficiently.
+Requires that `x` and `y` are the same length.
-Calculate the diagonal matrix of `X` with respect to kernel `κ`
-`obsdim = 1` means the matrix `X` has size #samples x #dimension
-`obsdim = 2` means the matrix `X` has size #dimension x #samples
+ kernelmatrix_diag(κ::Kernel, X::AbstractMatrix; obsdim::Int=2)
+ kernelmatrix_diag(κ::Kernel, X::AbstractMatrix, Y::AbstractMatrix; obsdim::Int=2)
- kernelmatrix_diag(κ::Kernel, X, Y; obsdim::Int = 2)
+Equivalent to `kernelmatrix_diag(κ, ColVecs(X))` and
+`kernelmatrix_diag(κ, ColVecs(X), ColVecs(Y))` respectively.
-Calculate the diagonal of `kernelmatrix(κ, X, Y; obsdim)` efficiently. Requires that `X` and
-`Y` are the same length.
+See also: [`ColVecs`](@ref), [`RowVecs`](@ref)
"""
kernelmatrix_diag
diff --git a/src/mokernels/moinput.jl b/src/mokernels/moinput.jl
index e1de49d06..0c16e5a36 100644
--- a/src/mokernels/moinput.jl
+++ b/src/mokernels/moinput.jl
@@ -2,6 +2,27 @@
MOInput(x::AbstractVector, out_dim::Integer)
A data type to accommodate modelling multi-dimensional output data.
+
+`MOInput(x, out_dim)` has length `length(x) * out_dim`.
+
+```jldoctest
+julia> x = [1, 2, 3];
+
+julia> MOInput(x, 2)
+6-element MOInput{Vector{Int64}}:
+ (1, 1)
+ (2, 1)
+ (3, 1)
+ (1, 2)
+ (2, 2)
+ (3, 2)
+```
+
+As shown above, an `MOInput` represents a vector of tuples.
+The first `length(x)` elements represent the inputs for the first output, the second
+`length(x)` elements represent the inputs for the second output, etc.
+
+See [Inputs for Multiple Outputs](@ref) in the docs for more info.
"""
struct MOInput{T<:AbstractVector} <: AbstractVector{Tuple{Any,Int}}
x::T
diff --git a/src/utils.jl b/src/utils.jl
index 7e28c82ef..33a085c56 100644
--- a/src/utils.jl
+++ b/src/utils.jl
@@ -33,8 +33,34 @@ end
"""
ColVecs(X::AbstractMatrix)
-A lightweight wrapper for an `AbstractMatrix` to make it behave like a vector of vectors.
-Each vector represents a column of the matrix
+A lightweight wrapper for an `AbstractMatrix` which interprets it as a vector-of-vectors, in
+which each _column_ of `X` represents a single vector.
+
+That is, by writing `x = ColVecs(X)`, you are saying "`x` is a vector-of-vectors, each of
+which has length `size(X, 1)`. The total number of vectors is `size(X, 2)`."
+
+Phrased differently, `ColVecs(X)` says that `X` should be interpreted as a vector
+of horizontally-concatenated column-vectors, hence the name `ColVecs`.
+
+```jldoctest
+julia> X = randn(2, 5);
+
+julia> x = ColVecs(X);
+
+julia> length(x) == 5
+true
+
+julia> X[:, 3] == x[3]
+true
+```
+
+`ColVecs` is related to [`RowVecs`](@ref) via transposition:
+```jldoctest
+julia> X = randn(2, 5);
+
+julia> ColVecs(X) == RowVecs(X')
+true
+```
"""
struct ColVecs{T,TX<:AbstractMatrix{T},S} <: AbstractVector{S}
X::TX
@@ -70,8 +96,37 @@ end
"""
RowVecs(X::AbstractMatrix)
-A lightweight wrapper for an `AbstractMatrix` to make it behave like a vector of vectors.
-Each vector represents a row of the matrix
+A lightweight wrapper for an `AbstractMatrix` which interprets it as a vector-of-vectors, in
+which each _row_ of `X` represents a single vector.
+
+That is, by writing `x = RowVecs(X)`, you are saying "`x` is a vector-of-vectors, each of
+which has length `size(X, 2)`. The total number of vectors is `size(X, 1)`."
+
+Phrased differently, `RowVecs(X)` says that `X` should be interpreted as a vector
+of vertically-concatenated row-vectors, hence the name `RowVecs`.
+
+Internally, the data continues to be represented as an `AbstractMatrix`, so using this type
+does not introduce any kind of performance penalty.
+
+```jldoctest
+julia> X = randn(5, 2);
+
+julia> x = RowVecs(X);
+
+julia> length(x) == 5
+true
+
+julia> X[3, :] == x[3]
+true
+```
+
+`RowVecs` is related to [`ColVecs`](@ref) via transposition:
+```jldoctest
+julia> X = randn(5, 2);
+
+julia> RowVecs(X) == ColVecs(X')
+true
+```
"""
struct RowVecs{T,TX<:AbstractMatrix{T},S} <: AbstractVector{S}
X::TX
diff --git a/test/runtests.jl b/test/runtests.jl
index 6dec94201..43c8e24ac 100644
--- a/test/runtests.jl
+++ b/test/runtests.jl
@@ -162,6 +162,13 @@ include("test_utils.jl")
end;
recursive=true,
)
- doctest(KernelFunctions)
+ doctest(
+ KernelFunctions;
+ doctestfilters=[
+ r"{([a-zA-Z0-9]+,\s?)+[a-zA-Z0-9]+}",
+ r"(Array{[a-zA-Z0-9]+,\s?1}|Vector{[a-zA-Z0-9]+})",
+ r"(Array{[a-zA-Z0-9]+,\s?2}|Matrix{[a-zA-Z0-9]+})",
+ ],
+ )
end
end