diff --git a/docs/source/learn/core_notebooks/Gaussian_Processes.rst b/docs/source/learn/core_notebooks/Gaussian_Processes.rst
new file mode 100644
index 0000000000..189f6a6a24
--- /dev/null
+++ b/docs/source/learn/core_notebooks/Gaussian_Processes.rst
@@ -0,0 +1,265 @@

.. _gp_guide:

******************
Gaussian Processes
******************

GP Basics
=========

Sometimes an unknown parameter or variable in a model is not a scalar value or
a fixed-length vector, but a *function*. A Gaussian process (GP) can be used
as a prior probability distribution whose support is over the space of
continuous functions. A GP prior on the function :math:`f(x)` is usually written,

.. math::

    f(x) \sim \mathcal{GP}(m(x), \, k(x, x')) \,.

The function values are modeled as a draw from a multivariate normal
distribution that is parameterized by the mean function, :math:`m(x)`, and the
covariance function, :math:`k(x, x')`. Gaussian processes are a convenient
choice as priors over functions due to the marginalization and conditioning
properties of the multivariate normal distribution. Usually, the marginal
distribution over :math:`f(x)` is evaluated during the inference step. The
conditional distribution is then used for predicting the function values
:math:`f(x_*)` at new points, :math:`x_*`.

The joint distribution of :math:`f(x)` and :math:`f(x_*)` is multivariate
normal,

.. math::

    \begin{bmatrix} f(x) \\ f(x_*) \\ \end{bmatrix} \sim
    \text{N}\left(
      \begin{bmatrix} m(x) \\ m(x_*) \\ \end{bmatrix} \,,
      \begin{bmatrix} k(x, x')  & k(x, x_*)    \\
                      k(x_*, x) & k(x_*, x_*') \\ \end{bmatrix}
    \right) \,.

Starting from the joint distribution, one obtains the marginal distribution
of :math:`f(x)` as :math:`\text{N}(m(x),\, k(x, x'))`. The conditional
distribution is

.. math::

    f(x_*) \mid f(x) \sim \text{N}\left( k(x_*, x) k(x, x)^{-1} [f(x) - m(x)] + m(x_*) ,\,
      k(x_*, x_*) - k(x_*, x) k(x, x)^{-1} k(x, x_*) \right) \,.
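To make the conditioning formula concrete, here is a minimal NumPy sketch (it is
not part of the PyMC API) that computes the conditional mean and covariance for
a zero mean function and an exponentiated quadratic kernel; the inputs, observed
function values, and lengthscale are made up purely for illustration::

    import numpy as np

    def k(a, b, ls=1.0):
        # exponentiated quadratic kernel between two sets of scalar inputs
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

    x = np.array([-1.0, 0.0, 1.0])    # observed input locations
    fx = np.array([0.2, -0.1, 0.4])   # observed function values, with m(x) = 0
    x_star = np.array([0.5, 2.0])     # new input locations

    K_inv = np.linalg.inv(k(x, x))    # k(x, x)^{-1}; use a Cholesky solve in practice
    mu_star = k(x_star, x) @ K_inv @ fx
    cov_star = k(x_star, x_star) - k(x_star, x) @ K_inv @ k(x, x_star)

PyMC performs this conditioning for you through the :code:`conditional` method
described below.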
.. note::

    For more information on GPs, check out the book `Gaussian Processes for
    Machine Learning `_ by Rasmussen & Williams, or `this introduction `_
    by D. Mackay.

PyMC is a great environment for working with fully Bayesian Gaussian Process
models. GPs in PyMC have a clear syntax and are highly composable; many
predefined covariance functions (or kernels) and mean functions are included,
along with several GP implementations. GPs are treated as distributions that
can be used within larger or hierarchical models, not just as standalone
regression models.

Mean and covariance functions
=============================

Those who have used the GPy or GPflow Python packages will find the syntax for
constructing mean and covariance functions somewhat familiar. When first
instantiated, the mean and covariance functions are parameterized, but not
given their inputs yet. The covariance functions must additionally be provided
with the number of dimensions of the input matrix, and a list that indexes
which of those dimensions they are to operate on. The reason for this design
is so that covariance functions can be constructed that are combinations of
other covariance functions.

For example, to construct an exponentiated quadratic covariance function that
operates on the second and third columns of a three-column matrix representing
three predictor variables::

    ls = [2, 5]  # the lengthscales
    cov_func = pm.gp.cov.ExpQuad(input_dim=3, ls=ls, active_dims=[1, 2])

Here the :code:`ls`, or lengthscale, parameter is two dimensional, allowing the
second and third dimensions to have different lengthscales. We have to specify
:code:`input_dim`, the total number of columns of :code:`X`, and
:code:`active_dims`, which of those columns or dimensions the covariance
function will act on, because :code:`cov_func` hasn't actually seen the input
data yet. The :code:`active_dims` argument is optional, and defaults to all
columns of the matrix of inputs.

Covariance functions in PyMC closely follow the algebraic rules for kernels,
which allows users to combine covariance functions into new ones, for example:

- The sum of two covariance functions is also a covariance function::

      cov_func = pm.gp.cov.ExpQuad(...) + pm.gp.cov.ExpQuad(...)

- The product of two covariance functions is also a covariance function::

      cov_func = pm.gp.cov.ExpQuad(...) * pm.gp.cov.Periodic(...)

- The product (or sum) of a covariance function with a scalar is a
  covariance function::

      cov_func = eta**2 * pm.gp.cov.Matern32(...)

Once a covariance function is defined, it is evaluated by calling
:code:`cov_func(x, x)` (or :code:`mean_func(x)` for a mean function).
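As a quick, self-contained illustration (the kernel choices and hyperparameter
values below are arbitrary), a combined covariance function can be evaluated on
a matrix of inputs and inspected as an ordinary NumPy array::

    import numpy as np
    import pymc as pm

    X = np.linspace(0, 10, 50)[:, None]  # 50 inputs with a single predictor column

    # a scaled Matern 3/2 kernel added to an exponentiated quadratic kernel
    cov_func = (
        2.0**2 * pm.gp.cov.Matern32(input_dim=1, ls=1.5)
        + pm.gp.cov.ExpQuad(input_dim=1, ls=5.0)
    )

    K = cov_func(X).eval()  # evaluate the symbolic covariance matrix
    print(K.shape)          # (50, 50)

Calling :code:`cov_func(X)` returns a PyTensor expression, so :code:`.eval()` is
used here to obtain concrete values.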
Since PyMC is built on top of PyTensor, it is relatively easy to define and
experiment with non-standard covariance and mean functions. For more
information check out the :ref:`tutorial ` on covariance functions.


GP Implementations
==================

PyMC includes several GP implementations, including marginal and latent
variable models as well as some fast approximations. They all follow a similar
usage pattern: First, a GP is instantiated with a mean function and a
covariance function. Then, GP objects can be added together, allowing for
function characteristics to be carefully modeled and separated. Finally, one
of the `prior`, `marginal_likelihood` or `conditional` methods is called on the
GP object to actually construct the PyMC random variable that represents the
function prior.

Using :code:`gp.Latent` as the example, the syntax to first specify the GP
is::

    gp = pm.gp.Latent(mean_func, cov_func)

The first argument is the mean function and the second is the covariance
function. We've made the GP object, but we haven't yet made clear which
function it is to be a prior for, what its inputs are, or what parameters it
will be conditioned on.

.. note::

    The :code:`gp.Marginal` class and similar don't have a :code:`prior` method.
    Instead they have a :code:`marginal_likelihood` method that is used
    similarly, but has additional required arguments, such as the observed data,
    the noise, or others, depending on the implementation. See the notebooks for
    examples. The :code:`conditional` method works similarly.

Calling the `prior` method will create a PyMC random variable that represents
the latent function :math:`f(x) = \mathbf{f}`::

    f = gp.prior("f", X)

:code:`f` is a random variable that can be used within a PyMC model like any
other type of random variable. The first argument is the name of the random
variable representing the function we are placing the prior over. The second
argument is the inputs to the function that the prior is over, :code:`X`. The
inputs are usually known and present in the data, but they can also be PyMC
random variables. If the inputs are a PyTensor tensor or a PyMC random
variable, the :code:`shape` needs to be given.

Usually at this point, inference is performed on the model. The
:code:`conditional` method creates the conditional, or predictive,
distribution over the latent function at arbitrary :math:`x_*` input points,
:math:`f(x_*)`. To construct the conditional distribution we write::

    f_star = gp.conditional("f_star", X_star)
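Putting these pieces together, a minimal :code:`gp.Latent` model might look like
the following sketch. The toy data, the hyperparameter priors, and the Gaussian
likelihood are placeholder choices made only to keep the example self-contained
(with Gaussian noise, :code:`gp.Marginal` would usually be the more efficient
choice)::

    import numpy as np
    import pymc as pm

    # toy data: X must be a column matrix, y a 1-d array
    rng = np.random.default_rng(0)
    X = np.linspace(0, 10, 30)[:, None]
    y = np.sin(X).ravel() + 0.3 * rng.normal(size=30)
    X_star = np.linspace(0, 12, 50)[:, None]

    with pm.Model() as model:
        ls = pm.Gamma("ls", alpha=2, beta=1)
        eta = pm.HalfNormal("eta", sigma=2)
        cov_func = eta**2 * pm.gp.cov.ExpQuad(input_dim=1, ls=ls)

        gp = pm.gp.Latent(cov_func=cov_func)
        f = gp.prior("f", X=X)

        sigma = pm.HalfNormal("sigma", sigma=1)
        pm.Normal("y_obs", mu=f, sigma=sigma, observed=y)

        idata = pm.sample(1000)

        # predictive distribution of the latent function at the new inputs
        f_star = gp.conditional("f_star", X_star)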
.. _additive_gp:

Additive GPs
============

The GP implementation in PyMC is constructed so that it is easy to define
additive GPs and sample from individual GP components. We can write::

    gp1 = pm.gp.Marginal(mean_func1, cov_func1)
    gp2 = pm.gp.Marginal(mean_func2, cov_func2)
    gp3 = gp1 + gp2

The GP objects have to be of the same type; :code:`gp.Marginal` cannot be
added to :code:`gp.Latent`.

Consider two independent, GP-distributed functions, :math:`f_1(x) \sim
\mathcal{GP}\left(m_1(x),\, k_1(x, x')\right)` and :math:`f_2(x) \sim
\mathcal{GP}\left( m_2(x),\, k_2(x, x')\right)`. The joint distribution of
:math:`f_1,\, f_1^*,\, f_2,\, f_2^*,\, f_1 + f_2` and :math:`f_1^* + f_2^*` is

.. math::

    \begin{bmatrix} f_1 \\ f_1^* \\ f_2 \\ f_2^*
                    \\ f_1 + f_2 \\ f_1^* + f_2^* \end{bmatrix} \sim
    \text{N}\left(
      \begin{bmatrix} m_1 \\ m_1^* \\ m_2 \\ m_2^* \\
                      m_1 + m_2 \\ m_1^* + m_2^* \\ \end{bmatrix} \,,\,
      \begin{bmatrix}
        K_1       & K_1^{*}  & 0         & 0        & K_1                   & K_1^{*}             \\
        K_1^{*^T} & K_1^{**} & 0         & 0        & K_1^{*^T}             & K_1^{**}            \\
        0         & 0        & K_2       & K_2^{*}  & K_2                   & K_2^{*}             \\
        0         & 0        & K_2^{*^T} & K_2^{**} & K_2^{*^T}             & K_2^{**}            \\
        K_1       & K_1^{*}  & K_2       & K_2^{*}  & K_1 + K_2             & K_1^{*} + K_2^{*}   \\
        K_1^{*^T} & K_1^{**} & K_2^{*^T} & K_2^{**} & K_1^{*^T} + K_2^{*^T} & K_1^{**} + K_2^{**}
      \end{bmatrix}
    \right) \,.

Using the joint distribution to obtain the conditional distribution of
:math:`f_1^*` given the sum :math:`f_1 + f_2`, we get

.. math::

    f_1^* \mid f_1 + f_2 \sim \text{N}\left(
      m_1^* + K_1^{*^T}(K_1 + K_2)^{-1}\left[f_1 + f_2 - m_1 - m_2\right] \,,\,
      K_1^{**} - K_1^{*^T}(K_1 + K_2)^{-1}K_1^* \right) \,.

These equations show how to break down GP models into individual components to
see how each contributes to the data. For more information, check out `David
Duvenaud's PhD thesis `_.

The GP objects in PyMC keep track of these marginals automatically. The
following code sketch shows how to define the conditional distributions of
:math:`f_1^*` and :math:`f_2^*`. We use `gp.Marginal` in the example, but the
same works for other implementations. The first block fits the GP prior. We
denote :math:`f_1 + f_2` as just :math:`f` for brevity::

    with pm.Model() as model:
        gp1 = pm.gp.Marginal(mean_func1, cov_func1)
        gp2 = pm.gp.Marginal(mean_func2, cov_func2)

        # gp represents f1 + f2.
        gp = gp1 + gp2

        f = gp.marginal_likelihood("f", X, y, sigma)

        idata = pm.sample(1000)

To construct the conditional distribution of :code:`gp1` or :code:`gp2`, we
also need to include the additional arguments, :code:`X`, :code:`y`, and
:code:`sigma`::

    with model:
        # conditional distributions of f1 and f2
        f1_star = gp1.conditional("f1_star", X_star,
                                  given={"X": X, "y": y, "sigma": sigma, "gp": gp})
        f2_star = gp2.conditional("f2_star", X_star,
                                  given={"X": X, "y": y, "sigma": sigma, "gp": gp})

        # conditional of f1 + f2, `given` not required
        f_star = gp.conditional("f_star", X_star)

This second block produces the conditional distributions. Notice that extra
arguments are required for the conditionals of :math:`f_1` and :math:`f_2`, but
not for :math:`f`. This is because those arguments are cached when
:code:`.marginal_likelihood` is called on :code:`gp`.

.. note::

    When constructing conditionals, the additional arguments :code:`X`,
    :code:`y`, :code:`sigma` and :code:`gp` must be provided as a dict called
    `given`!

Since the :code:`marginal_likelihood` method of :code:`gp1` and :code:`gp2` was
never called, their conditionals need to be provided with the required inputs.
In the same fashion as the prior, :code:`f_star`, :code:`f1_star` and
:code:`f2_star` are random variables that can now be used like any other random
variable in PyMC.

Check the :ref:`notebooks ` for detailed demonstrations of the usage of GP
functionality in PyMC.

diff --git a/docs/source/learn/core_notebooks/index.md b/docs/source/learn/core_notebooks/index.md
index 33e10686ea..af7ff8b246 100644
--- a/docs/source/learn/core_notebooks/index.md
+++ b/docs/source/learn/core_notebooks/index.md
@@ -10,4 +10,5 @@ model_comparison
 posterior_predictive
 dimensionality
 pymc_pytensor
+Gaussian_Processes
 :::