diff --git a/docs/source/learn/core_notebooks/Gaussian_Processes.rst b/docs/source/learn/core_notebooks/Gaussian_Processes.rst
new file mode 100644
index 0000000000..189f6a6a24
--- /dev/null
+++ b/docs/source/learn/core_notebooks/Gaussian_Processes.rst
@@ -0,0 +1,265 @@
+.. _gp_guide:
+
+******************
+Gaussian Processes
+******************
+
+GP Basics
+=========
+
+Sometimes an unknown parameter or variable in a model is not a scalar value or
+a fixed-length vector, but a *function*. A Gaussian process (GP) can be used
+as a prior probability distribution whose support is over the space of
+continuous functions. A GP prior on the function :math:`f(x)` is usually written,
+
+.. math::
+
+ f(x) \sim \mathcal{GP}(m(x), \, k(x, x')) \,.
+
+The function values are modeled as a draw from a multivariate normal
+distribution that is parameterized by the mean function, :math:`m(x)`, and the
+covariance function, :math:`k(x, x')`. Gaussian processes are a convenient
+choice as priors over functions due to the marginalization and conditioning
+properties of the multivariate normal distribution. Usually, the marginal
+distribution over :math:`f(x)` is evaluated during the inference step. The
+conditional distribution is then used for predicting the function values
+:math:`f(x_*)` at new points, :math:`x_*`.
+
+The joint distribution of :math:`f(x)` and :math:`f(x_*)` is multivariate
+normal,
+
+.. math::
+
+ \begin{bmatrix} f(x) \\ f(x_*) \\ \end{bmatrix} \sim
+ \text{N}\left(
+ \begin{bmatrix} m(x) \\ m(x_*) \\ \end{bmatrix} \,,
+      \begin{bmatrix} k(x, x')  & k(x, x_*)    \\
+                      k(x_*, x) & k(x_*, x_*') \\ \end{bmatrix}
+ \right) \,.
+
+Starting from the joint distribution, one obtains the marginal distribution
+of :math:`f(x)` as :math:`\text{N}(m(x),\, k(x, x'))`. The conditional
+distribution is
+
+.. math::
+
+   f(x_*) \mid f(x) \sim \text{N}\left( k(x_*, x) k(x, x)^{-1} [f(x) - m(x)] + m(x_*) ,\,
+      k(x_*, x_*) - k(x_*, x) k(x, x)^{-1} k(x, x_*) \right) \,.
+
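+As a quick numerical illustration of this conditioning formula, the sketch below
+computes the conditional mean and covariance with NumPy, assuming a zero mean
+function and an exponentiated quadratic kernel; the inputs, observed values, and
+lengthscale are made-up::
+
+    import numpy as np
+
+    def k(A, B, ls=1.0):
+        # exponentiated quadratic kernel between two 1-D input arrays
+        d = A[:, None] - B[None, :]
+        return np.exp(-0.5 * (d / ls) ** 2)
+
+    x = np.array([0.0, 1.0, 2.0])       # observed inputs
+    f = np.array([0.5, -0.3, 1.2])      # observed values of f(x)
+    x_star = np.array([0.5, 1.5])       # prediction points
+
+    K = k(x, x)                         # k(x, x)
+    K_s = k(x_star, x)                  # k(x_*, x)
+    K_ss = k(x_star, x_star)            # k(x_*, x_*)
+
+    # conditional mean and covariance, with m(x) = m(x_*) = 0
+    mu_star = K_s @ np.linalg.solve(K, f)
+    cov_star = K_ss - K_s @ np.linalg.solve(K, K_s.T)
+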
+.. note::
+
+   For more information on GPs, check out the book `Gaussian Processes for
+   Machine Learning <http://www.gaussianprocess.org/gpml/>`_ by Rasmussen &
+   Williams, or the introduction to Gaussian processes by D. MacKay.
+
+PyMC is a great environment for working with fully Bayesian Gaussian Process
+models. GPs in PyMC have a clear syntax and are highly composable; many
+predefined covariance functions (or kernels) and mean functions are included,
+along with several GP implementations. GPs are treated as distributions that
+can be used within larger or hierarchical models, not just as standalone
+regression models.
+
+Mean and covariance functions
+=============================
+
+Those who have used the GPy or GPflow Python packages will find the syntax for
+constructing mean and covariance functions somewhat familiar. When first
+instantiated, the mean and covariance functions are parameterized, but not
+given their inputs yet. The covariance functions must additionally be provided
+with the number of dimensions of the input matrix, and a list that indexes
+which of those dimensions they are to operate on. The reason for this design
+is so that covariance functions can be constructed that are combinations of
+other covariance functions.
+
+For example, to construct an exponentiated quadratic covariance function that
+operates on the second and third columns of a three-column matrix representing
+three predictor variables::
+
+ ls = [2, 5] # the lengthscales
+ cov_func = pm.gp.cov.ExpQuad(input_dim=3, ls=ls, active_dims=[1, 2])
+
+Here the :code:`ls`, or lengthscale, parameter is two dimensional, allowing the
+second and third dimensions to have different lengthscales. The reason we have to
+specify :code:`input_dim`, the total number of columns of :code:`X`, and
+:code:`active_dims`, which of those columns or dimensions the covariance
+function will act on, is because :code:`cov_func` hasn't actually seen the
+input data yet. The :code:`active_dims` argument is optional, and defaults to
+all columns of the matrix of inputs.
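+
+Because each covariance function knows both the total number of input columns
+and which of them it acts on, kernels defined over different subsets of the
+columns can be combined into a single covariance function over the full input.
+A small sketch (the :code:`Matern52` kernel and its lengthscale here are
+illustrative choices, not part of the example above)::
+
+    # acts only on the first column
+    cov_a = pm.gp.cov.Matern52(input_dim=3, ls=1.0, active_dims=[0])
+
+    # acts on the second and third columns, as in the example above
+    cov_b = pm.gp.cov.ExpQuad(input_dim=3, ls=[2, 5], active_dims=[1, 2])
+
+    # a covariance function over the full three-column input
+    cov_func = cov_a * cov_b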
+
+Covariance functions in PyMC closely follow the algebraic rules for kernels,
+which allows users to combine covariance functions into new ones, for example:
+
+- The sum of two covariance functions is also a covariance function::
+
+    cov_func = pm.gp.cov.ExpQuad(...) + pm.gp.cov.ExpQuad(...)
+
+- The product of two covariance functions is also a covariance function::
+
+    cov_func = pm.gp.cov.ExpQuad(...) * pm.gp.cov.Periodic(...)
+
+- The product (or sum) of a covariance function with a scalar is a
+  covariance function::
+
+    cov_func = eta**2 * pm.gp.cov.Matern32(...)
+
+Once defined, a covariance function is evaluated by calling
+:code:`cov_func(x, x)` (and a mean function by calling :code:`mean_func(x)`).
+Since PyMC is built on top of PyTensor, it is relatively easy to define and
+experiment with non-standard covariance and mean functions. For more
+information, check out the :ref:`tutorial ` on covariance functions.
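+
+For instance, the following sketch builds a scaled covariance function and a
+constant mean function and evaluates both on some made-up inputs; calling
+:code:`.eval()` on the resulting PyTensor expressions materializes them as
+NumPy arrays (the kernel choice and all numerical values are illustrative)::
+
+    import numpy as np
+    import pymc as pm
+
+    X = np.linspace(0, 10, 5)[:, None]
+
+    eta = 2.0
+    cov_func = eta**2 * pm.gp.cov.ExpQuad(input_dim=1, ls=1.5)
+    mean_func = pm.gp.mean.Constant(0.5)
+
+    K = cov_func(X).eval()       # a 5 x 5 covariance matrix
+    m = mean_func(X).eval()      # a length-5 mean vector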
+
+
+GP Implementations
+==================
+
+PyMC includes several GP implementations, including marginal and latent
+variable models as well as some fast approximations. Their usage all follows a
+similar pattern: first, a GP is instantiated with a mean function and a
+covariance function. Then, GP objects can be added together, allowing for
+function characteristics to be carefully modeled and separated. Finally, one
+of the `prior`, `marginal_likelihood`, or `conditional` methods is called on
+the GP object to actually construct the PyMC random variable that represents
+the function prior.
+
+Using :code:`gp.Latent` for the example, the syntax to first specify the GP
+is::
+
+ gp = pm.gp.Latent(mean_func, cov_func)
+
+The first argument is the mean function and the second is the covariance
+function. We've made the GP object, but we haven't made clear which function
+it is to be a prior for, what the inputs are, or what parameters it will be
+conditioned on.
+
+.. note::
+
+ The :code:`gp.Marginal` class and similar don't have a :code:`prior` method.
+ Instead they have a :code:`marginal_likelihood` method that is used similarly,
+   but has additional required arguments, such as the observed data or the
+   noise level, depending on the implementation. See the notebooks for examples.
+ The :code:`conditional` method works similarly.
+
+Calling the `prior` method will create a PyMC random variable that represents
+the latent function :math:`f(x) = \mathbf{f}`::
+
+ f = gp.prior("f", X)
+
+:code:`f` is a random variable that can be used within a PyMC model like any
+other type of random variable. The first argument is the name of the random
+variable representing the function we are placing the prior over.
+The second argument is the inputs to the function that the prior is over,
+:code:`X`. The inputs are usually known and present in the data, but they can
+also be PyMC random variables. If the inputs are a PyTensor tensor or a
+PyMC random variable, the :code:`shape` needs to be given.
+
+Usually at this point, inference is performed on the model. The
+:code:`conditional` method creates the conditional, or predictive,
+distribution over the latent function at arbitrary :math:`x_*` input points,
+:math:`f(x_*)`. To construct the conditional distribution we write::
+
+ f_star = gp.conditional("f_star", X_star)
+
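+To make this pattern concrete, here is a minimal sketch of the full
+:code:`gp.Latent` workflow on some made-up count data, placing a Poisson
+likelihood over the latent function. The data, priors, and hyperparameter
+choices are all illustrative::
+
+    import numpy as np
+    import pymc as pm
+
+    # made-up one-dimensional inputs and count observations
+    X = np.linspace(0, 10, 50)[:, None]
+    y = np.random.poisson(lam=3.0, size=50)
+
+    with pm.Model() as model:
+        ell = pm.Gamma("ell", alpha=2, beta=1)
+        eta = pm.HalfNormal("eta", sigma=2)
+        cov_func = eta**2 * pm.gp.cov.ExpQuad(input_dim=1, ls=ell)
+
+        gp = pm.gp.Latent(cov_func=cov_func)
+        f = gp.prior("f", X=X)
+
+        # any likelihood can be placed over the latent function
+        pm.Poisson("obs", mu=pm.math.exp(f), observed=y)
+
+        idata = pm.sample()
+
+    # predictive distribution of the latent function at new inputs
+    X_star = np.linspace(10, 15, 25)[:, None]
+    with model:
+        f_star = gp.conditional("f_star", X_star)
+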
+.. _additive_gp:
+
+Additive GPs
+============
+
+The GP implementation in PyMC is constructed so that it is easy to define
+additive GPs and sample from individual GP components. We can write::
+
+ gp1 = pm.gp.Marginal(mean_func1, cov_func1)
+ gp2 = pm.gp.Marginal(mean_func2, cov_func2)
+ gp3 = gp1 + gp2
+
+The GP objects have to be of the same type; a :code:`gp.Marginal` cannot
+be added to a :code:`gp.Latent`.
+
+Consider two independent GP distributed functions, :math:`f_1(x) \sim
+\mathcal{GP}\left(m_1(x),\, k_1(x, x')\right)` and :math:`f_2(x) \sim
+\mathcal{GP}\left( m_2(x),\, k_2(x, x')\right)`. The joint distribution of
+:math:`f_1,\, f_1^*,\, f_2,\, f_2^*,\, f_1 + f_2` and :math:`f_1^* + f_2^*` is
+
+.. math::
+
+ \begin{bmatrix} f_1 \\ f_1^* \\ f_2 \\ f_2^*
+ \\ f_1 + f_2 \\ f_1^* + f_2^* \end{bmatrix} \sim
+ \text{N}\left(
+ \begin{bmatrix} m_1 \\ m_1^* \\ m_2 \\ m_2^* \\
+ m_1 + m_2 \\ m_1^* + m_2^* \\ \end{bmatrix} \,,\,
+ \begin{bmatrix}
+ K_1 & K_1^* & 0 & 0 & K_1 & K_1^* \\
+      K_1^{*^T} & K_1^{**} & 0 & 0 & K_1^{*^T} & K_1^{**} \\
+ 0 & 0 & K_2 & K_2^* & K_2 & K_2^{*} \\
+      0 & 0 & K_2^{*^T} & K_2^{**} & K_2^{*^T} & K_2^{**} \\
+ K_1 & K_1^{*} & K_2 & K_2^{*} & K_1 + K_2 & K_1^{*} + K_2^{*} \\
+ K_1^{*^T} & K_1^{**} & K_2^{*^T} & K_2^{**} & K_1^{*^T}+K_2^{*^T} & K_1^{**}+K_2^{**}
+ \end{bmatrix}
+ \right) \,.
+
+Using the joint distribution, the conditional distribution of :math:`f_1^*`
+given the sum :math:`f_1 + f_2` is
+
+.. math::
+ f_1^* \mid f_1 + f_2 \sim \text{N}\left(
+ m_1^* + K_1^{*^T}(K_1 + K_2)^{-1}\left[f_1 + f_2 - m_1 - m_2\right] \,,\,
+ K_1^{**} - K_1^{*^T}(K_1 + K_2)^{-1}K_1^* \right) \,.
+
+
+These equations show how to break down GP models into individual components to
+see how each contributes to the data. For more information, check out David
+Duvenaud's PhD thesis.
+
+The GP objects in PyMC keep track of these marginals automatically. The
+following code sketch shows how to define the conditional distribution of
+:math:`f_2^*`. We use :code:`gp.Marginal` in the example, but the same works
+for other implementations. The first block fits the GP prior. We denote
+:math:`f_1 + f_2` as just :math:`f` for brevity::
+
+ with pm.Model() as model:
+ gp1 = pm.gp.Marginal(mean_func1, cov_func1)
+ gp2 = pm.gp.Marginal(mean_func2, cov_func2)
+
+ # gp represents f1 + f2.
+ gp = gp1 + gp2
+
+ f = gp.marginal_likelihood("f", X, y, sigma)
+
+ idata = pm.sample(1000)
+
+
+To construct the conditional distribution of :code:`gp1` or :code:`gp2`, we
+also need to include the additional arguments, :code:`X`, :code:`y`, and
+:code:`sigma`::
+
+ with model:
+ # conditional distributions of f1 and f2
+ f1_star = gp1.conditional("f1_star", X_star,
+ given={"X": X, "y": y, "sigma": sigma, "gp": gp})
+ f2_star = gp2.conditional("f2_star", X_star,
+ given={"X": X, "y": y, "sigma": sigma, "gp": gp})
+
+ # conditional of f1 + f2, `given` not required
+ f_star = gp.conditional("f_star", X_star)
+
+This second block produces the conditional distributions. Notice that extra
+arguments are required for conditionals of :math:`f_1` and :math:`f_2`, but not
+:math:`f`. This is because those arguments are cached when
+:code:`.marginal_likelihood` is called on :code:`gp`.
+
+.. note::
+   When constructing conditionals, the additional arguments :code:`X`, :code:`y`,
+   :code:`sigma` and :code:`gp` must be provided in a dict called :code:`given`.
+
+Since the :code:`marginal_likelihood` method of :code:`gp1` and :code:`gp2` was not called,
+their conditionals need to be provided with the required inputs. In the same
+fashion as the prior, :code:`f_star`, :code:`f1_star` and :code:`f2_star` are random
+variables that can now be used like any other random variable in PyMC.
+
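+A minimal sketch of the final step, assuming the model above: posterior samples of
+the component conditionals can be drawn with :code:`pm.sample_posterior_predictive`::
+
+    with model:
+        # draws of f1, f2 and f1 + f2 at the new inputs X_star
+        samples = pm.sample_posterior_predictive(
+            idata, var_names=["f1_star", "f2_star", "f_star"]
+        )
+
+Each draw decomposes the fitted function at :code:`X_star` into the
+contributions of :code:`gp1` and :code:`gp2`.
+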
+Check the :ref:`notebooks `
+for detailed demonstrations of the usage of GP functionality in PyMC.
diff --git a/docs/source/learn/core_notebooks/index.md b/docs/source/learn/core_notebooks/index.md
index 33e10686ea..af7ff8b246 100644
--- a/docs/source/learn/core_notebooks/index.md
+++ b/docs/source/learn/core_notebooks/index.md
@@ -10,4 +10,5 @@ model_comparison
posterior_predictive
dimensionality
pymc_pytensor
+Gaussian_Processes
:::