
GSoC 2022: Fast Gaussian process implementation in PyMC

Gaussian processes (GPs) are a very useful class of semi-parametric machine learning models. Before their use in modern classification and regression tasks, they were applied very successfully to searching for underground oil fields. GPs were called kriging models 1 back then, but the idea was the same.

GPs belong to a general class of models known as kernel methods 2. Kernel methods use something called a kernel function, denoted $k(\bold{x}, \bold{x}')$, where $\bold{x} \in \mathbb{R}^{d}$ represents an input data point and $k$ can be any function that returns a scalar. For example, the dot-product kernel $$ k(\bold{x}, \bold{x}') \coloneqq \bold{x}^T\bold{x}' $$
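
As a tiny illustration, here is what that kernel looks like in NumPy (the vectors are made up purely for the example):

```python
import numpy as np

# Two input vectors x, x' in R^d (illustrative values only)
x = np.array([1.0, 2.0, 3.0])
x_prime = np.array([0.5, -1.0, 2.0])

# Dot-product kernel: k(x, x') = x^T x'
k = x @ x_prime
print(k)  # a single scalar
```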

Assuming $N$ such vectors $\bold{x} \in \mathbb{R}^{d}$ are stacked, we can write the input data as $X \in \mathbb{R}^{N \times d}$, and correspondingly the kernel matrix is written as $K_{X,X} \in \mathbb{R}^{N \times N}$. Say we are interested in building a regression model where the target values are denoted $\bold{y} \in \mathbb{R}^{N}$; Gaussian process models are then trained by optimizing something called the log marginal likelihood 3.
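
Continuing the NumPy sketch, the full kernel (Gram) matrix for the dot-product kernel can be built in one vectorized step; the data below is random and purely illustrative:

```python
import numpy as np

N, d = 100, 3
rng = np.random.default_rng(0)

X = rng.normal(size=(N, d))   # N stacked input vectors
y = rng.normal(size=N)        # regression targets

# Dot-product kernel matrix K_{X,X}, with entry (i, j) = x_i^T x_j
K = X @ X.T
print(K.shape)  # (N, N)
```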

The log marginal likelihood is a function of the data $X, \bold{y}$ and the kernel hyperparameters $\theta$. Up to additive constants and an overall factor of $\frac{1}{2}$, it can be written as,

\begin{equation} \tag{1} L(\theta \mid X, \bold{y}) = -\log \left| K_{X,X}\right| - \bold{y}^{T}K_{X,X}^{-1}\bold{y} \end{equation}
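
As a minimal dense-linear-algebra sketch (not how PyMC evaluates it), the two terms in equation 1 can be computed directly; the small jitter added to the diagonal is an assumption made here to keep the kernel matrix positive definite:

```python
import numpy as np

def log_marginal_likelihood(K, y, jitter=1e-6):
    """Evaluate equation 1 (up to constants) with dense linear algebra."""
    Kj = K + jitter * np.eye(K.shape[0])   # jitter keeps K_{X,X} positive definite
    _, logdet = np.linalg.slogdet(Kj)      # log |K_{X,X}|
    alpha = np.linalg.solve(Kj, y)         # K_{X,X}^{-1} y
    return -logdet - y @ alpha             # equation 1, up to constants
```

With `K` and `y` from the earlier sketch, this returns a single scalar that any gradient-free optimizer could already work with.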

If we want to optimize the above function using gradient-based methods, we need to compute the gradient $ \frac{dL}{d\theta} $, which looks like,

\begin{equation} \tag{2} \frac{dL}{d\theta} = \bold{y}^{T} K_{X,X}^{-1} \frac{dK_{X,X}}{d\theta} K_{X,X}^{-1}\bold{y} - \text{Tr} \left( K_{X,X}^{-1} \frac{dK_{X,X}}{d\theta} \right) \end{equation}
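
To make equation 2 concrete, here is a sketch for a kernel with a single scale hyperparameter, $K_{X,X}(\theta) = \theta\,XX^{T} + \sigma^2 I$, so that $\frac{dK_{X,X}}{d\theta} = XX^{T}$; the kernel choice and the fixed noise term are assumptions made only for illustration:

```python
import numpy as np

def lml_gradient(X, y, theta, sigma2=1e-2):
    """dL/dtheta from equation 2 for K(theta) = theta * X X^T + sigma2 * I."""
    N = X.shape[0]
    XXt = X @ X.T
    K = theta * XXt + sigma2 * np.eye(N)
    dK = XXt                                   # dK/dtheta
    Kinv_y = np.linalg.solve(K, y)             # K^{-1} y
    quad = Kinv_y @ dK @ Kinv_y                # y^T K^{-1} (dK/dtheta) K^{-1} y
    trace = np.trace(np.linalg.solve(K, dK))   # Tr(K^{-1} dK/dtheta)
    return quad - trace
```

A quick finite-difference check against `log_marginal_likelihood` from the previous sketch is a handy way to confirm the signs.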

In equations 1 and 2 above, the most expensive computational steps are

  1. The log determinant: $ \log \left| K_{X,X}\right| $
  2. The linear solve against the kernel matrix: $ K_{X,X}^{-1}\bold{y} $
  3. The trace: $ \text{Tr} \left( K_{X,X}^{-1} \frac{dK_{X,X}}{d\theta} \right) $

In Gardner et al. (2018) 4, the authors propose algorithms that express each of the above three expensive computations in terms of large matrix-matrix operations, which can be sped up substantially when running on a GPU.
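
The key idea (implemented in GPyTorch) is to build everything out of matrix-vector and matrix-matrix products: a modified batched conjugate gradients routine handles the solve, while Lanczos and stochastic trace estimation handle the log determinant and trace terms. As a rough, unbatched sketch of just the solve step (plain conjugate gradients, assuming a symmetric positive definite kernel matrix):

```python
import numpy as np

def cg_solve(K, y, tol=1e-6, max_iter=1000):
    """Plain conjugate gradients for K^{-1} y; K is only touched via products."""
    x = np.zeros_like(y)
    r = y - K @ x            # residual
    p = r.copy()             # search direction
    rs = r @ r
    for _ in range(max_iter):
        Kp = K @ p           # the only interaction with K
        alpha = rs / (p @ Kp)
        x += alpha * p
        r -= alpha * Kp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

Because the loop only ever touches $K_{X,X}$ through products, the same structure maps naturally onto GPUs and onto kernel matrices with exploitable structure.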

For my GSoC project, I will subclass MarginalGP and override its _build_conditional() and _build_marginal_likelihood() methods as prescribed in Gardner et al. (2018) 4, which should significantly speed up Gaussian process inference in PyMC 😍
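
A very rough skeleton of what that subclass might look like is below; the class name is hypothetical, the base class here is PyMC's existing marginal GP implementation (pm.gp.Marginal), and the method bodies are placeholders rather than the actual design:

```python
import pymc as pm

class MarginalGPU(pm.gp.Marginal):  # hypothetical name for the new implementation
    """Marginal GP whose linear algebra goes through BBMM-style iterative routines."""

    def _build_marginal_likelihood(self, *args, **kwargs):
        # Swap the dense Cholesky-based log determinant and solve in
        # equation 1 for CG / Lanczos-based estimates (Gardner et al. 2018).
        raise NotImplementedError

    def _build_conditional(self, *args, **kwargs):
        # Reuse the same iterative solves for the predictive mean and covariance.
        raise NotImplementedError
```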


  1. Wikipedia: Kriging ↩︎

  2. An overview of kernel methods is out of the scope of this post, but a good overview of Gaussian processes can be found in Rasmussen and Williams. ↩︎

  3. Equation 2.30 in Gaussian Processes for Machine Learning gives the log marginal likelihood for a zero-mean Gaussian process. ↩︎

  4. GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration. arXiv ↩︎ ↩︎