Multi-channel Gaussian processes (part 1: introduction)

Machine learning is all the rage these days; it is hard to hear about some cool new thing that doesn't involve the term. It has somewhat replaced the vaguer, Altered Carbon-invoking term AI (in trying to avoid clichés I should at least attempt to stay away from Blade Runner references), and now-outdated ones like data mining, which these days sounds like something from the 1980s. Apparently most applications still use neural networks, which can be very good at the job but are themselves technology from the 80s, as people like Ben Vigoda have pointed out.

Alternatives to that approach include methods like Gaussian processes; there are formal connections between the approaches (as there often are to many others), but the most basic connection is probably that they are all just statistics and inference with different flavors of linear algebra to glue everything together – i.e. "machine learning" is not a separate, more sophisticated class of methods in any real way. Gaussian processes (GPs) are no exception; coming from the side of traditional statistics, they are basically linear regression formulated in a clever way that exploits useful properties of basis functions and the Gaussian distribution to be more flexible than fitting a straight line or a polynomial of some arbitrary order. Therefore, besides their representation as jointly normal variables with correlations specified by a kernel, Gaussian processes can be formalized as a traditional linear regression; this is called the dual representation.
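To make the "jointly normal variables with correlations specified by a kernel" view concrete, here is a minimal sketch in Python/NumPy (the squared-exponential kernel and all hyperparameter values are illustrative choices, not anything prescribed by the theory): a draw from a zero-mean GP prior is just a sample from a multivariate normal whose covariance matrix comes from evaluating the kernel on a grid of inputs.

```python
import numpy as np

def rbf_kernel(x1, x2, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel: k(x, x') = s^2 * exp(-(x - x')^2 / (2 * l^2))."""
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

# Evaluate the kernel on a grid of inputs; the resulting matrix is the
# covariance of a jointly normal vector of function values.
x = np.linspace(-5, 5, 100)
K = rbf_kernel(x, x) + 1e-8 * np.eye(len(x))  # small jitter for numerical stability

# Each row of `samples` is one function drawn from the GP prior, f ~ N(0, K).
samples = np.random.multivariate_normal(np.zeros(len(x)), K, size=3)
```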

While linear models, and others based on normally-distributed data, are still widely used in experimental science, normally not even the scientists believe their process of interest really conforms to that kind of model. Non-linear parametric models are more sophisticated alternatives; they can be formulated as coupled ordinary differential equations (ODEs) that (somewhat) mechanistically represent the processes of interest, and often require numerical approaches both to obtain an output and to infer their parameters. Although only a few fields (like epidemiology) have a basic model that serves as the basis for almost every other extension, I believe this kind of model should be the ultimate goal of modeling any system.

Gaussian processes are somewhat non-parametric models (they do have parameters, but these are flexible hyperparameters rather than coefficients that fix a predefined shape for the outputs), and can in some cases be seen as in-between inflexible linear models and computationally intensive, complex parametric models. Nevertheless, there are several shortcomings of regular GP regression that need to be addressed to make it useful for inference, and to prevent scientists from repeating the same errors of using linear models indiscriminately. One limitation is that observations are assumed to be normally distributed; for traditional linear models this is dealt with by extending them to have non-Gaussian likelihoods, converting them into what are called generalized linear models (GLMs) – GPs can be extended in a somewhat similar way, which I should discuss sometime soon. Another is that GPs normally describe a single univariate function, which does not capture interactions between processes the way many models of coupled ODEs do – this is what I am going to start to address in this post and its part-two follow-up.

While it is technically possible to have independent functions describing the different observed processes, for many systems the main interest is in determining the interaction parameters: for predator-prey models that would be the predation rate; for epidemiological models, the transmission rate between infected and healthy individuals (Ross); or for host-microbe interactions, the rate of clearing of pathogens by the immune system (Souto-Maior et al.). So beyond assuming independence, there may be different ways of coupling Gaussian processes; I am going to describe a formulation that can be built with essentially the same constructs used for single-channel processes.

There are plenty of good references that describe the basics of Gaussian processes formally (see Rasmussen and Williams, whose notation I will follow when applicable). More casual explanations can also be found, like a blog post by Kat Bailey with some Python code that I found quite useful for a practical implementation. I recommend getting familiar with some of the theory and computational implementations of Gaussian processes before moving on to multi-channel versions of the method.
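For reference before moving on, here is a minimal single-channel GP regression sketch in NumPy (the toy data, noise level, and kernel hyperparameters are made up for illustration); the conditioning formulas are the standard ones from Rasmussen and Williams, chapter 2, computed with a Cholesky factorization for numerical stability.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / lengthscale**2)

# Toy observations (hypothetical), with small Gaussian observation noise.
X_train = np.array([-4.0, -2.0, 0.0, 1.0, 3.0])
y_train = np.sin(X_train)
noise_var = 1e-2
X_test = np.linspace(-5, 5, 50)

# Kernel blocks: train-train (plus noise), train-test, and test-test.
K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
K_s = rbf_kernel(X_train, X_test)
K_ss = rbf_kernel(X_test, X_test)

# Posterior mean and covariance by conditioning the joint Gaussian.
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
mu = K_s.T @ alpha            # posterior mean at X_test
v = np.linalg.solve(L, K_s)
cov = K_ss - v.T @ v          # posterior covariance at X_test
```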

Some of the features I will describe in the next post are outlined in two papers, by Bonilla et al. and by Melkumyan and Ramos. The result is a set of interacting processes with kernels between the different channels, and a covariance matrix that defines the intensity of these interactions. That may be useful to improve inference, and especially to estimate which are the most important interactions – this can be informative about the underlying mechanisms. This may (or may not) sound very fancy, so it's important to remind yourself (and myself) that like any other statistical method there are situations where "machine learning" is useful and others where it is not; claiming to use artificial intelligence is a fast way to associate oneself with the state of the art, but these days it is more often a buzzword-fueled marketing ploy than a sign of any sort of wisdom.
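As a preview of where part two is going (a rough sketch only, and everything here, including the hyperparameter values, is illustrative): one simple way to couple channels, following the multi-task construction of Bonilla et al., is to take the full covariance as the Kronecker product of a small cross-channel covariance matrix B with an ordinary input kernel, so that the off-diagonal entries of B set the intensity of the interactions.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / lengthscale**2)

# Two channels observed on a shared set of inputs (toy example).
x = np.linspace(0, 10, 25)
Kx = rbf_kernel(x, x)

# Cross-channel covariance B: diagonal entries are per-channel signal
# variances; the off-diagonal entry sets the interaction intensity.
B = np.array([[1.0, 0.7],
              [0.7, 1.0]])

# Full covariance over both channels stacked into one vector:
# K = B kron Kx, the Kronecker-product construction of Bonilla et al.
K_full = np.kron(B, Kx) + 1e-8 * np.eye(2 * len(x))

# One joint draw from the prior: the first half of f is channel 1, the
# second half channel 2; with B[0, 1] > 0 the two channels are correlated.
f = np.random.multivariate_normal(np.zeros(2 * len(x)), K_full)
f1, f2 = np.split(f, 2)
```

Setting B[0, 1] to zero recovers two independent single-channel GPs, which is what makes estimating the off-diagonal entries informative about the interactions.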

References

1. Ross R. An application of the theory of probabilities to the study of a priori pathometry, Part I. Proc R Soc A 1916;92:204-230.
2. Souto-Maior C, Sylvestre G, Dias FBS, Gomes MGM, Maciel-de-Freitas R. Model-based inference from multiple dose, time course data reveals Wolbachia effects on infection profiles of type 1 dengue virus in Aedes aegypti. PLoS Negl Trop Dis 2018;12:e0006339.
3. Rasmussen CE, Williams CKI. Gaussian Processes for Machine Learning. MIT Press; 2006.
4. Bonilla EV, Chai KM, Williams CKI. Multi-task Gaussian process prediction. Advances in Neural Information Processing Systems 20; 2008.
5. Melkumyan A, Ramos F. Multi-kernel Gaussian processes. Proceedings of IJCAI; 2011.

-- caetano, April 11, 2018