Recent Posts

Slides for my presentation at CMStatistics 2021 are available here. The talk was about generalized additive latent and mixed models, which is further described in this post.

Slides for my presentation at the Nordic-Baltic Biometrics Conference are available here.

The organizers of the European R User Meeting 2020 have put together a really impressive event, with lots of opportunities for interaction and stimulating discussions despite being fully online. I particularly enjoyed the good mix of academic presentations focusing on methodology and more business- and industry-related presentations focusing on the use of R in production. Today I presented the BayesMallows package in a five-minute lightning talk, and the slides (with links) are available here.

The European R User Meeting 2020 has so far been a really great event, with interesting talks and online presentations working smoothly. I presented the metagam package this morning, and the slides are available here.

Tonight I gave a presentation on Rcpp at the Oslo UseR! Group. The slides are here.

It was a nice opportunity to meet some of the many R users in town. Thanks to Deemah for organizing!

Publications

We present generalized additive latent and mixed models (GALAMMs) for analysis of clustered data in which both latent and observed variables depend smoothly on observed covariates. A profile likelihood algorithm is proposed, and we derive asymptotic standard errors of both smooth and parametric terms. The work was motivated by applications in cognitive neuroscience, and we show how GALAMMs can successfully model the complex lifespan trajectory of latent episodic memory, along with a discrepant trajectory of working memory, as well as the effect of latent socioeconomic status on hippocampal development.

We address the problem of estimating how different parts of the brain develop and change throughout the lifespan, and how these trajectories are affected by genetic and environmental factors. Estimating these lifespan trajectories is statistically challenging, since their shapes are typically highly nonlinear. Moreover, although true change can only be quantified by longitudinal examinations, follow-up intervals in neuroimaging studies typically cover less than 10% of the lifespan, so use of cross-sectional information is necessary.

Analyzing data from multiple neuroimaging studies has great potential for increasing statistical power: it enables detection of effects of smaller magnitude than would be possible when analyzing each study separately, and it allows systematic investigation of between-study differences. However, restrictions due to privacy or proprietary data, as well as more practical concerns, can make it hard to share neuroimaging datasets, such that analyzing all data in a common location might be impractical or impossible.

BayesMallows is an R package for analyzing preference data in the form of rankings with the Mallows rank model, and its finite mixture extension, in a Bayesian framework. The model is grounded in the idea that the probability density of an observed ranking decreases exponentially with its distance to the location parameter. It is the first Bayesian implementation that allows a wide choice of distances, and it scales well to a large number of items to be ranked.
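The model can be written compactly as follows (a sketch; the notation is mine, with the α/n scaling that is commonly used in this line of work):

$$
p(r \mid \alpha, \rho) = \frac{1}{Z_n(\alpha)} \exp\left\{ -\frac{\alpha}{n} \, d(r, \rho) \right\},
$$

where $r$ is an observed ranking of $n$ items, $\rho$ is the latent consensus ranking (location parameter), $\alpha > 0$ is a scale parameter, $d(\cdot, \cdot)$ is a right-invariant distance between rankings, and $Z_n(\alpha)$ is the partition function (normalizing constant).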

Researchers interested in hemispheric dominance frequently aim to infer latent functional differences between the hemispheres from observed lateral behavioural or brain-activation differences. To be valid, such inferences cannot rely on the observed laterality measures alone; they must also account for the antecedent probabilities of the studied latent classes. This fact is frequently ignored in the literature, leading to misclassification, especially for low-probability classes such as “atypical” right-hemispheric language dominance.

This is the companion paper to the hdme R package. Link to paper.

Ranking and comparing items is crucial for collecting information about preferences in many areas, from marketing to politics. The Mallows rank model is among the most successful approaches to analyzing rank data, but its computational complexity has limited its use to a particular form based on the Kendall distance. We develop new computationally tractable methods for Bayesian inference in Mallows models that work with any right-invariant distance. Our method performs inference on the consensus ranking of the items, even when only partial rankings, such as top-k rankings or pairwise comparisons, are observed.

In many problems involving generalized linear models, the covariates are subject to measurement error. When the number of covariates p exceeds the sample size n, regularized methods like the lasso or Dantzig selector are required. Several recent papers have studied methods which correct for measurement error in the lasso or Dantzig selector for linear models in the p > n setting. We study a correction for generalized linear models, based on Rosenbaum and Tsybakov’s matrix uncertainty selector.
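For reference, the matrix uncertainty selector can be sketched as the solution of the following program (notation illustrative):

$$
\hat{\beta} \in \arg\min_{\beta} \left\{ \|\beta\|_1 \;:\; \left\| \tfrac{1}{n} W^\top \left( y - W \beta \right) \right\|_\infty \le \lambda + \delta \|\beta\|_1 \right\},
$$

where $W$ is the error-prone design matrix and the extra slack term $\delta \|\beta\|_1$ accounts for the magnitude of the measurement error; setting $\delta = 0$ recovers the ordinary Dantzig selector constraint.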

Regression with the lasso penalty is a popular tool for performing dimension reduction when the number of covariates is large. In many applications of the lasso, like in genomics, covariates are subject to measurement error. We study the impact of measurement error on linear regression with the lasso penalty, both analytically and in simulation experiments. A simple method of correction for measurement error in the lasso is then considered. In the large sample limit, the corrected lasso yields sign consistent covariate selection under conditions very similar to the lasso with perfect measurements, whereas the uncorrected lasso requires much more stringent conditions on the covariance structure of the data.
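One common form of such a correction (a sketch; notation illustrative) replaces the biased Gram matrix $W^\top W / n$ by $W^\top W / n - \Sigma_{uu}$, which amounts to solving

$$
\hat{\beta} = \arg\min_{\beta \,:\, \|\beta\|_1 \le R} \left\{ \frac{1}{2n} \|y - W\beta\|_2^2 - \frac{1}{2} \beta^\top \Sigma_{uu} \beta + \lambda \|\beta\|_1 \right\},
$$

where $W$ contains the error-prone measurements, $\Sigma_{uu}$ is the measurement error covariance, and the additional constraint $\|\beta\|_1 \le R$ is needed because the corrected quadratic form may no longer be convex.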

Software

R package available from CRAN. See also the accompanying Shiny App. Functional differences between the cerebral hemispheres are a fundamental characteristic of the human brain. Researchers interested in studying these differences often infer underlying hemispheric dominance for a certain function (e.g., language) from laterality indices calculated from observed performance or brain activation measures. However, any inference from observed measures to latent (unobserved) classes has to consider the prior probability of class membership in the population.
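The underlying logic can be illustrated with a minimal Bayes' rule computation in base R; all probabilities below are made up for illustration:

```r
# Posterior probability of "atypical" right-hemispheric language dominance
# given a right-lateralized laterality index (LI), via Bayes' rule.
# All numbers are hypothetical, chosen only to illustrate the point.
prior_atypical <- 0.05        # assumed antecedent probability of the atypical class
p_obs_given_atypical <- 0.60  # assumed P(right-lateralized LI | atypical dominance)
p_obs_given_typical <- 0.10   # assumed P(right-lateralized LI | typical dominance)

posterior_atypical <- prior_atypical * p_obs_given_atypical /
  (prior_atypical * p_obs_given_atypical +
     (1 - prior_atypical) * p_obs_given_typical)

posterior_atypical  # 0.24: typical dominance remains more likely despite the LI
```

With a low-probability class, even a fairly diagnostic observed measure leaves the posterior below one half, which is exactly why ignoring the prior leads to misclassification.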

R package available from CRAN. Meta-analysis of generalized additive models and generalized additive mixed models. A typical use case is when data cannot be shared across locations, and an overall meta-analytic fit is sought. ‘metagam’ provides functionality for removing individual participant data from models computed using the ‘mgcv’ and ‘gamm4’ packages such that the model objects can be shared without exposing individual data. Furthermore, methods for meta-analyzing these fits are provided.
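A hedged sketch of the workflow is below; the function names strip_rawdata() and metagam() are from the package, but the simulated cohorts and argument choices are illustrative:

```r
library(mgcv)
library(metagam)

set.seed(1)
# Hypothetical helper simulating one cohort's data (illustration only)
simulate_cohort <- function(n = 100) {
  age <- runif(n, 0, 80)
  data.frame(age = age, y = sin(age / 20) + rnorm(n, sd = 0.3))
}

# Each data location fits its own GAM, then strips individual participant
# data so the model object can be shared without exposing raw data
fits <- lapply(list(simulate_cohort(), simulate_cohort()),
               function(d) strip_rawdata(gam(y ~ s(age), data = d)))

# The analysis location meta-analyzes the shared, stripped fits
meta_fit <- metagam(fits, grid_size = 100)
summary(meta_fit)
```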

R package available from CRAN. An implementation of the Bayesian version of the Mallows rank model. The Cayley, footrule, Hamming, Kendall, Spearman, and Ulam distances are supported. The rank data to be analyzed can be in the form of complete rankings, top-k rankings, partially missing rankings, as well as consistent and inconsistent pairwise preferences. Several functions for plotting and studying the posterior distributions of parameters are provided. The package also provides functions for estimating the partition function (normalizing constant) of the Mallows rank model, including the importance sampling algorithm of Vitelli et al.
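An illustrative sketch is below; the interface shown follows earlier versions of the package (compute_mallows() taking rankings directly) and may differ in newer releases:

```r
library(BayesMallows)

# potato_visual: complete rankings of 20 items, shipped with the package
fit <- compute_mallows(rankings = potato_visual,
                       metric = "footrule", nmc = 5000)
fit$burnin <- 1000  # discard burn-in before summarizing

plot(fit, parameter = "alpha")                      # posterior of the scale parameter
compute_posterior_intervals(fit, parameter = "rho") # intervals for the consensus ranking
```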

R package available from CRAN. Penalized regression for generalized linear models with measurement error (also known as errors-in-variables). The package contains a version of the lasso (L1-penalization) that corrects for measurement error, as well as an implementation of the Generalized Matrix Uncertainty Selector, a version of the (Generalized) Dantzig Selector for the case of measurement error.
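A hedged usage sketch with simulated data is below; the function names corrected_lasso() and gmus() are from the package, but the data and argument choices are illustrative:

```r
library(hdme)

set.seed(1)
n <- 100; p <- 50
X <- matrix(rnorm(n * p), n, p)                      # true (unobserved) covariates
sigmaUU <- diag(0.2, p)                              # measurement error covariance, assumed known
W <- X + matrix(rnorm(n * p, sd = sqrt(0.2)), n, p)  # observed, error-prone covariates
y <- drop(X[, 1:5] %*% rep(2, 5) + rnorm(n))         # response depends on first 5 covariates

# Lasso corrected for measurement error
fit_lasso <- corrected_lasso(W, y, sigmaUU, family = "gaussian")
plot(fit_lasso)

# Generalized Matrix Uncertainty Selector (Gaussian response here)
fit_gmus <- gmus(W, y, family = "gaussian")
plot(fit_gmus)
```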