Model Evaluation and Inference

In this step, we evaluate a set of models on a set of data RDMs and estimate our statistical uncertainty about these evaluations to draw inferences about the underlying hypotheses. Because different forms of inference require different evaluation schemes, the topics of evaluation and inference are tightly intertwined and are discussed together here.

There are two main considerations for choosing the right evaluation procedure: first, how far we want the inference to generalize, and second, how far the models should generalize with the same parameters. We control how far the inference aims to generalize by changing the bootstrap procedure. We control how far the models must generalize by changing the cross-validation procedure. In both cases, we can aim for generalization across subjects, across conditions, or across both.

Below we start with the classical fixed evaluation scheme of RSA, which applies only to models that require no fitting, then explain cross-validation and bootstrapping, and finally explain their combination.

All inference methods are implemented in rsatoolbox.inference. They are named eval_* and take a list of models and an RDMs object of data RDMs as primary inputs. Additionally, the string argument method specifies which RDM comparison measure to use. Each function returns a results object, described below, which contains all information required for further tests, including the (co-)variance estimates for the model evaluations and, where applicable, the bootstrap samples computed.

All examples on this page assume that your models are saved in a list of models called models and your measured data RDMs are saved as an RDMs object called rdms. A plot of the results can be generated by using rsatoolbox.vis.plot_model_comparison, which is explained further here.

A more interactive introduction on evaluations requiring bootstrap is given by this Demo.

Fixed evaluation and inference

This is the fastest form of inference. It evaluates a set of models with fixed predictions and estimates the uncertainty based on the variability across data RDMs. This warrants generalization only to new subjects, not to new conditions. Thus, it is formally applicable only to situations where the conditions used cover the whole set of conditions we are interested in. An example would be RDMs across movements of the 5 fingers. In this situation, there are no other conditions we want to generalize to.

For this type of inference use rsatoolbox.inference.eval_fixed. This method will evaluate all models and estimate the variances based on the variance across data RDMs.


import rsatoolbox
results = rsatoolbox.inference.eval_fixed(models, rdms, method='corr_cov')

This function takes one additional argument: theta. This argument allows you to fix the parameters of flexible models that are meant to enter this type of evaluation. It should then be a list of numpy arrays of parameter values, in the same order as the models.
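To illustrate the idea behind estimating uncertainty from the variability across data RDMs, here is a minimal sketch in plain Python. It is not the toolbox's implementation: the function name mean_and_sem and the evaluation values are hypothetical, and it assumes that one evaluation value per subject is already available for a single model.

```python
import statistics

def mean_and_sem(per_subject_evals):
    """Return the mean evaluation and its standard error across subjects."""
    n = len(per_subject_evals)
    mean = statistics.fmean(per_subject_evals)
    # The sample variance across subjects divided by n estimates
    # the variance of the mean evaluation.
    var_of_mean = statistics.variance(per_subject_evals) / n
    return mean, var_of_mean ** 0.5

# Hypothetical evaluation values of one model for five subjects
evals = [0.40, 0.50, 0.45, 0.55, 0.60]
mean, sem = mean_and_sem(evals)
```

The standard error obtained this way reflects only the sampling of subjects, which is why this scheme supports generalization to new subjects but not to new conditions.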


Cross-validation

When evaluating models with flexible parameters, we need to control for the different complexity of the models. For example, a weighted model with 3 components will fit the data less well than a weighted model that has those 3 components plus 2 extra components. A good way to evaluate such models is to use cross-validation. First, we split the dataset into subsets or “folds”. Then we use each of these subsets as the test set once, with all others serving as the training set, i.e. we fit the models on all but one of the folds and evaluate on the left-out fold.

In this toolbox, this is split into two function calls: one for choosing the folds and one for performing the cross-validation.

The names of the functions for creating the sets all start with rsatoolbox.inference.sets_. There are three types: leave_one_out, which uses each individual observation as a fold; k_fold, which takes the number of folds and tries to make the folds as equal in size as possible; and of_k, which tries to assign a given number of observations to each fold.
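The three ways of forming folds can be sketched in plain Python as follows. This is only an illustration of the splitting logic under simplifying assumptions: the helper names are hypothetical, observations are identified by integer indices, and no shuffling is applied, whereas the toolbox functions return their own set structures and may randomize the assignment.

```python
def leave_one_out(n_obs):
    """Each observation forms its own fold."""
    return [[i] for i in range(n_obs)]

def k_fold(n_obs, k):
    """Split n_obs indices into k folds whose sizes differ by at most one."""
    base, extra = divmod(n_obs, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def of_k(n_obs, k):
    """Aim for about k observations per fold."""
    n_folds = max(1, n_obs // k)
    return k_fold(n_obs, n_folds)

def train_test_pairs(folds):
    """Use each fold as the test set once; all other folds form the training set."""
    pairs = []
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        pairs.append((train, test))
    return pairs

pairs = train_test_pairs(k_fold(10, 3))
```

Each entry of pairs holds the training indices and the test indices for one fold, which is the information the cross-validation step needs.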

For the second step there is a single function rsatoolbox.inference.crossval, which takes the created sets and various options for the fitting of the models as input and returns the evaluations of the models.

To control how far the models need to generalize without overfitting, we can choose whether the data are split along the subject dimension, along the condition dimension or along both.

In the following example using leave-one-out cross-validation the model will be fit to N-1 subjects (or data sets) and then evaluated on the Nth subject:

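# Leave-one-out over data RDMs: each subject serves as the test set once
train_set, test_set, ceil_set = rsatoolbox.inference.sets_leave_one_out_rdm(rdms)
# Fit the models on the training sets and evaluate them on the held-out subject
results_cv = rsatoolbox.inference.crossval(models, rdms, train_set, test_set, ceil_set=ceil_set, method='corr')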


Bootstrapping

There is no direct formula for the variance caused by random sampling of the conditions. Thus, we resort to bootstrapping to estimate the variance of experimental results whenever we want to generalize to new conditions or fitted models are involved.

The variance caused by random sampling of the subjects can also be estimated by using bootstrapping. This is implemented in rsatoolbox.inference.eval_bootstrap_rdm. In expectation the variance computed by this method is the same as the one computed by eval_fixed. For this type of analysis it is thus not recommended to use bootstrapping.
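The relation between the bootstrap estimate and the direct estimate can be checked with a small sketch in plain Python. It is not the toolbox's implementation: the function name, the evaluation values, and the seed are hypothetical. For a plain bootstrap of the mean, the expected bootstrap variance equals the direct estimate up to a factor of (n-1)/n, which an implementation can correct for.

```python
import random
import statistics

def bootstrap_variance_of_mean(values, n_boot=2000, seed=0):
    """Estimate the variance of the mean by resampling subjects with replacement."""
    rng = random.Random(seed)
    n = len(values)
    boot_means = []
    for _ in range(n_boot):
        sample = [values[rng.randrange(n)] for _ in range(n)]
        boot_means.append(statistics.fmean(sample))
    return statistics.variance(boot_means)

# Hypothetical evaluation values of one model for eight subjects
evals = [0.40, 0.50, 0.45, 0.55, 0.60, 0.35, 0.52, 0.48]
boot_var = bootstrap_variance_of_mean(evals)
# Direct estimate: sample variance across subjects divided by n
analytic_var = statistics.variance(evals) / len(evals)
```

Because the direct estimate needs no resampling and carries no Monte Carlo noise, it is the cheaper choice when only subjects are resampled.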

Generalization over conditions

If we want to generalize only to the population of conditions for the exact subjects measured, we can use rsatoolbox.inference.eval_bootstrap_pattern. This method performs a bootstrap resampling of the conditions to estimate the uncertainty. In addition to the inputs of eval_fixed, it takes the following arguments. N sets the number of bootstrap samples to use. pattern_descriptor is an optional argument to group patterns together: if the name of a pattern descriptor is passed, all patterns with an equal entry are included or excluded together. boot_noise_ceil switches bootstrapping of the noise ceiling on or off. Bootstrapping the noise ceiling (boot_noise_ceil=True) is slightly more accurate, as average performance over subsampled RDMs can differ from overall performance, but it takes noticeably more computation time.
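The effect of grouping patterns by a descriptor can be sketched in plain Python. This is an illustration only, not the toolbox's code: the function name bootstrap_sample_patterns and the category labels are hypothetical. Groups of conditions are drawn with replacement, and all conditions sharing a descriptor value enter or leave the sample together.

```python
import random
from collections import defaultdict

def bootstrap_sample_patterns(descriptor_values, rng):
    """Draw groups with replacement and return the pattern indices of the drawn groups."""
    groups = defaultdict(list)
    for idx, value in enumerate(descriptor_values):
        groups[value].append(idx)
    keys = list(groups)
    # Draw as many groups as there are distinct descriptor values
    drawn = [keys[rng.randrange(len(keys))] for _ in keys]
    return [idx for key in drawn for idx in groups[key]]

rng = random.Random(1)
# Hypothetical descriptor: two conditions per category
categories = ["face", "face", "house", "house", "tool", "tool"]
sample = bootstrap_sample_patterns(categories, rng)
```

Repeating this draw N times and re-evaluating the models on each resampled set of conditions yields the distribution from which the variance is estimated.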

Generalization over conditions and subjects

If we want to generalize over both subjects and conditions/stimuli, we need to apply our novel 2D bootstrap method. This method evaluates the variances under resampling of subjects and conditions, both simultaneously and separately, and combines these estimates into an estimate of the overall variances of the evaluations. This method is implemented as rsatoolbox.inference.eval_dual_bootstrap. The only additional parameter relevant for this computation is rdm_descriptor, which allows sampling RDMs together in the same way that pattern_descriptor allows sampling conditions together.


Bootstrap-wrapped cross-validation

To evaluate flexible models and estimate our uncertainty about these evaluations, we can combine cross-validation and the bootstrap. This is also included in eval_dual_bootstrap. It requires a few more inputs, which can be ignored when all models are fixed. In particular, k_pattern and k_rdm control how many cross-validation folds are used along the two dimensions. These can be set to None to use the default number of folds, or to 1 to turn off cross-validation.

Results objects

A results object contains all information about the analysis that requires substantial computation time. The intended use is to pass this object directly to visualization functions, test functions, etc., so that you rarely need to consult its contents yourself. The contents are nonetheless available for direct access.

The results object contains the following information:


a string specifying the inference method used


variance estimates for all pairwise model differences as a 2D numpy array


Degrees of freedom for t-tests. This is the number of levels of the smallest factor over which generalization is attempted, minus 1. For a dataset with 20 stimuli and 10 subjects this would be 9 for generalization over both or over subjects only, and 19 for generalization over stimuli only.


all evaluation values computed. This is an up to 4-dimensional numpy array (bootstrap samples x models x cross-validation folds (rdm + pattern)).


the RDM similarity measure used for evaluation.


variance estimate for each model


the number of models evaluated


noise ceiling estimate


variance estimate for the noise ceiling


internal covariance matrix over models and the noise ceiling. Usually, model_var, diff_var, and noise_ceil_var, which are derived from this matrix, are the attributes meant for user access.
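The degrees-of-freedom rule listed above can be written as a short function. This is a sketch of the rule as stated on this page, with a hypothetical function name and arguments, not the toolbox's code.

```python
def degrees_of_freedom(n_subjects, n_conditions, over_subjects, over_conditions):
    """Levels of the smallest factor generalized over, minus 1."""
    levels = []
    if over_subjects:
        levels.append(n_subjects)
    if over_conditions:
        levels.append(n_conditions)
    if not levels:
        raise ValueError("generalization over at least one factor is required")
    return min(levels) - 1

# The example from the text: 20 stimuli and 10 subjects
dof_both = degrees_of_freedom(10, 20, over_subjects=True, over_conditions=True)
dof_subjects = degrees_of_freedom(10, 20, over_subjects=True, over_conditions=False)
dof_stimuli = degrees_of_freedom(10, 20, over_subjects=False, over_conditions=True)
```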