Model Evaluation and Inference¶
In this step, we evaluate a set of models on a set of data RDMs and estimate our statistical uncertainty about these evaluations to draw inferences about the underlying hypotheses. Because different forms of inference require different evaluation schemes, the topics of evaluation and inference are tightly intertwined and are discussed together here.
There are two main considerations for choosing the right evaluation procedure: first, how far we want the inference to generalize, and second, how far the models should generalize with the same parameters. We control how far the inference aims to generalize by changing the bootstrap procedure. For the models’ generalization we change the cross-validation procedure. For both generalizations, we can aim for generalization across subjects and/or conditions.
Below we start with the classical fixed evaluation scheme of RSA, which applies only to models that require no fitting, then explain cross-validation and bootstrapping, and finally explain their combination.
All inference methods are implemented in rsatoolbox.inference. They are named eval_*, take a list of models and an RDMs object of data RDMs as primary inputs, and return a results object. Additionally, the string input method allows you to specify which RDM comparison measure to use. The results object, described further below, contains all information required for further tests, including the (co-)variance estimates for the model evaluations and, where applicable, the bootstrap samples computed.
All examples on this page assume that your models are saved in a list called models and that your measured data RDMs are saved as an RDMs object. A plot of the results can be generated using rsatoolbox.vis.plot_model_comparison, which is explained further here. A more interactive introduction to the evaluations that require bootstrapping is given by this Demo.
Fixed evaluation and inference¶
This fastest form of inference evaluates a set of models with fixed predictions and estimates the uncertainty based on the variability across data RDMs. This warrants only generalization to new subjects, not to new conditions. It is thus formally applicable only to situations where the conditions used cover the whole set of conditions we are interested in. An example would be RDMs across movements of the 5 fingers: in this situation, there are no other conditions we want to generalize to.
For this type of inference use rsatoolbox.inference.eval_fixed. This method evaluates all models and estimates the variances based on the variability across data RDMs.
results = rsatoolbox.inference.eval_fixed(models, rdms, method='corr_cov')
rsatoolbox.vis.plot_model_comparison(results)
This function takes one additional argument: theta. It allows you to set fixed parameters for flexible models that enter this type of evaluation. If given, it should be a list of numpy arrays with the parameter values, in the same order as the models.
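Conceptually, the fixed evaluation amounts to comparing each subject's data RDM to the model's RDM and using the spread across subjects to quantify uncertainty. A minimal numpy sketch with toy data (not using the toolbox):

```python
import numpy as np

# toy data: 6 conditions -> 15 dissimilarity pairs, 10 subjects
rng = np.random.default_rng(0)
n_subjects, n_pairs = 10, 15
model_rdm = rng.random(n_pairs)                     # hypothetical model RDM (vector form)
data_rdms = model_rdm + 0.3 * rng.standard_normal((n_subjects, n_pairs))

# evaluate the model against each subject's RDM (Pearson correlation)
evals = np.array([np.corrcoef(model_rdm, d)[0, 1] for d in data_rdms])

mean_eval = evals.mean()                            # average model performance
var_eval = evals.var(ddof=1) / n_subjects           # variance of the mean across subjects
```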
Cross-validation¶
When evaluating models with flexible parameters, we need to control for the models' different complexity. For example, a weighted model with 3 components will fit the data less well than a weighted model that has those 3 components plus 2 extra components. A good way to evaluate such models is cross-validation: we first split the dataset into subsets or “folds”. Each of these subsets is then used as the test set once, with all others serving as the training set, i.e. we fit the models on all but one of the folds and evaluate them on the left-out fold.
In this toolbox this is split into two function calls: One for choosing the folds and one for performing the cross-validation.
The calls for creating the sets all start with rsatoolbox.inference.sets_. There are three types: leave_one_out, which uses each individual observation as a fold; k_fold, which sets the number of folds and tries to make them as equal in size as possible; and of_k, which tries to assign a given number of observations to each fold.
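The fold logic can be sketched in plain numpy; k_fold_sets below is a hypothetical, simplified helper, not the toolbox's sets_* implementation:

```python
import numpy as np

def k_fold_sets(n_items, k):
    """Assign n_items indices to k folds of near-equal size
    (simplified sketch of the cross-validation set construction)."""
    idx = np.arange(n_items)
    folds = [idx[i::k] for i in range(k)]           # round-robin assignment
    sets = []
    for i, test in enumerate(folds):
        # every fold is the test set once; the rest form the training set
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        sets.append((train, test))
    return sets

loo = k_fold_sets(5, 5)    # leave-one-out is the special case k == n_items
cv3 = k_fold_sets(10, 3)   # three folds of sizes 4, 3 and 3
```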
For the second step there is a single function rsatoolbox.inference.crossval, which takes the created sets and various options for the fitting of the models as input and returns the evaluations of the models.
To control how far the models need to generalize without overfitting, we can choose whether the data are split along the subject dimension, along the condition dimension or along both.
In the following example, which uses leave-one-out cross-validation, the models will be fit to N-1 subjects (or data sets) and then evaluated on the Nth subject:
train_set, test_set, ceil_set = rsatoolbox.inference.sets_leave_one_out_rdm(rdms_data)
results_cv = rsatoolbox.inference.crossval(models, rdms_data, train_set, test_set,
                                           ceil_set=ceil_set, method='corr')
Bootstrapping¶
There is no direct formula for the variance caused by random sampling of the conditions. We therefore resort to bootstrapping to estimate the variance of experimental results whenever we want to generalize to new conditions or fitted models are involved. The variance caused by random sampling of the subjects can also be estimated by bootstrapping.
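To illustrate why bootstrapping over subjects matches the variance across data RDMs in expectation, here is a toy numpy sketch that compares the two estimates:

```python
import numpy as np

rng = np.random.default_rng(1)
n_subjects = 20
# toy per-subject model evaluations, e.g. correlations of model and data RDMs
evals = 0.6 + 0.1 * rng.standard_normal(n_subjects)

# bootstrap: resample subjects with replacement, recompute the mean evaluation
n_boot = 2000
boot_means = np.array([
    evals[rng.integers(0, n_subjects, n_subjects)].mean()
    for _ in range(n_boot)
])
boot_var = boot_means.var(ddof=1)

# direct estimate of the variance of the mean across subjects
analytic_var = evals.var(ddof=1) / n_subjects
```

The two variance estimates agree up to sampling noise, which is why the direct estimate is preferred when it is available.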
Generalization over subjects¶
Bootstrapping over subjects, i.e. generalizing only to new subjects for the exact conditions measured, is implemented in rsatoolbox.inference.eval_bootstrap_rdm. In expectation, the variance computed by this method is the same as the one computed by eval_fixed, which is much faster. For this type of analysis it is thus not recommended to use bootstrapping.
Generalization over conditions¶
If we want to generalize only to the population of conditions for the exact subjects measured, we can use rsatoolbox.inference.eval_bootstrap_pattern. This method performs a bootstrap resampling of the conditions to estimate the uncertainty. It takes the following inputs in addition to the ones of eval_fixed: N sets the number of bootstrap samples to use. pattern_descriptor is an optional argument to group patterns together; if the name of a pattern descriptor is passed, all patterns with an equal entry are included or excluded together. boot_noise_ceil switches bootstrapping of the noise ceiling on or off. Bootstrapping the noise ceiling (boot_noise_ceil=True) is slightly more accurate, as the average performance over subsampled RDMs can differ from the overall performance, but it takes noticeably more computation time.
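Resampling conditions reshapes the RDMs themselves, because a bootstrap sample indexes both the rows and the columns of each square RDM. A toy sketch, not using the toolbox:

```python
import numpy as np

rng = np.random.default_rng(2)
n_cond = 8
rdm = rng.random((n_cond, n_cond))
rdm = (rdm + rdm.T) / 2          # toy symmetric RDM in square form
np.fill_diagonal(rdm, 0)

# one bootstrap sample over conditions: draw condition indices with replacement
sample = rng.integers(0, n_cond, n_cond)
boot_rdm = rdm[np.ix_(sample, sample)]
# pairs where the same condition was drawn twice have dissimilarity 0 and
# carry no information; the evaluation has to handle such repeated pairs
```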
Generalization over conditions and subjects¶
If we want to generalize over both subjects and conditions/stimuli, we need to apply our novel 2D bootstrap method. This method evaluates the variances under resampling of subjects and conditions, both simultaneously and separately, and combines these estimates into an estimate of the overall variances of the evaluations. It is implemented as rsatoolbox.inference.eval_dual_bootstrap. The only additional parameter relevant for this computation is rdm_descriptor, which allows sampling RDMs together in the same way pattern_descriptor allows sampling conditions together.
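The 2D resampling underlying this method can be sketched as follows (toy data; this shows only the simultaneous resampling step, not the toolbox's full estimator, which additionally combines the separate variance estimates):

```python
import numpy as np

rng = np.random.default_rng(3)
n_sub, n_cond = 12, 8
# toy stack of square data RDMs: subjects x conditions x conditions
rdms = rng.random((n_sub, n_cond, n_cond))

# one 2D bootstrap sample: resample subjects AND conditions with replacement
sub_sample = rng.integers(0, n_sub, n_sub)
cond_sample = rng.integers(0, n_cond, n_cond)
boot = rdms[sub_sample][:, cond_sample][:, :, cond_sample]
```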
Cross-validation and bootstrapping¶
For evaluating flexible models and estimating our uncertainty about these evaluations, we can combine cross-validation and the bootstrap. This is also handled by eval_dual_bootstrap. It requires a few additional inputs, which can be ignored when all models are fixed: k_pattern and k_rdm control how many cross-validation folds are used along the two dimensions. These can be set to None to use the default number of folds, or to 1 to turn off cross-validation along the respective dimension.
The results object¶
A results object contains all information about the analysis that requires substantial computation time. The intended use is to pass this object directly to visualization functions, test functions, etc., so that you rarely need to consult its contents directly. They are nonetheless accessible.
The results object contains the following information:
cv_method: a string specifying the inference method used
diff_var: variance estimates for all pairwise model differences as a 2D numpy array
dof: degrees of freedom for t-tests. This is the number of levels of the smaller factor generalization is attempted over, minus 1. For a dataset with 20 stimuli and 10 subjects this would be 9 for generalization over both or over subjects only, and 19 for generalization over stimuli only.
evaluations: all evaluation values computed. This is an up to 4-dimensional numpy array (bootstrap samples x models x cross-validation folds (rdm x pattern)).
method: the RDM similarity measure used for evaluation
model_var: variance estimate for each model
n_model: the number of models evaluated
noise_ceiling: noise ceiling estimate
noise_ceil_var: variance estimate for the noise ceiling
variances: internal covariance matrix over models and the noise ceiling. Usually model_var, diff_var and noise_ceil_var, which are derived from this matrix, are meant for user access.
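As an illustration, a pairwise model-comparison t statistic can be computed from the pairwise difference variances and the degrees of freedom. The numbers below are made up, and the attribute names diff_var and dof are assumptions about the results object:

```python
import numpy as np

# made-up values as they might be read off a results object
mean_evals = np.array([0.55, 0.48])      # mean evaluation per model
diff_var = np.array([[0.0, 4e-4],
                     [4e-4, 0.0]])       # variances of pairwise differences
dof = 19                                 # degrees of freedom for the t-test

# t statistic comparing model 0 against model 1
t = (mean_evals[0] - mean_evals[1]) / np.sqrt(diff_var[0, 1])
# a p-value would follow from the t distribution with dof degrees of freedom
```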