# Model Evaluation and Inference

In this step, we evaluate a set of models on a set of data RDMs and estimate our statistical uncertainty about these evaluations to draw inferences about the underlying hypotheses. Because different forms of inference require different evaluation schemes, the topics of evaluation and inference are tightly intertwined and are discussed together here.

There are two main considerations for choosing the right evaluation procedure: first, how far we want the *inference* to generalize, and second, how far the *models* should generalize with the same parameters. We control how far the inference aims to generalize by changing the bootstrap procedure, and how far the models must generalize by changing the cross-validation procedure. For both kinds of generalization, we can aim for generalization across subjects and/or conditions.

Below we start with the classical fixed evaluation scheme of RSA, which applies only to models that require no fitting, then explain cross-validation and bootstrapping, and finally explain their combination.

All inference methods are implemented in `rsatoolbox.inference`. They are named `eval_*` and take a list of models and an RDMs object of data RDMs as primary inputs. Additionally, the string input `method` allows you to specify which RDM comparison measure to use. The methods return a results object, described below, which stores the result of the evaluation together with all information required for further tests, including the (co-)variance estimates for the model evaluations and potentially the bootstrap samples computed.

All examples on this page assume that your models are saved in a list of models called `models` and your measured data RDMs are saved as an RDMs object `rdms`. A plot of the results can be generated with `rsatoolbox.vis.plot_model_comparison`, which is explained further in the visualization documentation.

A more interactive introduction to evaluations that require bootstrapping is given in the corresponding demo.

## Fixed evaluation and inference

This is the fastest form of inference. It evaluates a set of models with fixed predictions and estimates the uncertainty based on the variability across data RDMs. This warrants generalization only to new subjects, not to new conditions. Thus, it is formally only applicable to situations where the conditions used cover the whole set of conditions we are interested in. An example would be RDMs across movements of the 5 fingers: in this situation, there are no other conditions we want to generalize to.

For this type of inference, use `rsatoolbox.inference.eval_fixed`. This method evaluates all models and estimates the variances based on the variance across data RDMs.

Example:

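```
import rsatoolbox.inference
import rsatoolbox.vis

# Evaluate all fixed models; variances are estimated across data RDMs
results = rsatoolbox.inference.eval_fixed(models, rdms, method='corr_cov')
rsatoolbox.vis.plot_model_comparison(results)
```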
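To make the fixed scheme concrete, here is a minimal sketch of what such an evaluation computes for a single model, written in plain Python rather than with the toolbox. It assumes vectorized RDMs (the upper triangles stored as lists) and uses the plain Pearson correlation as the comparison measure in place of `'corr_cov'`; the helper names `pearson` and `fixed_evaluation` and all numbers are hypothetical. The uncertainty is the squared standard error of the mean evaluation across subjects, which is why this scheme only supports generalization over subjects.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fixed_evaluation(model_rdm, data_rdms):
    """Evaluate one fixed model prediction against every data RDM.

    Returns the mean evaluation and the variance of that mean, estimated
    from the spread of the per-subject evaluations (the squared SEM).
    """
    evals = [pearson(model_rdm, d) for d in data_rdms]
    n = len(evals)
    mean = sum(evals) / n
    var_between = sum((e - mean) ** 2 for e in evals) / (n - 1)
    return mean, var_between / n

# Hypothetical vectorized RDMs: 4 conditions -> 6 condition pairs, 3 subjects
model = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
data = [
    [1.1, 2.0, 2.9, 4.2, 5.0, 6.1],
    [0.9, 2.2, 3.1, 3.8, 5.3, 5.9],
    [1.0, 1.8, 3.2, 4.1, 4.7, 6.2],
]
mean, var_mean = fixed_evaluation(model, data)
print(f"mean r = {mean:.3f}, SEM = {math.sqrt(var_mean):.3f}")
```

Because the conditions are never resampled, nothing in this computation reflects the variability that a different choice of conditions would cause.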

This function takes one additional argument: `theta`. It lets you set fixed parameters for flexible models that are meant to enter this type of evaluation. It should then be a list of numpy arrays of parameter values, in the same order as the models.
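As an illustration of what fixing `theta` means, the following sketch assumes a linear weighted model whose predicted RDM is a weighted sum of component RDMs. With `theta` held fixed, the prediction is fully determined and can enter the fixed evaluation like any other fixed model. The helper `weighted_prediction` and the component values are hypothetical, and the sketch does not use the toolbox's model classes.

```python
def weighted_prediction(components, theta):
    """Predicted RDM vector of a linear weighted model: sum_k theta[k] * components[k]."""
    if len(components) != len(theta):
        raise ValueError("need exactly one weight per component")
    n_pairs = len(components[0])
    return [sum(w * comp[i] for w, comp in zip(theta, components)) for i in range(n_pairs)]

# Two hypothetical component RDMs over 6 condition pairs
comp_a = [1, 0, 0, 1, 1, 0]
comp_b = [0, 1, 1, 0, 1, 1]
pred = weighted_prediction([comp_a, comp_b], theta=[0.5, 2.0])
print(pred)  # [0.5, 2.0, 2.0, 0.5, 2.5, 2.0]
```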

## Cross-validation

When evaluating models with flexible parameters, we need to control for the different complexity of the models. For example, a weighted model with 3 components can never fit the training data better than a weighted model that has those 3 components plus 2 extra ones. A good way to evaluate such models is to use *cross-validation*. First, we split the dataset into subsets or “folds”. Then we choose each of these subsets as the *test-set* once, using all others as the *training-set*; i.e., we fit the models on all but one of the folds and evaluate them on the left-out fold.

In this toolbox, this is split into two function calls: one for choosing the folds and one for performing the cross-validation.

The calls for creating the sets all start with `rsatoolbox.inference.sets_`. There are three types: `leave_one_out`, which uses each individual observation as a fold; `k_fold`, which sets the number of folds and tries to make them as equal in size as possible; and `of_k`, which tries to assign a given number of observations to each fold.
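The three ways of building folds can be sketched with plain index lists. This is a minimal illustration, not the toolbox's implementation: the functions below are hypothetical, they ignore the noise-ceiling sets that the toolbox functions also return, and the rule used in `of_k` (floor division, so that every fold holds at least `k` observations) is an assumption.

```python
def leave_one_out(n):
    """Each of n observations is the test set once; all others form the training set."""
    return [([j for j in range(n) if j != i], [i]) for i in range(n)]

def k_fold(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def of_k(n, k):
    """Folds of about k observations each (assumed rule: n // k folds, each with at least k)."""
    return k_fold(n, max(1, n // k))

def to_train_test(folds):
    """Turn a list of folds into (training indices, test indices) pairs."""
    return [
        ([j for other, fold in enumerate(folds) if other != t for j in fold], test)
        for t, test in enumerate(folds)
    ]

print(k_fold(10, 3))                # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(to_train_test(k_fold(4, 2)))  # [([2, 3], [0, 1]), ([0, 1], [2, 3])]
```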

For the second step there is a single function, `rsatoolbox.inference.crossval`, which takes the created sets and various options for fitting the models as input and returns the evaluations of the models.

To control how far the models need to generalize without overfitting, we can choose whether the data are split along the subject dimension, along the condition dimension or along both.

In the following example, using leave-one-out cross-validation, the model is fit to N-1 subjects (or data sets) and then evaluated on the left-out subject, once for each of the N subjects:

```
import rsatoolbox.inference

# Each data RDM (subject) serves as the test set once
train_set, test_set, ceil_set = rsatoolbox.inference.sets_leave_one_out_rdm(rdms)
results_cv = rsatoolbox.inference.crossval(models, rdms, train_set, test_set, ceil_set=ceil_set, method='corr')
```
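What the cross-validation loop does for a flexible model can be sketched as follows. The sketch assumes a weighted model with two components, fits its weights to the mean of the training RDMs by ordinary least squares (a simple fitting method swapped in for illustration, not necessarily the fitter the toolbox uses), and scores the fitted prediction on the left-out RDM with the Pearson correlation. All function names and data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_two_weights(c1, c2, target):
    """Least-squares weights (w1, w2) for target ~ w1*c1 + w2*c2, via the 2x2 normal equations."""
    a11 = sum(a * a for a in c1)
    a12 = sum(a * b for a, b in zip(c1, c2))
    a22 = sum(b * b for b in c2)
    b1 = sum(a * t for a, t in zip(c1, target))
    b2 = sum(b * t for b, t in zip(c2, target))
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

def leave_one_subject_out(c1, c2, data_rdms):
    """Fit weights on the mean of N-1 data RDMs and evaluate on the left-out RDM."""
    scores = []
    for i, test_rdm in enumerate(data_rdms):
        train = [d for j, d in enumerate(data_rdms) if j != i]
        mean_train = [sum(vals) / len(train) for vals in zip(*train)]
        w1, w2 = fit_two_weights(c1, c2, mean_train)
        prediction = [w1 * a + w2 * b for a, b in zip(c1, c2)]
        scores.append(pearson(prediction, test_rdm))
    return scores

comp_a = [1, 0, 0, 1, 1, 0]
comp_b = [0, 1, 1, 0, 1, 1]
subjects = [  # hypothetical vectorized data RDMs
    [0.6, 2.1, 1.9, 0.4, 2.6, 2.0],
    [0.4, 1.9, 2.2, 0.6, 2.4, 2.1],
    [0.5, 2.0, 2.1, 0.5, 2.5, 1.8],
]
scores = leave_one_subject_out(comp_a, comp_b, subjects)
print([round(s, 3) for s in scores])
```

Because the weights are always fitted without the subject they are scored on, a model with more components gains no automatic advantage from its extra flexibility.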

## Bootstrapping

There is no direct formula for the variance caused by random sampling of the conditions. Thus, we resort to bootstrapping to estimate the variance of experimental results whenever we want to generalize to new conditions or fitted models are involved.

The variance caused by random sampling of the subjects can also be estimated by bootstrapping. This is implemented in `rsatoolbox.inference.eval_bootstrap_rdm`. In expectation, the variance computed by this method is the same as the one computed by `eval_fixed`, so bootstrapping is not recommended for this type of analysis.
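The claim that bootstrapping subjects adds little over `eval_fixed` can be checked with a small simulation. The sketch below, a hypothetical helper in plain Python, resamples per-subject evaluations of a fixed model with replacement and compares the variance of the bootstrapped means with the squared standard error used by the fixed scheme. The evaluation values are made up. The two estimates agree closely; the plain bootstrap variance of a mean is expected to be smaller by a factor of about (n-1)/n.

```python
import random
import statistics

def bootstrap_rdm_variance(evals, n_boot=2000, seed=0):
    """Variance of the mean evaluation when subjects are resampled with replacement."""
    rng = random.Random(seed)
    n = len(evals)
    boot_means = []
    for _ in range(n_boot):
        sample = [evals[rng.randrange(n)] for _ in range(n)]
        boot_means.append(sum(sample) / n)
    return statistics.pvariance(boot_means)

evals = [0.61, 0.55, 0.70, 0.48, 0.66, 0.59, 0.52, 0.63]  # hypothetical per-subject evaluations
boot_var = bootstrap_rdm_variance(evals)
fixed_var = statistics.variance(evals) / len(evals)  # squared SEM, as in the fixed scheme
print(f"bootstrap: {boot_var:.6f}, fixed: {fixed_var:.6f}")
```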

### Generalization over conditions

If we want to generalize only to the population of conditions, for the exact subjects measured, we can use `rsatoolbox.inference.eval_bootstrap_pattern`. This method performs a bootstrap resampling of the conditions to estimate the uncertainty. In addition to the inputs of `eval_fixed`, it takes the following inputs: `N` sets the number of bootstrap samples to use. `pattern_descriptor` is an optional argument to group patterns together: if the name of a pattern descriptor is passed, all patterns with an equal entry are included or excluded together. `boot_noise_ceil` switches bootstrapping of the noise ceiling on or off. Bootstrapping the noise ceiling (`boot_noise_ceil=True`) is slightly more accurate, as the average performance over subsampled RDMs can differ from the overall performance, but it takes noticeably more computation time.
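A minimal sketch of bootstrapping conditions for a single subject and a single fixed model is given below. It assumes full square RDMs stored as nested lists and uses the Pearson correlation as the comparison measure. Each bootstrap sample draws conditions with replacement; pairs that would compare a condition with a copy of itself have no meaningful dissimilarity, so this sketch simply drops them (an assumption about how duplicates are treated). The function name and the toy RDMs are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_patterns(model_rdm, data_rdm, n_boot=500, seed=0):
    """Mean and variance of the evaluation under resampling conditions with replacement."""
    rng = random.Random(seed)
    n_cond = len(model_rdm)
    evals = []
    for _ in range(n_boot):
        idx = [rng.randrange(n_cond) for _ in range(n_cond)]
        m, d = [], []
        for a in range(n_cond):
            for b in range(a + 1, n_cond):
                i, j = idx[a], idx[b]
                if i != j:  # drop pairs of a condition with its own copy
                    m.append(model_rdm[i][j])
                    d.append(data_rdm[i][j])
        if len(set(m)) > 1 and len(set(d)) > 1:  # a correlation needs variance in both
            evals.append(pearson(m, d))
    mean = sum(evals) / len(evals)
    var = sum((e - mean) ** 2 for e in evals) / (len(evals) - 1)
    return mean, var

n = 5  # hypothetical number of conditions
model = [[abs(i - j) for j in range(n)] for i in range(n)]
data = [[0.0 if i == j else abs(i - j) + 0.5 * ((i * j) % 3) for j in range(n)] for i in range(n)]
mean, var = bootstrap_patterns(model, data)
print(f"mean r = {mean:.3f}, bootstrap SD = {math.sqrt(var):.3f}")
```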

### Generalization over conditions and subjects

If we want to generalize over both subjects and conditions/stimuli, we need to apply our novel 2D bootstrap method. This method evaluates the variances under resampling subjects and conditions, both simultaneously and separately, and combines these estimates into an estimate of the overall variance of the evaluations. This method is implemented as `rsatoolbox.inference.eval_dual_bootstrap`. The only additional parameter relevant for this computation is `rdm_descriptor`, which allows sampling RDMs together in the same way that `pattern_descriptor` allows sampling conditions together.
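The effect of a descriptor such as `rdm_descriptor` or `pattern_descriptor` is that whole groups are resampled instead of single items. The hypothetical helper below sketches this for RDMs: the unique descriptor values are drawn with replacement, and every RDM carrying a drawn value enters the bootstrap sample. It illustrates the grouping idea only and does not implement the variance combination of the 2D bootstrap.

```python
import random

def grouped_bootstrap_sample(descriptor, rng):
    """Return RDM indices for one bootstrap sample that keeps groups together.

    descriptor[i] is the group label of RDM i (e.g. the subject it came from).
    Unique labels are drawn with replacement; each drawn label contributes
    all RDMs that carry it.
    """
    groups = sorted(set(descriptor))
    drawn = [rng.choice(groups) for _ in groups]
    return [i for g in drawn for i, label in enumerate(descriptor) if label == g]

rng = random.Random(1)
subject_of_rdm = ["s1", "s1", "s2", "s2", "s3", "s3"]  # two sessions per subject (hypothetical)
sample = grouped_bootstrap_sample(subject_of_rdm, rng)
print(sample)
```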

## Bootstrap-cross-validation

For evaluating flexible models and estimating our uncertainty about these evaluations, we can combine cross-validation and the bootstrap. This is also included in `eval_dual_bootstrap`. It requires a few more inputs, which can be ignored when all models are fixed. In particular, `k_pattern` and `k_rdm` control how many cross-validation folds are used along the two dimensions. These can be set to `None` to use the default number of folds, or to 1 to turn off cross-validation.
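One detail matters when cross-validation runs inside a bootstrap sample: a subject that was drawn more than once must not appear in both the training and the test set, or the evaluation would be inflated. The hypothetical helper below sketches one way to avoid this, by forming the folds from the unique subjects in the draw. It is a plain-Python illustration of the principle, not the toolbox's code.

```python
import random

def bootstrap_cv_folds(n_subjects, k, rng):
    """One bootstrap draw of subjects followed by a k-fold split (hypothetical helper).

    The folds are formed from the unique subjects in the draw, so all copies
    of a subject land in the same fold and never in both training and test.
    """
    sample = [rng.randrange(n_subjects) for _ in range(n_subjects)]
    unique = sorted(set(sample))
    folds = [unique[i::k] for i in range(k)]
    folds = [f for f in folds if f]  # drop empty folds if few unique subjects were drawn
    splits = []
    for test_subjects in folds:
        test = [s for s in sample if s in test_subjects]
        train = [s for s in sample if s not in test_subjects]
        splits.append((train, test))
    return sample, splits

rng = random.Random(3)
sample, splits = bootstrap_cv_folds(n_subjects=8, k=3, rng=rng)
for train, test in splits:
    print("train:", train, "test:", test)
```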

## Results objects

A results object contains all information about the analysis that requires substantial computation time. It is intended to be passed directly to visualization functions, test functions, etc., so you should rarely need to consult its contents directly. They remain accessible nonetheless.

The results object contains the following information:

- `cv_method`: a string specifying the inference method used
- `diff_var`: variance estimates for all pairwise model differences, as a 2D numpy array
- `dof`: degrees of freedom for t-tests, namely the number of levels of the smaller of the factors that generalization is attempted over, minus 1. For a dataset with 20 stimuli and 10 subjects, this would be 9 for generalization over both or over subjects only, and 19 for generalization over stimuli only.
- `evaluations`: all evaluation values computed, as an up to 4-dimensional numpy array (bootstrap samples x models x RDM cross-validation folds x pattern cross-validation folds)
- `method`: the RDM similarity measure used for evaluation
- `model_var`: variance estimate for each model
- `n_model`: the number of models evaluated
- `noise_ceiling`: noise ceiling estimate
- `noise_ceil_var`: variance estimate for the noise ceiling
- `variances`: internal covariance matrix over models and the noise ceiling. Usually, `model_var`, `diff_var`, and `noise_ceil_var`, which are derived from this matrix, are meant for user access.
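As an illustration of how the `dof` rule and the variance of a model difference combine in a t-test, the sketch below computes the degrees of freedom for the worked example above and a t statistic from hypothetical numbers. It works on plain scalars instead of a results object, and the helper names are hypothetical. A p-value would then follow from a Student t distribution with `dof` degrees of freedom, for example via `scipy.stats.t.sf`.

```python
import math

def dof_rule(n_subjects, n_conditions, over_subjects, over_conditions):
    """Levels of the smaller factor that generalization is attempted over, minus 1."""
    levels = []
    if over_subjects:
        levels.append(n_subjects)
    if over_conditions:
        levels.append(n_conditions)
    return min(levels) - 1

def model_difference_t(eval_a, eval_b, diff_var):
    """t statistic for the difference of two mean model evaluations.

    diff_var is the estimated variance of (eval_a - eval_b), i.e. the
    corresponding entry of a diff_var-style array.
    """
    return (eval_a - eval_b) / math.sqrt(diff_var)

# The worked example from the text: 10 subjects, 20 stimuli
print(dof_rule(10, 20, True, True))   # 9
print(dof_rule(10, 20, True, False))  # 9
print(dof_rule(10, 20, False, True))  # 19

# Hypothetical mean evaluations and variance of their difference
t = model_difference_t(0.42, 0.35, 0.0004)
print(round(t, 3))  # 3.5
```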