Defining the data set

The first step in an RSA is to bring the data into the correct format. The RSA toolbox uses its own class, rsatoolbox.data.Dataset. The main content of a dataset object is a measurements-by-channels matrix of measured data. In addition, it can hold descriptor variables for the measurements, the channels, and the dataset as a whole, which are passed as Python dictionaries.

The simplest way to create a dataset object is from a numpy array that already has this layout, with measurements in rows and channels in columns. You can then pass the array directly to the Dataset constructor. For example, the following code creates a dataset with 10 random observations of 6 channels:

import numpy, rsatoolbox
data = rsatoolbox.data.Dataset(numpy.random.rand(10, 6))

To add descriptors to the dataset, define a dictionary whose values are lists with one entry for each measurement or channel. As an example, the following variation of the code above adds an observation descriptor that records which of 5 stimuli each of the 10 measurements belongs to, and a channel descriptor that labels each channel as ‘l’ or ‘r’ for left and right:

import numpy, rsatoolbox
side = ['l', 'l', 'l', 'r', 'r', 'r']
stimulus = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
data = rsatoolbox.data.Dataset(
    numpy.random.rand(10, 6),
    channel_descriptors={'side': side},
    obs_descriptors={'stimulus': stimulus})

These descriptors are used by downstream processing to define how the measurements are combined into RDMs, and they can also be used to manipulate the data before RDM creation. It is therefore convenient to add all the meta-information you might need to the dataset object.
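
To make concrete what it means for a descriptor to define how measurements are combined, here is a minimal sketch in plain numpy. It is not the toolbox's own implementation: it simply averages the measurements that share a stimulus label and then computes squared Euclidean distances between the resulting patterns. The variable names and the choice of distance are assumptions made for illustration.

```python
import numpy

rng = numpy.random.default_rng(0)            # seeded so the sketch is reproducible
measurements = rng.random((10, 6))           # 10 measurements x 6 channels
stimulus = numpy.array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4])

# Average the measurements that share a stimulus label: one pattern per stimulus.
conditions = numpy.unique(stimulus)
patterns = numpy.stack(
    [measurements[stimulus == c].mean(axis=0) for c in conditions])

# Squared Euclidean distance between every pair of stimulus patterns
# (an illustrative dissimilarity, chosen for this sketch only).
diff = patterns[:, None, :] - patterns[None, :, :]
rdm = (diff ** 2).sum(axis=-1)

print(patterns.shape)  # one row per stimulus, one column per channel
print(rdm.shape)       # stimuli x stimuli
```

The point of the sketch is that the stimulus descriptor, not the row order, decides which measurements belong to the same condition.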

To manipulate datasets, have a look at the following methods of the dataset object: sort_by, split_channel, split_obs, subset_channel, subset_obs.
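
The ideas behind these operations can be sketched on a bare numpy array. The snippet below does not call the toolbox and does not show the methods' signatures; it only illustrates, under the assumption of the same 'side' and 'stimulus' descriptors as above, what subsetting, splitting, and sorting by a descriptor amount to. The dataset methods additionally keep the descriptors attached to the selected data.

```python
import numpy

rng = numpy.random.default_rng(0)
measurements = rng.random((10, 6))           # 10 measurements x 6 channels
side = numpy.array(['l', 'l', 'l', 'r', 'r', 'r'])
stimulus = numpy.array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4])

# Subsetting channels: keep only the columns whose 'side' label is 'l'.
left = measurements[:, side == 'l']

# Subsetting observations: keep only the rows recorded for stimulus 0 or 1.
first_two = measurements[numpy.isin(stimulus, [0, 1])]

# Splitting observations: one block of rows per unique stimulus value.
by_stimulus = {int(s): measurements[stimulus == s]
               for s in numpy.unique(stimulus)}

# Sorting: reorder rows so that equal stimulus labels are adjacent.
order = numpy.argsort(stimulus, kind='stable')
sorted_measurements = measurements[order]
sorted_stimulus = stimulus[order]
```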

Datasets can also be created from (and converted to) DataFrame objects from the pandas library:

df = data_in.to_DataFrame()
data_out = Dataset.from_DataFrame(df)

Dataset objects can also be saved to hdf5 files using their save method and loaded again with the rsatoolbox.data.load_dataset function:

data.save('test.hdf5')
data_loaded = rsatoolbox.data.load_dataset('test.hdf5')

Temporal data sets

Datasets with a temporal dimension are represented by the class rsatoolbox.data.TemporalDataset, which is a subclass of rsatoolbox.data.Dataset. The main difference is that a TemporalDataset expects measurements of shape (n_observations, n_channels, n_timepoints) and has descriptors for the temporal dimension (time_descriptors).

As an example, assume we have measured data from 10 trials, each with six EEG channels and a time course of 2 s (from -0.5 to 1.5 seconds, with stimulus onset at 0 seconds).

import numpy, rsatoolbox

channel_names = ['Oz', 'O1', 'O2', 'PO3', 'PO4', 'POz']  # channel names
stimulus = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4] # stimulus idx, each stimulus was presented twice

sampling_rate = 30 # in Hz
t = numpy.arange(-.5, 1.5, 1/sampling_rate) # time vector

n_observations = len(stimulus)
n_channels = len(channel_names)
n_timepoints = len(t)

measurements = numpy.random.randn(n_observations, n_channels, n_timepoints)  # random data

data = rsatoolbox.data.TemporalDataset(
    measurements,
    channel_descriptors={'names': channel_names},
    obs_descriptors={'stimulus': stimulus},
    time_descriptors={'time': t}
    )

Beyond the data manipulation methods inherited from rsatoolbox.data.Dataset, the rsatoolbox.data.TemporalDataset class provides the following methods: split_time, subset_time, bin_time, convert_to_dataset.
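
As a rough illustration of what selecting a time window and binning over time mean for data of this shape, the following plain-numpy sketch works directly on a (n_observations, n_channels, n_timepoints) array. It does not use the TemporalDataset methods, whose arguments are not shown here. The window limits and bin_size are hypothetical values, and the time vector is built from integer sample indices so that the number of samples is exactly 60.

```python
import numpy

sampling_rate = 30                            # in Hz, as in the example above
n_timepoints = 2 * sampling_rate              # 2 s of data -> 60 samples
# Build the time vector from integer sample indices (an assumption of this
# sketch) so that its length does not depend on floating-point step rounding.
t = -0.5 + numpy.arange(n_timepoints) / sampling_rate

rng = numpy.random.default_rng(0)
measurements = rng.standard_normal((10, 6, n_timepoints))

# Selecting a time window: keep samples from stimulus onset up to 0.5 s.
window = (t >= 0.0) & (t <= 0.5)
post_onset = measurements[:, :, window]

# Binning over time: average consecutive groups of bin_size samples.
bin_size = 10                                 # hypothetical bin width in samples
binned = measurements.reshape(
    10, 6, n_timepoints // bin_size, bin_size).mean(axis=-1)
bin_times = t.reshape(-1, bin_size).mean(axis=-1)   # centre time of each bin

print(post_onset.shape)
print(binned.shape)
```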