Defining the data set
The first step in an RSA is to bring the data into the correct format. The RSA toolbox uses its own class for this, rsatoolbox.data.Dataset.
The main content of a dataset object is a measurements-by-channels matrix of data. In addition, it can store descriptor variables for the measurements, for the channels, and for the dataset as a whole. These descriptors are passed as Python dictionaries.
The simplest way to create a dataset object is from a numpy array of shape (n_observations, n_channels), which you pass directly to the Dataset constructor. For example, the following code creates a dataset with 10 random observations of 6 channels:
import numpy, rsatoolbox
data = rsatoolbox.data.Dataset(numpy.random.rand(10, 6))
To add descriptors to the dataset, define a dictionary whose values are lists with one entry per measurement (for obs_descriptors) or one entry per channel (for channel_descriptors). The following variation of the code above adds a descriptor recording that the 10 measurements come from 5 stimuli, and which measurement belongs to which stimulus. It also labels the measurement channels as 'l' (left) or 'r' (right):
import numpy, rsatoolbox
side = ['l', 'l', 'l', 'r', 'r', 'r']  # one entry per channel (6)
stimulus = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]  # one entry per measurement (10)
data = rsatoolbox.data.Dataset(
    numpy.random.rand(10, 6),
    channel_descriptors={'side': side},
    obs_descriptors={'stimulus': stimulus})
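Each descriptor list has to line up with one axis of the data. Entries in obs_descriptors correspond to the rows (measurements) and entries in channel_descriptors correspond to the columns (channels). The sketch below spells out this requirement with plain numpy. check_descriptor_lengths is a hypothetical helper written for illustration and is not a function of rsatoolbox.

```python
import numpy


def check_descriptor_lengths(measurements, obs_descriptors, channel_descriptors):
    """Hypothetical helper (not part of rsatoolbox): verify that every
    descriptor list has one entry per row (observation) or per column
    (channel) of a 2D measurements array."""
    n_obs, n_channels = measurements.shape
    for name, values in obs_descriptors.items():
        if len(values) != n_obs:
            raise ValueError(
                f"obs descriptor {name!r} has {len(values)} entries, expected {n_obs}")
    for name, values in channel_descriptors.items():
        if len(values) != n_channels:
            raise ValueError(
                f"channel descriptor {name!r} has {len(values)} entries, expected {n_channels}")
    return True


measurements = numpy.random.rand(10, 6)
side = ['l', 'l', 'l', 'r', 'r', 'r']        # one entry per channel (6)
stimulus = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]     # one entry per measurement (10)
check_descriptor_lengths(measurements, {'stimulus': stimulus}, {'side': side})
```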
Downstream processing uses these descriptors to define how the measurements are combined into RDMs. They can also be used to manipulate the data before the RDMs are created. It is therefore convenient to add all the meta-information you might need to the dataset object.
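The sketch below is a simplified illustration of how an observation descriptor defines the conditions of an RDM. It averages the measurements that share a 'stimulus' value and then computes squared Euclidean distances between the resulting stimulus patterns. The code uses plain numpy and a hypothetical helper, rdm_by_descriptor. This is not how rsatoolbox computes RDMs internally, and the toolbox provides its own set of dissimilarity measures.

```python
import numpy


def rdm_by_descriptor(measurements, descriptor):
    """Hypothetical sketch: average the rows that share a descriptor value,
    then return squared Euclidean distances between the averaged patterns."""
    descriptor = numpy.asarray(descriptor)
    conditions = numpy.unique(descriptor)  # sorted unique condition labels
    # One mean pattern per condition: shape (n_conditions, n_channels)
    patterns = numpy.stack([measurements[descriptor == c].mean(axis=0)
                            for c in conditions])
    # Pairwise differences via broadcasting: (n_cond, 1, n_ch) - (1, n_cond, n_ch)
    diff = patterns[:, None, :] - patterns[None, :, :]
    rdm = (diff ** 2).sum(axis=-1)  # squared Euclidean distance per pair
    return conditions, rdm


measurements = numpy.random.rand(10, 6)
stimulus = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
conditions, rdm = rdm_by_descriptor(measurements, stimulus)
print(conditions)   # [0 1 2 3 4]
print(rdm.shape)    # (5, 5)
```

Each stimulus was measured twice, so every row of the 5 x 5 matrix compares two averaged patterns rather than single measurements.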
To manipulate datasets, have a look at the methods of the dataset object: sort_by, split_channel, split_obs, subset_channel and subset_obs.
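Roughly speaking, subsetting keeps the observations (or channels) whose descriptor matches a given value. Splitting produces one piece per distinct descriptor value. The plain-numpy sketch below imitates that selection logic on arrays like those in the example above. It does not call or reproduce the toolbox methods, which work on Dataset objects, carry the descriptors along, and have their own signatures.

```python
import numpy

measurements = numpy.random.rand(10, 6)
side = numpy.array(['l', 'l', 'l', 'r', 'r', 'r'])      # channel descriptor
stimulus = numpy.array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4])  # observation descriptor

# Subset of observations: keep only the rows whose 'stimulus' value is 2
subset = measurements[stimulus == 2]

# Split by channels: one array per distinct 'side' value
split = {str(value): measurements[:, side == value] for value in numpy.unique(side)}

print(subset.shape)                            # (2, 6)
print({k: v.shape for k, v in split.items()})  # {'l': (10, 3), 'r': (10, 3)}
```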
Datasets can also be converted to, and created from, pandas DataFrame objects:
df = data.to_DataFrame()
data_out = rsatoolbox.data.Dataset.from_DataFrame(df)
Dataset objects can also be saved to HDF5 files using their save method, and loaded again with the rsatoolbox.data.load_dataset function:
data.save('test.hdf5')
data_loaded = rsatoolbox.data.load_dataset('test.hdf5')
Temporal data sets
Datasets with a temporal dimension are represented by the class rsatoolbox.data.TemporalDataset, which is a subclass of rsatoolbox.data.Dataset. The main difference is that a TemporalDataset expects measurements of shape (n_observations, n_channels, n_timepoints). It also has descriptors for the temporal dimension (time_descriptors).
As an example, assume we measured data in 10 trials, each with six EEG channels and a time course of 2 s (from -0.5 to 1.5 s, with stimulus onset at 0 s):
import numpy, rsatoolbox
channel_names = ['Oz', 'O1', 'O2', 'PO3', 'PO4', 'POz'] # channel names
stimulus = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4] # stimulus idx, each stimulus was presented twice
sampling_rate = 30 # in Hz
t = numpy.arange(-.5, 1.5, 1/sampling_rate) # time vector
n_observations = len(stimulus)
n_channels = len(channel_names)
n_timepoints = len(t)
measurements = numpy.random.randn(n_observations, n_channels, n_timepoints) # random data
data = rsatoolbox.data.TemporalDataset(
    measurements,
    channel_descriptors={'names': channel_names},
    obs_descriptors={'stimulus': stimulus},
    time_descriptors={'time': t}
)
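The window in this example is 2 s long and sampled at 30 Hz, which gives 60 time points. The measurements therefore have shape (10, 6, 60), and the time descriptor has one entry per time point. The sketch below makes this arithmetic explicit. It builds the time vector from an integer sample count instead of from numpy.arange with a fractional step, because a floating-point step can in general yield one sample more or fewer than intended. The names t_start and duration are introduced here for illustration only.

```python
import numpy

sampling_rate = 30                  # in Hz
t_start, duration = -0.5, 2.0       # window from -0.5 s to 1.5 s
n_timepoints = int(round(duration * sampling_rate))  # 2 s at 30 Hz = 60 samples

# Integer-indexed time vector: exactly n_timepoints samples,
# the last one at 1.5 - 1/30 s (the stop value itself is excluded)
t = t_start + numpy.arange(n_timepoints) / sampling_rate

measurements = numpy.random.randn(10, 6, n_timepoints)
print(len(t), measurements.shape)   # 60 (10, 6, 60)
```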
In addition to the data-manipulation methods inherited from rsatoolbox.data.Dataset, the rsatoolbox.data.TemporalDataset class provides the following methods: split_time, subset_time, bin_time and convert_to_dataset.
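As a rough mental model of restricting and binning along the time axis, the numpy sketch below does two things. First, it keeps the samples inside a time window. Second, it averages consecutive samples within non-overlapping bins. This is an illustration of the idea only. The toolbox methods operate on TemporalDataset objects and have their own signatures and binning rules, so check their docstrings for the exact behaviour.

```python
import numpy

sampling_rate = 30
t = -0.5 + numpy.arange(60) / sampling_rate          # time vector in seconds
measurements = numpy.random.randn(10, 6, t.size)     # (observations, channels, time)

# Time window: keep samples from stimulus onset (0 s) up to and including 1.0 s
window = (t >= 0.0) & (t <= 1.0)
subset = measurements[:, :, window]

# Time binning: average non-overlapping bins of 3 samples (100 ms at 30 Hz);
# trailing samples that do not fill a whole bin would be dropped
samples_per_bin = 3
n_bins = t.size // samples_per_bin                    # 20 bins
n_used = n_bins * samples_per_bin
binned = measurements[:, :, :n_used].reshape(
    measurements.shape[0], measurements.shape[1], n_bins, samples_per_bin).mean(axis=-1)
t_binned = t[:n_used].reshape(n_bins, samples_per_bin).mean(axis=-1)  # bin centres

print(subset.shape, binned.shape)                     # (10, 6, 31) (10, 6, 20)
```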