.. _datasets: Defining the data set ===================== The first step in an RSA is to bring the data into the correct format. The RSA toolbox uses an own Class, ``rsatoolbox.data.Dataset``. The main content of such a dataset object is a measurement by channel matrix of measured data. Additionally it allows for descriptor variables for the measurements, channels and the whole data object, which are added as python dictionaries. The simplest method for generating a dataset object is based on a numpy array of data in the right format. Then you can simply call the `Dataset` constructor to generate the object. For example, the following code creates a dataset with 10 random observations of 6 channels: .. code-block:: python import numpy, rsatoolbox data = rsatoolbox.data.Dataset(numpy.random.rand(10, 6)) To add descriptors to the dataset, we need to define a dictionary of them with lists with one entry for each measurement of channel. As an example, the following variation of the code above adds a descriptor which says that the 10 measurements were taken from 5 stimuli and which ones correspond to which stimulus and adds a label 'l' vs. 'r' for left and right measurement channels: .. code-block:: python import numpy, rsatoolbox side = ['l', 'l', 'l', 'r', 'r', 'r'] stimulus = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4] data = rsatoolbox.data.Dataset( numpy.random.rand(10, 6), channel_descriptors={'side': side}, obs_descriptors={'stimulus': stimulus}) These descriptors are used by donwnstream processing of the data to define how the measurements are combined into RDMs and can be used for manipulating the data before RDM creation as well. It is thus convenient to add all meta-information you might need to the dataset object. To manipulate the datasets, have a look at the functions of the dataset object ``sort_by``, ``split_channel``, ``split_obs``, ``subset_channel``, ``subset_obs``. Datasets can also be created (and converted to) DataFrame objects from the pandas library: .. code-block:: python df = data_in.to_DataFrame() data_out = Dataset.from_DataFrame(df) The dataset objects can also be saved to hdf5 files using their method ``save`` as in and loaded with the ``rsatoolbox.data.load_dataset`` function: .. code-block:: python data.save('test.hdf5') data_loaded = rsatoolbox.data.load_dataset('test.hdf5') .. _TemporalDatasets: Temporal data sets -------------------- Datasets with a temporal dimension are represented by the class ``rsatoolbox.data.TemporalDataset``. This class is a subclass of the ``rsatoolbox.data.Dataset`` class. The main difference is that the TemporalDataset expects ``measurements`` of shape ``(n_observations, n_channels, n_timepoints)`` and has descriptors for the temporal dimension (``time_descriptor``). As an example, we assume to have measured data from 10 trials, each with six EEG channels and a timecourse of 2s (from -.5 to 1.5 seconds, stimulus onset at 0 seconds). .. code-block:: python import numpy, rsatoolbox channel_names = ['Oz', 'O1', 'O2', 'PO3', 'PO4', 'POz'] # channel names stimulus = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4] # stimulus idx, each stimulus was presented twice sampling_rate = 30 # in Hz t = numpy.arange(-.5, 1.5, 1/sampling_rate) # time vector n_observations = len(stimulus) n_channels = len(channel_names) n_timepoints = len(t) measurements = numpy.random.randn(n_observations, n_channels, n_timepoints) # random data data = rsatoolbox.data.TemporalDataset( measurements, channel_descriptors={'names': channel_names}, obs_descriptors={'stimulus': stimulus}, time_descriptors={'time': t} ) Beyond the functions to manipulate the data provided by ``rsatoolbox.data.Dataset``, the ``rsatoolbox.data.TemporalDataset`` class provides the following functions: ``split_time``, ``subset_time``, ``bin_time``, ``convert_to_dataset``.