rsatoolbox.data.dataset module¶
Definition of RSA Dataset class and TemporalDataset
@author: baihan, jdiedrichsen, bpeters, adkipnis
- class rsatoolbox.data.dataset.Dataset(measurements, descriptors=None, obs_descriptors=None, channel_descriptors=None, check_dims=True)[source]¶
Bases:
DatasetBase
The Dataset class is the standard version of DatasetBase. It contains one data set, or multiple data sets with the same structure.
- copy() Dataset [source]¶
Return a copy of this object, with all properties equal to the original’s
- Returns
Value copy
- Return type
Dataset
- static from_df(df: DataFrame, channels: Optional[List] = None, channel_descriptor: Optional[str] = None) Dataset [source]¶
Create a Dataset from a Pandas DataFrame
Float columns are interpreted as channels, and their names are stored as a channel descriptor “name”. Columns of any other datatype are interpreted as observation descriptors, unless they hold the same value throughout, in which case they are interpreted as Dataset descriptors.
- Parameters
df (DataFrame) – a long-format DataFrame
channels (list) – list of column names to interpret as channels. By default all float columns are considered channels.
channel_descriptor (str) – Name of the channel descriptor to create on the Dataset which contains the column names. Default is “name”.
- Returns
Dataset representing the data from the DataFrame
- Return type
Dataset
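The column rules described for from_df can be illustrated without pandas. The following is a minimal sketch under stated assumptions, not the library's implementation: `classify_columns` and the plain-dict `table` are hypothetical stand-ins for the real method and a DataFrame.

```python
def classify_columns(table):
    """Sort columns into channels, observation descriptors and
    Dataset descriptors, following the rules described for from_df.

    `table` maps column name -> list of values (a hypothetical,
    pandas-free stand-in for a DataFrame).
    """
    channels, obs_descriptors, descriptors = [], {}, {}
    for name, values in table.items():
        if all(isinstance(v, float) for v in values):
            channels.append(name)            # float column -> channel
        elif len(set(values)) == 1:
            descriptors[name] = values[0]    # constant column -> Dataset descriptor
        else:
            obs_descriptors[name] = values   # anything else -> observation descriptor
    return channels, obs_descriptors, descriptors


table = {
    "roi_a": [0.1, 0.4, 0.3],
    "roi_b": [1.2, 0.9, 1.1],
    "condition": ["face", "house", "face"],
    "subject": ["s01", "s01", "s01"],
}
channels, obs_descriptors, descriptors = classify_columns(table)
```

In this toy table the two float columns become channels, "condition" varies across rows and becomes an observation descriptor, and the constant "subject" column becomes a Dataset descriptor.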
- get_measurements_tensor(by)[source]¶
Returns a tensor version of the measurements array, split by an observation descriptor. This procedure will keep the order of measurements the same as it is in the dataset.
- Parameters
by (String) – the descriptor by which the splitting is made
- Returns
n_obs_rest x n_channel x n_obs_by 3d-array, where n_obs_by is the number of unique values that the obs_descriptor “by” takes, and n_obs_rest is the remaining number of observations per unique instance of “by”
- Return type
measurements_tensor (numpy.ndarray)
- nested_odd_even_split(l1_obs_desc, l2_obs_desc)[source]¶
Nested version of odd_even_split, where dataset is first partitioned according to the l1_obs_desc and each partition is again partitioned according to the l2_obs_desc (after which the actual oe-split occurs).
Useful for balancing, especially if the order of your measurements is inconsistent, or if the two descriptors are not orthogonalized. It’s advised to apply .sort_by(l2_obs_desc) to the output of this function.
- Parameters
l1_obs_desc (str) – Observation descriptor, basis for level 1 partitioning (must be contained in the keys of dataset.obs_descriptors)
l2_obs_desc (str) – Observation descriptor, basis for level 2 partitioning (must be contained in the keys of dataset.obs_descriptors)
- Returns
- odd_split (Dataset):
subset of the Dataset with odd list-indices after partitioning according to l1_obs_desc and l2_obs_desc
- even_split (Dataset):
subset of the Dataset with even list-indices after partitioning according to l1_obs_desc and l2_obs_desc
- odd_even_split(obs_desc)[source]¶
Perform a simple odd-even split on an rsatoolbox dataset. The dataset is partitioned into n different datasets, where n is the number of distinct values in dataset.obs_descriptors[obs_desc]. The resulting list is split into odd-indexed and even-indexed subsets. The datasets contained in each subset are then merged.
- Parameters
obs_desc (str) – Observation descriptor, basis for partitioning (must be contained in the keys of dataset.obs_descriptors)
- Returns
- odd_split (Dataset):
subset of the Dataset with odd list-indices after partitioning according to obs_desc
- even_split (Dataset):
subset of the Dataset with even list-indices after partitioning according to obs_desc
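The partition, split and merge steps of odd_even_split can be sketched in plain Python. This is an illustration only, not the library's code: `odd_even_split_sketch` is a hypothetical helper, rows stand in for observations, and both the ordering of distinct values and the choice of which half is called "odd" are assumptions of the sketch.

```python
def odd_even_split_sketch(rows, labels):
    """Partition rows by label, then alternate partitions between two halves."""
    # Distinct label values in order of first appearance (an assumption
    # of this sketch; the library may order them differently).
    distinct = list(dict.fromkeys(labels))
    partitions = [
        [row for row, lab in zip(rows, labels) if lab == value]
        for value in distinct
    ]
    # Assumption: the 1st, 3rd, ... partitions (list positions 0, 2, ...)
    # form the "odd" half; the remaining partitions form the "even" half.
    odd = [row for part in partitions[0::2] for row in part]
    even = [row for part in partitions[1::2] for row in part]
    return odd, even


rows = ["r0", "r1", "r2", "r3", "r4", "r5"]
runs = [1, 2, 1, 3, 2, 4]   # e.g. a "run" observation descriptor
odd, even = odd_even_split_sketch(rows, runs)
```

Because whole partitions are assigned to one half or the other, all observations that share a descriptor value (here, a run) always end up on the same side of the split.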
- sort_by(by)[source]¶
Sorts the dataset by a given observation descriptor
- Parameters
by (String) – the descriptor by which the dataset shall be sorted
- Returns
None; the dataset is sorted in place
- split_channel(by)[source]¶
Returns a list of Datasets split by channels
- Parameters
by (String) – the descriptor by which the split is done
- Returns
list of Datasets, split by the selected channel_descriptor
- split_obs(by)[source]¶
Returns a list of Datasets split by observations
- Parameters
by (String) – the descriptor by which the splitting is made
- Returns
list of Datasets, split by the selected obs_descriptor
- subset_channel(by, value)[source]¶
Returns a subset of the Dataset defined by a certain channel descriptor value
- Parameters
by (String) – the descriptor by which the subset selection is made from channel dimension
value – the value by which the subset selection is made from channel dimension
- Returns
Dataset, with subset defined by the selected channel_descriptor
- subset_obs(by, value)[source]¶
Returns a subset of the Dataset defined by a certain observation descriptor value
- Parameters
by (String) – the descriptor by which the subset selection is made from obs dimension
value – the value by which the subset selection is made from obs dimension
- Returns
Dataset, with subset defined by the selected obs_descriptor
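Subsetting by an observation descriptor amounts to selecting the matching rows and keeping every observation descriptor aligned with them. The sketch below shows that idea on plain lists; `subset_obs_sketch` is a hypothetical helper, not the library method, and it assumes `value` is a single value.

```python
def subset_obs_sketch(measurements, obs_descriptors, by, value):
    """Keep the observations whose descriptor `by` equals `value`."""
    keep = [i for i, v in enumerate(obs_descriptors[by]) if v == value]
    sub_measurements = [measurements[i] for i in keep]
    # Every observation descriptor is subset with the same indices,
    # so rows and descriptors stay aligned.
    sub_descriptors = {
        key: [values[i] for i in keep]
        for key, values in obs_descriptors.items()
    }
    return sub_measurements, sub_descriptors


measurements = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # n_obs x n_channel
obs_descriptors = {"condition": ["face", "house", "face"], "run": [1, 1, 2]}
faces, face_desc = subset_obs_sketch(
    measurements, obs_descriptors, "condition", "face"
)
```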
- to_df(channel_descriptor: Optional[str] = None) DataFrame [source]¶
Returns a Pandas DataFrame representing this Dataset
Channels, observation descriptors and Dataset descriptors make up the columns. Rows represent observations.
Note that channel descriptors beyond the one used for the column names will not be represented.
- Parameters
channel_descriptor – Which channel descriptor to use to label the data columns in the DataFrame. Defaults to the first channel descriptor.
- Returns
A pandas DataFrame representing the Dataset
- Return type
DataFrame
- class rsatoolbox.data.dataset.TemporalDataset(measurements, descriptors=None, obs_descriptors=None, channel_descriptors=None, time_descriptors=None, check_dims=True)[source]¶
Bases:
Dataset
TemporalDataset for spatio-temporal datasets
- Parameters
measurements (numpy.ndarray) – n_obs x n_channel x n_time 3d-array
descriptors (dict) – descriptors (metadata)
obs_descriptors (dict) – observation descriptors (all are array-like with shape = (n_obs,…))
channel_descriptors (dict) – channel descriptors (all are array-like with shape = (n_channel,…))
time_descriptors (dict) –
time descriptors (all are array-like with shape = (n_time,…))
time_descriptors needs to contain one key ‘time’ that specifies the time-coordinate. If None is provided, ‘time’ is set to (0, 1, …, n_time-1)
- Returns
dataset object
- bin_time(by, bins)[source]¶
Returns a TemporalDataset object with time-binned data.
- Parameters
bins (array-like) – list of bins, with bins[i] containing the vector of time-points for the i-th bin
- Returns
a single TemporalDataset object. Data is averaged within time-bins. The ‘time’ descriptor is set to the average of the binned time-points.
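The binning rule (average the data within each bin, and set the new time coordinate to the mean of the binned time-points) can be sketched for a single time course. `bin_time_sketch` is a hypothetical helper for illustration only; the real method operates on the full n_obs x n_channel x n_time array.

```python
def bin_time_sketch(series, times, bins):
    """Average one time course within bins of time-points.

    series : values over time for a single observation and channel
    times  : time coordinate of each value
    bins   : bins[i] lists the time-points that belong to the i-th bin
    """
    binned_values, binned_times = [], []
    for bin_points in bins:
        idx = [times.index(t) for t in bin_points]
        # Data is averaged within the bin ...
        binned_values.append(sum(series[i] for i in idx) / len(idx))
        # ... and the new time coordinate is the mean of the binned time-points.
        binned_times.append(sum(bin_points) / len(bin_points))
    return binned_values, binned_times


times = [0, 10, 20, 30]
series = [1.0, 3.0, 5.0, 9.0]
bins = [[0, 10], [20, 30]]
binned_values, binned_times = bin_time_sketch(series, times, bins)
```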
- convert_to_dataset(by)[source]¶
Converts to a Dataset in long format. The time dimension is absorbed into the observation dimension.
- Parameters
by (String) – the descriptor which indicates the time dimension in the time_descriptor
- Returns
Dataset
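Absorbing the time dimension into the observation dimension means that each (observation, time-point) pair becomes its own row, with the time coordinate stored as an additional observation descriptor. The sketch below shows this on nested lists. `to_long_format` is a hypothetical helper, and the row order (all observations for the first time-point, then the next) is an assumption of the sketch, not a documented guarantee.

```python
def to_long_format(measurements, conditions, times):
    """Flatten n_obs x n_channel x n_time data to (n_obs * n_time) x n_channel."""
    rows, condition_col, time_col = [], [], []
    for t_idx, t in enumerate(times):
        for obs_idx, obs in enumerate(measurements):
            # One row per (time-point, observation): the value of every
            # channel at this time-point.
            rows.append([channel[t_idx] for channel in obs])
            condition_col.append(conditions[obs_idx])
            time_col.append(t)
    return rows, {"condition": condition_col, "time": time_col}


measurements = [            # 2 observations x 2 channels x 3 time-points
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
rows, long_descriptors = to_long_format(measurements, ["a", "b"], [0, 1, 2])
```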
- copy() TemporalDataset [source]¶
Return a copy of this object, with all properties equal to the original’s
- Returns
Value copy
- Return type
TemporalDataset
- sort_by(by)[source]¶
Sorts the dataset by a given observation descriptor
- Parameters
by (String) – the descriptor by which the dataset shall be sorted
- Returns
None; the dataset is sorted in place
- split_channel(by)[source]¶
Returns a list of TemporalDataset split by channels
- Parameters
by (String) – the descriptor by which the splitting is made
- Returns
list of TemporalDataset, split by the selected channel_descriptor
- split_obs(by)[source]¶
Returns a list of TemporalDataset split by observations
- Parameters
by (String) – the descriptor by which the splitting is made
- Returns
list of TemporalDataset, split by the selected obs_descriptor
- split_time(by)[source]¶
Returns a list of TemporalDataset split by time
- Parameters
by (String) – the descriptor by which the splitting is made
- Returns
list of TemporalDataset, split by the selected time_descriptor
- subset_channel(by, value)[source]¶
Returns a subsetted TemporalDataset defined by a certain channel descriptor value
- Parameters
by (String) – the descriptor by which the subset selection is made from channel dimension
value – the value by which the subset selection is made from channel dimension
- Returns
TemporalDataset, with subset defined by the selected channel_descriptor
- subset_obs(by, value)[source]¶
Returns a subsetted TemporalDataset defined by a certain observation descriptor value
- Parameters
by (String) – the descriptor by which the subset selection is made from obs dimension
value – the value by which the subset selection is made from obs dimension
- Returns
TemporalDataset, with subset defined by the selected obs_descriptor
- subset_time(by, t_from, t_to)[source]¶
Returns a subsetted TemporalDataset with time between t_from and t_to
- Parameters
by (String) – the descriptor by which the subset selection is made from the time dimension
t_from – time-point from which onwards data should be subsetted
t_to – time-point until which data should be subsetted
- Returns
TemporalDataset, with subset defined by the selected time_descriptor
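Selecting a time window can be sketched as filtering the time coordinate and keeping the matching samples. `subset_time_sketch` is a hypothetical helper working on a single time course, and treating both t_from and t_to as inclusive bounds is an assumption of this sketch.

```python
def subset_time_sketch(series, times, t_from, t_to):
    """Keep the samples whose time coordinate lies in [t_from, t_to]."""
    # Assumption: both ends of the window are inclusive.
    keep = [i for i, t in enumerate(times) if t_from <= t <= t_to]
    return [series[i] for i in keep], [times[i] for i in keep]


times = [0, 10, 20, 30, 40]
series = [0.5, 1.5, 2.5, 3.5, 4.5]
window_values, window_times = subset_time_sketch(series, times, 10, 30)
```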
- rsatoolbox.data.dataset.dataset_from_dict(data_dict)[source]¶
Regenerates a Dataset object from its dictionary representation
Currently this function works for Dataset, DatasetBase, and TemporalDataset objects
- Parameters
data_dict (dict) – the dictionary representation
- Returns
the regenerated Dataset
- Return type
data (Dataset)
- rsatoolbox.data.dataset.load_dataset(filename, file_type=None)[source]¶
Loads a Dataset object from disk
- Parameters
filename (String) – path to file to load
- rsatoolbox.data.dataset.merge_subsets(dataset_list)[source]¶
Generate a dataset object from a list of smaller dataset objects (e.g., as generated by the subset_* methods). Assumes that descriptors, channel descriptors and number of channels per observation match.
- Parameters
dataset_list (list) – List containing rsatoolbox datasets
- Returns
rsatoolbox dataset created from all datasets in dataset_list
- Return type
merged_dataset (Dataset)
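Merging is conceptually a concatenation along the observation dimension, which is why the channel count must match across subsets. The following sketch shows that idea on plain dictionaries; `merge_subsets_sketch` and the dict layout are hypothetical stand-ins, not the library's data structures.

```python
def merge_subsets_sketch(subsets):
    """Concatenate observations from subsets that share the same channels."""
    n_channels = len(subsets[0]["measurements"][0])
    merged = {"measurements": [], "obs_descriptors": {}}
    for sub in subsets:
        # The number of channels per observation must match across subsets.
        if any(len(row) != n_channels for row in sub["measurements"]):
            raise ValueError("subsets differ in number of channels")
        merged["measurements"].extend(sub["measurements"])
        # Observation descriptors are concatenated in the same order as the rows.
        for key, values in sub["obs_descriptors"].items():
            merged["obs_descriptors"].setdefault(key, []).extend(values)
    return merged


part_a = {"measurements": [[1.0, 2.0]],
          "obs_descriptors": {"condition": ["face"]}}
part_b = {"measurements": [[3.0, 4.0], [5.0, 6.0]],
          "obs_descriptors": {"condition": ["house", "face"]}}
merged = merge_subsets_sketch([part_a, part_b])
```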