sciope.data package
Submodules
sciope.data.dataset module
Dataset Class
- class sciope.data.dataset.DataSet(name)
Bases: object
Class for defining a dataset for a modeling/optimization/inference run
Properties/variables:
* x (inputs)
* y (targets)
* ts (time series)
* s (summary statistics)
* outlier_column_indices (columns containing outliers)
* size
* configurations (OrderedDict with relevant information)
Methods:
* get_size (returns the current size of the dataset)
* add_points (adds data to the dataset; data can be added incrementally)
* process_outliers (checks for summary stats that contain outliers, and applies log scaling)
* apply_func_to_columns (applies a transformation function to selected column indices of a matrix)
- add_points(inputs=None, targets=None, time_series=None, summary_stats=None)
Updates the dataset to include new points
Parameters:
- inputs : ndarray, optional
Usually parameter points, by default None
- targets : ndarray, optional
The target for inference/optimization/exploration, by default None
- time_series : ndarray, optional
Simulation output trajectories, by default None
- summary_stats : ndarray, optional
The summary statistics, by default None
Raises:
- ValueError
If all arguments are None
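The incremental behaviour described above can be illustrated with a small stand-in. This is only a sketch: `MiniDataSet` is a hypothetical class written for this example with plain Python lists, not sciope's implementation (which works with ndarrays), and its rule for computing the size (the number of stored input points) is an assumption.

```python
class MiniDataSet:
    """Hypothetical stand-in for illustration; not sciope's DataSet."""

    def __init__(self, name):
        self.name = name
        self.x = []   # inputs (parameter points)
        self.y = []   # targets
        self.ts = []  # time series
        self.s = []   # summary statistics

    def add_points(self, inputs=None, targets=None,
                   time_series=None, summary_stats=None):
        # Mirror the documented contract: at least one argument is required.
        supplied = (inputs, targets, time_series, summary_stats)
        if all(arg is None for arg in supplied):
            raise ValueError("at least one argument must not be None")
        # Append whichever blocks were supplied; earlier data is kept.
        for store, block in zip((self.x, self.y, self.ts, self.s), supplied):
            if block is not None:
                store.extend(block)

    def get_size(self):
        # Assumption: the size is the number of input points stored so far.
        return len(self.x)


data = MiniDataSet("demo")
# First batch: one parameter point with its three summary statistics.
data.add_points(inputs=[[0.1, 0.2]], summary_stats=[[1.0, 2.0, 3.0]])
# Second batch: two more points are appended to the existing ones.
data.add_points(inputs=[[0.3, 0.4], [0.5, 0.6]],
                summary_stats=[[1.1, 2.1, 3.1], [0.9, 1.9, 2.9]])
print(data.get_size())
```

Calling `data.add_points()` with no arguments raises `ValueError` in this sketch, matching the documented behaviour.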
- static apply_func_to_columns(func, matrix, idx)
Applies a transformation function to selected column indices of a matrix
Parameters:
- func : callable
The transformation function
- matrix : ndarray
Matrix to be processed
- idx : ndarray
Column indices of the matrix to be transformed
Returns:
- ndarray
The transformed matrix
Raises:
- ValueError
[description]
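As a rough illustration of the column-wise transformation, the sketch below applies a function to selected columns of a matrix stored as nested lists. It is not sciope's code: the real method operates on ndarrays, and because the condition that triggers `ValueError` is not documented, the out-of-range check used here is purely an assumption.

```python
import math


def apply_func_to_columns(func, matrix, idx):
    """Return a copy of `matrix` with `func` applied to the columns in `idx`."""
    n_cols = len(matrix[0]) if matrix else 0
    # Assumption: an out-of-range column index is reported as ValueError.
    for j in idx:
        if not 0 <= j < n_cols:
            raise ValueError(f"column index {j} is out of range")
    selected = set(idx)
    # Transform only the selected columns; copy all other values unchanged.
    return [
        [func(value) if j in selected else value
         for j, value in enumerate(row)]
        for row in matrix
    ]


stats = [[1.0, 10.0, 5.0],
         [2.0, 1000.0, 6.0]]
# Log-scale the second column only; the input matrix is left untouched.
scaled = apply_func_to_columns(math.log10, stats, [1])
print(scaled)
```

Returning a new matrix rather than modifying the input in place is a design choice of this sketch; the source does not say which approach the library takes.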
- get_size()
Returns the current number of points in the dataset
Returns:
- int
The current number of points in the dataset
- process_outliers(mode='zscore')
Checks for outliers in the calculated summary statistics. Outliers are the few very high or very low values that can introduce bias in tasks such as parameter inference. One can remove them, replace them with the mean value, or use a log scale for the statistic in question. This choice is left to the user.
Parameters:
- mode : str, optional
Outlier detection method, either z-score ('zscore') or interquartile range ('iqr'), by default 'zscore'
Returns:
- array
Indices of dataset.s columns containing outliers
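The two detection modes can be sketched with the standard library alone. The function below is a hypothetical helper (`outlier_columns`), not sciope's implementation. Its thresholds, 3 standard deviations for the z-score and 1.5 times the interquartile range, are common conventions assumed for this example, not values confirmed by the source.

```python
import statistics


def outlier_columns(rows, mode="zscore", z_thresh=3.0, iqr_factor=1.5):
    """Return indices of columns that hold at least one outlying value."""
    flagged = []
    for j, col in enumerate(zip(*rows)):  # iterate over columns
        if mode == "zscore":
            mean = statistics.fmean(col)
            std = statistics.pstdev(col)
            # A constant column (std == 0) cannot contain outliers.
            is_out = std > 0 and any(
                abs(v - mean) / std > z_thresh for v in col)
        elif mode == "iqr":
            q1, _, q3 = statistics.quantiles(col, n=4)
            spread = q3 - q1
            low, high = q1 - iqr_factor * spread, q3 + iqr_factor * spread
            is_out = any(v < low or v > high for v in col)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        if is_out:
            flagged.append(j)
    return flagged


# Twenty points with two summary statistics each; one extreme value is
# planted in the second column.
rows = [[float(i % 5), 5.0 + 0.1 * (i % 3)] for i in range(20)]
rows[7][1] = 500.0
print(outlier_columns(rows, mode="zscore"))
print(outlier_columns(rows, mode="iqr"))
```

The returned indices could then be used to transform the flagged columns, for example by log scaling, which is one of the options the description leaves to the user.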