Mixers

Mixers learn to map encoded representations to predictions; they are the core of Lightwood's AutoML.

class mixer.ARIMAMixer(stop_after, target, dtype_dict, horizon, ts_analysis, model_path='statsforecast.StatsForecastAutoARIMA', auto_size=True, sp=None, hyperparam_search=False, use_stl=False)[source]

This mixer is a wrapper around the popular time series library sktime. It exhibits different behavior compared to other forecasting mixers, as it predicts based on indices in a forecasting horizon that is defined with respect to the last seen data point at training time.

Due to this, the mixer tries to “fit_on_all” so that the latest point in the validation split marks the boundary between training data and where forecasts will start. In practice, you need to specify how much time has passed since that timestamp for forecasts to be correct. By default, it is assumed that predictions are for the very next timestamp post-training.

If the task has groups (i.e. ‘TimeseriesSettings.group_by’ is not empty), the mixer will spawn one forecaster object for each group observed at training time, plus an additional default forecaster fit on all data.

There is an optuna-based automatic hyperparameter search. For now, it considers selecting the forecaster type based on the global SMAPE error across all groups.
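The per-group dispatch described above can be sketched in plain Python. The names here (`NaiveForecaster`, `fit_group_forecasters`, the `"__default__"` key) are illustrative stand-ins, not Lightwood's actual internals:

```python
from collections import defaultdict

class NaiveForecaster:
    """Toy forecaster: predicts the last observed value for every horizon step."""
    def fit(self, series):
        self.last = series[-1]
        return self

    def predict(self, horizon):
        return [self.last] * horizon

def fit_group_forecasters(rows):
    """rows: iterable of (group_key, value) pairs, ordered by time.

    Fits one forecaster per observed group, plus a default forecaster
    fit on all data, used as a fallback for groups unseen at training time.
    """
    by_group = defaultdict(list)
    all_values = []
    for group, value in rows:
        by_group[group].append(value)
        all_values.append(value)
    forecasters = {g: NaiveForecaster().fit(v) for g, v in by_group.items()}
    forecasters["__default__"] = NaiveForecaster().fit(all_values)
    return forecasters

models = fit_group_forecasters([("A", 1.0), ("B", 10.0), ("A", 2.0), ("B", 20.0)])
print(models["A"].predict(3))            # [2.0, 2.0, 2.0]
print(models["__default__"].predict(2))  # [20.0, 20.0]
```

At inference time, a row's group key selects its forecaster, falling back to the default for unseen groups.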

Parameters
  • stop_after (float) – time budget in seconds.

  • target (str) – column to forecast.

  • dtype_dict (Dict[str, str]) – dtypes of all columns in the data.

  • horizon (int) – length of forecasted horizon.

  • sp (Optional[int]) – seasonality period to enforce (instead of the automatic inference done in the ts_analysis module).

  • ts_analysis (Dict) – dictionary with miscellaneous time series info, as generated by ‘lightwood.data.timeseries_analyzer’.

  • model_path (str) – sktime forecaster to use as the underlying model(s). Should be a string with format “$module.$class”, where $module is inside sktime.forecasting. Default is ‘statsforecast.StatsForecastAutoARIMA’.

  • hyperparam_search (bool) – bool that indicates whether to perform the hyperparameter tuning or not.

  • auto_size (bool) – whether to filter out old data points if training split is bigger than a certain threshold (defined by the dataset sampling frequency). Enabled by default to avoid long training times in big datasets.

  • use_stl (bool) – Whether to use de-trenders and de-seasonalizers fitted in the timeseries analysis phase.

class mixer.BaseMixer(stop_after)[source]

Base class for all mixers.

Mixers are the backbone of all Lightwood machine learning models. They intake encoded feature representations for every column, and are tasked with learning to fulfill the predictive requirements stated in a problem definition.

Two methods are essential for any mixer to work:
  1. fit() contains all logic to train the mixer with the training data that has been encoded by all the (already trained) Lightwood encoders for any given task.

  2. __call__() is executed to generate predictions once the mixer has been trained using fit().

An additional partial_fit() method is used to update any mixer that has already been trained.
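The fit() / __call__() / partial_fit() lifecycle can be sketched with a minimal self-contained mixer. `MeanMixer` and its list-based data are illustrative stand-ins; real mixers subclass BaseMixer and receive EncodedDs objects:

```python
# Minimal sketch of the mixer lifecycle (fit -> __call__ -> partial_fit).
class MeanMixer:
    stable = True
    supports_proba = False

    def __init__(self, stop_after: float):
        self.stop_after = stop_after  # time budget in seconds
        self._mean = None
        self._n = 0

    def fit(self, train_data, dev_data):
        """train_data/dev_data stand in for EncodedDs; here, lists of target values."""
        self._n = len(train_data)
        self._mean = sum(train_data) / self._n

    def __call__(self, ds):
        """Generate predictions once fitted: one prediction per input row."""
        return [self._mean] * len(ds)

    def partial_fit(self, train_data, dev_data):
        """Update the already-fitted mixer with new data instead of refitting."""
        total = self._mean * self._n + sum(train_data)
        self._n += len(train_data)
        self._mean = total / self._n

m = MeanMixer(stop_after=10.0)
m.fit([1.0, 2.0, 3.0], dev_data=[])
print(m([0, 0]))        # [2.0, 2.0]
m.partial_fit([6.0], dev_data=[])
print(m([0]))           # [3.0]
```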

Class Attributes:
  • stable: If set to True, this mixer should always work. Any mixer with stable=False can be expected to fail under some circumstances.

  • fit_data_len: Length of the training data.

  • supports_proba: For classification tasks, whether the mixer supports yielding per-class scores rather than only returning the predicted label.

Parameters

stop_after (float) – Time budget to train this mixer.

fit(train_data, dev_data)[source]

Fits/trains a mixer with training data.

Parameters
  • train_data (EncodedDs) – encoded representations of the training data subset.

  • dev_data (EncodedDs) – encoded representations of the “dev” data subset. This can be used as an internal validation subset (e.g. it is used for early stopping in the default Neural mixer).

Return type

None

partial_fit(train_data, dev_data)[source]

Partially fits/trains a mixer with new training data. This is a somewhat experimental method, and it aims at updating pre-existing Lightwood predictors.

Parameters
  • train_data (EncodedDs) – encoded representations of the new training data subset.

  • dev_data (EncodedDs) – encoded representations of the new “dev” data subset. As in fit(), this can be used as an internal validation subset.

Return type

None

class mixer.ETSMixer(stop_after, target, dtype_dict, horizon, ts_analysis, model_path='ets.AutoETS', auto_size=True, sp=None, hyperparam_search=False, use_stl=True)[source]

This mixer is a wrapper around the popular time series library sktime. It exhibits different behavior compared to other forecasting mixers, as it predicts based on indices in a forecasting horizon that is defined with respect to the last seen data point at training time.

Due to this, the mixer tries to “fit_on_all” so that the latest point in the validation split marks the boundary between training data and where forecasts will start. In practice, you need to specify how much time has passed since that timestamp for forecasts to be correct. By default, it is assumed that predictions are for the very next timestamp post-training.

If the task has groups (i.e. ‘TimeseriesSettings.group_by’ is not empty), the mixer will spawn one forecaster object for each group observed at training time, plus an additional default forecaster fit on all data.

There is an optuna-based automatic hyperparameter search. For now, it considers selecting the forecaster type based on the global SMAPE error across all groups.

Parameters
  • stop_after (float) – time budget in seconds.

  • target (str) – column to forecast.

  • dtype_dict (Dict[str, str]) – dtypes of all columns in the data.

  • horizon (int) – length of forecasted horizon.

  • sp (Optional[int]) – seasonality period to enforce (instead of the automatic inference done in the ts_analysis module).

  • ts_analysis (Dict) – dictionary with miscellaneous time series info, as generated by ‘lightwood.data.timeseries_analyzer’.

  • model_path (str) – sktime forecaster to use as the underlying model(s). Should be a string with format “$module.$class”, where $module is inside sktime.forecasting. Default is ‘ets.AutoETS’.

  • hyperparam_search (bool) – bool that indicates whether to perform the hyperparameter tuning or not.

  • auto_size (bool) – whether to filter out old data points if training split is bigger than a certain threshold (defined by the dataset sampling frequency). Enabled by default to avoid long training times in big datasets.

  • use_stl (bool) – Whether to use de-trenders and de-seasonalizers fitted in the timeseries analysis phase.

class mixer.LightGBM(stop_after, target, dtype_dict, input_cols, fit_on_dev, use_optuna, target_encoder)[source]
Parameters
  • stop_after (float) – time budget in seconds.

  • target (str) – name of the target column that the mixer will learn to predict.

  • dtype_dict (Dict[str, str]) – dictionary with dtypes of all columns in the data.

  • input_cols (List[str]) – list of column names.

  • fit_on_dev (bool) – whether to perform a partial_fit() at the end of fit() using the dev data split.

  • use_optuna (bool) – whether to activate the automated (optuna-based) hyperparameter search. Note that setting this flag to True does not guarantee the search will run; a speed criterion is checked first (i.e., if a single iteration is too slow relative to the time budget, the search is skipped).

  • target_encoder (BaseEncoder) – Reference to the encoder used for the target.

fit(train_data, dev_data)[source]

Fits the LightGBM model.

Parameters
  • train_data (EncodedDs) – encoded features for training dataset

  • dev_data (EncodedDs) – encoded features for dev dataset

Return type

None

partial_fit(train_data, dev_data)[source]

Updates the LightGBM model.

Parameters
  • train_data (EncodedDs) – encoded features for (new) training dataset

  • dev_data (EncodedDs) – encoded features for (new) dev dataset

Return type

None

supports_proba: bool

Gradient boosting mixer with a LightGBM backbone.

This mixer is a good all-rounder, due to the generally great performance of tree-based ML algorithms for supervised learning tasks with tabular data. If you want more information regarding the techniques that set apart LightGBM from other gradient boosters, please refer to their technical paper: “LightGBM: A Highly Efficient Gradient Boosting Decision Tree” (2017).

We can think of this mixer as a wrapper around the LightGBM interface. There are a few caveats the user may want to be aware of:
  • If you seek GPU utilization, LightGBM must be compiled from source instead of being installed through pip.

  • Integer, float, and quantity dtypes are treated as regression tasks with L2 loss. All other supported dtypes are cast as multiclass tasks with multi_logloss loss.

  • It has an automatic optuna-based hyperparameter search. This procedure triggers when a single iteration of LightGBM is deemed fast enough (given the time budget).

  • A partial fit can be performed with the dev data split as part of fit, if specified with the fit_on_dev argument.
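The dtype-to-objective mapping described in the caveats can be sketched as a small dispatch function. The dtype names mirror lightwood.api.dtype and the helper name is illustrative; the objective/metric strings are LightGBM's own:

```python
# Numeric target dtypes become an L2 regression; everything else is cast
# to a multiclass task scored with multi_logloss.
REGRESSION_DTYPES = {"integer", "float", "quantity"}

def lightgbm_objective(target_dtype: str) -> dict:
    """Return the LightGBM objective/metric pair for a given target dtype."""
    if target_dtype in REGRESSION_DTYPES:
        return {"objective": "regression", "metric": "l2"}
    return {"objective": "multiclass", "metric": "multi_logloss"}

print(lightgbm_objective("float"))       # {'objective': 'regression', 'metric': 'l2'}
print(lightgbm_objective("categorical")) # {'objective': 'multiclass', 'metric': 'multi_logloss'}
```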

class mixer.LightGBMArray(stop_after, target, dtype_dict, input_cols, fit_on_dev, target_encoder, ts_analysis, use_stl, tss)[source]

LightGBM-based model, intended for usage in time series tasks.

Parameters

stop_after (float) – Time budget to train this mixer.

fit(train_data, dev_data)[source]

Fits/trains a mixer with training data.

Parameters
  • train_data (EncodedDs) – encoded representations of the training data subset.

  • dev_data (EncodedDs) – encoded representations of the “dev” data subset. This can be used as an internal validation subset (e.g. it is used for early stopping in the default Neural mixer).

Return type

None

partial_fit(train_data, dev_data)[source]

Partially fits/trains a mixer with new training data. This is a somewhat experimental method, and it aims at updating pre-existing Lightwood predictors.

Parameters
  • train_data (EncodedDs) – encoded representations of the new training data subset.

  • dev_data (EncodedDs) – encoded representations of the new “dev” data subset. As in fit(), this can be used as an internal validation subset.

Return type

None

class mixer.NHitsMixer(stop_after, target, horizon, window, dtype_dict, ts_analysis, pretrained=False)[source]

Wrapper around a MQN-HITS deep learning model.

Parameters
  • stop_after (float) – time budget in seconds.

  • target (str) – column to forecast.

  • horizon (int) – length of forecasted horizon.

  • window (int) – length of input data.

  • ts_analysis (Dict) – dictionary with miscellaneous time series info, as generated by ‘lightwood.data.timeseries_analyzer’.

fit(train_data, dev_data)[source]

Fits the N-HITS model.

Return type

None

partial_fit(train_data, dev_data)[source]

Due to how lightwood implements the update procedure, expected inputs for this method are:

Parameters
  • dev_data (EncodedDs) – original test split (used to validate and select model if ensemble is BestOf).

  • train_data (EncodedDs) – concatenated original train and dev splits.

Return type

None

class mixer.Neural(stop_after, target, dtype_dict, target_encoder, net, fit_on_dev, search_hyperparameters, n_epochs=None)[source]

The Neural mixer trains a fully connected dense network that maps the concatenated encoded outputs of all features in the dataset to a prediction of the encoded target.

Parameters
  • stop_after (float) – How long the total fitting process should take

  • target (str) – Name of the target column

  • dtype_dict (Dict[str, str]) – Data type dictionary

  • target_encoder (BaseEncoder) – Reference to the encoder used for the target

  • net (str) – The network type to use (DefaultNet or ArNet)

  • fit_on_dev (bool) – If we should fit on the dev dataset

  • search_hyperparameters (bool) – Whether the network should run a more thorough hyperparameter search (currently disabled)

  • n_epochs (Optional[int]) – amount of epochs that the network will be trained for. Supersedes all other early stopping criteria if specified.
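As a rough sketch of the architecture described above, per-column encoder outputs are concatenated into one input vector and passed through dense layers. Layer sizes, depth, and the random data are illustrative, not DefaultNet's actual topology:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder outputs: 5 rows, a 4-dim encoding of one column
# and an 8-dim encoding of another.
enc_age = rng.normal(size=(5, 4))
enc_city = rng.normal(size=(5, 8))

# Concatenate per-column encodings into a single (5, 12) network input.
x = np.concatenate([enc_age, enc_city], axis=1)

# Forward pass through a small fully connected network.
w1, b1 = rng.normal(size=(12, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

hidden = np.maximum(x @ w1 + b1, 0.0)  # ReLU hidden layer
y_hat = hidden @ w2 + b2               # (5, 1) prediction of the encoded target
print(y_hat.shape)  # (5, 1)
```

The real mixer trains such a network with PyTorch and decodes y_hat back through the target encoder.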

fit(train_data, dev_data)[source]

Fits/trains a mixer with training data.

Parameters
  • train_data (EncodedDs) – encoded representations of the training data subset.

  • dev_data (EncodedDs) – encoded representations of the “dev” data subset. This can be used as an internal validation subset (e.g. it is used for early stopping in the default Neural mixer).

Return type

None

partial_fit(train_data, dev_data)[source]

Augments the mixer’s fit with new data; the number of epochs is based on how many epochs the original fitting took.

Parameters
  • train_data (EncodedDs) – The network is fit/trained on this

  • dev_data (EncodedDs) – Data used for early stopping and hyperparameter determination

Return type

None

class mixer.NeuralTs(stop_after, target, dtype_dict, timeseries_settings, target_encoder, net, fit_on_dev, search_hyperparameters, ts_analysis, n_epochs=None, use_stl=False)[source]

Subclassed Neural mixer used for time series forecasting scenarios.

Parameters
  • stop_after (float) – How long the total fitting process should take

  • target (str) – Name of the target column

  • dtype_dict (Dict[str, str]) – Data type dictionary

  • timeseries_settings (TimeseriesSettings) – TimeseriesSettings object for time-series tasks, refer to its documentation for available settings.

  • target_encoder (BaseEncoder) – Reference to the encoder used for the target

  • net (str) – The network type to use (DefaultNet or ArNet)

  • fit_on_dev (bool) – If we should fit on the dev dataset

  • search_hyperparameters (bool) – Whether the network should run a more thorough hyperparameter search (currently disabled)

  • n_epochs (Optional[int]) – amount of epochs that the network will be trained for. Supersedes all other early stopping criteria if specified.

fit(train_data, dev_data)[source]

Fits/trains a mixer with training data.

Parameters
  • train_data (EncodedDs) – encoded representations of the training data subset.

  • dev_data (EncodedDs) – encoded representations of the “dev” data subset. This can be used as an internal validation subset (e.g. it is used for early stopping in the default Neural mixer).

Return type

None

class mixer.ProphetMixer(stop_after, target, dtype_dict, horizon, ts_analysis, model_path='fbprophet.Prophet', auto_size=True, sp=None, hyperparam_search=False, use_decomposers={})[source]

This mixer is a wrapper around the popular time series library sktime. It exhibits different behavior compared to other forecasting mixers, as it predicts based on indices in a forecasting horizon that is defined with respect to the last seen data point at training time.

Due to this, the mixer tries to “fit_on_all” so that the latest point in the validation split marks the boundary between training data and where forecasts will start. In practice, you need to specify how much time has passed since that timestamp for forecasts to be correct. By default, it is assumed that predictions are for the very next timestamp post-training.

If the task has groups (i.e. ‘TimeseriesSettings.group_by’ is not empty), the mixer will spawn one forecaster object for each group observed at training time, plus an additional default forecaster fit on all data.

There is an optuna-based automatic hyperparameter search. For now, it considers selecting the forecaster type based on the global SMAPE error across all groups.

Parameters
  • stop_after (float) – time budget in seconds.

  • target (str) – column to forecast.

  • dtype_dict (Dict[str, str]) – dtypes of all columns in the data.

  • horizon (int) – length of forecasted horizon.

  • sp (Optional[int]) – seasonality period to enforce (instead of the automatic inference done in the ts_analysis module).

  • ts_analysis (Dict) – dictionary with miscellaneous time series info, as generated by ‘lightwood.data.timeseries_analyzer’.

  • model_path (str) – sktime forecaster to use as the underlying model(s). Should be a string with format “$module.$class”, where $module is inside sktime.forecasting. Default is ‘fbprophet.Prophet’.

  • hyperparam_search (bool) – bool that indicates whether to perform the hyperparameter tuning or not.

  • auto_size (bool) – whether to filter out old data points if training split is bigger than a certain threshold (defined by the dataset sampling frequency). Enabled by default to avoid long training times in big datasets.

  • use_stl (bool) – Whether to use de-trenders and de-seasonalizers fitted in the timeseries analysis phase.

class mixer.Regression(stop_after, target_encoder, dtype_dict, target)[source]

The Regression mixer inherits from scikit-learn’s Ridge class (https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html)

This class performs linear least-squares regression with L2 regularization (i.e. ridge regression) under the hood; it fits a set of coefficients (w_1, w_2, …, w_N) for an N-length feature vector that minimize the regularized squared difference between the predicted target value and the observed true value.

This mixer intakes featurized (encoded) data to predict the target. It deploys if the target data type is numerical (integer or float).
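The fit the mixer delegates to sklearn.linear_model.Ridge has a well-known closed form, w = (XᵀX + λI)⁻¹Xᵀy, which reduces to ordinary least squares at λ = 0. A minimal sketch (the `ridge_fit` helper and toy data are illustrative, not the mixer's code):

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Solve (X'X + lam*I) w = X'y for the ridge coefficients w."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = X @ np.array([2.0, 3.0])   # targets generated from true weights (2, 3)

w = ridge_fit(X, y, lam=0.0)   # lam=0 recovers the true weights exactly
print(np.round(w, 6))          # [2. 3.]
```

With λ > 0, the recovered weights shrink toward zero, which is the regularization Ridge adds on top of plain OLS.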

Parameters
  • stop_after (float) – Maximum amount of seconds it should fit for, currently ignored

  • target_encoder (BaseEncoder) – The encoder which will be used to decode the target

  • dtype_dict (dict) – A map of feature names and their data types

  • target (str) – Name of the target column

fit(train_data, dev_data)[source]

Fits Ridge model on input feature data to provide predictions.

Parameters
  • train_data (EncodedDs) – The regression is fit on this

  • dev_data (EncodedDs) – This just gets concatenated to the train_data

Return type

None

partial_fit(train_data, dev_data)[source]

Fits the linear regression on the given data; note this refits the model entirely rather than updating it incrementally.

Parameters
  • train_data (EncodedDs) – Regression is fit on this

  • dev_data (EncodedDs) – This just gets concatenated to the train_data

Return type

None

class mixer.SkTime(stop_after, target, dtype_dict, horizon, ts_analysis, model_path=None, auto_size=True, sp=None, hyperparam_search=True, use_stl=False)[source]

This mixer is a wrapper around the popular time series library sktime. It exhibits different behavior compared to other forecasting mixers, as it predicts based on indices in a forecasting horizon that is defined with respect to the last seen data point at training time.

Due to this, the mixer tries to “fit_on_all” so that the latest point in the validation split marks the boundary between training data and where forecasts will start. In practice, you need to specify how much time has passed since that timestamp for forecasts to be correct. By default, it is assumed that predictions are for the very next timestamp post-training.
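The horizon indexing convention above can be sketched with plain datetimes. The `forecast_timestamps` helper and its `elapsed_steps` argument are illustrative, not Lightwood's or sktime's API:

```python
from datetime import datetime, timedelta

def forecast_timestamps(last_seen: datetime, freq: timedelta,
                        horizon: int, elapsed_steps: int = 0):
    """Timestamps a forecast covers, starting right after `last_seen`.

    Horizon index 1 is the very next point after training; if `elapsed_steps`
    timesteps have passed since training, forecasts start that much later.
    """
    start = elapsed_steps + 1
    return [last_seen + freq * i for i in range(start, start + horizon)]

last = datetime(2023, 1, 31)
daily = timedelta(days=1)

# Default: forecasts cover the three days right after training (Feb 1-3).
print(forecast_timestamps(last, daily, horizon=3))

# Five steps have passed since training, so forecasts start on Feb 6.
print(forecast_timestamps(last, daily, horizon=2, elapsed_steps=5)[0])
```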

If the task has groups (i.e. ‘TimeseriesSettings.group_by’ is not empty), the mixer will spawn one forecaster object for each group observed at training time, plus an additional default forecaster fit on all data.

There is an optuna-based automatic hyperparameter search. For now, it considers selecting the forecaster type based on the global SMAPE error across all groups.

Parameters
  • stop_after (float) – time budget in seconds.

  • target (str) – column to forecast.

  • dtype_dict (Dict[str, str]) – dtypes of all columns in the data.

  • horizon (int) – length of forecasted horizon.

  • sp (Optional[int]) – seasonality period to enforce (instead of the automatic inference done in the ts_analysis module).

  • ts_analysis (Dict) – dictionary with miscellaneous time series info, as generated by ‘lightwood.data.timeseries_analyzer’.

  • model_path (Optional[str]) – sktime forecaster to use as the underlying model(s). Should be a string with format “$module.$class”, where $module is inside sktime.forecasting. Default is ‘arima.AutoARIMA’.

  • hyperparam_search (bool) – bool that indicates whether to perform the hyperparameter tuning or not.

  • auto_size (bool) – whether to filter out old data points if training split is bigger than a certain threshold (defined by the dataset sampling frequency). Enabled by default to avoid long training times in big datasets.

  • use_stl (bool) – Whether to use de-trenders and de-seasonalizers fitted in the timeseries analysis phase.

fit(train_data, dev_data)[source]

Fits a set of sktime forecasters. The number of models depends on how many groups are observed at training time.

Forecaster type can be specified by providing the model_path argument in __init__(). It can also be determined by hyperparameter optimization based on dev data validation error.

Return type

None

partial_fit(train_data, dev_data)[source]

Note: sktime asks for “specification of the time points for which forecasts are requested”, and this mixer complies by assuming forecasts will start immediately after the last observed value.

Because of this, ProblemDefinition.fit_on_all is set to True so that partial_fit uses both dev and test splits to fit the models.

Due to how lightwood implements the update procedure, expected inputs for this method are:

Parameters
  • dev_data (EncodedDs) – original test split (used to validate and select model if ensemble is BestOf).

  • train_data (EncodedDs) – concatenated original train and dev splits.

Return type

None

class mixer.Unit(stop_after, target_encoder)[source]
Parameters

stop_after (float) – Time budget to train this mixer.

fit(train_data, dev_data)[source]

Fits/trains a mixer with training data.

Parameters
  • train_data (EncodedDs) – encoded representations of the training data subset.

  • dev_data (EncodedDs) – encoded representations of the “dev” data subset. This can be used as an internal validation subset (e.g. it is used for early stopping in the default Neural mixer).

Return type

None

partial_fit(train_data, dev_data)[source]

Partially fits/trains a mixer with new training data. This is a somewhat experimental method, and it aims at updating pre-existing Lightwood predictors.

Parameters
  • train_data (EncodedDs) – encoded representations of the new training data subset.

  • dev_data (EncodedDs) – encoded representations of the new “dev” data subset. As in fit(), this can be used as an internal validation subset.

Return type

None