Various helper functions

class helpers.LightwoodAutocast(enabled=True)[source]

Equivalent to torch.cuda.amp.autocast, but checks the device's compute capability and activates the feature only when the GPU has tensor cores that AMP can leverage.
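A minimal sketch of the capability gate such a wrapper might apply. The function name and the >= 7.0 threshold (Volta, the first architecture with tensor cores) are assumptions for illustration, not Lightwood's actual implementation:

```python
def amp_eligible(compute_capability):
    """Return True when a GPU's (major, minor) compute capability
    implies tensor cores are present (sm_70 / Volta or newer)."""
    major, minor = compute_capability
    return (major, minor) >= (7, 0)

# With PyTorch available, the tuple would come from
# torch.cuda.get_device_capability(); here we pass tuples directly.
```
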


Parameters

  • data – list of str

Returns

int: total number of words, dict: word_dist, dict: nr_words_dist


helpers.bounded_evaluate_array_accuracy(true_values, predictions, **kwargs)[source]

The normal MASE accuracy inside evaluate_array_accuracy has a breakpoint at 1.0: smaller values mean a naive forecast is better, and bigger values imply the forecast beats the naive one. It is upper-bounded by 1e4.

This 0-1 bounded MASE variant maps the 1.0 breakpoint to 0.5. For worse-than-naive scores, it scales linearly (with a constant factor). For better-than-naive scores, 10 is fixed at 0.99, and scaled logarithms (with the 10 and 1e4 cutoffs as respective bases) squash all remaining preimages into values between 0.5 and 1.0.
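One plausible reading of that mapping, sketched as a plain function. The exact linear factor and log scalings are guesses reconstructed from the description above, not the library's code:

```python
import math

def bound_mase_score(x):
    # x: unbounded accuracy where 1.0 is the naive-forecast breakpoint,
    # capped at 1e4. Output is bounded to [0, 1] with 1.0 -> 0.5.
    x = min(x, 1e4)
    if x <= 1.0:
        return 0.5 * x                       # worse than naive: linear
    elif x <= 10.0:
        return 0.5 + 0.49 * math.log10(x)    # 1 -> 0.5, 10 -> 0.99
    else:
        # 10 -> 0.99, 1e4 -> 1.0, via a log scaled to the (10, 1e4] band
        return 0.99 + 0.01 * (math.log(x / 10) / math.log(1e4 / 10))
```

The mapping is continuous and monotonic at both seams (x = 1 and x = 10), which is the property the bounded variant needs for ranking models consistently.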

Return type

float

Returns None, an integer, a float, or a string, parsed from a string.

helpers.evaluate_accuracy(data, predictions, target, accuracy_functions, ts_analysis={})[source]

Dispatcher for accuracy evaluation.

Parameters

  • data (DataFrame) – original dataframe.

  • predictions (Series) – output of a lightwood predictor for the input data.

  • target (str) – target column name.

  • accuracy_functions (List[str]) – list of accuracy function names. Support currently exists for scikit-learn’s metrics module, plus any custom methods that Lightwood exposes.

  • ts_analysis (Optional[dict]) – time series analysis output, used to compute time series task accuracy.

Return type

Dict[str, float]


Returns

Accuracy metric for a dataset and predictions.
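The dispatch pattern can be sketched as follows. The registry and the hand-rolled metric here are stand-ins (Lightwood resolves names against scikit-learn's metrics module plus its own custom helpers):

```python
def mean_absolute_error(y_true, y_pred):
    # Stand-in metric implementation; sklearn.metrics has the real one
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

CUSTOM_METRICS = {"mean_absolute_error": mean_absolute_error}

def evaluate_accuracy_sketch(true_values, predictions, accuracy_functions):
    """Resolve each requested metric name and score the predictions,
    returning a Dict[str, float] keyed by metric name."""
    scores = {}
    for name in accuracy_functions:
        fn = CUSTOM_METRICS.get(name)
        if fn is None:
            raise ValueError(f"Unknown accuracy function: {name}")
        scores[name] = float(fn(true_values, predictions))
    return scores
```
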

helpers.evaluate_array_accuracy(true_values, predictions, **kwargs)[source]

Default time series forecasting accuracy method. Returns mean score over all timesteps in the forecasting horizon, as determined by the base_acc_fn (R2 score by default).

Return type

float

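A sketch of the per-timestep averaging, with a hand-rolled R2 standing in for the default base_acc_fn. Column t holds every series' value for timestep t of the horizon; the exact data layout is an assumption:

```python
def r2(y_true, y_pred):
    # Coefficient of determination; stand-in for sklearn's r2_score
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot if ss_tot else 0.0

def array_accuracy(true_values, predictions, base_acc_fn=r2):
    """Score each timestep of the horizon separately, then average.
    true_values, predictions: one horizon-length row per series."""
    horizon = len(true_values[0])
    scores = []
    for t in range(horizon):
        col_true = [row[t] for row in true_values]
        col_pred = [row[t] for row in predictions]
        scores.append(base_acc_fn(col_true, col_pred))
    return sum(scores) / horizon
```
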
helpers.evaluate_cat_array_accuracy(true_values, predictions, **kwargs)[source]

Evaluate accuracy in categorical time series forecasting tasks.

Balanced accuracy is computed for each timestep (as determined by timeseries_settings.horizon), and the final accuracy is the reciprocal of the average score across all timesteps.

Return type

float

helpers.evaluate_multilabel_accuracy(true_values, predictions, **kwargs)[source]

Evaluates accuracy for multilabel/tag prediction.


Returns

Weighted F1 score of predictions and ground truths.

helpers.evaluate_num_array_accuracy(true_values, predictions, **kwargs)[source]

Evaluate accuracy in numerical time series forecasting tasks. Defaults to mean absolute scaled error (MASE) if in-sample residuals are available. If this is not the case, R2 score is computed instead.

Scores are computed for each timestep (as determined by timeseries_settings.horizon), and the final accuracy is the reciprocal of the average score across all timesteps.

Return type

float

helpers.evaluate_regression_accuracy(true_values, predictions, **kwargs)[source]

Evaluates accuracy for regression tasks. If predictions have a lower and upper bound, then within-bound accuracy is computed: whether the ground truth value falls within the predicted region. If not, then a (positive bounded) R2 score is returned instead.


Returns

Accuracy score as defined above.
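When bounds are present, the within-bound score reduces to coverage. A sketch under the assumption that bounds arrive as lower/upper lists parallel to the ground truth:

```python
def within_bound_accuracy(true_values, lower, upper):
    # Fraction of ground-truth points falling inside [lower, upper]
    hits = sum(1 for t, lo, hi in zip(true_values, lower, upper)
               if lo <= t <= hi)
    return hits / len(true_values)
```
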

helpers.gen_chars(length, character)[source]

Generates a string of the given length consisting of repetitions of character.

helpers.get_group_matches(data, combination, group_columns)[source]

Given a particular group combination, return the data subset that belongs to it.

Return type

Tuple[list, DataFrame]
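The idea can be sketched with plain lists of dicts standing in for the pandas DataFrame (the return shape mirrors the Tuple[list, DataFrame] above; the function name is a stand-in):

```python
def get_group_matches_sketch(data, combination, group_columns):
    """Return (combination, subset) where subset holds the rows whose
    values in group_columns equal the given combination tuple."""
    subset = [row for row in data
              if tuple(row[col] for col in group_columns) == tuple(combination)]
    return tuple(combination), subset

# Toy grouped time series data: (store, item) identifies each group
rows = [
    {"store": "A", "item": 1, "sales": 10},
    {"store": "A", "item": 2, "sales": 7},
    {"store": "B", "item": 1, "sales": 3},
]
```
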


Determines whether value might be nan, inf, or some other value that can be cast to float but is not actually a number.

Return type

bool

We use pandas :( Pandas has no way to guarantee “stability” for the type of a column: it chooses to change it arbitrarily based on the values. Pandas also changes the values in a column based on its type. Lightwood relies on None values for cells that represent “missing” or “corrupt” data.

When we assign None to a cell in a dataframe it might get turned into nan or some other value, so this function checks whether a cell is None or any of the other values a pd.DataFrame might convert None to.

It also checks some extra values (like '') that pandas never converts None to (hopefully). Lightwood still considers those values “None values”, which allows for more generic use later.
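A sketch of such a check. The exact set of sentinel strings is an assumption mirroring the intent described above, not the library's source:

```python
import math

def is_none_sketch(value):
    """True for None, float nan, and string sentinels ('', 'none', 'nan')
    that pandas may produce from, or Lightwood treats like, a None cell."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str) and value.strip().lower() in ("", "none", "nan"):
        return True
    return False
```
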


Used instead of str.isascii because Python 3.6 doesn’t have it.
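The backport amounts to a one-liner equivalent to str.isascii:

```python
def isascii(s):
    # True when every character's code point fits in 7-bit ASCII
    return all(ord(ch) < 128 for ch in s)
```
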

helpers.mase(trues, preds, scale_error, fh)[source]

Computes mean absolute scaled error. The scale corrective factor is the mean in-sample residual from the naive forecasting method.
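A sketch of the computation under the assumption that scale_error is a positive scalar and fh truncates both series to the forecast horizon:

```python
def mase_sketch(trues, preds, scale_error, fh):
    """Mean absolute scaled error over a forecast horizon of length fh.
    scale_error: mean in-sample residual of the naive forecast."""
    assert scale_error > 0, "naive in-sample residual must be positive"
    mae = sum(abs(t - p) for t, p in zip(trues[:fh], preds[:fh])) / fh
    return mae / scale_error
```

A score of exactly 1.0 means the forecast's error matches the naive method's in-sample error, which is the breakpoint discussed for bounded_evaluate_array_accuracy above.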

helpers.r2_score(y_true, y_pred)[source]

Wrapper around sklearn's R2 score, capped to lie between 0 and 1 (raw R2 can be arbitrarily negative).

Return type

float
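A sketch of the clamped wrapper, with a hand-rolled R2 standing in for sklearn.metrics.r2_score:

```python
def r2_raw(y_true, y_pred):
    # Coefficient of determination; can be arbitrarily negative
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot if ss_tot else 0.0

def bounded_r2_score(y_true, y_pred):
    # Clamp into [0, 1] so downstream code can treat it as an accuracy
    return max(0.0, min(1.0, r2_raw(y_true, y_pred)))
```
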