Analysis

Analyse mixer ensembles to extract static insights and train predict-time models for dynamic insights.

class analysis.AccStats(deps=('ICP',))[source]

Computes accuracy statistics and a confusion matrix for the validation dataset.

analyze(info, **kwargs)[source]

This method should be called once during the analysis phase, or not called at all. It computes any information that the block may either output to the model analysis object, or use at inference time when .explain() is called (in this case, make sure all needed objects are added to the runtime analyzer so that .explain() can access them).

Parameters

  • info (Dict[str, object]) – Dictionary where any new information or objects are added. The next analysis block will use the output of the previous block as a starting point.

  • kwargs – Dictionary with named variables from either the core analysis or the rest of the prediction pipeline.

Return type

Dict[str, object]
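As a rough illustration of the two quantities this block reports, the sketch below computes accuracy and a confusion matrix from paired true and predicted labels. It is not the block's actual implementation; the function name `accuracy_and_confusion` and the toy labels are assumptions made for the example.

```python
from collections import Counter


def accuracy_and_confusion(y_true, y_pred):
    """Return (accuracy, labels, matrix).

    matrix[i][j] counts rows whose true label is labels[i] and whose
    predicted label is labels[j].
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    # Correct predictions sit on the diagonal of the matrix.
    correct = sum(matrix[i][i] for i in range(len(labels)))
    return correct / len(y_true), labels, matrix


# Toy validation labels (hypothetical data).
acc, labels, matrix = accuracy_and_confusion(
    ["cat", "cat", "dog", "dog"],
    ["cat", "dog", "dog", "dog"],
)
```

Here rows of `matrix` index the true label and columns index the predicted label, which is one common convention; the real block may orient or normalize its matrix differently.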

class analysis.BaseAnalysisBlock(deps=())[source]

Class to be inherited by any analysis/explainer block.

analyze(info, **kwargs)[source]

This method should be called once during the analysis phase, or not called at all. It computes any information that the block may either output to the model analysis object, or use at inference time when .explain() is called (in this case, make sure all needed objects are added to the runtime analyzer so that .explain() can access them).

Parameters

  • info (Dict[str, object]) – Dictionary where any new information or objects are added. The next analysis block will use the output of the previous block as a starting point.

  • kwargs – Dictionary with named variables from either the core analysis or the rest of the prediction pipeline.

Return type

Dict[str, object]

explain(row_insights, global_insights, **kwargs)[source]

This method should be called once during the explaining phase at inference time, or not called at all. Additional explanations can be at an instance level (row-wise) or global. For the former, return a data frame with any new insights. For the latter, a dictionary is required.

Parameters
  • row_insights (DataFrame) – dataframe with previously computed row-level explanations.

  • global_insights (Dict[str, object]) – dict() with any explanations that concern all predicted instances or the model itself.

Return type

Tuple[DataFrame, Dict[str, object]]

Returns

  • row_insights: modified input dataframe with any new row insights added here.

  • global_insights: dict() with any explanations that concern all predicted instances or the model itself.
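The analyze/explain contract described above can be sketched with a small custom block. To keep the example runnable without the library installed, it defines a stand-in `BaseAnalysisBlock` that only mirrors the documented signatures. The `MeanTargetBlock` class, the `validation_targets` and `runtime_analyzer` keyword names, and the use of a plain dict of columns in place of a DataFrame are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

RowInsights = Dict[str, List[float]]  # stand-in for a DataFrame: column -> values


class BaseAnalysisBlock:
    """Hypothetical stand-in that mirrors the documented interface only."""

    def __init__(self, deps: tuple = ()):
        self.deps = deps

    def analyze(self, info: Dict[str, object], **kwargs) -> Dict[str, object]:
        return info

    def explain(self, row_insights: RowInsights, global_insights: Dict[str, object],
                **kwargs) -> Tuple[RowInsights, Dict[str, object]]:
        return row_insights, global_insights


class MeanTargetBlock(BaseAnalysisBlock):
    """Hypothetical block: stores the validation target mean during analysis
    and reports each prediction's offset from it at inference time."""

    def analyze(self, info, **kwargs):
        targets = kwargs["validation_targets"]  # assumed kwarg name
        # Anything needed later by explain() must be added to `info`.
        info["target_mean"] = sum(targets) / len(targets)
        return info

    def explain(self, row_insights, global_insights, **kwargs):
        mean = kwargs["runtime_analyzer"]["target_mean"]  # assumed kwarg name
        # Row-level insight: one new value per predicted row.
        row_insights["offset_from_mean"] = [p - mean for p in row_insights["prediction"]]
        # Global insight: one value that concerns all predicted rows.
        global_insights["target_mean"] = mean
        return row_insights, global_insights


block = MeanTargetBlock()
runtime_analyzer = block.analyze({}, validation_targets=[1.0, 2.0, 3.0])
rows, global_ins = block.explain({"prediction": [2.5, 1.0]}, {},
                                 runtime_analyzer=runtime_analyzer)
```

The point of the split is that `analyze()` runs once and stores what it learned, while `explain()` only reads that stored state when predictions are made.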

class analysis.ConfStats(deps=('ICP',), ece_bins=10)[source]

Computes confidence-related statistics on the held-out validation dataset.

TODO: regression & forecasting tasks

analyze(info, **kwargs)[source]

This method should be called once during the analysis phase, or not called at all. It computes any information that the block may either output to the model analysis object, or use at inference time when .explain() is called (in this case, make sure all needed objects are added to the runtime analyzer so that .explain() can access them).

Parameters

  • info (Dict[str, object]) – Dictionary where any new information or objects are added. The next analysis block will use the output of the previous block as a starting point.

  • kwargs – Dictionary with named variables from either the core analysis or the rest of the prediction pipeline.

Return type

Dict[str, object]
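Assuming that ece_bins sets the number of bins for an expected calibration error (ECE) estimate, one such confidence statistic could be sketched as follows. The function name, the equal-width binning scheme, and the toy inputs are assumptions; the block itself may bin or weight differently.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin.

    confidences: predicted confidence per row, each in [0, 1].
    correct: whether the matching prediction was right.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and equally long")
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


# Toy example (hypothetical data): two confidence groups, each off by 0.05.
ece = expected_calibration_error(
    [0.95, 0.95, 0.55, 0.55],
    [True, True, True, False],
    n_bins=10,
)
```

A perfectly calibrated model would have an ECE of zero, since within every bin the average confidence would equal the observed accuracy.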

class analysis.GlobalFeatureImportance(disable_column_importance)[source]

Analysis block that estimates column importance with a variant of the LOCO (leave-one-covariate-out) algorithm.

Roughly speaking, the procedure:
  • iterates over all input columns

  • if the input column is optional, makes a prediction with its values set to None

  • compares the resulting accuracy with the accuracy obtained using all data

  • passes all accuracy differences through a softmax and reports them as estimated column importance scores

Note that, crucially, this method does not refit the predictor at any point.

Reference:

https://compstat-lmu.github.io/iml_methods_limitations/pfi.html
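The procedure above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the block's code: `column_importance`, `toy_predict`, and the toy rows are hypothetical, every listed column is treated as optional, and the difference is taken as baseline accuracy minus blanked accuracy so that a larger drop yields a larger score.

```python
import math


def softmax(values):
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def column_importance(predict, rows, targets, columns):
    """Blank one column at a time, re-predict without refitting, and
    softmax the resulting accuracy drops."""

    def accuracy(data):
        preds = [predict(r) for r in data]
        return sum(p == t for p, t in zip(preds, targets)) / len(targets)

    baseline = accuracy(rows)
    drops = []
    for col in columns:
        blanked = [{**r, col: None} for r in rows]  # values set to None
        drops.append(baseline - accuracy(blanked))
    return dict(zip(columns, softmax(drops)))


def toy_predict(row):
    # Hypothetical fitted predictor: relies on "size" only, ignores "color".
    size = row["size"]
    return "big" if size is not None and size > 5 else "small"


rows = [
    {"size": 8, "color": "red"},
    {"size": 2, "color": "blue"},
    {"size": 9, "color": "red"},
    {"size": 1, "color": "blue"},
]
targets = ["big", "small", "big", "small"]
importance = column_importance(toy_predict, rows, targets, ["size", "color"])
```

Because of the softmax, the scores are relative and sum to one: a column the predictor ignores still receives a non-zero score, just a smaller one than the columns it relies on.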

analyze(info, **kwargs)[source]

This method should be called once during the analysis phase, or not called at all. It computes any information that the block may either output to the model analysis object, or use at inference time when .explain() is called (in this case, make sure all needed objects are added to the runtime analyzer so that .explain() can access them).

Parameters

  • info (Dict[str, object]) – Dictionary where any new information or objects are added. The next analysis block will use the output of the previous block as a starting point.

  • kwargs – Dictionary with named variables from either the core analysis or the rest of the prediction pipeline.

Return type

Dict[str, object]

class analysis.ICP(fixed_significance, positive_domain, confidence_normalizer)[source]

Confidence estimation block. It uses inductive conformal predictors (ICPs) so that confidence estimation stays model-agnostic.

analyze(info, **kwargs)[source]

This method should be called once during the analysis phase, or not called at all. It computes any information that the block may either output to the model analysis object, or use at inference time when .explain() is called (in this case, make sure all needed objects are added to the runtime analyzer so that .explain() can access them).

Parameters

  • info (Dict[str, object]) – Dictionary where any new information or objects are added. The next analysis block will use the output of the previous block as a starting point.

  • kwargs – Dictionary with named variables from either the core analysis or the rest of the prediction pipeline.

Return type

Dict[str, object]

explain(row_insights, global_insights, **kwargs)[source]

This method should be called once during the explaining phase at inference time, or not called at all. Additional explanations can be at an instance level (row-wise) or global. For the former, return a data frame with any new insights. For the latter, a dictionary is required.

Parameters
  • row_insights (DataFrame) – dataframe with previously computed row-level explanations.

  • global_insights (Dict[str, object]) – dict() with any explanations that concern all predicted instances or the model itself.

Return type

Tuple[DataFrame, Dict[str, object]]

Returns

  • row_insights: modified input dataframe with any new row insights added here.

  • global_insights: dict() with any explanations that concern all predicted instances or the model itself.
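For readers unfamiliar with inductive conformal prediction, the following minimal sketch shows the general idea for a regression target: nonconformity scores are computed once on held-out data (the analysis phase), then reused to build an interval around each new prediction (the explain phase). It is a textbook-style illustration, not this block's implementation; the function names, the absolute-residual score, and the toy numbers are assumptions.

```python
import math


def calibrate(cal_targets, cal_predictions):
    """Sorted absolute residuals on held-out data (nonconformity scores)."""
    return sorted(abs(y - p) for y, p in zip(cal_targets, cal_predictions))


def interval(prediction, scores, significance):
    """Interval meant to cover the truth with probability about
    1 - significance, under the usual exchangeability assumption."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - significance))
    if k > n:
        # Too few calibration points for this significance level.
        return -math.inf, math.inf
    half_width = scores[k - 1]
    return prediction - half_width, prediction + half_width


# Hypothetical held-out targets and predictions: residuals are 1, 2, 3, 4.
scores = calibrate([10, 20, 30, 40], [11, 18, 33, 36])
bounds_50 = interval(50, scores, significance=0.5)
bounds_20 = interval(50, scores, significance=0.2)
bounds_10 = interval(50, scores, significance=0.1)
```

Nothing here depends on how the predictions were produced, which is what makes the approach model-agnostic. Lowering the significance widens the interval, and with only four calibration points a significance of 0.1 cannot be met with a finite interval.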

class analysis.TempScaler[source]

Original reference (MIT Licensed): https://github.com/gpleiss/temperature_scaling

NB: Output of the neural network should be the classification logits, NOT the softmax (or log softmax)!

TODO

analyze(info, **kwargs)[source]

Tunes and sets the temperature of a neural model by optimizing the negative log-likelihood (NLL) on validation set logits.

Return type

Dict[str, object]

explain(row_insights, global_insights, **kwargs)[source]

Performs temperature scaling on the logits.

Return type

Tuple[DataFrame, Dict[str, object]]
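To make the two steps concrete, here is a minimal sketch of temperature scaling: a single scalar T is chosen to minimize NLL on validation logits, then logits are divided by T before the softmax. A coarse grid search stands in for whatever optimizer the block actually uses, and the function names, the candidate grid, and the toy logits are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at this temperature."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels, candidates=None):
    """Pick the candidate temperature with the lowest validation NLL."""
    if candidates is None:
        candidates = [0.25 * i for i in range(1, 41)]  # 0.25, 0.5, ..., 10.0
    return min(candidates, key=lambda t: nll(logit_rows, labels, t))


# Hypothetical validation logits from an overconfident two-class model:
# the top class always gets about 98% probability, yet only 3 of 4 are right.
val_logits = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
val_labels = [0, 0, 1, 1]

temperature = fit_temperature(val_logits, val_labels)
raw_conf = max(softmax(val_logits[0]))
scaled_conf = max(softmax(val_logits[0], temperature))
```

Dividing by T does not change which class has the highest score, so accuracy is untouched; only the confidence attached to each prediction moves. This is also why the inputs must be raw logits rather than softmax outputs.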

analysis.explain(data, encoded_data, predictions, timeseries_settings, analysis, target_name, target_dtype, positive_domain, anomaly_detection, pred_args, explainer_blocks=[], ts_analysis={})[source]

This procedure runs at the end of every normal .predict() call. Its goal is to generate prediction insights, potentially using information generated at the model analysis stage (e.g. confidence estimation).

As in model_analyzer(), any user-specified analysis blocks (see class BaseAnalysisBlock) are also called here.

Returns

row_insights: a DataFrame containing predictions and all generated insights at a row-level.

analysis.model_analyzer(predictor, data, train_data, stats_info, target, tss, dtype_dict, accuracy_functions, ts_analysis, analysis_blocks=[])[source]

Analyses the model on a validation subset to evaluate accuracy, estimate feature importance, and generate a calibration model for estimating confidence in future predictions.

Additionally, any user-specified analysis blocks (see class BaseAnalysisBlock) are also called here.

Return type

Tuple[ModelAnalysis, Dict[str, object]]

Returns

  • model_analysis: ModelAnalysis object that contains core analysis metrics, not necessarily needed when predicting.

  • runtime_analyzer: This dictionary object is populated sequentially with data generated by each .analyze() block call. It is stored in the predictor itself and used when calling the .explain() method of all analysis blocks when generating predictions.
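The sequential flow described above, where each block's analyze() output feeds the next block and the accumulated dictionary is later handed to every explain() call, might look roughly like this. The driver functions `run_analysis` and `run_explain`, the two toy blocks, the keyword names, and the dict-of-lists stand-in for a DataFrame are all hypothetical; the real functions take many more arguments, and model_analyzer also builds a ModelAnalysis object.

```python
class ResidualBlock:
    """Toy block: records the mean absolute validation error."""

    def analyze(self, info, **kwargs):
        errors = [abs(t - p) for t, p in zip(kwargs["targets"], kwargs["predictions"])]
        info["mean_abs_error"] = sum(errors) / len(errors)
        return info

    def explain(self, row_insights, global_insights, **kwargs):
        return row_insights, global_insights  # nothing to add at inference


class BoundsBlock:
    """Toy block: relies on the value written by ResidualBlock before it."""

    def analyze(self, info, **kwargs):
        info["band"] = 2 * info["mean_abs_error"]
        return info

    def explain(self, row_insights, global_insights, **kwargs):
        band = kwargs["runtime_analyzer"]["band"]
        row_insights["lower"] = [p - band for p in row_insights["prediction"]]
        row_insights["upper"] = [p + band for p in row_insights["prediction"]]
        global_insights["band"] = band
        return row_insights, global_insights


def run_analysis(blocks, **kwargs):
    runtime_analyzer = {}
    for block in blocks:
        # Each block starts from the previous block's output.
        runtime_analyzer = block.analyze(runtime_analyzer, **kwargs)
    return runtime_analyzer


def run_explain(blocks, row_insights, runtime_analyzer, **kwargs):
    global_insights = {}
    for block in blocks:
        row_insights, global_insights = block.explain(
            row_insights, global_insights, runtime_analyzer=runtime_analyzer, **kwargs)
    return row_insights, global_insights


blocks = [ResidualBlock(), BoundsBlock()]
runtime_analyzer = run_analysis(blocks, targets=[10, 20, 30], predictions=[11, 18, 30])
rows, global_ins = run_explain(blocks, {"prediction": [5.0]}, runtime_analyzer)
```

Block order matters in this sketch: BoundsBlock would raise a KeyError if it ran before ResidualBlock, which is presumably the kind of ordering constraint the deps argument of each block is there to express.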