Model

The default API for a BartPy model follows the common sklearn API. In particular, it implements:

  • fit
  • predict
  • score

For example, if we just want to train the model using default parameters, we can do:

from bartpy.sklearnmodel import SklearnModel
model = SklearnModel()  # instantiate with default parameters
model.fit(X_train, y_train)
prediction = model.predict(X_test)  # predict from test covariates, not targets

The default parameters are designed to be suitable for a wide range of data, but there are a number of parameters that can be passed into the model. These parameters can be cross-validated and optimized through grid search in the normal sklearn way.

class bartpy.sklearnmodel.SklearnModel(n_trees: int = 50, sigma_a: float = 0.001, sigma_b: float = 0.001, n_samples: int = 200, n_burn: int = 200, p_grow: float = 0.5, p_prune: float = 0.5, alpha: float = 0.95, beta: float = 2.0)[source]

The main access point to building BART models in BartPy

Parameters:
  • n_trees (int) – the number of trees to use; more trees give a smoother fit, but slow down training and prediction
  • sigma_a (float) – shape parameter of the prior on sigma
  • sigma_b (float) – scale parameter of the prior on sigma
  • n_samples (int) – how many recorded samples to take
  • n_burn (int) – how many samples to run without recording to reach convergence
  • p_grow (float) – probability of choosing a grow mutation in tree mutation sampling
  • p_prune (float) – probability of choosing a prune mutation in tree mutation sampling
  • alpha (float) – prior parameter on tree structure
  • beta (float) – prior parameter on tree structure
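
To make the role of alpha and beta more concrete, here is a minimal sketch. It assumes BartPy uses the standard BART tree prior, in which a node at depth d (root at depth 0) is split with probability alpha * (1 + d) ** -beta. The function name split_probability is hypothetical and is not part of the BartPy API.

```python
def split_probability(depth: int, alpha: float = 0.95, beta: float = 2.0) -> float:
    """Prior probability that a node at `depth` (root = 0) is split further.

    Assumption: the standard BART form alpha * (1 + depth) ** -beta.
    """
    return alpha * (1.0 + depth) ** (-beta)


# Under the default alpha and beta the split probability falls quickly
# with depth, so the prior favours shallow trees.
for depth in range(4):
    print(depth, split_probability(depth))
```

Under this assumption, a larger alpha makes splits more likely at every depth, while a larger beta penalises deep nodes more strongly.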
fit(X: pandas.core.frame.DataFrame, y: numpy.ndarray) → bartpy.sklearnmodel.SklearnModel[source]

Learn the model based on training data

Parameters:
  • X (pd.DataFrame) – training covariates
  • y (np.ndarray) – training targets
Returns:

self with trained parameter values

Return type:

SklearnModel

model_samples

Array of the model as it was after each sample. Useful for:

  • examining the state of trees, nodes and sigma throughout the sampling
  • out of sample prediction

Returns None if the model hasn’t been fit

Return type: List[Model]
predict(X: numpy.ndarray = None)[source]

Predict the target corresponding to the provided covariate matrix. If X is None, predictions are based on the training covariates.

Prediction is based on the mean of all samples

Parameters: X (pd.DataFrame) – covariates to predict from
Returns: predictions for the X covariates
Return type: np.ndarray
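
As an illustration of "the mean of all samples", the sketch below averages a small, made-up set of prediction samples in plain Python. The data and the helper point_predictions are hypothetical; with a NumPy array of shape (n_samples, n_points), the equivalent would be a mean over axis 0.

```python
from statistics import fmean

# Hypothetical prediction samples: one row per recorded sample,
# one column per data point (n_samples x n_points).
prediction_samples = [
    [1.0, 2.0, 3.0],
    [1.2, 1.8, 3.3],
    [0.8, 2.2, 2.7],
]


def point_predictions(samples):
    """Average over samples (rows) to get one prediction per point (column)."""
    return [fmean(column) for column in zip(*samples)]


predictions = point_predictions(prediction_samples)
print(predictions)
```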
prediction_samples

Matrix of prediction samples at each point in sampling. Useful for assessing convergence, calculating point estimates, etc.

Returns: prediction samples with dimensionality n_samples * n_points
Return type: np.ndarray
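
The following sketch shows one way such samples might be used. It takes a made-up list of draws for a single data point (one column of the matrix), tracks the running mean as a rough convergence check, and computes an approximate 90% interval. The data and both helper functions are hypothetical and not part of BartPy.

```python
from statistics import fmean, quantiles

# Hypothetical draws for one data point, in sampling order.
draws = [2.9, 3.4, 3.1, 2.8, 3.0, 3.2, 2.95, 3.05, 3.1, 2.9]


def running_mean(values):
    """Mean of the first k draws for k = 1..len(values)."""
    total = 0.0
    means = []
    for k, value in enumerate(values, start=1):
        total += value
        means.append(total / k)
    return means


def central_interval(values):
    """Approximate 90% interval from the 5th and 95th percentiles."""
    # n=20 yields 19 cut points; the first and last are the 5% and 95% marks.
    cuts = quantiles(values, n=20, method="inclusive")
    return cuts[0], cuts[-1]


means = running_mean(draws)
low, high = central_interval(draws)
print(means[-1], (low, high))
```

A running mean that is still drifting at the end of sampling suggests that more burn-in or more recorded samples may be needed. This is only a rough visual check, not a formal convergence diagnostic.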