Model

The default API for a BartPy model follows the common sklearn API. In particular, it implements:

fit

predict

score

For example, if we just want to train the model using default parameters, we can do:

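from bartpy.sklearnmodel import SklearnModel

# Instantiate the model with its default parameters
model = SklearnModel()
model.fit(X_train, y_train)
# Predict from the test covariates, not the test targets
prediction = model.predict(X_test)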

The default parameters are designed to be suitable for a wide range of data, but a number of parameters can be passed to the model. These parameters can be cross-validated and optimized through grid search in the usual sklearn way.
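As a rough sketch of what a grid search over these parameters enumerates, the snippet below expands a grid into individual keyword sets. The parameter names come from the constructor signature; the candidate values are arbitrary illustrations, and `expand_grid` is a hypothetical helper, not part of BartPy. The code is plain Python and does not import BartPy.

```python
from itertools import product

# Candidate values per constructor parameter. A dict of lists like this is
# also the shape that scikit-learn's GridSearchCV accepts as `param_grid`.
param_grid = {
    "n_trees": [20, 50, 100],
    "alpha": [0.9, 0.95],
    "beta": [1.0, 2.0],
}


def expand_grid(grid):
    """Return every combination in the grid as a list of keyword dicts."""
    names = list(grid)
    combos = product(*(grid[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


candidates = expand_grid(param_grid)
print(len(candidates))  # 12
print(candidates[0])    # {'n_trees': 20, 'alpha': 0.9, 'beta': 1.0}
```

Each dict could then be unpacked into the constructor as `SklearnModel(**params)`. In practice, passing `param_grid` to an sklearn search object is likely more convenient, since it also handles the cross-validation.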

class bartpy.sklearnmodel.SklearnModel(n_trees: int = 50, sigma_a: float = 0.001, sigma_b: float = 0.001, n_samples: int = 200, n_burn: int = 200, p_grow: float = 0.5, p_prune: float = 0.5, alpha: float = 0.95, beta: float = 2.0)

The main access point to building BART models in BartPy

Parameters:

n_trees (int) – the number of trees to use; more trees give a smoother fit but make training slower
sigma_a (float) – shape parameter of the prior on sigma
sigma_b (float) – scale parameter of the prior on sigma
n_samples (int) – how many recorded samples to take
n_burn (int) – how many samples to run without recording to reach convergence
p_grow (float) – probability of choosing a grow mutation in tree mutation sampling
p_prune (float) – probability of choosing a prune mutation in tree mutation sampling
alpha (float) – prior parameter on tree structure
beta (float) – prior parameter on tree structure
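To give a feel for how alpha and beta shape the trees, here is a minimal sketch. It assumes BartPy follows the standard BART tree prior, in which a node at depth d is split with probability alpha * (1 + d)^(-beta); the text above does not confirm this, so treat the formula as an assumption. `split_probability` is a hypothetical helper, not a BartPy function.

```python
def split_probability(depth, alpha=0.95, beta=2.0):
    """Prior probability that a node at `depth` (root = 0) is split further.

    Assumes the standard BART prior: alpha * (1 + depth) ** (-beta).
    The defaults mirror the SklearnModel defaults for alpha and beta.
    """
    return alpha * (1 + depth) ** (-beta)


# Under this assumption, deeper nodes are much less likely to split,
# so the prior favours shallow trees.
for depth in range(4):
    print(depth, split_probability(depth))
```

Under this assumption, lowering alpha or raising beta makes deep trees less likely a priori, which acts as a form of regularisation.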

fit(X: pandas.core.frame.DataFrame, y: numpy.ndarray) → bartpy.sklearnmodel.SklearnModel

Learn the model based on training data

Parameters:	X (pd.DataFrame) – training covariates
	y (np.ndarray) – training targets
Returns:	self with trained parameter values
Return type:	SklearnModel

model_samples

Array of the model as it was after each sample. Useful for:

examining the state of trees, nodes and sigma throughout the sampling

out-of-sample prediction

Returns None if the model hasn’t been fit

Returns:	the model as it was after each recorded sample
Return type:	List[Model]

predict(X: numpy.ndarray = None)

Predict the target corresponding to the provided covariate matrix. If X is None, predictions are made from the training covariates.

Prediction is based on the mean of all samples

Parameters:	X (pd.DataFrame) – covariates to predict from
Returns:	predictions for the X covariates
Return type:	np.ndarray
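The averaging step described above can be sketched as follows. This illustrates the reduction only and is not BartPy's implementation. The sample values are made up, and plain nested lists stand in for the np.ndarray so that the snippet has no dependencies.

```python
from statistics import fmean

# Hypothetical draws: 3 recorded samples (rows) for 2 data points (columns).
prediction_samples = [
    [1.0, 4.0],
    [2.0, 5.0],
    [3.0, 6.0],
]


def mean_prediction(samples):
    """Average each column (data point) over all rows (recorded samples)."""
    return [fmean(column) for column in zip(*samples)]


print(mean_prediction(prediction_samples))  # [2.0, 5.0]
```

With a NumPy array of the same shape, the equivalent reduction is `samples.mean(axis=0)`.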

prediction_samples

Matrix of prediction samples at each point in sampling. Useful for assessing convergence, calculating point estimates, etc.

Returns:	prediction samples with dimensionality n_samples * n_points
Return type:	np.ndarray
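One crude way to use such a matrix for a convergence check is to compare the mean of the first half of the draws with the mean of the second half, point by point. The sketch below is a heuristic under stated assumptions and not a formal diagnostic. `split_half_gap` is a hypothetical helper, the chain values are made up, and nested lists stand in for the np.ndarray.

```python
from statistics import fmean


def split_half_gap(samples):
    """For each point, |mean of first half of draws - mean of second half|."""
    half = len(samples) // 2
    first, second = samples[:half], samples[half:]
    return [
        abs(fmean(a) - fmean(b))
        for a, b in zip(zip(*first), zip(*second))
    ]


# Hypothetical chain of 4 draws for 2 points:
# point 0 is stable, point 1 is still drifting upward.
chain = [
    [1.0, 0.0],
    [1.0, 1.0],
    [1.0, 2.0],
    [1.0, 3.0],
]
print(split_half_gap(chain))  # [0.0, 2.0]
```

A large gap for many points suggests the sampler had not settled during the recorded draws, in which case a larger n_burn may be worth trying.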