Model
The default API for a BartPy model follows the common sklearn API. In particular, it implements:
- fit
- predict
- score
For example, if we just want to train the model using default parameters, we can do:
from bartpy.sklearnmodel import SklearnModel
model = SklearnModel()  # instantiate with the default parameters
model.fit(X_train, y_train)
prediction = model.predict(X_test)  # predict from the test covariates, not the targets
The default parameters are designed to suit a wide range of data, but a number of parameters can be passed into the model. These parameters can be cross-validated and optimized through grid search in the usual sklearn way.
class bartpy.sklearnmodel.SklearnModel(n_trees: int = 50, sigma_a: float = 0.001, sigma_b: float = 0.001, n_samples: int = 200, n_burn: int = 200, p_grow: float = 0.5, p_prune: float = 0.5, alpha: float = 0.95, beta: float = 2.0)
The main access point to building BART models in BartPy.
Parameters:
- n_trees (int) – the number of trees to use; more trees make a smoother fit but slow down training and prediction
- sigma_a (float) – shape parameter of the prior on sigma
- sigma_b (float) – scale parameter of the prior on sigma
- n_samples (int) – how many recorded samples to take
- n_burn (int) – how many samples to run without recording to reach convergence
- p_grow (float) – probability of choosing a grow mutation in tree mutation sampling
- p_prune (float) – probability of choosing a prune mutation in tree mutation sampling
- alpha (float) – prior parameter on tree structure
- beta (float) – prior parameter on tree structure
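The roles of alpha and beta can be illustrated with the tree prior from the original BART formulation (Chipman, George and McCulloch), in which a node at depth d is split with probability alpha * (1 + d) ** -beta. The sketch below assumes BartPy uses that same form, which this page does not confirm; `split_probability` is a hypothetical helper and not part of the BartPy API.

```python
def split_probability(depth: int, alpha: float = 0.95, beta: float = 2.0) -> float:
    """Prior probability that a node at `depth` (root = 0) is split.

    Assumes the standard BART prior alpha * (1 + depth) ** -beta.
    Hypothetical helper for illustration; not part of BartPy.
    """
    return alpha * (1 + depth) ** (-beta)


# With the default alpha and beta, the split probability drops quickly
# with depth, so the prior favours shallow trees.
for depth in range(4):
    print(depth, round(split_probability(depth), 4))
```

Under this assumed form, raising beta or lowering alpha pushes the prior toward shallower trees.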
fit(X: pandas.core.frame.DataFrame, y: numpy.ndarray) → bartpy.sklearnmodel.SklearnModel
Learn the model based on training data.
Parameters:
- X (pd.DataFrame) – training covariates
- y (np.ndarray) – training targets
Returns: self with trained parameter values
Return type: SklearnModel
model_samples
Array of the model as it was after each sample. Useful for:
- examining the state of trees, nodes and sigma throughout the sampling
- out of sample prediction
Returns None if the model hasn’t been fit.
Return type: List[Model]
predict(X: numpy.ndarray = None)
Predict the target corresponding to the provided covariate matrix. If X is None, will predict based on training covariates.
Prediction is based on the mean of all samples.
Parameters:
- X (pd.DataFrame) – covariates to predict from
Returns: predictions for the X covariates
Return type: np.ndarray
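To make "the mean of all samples" concrete, the sketch below averages a small hand-written matrix of draws column by column. The values and the `point_prediction` helper are made up for illustration, and the layout (one row per recorded sample, one column per data point) is an assumption based on the prediction_samples description.

```python
from statistics import fmean
from typing import List, Sequence


def point_prediction(samples: Sequence[Sequence[float]]) -> List[float]:
    """Average over samples (rows) to get one prediction per point (column).

    Hypothetical helper; with a NumPy array this is samples.mean(axis=0).
    """
    return [fmean(column) for column in zip(*samples)]


# Three recorded samples for two data points (made-up values).
toy_samples = [
    [1.0, 10.0],
    [2.0, 12.0],
    [3.0, 14.0],
]
print(point_prediction(toy_samples))
```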
prediction_samples
Matrix of prediction samples at each point in sampling. Useful for assessing convergence, calculating point estimates, etc.
Returns: prediction samples with dimensionality n_samples * n_points
Return type: np.ndarray
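One rough way to use such a matrix for assessing convergence is to compare, for each point, the mean of the first half of the recorded samples with the mean of the second half. The sketch below is an informal check rather than a formal diagnostic; `split_half_gap` and the toy values are hypothetical, and it assumes one row per sample and one column per point.

```python
from statistics import fmean
from typing import List, Sequence


def split_half_gap(samples: Sequence[Sequence[float]]) -> List[float]:
    """Absolute gap between first-half and second-half sample means, per point.

    A large gap suggests the chain was still drifting while samples were recorded.
    Hypothetical helper for illustration; not part of BartPy.
    """
    half = len(samples) // 2
    first_cols = zip(*samples[:half])
    second_cols = zip(*samples[half:])
    return [abs(fmean(a) - fmean(b)) for a, b in zip(first_cols, second_cols)]


# Four recorded samples for two points (made-up values):
# point 0 is stable, point 1 keeps trending upward.
toy_samples = [
    [1.0, 5.0],
    [1.2, 7.0],
    [0.9, 9.0],
    [1.1, 11.0],
]
gaps = split_half_gap(toy_samples)
print(gaps)
```

If the gaps are large relative to the spread of the samples, increasing n_burn is a natural thing to try.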