Treelite API
API of the Treelite Python package.
Main API
Treelite: a model compiler for decision tree ensembles
Classes:
Annotator – Branch annotator class: annotate branches in a given model using frequency patterns in the training data
Model – Decision tree ensemble model
ModelBuilder – Builder class for tree ensemble model: provides tools to iteratively build an ensemble of decision trees
Exceptions:
TreeliteError – Error thrown by Treelite
Functions:
create_shared – Create shared library.
generate_cmakelists – Generate a CMakeLists.txt for a given directory of headers and sources.
generate_makefile – Generate a Makefile for a given directory of headers and sources.
- class treelite.Annotator
Branch annotator class: annotate branches in a given model using frequency patterns in the training data
Methods:
annotate_branch(model, dmat[, nthread, verbose]) – Annotate branches in a given model using frequency patterns in the training data.
save(path) – Save branch annotation information as a JSON file.
- annotate_branch(model, dmat, nthread=None, verbose=False)
Annotate branches in a given model using frequency patterns in the training data. Each node gets the count of the instances that belong to it. Any prior annotation information stored in the annotator will be replaced with the new annotation returned by this method.
- Parameters:
model (object of type Model) – decision tree ensemble model
dmat (object of type DMatrix) – data matrix representing the training data
nthread (int, optional) – number of threads to use while annotating. If missing, use all physical cores in the system.
verbose (bool, optional) – whether to produce extra messages
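To make "each node gets the count of the instances that belong to it" concrete, here is a minimal pure-Python sketch of the counting idea. It is not Treelite's implementation: the toy tree layout (a dict keyed by node ID), the `<` test, and the helper name `count_node_visits` are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy tree: node ID -> node description.
# Internal nodes test `row[feature] < threshold`; leaves carry a value.
TOY_TREE = {
    0: {"feature": 0, "threshold": 5.0, "left": 1, "right": 2},
    1: {"leaf": 0.6},
    2: {"feature": 1, "threshold": 2.0, "left": 3, "right": 4},
    3: {"leaf": -0.4},
    4: {"leaf": 1.2},
}


def count_node_visits(tree, rows):
    """Count how many rows pass through each node on the way to a leaf."""
    counts = Counter()
    for row in rows:
        node_id = 0  # start at the root
        while True:
            counts[node_id] += 1
            node = tree[node_id]
            if "leaf" in node:
                break
            goes_left = row[node["feature"]] < node["threshold"]
            node_id = node["left"] if goes_left else node["right"]
    return dict(counts)


training_rows = [(1.0, 0.0), (7.0, 1.0), (9.0, 3.0), (4.0, 9.0)]
visit_counts = count_node_visits(TOY_TREE, training_rows)
```

Counts like these show which branches the training data actually exercises, which is the information the annotator records.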
- class treelite.Model(handle=None)
Decision tree ensemble model
- Parameters:
handle (ctypes.c_void_p, optional) – Initial value of model handle
Methods:
compile(dirpath[, params, compiler, verbose]) – Generate prediction code from a tree ensemble model.
concatenate(model_objs) – Concatenate multiple model objects into a single model object by copying all member trees into the destination model object.
deserialize(filename) – Deserialize (recover) the model from a checkpoint file on disk.
deserialize_bytes(model_bytes) – Deserialize (recover) the model from a byte sequence.
dump_as_json(*[, pretty_print]) – Dump the model as a JSON string.
export_lib(toolchain, libpath[, params, ...]) – Convenience function: Generate prediction code and immediately turn it into a dynamic shared library.
export_srcpkg(platform, toolchain, pkgpath, ...) – Convenience function: Generate prediction code and create a zipped source package for deployment.
from_lightgbm(booster) – Load a tree ensemble model from a LightGBM Booster object
from_xgboost(booster) – Load a tree ensemble model from an XGBoost Booster object
from_xgboost_json(json_str[, ...]) – Load a tree ensemble model from a string containing XGBoost JSON
import_from_json(json_str) – Import a tree ensemble model from a JSON string.
load(filename, model_format[, ...]) – Load a tree ensemble model from a file
serialize(filename) – Serialize (persist) the model to a checkpoint file on disk, using a fast binary representation.
serialize_bytes() – Serialize (persist) the model to a byte sequence, using a fast binary representation.
set_tree_limit(tree_limit) – Keep the first n trees and drop the remaining ones
Attributes:
num_class – Number of classes of the model (1 if the model is not a multi-class classifier)
num_feature – Number of features used in the model
num_tree – Number of decision trees in the model
- compile(dirpath, params=None, compiler='ast_native', verbose=False)
Generate prediction code from a tree ensemble model. The code will be C99 compliant. One header file (.h) will be generated, along with one or more source files (.c). Use the create_shared() function to package prediction code as a dynamic shared library (.so/.dll/.dylib).
- Parameters:
Example
The following populates the directory ./my/model with source and header files:
    model.compile(dirpath='./my/model', params={}, verbose=True)
If parallel compilation is enabled (parameter parallel_comp), the files are in the form of ./my/model/header.h, ./my/model/main.c, ./my/model/tu0.c, ./my/model/tu1.c and so forth, depending on the value of parallel_comp. Otherwise, there will be exactly two files: ./my/model/header.h and ./my/model/main.c
- classmethod concatenate(model_objs)
Concatenate multiple model objects into a single model object by copying all member trees into the destination model object.
- Parameters:
- Returns:
model – Concatenated model
- Return type:
Model object
Example
concatenated_model = Model.concatenate([model1, model2, model3])
- classmethod deserialize(filename)
Deserialize (recover) the model from a checkpoint file on disk. It is expected that the file was generated by a call to the serialize() method.
- classmethod deserialize_bytes(model_bytes)
Deserialize (recover) the model from a byte sequence. It is expected that the byte sequence was generated by a call to the serialize_bytes() method.
- dump_as_json(*, pretty_print=True)
Dump the model as a JSON string. This is useful for inspecting details of the tree ensemble model.
Note
The operation performed in dump_as_json() is strictly one-way. So the output of dump_as_json() will differ from the JSON string you used in calling import_from_json().
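Because dump_as_json() returns a plain string, the standard json module is enough to inspect it. The sketch below only lists top-level keys; the placeholder string and its key names are made up for illustration and do not reproduce the real dump schema.

```python
import json


def top_level_keys(model_json_str):
    """Parse a JSON dump and list its top-level keys, sorted."""
    parsed = json.loads(model_json_str)
    return sorted(parsed)


# Placeholder standing in for the string returned by model.dump_as_json();
# the keys here are invented and are not the real schema.
placeholder_dump = '{"num_feature": 3, "trees": []}'
keys = top_level_keys(placeholder_dump)
```

With a real model, one would pass `model.dump_as_json()` in place of the placeholder.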
- export_lib(toolchain, libpath, params=None, compiler='ast_native', verbose=False, nthread=None, options=None)
Convenience function: Generate prediction code and immediately turn it into a dynamic shared library. A temporary directory will be created to hold the source files.
- Parameters:
toolchain (str) – which toolchain to use. You may choose one of ‘msvc’, ‘clang’, and ‘gcc’. You may also specify a specific variation of clang or gcc (e.g. ‘gcc-7’)
libpath (str) – location to save the generated dynamic shared library
params (dict, optional) – parameters to be passed to the compiler. See this page for the list of compiler parameters.
compiler (str, optional) – name of compiler to use in C code generation
verbose (bool, optional) – whether to produce extra messages
nthread (int, optional) – number of threads to use in creating the shared library. Defaults to the number of cores in the system.
options (list of str, optional) – Additional options to pass to toolchain
Example
The one-line command
model.export_lib(toolchain='msvc', libpath='./mymodel.dll', params={}, verbose=True)
is equivalent to the following sequence of commands:
    model.compile(dirpath='/temporary/directory', params={}, verbose=True)
    treelite.create_shared(toolchain='msvc', dirpath='/temporary/directory', verbose=True)
    # move the library out of the temporary directory
    shutil.move('/temporary/directory/mymodel.dll', './mymodel.dll')
- export_srcpkg(platform, toolchain, pkgpath, libname, params=None, compiler='ast_native', verbose=False, options=None)
Convenience function: Generate prediction code and create a zipped source package for deployment. The resulting zip file will also contain a Makefile.
- Parameters:
platform (str) – name of the operating system on which the headers and sources shall be compiled. Must be one of the following: ‘windows’ (Microsoft Windows), ‘osx’ (Mac OS X), ‘unix’ (Linux and other UNIX-like systems)
toolchain (str) – which toolchain to use. You may choose one of ‘msvc’, ‘clang’, ‘gcc’, and ‘cmake’. You may also specify a specific variation of clang or gcc (e.g. ‘gcc-7’)
pkgpath (str) – location to save the zipped source package
libname (str) – name of model shared library to be built
params (dict, optional) – parameters to be passed to the compiler. See this page for the list of compiler parameters.
compiler (str, optional) – name of compiler to use in C code generation
verbose (bool, optional) – whether to produce extra messages
options (list of str, optional) – Additional options to pass to toolchain
Example
The one-line command
model.export_srcpkg(platform='unix', toolchain='gcc', pkgpath='./mymodel_pkg.zip', libname='mymodel.so', params={}, verbose=True)
is equivalent to the following sequence of commands:
    model.compile(dirpath='/temporary/directory/mymodel', params={}, verbose=True)
    generate_makefile(dirpath='/temporary/directory/mymodel', platform='unix', toolchain='gcc')
    # zip the directory containing C code and Makefile
    shutil.make_archive(base_name=pkgpath, format='zip', root_dir='/temporary/directory', base_dir='mymodel/')
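The zipping step in the sequence above uses only the standard library, so it can be tried in isolation. The sketch below runs it on a throwaway directory to show the resulting archive layout; the file names and contents are placeholders, not real Treelite output. One detail worth knowing is that shutil.make_archive appends the format's extension to base_name itself.

```python
import pathlib
import shutil
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    # Placeholder stand-ins for the generated sources and Makefile.
    src_dir = root / "staging" / "mymodel"
    src_dir.mkdir(parents=True)
    for name in ("header.h", "main.c", "Makefile"):
        (src_dir / name).write_text("/* placeholder */\n")

    # make_archive adds ".zip" itself, so base_name carries no extension.
    archive_path = shutil.make_archive(
        base_name=str(root / "mymodel_pkg"),
        format="zip",
        root_dir=str(root / "staging"),
        base_dir="mymodel",
    )

    # List the archived files, skipping any directory entries.
    with zipfile.ZipFile(archive_path) as zf:
        archived_files = sorted(
            n for n in zf.namelist() if not n.endswith("/")
        )
    archive_name = pathlib.Path(archive_path).name
```

Every entry in the archive sits under the `mymodel/` prefix, so unpacking it on the target machine yields a single directory containing the sources and the Makefile.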
- classmethod from_lightgbm(booster)
Load a tree ensemble model from a LightGBM Booster object
- Parameters:
booster (object of type lightgbm.Booster) – Python handle to LightGBM model
- Returns:
model – loaded model
- Return type:
Model object
Example
    bst = lightgbm.train(params, dtrain, 10, valid_sets=[dtrain], valid_names=['train'])
    tl_model = Model.from_lightgbm(bst)
- classmethod from_xgboost(booster)
Load a tree ensemble model from an XGBoost Booster object
- Parameters:
booster (object of type xgboost.Booster) – Python handle to XGBoost model
- Returns:
model – loaded model
- Return type:
Model object
Example
    bst = xgboost.train(params, dtrain, 10, [(dtrain, 'train')])
    tl_model = Model.from_xgboost(bst)
- classmethod from_xgboost_json(json_str, allow_unknown_field=False)
Load a tree ensemble model from a string containing XGBoost JSON
- Parameters:
- Returns:
model – loaded model
- Return type:
Model object
Example
    bst = xgboost.train(params, dtrain, 10, [(dtrain, 'train')])
    bst.save_model('model.json')
    with open('model.json') as file_:
        json_str = file_.read()
    tl_model = Model.from_xgboost_json(json_str)
- classmethod import_from_json(json_str)
Import a tree ensemble model from a JSON string.
See Specifying models using JSON string for details.
Note
import_from_json() is strict about which JSON strings to accept
Some tree libraries let users export models as JSON strings, but in general import_from_json() will not accept them. See the warning at the top of Specifying models using JSON string.
Note
The operation performed in import_from_json() is strictly one-way. So the output of dump_as_json() will differ from the JSON string you used in calling import_from_json().
- classmethod load(filename, model_format, allow_unknown_field=False)
Load a tree ensemble model from a file
Note
To load scikit-learn models, use import_model() instead.
- Parameters:
- Returns:
model – loaded model
- Return type:
Model object
Example
    xgb_model = Model.load('xgboost_model.model', 'xgboost')
- property num_class
Number of classes of the model (1 if the model is not a multi-class classifier)
- property num_feature
Number of features used in the model
- property num_tree
Number of decision trees in the model
- serialize(filename)
Serialize (persist) the model to a checkpoint file on disk, using a fast binary representation. To recover the model from the checkpoint, use the deserialize() method.
- Parameters:
filename (str) – Path to checkpoint
- serialize_bytes()
Serialize (persist) the model to a byte sequence, using a fast binary representation. To recover the model from the byte sequence, use the deserialize_bytes() method.
- Return type:
bytes
- set_tree_limit(tree_limit)
Keep only the first n trees, where n is tree_limit; the remaining trees are dropped
- class treelite.ModelBuilder(num_feature, num_class=1, average_tree_output=False, threshold_type='float32', leaf_output_type='float32', **kwargs)
Builder class for tree ensemble model: provides tools to iteratively build an ensemble of decision trees
- Parameters:
num_feature (int) – number of features used in model being built. We assume that all feature indices are between 0 and (num_feature - 1)
num_class (int, optional) – number of output groups; >1 indicates multiclass classification
average_tree_output (bool, optional) – whether the model is a random forest; True indicates a random forest and False indicates gradient boosted trees
**kwargs – model parameters, to be used to specify the resulting model. Refer to this page for the full list of model parameters.
Classes:
Node() – Handle to a node in a tree
Tree([threshold_type, leaf_output_type]) – Handle to a decision tree in a tree ensemble Builder
Value(init_value, dtype) – Value whose type may be specified at runtime
Methods:
append(tree) – Add a tree at the end of the ensemble
commit() – Finalize the ensemble model
insert(index, tree) – Insert a tree at specified location in the ensemble
- class Node
Handle to a node in a tree
Methods:
set_categorical_test_node(feature_id, ...) – Set the node as a test node with categorical split.
set_leaf_node(leaf_value[, leaf_value_type]) – Set the node as a leaf node
set_numerical_test_node(feature_id, opname, ...) – Set the node as a test node with numerical split.
set_root() – Set the node as the root
- set_categorical_test_node(feature_id, left_categories, default_left, left_child_key, right_child_key)
Set the node as a test node with categorical split. A list defines all categories that would be classified as the left side. Categories are integers ranging from 0 to n-1, where n is the number of categories in that particular feature.
- Parameters:
feature_id (int) – feature index
left_categories (list of int) – list of categories belonging to the left child.
default_left (bool) – default direction for missing values (True for left; False for right)
left_child_key (int) – unique integer key to identify the left child node
right_child_key (int) – unique integer key to identify the right child node
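The routing rule described by these parameters can be sketched in a few lines of plain Python: listed categories go left, and missing values follow default_left. The helper name `categorical_goes_left` is hypothetical, and the sketch ignores edge cases (such as out-of-range category values) that Treelite's runtime may treat differently.

```python
import math


def categorical_goes_left(value, left_categories, default_left):
    """Return True if a sample should follow the left child."""
    # Missing values (None or NaN) follow the default direction.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return default_left
    # Categories are integers 0..n-1; the listed ones go left.
    return int(value) in left_categories


left_categories = [0, 2, 5]
listed = categorical_goes_left(2, left_categories, default_left=False)
unlisted = categorical_goes_left(3, left_categories, default_left=True)
missing = categorical_goes_left(float("nan"), left_categories, default_left=True)
```

Here category 2 is routed left because it is listed, category 3 goes right even though default_left is True, and the NaN value takes the default direction.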
- set_leaf_node(leaf_value, leaf_value_type='float32')
Set the node as a leaf node
- set_numerical_test_node(feature_id, opname, threshold, default_left, left_child_key, right_child_key, threshold_type='float32')
Set the node as a test node with numerical split. The test is in the form [feature value] OP [threshold]. Depending on the result of the test, either left or right child would be taken.
- Parameters:
feature_id (int) – feature index
opname (str) – binary operator to use in the test
threshold (float) – threshold value
default_left (bool) – default direction for missing values (True for left; False for right)
left_child_key (int) – unique integer key to identify the left child node
right_child_key (int) – unique integer key to identify the right child node
threshold_type (str) – data type for threshold value (e.g. ‘float32’)
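As a rough illustration of the test [feature value] OP [threshold], the sketch below evaluates one numerical test node in plain Python. It rests on two assumptions not stated in this reference: that the left child is taken when the test is true, and that the operator names are the usual comparison symbols. The helper name `numerical_next_child` is hypothetical.

```python
import math
import operator

# Comparison operators keyed by name; this set of names is an assumption.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
}


def numerical_next_child(value, opname, threshold, default_left,
                         left_child_key, right_child_key):
    """Pick the child key for one feature value at a numerical test node."""
    if value is None or math.isnan(value):
        goes_left = default_left  # missing value: use the default direction
    else:
        # Assumption: the left child is taken when the test is true.
        goes_left = OPS[opname](value, threshold)
    return left_child_key if goes_left else right_child_key


child_small = numerical_next_child(3.0, "<", 5.0, True, 1, 2)
child_large = numerical_next_child(7.0, "<", 5.0, True, 1, 2)
child_missing = numerical_next_child(float("nan"), "<", 5.0, False, 1, 2)
```

The child keys returned here play the same role as left_child_key and right_child_key above: they identify which node is visited next.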
- set_root()
Set the node as the root
- class Tree(threshold_type='float32', leaf_output_type='float32')
Handle to a decision tree in a tree ensemble Builder
- class Value(init_value, dtype)
Value whose type may be specified at runtime
- Parameters:
dtype (str) – data type of the value (e.g. ‘float32’)
- append(tree)
Add a tree at the end of the ensemble
- Parameters:
tree (Tree object) – tree to be added
Example
    builder = ModelBuilder(num_feature=4227)
    tree = ...               # build tree somehow
    builder.append(tree)     # add tree at the end of the ensemble
- commit()
Finalize the ensemble model
- Returns:
model – finished model
- Return type:
Model object
Example
    builder = ModelBuilder(num_feature=4227)
    for i in range(100):
        tree = ...               # build tree somehow
        builder.append(tree)     # add one tree at a time
    model = builder.commit()     # now get a Model object
    model.compile(dirpath='test')  # compile model into C code
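The examples in this section leave "build tree somehow" open. One possible way to fill that in is sketched below for a single-split tree (a stump). It assumes that nodes are addressed by indexing the Tree object with an integer key (`tree[0]`), a detail this reference does not document, so check it against your installed Treelite version. The feature index, threshold, and leaf values are arbitrary.

```python
def build_stump_model():
    """Build a one-tree model with a single numerical split."""
    import treelite  # imported lazily; requires Treelite to be installed

    builder = treelite.ModelBuilder(num_feature=2)
    tree = treelite.ModelBuilder.Tree()

    # Assumption: tree[key] yields the Node handle for that key.
    tree[0].set_numerical_test_node(
        feature_id=0, opname="<", threshold=5.0, default_left=True,
        left_child_key=1, right_child_key=2,
    )
    tree[1].set_leaf_node(0.6)
    tree[2].set_leaf_node(-0.4)
    tree[0].set_root()

    builder.append(tree)
    return builder.commit()  # finished Model object
```

Calling `build_stump_model()` should return a Model whose prediction code can then be generated with compile() or export_lib().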
- insert(index, tree)
Insert a tree at specified location in the ensemble
- Parameters:
Example
    builder = ModelBuilder(num_feature=4227)
    tree = ...                 # build tree somehow
    builder.insert(0, tree)    # insert tree at index 0
- exception treelite.TreeliteError
Error thrown by Treelite
- treelite.create_shared(toolchain, dirpath[, nthread, verbose, options, long_build_time_warning])
Create shared library.
- Parameters:
toolchain (str) – which toolchain to use. You may choose one of ‘msvc’, ‘clang’, and ‘gcc’. You may also specify a specific variation of clang or gcc (e.g. ‘gcc-7’)
dirpath (str) – directory containing the header and source files previously generated by Model.compile(). The directory must contain recipe.json which specifies build dependencies.
nthread (int, optional) – number of threads to use in creating the shared library. Defaults to the number of cores in the system.
verbose (bool, optional) – whether to produce extra messages
options (list of str, optional) – Additional options to pass to toolchain
long_build_time_warning (bool, optional) – If set to False, suppress the warning about potentially long build time
- Returns:
libpath – absolute path of created shared library
- Return type:
str
Example
The following commands use the Visual C++ toolchain to generate ./my/model/model.dll:
    model.compile(dirpath='./my/model', params={}, verbose=True)
    create_shared(toolchain='msvc', dirpath='./my/model', verbose=True)
Later, the shared library can be referred to by its directory name:
    predictor = Predictor(libpath='./my/model', verbose=True)
    # looks for ./my/model/model.dll
Alternatively, one may specify the library down to its file name:
    predictor = Predictor(libpath='./my/model/model.dll', verbose=True)
- treelite.generate_cmakelists(dirpath, options=None)
Generate a CMakeLists.txt for a given directory of headers and sources. The resulting CMakeLists.txt will be stored in the directory. This function is useful for deploying a model on a different machine.
- Parameters:
dirpath (str) – directory containing the header and source files previously generated by Model.compile(). The directory must contain recipe.json which specifies build dependencies.
options (list of str, optional) – Additional options to pass to toolchain
- treelite.generate_makefile(dirpath, platform, toolchain, options=None)
Generate a Makefile for a given directory of headers and sources. The resulting Makefile will be stored in the directory. This function is useful for deploying a model on a different machine.
- Parameters:
dirpath (str) – directory containing the header and source files previously generated by Model.compile(). The directory must contain recipe.json which specifies build dependencies.
platform (str) – name of the operating system on which the headers and sources shall be compiled. Must be one of the following: ‘windows’ (Microsoft Windows), ‘osx’ (Mac OS X), ‘unix’ (Linux and other UNIX-like systems)
toolchain (str) – which toolchain to use. You may choose one of ‘msvc’, ‘clang’, and ‘gcc’. You may also specify a specific variation of clang or gcc (e.g. ‘gcc-7’)
options (list of str, optional) – Additional options to pass to toolchain
Scikit-learn importer
Converter to ingest scikit-learn models into Treelite
Functions:
import_model – Load a tree ensemble model from a scikit-learn model object
import_model_with_model_builder – Load a tree ensemble model from a scikit-learn model object using the model builder API.
- treelite.sklearn.import_model(sklearn_model)
Load a tree ensemble model from a scikit-learn model object
Note
For ‘IsolationForest’, it will calculate the outlier score using the standardized ratio as proposed in the original reference, which matches with ‘IsolationForest._compute_chunked_score_samples’ but is a bit different from ‘IsolationForest.decision_function’.
- Parameters:
sklearn_model (object of type RandomForestRegressor / RandomForestClassifier / ExtraTreesRegressor / ExtraTreesClassifier / GradientBoostingRegressor / GradientBoostingClassifier / HistGradientBoostingRegressor / HistGradientBoostingClassifier / IsolationForest) – Python handle to scikit-learn model
- Returns:
model – loaded model
- Return type:
Model object
Example
    import sklearn.datasets
    import sklearn.ensemble
    import treelite.sklearn

    # built-in regression dataset that ships with scikit-learn
    X, y = sklearn.datasets.load_diabetes(return_X_y=True)
    clf = sklearn.ensemble.RandomForestRegressor(n_estimators=10)
    clf.fit(X, y)
    model = treelite.sklearn.import_model(clf)
Notes
This function does not yet support categorical splits in HistGradientBoostingRegressor and HistGradientBoostingClassifier. If you are using either estimator type, make sure that all test nodes have numerical test conditions.
- treelite.sklearn.import_model_with_model_builder(sklearn_model)
Load a tree ensemble model from a scikit-learn model object using the model builder API.
Note
Use import_model for production use
This function exists to demonstrate the use of the model builder API and is slow with large models. For production, please use import_model(), which is significantly faster.
- Parameters:
sklearn_model (object of type RandomForestRegressor / RandomForestClassifier / ExtraTreesRegressor / ExtraTreesClassifier / GradientBoostingRegressor / GradientBoostingClassifier) – Python handle to scikit-learn model
- Returns:
model – loaded model
- Return type:
Model object
Example
    import sklearn.datasets
    import sklearn.ensemble
    import treelite.sklearn

    # built-in regression dataset that ships with scikit-learn
    X, y = sklearn.datasets.load_diabetes(return_X_y=True)
    clf = sklearn.ensemble.RandomForestRegressor(n_estimators=10)
    clf.fit(X, y)
    model = treelite.sklearn.import_model_with_model_builder(clf)