API of the Treelite Python package.
Treelite: a model compiler for decision tree ensembles

treelite.Model(handle=None)
Decision tree ensemble model

handle (ctypes.c_void_p, optional) – Initial value of model handle
compile(dirpath, params=None, compiler='ast_native', verbose=False)
Generate prediction code from a tree ensemble model. The generated code will be C99-compliant. One header file (.h) will be generated, along with one or more source files (.c). Use the create_shared() method to package the prediction code as a dynamic shared library (.so/.dll/.dylib).
Example

The following populates the directory ./my/model with source and header files:

model.compile(dirpath='./my/model', params={}, verbose=True)

If parallel compilation is enabled (parameter parallel_comp), the files are in the form ./my/model/header.h, ./my/model/main.c, ./my/model/tu0.c, ./my/model/tu1.c and so forth, depending on the value of parallel_comp. Otherwise, there will be exactly two files: ./my/model/header.h and ./my/model/main.c.
deserialize(filename)
Deserialize (recover) the model from a checkpoint file on disk. The file is expected to have been generated by a call to the serialize() method.

Note: Use exactly matching versions of Treelite when exchanging checkpoints.
We provide ZERO backward compatibility guarantees. You will not be able to recover the model from a checkpoint that was generated by a previous version of Treelite. Both the producer and the consumer of the checkpoint must use identical major and minor versions of Treelite.
dump_as_json(*, pretty_print=True)
Dump the model as a JSON string. This is useful for inspecting details of the tree ensemble model.
export_lib(toolchain, libpath, params=None, compiler='ast_native', verbose=False, nthread=None, options=None)
Convenience function: generate prediction code and immediately turn it into a dynamic shared library. A temporary directory will be created to hold the source files.

toolchain (str) – which toolchain to use. You may choose one of ‘msvc’, ‘clang’, and ‘gcc’. You may also specify a specific variation of clang or gcc (e.g. ‘gcc-7’)
libpath (str) – location to save the generated dynamic shared library
params (dict, optional) – parameters to be passed to the compiler. See this page for the list of compiler parameters.
compiler (str, optional) – name of compiler to use in C code generation
verbose (bool, optional) – whether to produce extra messages
nthread (int, optional) – number of threads to use in creating the shared library. Defaults to the number of cores in the system.
options (list of str, optional) – additional options to pass to the toolchain
Example

The one-line command

model.export_lib(toolchain='msvc', libpath='./mymodel.dll',
                 params={}, verbose=True)

is equivalent to the following sequence of commands:

model.compile(dirpath='/temporary/directory', params={}, verbose=True)
treelite.create_shared(toolchain='msvc', dirpath='/temporary/directory',
                       verbose=True)
# move the library out of the temporary directory
shutil.move('/temporary/directory/mymodel.dll', './mymodel.dll')
export_srcpkg(platform, toolchain, pkgpath, libname, params=None, compiler='ast_native', verbose=False, options=None)
Convenience function: generate prediction code and create a zipped source package for deployment. The resulting zip file will also contain a Makefile.

platform (str) – name of the operating system on which the headers and sources shall be compiled. Must be one of the following: ‘windows’ (Microsoft Windows), ‘osx’ (Mac OS X), ‘unix’ (Linux and other UNIX-like systems)
toolchain (str) – which toolchain to use. You may choose one of ‘msvc’, ‘clang’, ‘gcc’, and ‘cmake’. You may also specify a specific variation of clang or gcc (e.g. ‘gcc-7’)
pkgpath (str) – location to save the zipped source package
libname (str) – name of model shared library to be built
params (dict, optional) – parameters to be passed to the compiler. See this page for the list of compiler parameters.
compiler (str, optional) – name of compiler to use in C code generation
verbose (bool, optional) – whether to produce extra messages
options (list of str, optional) – additional options to pass to the toolchain
Example

The one-line command

model.export_srcpkg(platform='unix', toolchain='gcc',
                    pkgpath='./mymodel_pkg.zip', libname='mymodel.so',
                    params={}, verbose=True)

is equivalent to the following sequence of commands:

model.compile(dirpath='/temporary/directory/mymodel',
              params={}, verbose=True)
generate_makefile(dirpath='/temporary/directory/mymodel',
                  platform='unix', toolchain='gcc')
# zip the directory containing C code and Makefile
shutil.make_archive(base_name=pkgpath, format='zip',
                    root_dir='/temporary/directory',
                    base_dir='mymodel/')
from_lightgbm(booster)
Load a tree ensemble model from a LightGBM Booster object

booster (object of type lightgbm.Booster) – Python handle to LightGBM model

Returns: model – loaded model (Model object)

Example

bst = lightgbm.train(params, dtrain, 10, valid_sets=[dtrain],
                     valid_names=['train'])
tl_model = Model.from_lightgbm(bst)
from_xgboost(booster)
Load a tree ensemble model from an XGBoost Booster object

booster (object of type xgboost.Booster) – Python handle to XGBoost model

Returns: model – loaded model (Model object)

Example

bst = xgboost.train(params, dtrain, 10, [(dtrain, 'train')])
tl_model = Model.from_xgboost(bst)
from_xgboost_json(json_str: Union[bytearray, str])
Load a tree ensemble model from a string containing XGBoost JSON

json_str – a string specifying an XGBoost model in the XGBoost JSON format

Returns: model – loaded model (Model object)

Example

bst = xgboost.train(params, dtrain, 10, [(dtrain, 'train')])
bst.save_model('model.json')
with open('model.json') as file_:
    json_str = file_.read()
tl_model = Model.from_xgboost_json(json_str)
load(filename, model_format)
Load a tree ensemble model from a file

filename (str) – path to the model file
model_format (str) – format of the model file

Returns: model – loaded model (Model object)

Example

xgb_model = Model.load('xgboost_model.model', 'xgboost')
num_class
Number of classes in the model (1 if the model is not a multi-class classifier)

num_feature
Number of features used in the model

num_tree
Number of decision trees in the model
serialize(filename)
Serialize (persist) the model to a checkpoint file on disk, using a fast binary representation. To recover the model from the checkpoint, use the deserialize() method.

Note: Use exactly matching versions of Treelite when exchanging checkpoints.
We provide ZERO backward compatibility guarantees. You will not be able to recover the model from a checkpoint that was generated by a previous version of Treelite. Both the producer and the consumer of the checkpoint must use identical major and minor versions of Treelite.

filename (str) – Path to checkpoint
set_tree_limit(tree_limit)
Keep the first tree_limit trees and drop the rest
treelite.ModelBuilder(num_feature, num_class=1, average_tree_output=False, threshold_type='float32', leaf_output_type='float32', **kwargs)
Builder class for tree ensemble model: provides tools to iteratively build an ensemble of decision trees

num_feature (int) – number of features used in the model being built. We assume that all feature indices are between 0 and (num_feature - 1)
num_class (int, optional) – number of output groups; >1 indicates multi-class classification
average_tree_output (bool, optional) – whether the model is a random forest; True indicates a random forest and False indicates gradient boosted trees
threshold_type (str, optional) – data type for split thresholds (e.g. ‘float32’)
leaf_output_type (str, optional) – data type for leaf outputs (e.g. ‘float32’)
**kwargs – model parameters, to be used to specify the resulting model. Refer to this page for the full list of model parameters.
Node
Handle to a node in a tree

set_categorical_test_node(feature_id, left_categories, default_left, left_child_key, right_child_key)
Set the node as a test node with a categorical split. A list defines all categories that would be classified as the left side. Categories are integers ranging from 0 to n-1, where n is the number of categories in that particular feature.

feature_id (int) – feature index
left_categories (list of int) – list of categories belonging to the left child
default_left (bool) – default direction for missing values (True for left; False for right)
left_child_key (int) – unique integer key to identify the left child node
right_child_key (int) – unique integer key to identify the right child node
set_leaf_node(leaf_value, leaf_value_type='float32')
Set the node as a leaf node

set_numerical_test_node(feature_id, opname, threshold, default_left, left_child_key, right_child_key, threshold_type='float32')
Set the node as a test node with a numerical split. The test is in the form [feature value] OP [threshold]. Depending on the result of the test, either the left or the right child is taken.

feature_id (int) – feature index
opname (str) – binary operator to use in the test
threshold (float) – threshold value
default_left (bool) – default direction for missing values (True for left; False for right)
left_child_key (int) – unique integer key to identify the left child node
right_child_key (int) – unique integer key to identify the right child node
threshold_type (str) – data type for threshold value (e.g. ‘float32’)

set_root()
Set the node as the root
Tree(threshold_type='float32', leaf_output_type='float32')
Handle to a decision tree in a tree ensemble builder

Value(init_value, dtype)
Value whose type may be specified at runtime

init_value – initial value of the Value object
dtype (str) – data type of the value (e.g. ‘float32’)
append(tree)
Add a tree at the end of the ensemble

tree (Tree object) – tree to be added

Example

builder = ModelBuilder(num_feature=4227)
tree = ...  # build tree somehow
builder.append(tree)  # add tree at the end of the ensemble
commit()
Finalize the ensemble model

Returns: model – finished model (Model object)

Example

builder = ModelBuilder(num_feature=4227)
for i in range(100):
    tree = ...  # build tree somehow
    builder.append(tree)  # add one tree at a time
model = builder.commit()  # now get a Model object
model.compile(dirpath='test')  # compile model into C code
insert(index, tree)
Insert a tree at the specified location in the ensemble

index (int) – index at which to insert the tree
tree (Tree object) – tree to be inserted

Example

builder = ModelBuilder(num_feature=4227)
tree = ...  # build tree somehow
builder.insert(0, tree)  # insert tree at index 0
treelite.Annotator
Branch annotator class: annotate branches in a given model using frequency patterns in the training data

annotate_branch(model, dmat, nthread=None, verbose=False)
Annotate branches in a given model using frequency patterns in the training data. Each node gets the count of the instances that belong to it. Any prior annotation information stored in the annotator will be replaced with the new annotation returned by this method.

model (object of type Model) – decision tree ensemble model
dmat (object of type DMatrix) – data matrix representing the training data
nthread (int, optional) – number of threads to use while annotating. If missing, use all physical cores in the system.
verbose (bool, optional) – whether to produce extra messages
treelite.create_shared(toolchain, dirpath, nthread=None, verbose=False, options=None, long_build_time_warning=True)
Create shared library.

toolchain (str) – which toolchain to use. You may choose one of ‘msvc’, ‘clang’, and ‘gcc’. You may also specify a specific variation of clang or gcc (e.g. ‘gcc-7’)
dirpath (str) – directory containing the header and source files previously generated by Model.compile(). The directory must contain recipe.json which specifies build dependencies.
nthread (int, optional) – number of threads to use in creating the shared library. Defaults to the number of cores in the system.
verbose (bool, optional) – whether to produce extra messages
options (list of str, optional) – additional options to pass to the toolchain
long_build_time_warning (bool, optional) – if set to False, suppress the warning about potentially long build time

Returns: libpath – absolute path of created shared library
Example

The following command uses the Visual C++ toolchain to generate ./my/model/model.dll:

model.compile(dirpath='./my/model', params={}, verbose=True)
create_shared(toolchain='msvc', dirpath='./my/model', verbose=True)

Later, the shared library can be referred to by its directory name:

predictor = Predictor(libpath='./my/model', verbose=True)
# looks for ./my/model/model.dll

Alternatively, one may specify the library down to its file name:

predictor = Predictor(libpath='./my/model/model.dll', verbose=True)
treelite.generate_makefile(dirpath, platform, toolchain, options=None)
Generate a Makefile for a given directory of headers and sources. The resulting Makefile will be stored in the directory. This function is useful for deploying a model on a different machine.

dirpath (str) – directory containing the header and source files previously generated by Model.compile(). The directory must contain recipe.json which specifies build dependencies.
platform (str) – name of the operating system on which the headers and sources shall be compiled. Must be one of the following: ‘windows’ (Microsoft Windows), ‘osx’ (Mac OS X), ‘unix’ (Linux and other UNIX-like systems)
toolchain (str) – which toolchain to use. You may choose one of ‘msvc’, ‘clang’, and ‘gcc’. You may also specify a specific variation of clang or gcc (e.g. ‘gcc-7’)
options (list of str, optional) – additional options to pass to the toolchain
treelite.generate_cmakelists(dirpath, options=None)
Generate a CMakeLists.txt for a given directory of headers and sources. The resulting CMakeLists.txt will be stored in the directory. This function is useful for deploying a model on a different machine.

dirpath (str) – directory containing the header and source files previously generated by Model.compile(). The directory must contain recipe.json which specifies build dependencies.
options (list of str, optional) – additional options to pass to the toolchain

treelite.TreeliteError
Error thrown by Treelite
Converter to ingest scikit-learn models into Treelite

treelite.sklearn.import_model_with_model_builder(sklearn_model)
Load a tree ensemble model from a scikit-learn model object using the model builder API.

Note: Use import_model for production use.
This function exists to demonstrate the use of the model builder API and is slow with large models. For production, please use import_model(), which is significantly faster.

sklearn_model (object of type RandomForestRegressor / RandomForestClassifier / ExtraTreesRegressor / ExtraTreesClassifier / GradientBoostingRegressor / GradientBoostingClassifier) – Python handle to scikit-learn model

Returns: model – loaded model (Model object)
Example
import sklearn.datasets
import sklearn.ensemble
X, y = sklearn.datasets.load_boston(return_X_y=True)
clf = sklearn.ensemble.RandomForestRegressor(n_estimators=10)
clf.fit(X, y)
import treelite.sklearn
model = treelite.sklearn.import_model_with_model_builder(clf)
treelite.sklearn.import_model(sklearn_model)
Load a tree ensemble model from a scikit-learn model object

Note: For IsolationForest, the outlier score is computed using the standardized ratio proposed in the original reference. This matches IsolationForest._compute_chunked_score_samples but differs slightly from IsolationForest.decision_function.

sklearn_model (object of type RandomForestRegressor / RandomForestClassifier / ExtraTreesRegressor / ExtraTreesClassifier / GradientBoostingRegressor / GradientBoostingClassifier / IsolationForest) – Python handle to scikit-learn model

Returns: model – loaded model (Model object)
Example
import sklearn.datasets
import sklearn.ensemble
X, y = sklearn.datasets.load_boston(return_X_y=True)
clf = sklearn.ensemble.RandomForestRegressor(n_estimators=10)
clf.fit(X, y)
import treelite.sklearn
model = treelite.sklearn.import_model(clf)
General Tree Inference Library (GTIL)

treelite.gtil.predict(model: treelite.frontend.Model, data: numpy.ndarray, nthread: int = -1, pred_margin: bool = False)
Predict with a Treelite model using the General Tree Inference Library (GTIL). GTIL is intended to be a reference implementation. GTIL is also useful in situations where using a C compiler is not feasible.

Note: GTIL is currently experimental.
GTIL is currently in its early stage of development and may have bugs and performance issues. Please report any issues found on GitHub.

model (Model object) – Treelite model object
data (numpy.ndarray) – 2D NumPy array with which to run prediction
nthread (int, optional) – number of CPU cores to use in prediction. If <= 0, use all CPU cores.
pred_margin (bool, optional) – whether to produce raw margin scores

Returns: prediction (numpy.ndarray)