
Incremental CART decision tree based on the Hoeffding tree, i.e. the Very Fast Decision Tree (VFDT) proposed in the paper "Mining High-Speed Data Streams" by Domingos & Hulten (2000), together with a more recent extension, the "Extremely Fast Decision Tree" (EFDT) by Manapragada, Webb & Salehi (2018). Also includes an implementation of Random Forest.


doubleplusplus/incremental_decision_tree-CART-Random_Forest


incremental-decision-tree-learner

Definition from Wikipedia: incremental decision tree

"An incremental decision tree algorithm is an online machine learning algorithm that outputs a decision tree. Many decision tree methods, construct a tree using a complete (static) dataset. Incremental decision tree methods allow an existing tree to be updated using only new data instances, without having to re-process past instances. This may be useful in situations where the entire dataset is not available when the tree is updated (i.e. the data was not stored), the original data set is too large to process or the characteristics of the data change over time (concept drift)."

VFDT

This implementation is a CART tree based on the Hoeffding tree, i.e. the Very Fast Decision Tree (VFDT), which is described in the paper "Mining High-Speed Data Streams" (Domingos & Hulten, 2000). The code is tested on data downloaded from the UCI Machine Learning Repository.
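To illustrate the idea behind a Hoeffding tree, here is a minimal sketch of the split test from the VFDT paper. It is not taken from this repository's code: the function names, the `tie_threshold` parameter and the choice of `value_range=1.0` for a Gini-based score are assumptions made for the example.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R) is the range of the split score, delta is the allowed
    probability of choosing the wrong attribute, n is the number of
    instances seen at the leaf.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n,
                 tie_threshold=0.05):
    """Decide whether a leaf has seen enough data to split.

    Split when the best attribute beats the runner-up by more than
    epsilon, or when epsilon is so small that the two are effectively
    tied (tie_threshold is a hypothetical default for this sketch).
    """
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain > eps) or (eps < tie_threshold)
```

For example, with `delta=1e-7` and 200 instances the bound is roughly 0.2, so a leaf splits only if the best attribute's score exceeds the second best by more than that margin.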

EFDT

"Extremely Fast Decision Tree" (EFDT) is described by Manapragada, Webb & Salehi (2018). As new data instances arrive, EFDT can dynamically modify the existing model by re-evaluating a previous split or killing a subtree. EFDT is now available in this repository, but it runs slower than VFDT.
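The re-evaluation step can be sketched as follows. This is a simplified illustration under stated assumptions, not this repository's implementation: `reevaluate_split` is a hypothetical helper, `gains` is assumed to map each attribute to a score where larger is better (for example, reduction in Gini impurity), and `eps` is assumed to be a Hoeffding bound computed elsewhere.

```python
def reevaluate_split(current_attr, gains, eps):
    """Check whether an internal node's split should be replaced.

    Returns the attribute to re-split on if some attribute now beats the
    current split attribute by more than eps; otherwise returns None.
    When a replacement is returned, the caller would kill the existing
    subtree and split on the returned attribute instead.
    """
    best_attr = max(gains, key=gains.get)
    if best_attr != current_attr and gains[best_attr] - gains[current_attr] > eps:
        return best_attr
    return None
```

Running this check at internal nodes as data arrives is extra work that a VFDT-style tree, which never revisits a split, does not do.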

Random Forest

Added an implementation of Random Forest: rf.py. It is efficient because I used vectorized computation of the Gini impurity/index and a process pool for multiprocessing.
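As an illustration of what vectorized Gini computation can look like, the sketch below scores every candidate threshold of one numeric feature in a single pass using NumPy cumulative sums. It is an assumption-laden example, not the code in rf.py: the function name `best_gini_split` and its signature are hypothetical, and class labels are assumed to be integers in `0..n_classes-1`.

```python
import numpy as np


def best_gini_split(x, y, n_classes):
    """Find the threshold on feature x with the lowest weighted Gini impurity.

    Returns (threshold, weighted_gini), or None if x has no two distinct
    values to split between.
    """
    order = np.argsort(x, kind="stable")
    x_sorted, y_sorted = x[order], y[order]
    n = len(x_sorted)

    # Row i of left_counts holds class counts of the first i + 1 samples.
    one_hot = np.eye(n_classes)[y_sorted]
    left_counts = np.cumsum(one_hot, axis=0)[:-1]
    right_counts = one_hot.sum(axis=0) - left_counts

    n_left = np.arange(1, n)
    n_right = n - n_left

    # Gini impurity = 1 - sum of squared class proportions.
    gini_left = 1.0 - ((left_counts / n_left[:, None]) ** 2).sum(axis=1)
    gini_right = 1.0 - ((right_counts / n_right[:, None]) ** 2).sum(axis=1)
    weighted = (n_left * gini_left + n_right * gini_right) / n

    # A split is only possible between two different feature values.
    valid = x_sorted[1:] != x_sorted[:-1]
    if not valid.any():
        return None
    weighted[~valid] = np.inf

    best = int(np.argmin(weighted))
    threshold = (x_sorted[best] + x_sorted[best + 1]) / 2.0
    return float(threshold), float(weighted[best])
```

Because each tree in a forest is trained independently, per-tree work of this kind can be handed to worker processes, for instance with `multiprocessing.Pool.map`.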
