rgf_python

Regularized Greedy Forest (RGF) Python wrapper (with x64 binaries)

Summary:

  • The aim of this fork is to simplify installation so that the package works out of the box
  • I forked the Python wrapper, added Windows and Linux binaries, and made some changes to the code
  • You only need to install this package to get working binaries with it
  • Tested on: Ubuntu 14.04 x64, Windows 7 x64
  • The implementation (binaries) is slow (single-threaded)
  • License for rgf_python: Apache License v2.0
  • License for RGF: GNU GPL v3
  • RGF maintainer page
  • RGF page
  • There is also a multi-threaded implementation, FastRGF, but I have not found a Python wrapper for it and have not tried it

Installation:

git clone https://github.com/vecxoz/rgf_python
cd rgf_python
python setup.py install --user

Usage:

# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older release
from sklearn.datasets import load_iris, load_boston
from sklearn.model_selection import cross_val_score
from rgf.rgf import RGFClassifier, RGFRegressor

# Classification
iris = load_iris()
X = iris.data
y = iris.target
model = RGFClassifier()
print(cross_val_score(model, X, y, cv=5))
# [ 0.96666667  0.96666667  0.93333333  0.9         1.        ]

# Regression
boston = load_boston()
X = boston.data
y = boston.target
model = RGFRegressor()
print(cross_val_score(model, X, y, cv=5))
# [ 0.7286153   0.79284581  0.7961001   0.47978064  0.1185657 ]

Hyperparameter tuning:

  • Details on hyperparameter tuning
  • max_leaf: Appropriate values are data-dependent and usually range from 1000 to 10000.
  • test_interval: For efficiency, it must be either a multiple or a divisor of 100.
  • algorithm: 'RGF', 'RGF_Opt' or 'RGF_Sib'.
  • loss: 'LS', 'Log' or 'Expo'.
  • reg_depth: Must be at least 1. Intended for use with algorithm='RGF_Opt' or 'RGF_Sib'.
  • l2: Values of 1, 0.1, or 0.01 often produce good results, although with exponential loss (loss='Expo') or logistic loss (loss='Log') some data require smaller values such as 1e-10 or 1e-20.
  • sl2: Equal to l2 by default. On some data, l2/100 works well.
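
The rules above can be checked before a model is built. Below is a minimal sketch, not part of rgf_python: `check_rgf_params` and `l2_candidates` are hypothetical helpers, the default values are assumptions chosen for illustration, and the `test_interval` efficiency recommendation is treated here as a hard error. The checks the wrapper itself performs may differ.

```python
# Hypothetical helpers (not part of rgf_python). They only encode the
# tuning rules listed above; default values are assumptions.

ALGORITHMS = ("RGF", "RGF_Opt", "RGF_Sib")
LOSSES = ("LS", "Log", "Expo")


def check_rgf_params(max_leaf=1000, test_interval=100, algorithm="RGF",
                     loss="Log", reg_depth=1.0, l2=0.1, sl2=None):
    """Validate one parameter set and return it as keyword arguments."""
    if algorithm not in ALGORITHMS:
        raise ValueError("algorithm must be one of %r" % (ALGORITHMS,))
    if loss not in LOSSES:
        raise ValueError("loss must be one of %r" % (LOSSES,))
    if reg_depth < 1:
        raise ValueError("reg_depth must be at least 1")
    if not isinstance(test_interval, int) or test_interval < 1:
        raise ValueError("test_interval must be a positive integer")
    # Must be either a multiple or a divisor of 100.
    if test_interval % 100 != 0 and 100 % test_interval != 0:
        raise ValueError("test_interval must be a multiple or divisor of 100")
    if sl2 is None:
        sl2 = l2  # sl2 equals l2 by default
    return dict(max_leaf=max_leaf, test_interval=test_interval,
                algorithm=algorithm, loss=loss, reg_depth=reg_depth,
                l2=l2, sl2=sl2)


def l2_candidates():
    """Return (l2, sl2) pairs: l2 in {1, 0.1, 0.01}, sl2 = l2 or l2/100."""
    return [(l2, sl2) for l2 in (1.0, 0.1, 0.01) for sl2 in (l2, l2 / 100)]


params = check_rgf_params(max_leaf=2000, algorithm="RGF_Opt",
                          reg_depth=2, l2=0.01)
# Assuming the constructor accepts these names as keyword arguments:
# model = RGFClassifier(**params)
```

The pairs returned by `l2_candidates()` can serve as a small starting grid for cross-validation; data that needs much smaller `l2` values (such as 1e-10) would have to be added by hand.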