I implemented XGBoost trees from scratch!
(Left) Fitting a noisy sine wave with various numbers of trees (Right) Fitting 2-dimensional Gaussian data (in blue)
I've heard a lot about XGBoost and gradient boosted trees, but had never dug deeply into how they work. XGBoost is very popular, so understanding it seemed well worth the effort. Plus, there's really no better way to understand something than to implement it from scratch.
I read through and watched the resources listed above, then implemented an ExtremeBoostedTree class that follows the exact greedy split algorithm described in the paper (for regression). I tested the algorithm on a noisy sinusoidal function.
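The core of that greedy algorithm is scanning sorted feature values and scoring each candidate split with the gain formula from the XGBoost paper, built from sums of the loss's first and second derivatives (G, H) plus the regularizers λ and γ. Here's a minimal 1-D sketch of what that search can look like — `best_split` is a hypothetical helper name, not necessarily how my class is structured:

```python
import numpy as np

def best_split(x, g, h, lam=1.0, gamma=0.0):
    """Exact greedy split search over one feature.

    g, h: per-sample 1st and 2nd derivatives of the loss at the
    current predictions. Returns (gain, threshold); threshold is
    None when no split beats the unsplit node.
    """
    order = np.argsort(x)
    x, g, h = x[order], g[order], h[order]
    G, H = g.sum(), h.sum()
    GL = HL = 0.0
    best_gain, best_thr = 0.0, None
    for i in range(len(x) - 1):
        GL += g[i]
        HL += h[i]
        if x[i] == x[i + 1]:  # can't split between identical values
            continue
        GR, HR = G - GL, H - HL
        # Gain = 1/2 [G_L^2/(H_L+λ) + G_R^2/(H_R+λ) − G^2/(H+λ)] − γ
        gain = 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam)
                      - G**2 / (H + lam)) - gamma
        if gain > best_gain:
            best_gain, best_thr = gain, 0.5 * (x[i] + x[i + 1])
    return best_gain, best_thr
```

For squared-error regression the derivatives are simply g = ŷ − y and h = 1, so a step in y shows up as the best threshold.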
Added ensembling and adapted the code to work with any loss function (given its first and second derivatives).
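Concretely, "any loss given its derivatives" means the tree only ever sees the per-sample gradient g and Hessian h, and each leaf's value comes out of the closed-form optimum w* = −G/(H+λ). A sketch of what those plug-in pieces might look like (function names here are illustrative, not my actual API):

```python
import numpy as np

# Squared error: l(y, p) = 0.5 * (p - y)^2
def sq_grad(y, p):
    return p - y                # dl/dp

def sq_hess(y, p):
    return np.ones_like(p)      # d2l/dp2 is constant

# Logistic loss on raw scores p: l(y, p) = log(1 + e^p) - y*p
def log_grad(y, p):
    return 1.0 / (1.0 + np.exp(-p)) - y

def log_hess(y, p):
    s = 1.0 / (1.0 + np.exp(-p))
    return s * (1.0 - s)

def leaf_weight(g, h, lam=1.0):
    """Optimal leaf value w* = -G / (H + lambda)."""
    return -g.sum() / (h.sum() + lam)
```

Swapping the loss then only means swapping the (grad, hess) pair; the split search and leaf formula stay untouched.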
Reworked everything to accept data of any dimensionality and added approximate splitting.
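The idea behind approximate splitting is to score only a small set of quantile-based candidate thresholds per feature instead of every sorted value (the paper additionally weights the quantiles by the Hessian, which this toy version omits). A minimal sketch, with `candidate_splits` as a hypothetical helper:

```python
import numpy as np

def candidate_splits(x, n_candidates=32):
    """Propose up to n_candidates thresholds at evenly spaced
    quantiles of the feature values, instead of scanning every
    sorted value. Unweighted version of the paper's sketch."""
    qs = np.linspace(0.0, 1.0, n_candidates + 2)[1:-1]
    return np.unique(np.quantile(x, qs))
```

Each proposed threshold is then scored with the same gain formula as the exact algorithm, so the split quality degrades gracefully as the number of candidates shrinks.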
- (Oct 3) Created a basic extreme gradient boost tree for 1 dimensional data
- (Oct 4) Added ensembling and generalized greedy algorithm to any loss function (given 1st and 2nd derivative functions)
- (Oct 5) Implemented multi-dim and approximate splitting