feat: support time series classification #83

Open (wants to merge 5 commits into base: master)
1 change: 1 addition & 0 deletions CHANGELOG.rst
@@ -30,6 +30,7 @@ Version 0.1.*
.. |Fix| replace:: :raw-html:`<span class="badge badge-danger">Fix</span>` :raw-latex:`{\small\sc [Fix]}`
.. |API| replace:: :raw-html:`<span class="badge badge-warning">API Change</span>` :raw-latex:`{\small\sc [API Change]}`

- |Feature| |API| add :obj:`TimeSeriesCascadeForestClassifier` for time series classification (`#83 <https://github.com/LAMDA-NJU/Deep-Forest/pull/83>`__) @xuyxu
- |Fix| fix missing functionality of :meth:`_set_n_trees` @xuyxu
- |Fix| |API| add docstrings for parameter ``bin_type`` (`#74 <https://github.com/LAMDA-NJU/Deep-Forest/pull/74>`__) @xuyxu
- |Feature| |API| recover the parameter ``min_samples_split`` (`#73 <https://github.com/LAMDA-NJU/Deep-Forest/pull/73>`__) @xuyxu
1 change: 1 addition & 0 deletions build_tools/requirements.txt
@@ -7,3 +7,4 @@ pytest-cov
lightgbm
xgboost
cython>=0.28.5
pandas>=0.25.0
7 changes: 6 additions & 1 deletion deepforest/__init__.py
@@ -1,4 +1,8 @@
from .cascade import CascadeForestClassifier, CascadeForestRegressor
from .cascade import (
    CascadeForestClassifier,
    CascadeForestRegressor,
    TimeSeriesCascadeForestClassifier,
)
from .forest import RandomForestClassifier, RandomForestRegressor
from .forest import ExtraTreesClassifier, ExtraTreesRegressor
from .tree import DecisionTreeClassifier, DecisionTreeRegressor
@@ -8,6 +12,7 @@
__all__ = [
    "CascadeForestClassifier",
    "CascadeForestRegressor",
    "TimeSeriesCascadeForestClassifier",
    "RandomForestClassifier",
    "RandomForestRegressor",
    "ExtraTreesClassifier",
195 changes: 195 additions & 0 deletions deepforest/cascade.py
@@ -6,6 +6,7 @@
import time
import numbers
import numpy as np
import pandas as pd
from abc import ABCMeta, abstractmethod
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.multiclass import type_of_target
@@ -202,6 +203,34 @@
return predictor


def _build_time_series_feature_transformer():
    """Build the time series feature transformer from tsfresh."""
    # Windows is not supported (see the error message below)
    import platform

    if platform.system() == "Windows":
        msg = (
            "TimeSeriesCascadeForestClassifier is currently not available"
            " on Windows due to a parallelization issue in tsfresh."
        )
        raise NotImplementedError(msg)

    try:
        __import__("tsfresh")
    except ModuleNotFoundError:
        msg = (
            "Cannot load the module tsfresh when building the feature"
            " transformer. Please make sure that tsfresh is installed."
        )
        raise ModuleNotFoundError(msg)

    from tsfresh.transformers import RelevantFeatureAugmenter

    augmenter = RelevantFeatureAugmenter(column_id="id", column_sort="time")

    return augmenter
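The augmenter is configured with column_id="id" and column_sort="time", so the time series container is expected in tsfresh's long (flat) format: one row per record, holding the series id, the time step, and one or more value columns. A minimal sketch of producing that layout from a 2-D array of equal-length series follows; the helper to_long_format and the value column name "value" are hypothetical and not part of this PR.

```python
import numpy as np
import pandas as pd


def to_long_format(series, value_name="value"):
    """Hypothetical helper: flatten equal-length series into id/time rows.

    `series` has shape (n_samples, length); row i becomes the time series
    with id i, and column t becomes its t-th record.
    """
    series = np.asarray(series)
    n_samples, length = series.shape
    return pd.DataFrame(
        {
            # 0, 0, ..., 1, 1, ...: each id repeated once per time step
            "id": np.repeat(np.arange(n_samples), length),
            # 0, 1, ..., 0, 1, ...: the time steps restart for every id
            "time": np.tile(np.arange(length), n_samples),
            # Row-major flattening lines up with the id/time columns above
            value_name: series.ravel(),
        }
    )


X = to_long_format([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(X)
```

Using repeat for the ids and tile for the time steps keeps the conversion vectorized, and the row-major ravel guarantees each value sits next to its own id and time step.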


__classifier_model_doc = """
Parameters
----------
@@ -1699,3 +1728,169 @@
_y = _utils.merge_proba(X_aug_test_, self.n_outputs_)

return _y


class TimeSeriesCascadeForestClassifier(ClassifierMixin):
    def __init__(
        self,
        n_bins=255,
        bin_subsample=200000,
        bin_type="percentile",
        max_layers=20,
        criterion="gini",
        n_estimators=2,
        n_trees=100,
        max_depth=None,
        min_samples_split=2,
        min_samples_leaf=1,
        use_predictor=False,
        predictor="forest",
        predictor_kwargs={},
        backend="custom",
        n_tolerant_rounds=2,
        delta=1e-5,
        partial_mode=False,
        n_jobs=None,
        random_state=None,
        verbose=1,
    ):

        # Feature transformer
        self.transformer = _build_time_series_feature_transformer()

        # Classifier
        self.classifier = CascadeForestClassifier(
            n_bins=n_bins,
            bin_subsample=bin_subsample,
            bin_type=bin_type,
            max_layers=max_layers,
            criterion=criterion,
            n_estimators=n_estimators,
            n_trees=n_trees,
            max_depth=max_depth,
            min_samples_split=min_samples_split,
            min_samples_leaf=min_samples_leaf,
            use_predictor=use_predictor,
            predictor=predictor,
            predictor_kwargs=predictor_kwargs,
            backend=backend,
            n_tolerant_rounds=n_tolerant_rounds,
            delta=delta,
            partial_mode=partial_mode,
            n_jobs=n_jobs,
            random_state=random_state,
            verbose=verbose,
        )

        self.verbose = verbose

    def _check_input(self, X, y=None):
        """Validate the input time series used for training or evaluation."""
        is_training_data = y is not None

        if not isinstance(X, pd.DataFrame):
            msg = "X should be a pandas DataFrame, but got {} instead."
            raise ValueError(msg.format(type(X)))

        if "id" not in X.columns:
            msg = "X should have a column named `id`."
            raise ValueError(msg)

        if "time" not in X.columns:
            msg = "X should have a column named `time`."
            raise ValueError(msg)

        # Check same time series length
        length = X.groupby(["id"]).size().to_numpy()
        if not (length == length[0]).all():
            msg = "All time series should have the same length."
            raise ValueError(msg)

        # Additional checks for training data
        if is_training_data:

            if not isinstance(y, pd.Series):
                msg = "y should be a pandas Series, but got {} instead."
                raise ValueError(msg.format(type(y)))

            # Check same time series id. Compare the lengths first, since an
            # element-wise comparison of different lengths raises an error.
            ids = X["id"].unique()
            if len(y.index) != len(ids) or not (y.index == ids).all():
                msg = "Mismatch of time series IDs in X and y."
                raise ValueError(msg)

        # Set attributes
        self.length = length[0]
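The checks in _check_input only need pandas, so they can be tried without tsfresh. Below is a hypothetical standalone function, check_time_series_input, written for illustration only: it mirrors the checks above on plain pandas objects (including a length guard before comparing ids) and returns the common series length.

```python
import pandas as pd


def check_time_series_input(X, y=None):
    """Hypothetical standalone version of the checks; returns the length."""
    if not isinstance(X, pd.DataFrame):
        raise ValueError(f"X should be a pandas DataFrame, got {type(X)}.")
    for column in ("id", "time"):
        if column not in X.columns:
            raise ValueError(f"X should have a column named `{column}`.")

    # groupby(...).size() counts the records belonging to each series
    length = X.groupby("id").size().to_numpy()
    if not (length == length[0]).all():
        raise ValueError("All time series should have the same length.")

    if y is not None:
        if not isinstance(y, pd.Series):
            raise ValueError(f"y should be a pandas Series, got {type(y)}.")
        ids = X["id"].unique()
        # Check lengths first: comparing an Index with an array of a
        # different length raises instead of returning False.
        if len(y.index) != len(ids) or not (y.index == ids).all():
            raise ValueError("Mismatch of time series IDs in X and y.")
    return int(length[0])


frame = pd.DataFrame(
    {"id": [0, 0, 1, 1], "time": [0, 1, 0, 1], "value": [1.0, 2.0, 3.0, 4.0]}
)
labels = pd.Series(["a", "b"], index=[0, 1])
series_length = check_time_series_input(frame, labels)
```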

    def fit(self, X, y, sample_weight=None):
        """
        Build a deep forest using the training time series for classification.

        Parameters
        ----------
        X : :obj:`pandas.DataFrame` of shape (n_samples * length, n_columns)
            The input time series in a flat DataFrame. The columns ``"id"`` and
            ``"time"`` are used to locate the `time`-th record of the `id`-th
            time series. Internally, it will be transformed into non-ordinal
            numerical features using :mod:`tsfresh`.
        y : :obj:`pandas.Series` of shape (n_samples,)
            The class labels of input time series, indexed by time series id.
        sample_weight : :obj:`numpy.ndarray` of shape (n_samples,), default=None
            Sample weights. If ``None``, then samples are equally weighted.

        Returns
        -------
        self : object
            The fitted classifier.
        """
        self._check_input(X, y)
        # Features are extracted for the time series ids in the index of dummy_X
        dummy_X = pd.DataFrame(index=y.index)
        self.transformer.set_timeseries_container(X)

        if self.verbose > 0:
            print("{} Transforming time series".format(_utils.ctime()))

        X_with_features = self.transformer.fit_transform(dummy_X, y).to_numpy()
        self.classifier.fit(X_with_features, y, sample_weight)

        return self

    def predict_proba(self, X):
        """
        Predict class probabilities for time series X.

        Parameters
        ----------
        X : :obj:`pandas.DataFrame` of shape (n_samples * length, n_columns)
            The input time series in a flat DataFrame. The columns ``"id"`` and
            ``"time"`` are used to locate the `time`-th record of the `id`-th
            time series. Internally, it will be transformed into non-ordinal
            numerical features using :mod:`tsfresh`.

        Returns
        -------
        proba : :obj:`numpy.ndarray` of shape (n_samples, n_classes)
            The class probabilities of the input time series.
        """
        self._check_input(X)
        # Index the dummy frame by the ids in X; a bare pd.DataFrame() has an
        # empty index, so no time series would be selected for transformation
        dummy_X = pd.DataFrame(index=X["id"].unique())
        self.transformer.set_timeseries_container(X)

        if self.verbose > 0:
            print("{} Transforming time series".format(_utils.ctime()))

        X_with_features = self.transformer.transform(dummy_X).to_numpy()

        return self.classifier.predict_proba(X_with_features)
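fit hands tsfresh an otherwise empty DataFrame indexed by the ids in y. As we understand RelevantFeatureAugmenter, features are computed only for the ids present in the index of the frame passed to fit_transform or transform. The pandas-only sketch below imitates that selection step with a hypothetical helper, restrict_to_index (an assumption about tsfresh's behavior, not its code), to show that a frame built from the ids in the container selects every series while a bare pd.DataFrame() selects none.

```python
import pandas as pd

# Long-format container holding two series with ids 10 and 20
container = pd.DataFrame(
    {"id": [10, 10, 20, 20], "time": [0, 1, 0, 1], "value": [1.0, 2.0, 3.0, 4.0]}
)


def restrict_to_index(container, column_id, index):
    """Hypothetical helper: keep only the records whose id is in `index`."""
    return container[container[column_id].isin(index)]


# A dummy frame indexed by the ids in the container selects both series...
dummy_with_ids = pd.DataFrame(index=container["id"].unique())
full = restrict_to_index(container, "id", dummy_with_ids.index)

# ...whereas a bare pd.DataFrame() has an empty index and selects nothing
empty = restrict_to_index(container, "id", pd.DataFrame().index)
```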

    def predict(self, X):
        """
        Predict class for time series X.

        Parameters
        ----------
        X : :obj:`pandas.DataFrame` of shape (n_samples * length, n_columns)
            The input time series in a flat DataFrame. The columns ``"id"`` and
            ``"time"`` are used to locate the `time`-th record of the `id`-th
            time series. Internally, it will be transformed into non-ordinal
            numerical features using :mod:`tsfresh`.

        Returns
        -------
        y : :obj:`numpy.ndarray` of shape (n_samples,)
            The predicted classes, given as column indices of the
            probabilities returned by :meth:`predict_proba`.
        """
        proba = self.predict_proba(X)
        return np.argmax(proba, axis=1)
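predict returns np.argmax(proba, axis=1), that is, the column positions of the most probable classes rather than the labels themselves. A sketch of mapping those positions back to labels with scikit-learn's LabelEncoder follows, assuming the probability columns follow the order of encoder.classes_ (LabelEncoder sorts the classes). Whether CascadeForestClassifier orders its columns the same way is an assumption here, not something this diff shows.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

y = np.array(["cat", "dog", "cat", "bird"])
encoder = LabelEncoder().fit(y)  # classes_ are sorted: bird, cat, dog

# Rows are samples; columns are assumed to follow encoder.classes_
proba = np.array([[0.1, 0.7, 0.2], [0.2, 0.1, 0.7]])

indices = np.argmax(proba, axis=1)  # column positions of the maxima
labels = encoder.inverse_transform(indices)  # original class labels
```

Equivalently, encoder.classes_[indices] performs the same lookup by plain array indexing.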