Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default threshold_outlier value for EWMA monitoring class #4

Open
loicdtx opened this issue Jan 28, 2023 · 1 comment
Open

Default threshold_outlier value for EWMA monitoring class #4

loicdtx opened this issue Jan 28, 2023 · 1 comment

Comments

@loicdtx
Copy link
Collaborator

loicdtx commented Jan 28, 2023

While experimenting with simulated time-series, I found out that the default value for the threshold_outlier argument should be significantly higher (10 could be a more reasonable value, though that would depend on the range of the monitored values too I guess). When threshold_outlier is too low, all observations following a disturbance are considered extreme outliers and the process value is no longer updated, resulting in the break never being detected.

default sensitivity also appears to be a bit low (meaning too sensitive). I obtain best performances with a sensitivity of 4; though the synthetic data may not be realistic enough to be fully conclusive on that aspect.

import datetime

import numpy as np
import xarray as xr
from nrt import data
from nrt.monitor.ewma import EWMA
import matplotlib.pyplot as plt

# Create synthetic time-series
dates = np.arange('2018-01-01', '2022-06-15', dtype='datetime64[W]')
params_ds = data.make_cube_parameters(shape=(50,50),
                                      n_outliers_interval=(0,5),
                                      n_nan_interval=(0,7),
                                      break_idx_interval=(105,dates.size - 20))
cube = data.make_cube(dates=dates, params_ds=params_ds)

# Split stable history and monitoring period
cube_history = cube.sel(time=slice('2018-01-01','2019-12-31'))
cube_monitor = cube.sel(time=slice('2020-01-01', '2022-12-31'))


# INstantiation and fitting 
EwmaMonitor = EWMA(trend=False, harmonic_order=1, lambda_=0.3, sensitivity=4, threshold_outlier=10)
EwmaMonitor.fit(dataarray=cube_history)

# Function to 
def get_monitoring_state(nrtInstance, date, array, pixel_idx):
    """Retrieve monitoring paramters of a single pixel
    """
    process = nrtInstance.process[pixel_idx]
    boundary = nrtInstance.boundary[pixel_idx]
    model = nrtInstance.predict(date)[pixel_idx]
    value = array[pixel_idx]
    mask = nrtInstance.mask[pixel_idx]
    return value, model, process, boundary, mask

process_list = []
boundary_list = []
model_list = []
value_list = []
date_list = []
mask_list = []


# Emulate monitoring
for array, date in zip(cube_monitor.values,
                       cube_monitor.time.values.astype('M8[s]').astype(datetime.datetime)):
    EwmaMonitor.monitor(array=array, date=date)
    # Collect monitoring state for visualization and inspection
    value, model, process, boundary, mask = get_monitoring_state(EwmaMonitor, date, array, (5,5))
    process_list.append(process)
    boundary_list.append(boundary)
    model_list.append(model)
    value_list.append(value)
    date_list.append(date)
    mask_list.append(mask)

fig, [ax0, ax1, ax2] = plt.subplots(3,1)
ax0.plot(date_list, model_list)
ax0.plot(date_list, value_list, marker='.', color='green')
ax1.plot(date_list, process_list, color='magenta')
ax1.plot(date_list, boundary_list, color='black')
ax2.plot(date_list, mask_list)
plt.show()

# Compute precision and recall of the monitoring simulation
def accuracy(nrtInstance, params_ds, dates, delta=180):
    """Compute accuracy metrics (precision, recall) of a nrt simulation on synthetic data

    Args:
        nrtInstance: Instance of a nrt monitoring class used for monitoring
        params_ds: Time-series generation paramaters
        dates: Array of numpy.datetime64 dates used for synthetic time-series generation
        delta: Number of days after for which a detection is still considered valid
    """
    detection_date = nrtInstance._report(layers=['detection_date'], dtype=np.uint16)
    dates_true = np.where(params_ds.break_idx != -1,
                          dates[params_ds.break_idx.values],
                          np.datetime64('NaT'))
    dates_true_bound = dates_true + np.timedelta64(delta)
    dates_pred = np.datetime64('1970-01-01') + np.timedelta64(1) * detection_date
    dates_pred[dates_pred == np.datetime64('1970-01-01')] = np.datetime64('NaT')
    # COmputes arrays of TP, FP, FN (they should be mutually exclusive
    TP = np.where(np.logical_and(dates_pred >= dates_true, dates_pred <= dates_true_bound),
                  1, 0)
    FP = np.where(np.logical_and(TP == 0, ~np.isnat(dates_pred)), 1, 0)
    FN = np.where(np.logical_and(np.isnat(dates_pred), ~np.isnat(dates_true)), 1, 0)
    precision = TP.sum() / (TP.sum() + FP.sum())
    recall = TP.sum() / (TP.sum() + FN.sum())
    return precision, recall

print(accuracy(EwmaMonitor, params_ds, dates))
@loicdtx
Copy link
Collaborator Author

loicdtx commented Jan 28, 2023

Example with threshold of 2:

  • First panel: obs + fit
  • Second panel: process and boundary
  • Third panel: mask value

Figure_1

loicdtx added a commit that referenced this issue Mar 16, 2023
- Small bug fix in report method
Fixes #2 and #4
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant