Incorrect result when using MMD with some chunk_size argument values #252

Open
jaime-cespedes-sisniega opened this issue Jul 23, 2023 · 1 comment · May be fixed by #355
Labels
bug (Something isn't working), needs triage (Issue requires triage)

Comments

Contributor

jaime-cespedes-sisniega commented Jul 23, 2023

Describe the bug

The MMD detector returns an incorrect result for some values of the chunk_size argument: for many values, the MMD² computed with an integer chunk_size differs from the MMD² computed with chunk_size=None.

With the reproduction code below, the following chunk_size values produce an incorrect result: 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 18, 19. The remaining values between 1 and 20 produce a correct result.
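To get a feel for how chunking alone can change a result, here is a minimal NumPy-only sketch that sums an RBF kernel matrix either in one pass or block by block and records the absolute difference for each chunk size from 1 to 20. It is not frouros's implementation: `rbf` and `kernel_sum` are hypothetical helpers, the kernel formula exp(-||x - y||² / (2σ²)) is an assumption, and the random data will not match the arrays in the reproduction script.

```python
import numpy as np


def rbf(X, Y, sigma=0.5):
    # Hypothetical stand-in for a kernel function (assumed formula):
    # exp(-||x - y||^2 / (2 * sigma^2)) for every pair of rows.
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma**2))


def kernel_sum(X, Y, chunk_size=None):
    # Sum of all kernel entries, optionally accumulated block by block.
    if chunk_size is None:
        return float(rbf(X, Y).sum())
    total = 0.0
    for i in range(0, len(X), chunk_size):
        for j in range(0, len(Y), chunk_size):
            block = rbf(X[i:i + chunk_size], Y[j:j + chunk_size])
            total += float(block.sum())
    return total


rng = np.random.default_rng(31)
X_ref = rng.normal(0.0, 1.0, size=(20, 1))
X_test = rng.normal(0.3, 1.0, size=(20, 1))

full = kernel_sum(X_ref, X_test)
# Absolute difference between the chunked and unchunked sums per chunk size.
diffs = {
    c: abs(kernel_sum(X_ref, X_test, chunk_size=c) - full)
    for c in range(1, 21)
}

for c, d in diffs.items():
    print(f"chunk_size={c:2d}  |chunked - full| = {d:.3e}")
```

Any nonzero differences printed here come only from the order in which the same terms are added, so they stay many orders of magnitude below the sum itself.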

Steps/Code to Reproduce

from frouros.detectors.data_drift import MMD
import numpy as np
from functools import partial
from frouros.utils.kernels import rbf_kernel

np.random.seed(seed=31)

dim = 1
size = 20
kernel = partial(rbf_kernel, sigma=0.5)
chunk_size = 4

X_ref = np.random.multivariate_normal(mean=np.zeros(dim), cov=np.identity(dim), size=size)
X_test = np.random.multivariate_normal(mean=np.full(dim, 0.3), cov=np.identity(dim), size=size)

detector = MMD(
    kernel=kernel,
    chunk_size=None,
)
detector.fit(X_ref)
result, _ = detector.compare(X=X_test, verbose=True)

detector_chunk = MMD(
    kernel=kernel,
    chunk_size=chunk_size,
)
detector_chunk.fit(X_ref)
result_chunk, _ = detector_chunk.compare(X=X_test, verbose=True)

assert result.distance == result_chunk.distance

Expected Results

No error is thrown.

Actual Results

Traceback (most recent call last):
  File "/home/jaime/.config/JetBrains/PyCharm2023.1/scratches/frouros/expected/data_drift/batch/mmd_chunk.py", line 30, in <module>
    assert result.distance == result_chunk.distance
AssertionError

Versions

'0.5.1'
jaime-cespedes-sisniega added the bug (Something isn't working) and needs triage (Issue requires triage) labels on Jul 23, 2023
jaime-cespedes-sisniega linked a pull request on Dec 6, 2024 that will close this issue
@jaime-cespedes-sisniega
Contributor Author

#355 shows that this issue is caused by floating-point errors. Increasing the precision (e.g. np.float128) avoids it.
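As a small illustration of the floating-point effect mentioned here, the sketch below shows that float64 addition is not associative, so adding the same terms in a different grouping (as chunking does) can change the last bits of the result. It also shows a tolerance-based comparison as one possible alternative to exact equality. This is an illustrative assumption, not what #355 implements. `np.longdouble` is used instead of `np.float128` because the latter name is not defined on every platform, and on some platforms `np.longdouble` offers no more precision than float64.

```python
import math

import numpy as np

# Same three terms, two groupings.
a = (0.1 + 0.2) + 0.3  # 0.6000000000000001
b = 0.1 + (0.2 + 0.3)  # 0.6

print(a == b)              # exact equality fails
print(math.isclose(a, b))  # tolerance-based comparison succeeds

# Machine epsilon of float64 versus the platform's extended type.
eps64 = np.finfo(np.float64).eps
eps_long = np.finfo(np.longdouble).eps
print(eps64, eps_long)
```

In the reproduction script, the analogous change would be to replace the exact `==` in the final assertion with something like `math.isclose(result.distance, result_chunk.distance)`, with a tolerance chosen to suit the application.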
