-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add TopologicalVector #493
Conversation
Signed-off-by: Guillaume Tauzin <[email protected]>
@gtauzin I had never seen this before! It reminds me of the fact that I once thought it would be a good idea to have a sorting transformer to sort persistence pairs by persistence, so as to have a canonical order that would make it a bit meaningful to use a neural network directly on the diagrams. But I read somewhere that this does not work so well "in practice" (though I've never tried and am a bit skeptical). Anyway, this one is cool too. |
@gtauzin I'm thinking that this transformer can also be seen as a "representation" and not as a "feature generator". Of course the boundary has always been blurry, but I was under the impression that the unspoken rule is that we call "features" only the scalar features (more precisely, we allow at most one scalar per homology dimension). In that case, it would seem that |
That's an interesting point. one could argue that If I was to define what a representation is, I would say that it is an object whose visualization is useful to understand the information contained in a persistence diagram and on which further interesting features can be extracted. I think both |
We should allow here |
IMO, we say "representations" as a shorthand for "vector representations" which in turn is a perfect synonym for "vectorizations". So the general gist for me is that these are ways for each persistence diagram to be made into high-dimensional vectors. If the vector space structure is relevant, e.g. if one can use the Euclidean distance, cosine distance, Euclidean inner product, etc. to get meaningful quantities, then in my mind we are firmly in the realm of representations. In the end, the boundary is pretty blurred I guess. To avoid getting too philosophical, I thought that having a stronger divide (one feature per hom dim vs multi-dimensional vectors per hom dim) would make our life easier, but it's not that important to me (and the user only sees the difference in the API reference, not in import statements). My personal preference in general would be to not tie the characterization to visualization, because then one can claim anything can be visualized and how useful that really is uncomfortably subjective for me. |
From a purely practical perspective, I feel like a transformer is a "feature generator" if what it outputs is a 2D array (n_samples, n_features) and it is a "representation" if it outputs a intermediate data structure (>3D arrays). If it is a "representation", it means that there are ways to extract interesting meaningful features from it (and we should provide them) and or it helps to visualize it to understand better the data. Moving |
@gtauzin thanks for the patience and for the suggestions. I'm happy with the practical criterion you mentioned, that anything outputting 2D arrays is a feature generator. So let's keep both transformers here! |
gtda/diagrams/features.py
Outdated
distances = self._distance_function.pairwise(Xd) | ||
|
||
Xd[:, 1] = Xd[:, 1] - Xd[:, 0] | ||
min_persistence = 0.5 * np.minimum(Xd[:, 1], Xd[:, 1].T) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gtauzin could you explain this line to me? I expect Xd[:, 1]
to be a 1d array at this point, and if true this means that Xd[:, 1]
are Xd[:, 1].T
are equal arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you are right, this line must have remained for my own experiments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I change this line to
min_persistence = 0.5 * Xd[:, 1]
? I also think this logic should be documented.
- Allow tuple n_coefficients - Fix explanation of behaviour when n_coefficients is None - Make homology_dimensions_ and n_coefficients_ tuples - Fix shape of Xt in ComplexPolynomial .transform docs - Allow tuple and list n_distances in TopologicalVector, fix docs - Simplify docs for metric in TopologicalVector - Make n_distances_ a tuple - Add X input check in TopologicalVector.fit - Begin fixing TopologicalVector.transform logic
Guillaume Tauzin seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
self.n_coefficients_ = [_homology_dimensions_counts[dim] | ||
for dim in self.homology_dimensions_] | ||
elif type(self.n_coefficients) == list: | ||
self.n_coefficients_ = \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just write
self.n_coefficients_ = tuple(
[ _homology_dimensions_counts[dim] for dim in self.homology_dimensions_]
)
self.n_coefficients_ = \ | ||
tuple([_homology_dimensions_counts[dim] | ||
for dim in self.homology_dimensions_]) | ||
elif type(self.n_coefficients) in (list, tuple): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not how we check the type.
it should be elif isistance(self.n_coefficients, (list, tuple))
You should use the modes Typing instead of the primitive types.
f'{_n_homology_dimensions} homology dimensions.' | ||
f'`n_coefficients` has length {len(self.n_coefficients)} ' | ||
f'while diagrams in `X` have {_n_homology_dimensions} ' | ||
f'homology dimensions.' | ||
) | ||
self.n_coefficients_ = self.n_coefficients | ||
else: | ||
self.n_coefficients_ = \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please rewrite it better
f'`n_coefficients` has been passed as a list of length ' | ||
f'{len(self.n_coefficients)} while diagrams in `X` have ' | ||
f'{_n_homology_dimensions} homology dimensions.' | ||
f'`n_coefficients` has length {len(self.n_coefficients)} ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
difficult to read this message
) | ||
self.n_coefficients_ = self.n_coefficients | ||
else: | ||
self.n_coefficients_ = \ | ||
[self.n_coefficients] * _n_homology_dimensions | ||
tuple([self.n_coefficients] * _n_homology_dimensions) | ||
|
||
self._polynomial_function = \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please rewrite that
} | ||
|
||
def __init__(self, n_distances=10, metric='chebyshev', metric_params={}, | ||
n_jobs=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put a comma after n_jobs=None
.
Why you chose these default parameters?
counts)) | ||
|
||
if self.n_distances is None: | ||
self.n_distances_ = \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rewrite that please
) | ||
self.n_distances_ = self.n_distances | ||
else: | ||
self.n_distances_ = \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rewrite that please
self.n_distances_ = \ | ||
tuple([self.n_distances] * self.homology_dimensions_) | ||
|
||
self._distance_function = \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rewrite that please
Reference issues/PRs
Types of changes
Description
Add TopologicalVector.
Screenshots (if appropriate)
Any other comments?
Checklist
flake8
to check my Python changes.pytest
to check this on Python tests.