Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TopologicalVector #493

Closed
wants to merge 16 commits into from
Closed

Add TopologicalVector #493

wants to merge 16 commits into from

Conversation

gtauzin
Copy link
Collaborator

@gtauzin gtauzin commented Sep 14, 2020

Reference issues/PRs

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Description
Add TopologicalVector.

Screenshots (if appropriate)

Any other comments?

Checklist

  • I have read the guidelines for contributing.
  • My code follows the code style of this project. I used flake8 to check my Python changes.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed. I used pytest to check this on Python tests.

Signed-off-by: Guillaume Tauzin <[email protected]>
@ulupo
Copy link
Collaborator

ulupo commented Sep 14, 2020

@gtauzin I had never seen this before! It reminds me of the fact that I once thought it would be a good idea to have a sorting transformer to sort persistence pairs by persistence, so as to have a canonical order that would make it a bit meaningful to use a neural network directly on the diagrams. But I read somewhere that this does not work so well "in practice" (though I've never tried and am a bit skeptical). Anyway, this one is cool too.

@ulupo
Copy link
Collaborator

ulupo commented Sep 19, 2020

@gtauzin I'm thinking that this transformer can also be seen as a "representation" and not as a "feature generator". Of course the boundary has always been blurry, but I was under the impression that the unspoken rule is that we call "features" only the scalar features (more precisely, we allow at most one scalar per homology dimension). In that case, it would seem that TopologicalVector better belongs among BettiCurve and the like. What do you think?

@gtauzin
Copy link
Collaborator Author

gtauzin commented Sep 20, 2020

@gtauzin I'm thinking that this transformer can also be seen as a "representation" and not as a "feature generator". Of course the boundary has always been blurry, but I was under the impression that the unspoken rule is that we call "features" only the scalar features (more precisely, we allow at most one scalar per homology dimension). In that case, it would seem that TopologicalVector better belongs among BettiCurve and the like. What do you think?

That's an interesting point. one could argue that ComplexPolynomial is in the same situation.

If I was to define what a representation is, I would say that it is an object whose visualization is useful to understand the information contained in a persistence diagram and on which further interesting features can be extracted. I think both TopologicalVector and ComplexPlolynomail would then not qualify as representations. What do you think about this definition?

@gtauzin
Copy link
Collaborator Author

gtauzin commented Sep 21, 2020

We should allow here n_distances to be a list as for ComplexPolynomial #479.

@ulupo
Copy link
Collaborator

ulupo commented Sep 21, 2020

If I was to define what a representation is, I would say that it is an object whose visualization is useful to understand the information contained in a persistence diagram and on which further interesting features can be extracted. I think both TopologicalVector and ComplexPlolynomail would then not qualify as representations. What do you think about this definition?

IMO, we say "representations" as a shorthand for "vector representations" which in turn is a perfect synonym for "vectorizations". So the general gist for me is that these are ways for each persistence diagram to be made into high-dimensional vectors. If the vector space structure is relevant, e.g. if one can use the Euclidean distance, cosine distance, Euclidean inner product, etc. to get meaningful quantities, then in my mind we are firmly in the realm of representations. In the end, the boundary is pretty blurred I guess. To avoid getting too philosophical, I thought that having a stronger divide (one feature per hom dim vs multi-dimensional vectors per hom dim) would make our life easier, but it's not that important to me (and the user only sees the difference in the API reference, not in import statements).

My personal preference in general would be to not tie the characterization to visualization, because then one can claim anything can be visualized and how useful that really is uncomfortably subjective for me.

@ulupo
Copy link
Collaborator

ulupo commented Sep 22, 2020

We should allow here n_distances to be a list as for ComplexPolynomial #479.

This is now possible following #502. I'll make the change.

@gtauzin
Copy link
Collaborator Author

gtauzin commented Sep 23, 2020

If I was to define what a representation is, I would say that it is an object whose visualization is useful to understand the information contained in a persistence diagram and on which further interesting features can be extracted. I think both TopologicalVector and ComplexPlolynomail would then not qualify as representations. What do you think about this definition?

IMO, we say "representations" as a shorthand for "vector representations" which in turn is a perfect synonym for "vectorizations". So the general gist for me is that these are ways for each persistence diagram to be made into high-dimensional vectors. If the vector space structure is relevant, e.g. if one can use the Euclidean distance, cosine distance, Euclidean inner product, etc. to get meaningful quantities, then in my mind we are firmly in the realm of representations. In the end, the boundary is pretty blurred I guess. To avoid getting too philosophical, I thought that having a stronger divide (one feature per hom dim vs multi-dimensional vectors per hom dim) would make our life easier, but it's not that important to me (and the user only sees the difference in the API reference, not in import statements).

My personal preference in general would be to not tie the characterization to visualization, because then one can claim anything can be visualized and how useful that really is uncomfortably subjective for me.

From a purely practical perspective, I feel like a transformer is a "feature generator" if what it outputs is a 2D array (n_samples, n_features) and it is a "representation" if it outputs a intermediate data structure (>3D arrays). If it is a "representation", it means that there are ways to extract interesting meaningful features from it (and we should provide them) and or it helps to visualize it to understand better the data.

Moving TopologicalVector and the like to representations.py does not have much consequences anyways (but in that case, we should have a separate version that is in feature.py when n_distances=1 xD), so I won't fight for it. But it is weird to me to think that TopologicalVector is a representation is strange. I would say, just make it output a 3D array with homology dimension as an axis just for consistency with the rest. But this does not make sense as it is better to be able to specify the n_distances per homology dimension.

@ulupo
Copy link
Collaborator

ulupo commented Sep 24, 2020

@gtauzin thanks for the patience and for the suggestions. I'm happy with the practical criterion you mentioned, that anything outputting 2D arrays is a feature generator. So let's keep both transformers here!

@ulupo ulupo changed the title WIP: Add TopologicalVector Add TopologicalVector Sep 26, 2020
distances = self._distance_function.pairwise(Xd)

Xd[:, 1] = Xd[:, 1] - Xd[:, 0]
min_persistence = 0.5 * np.minimum(Xd[:, 1], Xd[:, 1].T)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gtauzin could you explain this line to me? I expect Xd[:, 1] to be a 1d array at this point, and if true this means that Xd[:, 1] are Xd[:, 1].T are equal arrays.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you are right, this line must have remained for my own experiments.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I change this line to

min_persistence = 0.5 * Xd[:, 1]

? I also think this logic should be documented.

- Allow tuple n_coefficients
- Fix explanation of behaviour when n_coefficients is None
- Make homology_dimensions_ and n_coefficients_ tuples
- Fix shape of Xt in ComplexPolynomial .transform docs
- Allow tuple and list n_distances in TopologicalVector, fix docs
- Simplify docs for metric in TopologicalVector
- Make n_distances_ a tuple
- Add X input check in TopologicalVector.fit
- Begin fixing TopologicalVector.transform logic
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ ulupo
❌ Guillaume Tauzin


Guillaume Tauzin seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

self.n_coefficients_ = [_homology_dimensions_counts[dim]
for dim in self.homology_dimensions_]
elif type(self.n_coefficients) == list:
self.n_coefficients_ = \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just write

self.n_coefficients_ = tuple(
    [ _homology_dimensions_counts[dim] for dim in self.homology_dimensions_]
)

self.n_coefficients_ = \
tuple([_homology_dimensions_counts[dim]
for dim in self.homology_dimensions_])
elif type(self.n_coefficients) in (list, tuple):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not how we check the type.

it should be elif isistance(self.n_coefficients, (list, tuple))

You should use the modes Typing instead of the primitive types.

f'{_n_homology_dimensions} homology dimensions.'
f'`n_coefficients` has length {len(self.n_coefficients)} '
f'while diagrams in `X` have {_n_homology_dimensions} '
f'homology dimensions.'
)
self.n_coefficients_ = self.n_coefficients
else:
self.n_coefficients_ = \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please rewrite it better

f'`n_coefficients` has been passed as a list of length '
f'{len(self.n_coefficients)} while diagrams in `X` have '
f'{_n_homology_dimensions} homology dimensions.'
f'`n_coefficients` has length {len(self.n_coefficients)} '

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

difficult to read this message

)
self.n_coefficients_ = self.n_coefficients
else:
self.n_coefficients_ = \
[self.n_coefficients] * _n_homology_dimensions
tuple([self.n_coefficients] * _n_homology_dimensions)

self._polynomial_function = \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please rewrite that

}

def __init__(self, n_distances=10, metric='chebyshev', metric_params={},
n_jobs=None):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put a comma after n_jobs=None.

Why you chose these default parameters?

counts))

if self.n_distances is None:
self.n_distances_ = \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewrite that please

)
self.n_distances_ = self.n_distances
else:
self.n_distances_ = \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewrite that please

self.n_distances_ = \
tuple([self.n_distances] * self.homology_dimensions_)

self._distance_function = \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewrite that please

@matteocao matteocao closed this May 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants