Add TopologicalVector #493

gtauzin · 2020-09-14T15:57:55Z

Reference issues/PRs

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Description
Add TopologicalVector.
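For readers unfamiliar with the transformer, here is a minimal NumPy sketch of the kind of vector it computes for one homology dimension. It assumes the construction follows the topological signature of Carrière, Oudot and Ovsjanikov (2015). For every unordered pair of distinct points, take the minimum of their Chebyshev distance and their two half-persistences (each point's distance to the diagonal). Then keep the largest n_distances values in decreasing order, zero-padded. The function name topological_vector is hypothetical, and this is not the code in this PR.

```python
import numpy as np


def topological_vector(diagram, n_distances=10):
    """Sketch of a topological-vector-style signature for one diagram.

    Assumptions (not taken from the PR): `diagram` has shape (n_points, 2)
    with columns (birth, death), the metric is Chebyshev (L-infinity), and
    each unordered pair of distinct points contributes one value.
    """
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    # Pairwise Chebyshev distances, shape (n_points, n_points)
    distances = np.max(np.abs(diagram[:, None, :] - diagram[None, :, :]), axis=-1)
    # Half-persistence = Chebyshev distance of each point to the diagonal
    half_persistence = 0.5 * (diagram[:, 1] - diagram[:, 0])
    # Pairwise minimum of half-persistences, via broadcasting column vs row
    min_half_persistence = np.minimum(half_persistence[:, None],
                                      half_persistence[None, :])
    values = np.minimum(distances, min_half_persistence)
    # Keep each unordered pair of distinct points exactly once
    i, j = np.triu_indices(len(diagram), k=1)
    values = np.sort(values[i, j])[::-1]
    # Truncate or zero-pad to a fixed length
    vector = np.zeros(n_distances)
    n = min(n_distances, len(values))
    vector[:n] = values[:n]
    return vector


print(topological_vector([[0, 4], [1, 5], [0, 1]], n_distances=4))  # [1.  0.5 0.5 0. ]
```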

Screenshots (if appropriate)

Any other comments?

Checklist

I have read the guidelines for contributing.
My code follows the code style of this project. I used flake8 to check my Python changes.
My change requires a change to the documentation.
I have updated the documentation accordingly.
I have added tests to cover my changes.
All new and existing tests passed. I used pytest to check this on Python tests.

Signed-off-by: Guillaume Tauzin <[email protected]>

ulupo · 2020-09-14T16:04:17Z

@gtauzin I had never seen this before! It reminds me that I once thought it would be a good idea to have a transformer that sorts persistence pairs by persistence, to get a canonical order that would make it somewhat meaningful to feed the diagrams directly to a neural network. But I read somewhere that this does not work so well "in practice" (though I've never tried it and am a bit skeptical). Anyway, this one is cool too.
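The sorting idea mentioned above can be sketched as a small helper. It orders the points of a single diagram by decreasing persistence and pads or truncates them to a fixed number of points, so that the result can be flattened into a fixed-size input. The names sort_by_persistence and n_points are hypothetical, and nothing like this is part of the PR.

```python
import numpy as np


def sort_by_persistence(diagram, n_points):
    """Order (birth, death) pairs by decreasing persistence; pad/truncate to n_points.

    Hypothetical helper: padding uses (0, 0) points, which have zero persistence.
    """
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    persistence = diagram[:, 1] - diagram[:, 0]
    # Stable sort on negated persistence: decreasing order, ties keep input order
    order = np.argsort(-persistence, kind="stable")
    sorted_diagram = diagram[order][:n_points]
    padded = np.zeros((n_points, 2))
    padded[:len(sorted_diagram)] = sorted_diagram
    return padded


print(sort_by_persistence([[0, 1], [0, 3], [1, 2]], n_points=4))
```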

ulupo · 2020-09-19T17:43:59Z

@gtauzin I'm thinking that this transformer can also be seen as a "representation" and not as a "feature generator". Of course the boundary has always been blurry, but I was under the impression that the unspoken rule is that we call "features" only the scalar features (more precisely, we allow at most one scalar per homology dimension). In that case, it would seem that TopologicalVector better belongs among BettiCurve and the like. What do you think?

gtauzin · 2020-09-20T10:23:49Z

> @gtauzin I'm thinking that this transformer can also be seen as a "representation" and not as a "feature generator". Of course the boundary has always been blurry, but I was under the impression that the unspoken rule is that we call "features" only the scalar features (more precisely, we allow at most one scalar per homology dimension). In that case, it would seem that TopologicalVector better belongs among BettiCurve and the like. What do you think?

That's an interesting point. One could argue that ComplexPolynomial is in the same situation.

If I were to define what a representation is, I would say that it is an object whose visualization is useful for understanding the information contained in a persistence diagram, and from which further interesting features can be extracted. By that definition, neither TopologicalVector nor ComplexPolynomial would qualify as a representation. What do you think about this definition?

gtauzin · 2020-09-21T14:23:56Z

We should allow n_distances to be a list here, as for ComplexPolynomial (#479).

ulupo · 2020-09-21T16:11:12Z

> If I were to define what a representation is, I would say that it is an object whose visualization is useful for understanding the information contained in a persistence diagram, and from which further interesting features can be extracted. By that definition, neither TopologicalVector nor ComplexPolynomial would qualify as a representation. What do you think about this definition?

IMO, we say "representations" as a shorthand for "vector representations" which in turn is a perfect synonym for "vectorizations". So the general gist for me is that these are ways for each persistence diagram to be made into high-dimensional vectors. If the vector space structure is relevant, e.g. if one can use the Euclidean distance, cosine distance, Euclidean inner product, etc. to get meaningful quantities, then in my mind we are firmly in the realm of representations. In the end, the boundary is pretty blurred I guess. To avoid getting too philosophical, I thought that having a stronger divide (one feature per hom dim vs multi-dimensional vectors per hom dim) would make our life easier, but it's not that important to me (and the user only sees the difference in the API reference, not in import statements).

My personal preference in general would be not to tie the characterization to visualization, because one can then claim that anything can be visualized, and judging how useful that really is feels uncomfortably subjective to me.

ulupo · 2020-09-22T08:24:33Z

> We should allow n_distances to be a list here, as for ComplexPolynomial (#479).

This is now possible following #502. I'll make the change.

gtauzin · 2020-09-23T14:44:58Z

>> If I were to define what a representation is, I would say that it is an object whose visualization is useful for understanding the information contained in a persistence diagram, and from which further interesting features can be extracted. By that definition, neither TopologicalVector nor ComplexPolynomial would qualify as a representation. What do you think about this definition?
>
> IMO, we say "representations" as a shorthand for "vector representations" which in turn is a perfect synonym for "vectorizations". So the general gist for me is that these are ways for each persistence diagram to be made into high-dimensional vectors. If the vector space structure is relevant, e.g. if one can use the Euclidean distance, cosine distance, Euclidean inner product, etc. to get meaningful quantities, then in my mind we are firmly in the realm of representations. In the end, the boundary is pretty blurred I guess. To avoid getting too philosophical, I thought that having a stronger divide (one feature per hom dim vs multi-dimensional vectors per hom dim) would make our life easier, but it's not that important to me (and the user only sees the difference in the API reference, not in import statements).
>
> My personal preference in general would be not to tie the characterization to visualization, because one can then claim that anything can be visualized, and judging how useful that really is feels uncomfortably subjective to me.

From a purely practical perspective, I feel that a transformer is a "feature generator" if it outputs a 2D array of shape (n_samples, n_features), and a "representation" if it outputs an intermediate data structure (an array with three or more dimensions). If it is a "representation", there should be ways to extract meaningful features from it (and we should provide them), and/or visualizing it should help to better understand the data.

Moving TopologicalVector and the like to representations.py would not have many consequences anyway (though in that case we should also have a separate version in features.py for n_distances=1 xD), so I won't fight for it. But thinking of TopologicalVector as a representation feels strange to me. One could make it output a 3D array with the homology dimension as an axis, just for consistency with the rest. But this does not make sense, since it is better to be able to specify n_distances per homology dimension.
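The shape argument above can be illustrated directly. Per-dimension vectors of different lengths cannot be stacked into a 3D array of shape (n_samples, n_homology_dimensions, n_distances), but they can always be concatenated into a 2D array of shape (n_samples, sum(n_distances)). This is a minimal sketch with made-up zero arrays. The helper concatenate_per_dimension and the values in n_distances are hypothetical, not the PR's code.

```python
import numpy as np


def concatenate_per_dimension(per_dimension_features):
    """Concatenate per-homology-dimension feature blocks along the feature axis.

    per_dimension_features: list of arrays, one per homology dimension, each of
    shape (n_samples, n_distances_for_that_dimension).
    """
    return np.concatenate(per_dimension_features, axis=1)


n_samples = 5
n_distances = (3, 2)  # hypothetical: different n_distances for H0 and H1
blocks = [np.zeros((n_samples, n)) for n in n_distances]
Xt = concatenate_per_dimension(blocks)
print(Xt.shape)  # (5, 5)

# Stacking into a 3D array needs equal lengths, so it fails here
try:
    np.stack(blocks, axis=1)
except ValueError as error:
    print("cannot stack:", error)
```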

ulupo · 2020-09-24T11:20:22Z

@gtauzin thanks for the patience and for the suggestions. I'm happy with the practical criterion you mentioned, that anything outputting 2D arrays is a feature generator. So let's keep both transformers here!

ulupo · 2020-09-27T17:52:24Z

gtda/diagrams/features.py

+        distances = self._distance_function.pairwise(Xd)
+
+        Xd[:, 1] = Xd[:, 1] - Xd[:, 0]
+        min_persistence = 0.5 * np.minimum(Xd[:, 1], Xd[:, 1].T)


@gtauzin could you explain this line to me? I expect Xd[:, 1] to be a 1D array at this point, and if so, Xd[:, 1] and Xd[:, 1].T are equal arrays.

I believe you are right; this line must be a leftover from my own experiments.

Should I change this line to the following?

min_persistence = 0.5 * Xd[:, 1]

I also think this logic should be documented.
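The behaviour discussed above is easy to check. Transposing a 1D NumPy array returns the same array, so the original line reduces to 0.5 * Xd[:, 1]. If the intended quantity was instead the pairwise minimum of half-persistences, with one entry per pair of points to match the shape of the pairwise distance matrix, then broadcasting a column against a row produces it. Which of the two was intended is an assumption here, and the sample diagram is made up.

```python
import numpy as np

diagram = np.array([[0.0, 4.0], [1.0, 5.0], [0.0, 1.0]])  # (birth, death) pairs
persistence = diagram[:, 1] - diagram[:, 0]  # shape (3,)

# Transposing a 1D array is a no-op, so this is just `persistence` again
assert np.array_equal(np.minimum(persistence, persistence.T), persistence)

# Pairwise version: broadcast a column against a row to get shape (3, 3)
min_persistence = 0.5 * np.minimum(persistence[:, None], persistence[None, :])
print(min_persistence)
```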

- Allow tuple n_coefficients
- Fix explanation of behaviour when n_coefficients is None
- Make homology_dimensions_ and n_coefficients_ tuples
- Fix shape of Xt in ComplexPolynomial.transform docs
- Allow tuple and list n_distances in TopologicalVector, fix docs
- Simplify docs for metric in TopologicalVector
- Make n_distances_ a tuple
- Add X input check in TopologicalVector.fit
- Begin fixing TopologicalVector.transform logic

CLAassistant · 2021-04-01T01:55:02Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ ulupo
❌ Guillaume Tauzin

Guillaume Tauzin seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.

VascoSch92 · 2024-01-28T19:29:26Z

gtda/diagrams/features.py

-            self.n_coefficients_ = [_homology_dimensions_counts[dim]
-                                    for dim in self.homology_dimensions_]
-        elif type(self.n_coefficients) == list:
+            self.n_coefficients_ = \


Just write

self.n_coefficients_ = tuple([_homology_dimensions_counts[dim] for dim in self.homology_dimensions_])

VascoSch92 · 2024-01-28T19:30:57Z

gtda/diagrams/features.py

+            self.n_coefficients_ = \
+                tuple([_homology_dimensions_counts[dim]
+                       for dim in self.homology_dimensions_])
+        elif type(self.n_coefficients) in (list, tuple):


This is not how we check the type.

It should be elif isinstance(self.n_coefficients, (list, tuple))

You should use the typing module instead of the primitive types.
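The difference the reviewer points to shows up with subclasses. An exact type comparison rejects a tuple subclass such as a namedtuple, while isinstance accepts it. This is a minimal illustration, and the Pair namedtuple is hypothetical.

```python
from collections import namedtuple

Pair = namedtuple("Pair", ["h0", "h1"])  # hypothetical tuple subclass
n_coefficients = Pair(3, 2)

# An exact type comparison rejects subclasses of list/tuple
print(type(n_coefficients) in (list, tuple))      # False
# isinstance accepts them, which is usually what a parameter check wants
print(isinstance(n_coefficients, (list, tuple)))  # True
```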

VascoSch92 · 2024-01-28T19:31:24Z

gtda/diagrams/features.py

-                    f'{_n_homology_dimensions} homology dimensions.'
+                    f'`n_coefficients` has length {len(self.n_coefficients)} '
+                    f'while diagrams in `X` have {_n_homology_dimensions} '
+                    f'homology dimensions.'
                    )
            self.n_coefficients_ = self.n_coefficients
        else:
            self.n_coefficients_ = \


please rewrite it better

VascoSch92 · 2024-01-28T19:31:33Z

gtda/diagrams/features.py

-                    f'`n_coefficients` has been passed as a list of length '
-                    f'{len(self.n_coefficients)} while diagrams in `X` have '
-                    f'{_n_homology_dimensions} homology dimensions.'
+                    f'`n_coefficients` has length {len(self.n_coefficients)} '


This message is difficult to read.

VascoSch92 · 2024-01-28T19:31:53Z

gtda/diagrams/features.py

                    )
            self.n_coefficients_ = self.n_coefficients
        else:
            self.n_coefficients_ = \
-                [self.n_coefficients] * _n_homology_dimensions
+                tuple([self.n_coefficients] * _n_homology_dimensions)

        self._polynomial_function = \


please rewrite that

VascoSch92 · 2024-01-28T19:33:01Z

gtda/diagrams/features.py

+        }
+
+    def __init__(self, n_distances=10, metric='chebyshev', metric_params={},
+                 n_jobs=None):


Put a trailing comma after n_jobs=None.

Why did you choose these default parameters?

VascoSch92 · 2024-01-28T19:33:13Z

gtda/diagrams/features.py

+                                               counts))
+
+        if self.n_distances is None:
+            self.n_distances_ = \


rewrite that please

VascoSch92 · 2024-01-28T19:33:23Z

gtda/diagrams/features.py

+                    )
+            self.n_distances_ = self.n_distances
+        else:
+            self.n_distances_ = \


rewrite that please

VascoSch92 · 2024-01-28T19:33:35Z

gtda/diagrams/features.py

+            self.n_distances_ = \
+                tuple([self.n_distances] * self.homology_dimensions_)
+
+        self._distance_function = \


rewrite that please
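One way to address the "rewrite that" comments is to factor the per-dimension parameter logic into a single helper. It would use isinstance, return tuples, and raise a readable error. Repeating a scalar requires the number of homology dimensions: [self.n_distances] * self.homology_dimensions_ in the quoted line would raise a TypeError, because a list cannot be multiplied by a tuple. The sketch below is hypothetical (resolve_per_dimension is not in the PR). It also assumes that None means "use the per-dimension counts", as for n_coefficients.

```python
from numbers import Integral


def resolve_per_dimension(value, name, homology_dimensions, counts):
    """Return a tuple with one entry per homology dimension.

    Hypothetical helper mirroring the logic under review:
    - None: use the per-dimension counts in `counts` (a dict keyed by dimension);
    - list or tuple: check its length and convert it to a tuple;
    - integer: repeat it once per homology dimension.
    """
    n_dims = len(homology_dimensions)
    if value is None:
        return tuple(counts[dim] for dim in homology_dimensions)
    if isinstance(value, (list, tuple)):
        if len(value) != n_dims:
            raise ValueError(
                f"`{name}` has length {len(value)}, but diagrams in `X` "
                f"have {n_dims} homology dimensions."
            )
        return tuple(value)
    if isinstance(value, Integral):
        # Repeat by the number of dimensions, not by the tuple itself
        return (value,) * n_dims
    raise TypeError(f"`{name}` must be None, an int, or a list/tuple of ints.")


print(resolve_per_dimension(10, "n_distances", (0, 1), {}))  # (10, 10)
```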

Add TopologicalVector

3896609

Signed-off-by: Guillaume Tauzin <[email protected]>

ulupo and others added 2 commits September 19, 2020 19:16

Merge branch 'master' into vector

274d732

Merge branch 'master' into vector

956a9df

ulupo added 4 commits September 19, 2020 19:54

Fix errors and linting, fix self.homology_dimensions_

1bbe10f

Add DistanceMetric import

c8e7b73

Add See also and References in TopologicalVector

e0b3808

Add caveats on properties of X

487c541

ulupo added 3 commits September 26, 2020 10:31

Merge branch 'master' into vector

06a7dda

Fix doc entries

0ec7f44

Fix linting

cb82210

ulupo changed the title ~~WIP: Add TopologicalVector~~ Add TopologicalVector Sep 26, 2020

ulupo added 2 commits September 26, 2020 10:36

Reorder doc entries

4bd941d

Fix See alsos in diagrams/features.py

5ff90f2

ulupo reviewed Sep 27, 2020


ulupo added 4 commits September 27, 2020 19:57

Improve tests of ComplexPolynomial

efa7ecc

Continue adapting the logic of TopologicalVector to list n_distances

64c22ea

Place validate_params after array checks as in rest of library

dbbab1e

VascoSch92 reviewed Jan 28, 2024


matteocao closed this May 29, 2024
