
PD gradient #411

Merged (102 commits, Jun 27, 2022)
Conversation

@MathieuCarriere (Collaborator) commented Nov 2, 2020

A pull request for computing gradients of various persistence diagram constructions with TensorFlow.

@MathieuCarriere MathieuCarriere marked this pull request as ready for review November 13, 2020 08:57
@@ -0,0 +1,251 @@
import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa
Member

tensorflow_addons needs to be added to the CMake and CI files.

Member

I am adding it to CI in #425, but after that you will still need to add the CMake stuff.

Collaborator Author

great thanks!!

Member

Actually, now it looks like tensorflow_addons is not used anymore. Should we remove it from gudhi-deploy?

@MathieuCarriere (Collaborator Author)

Honestly, I am not quite sure how to fix the Windows build failure...

@mglisse (Member) commented Feb 15, 2021

You can ignore the windows mpfr issue, we'll need to modify or disable that CI.


.. include:: differentiation_sum.inc

In this module, we provide neural network models for computing persistent homology. In particular, we provide TensorFlow 2 models that allow computing persistence diagrams from complexes available in the Gudhi library, including simplex trees, cubical complexes and Vietoris-Rips complexes. These models can be incorporated at each step of a given neural network architecture, and can be used in addition to `PersLay <https://github.com/MathieuCarriere/gudhi/blob/perslay/src/python/gudhi/representations/perslay.py>`_ to produce topological features.
Contributor

we provide neural network models for computing persistent homology.

Wouldn't something like "a model supporting automatic differentiation" be more precise? In the sense that it could be used in a more general setting than that of neural networks, right?
I am not very familiar with this project so feel free to ignore this comment.


class LowerStarSimplexTreeModel(tf.keras.Model):
"""
TensorFlow model for computing lower-star persistence out of a simplex tree. Since simplex trees cannot be easily encoded as TensorFlow variables, the model takes as input a path to a file containing the simplex tree simplices, and reads it each time the simplex tree is required for computations.
Contributor

This line is a bit long. Don't know if we really care about it though.

DXX = tf.reshape(DX, [1, DX.shape[0], DX.shape[1]])

# Turn numpy function into tensorflow function
RipsTF = lambda DX: tf.numpy_function(_Rips, [DX, m, d, c], [tf.int32 for _ in range(4*c)])
Contributor

Not sure if this is crucial, but [tf.int32 for _ in range(4*c)] can be replaced by [tf.int32]*(4*c), which is about 30x faster with c=1000 for instance (same comment holds in other places as well).

Contributor

To reproduce my claim:

$ ipython
In [1]: import tensorflow as tf
In [2]: c = 1000
In [3]: %timeit [tf.int32 for _ in range(4*c)]
201 µs ± 5.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [4]: %timeit [tf.int32]*(4*c)
6.92 µs ± 16.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [5]: [tf.int32]*(4*c) == [tf.int32 for _ in range(4*c)]
Out[5]: True
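The equivalence (and the origin of the speed gap: list repetition copies references at C level instead of running a Python-level loop) can be checked without TensorFlow; a minimal sketch using plain Python `int` as a stand-in for `tf.int32`:

```python
# Stand-in for tf.int32: any shared sentinel object behaves the same way here.
dtype = int
c = 1000

# Comprehension: executes the loop body 4*c times in the interpreter.
via_comprehension = [dtype for _ in range(4 * c)]

# List repetition: a single C-level copy of 4*c references, no Python loop.
via_repetition = [dtype] * (4 * c)

assert via_comprehension == via_repetition
print(len(via_repetition))  # → 4000
```

Repetition is safe here because all entries are the same immutable object; the usual `[x] * n` aliasing pitfall only matters for mutable elements.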

@hensel-f

As long as the behavior is well documented, I think it does not matter much. I'm slightly leaning towards having the same output shape in all cases, just to maintain consistency (and perhaps make it a bit more user-friendly).

@tlacombe (Contributor)

Hello,
I would personally tip the scale in favor of outputting objects with the same shape, as it may happen that a user copy-pastes some code they wrote for a Rips diagram and uses it for cubical complexes without reading the doc (thinking "this should be the same"); having something not work may be frustrating.

However I agree with Felix, the most important thing is to document it accordingly.

@MathieuCarriere (Collaborator Author)

OK I'll quickly change the output then. Thanks for the comments!

@MathieuCarriere (Collaborator Author)

All right, I changed the output; it should now be consistent with the other two functions.
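The consistency being discussed can be illustrated with a hypothetical post-processing helper (not GUDHI's actual code) that always returns one diagram per requested homology dimension, even when a dimension has no points:

```python
def diagrams_by_dimension(pairs, homology_dimensions):
    """Group (dim, birth, death) triples into one list per requested dimension.

    Hypothetical sketch: the output always has len(homology_dimensions)
    entries, so downstream code sees the same shape regardless of which
    complex (Rips, cubical, lower-star) produced the pairs.
    """
    return [[(b, d) for (dim, b, d) in pairs if dim == h]
            for h in homology_dimensions]

pairs = [(0, 0.0, 1.0), (0, 0.0, 0.5), (1, 0.3, 0.9)]
dgms = diagrams_by_dimension(pairs, [0, 1, 2])
# dgms[2] is an empty list rather than being dropped, keeping the
# output shape consistent across inputs.
```

A user iterating over the result can then always write `for h, dgm in zip(homology_dimensions, dgms)` without special-casing empty dimensions.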

Constructor for the CubicalLayer class

Parameters:
dimensions (List[int]): homology dimensions
@mglisse (Member) May 28, 2022

I now notice that CubicalComplex.__init__ also has a parameter called dimensions but with a completely different meaning (the shape of the array). Should we use something more explicit like homology_dimensions (in #499 @VincentRouvreau proposes persistence_dimension, it could make sense to use the same name in both, whatever name we decide), or do we consider that it is ok with just dimensions and hopefully people won't get too confused?

Collaborator Author

That makes sense. I am a bit more in favor of homology_dimensions though. If @VincentRouvreau is OK with it, I'll update the code.

@mglisse (Member) Jun 4, 2022

homology_dimensions works for me if Vincent is OK with it.
In the sklearn interface, @VincentRouvreau also accepts an int when we want to specify a single dimension, so that would read homology_dimensions=2. I hope it is clear enough that we are not asking for 2 unspecified homology groups in this case, but only want homology in dimension 2 🤞

@MathieuCarriere (Collaborator Author) Jun 20, 2022

We can be even more explicit with homology_dimension_list, but maybe this is too much?

Contributor

@MathieuCarriere homology_dimension_list may be too restrictive as my interface can accept an int or a list of int. When only one dimension is required, it is quite useful and simplifies the code.

Collaborator Author

OK I'll stick with homology_dimensions then

Constructor for the LowerStarSimplexTreeLayer class

Parameters:
simplextree (gudhi.SimplexTree): underlying simplex tree. Its vertices MUST be named with integers from 0 to n = number of vertices. Note that its filtration values are modified in each call of the class.
Member

From 0 to n, there are n+1 numbers 😉

Collaborator Author

Good catch, thanks!!
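The corrected constraint (n vertices named 0 to n-1, since "0 to n" would be n+1 names) could be validated with a hypothetical helper; a sketch, not part of the PR:

```python
def vertices_are_contiguous(vertex_names):
    # For n vertices, the names must be exactly 0, 1, ..., n-1;
    # any gap or out-of-range name breaks the implicit indexing.
    n = len(vertex_names)
    return sorted(vertex_names) == list(range(n))

assert vertices_are_contiguous([2, 0, 1])      # 3 vertices named 0..2: OK
assert not vertices_are_contiguous([0, 1, 3])  # 3 is out of range for n=3
```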

@mglisse (Member) commented Jun 4, 2022

Once the name for dimensions/persistence_dimension/homology_dimensions is settled, this is ok to merge, thanks.


class CubicalLayer(tf.keras.layers.Layer):
"""
TensorFlow layer for computing the persistent homology of a cubical complex
Member

We could mention somewhere what field we are working on, possibly even let users specify it, but that can be done later.

Collaborator Author

I added it. Since this parameter has a default value, it should not break examples and tests.

@@ -8,15 +8,15 @@

# The parameters of the model are the point coordinates.

def _Rips(DX, max_edge, dimensions):
def _Rips(DX, max_edge, dimensions, homology_coeff_field=11):
Member

I don't think we need to repeat the default value here, _Rips (and similarly for the other private functions) is always called with a specified field.

Collaborator Author

Yes, you are right, I removed it.

@VincentRouvreau VincentRouvreau merged commit 3e0e47b into GUDHI:master Jun 27, 2022
@VincentRouvreau VincentRouvreau added the 3.6.0 GUDHI version 3.6.0 label Aug 12, 2022
@MathieuCarriere MathieuCarriere deleted the diff branch January 19, 2023 22:59