PD gradient #411
Conversation
@@ -0,0 +1,251 @@
import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa
tensorflow_addons needs to be added to the cmake and CI files.
I am adding it to CI in #425, but after that you will still need to add the cmake stuff.
great thanks!!
Actually, now it looks like tensorflow_addons is not used anymore. Should we remove it from gudhi-deploy?
For my part, I am not really sure how to fix the Windows build failure...
You can ignore the windows mpfr issue, we'll need to modify or disable that CI.
src/python/doc/differentiation.rst
Outdated
.. include:: differentiation_sum.inc
In this module, we provide neural network models for computing persistent homology. In particular, we provide TensorFlow 2 models that allow computing persistence diagrams from complexes available in the Gudhi library, including simplex trees, cubical complexes and Vietoris-Rips complexes. These models can be incorporated at each step of a given neural network architecture, and can be used in addition to `PersLay <https://github.com/MathieuCarriere/gudhi/blob/perslay/src/python/gudhi/representations/perslay.py>`_ to produce topological features.
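As background for readers of this thread, the reason such models can be differentiated at all is that the persistence computation only returns integer indices of critical cells, and the diagram is then obtained by gathering the input filtration values at those indices. A minimal numpy sketch (toy values, not Gudhi code; `F` and `pairs` are made up for illustration):

```python
import numpy as np

# Hypothetical filtration values of the cells (a tf.Variable in practice)
F = np.array([0.0, 2.0, 1.0, 3.0])

# Indices of (birth, death) critical cells as returned by a persistence pairing;
# they are integer constants w.r.t. F, so differentiating the diagram
# reduces to differentiating a gather operation.
pairs = np.array([[0, 1], [2, 3]])

diagram = F[pairs]  # one (birth, death) point per row
```

In TensorFlow the same gather would be written with `tf.gather`, through which gradients flow back to the filtration values.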
we provide neural network models for computing persistent homology.
Wouldn't something like "a model supporting automatic differentiation" be more precise? In the sense that it could be used in a more general setting than neural networks, right?
I am not very familiar with this project so feel free to ignore this comment.
class LowerStarSimplexTreeModel(tf.keras.Model):
    """
    TensorFlow model for computing lower-star persistence out of a simplex tree. Since simplex trees cannot be easily encoded as TensorFlow variables, the model takes as input a path to a file containing the simplex tree simplices, and reads it each time the simplex tree is required for computations.
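As a side note for readers, the lower-star filtration itself is simple to state: every simplex receives the maximum of its vertices' filtration values. A small self-contained numpy sketch with toy values (not the model's actual I/O):

```python
import numpy as np

# Toy vertex filtration values; vertices are assumed to be named 0..n-1
vertex_values = np.array([0.5, 1.5, 0.2])

# A small simplicial complex given as tuples of vertex indices
simplices = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 1, 2)]

# Lower-star rule: each simplex gets the max of its vertices' values
filtrations = [float(max(vertex_values[v] for v in s)) for s in simplices]
```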
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is a bit long. Don't know if we really care about it though.
DXX = tf.reshape(DX, [1, DX.shape[0], DX.shape[1]])

# Turn numpy function into tensorflow function
RipsTF = lambda DX: tf.numpy_function(_Rips, [DX, m, d, c], [tf.int32 for _ in range(4*c)])
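For context, the `DX` consumed here is the pairwise distance matrix of the point cloud. A small numpy sketch of how such a matrix is typically built (toy points, independent of this PR's code):

```python
import numpy as np

# Toy point cloud; in the PR, the point coordinates are the trainable variables
X = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])

# Pairwise Euclidean distance matrix via broadcasting: entry (i, j) is ||X[i] - X[j]||
DX = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
```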
Not sure if this is crucial, but [tf.int32 for _ in range(4*c)] can be replaced by [tf.int32]*(4*c), which is about 30x faster with c=1000 for instance (same comment holds in other places as well).
To reproduce my claim:
~ ➜ ipython
In [1]: import tensorflow as tf
In [2]: c = 1000
In [3]: %timeit [tf.int32 for _ in range(4*c)]
201 µs ± 5.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [4]: %timeit [tf.int32]*(4*c)
6.92 µs ± 16.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [5]: [tf.int32]*(4*c) == [tf.int32 for _ in range(4*c)]
Out[5]: True
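The equivalence checked in [5] above holds for any constant element, so it can also be verified without TensorFlow installed; a plain-Python stand-in (using `int` in place of `tf.int32`):

```python
# Both constructions produce 4*c references to the same constant object,
# so they compare equal; the multiplication just skips the loop machinery.
c = 1000
repeated = [int] * (4 * c)
comprehension = [int for _ in range(4 * c)]

same = (repeated == comprehension)
```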
As long as the behavior is well documented, I think it does not matter much. I'm slightly leaning towards having the same output shape in all cases, just to maintain consistency (and perhaps make it a bit more user-friendly).
Hello! However, I agree with Felix: the most important thing is to document it accordingly.
OK, I'll quickly change the output then. Thanks for the comments!
All right, I changed the output; now it should be consistent with the two other functions.
Constructor for the CubicalLayer class

Parameters:
    dimensions (List[int]): homology dimensions
I now notice that CubicalComplex.__init__ also has a parameter called dimensions but with a completely different meaning (the shape of the array). Should we use something more explicit like homology_dimensions (in #499 @VincentRouvreau proposes persistence_dimension; it could make sense to use the same name in both, whatever name we decide), or do we consider that it is ok with just dimensions and hopefully people won't get too confused?
That makes sense. I am a bit more in favor of homology_dimensions though. If @VincentRouvreau is OK with it, I'll update the code.
homology_dimensions works for me if Vincent is OK with it.
In the sklearn interface, @VincentRouvreau also accepts an int when we want to specify a single dimension, so that would read homology_dimensions=2. I hope it is clear enough that we are not asking for 2 unspecified homology groups in this case but only want homology in dimension 2 🤞
We can be even more explicit with homology_dimension_list, but maybe this is too much?
@MathieuCarriere homology_dimension_list may be too restrictive, as my interface can accept an int or a list of int. When only one dimension is required, it is quite useful and simplifies the code.
OK, I'll stick with homology_dimensions then.
Constructor for the LowerStarSimplexTreeLayer class

Parameters:
    simplextree (gudhi.SimplexTree): underlying simplex tree. Its vertices MUST be named with integers from 0 to n = number of vertices. Note that its filtration values are modified in each call of the class.
From 0 to n, there are n+1 numbers 😉
Good catch, thanks!!
Once the name for dimensions/persistence_dimension/homology_dimensions is settled, this is ok to merge, thanks.
class CubicalLayer(tf.keras.layers.Layer):
    """
    TensorFlow layer for computing the persistent homology of a cubical complex
We could mention somewhere what field we are working on, possibly even let users specify it, but that can be done later.
I added it. Since this parameter has a default value, it should not break examples and tests.
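One way to add such a parameter without breaking existing calls is to give it a default on the public layer only and pass it explicitly to the private helpers. A hypothetical sketch (class and helper names invented for illustration, with 11 as the default coefficient field, as in Gudhi's usual convention):

```python
def _persistence_sketch(values, homology_coeff_field):
    # Private helper: the field is always passed explicitly, no default needed here
    return (sorted(values), homology_coeff_field)

class CubicalLayerSketch:
    def __init__(self, homology_coeff_field=11):
        # The default lives on the public API, so existing examples and tests
        # that do not mention the field keep working unchanged
        self.homology_coeff_field = homology_coeff_field

    def call(self, values):
        return _persistence_sketch(values, self.homology_coeff_field)
```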
@@ -8,15 +8,15 @@

# The parameters of the model are the point coordinates.

-def _Rips(DX, max_edge, dimensions):
+def _Rips(DX, max_edge, dimensions, homology_coeff_field=11):
I don't think we need to repeat the default value here; _Rips (and similarly the other private functions) is always called with a specified field.
yes you are right, I removed it.
A pull request for computing gradients of various persistence diagram constructions with TensorFlow.