[WIP] Add local blur ("blob") augmentations #18
Conversation
Pass the data-erasing config to the training routine. Example:

    erasing_config = {"probability": 0.75,
                      "threshold": threshold,
                      "lim_blob_depth": [5, 15],
                      "lim_blob_width": [5, 15],
                      "lim_blob_height": [5, 15],
                      "lim_gauss_diffusion": [1, 6],
                      "verbose": False,
                      "save_path": "../../trash_bin/",
                      "num_steps_save": 1000}

Make sure the config is consistent with the batch size. For that reason, check_data_erasing_config can be called before the training routine. Example:

    try:
        check_data_erasing_config(test_batch.numpy()[0], **erasing_config)
    except (IncorrectLimits, IncorrectType, OSError) as internal_error:
        print(internal_error)
        raise
    except Exception:
        print("an unexpected error happened during the data erasing config check")
        raise
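For context, here is a self-contained sketch of that check. The import path, the concrete threshold value, and the dummy input shape are assumptions made for illustration and are not taken from this PR:

```python
import numpy as np

# Assumed import path; the module may still be renamed within this PR.
from elektronn3.data.data_erasing import (
    check_data_erasing_config, IncorrectLimits, IncorrectType
)

erasing_config = {
    "probability": 0.75,
    "threshold": 0.5,  # placeholder; the original example uses a pre-defined `threshold` variable
    "lim_blob_depth": [5, 15],
    "lim_blob_width": [5, 15],
    "lim_blob_height": [5, 15],
    "lim_gauss_diffusion": [1, 6],
    "verbose": False,
    "save_path": "../../trash_bin/",
    "num_steps_save": 1000,
}

# Dummy (C, D, W, H) sample standing in for test_batch.numpy()[0].
dummy_sample = np.zeros((1, 48, 96, 96), dtype=np.float32)

try:
    check_data_erasing_config(dummy_sample, **erasing_config)
except (IncorrectLimits, IncorrectType, OSError) as err:
    print(err)
    raise
```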
@ravil-mobile Please look into the review comments.
elektronn3/training/trainer.py
Outdated
# since the new raw data batch is not pinned
inp = deepcopy(inp)

apply_data_erasing(batch=inp.numpy()[0], **self.data_erasing_config)
We really shouldn't do any augmentations directly during the training loop. It slows down training and can lead to accidental gradient computation. All augmentations should be done in Dataset classes (like PatchCreator in our case) so they can be automatically offloaded to DataLoader background processes and don't have to use PyTorch tensors. See http://pytorch.org/docs/master/data.html.
To be clear, the proposed non-geometric augmentations should be performed on the inp numpy array after the normalization step here:
elektronn3/elektronn3/data/cnndata.py, line 193 in cacafe9:
inp = (inp - self.mean) / self.std
PyTorch tensors should not be involved in augmentations.
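A minimal sketch of what that could look like in a Dataset-style __getitem__; the apply_random_blurring name follows the function introduced later in this PR, but its import path and the surrounding code are assumptions:

```python
import numpy as np
import torch

# Assumed import path; the module may still be renamed within this PR.
from elektronn3.data.data_erasing import apply_random_blurring

def getitem_sketch(raw_patch: np.ndarray, mean: float, std: float, erasing_config: dict = None):
    """Sketch of a Dataset.__getitem__: augment the numpy array first, convert to a tensor last."""
    inp = (raw_patch.astype(np.float32) - mean) / std  # existing normalization step (pure numpy)
    if erasing_config is not None:
        apply_random_blurring(inp, **erasing_config)   # in-place augmentation on the numpy array
    return torch.as_tensor(inp)                        # no PyTorch tensors before this point
```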
I totally agree with that. The problem is that I haven't figured out where exactly to insert my augmentation. Whenever I look at the DataLoader and the code related to it, I end up in the "guts" of PyTorch, which is not the right place to insert my function. Please propose where exactly it should go.
As I wrote above, it should be performed directly after the input normalization here:
elektronn3/elektronn3/data/cnndata.py, line 193 in cacafe9:
inp = (inp - self.mean) / self.std
elektronn3/training/trainer.py
Outdated
# to avoid messing up samples within the data loader
# WARNING: possible it slows down the performance
# since the new raw data batch is not pinned
inp = deepcopy(inp)
If you want to copy a tensor x, use x.clone(). That's not necessary here though, because augmentations shouldn't be performed here anyway (see the comment below).
elektronn3/data/blob_generator.py
Outdated
class BlobGenerator:
    """ A class instance generates blobs with arbitrary
    spacial size and location within specified domain.
- *spatial
- *the specified
- "domain" is ambiguous. Perhaps something like "coordinate bounds"?
elektronn3/data/blob_generator.py
Outdated
class BlobGenerator:
    """ A class instance generates blobs with arbitrary
    spacial size and location within specified domain.
    The domain size is usually the special size of the batch.
*spatial size of the input sample. We are not dealing with batches here.
elektronn3/data/blob_generator.py
Outdated
""" A class instance generates blobs with arbitrary | ||
spacial size and location within specified domain. | ||
The domain size is usually the special size of the batch. | ||
The user is responsible to pass correct parameters. |
Since this class is only internally used (and a programmer is kind of always responsible for passing correct parameters), I think this line can be deleted.
elektronn3/data/data_erasing.py
Outdated
blob = generator.create_blob()

for k in range(blob.z_min, blob.z_max + 1):
(Low priority:) Avoid nested for loops. You can rewrite them with itertools.
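For illustration, a sketch of collapsing the three nested ranges with itertools.product; the Blob namedtuple here is only a stand-in for the PR's blob object:

```python
from itertools import product
from collections import namedtuple

Blob = namedtuple("Blob", "z_min z_max x_min x_max y_min y_max")  # stand-in for the PR's blob class
blob = Blob(0, 2, 10, 12, 20, 22)

# One flat loop over every voxel coordinate in the blob's bounding box,
# replacing three nested for loops.
for k, i, j in product(range(blob.z_min, blob.z_max + 1),
                       range(blob.x_min, blob.x_max + 1),
                       range(blob.y_min, blob.y_max + 1)):
    pass  # process voxel (k, i, j)
```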
elektronn3/data/data_erasing.py
Outdated
if np.random.rand() > probability:
    return

channels, batch_depth, batch_width, batch_height = batch.shape
As noted above, none of these values are related to a batch, so the names should be updated.
elektronn3/data/data_erasing.py
Outdated
    return

channels, batch_depth, batch_width, batch_height = batch.shape
batch_volume = batch_depth * batch_width * batch_height
(Low priority:) volume = np.prod(inp.shape[1:]) (given that batch is renamed to inp).
elektronn3/data/data_erasing.py
Outdated
          blob.x_min:blob.x_max + 1,
          blob.y_min:blob.y_max + 1]

diffuseness = np.random.randint(low=lim_gauss_diffusion[0],
(Low priority:) IMO "diffuseness" is not an optimal name for the sigma parameter of a gaussian blur filter. I'd just call it something like gaussian_sigma or gaussian_std. But that's just my opinion.
elektronn3/data/data_erasing.py
Outdated
print("erased percentage for channel (%i): %f" % | ||
(batch_indx, erasing_percentage)) | ||
|
||
if save_path and num_steps_save: |
This shouldn't be inside the main function. If you still need this, you can put it into its own utility function.
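As an illustration of that split, the saving could live in a small helper like the one below; the function name, the HDF5 format, and the file naming are assumptions, not the PR's actual code:

```python
import os
import h5py
import numpy as np

def save_augmented_volume(inp: np.ndarray, step: int, save_path: str, num_steps_save: int) -> None:
    """Dump the augmented volume to disk every num_steps_save steps (debugging aid only)."""
    if save_path is None or num_steps_save is None or step % num_steps_save != 0:
        return
    os.makedirs(save_path, exist_ok=True)
    with h5py.File(os.path.join(save_path, f"erased_step{step}.h5"), "w") as f:
        f.create_dataset("raw", data=inp, compression="gzip")
```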
elektronn3/data/data_erasing.py
Outdated
# Check the user's specified blob size.
# First entry of each list must be less than the second one
if lim_blob_depth[0] >= lim_blob_depth[1]:
Can you please squash those checks (until line 214) into one or two checks and report all dimensions in case of an error? IMO it would even be fine to replace all these verbose checks with assertions. If any of the assertions fail, you will still know where they occur and can debug them properly.
Another idea would be to use relative size limits in the augmentation function (relative to the input size), instead of specifying fixed sizes. That would have the benefit that all those checks will no longer be needed and you won't have to re-think those sizes when varying input patch sizes in PatchCreator. A sketch of that idea follows below.
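To sketch the relative-limit idea (the helper name and the fractions are hypothetical, not from this PR):

```python
import numpy as np

def absolute_blob_limits(spatial_shape, rel_limits=(0.05, 0.2)):
    """Convert relative blob size limits (fractions of the input shape) to per-axis voxel limits.

    spatial_shape: (D, H, W) of the input patch; rel_limits: (min_frac, max_frac).
    Hypothetical helper illustrating the reviewer's suggestion.
    """
    lo, hi = rel_limits
    return [(max(1, int(lo * s)), max(1, int(hi * s))) for s in spatial_shape]

# Example: for a (48, 96, 96) patch this yields per-axis (min, max) blob extents.
print(absolute_blob_limits((48, 96, 96)))  # [(2, 9), (4, 19), (4, 19)]
```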
elektronn3/data/data_erasing.py
Outdated
          blob.x_min:blob.x_max + 1,
          blob.y_min:blob.y_max + 1]

diffuseness = np.random.randint(low=lim_gauss_diffusion[0],
Why is this an integer? That looks like an arbitrary limitation for the sigma parameter of gaussian blurring. It should be a random float value.
It should also make use of scipy.ndimage.gaussian_filter's support for per-axis sigma values: you can enhance the versatility of the blurring effect by passing a sequence of three independent random values as the per-axis sigma.
Most importantly, the depth axis should optionally take input data anisotropy into account (which is very common in 3D data sets, including ours). This should be controlled by an additional aniso parameter in which the user can specify per-axis anisotropy, or at least a z-axis anisotropy factor like in:
elektronn3/elektronn3/data/transformations.py, lines 433 to 435 in a18301d:
aniso_factor: float
    Anisotropy factor that determines an additional scaling in ``z``
    direction.
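A sketch of per-axis random float sigmas with an anisotropy factor; the function and parameter names are illustrative, not the PR's final API:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def random_local_blur(region: np.ndarray, sigma_range=(1.0, 6.0), aniso_factor: float = 2.0) -> np.ndarray:
    """Blur a (D, H, W) region with independent per-axis sigmas drawn as floats.

    The z (depth) sigma is divided by aniso_factor so that anisotropic data
    (coarser sampling along z) is not over-blurred along the depth axis.
    """
    sigma = np.random.uniform(sigma_range[0], sigma_range[1], size=3)
    sigma[0] /= aniso_factor                      # depth axis
    return gaussian_filter(region, sigma=sigma)

blurred = random_local_blur(np.random.rand(8, 32, 32).astype(np.float32))
```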
examples/train_unet_neurodata.py
Outdated
iterator = st.loader.__iter__()
test_batch, test_target = iterator.__next__()

try:
You can just remove the try/except block because it exits the process anyways. check_data_erasing_config(test_batch.numpy()[0], **erasing_config) is enough.
examples/train_unet_neurodata.py
Outdated
)
st.train(max_steps)

iterator = st.loader.__iter__()
Don't use double underscore methods, they are meant to be private. In this case, what you want is test_inp, _ = next(iter(st.loader)).
You don't actually need to load an example batch to find out the values that are relevant for your check. You can find the spatial input shape at
'patch_shape': (48, 96, 96),
Please replace the batch parameter in check_data_erasing_config() with a patch_shape parameter (spatial_shape could be a better name though) and use the shape from the data_init_kwargs dict for it.
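A sketch of the suggested call; it assumes check_data_erasing_config is changed to accept a spatial_shape tuple and reuses the erasing_config defined earlier in the training script:

```python
# data_init_kwargs as defined in examples/train_unet_neurodata.py (only the relevant key shown).
data_init_kwargs = {'patch_shape': (48, 96, 96)}

# Hypothetical signature after the suggested change: a shape tuple instead of an example batch.
check_data_erasing_config(spatial_shape=data_init_kwargs['patch_shape'], **erasing_config)
```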
examples/train_unet_neurodata.py
Outdated
test_batch, test_target = iterator.__next__()

try:
    check_data_erasing_config(test_batch.numpy()[0], **erasing_config)
It's less than ideal that this check is performed inside of a user's training script. Once you've addressed the comments above, you should move it to an elektronn3-internal module and perform the check automatically from PatchCreator if the new augmentations are used.
examples/train_unet_neurodata.py
Outdated
erasing_config = {"probability" : 0.75,
                  "threshold": threshold,
                  "lim_blob_depth": [5, 15],
The default depth should be halved because of the anisotropy of neuro_data.
73a554e: Please don't simultaneously rename files and make major changes to them in one commit, because this messes with the git diffs (making all review comments "outdated"). Renaming should be in a separate commit so that git can always understand that files were only moved/renamed, not deleted/created. You can check if renaming was correctly detected by git by checking
Edit: I just tried to fix the review by force-pushing a modified version of 73a554e, but it's still broken (showing unchanged parts as "outdated").
#18 This commit is the same as 73a554e, but without the data_erasing renaming. Co-authored-by: mdraw <[email protected]>
Force-pushed from b9208e1 to 38fdafa.
elektronn3/data/data_erasing.py
Outdated
class ScheduledParameter(object):
    """ The class is responsible for a parameter scheduling along an iterative
    process according to either the linear or exponential growth. The user
    specifies the initial value, the target one, growth type and the number
*target one -> maximum value
(because that's closer to the parameter name; "target" can be confused with training targets)
elektronn3/data/data_erasing.py
Outdated
process according to the exponential law. The user specifies
the initial value, the target one and the number of steps
along which the variable will be gradually scaled.
class ScheduledParameter(object):
Parameter is even more confusing than Variable, because "parameter" generally refers to learnable weights when talking about neural networks. How about just Scheduler or ScalarScheduler?
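For illustration only, a compact sketch of what a ScalarScheduler interface could look like (linear growth only; everything beyond the suggested class name is an assumption):

```python
class ScalarScheduler:
    """Linearly grows a scalar from `value` to `max_value` over `interval` steps.

    If max_value or interval is None, the value stays constant.
    """
    def __init__(self, value, max_value=None, interval=None):
        self.value = value
        self.max_value = max_value
        self.step_size = ((max_value - value) / interval
                          if max_value is not None and interval is not None else 0.0)

    def step(self):
        if self.step_size and self.value < self.max_value:
            self.value = min(self.value + self.step_size, self.max_value)
        return self.value

sched = ScalarScheduler(0.1, max_value=0.5, interval=4)
print([round(sched.step(), 2) for _ in range(6)])  # [0.2, 0.3, 0.4, 0.5, 0.5, 0.5]
```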
elektronn3/data/data_erasing.py
Outdated
the initial value, the target one and the number of steps
along which the variable will be gradually scaled.
class ScheduledParameter(object):
    """ The class is responsible for a parameter scheduling along an iterative
*Scheduler for a scalar value.
elektronn3/data/data_erasing.py
Outdated
along which the variable will be gradually scaled.
class ScheduledParameter(object):
    """ The class is responsible for a parameter scheduling along an iterative
    process according to either the linear or exponential growth. The user
*the linear -> linear (without "the")
elektronn3/data/data_erasing.py
Outdated
If the user doesn't specify the target value or the interval the variable
works as a constant
If the user doesn't specify the target value or the interval,
the parameter works as a constant
*Missing "." at the end of the sentence.
elektronn3/data/data_erasing.py
Outdated
""" Prints the current value of the parameter on the screen | ||
during an iterative process. The function counts number of | ||
step() calls and prints information each time when the number | ||
of the calls is even with respect to steps_per_report |
*"is divisible by ``steps_per_report``."
elektronn3/data/data_erasing.py
Outdated
def __eq__(self, other):
    return self.value == other

If the used doesn't pass the number of steps_per_report the function
*user
Avoid negative statements "if not a, then not b". Write it like
"""Logs information every ``steps_per_report`` steps (if ``steps_per_report`` is set)."""
elektronn3/data/data_erasing.py
Outdated
steps_per_report - int
    number of step between information update on the screen
The special case steps_per_report = None should be mentioned here, especially since it's the default.
elektronn3/data/data_erasing.py
Outdated
    """
    def __init__(self, value, max_value=None, interval=None, steps_per_report=None):

        logger = logging.getLogger('elektronn3log')
Don't tie the logger to a specific class. Use a module-level logger instead, like in all other modules of elektronn3 that perform logging.
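For reference, a minimal sketch of the module-level pattern; the 'elektronn3log' logger name follows the code excerpt above, the class body is only illustrative:

```python
import logging

# Module-level logger, defined once at the top of data_erasing.py rather than inside a class.
logger = logging.getLogger('elektronn3log')

class ScalarScheduler:
    def step(self):
        logger.info('scheduler step')  # all methods reuse the shared module-level logger
```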
elektronn3/data/data_erasing.py
Outdated
        -------
        None
        """
        if self.steps_per_report:
if self.steps_per_report suggests that it's a bool. This should be if self.steps_per_report is not None. None checks should always be explicit.
We can re-do the renaming of data_erasing.py later, just before merging the PR (I am a little scared that renaming it now will mess up the review again).
one that contained all latest updates of the refactoring
The documentation of the "ScalarScheduler" class was corrected. The "apply_random_blurring" function call was moved from the training function to the "__getitem__" method of the "PatchCreator" class.
for k, i, j in product(range(region.coords_lo[0], region.coords_hi[0] + 1),
                       range(region.coords_lo[1], region.coords_hi[1] + 1),
                       range(region.coords_lo[2], region.coords_hi[2] + 1)):
    intersection.add((k, i, j))
And disable writing augmented files to disk during training