ENH: Enable heatmaps when tiling on the fly #491
base: main
Conversation
…d for homogeneity
for more information, see https://pre-commit.ci
Codecov Report
Flags with carried forward coverage won't be shown.
Looking good, I left some suggestions.
Resolved (outdated) review threads on hi-ml-histopathology/src/histopathology/configs/classification/DeepSMILEPanda.py and hi-ml-histopathology/src/histopathology/datamodules/base_module.py.
    return faulty_slides_idx

def get_slide_patch_coordinates(
    self, slide_offset: List, patches_location: List, patch_size: List
Can you add a docstring, please?
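For reference, one possible shape for it (a sketch only; the coordinate-order assumption is taken from the arithmetic in get_patch_coordinate further down, and the exact wording is for the author to decide):

```python
from typing import List, Tuple


def get_slide_patch_coordinates(
    self, slide_offset: List, patches_location: List, patch_size: List
) -> Tuple[List, List, List, List]:
    """Compute absolute coordinates for all patches in a slide.

    :param slide_offset: offset of the slide within the whole-slide image, as (vertical, horizontal).
    :param patches_location: per-patch locations relative to the slide offset, one
        (vertical, horizontal) pair per patch.
    :param patch_size: size of a patch, as (height, width).
    :return: four lists (top, bottom, left, right) with one absolute coordinate per patch.
    """
```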
results.update(metadata_dict)
# each slide can have a different number of patches
for i in range(n_slides):
    updated_metadata_dict = self.compute_slide_metadata(batch, i, metadata_dict)
I wonder if all this metadata processing should be wrapped into a transform. The data is prefetched anyway, so it should be more efficient to apply it as a transform that is multi-processed by the dataloader. Also, from a design perspective, the model shouldn't be handling any metadata processing...
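As a rough illustration of that idea (the class below is hypothetical; a plain callable is enough for a monai Compose pipeline, although it could also subclass monai.transforms.MapTransform like the existing transforms):

```python
from typing import Any, Dict, Hashable


class ComputePatchCoordinatesd:
    """Hypothetical dictionary transform that attaches the per-patch metadata to each sample."""

    def __call__(self, data: Dict[Hashable, Any]) -> Dict[Hashable, Any]:
        sample = dict(data)  # copy, so the incoming sample is not mutated
        # Run here the same arithmetic that compute_slide_metadata currently runs inside the
        # model, using the slide offset, patch locations and patch size already present in
        # the sample, and write top/bottom/left/right back into it.  Because transforms
        # execute in the dataloader workers, the processing is multi-processed for free.
        return sample
```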
@@ -72,6 +72,8 @@ def normalize_dict_for_df(dict_old: Dict[ResultsKey, Any]) -> Dict[str, Any]:
value = value.squeeze(0).cpu().numpy()
if value.ndim == 0:
    value = np.full(bag_size, fill_value=value)
if isinstance(value, List) and isinstance(value[0], torch.Tensor):
Can you explain why we need to do that here? Is it for the coordinates? Maybe turn them into numpy arrays in the first place?
    self, slide_offset: List, patches_location: List, patch_size: List
) -> Tuple[List, List, List, List]:
    """ computing absolute coordinates for all patches in a slide"""
    top, bottom, left, right = self.get_empty_lists(len(patches_location), 4)
Might this be cleaner using numpy arrays, perhaps?
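For example, under the assumption (taken from get_patch_coordinate below) that index 0 is the vertical axis and index 1 the horizontal one, the loop and the get_empty_lists call could collapse to something like this (shown as a free function for brevity):

```python
from typing import List, Tuple

import numpy as np


def get_slide_patch_coordinates(
    slide_offset: List, patches_location: List, patch_size: List
) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
    """Compute absolute coordinates for all patches in a slide (vectorised sketch)."""
    offset = np.asarray(slide_offset)          # shape (2,)
    locations = np.asarray(patches_location)   # shape (n_patches, 2)
    size = np.asarray(patch_size)              # shape (2,)
    top, left = (offset + locations).T         # row 0 -> vertical, row 1 -> horizontal
    bottom, right = (offset + locations + size).T
    return top, bottom, left, right
```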
@dccastro, I like your suggestion above to use the `Box` class instead. Much cleaner than using a 4-tuple of ints; that's just asking for errors to happen.
    return ll

@staticmethod
def get_patch_coordinate(slide_offset: List, patch_location: List, patch_size: List) -> Tuple[int, int, int, int]:
What would you think of using our `Box` class?
+1 on that. Tuples with 4 elements of the same type are just too easy to mix up

def compute_slide_metadata(self, batch: Dict, index: int, metadata_dict: Dict) -> Dict:
    """compute patch-dependent and patch-invariante metadata for a single slide """
    offset = batch[SlideKey.OFFSET.value][index]
[Minor] The `.value` shouldn't be necessary.
    top[i], bottom[i], left[i], right[i] = self.get_patch_coordinate(slide_offset, location, patch_size)
return top, bottom, left, right

def compute_slide_metadata(self, batch: Dict, index: int, metadata_dict: Dict) -> Dict: |
Why does this method need to mutate the input `metadata_dict` in-place, instead of returning a new dictionary?
+1. If it needs to mutate in place, then the signature and name of the function should reflect that mutation (`update_metadata`, `-> None`).
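Both shapes are cheap to write; a sketch of the two options being discussed, with illustrative names only:

```python
from typing import Dict


def compute_slide_metadata(batch: Dict, index: int) -> Dict:
    """Pure variant: build and return a fresh metadata dictionary for one slide."""
    metadata: Dict = {}
    # ... fill in coordinates, image path, etc. for the slide at `index` ...
    return metadata


def update_metadata(metadata_dict: Dict, batch: Dict, index: int) -> None:
    """In-place variant: the name and the `-> None` return type make the mutation explicit."""
    # ... append this slide's values to the lists already held in metadata_dict ...
```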
],
if all(key.value in batch.keys() for key in [SlideKey.OFFSET, SlideKey.PATCH_LOCATION, SlideKey.PATCH_SIZE]):
    n_slides = len(batch[SlideKey.SLIDE_ID])
    metadata_dict = {
Couldn't this be replaced by just adding `if key not in results: results[key] = []` in the loop below?
ResultsKey.IMAGE_PATH: [
    [img_path] * bag_sizes[i] for i, img_path in enumerate(batch[SlideKey.IMAGE_PATH])
],
if all(key.value in batch.keys() for key in [SlideKey.OFFSET, SlideKey.PATCH_LOCATION, SlideKey.PATCH_SIZE]):
I think a set operation would be easier to understand: `set([SlideKey.OFFSET, SlideKey....]) <= set(batch.keys())`.
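For instance, keeping the `.value` convention the surrounding code already uses (the helper name and import path below are illustrative assumptions):

```python
# Assumed import path for SlideKey; adjust to wherever it actually lives in the repo.
from histopathology.utils.naming import SlideKey


def has_tiling_metadata(batch: dict) -> bool:
    """True if the batch carries everything needed to compute absolute patch coordinates."""
    required = {SlideKey.OFFSET.value, SlideKey.PATCH_LOCATION.value, SlideKey.PATCH_SIZE.value}
    return required <= batch.keys()
```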
@@ -72,6 +72,8 @@ def normalize_dict_for_df(dict_old: Dict[ResultsKey, Any]) -> Dict[str, Any]:
value = value.squeeze(0).cpu().numpy()
if value.ndim == 0:
    value = np.full(bag_size, fill_value=value)
if isinstance(value, List) and isinstance(value[0], torch.Tensor):
This function could do with a lot more documentation. A docstring would be great. And each of the `if` branches should have a clear description of which case we are handling here, and where those cases arise.
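A sketch of the kind of inline documentation being asked for, wrapped in a small helper to keep it self-contained (the `isinstance(value, torch.Tensor)` guard is an assumption about the surrounding code, and the "why" comments are guesses for the author to confirm or correct):

```python
from typing import Any

import numpy as np
import torch


def _normalize_value(value: Any, bag_size: int) -> Any:
    if isinstance(value, torch.Tensor):
        # Batched tensors carry a leading batch dimension of 1: drop it and move to CPU/numpy.
        value = value.squeeze(0).cpu().numpy()
        if value.ndim == 0:
            # A scalar (e.g. one label for the whole bag) is repeated once per tile.
            value = np.full(bag_size, fill_value=value)
    elif isinstance(value, list) and isinstance(value[0], torch.Tensor):
        # A list of 0-d tensors (e.g. the new per-tile coordinates?) becomes plain Python numbers.
        value = [v.item() for v in value]
    return value
```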
@@ -33,6 +33,47 @@ dependencies:
  - ruamel.yaml==0.16.12
  - tensorboard==2.6.0
  # Histopathology requirements
  - coloredlogs==15.0.1
Changes to `primary_deps.yml` should be made via `requirements_run.txt`.
    If only one float number is given, it will be applied to all dimensions. Defaults to 0.0.
:param intensity_threshold: a value to keep only the patches whose sum of intensities are less than the
    threshold. Defaults to no filtering.
:pad_mode: refer to NumpyPadMode and PytorchPadMode. If None, no padding will be applied.
Suggested change:
-:pad_mode: refer to NumpyPadMode and PytorchPadMode. If None, no padding will be applied.
+:param pad_mode: refer to NumpyPadMode and PytorchPadMode. If `None`, no padding will be applied.
monai transform for tiling on the fly.
:param filter_mode: when `num_patches` is provided, it determines if keep patches with highest values
    (`"max"`), lowest values (`"min"`), or in their default order (`None`). Default to None.
:param overlap: the amount of overlap of neighboring patches in each dimension (a value between 0.0 and 1.0).
What's the order of values? (width, height) or the other way around?
max_offset = None if (self.random_offset and stage == ModelKey.TRAIN) else 0

if stage != ModelKey.TRAIN:
    grid_transform = RandGridPatchd(
Something I don't get here: We are using random tiles when we are NOT training?
bottom = slide_offset[0] + patch_location[0] + patch_size[0]
left = slide_offset[1] + patch_location[1]
right = slide_offset[1] + patch_location[1] + patch_size[1]
return top, bottom, left, right
Tuples of 4 integers are really error-prone. Can we use the `Box` class instead?
(maybe changing it to use top, bottom, left, right in the same wash-up?)
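A sketch of that idea, with a stand-in dataclass because the exact fields of the repository's Box class aren't shown in this thread:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Stand-in for the repository's Box class; the real field names and order may differ."""
    x: int
    y: int
    w: int
    h: int


def get_patch_box(slide_offset: List, patch_location: List, patch_size: List) -> Box:
    # Same arithmetic as above, but returned as a named structure instead of a 4-tuple,
    # so callers cannot silently swap top/bottom/left/right.
    return Box(x=slide_offset[1] + patch_location[1],
               y=slide_offset[0] + patch_location[0],
               w=patch_size[1],
               h=patch_size[0])
```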
@staticmethod
def expand_slide_constant_metadata(id: str, path: str, n_patches: int, top: List[int],
                                   bottom: List[int], left: List[int], right: List[int]) -> Tuple[List, List, List]:
    """Duplicate metadata that is patch invariant to match the shape of other arrays"""
Can you expand the documentation a bit here? Also, "match the shape of other arrays" is not completely correct; it is matching the number given in `n_patches` (and assumes that many arrays have matching lengths).
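Perhaps something along these lines, based on the signature quoted above (wording is only a suggestion):

```python
"""Replicate metadata that is constant across a slide (its id and image path) so that it
lines up with the per-patch lists.

:param n_patches: how many copies of each constant value to produce; the top/bottom/left/right
    lists are assumed to already contain this many entries each.
"""
```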
@@ -72,6 +72,8 @@ def normalize_dict_for_df(dict_old: Dict[ResultsKey, Any]) -> Dict[str, Any]:
value = value.squeeze(0).cpu().numpy()
if value.ndim == 0:
    value = np.full(bag_size, fill_value=value)
if isinstance(value, List) and isinstance(value[0], torch.Tensor):
    value = [value[i].item() for i in range(len(value))]
Suggested change:
-value = [value[i].item() for i in range(len(value))]
+value = [v.item() for v in value]
@@ -39,7 +39,7 @@ def __getitem__(self, index: int) -> List[Dict[SlideKey, Any]]:


 @pytest.mark.parametrize("random_n_tiles", [False, True])
-def test_image_collate(random_n_tiles: bool) -> None:
+def test_array_collate(random_n_tiles: bool) -> None:
There is no test coverage for the new functionality?
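If it helps, one obvious candidate is a direct test of the coordinate arithmetic; the import path and class name below are assumptions and would need to match the real code:

```python
# Assumed location of the new static method; adjust the import to the actual class.
from histopathology.datamodules.base_module import TilesDataModule


def test_get_patch_coordinate() -> None:
    slide_offset, patch_location, patch_size = [10, 20], [3, 4], [5, 6]
    top, bottom, left, right = TilesDataModule.get_patch_coordinate(slide_offset, patch_location, patch_size)
    # Index 0 is treated as the vertical axis and index 1 as the horizontal one,
    # matching the arithmetic in get_patch_coordinate quoted earlier in this review.
    assert (top, bottom, left, right) == (13, 18, 24, 30)
```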
In this PR we enable heatmap outputs when tiling on the fly. Specifically: