nan in Rasterizer gradients #110

shubhtuls · 2020-03-12T21:04:17Z

Description

I encountered a case where two vertices of a triangle had the same screen-space XY coordinates, which led to 'nan' values in the gradient.

Some interesting things I noted:

Using the CPU instead of the GPU (a Quadro GP100 in this case) leads to no issues.
The CPU and GPU also give different pix2face outputs in the forward pass of the rasterizer: the CPU run ignores the offending triangle's contribution, while the GPU run does not. I am not sure why this happens, but I suspect this is what needs to be fixed.

Instructions To Reproduce the Issue:

import torch

import pytorch3d
import pytorch3d.renderer
import numpy as np
import matplotlib.pyplot as plt

device = torch.device("cuda:0")

# using the cpu instead of the gpu leads to no nan values!
# device = torch.device("cpu")

# Note that v0 and v2 of the triangle have the same x,y but different z, so it is not degenerate in 3D (just a triangle viewed edge-on, parallel to the camera's viewing direction)
# These are actual vertex values that occurred during a run
# However, my attempts to manually create a simpler set of vertex locations that reproduced this were unsuccessful
vs = torch.Tensor([[0.7922, -0.1992,  6.8850],[0.8408, -0.1622,  6.8568],[0.7922, -0.1992,  6.89]]).to(device)
fs = torch.Tensor([[0,1,2]]).to(device)

vs.requires_grad = True

meshes = pytorch3d.structures.Meshes([vs],[fs])
cameras = pytorch3d.renderer.OpenGLOrthographicCameras(znear=0, zfar=1, device=device)

blend_params = pytorch3d.renderer.BlendParams(sigma=1e-4, gamma=1e-4)
mask_raster_settings = pytorch3d.renderer.RasterizationSettings(
    image_size=256, 
    blur_radius=np.log(1. / 1e-4 - 1.) * blend_params.sigma, 
    faces_per_pixel=20,
    bin_size=0
)
mask_rasterizer = pytorch3d.renderer.MeshRasterizer(
    cameras=cameras, 
    raster_settings=mask_raster_settings
)
mask_shader = pytorch3d.renderer.SoftSilhouetteShader(blend_params=blend_params)
mask_renderer = pytorch3d.renderer.MeshRenderer(mask_rasterizer, mask_shader)

img_mask = mask_renderer(meshes)

img_mask[0,:,:,3].mean().backward()
print(vs.grad)

plt.imshow(img_mask[0,:,:,3].detach().cpu().numpy())
plt.show()

tensor([[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]], device='cuda:0')

## debugging via checking barycentric coords
pix2face, _, barycentric_coords, _ = mask_rasterizer(meshes)
(pix2face == 0).nonzero()[0]

tensor([  0, 147,  17,   0], device='cuda:0')

print(barycentric_coords[0, 147, 17, 0])

tensor([-3.5279e+26,  0.0000e+00,  3.5279e+26], device='cuda:0',
       grad_fn=<SelectBackward>)

shubhtuls · 2020-03-12T21:14:52Z

Update: actually, even a simpler set of vertex locations reproduces the error, e.g.

vs = torch.Tensor([[0., 0.,  1.0],[0.2, 0.2,  2.0],[0., 0.,  3.0]]).to(device)

nikhilaravi · 2020-03-12T21:30:30Z

@shubhtuls thanks for the detailed explanation of the issue. I will try to reproduce the error as described and get back to you!

shubhtuls · 2020-03-12T22:59:06Z

I think this is related to precision issues when computing the face areas. Adding a statement to print 'face_area' in the CUDA rasterizer implementation here shows that it is on the order of 1e-9, which is greater than the kEpsilon=1e-30 used to check for zero area, so the face is not treated as having zero area.

I unblocked myself by additionally defining a 'kEpsilonFace=1e-7' in these lines and using that for the zero-area check, but I'm not sure if this is the ideal solution.
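
To make the effect of the threshold concrete, here is a minimal pure-Python sketch. It is not the CUDA rasterizer code: the helper names (`edge_function`, `barycentric`), the test point, and the 1e-9 offset standing in for GPU round-off are all assumptions chosen for illustration. It only shows how dividing by a tiny but non-zero face area yields huge barycentric weights of opposite sign, and how a larger epsilon rejects the face instead.

```python
def edge_function(p, a, b):
    """Twice the signed area of triangle (a, b, p) in the XY plane."""
    return (p[0] - a[0]) * (b[1] - a[1]) - (p[1] - a[1]) * (b[0] - a[0])


def barycentric(p, v0, v1, v2, eps):
    """Barycentric coordinates of p, or None when the face area is within eps of zero."""
    area = edge_function(v2, v0, v1)
    if abs(area) <= eps:
        return None  # treat as a zero-area face and skip it
    return (
        edge_function(p, v1, v2) / area,
        edge_function(p, v2, v0) / area,
        edge_function(p, v0, v1) / area,
    )


# Screen-space XY of the simplified triangle; v2 is nudged by 1e-9 as a
# stand-in (assumption) for the round-off that makes the GPU area non-zero.
v0, v1 = (0.0, 0.0), (0.2, 0.2)
v2 = (1e-9, 0.0)
p = (0.1, 0.05)  # arbitrary pixel location near the triangle

loose = barycentric(p, v0, v1, v2, eps=1e-30)   # tiny epsilon: face is kept
strict = barycentric(p, v0, v1, v2, eps=1e-7)   # larger epsilon: face is skipped
print(loose)
print(strict)
```

With the tiny epsilon the weights still sum to roughly 1, but two of them are enormous and cancel each other, which matches the pattern in the barycentric_coords printed above. Any gradient computed through that division is likely to overflow in the same way.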

gkioxari · 2020-03-13T01:55:00Z

Small face areas are such a headache! I guess both example faces above have almost zero area. Did the nans disappear with 1e-7?

shubhtuls · 2020-03-13T02:03:28Z

@gkioxari - so far, yes!

tomguluson92 · 2020-04-08T03:48:26Z

Quoting shubhtuls: "@gkioxari - so far, yes!"

Thanks for your solution. Am I right in understanding that it aims to avoid the vanishing-gradient problem by applying a stricter threshold to each face's area?

gkioxari · 2020-04-08T16:14:18Z

@tomguluson92 In a follow-up diff we are setting the kEpsilon value to 1e-8. Yes, the issue is the small face areas, and that value determines which faces are treated as having zero area.
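
For anyone stuck on a build without the updated threshold, one possible user-side workaround is to drop faces whose projected area is tiny before rasterizing. The sketch below is only an illustration under stated assumptions: `screen_space_area` and `drop_tiny_faces` are hypothetical helpers, not PyTorch3D functions; it assumes an orthographic camera so that a vertex's screen XY is its own XY up to scale; and the default of 1e-8 simply reuses the value mentioned above, even though the library may compare it against a differently scaled quantity.

```python
def screen_space_area(v0, v1, v2):
    """Unsigned area of the triangle after dropping z (orthographic assumption)."""
    cross = (v1[0] - v0[0]) * (v2[1] - v0[1]) - (v1[1] - v0[1]) * (v2[0] - v0[0])
    return 0.5 * abs(cross)


def drop_tiny_faces(verts, faces, min_area=1e-8):
    """Return only the faces whose projected area exceeds min_area."""
    kept = []
    for face in faces:
        v0, v1, v2 = (verts[i] for i in face)
        if screen_space_area(v0, v1, v2) > min_area:
            kept.append(face)
    return kept


# The simplified triangle from this thread plus one extra vertex, so that
# there is also a well-formed face to keep.
verts = [(0.0, 0.0, 1.0), (0.2, 0.2, 2.0), (0.0, 0.0, 3.0), (0.2, 0.0, 1.0)]
faces = [(0, 1, 2), (0, 1, 3)]

kept_faces = drop_tiny_faces(verts, faces)
print(kept_faces)
```

Face (0, 1, 2) projects to zero area because v0 and v2 share the same XY, so it is removed, while (0, 1, 3) is kept. Filtering changes the mesh, so it only makes sense when the dropped faces contribute nothing visible anyway.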

nikhilaravi · 2020-04-24T04:05:28Z

This has been fixed by 487d4d6.

nikhilaravi self-assigned this Mar 12, 2020

nikhilaravi added the bug label Mar 20, 2020

nikhilaravi closed this as completed Apr 24, 2020

JudyYe mentioned this issue Apr 18, 2021

NaN when using MeshRasterizer #561

Open
