
Commit

generic: sycl: resampling: Avoid using the cudnn resampling to use the sycl impl
kala855 authored and vpirogov committed Aug 27, 2024
1 parent 16d6dd4 commit 40c1476
Showing 5 changed files with 2 additions and 581 deletions.
10 changes: 2 additions & 8 deletions src/gpu/gpu_resampling_list.cpp
@@ -21,10 +21,6 @@
 #include "gpu/intel/ocl/vectorized_resampling.hpp"
 #endif

-#if DNNL_GPU_VENDOR == DNNL_VENDOR_NVIDIA
-#include "gpu/nvidia/cudnn_resampling.hpp"
-#endif
-
 #ifdef GENERIC_SYCL_KERNELS_ENABLED
 #include "gpu/generic/sycl/ref_resampling.hpp"
 #endif
@@ -40,15 +36,13 @@ using namespace dnnl::impl::prop_kind;
 const std::map<pk_impl_key_t, std::vector<impl_list_item_t>>
 impl_list_map REG_RESAMPLING_P({
 {{forward}, {
-GPU_INSTANCE_INTEL(intel::ocl::ref_resampling_fwd_t)
-GPU_INSTANCE_NVIDIA(nvidia::cudnn_resampling_fwd_t)
+GPU_INSTANCE_INTEL(intel::ocl::ref_resampling_fwd_t)
 GPU_INSTANCE_GENERIC_SYCL(generic::sycl::ref_resampling_fwd_t)
 nullptr,
 }},
 {{backward}, REG_BWD_PK({
 GPU_INSTANCE_INTEL(intel::ocl::vectorized_resampling_bwd_t)
-GPU_INSTANCE_INTEL(intel::ocl::ref_resampling_bwd_t)
-GPU_INSTANCE_NVIDIA(nvidia::cudnn_resampling_bwd_t)
+GPU_INSTANCE_INTEL(intel::ocl::ref_resampling_bwd_t)
 GPU_INSTANCE_GENERIC_SYCL(generic::sycl::ref_resampling_bwd_t)
 nullptr,
 })},
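Note on the hunk above: the implementation list is ordered, and as far as I understand oneDNN's dispatch, the first entry that applies is the one used. With the cuDNN entries gone, the generic SYCL entry becomes the candidate on NVIDIA builds. The following is a minimal sketch of that fallback pattern only. All names in it (`impl_t`, `pick_impl`, `intel_only`, `any_sycl`, the vendor strings) are hypothetical, and it checks applicability at run time, whereas the `GPU_INSTANCE_*` macros in the real file select entries at compile time.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of an implementation list.
// This is NOT oneDNN's impl_list_item_t.
struct impl_t {
    const char *name; // nullptr marks the end of the list
    bool (*applicable)(const std::string &vendor);
};

// Hypothetical applicability checks.
static bool intel_only(const std::string &vendor) { return vendor == "intel"; }
static bool any_sycl(const std::string &) { return true; }

// Ordered list mirroring the forward entries after this commit:
// vendor-specific first, generic SYCL as the fallback, nullptr terminator.
static const std::vector<impl_t> fwd_impls = {
    {"intel::ocl::ref_resampling_fwd_t", intel_only},
    {"generic::sycl::ref_resampling_fwd_t", any_sycl},
    {nullptr, nullptr},
};

// Walk the list in order and return the first applicable entry,
// or nullptr if the terminator is reached first.
const char *pick_impl(const std::vector<impl_t> &impls, const std::string &vendor) {
    for (const auto &impl : impls) {
        if (impl.name == nullptr) break;
        if (impl.applicable(vendor)) return impl.name;
    }
    return nullptr;
}
```

In this sketch, calling `pick_impl(fwd_impls, "nvidia")` skips the Intel entry and lands on the generic SYCL one, which is the behaviour the commit title describes.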
37 changes: 0 additions & 37 deletions src/gpu/nvidia/README.md
@@ -350,43 +350,6 @@ GPU:
 * Forward pass supports `f32`, `f16`, `bf16` and `s8` data types.
 * Backward pass supports `f32` and `bf16` data types.

-### Resampling
-
-#### Using cuDNN
-
-The `cudnnSpatialTfSamplerForward` and `cudnnSpatialTfSamplerBackward` are used
-to implement the resampling primitive.
-
-The Nvidia's spatial sampling is based on
-[Spacial Transformer Network](https://papers.nips.cc/paper/5854-spatial-transformer-networks.pdf)
-where all the data locations are normalized between `-1 <= (xi, yi) <= 1`.
-
-* cuDNN backend requires a grid of coordinates that can be sample-up/down based
-on `theta`. The grid is generated by `cudnnSpatialTfGridGeneratorForward`.
-* The `theta` is a `MB * 2 * 3` matrix scaling factor for each coordinate and is
-used to generate the grid.
-* The grid value must be normalized in range [-1 , 1]. cuDNN clamps the out of
-bounds coordinate to zero. Therefore, it is needed to manually clamp the out
-of bound coordinate to edges in order to avoid incorrect result.
-* 3D spatial sampling is not supported in cuDNN.
-* `Nearest neighbour` algorithm is not supported in cuDNN.
-* Since cuDNN computation is different from that of oneDNN, the error threshold
-is smaller than other oneDNN implementation, so reduced testing accuracy for
-`fp32` and `fp16` data types are required.
-* The backward pass requires an output parameter for `d_grid` which cannot be
-`nullptr`. However, since the grid coordinates are not a tunable parameter in
-oneDNN, a dummy memory for `d_grid` is created and is deleted when the
-destructor of the primitive is called.
-
-##### Forward Direction
-* Supported data types: `f32`, `bf16`, `f16`, `s8`, `u8` and `s32`
-* Supported post-ops: `sum`, `eltwise`, `binary`
-* Supported algorithms: nearest neighbor, bilinear
-
-##### Backward Direction
-* Supported data types: `f32`, `bf16`
-* Supported algorithms: nearest neighbor, bilinear
-
 ### Softmax/LogSoftmax

 #### Using cuDNN
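Note on the removed README text: it describes sampling coordinates normalized to `[-1, 1]`, produced from a per-batch `2 * 3` matrix `theta`, and clamped to the edges by hand. A rough sketch of that arithmetic follows, for one batch element. It is not the deleted cuDNN code. The helper names (`point_t`, `normalize_index`, `sample_coord`) are hypothetical, and the endpoint convention in `normalize_index` (first index at -1, last at 1) is an assumption.

```cpp
#include <algorithm>
#include <array>

// Hypothetical 2D point in normalized coordinates.
struct point_t {
    float x;
    float y;
};

// Map index i of a dimension with n entries to [-1, 1].
// Assumes n > 1 and that the first and last indices land on -1 and 1.
float normalize_index(int i, int n) {
    return -1.0f + 2.0f * i / (n - 1);
}

// Clamp a normalized coordinate to the valid range, i.e. to the edges.
static float clamp_unit(float v) {
    return std::max(-1.0f, std::min(1.0f, v));
}

// Apply a row-major 2x3 affine matrix theta to the normalized target
// coordinate (xt, yt), then clamp the resulting source coordinate.
point_t sample_coord(const std::array<float, 6> &theta, float xt, float yt) {
    const float xs = theta[0] * xt + theta[1] * yt + theta[2];
    const float ys = theta[3] * xt + theta[4] * yt + theta[5];
    return {clamp_unit(xs), clamp_unit(ys)};
}
```

With an identity `theta` of `{1, 0, 0, 0, 1, 0}` the coordinate passes through unchanged, while a scaling `theta` that pushes a coordinate past the border is pulled back to -1 or 1 instead of being treated as out of bounds.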
95 changes: 0 additions & 95 deletions src/gpu/nvidia/cudnn_resampling.cpp

This file was deleted.
