Skip to content

Commit

Permalink
moved quantize error documentation
Browse files Browse the repository at this point in the history
  • Loading branch information
edwardhartnett committed Jun 23, 2022
1 parent 68b0d50 commit e45292f
Show file tree
Hide file tree
Showing 2 changed files with 42 additions and 85 deletions.
85 changes: 0 additions & 85 deletions docs/filters.md
Original file line number Diff line number Diff line change
Expand Up @@ -617,91 +617,6 @@ As part of its testing, the NetCDF build process creates a number of shared libr
If you need a filter from that set, you may be able to set *HDF5\_PLUGIN\_PATH*
to point to that directory or you may be able to copy the shared libraries out of that directory to your own location.

# Lossy One-Way Filters

As of NetCDF version 4.8.2, the netcdf-c library supports
bit-grooming filters.
````
Bit-grooming is a lossy compression algorithm that removes the
bloat due to false-precision, those bits and bytes beyond the
meaningful precision of the data. Bit Grooming is statistically
unbiased, applies to all floating point numbers, and is easy to
use. Bit-Grooming reduces data storage requirements by
25-80%. Unlike its best-known competitor Linear Packing, Bit
Grooming imposes no software overhead on users, and guarantees
its precision throughout the whole floating point range
[https://doi.org/10.5194/gmd-9-3199-2016].
````
The generic term "quantize" is used to refer collectively to the various
bitgroom algorithms. The key thing to note about quantization is that
it occurs at the point of writing of data only. Since its output is
legal data, it does not need to be "de-quantized" when the data is read.
Because of this, quantization is not part of the standard filter
mechanism and has a separate API.

The API for bit-groom is currently as follows.
````
int nc_def_var_quantize(int ncid, int varid, int quantize_mode, int nsd);
int nc_inq_var_quantize(int ncid, int varid, int *quantize_modep, int *nsdp);
````
The *quantize_mode* argument specifies the particular algorithm.
Currently, three are supported: NC_QUANTIZE_BITGROOM, NC_QUANTIZE_GRANULARBR,
and NC_QUANTIZE_BITROUND. In addition quantization can be disabled using
the value NC_NOQUANTIZE.

The input to ncgen or the output from ncdump supports special attributes
to indicate if quantization was applied to a given variable.
These attributes have the following form.
````
_QuantizeBitGroomNumberOfSignificantDigits = <NSD>
or
_QuantizeGranularBitRoundNumberOfSignificantDigits = <NSD>
or
_QuantizeBitRoundNumberOfSignificantBits = <NSB>
````
The value NSD is the number of significant (decimal) digits to keep.
The value NSB is the number of significant bits to keep.

## Distortions introduced by lossy filters

Any lossy filter introduces distortions to data.
The lossy filters implemented in netcdf-c introduce a distortoin
that can be quantified in terms of a _relative_ error. The magnitude of
distortion introduced to every single value V is guaranteed to be within
a certain fraction of V, expressed as 0.5 * V * 2**{-NSB}:
i.e. it is 0.5V for NSB=0, 0.25V for NSB=1, 0.125V for NSB=2 etc.


Two other methods use different definitions of _decimal precision_, though both
are guaranteed to reproduce NSD decimals when printed.
The margin for a relative error introduced by the methods are summarised in the table

```
NSD 1 2 3 4 5 6 7
BitGroom
Error Margin 3.1e-2 3.9e-3 4.9e-4 3.1e-5 3.8e-6 4.7e-7 -
GranularBitRound
Error Margin 1.4e-1 1.9e-2 2.2e-3 1.4e-4 1.8e-5 2.2e-6 -
```


If one defines decimal precision as in BitGroom, i.e. the introduced relative
error must not exceed half of the unit at the decimal place NSD in the
worst-case scenario, the following values of NSB should be used for BitRound:

```
NSD 1 2 3 4 5 6 7
NSB 3 6 9 13 16 19 23
```

The resulting application of BitRound is as fast as BitGroom, and is free from
artifacts in multipoint statistics introduced by BitGroom
(see https://doi.org/10.5194/gmd-14-377-2021).


# Debugging {#filters_debug}

Depending on the debugger one uses, debugging plugins can be very difficult.
Expand Down
42 changes: 42 additions & 0 deletions docs/quantize.md
Original file line number Diff line number Diff line change
Expand Up @@ -145,6 +145,48 @@ changed, if that element is the fill value. This is necessary if the
fill value is to retain its purpose as an indicator of values that
have not been written.

## Distortions Introduced by Lossy Compression

Any lossy compression introduces distortions to data.

The Bitgroom algorithms implemented in netcdf-c introduce a distortoin
that can be quantified in terms of a _relative_ error. The magnitude
of distortion introduced to every single value V is guaranteed to be
within a certain fraction of V, expressed as 0.5 * V * 2**{-NSB}:
i.e. it is 0.5V for NSB=0, 0.25V for NSB=1, 0.125V for NSB=2 etc.

Two quantize algorithms use different definitions of _decimal
precision_, though both are guaranteed to reproduce NSD decimals when
printed.

The margin for a relative error introduced by the methods are
summarised in the table:

```
NSD 1 2 3 4 5 6 7
BitGroom
Error Margin 3.1e-2 3.9e-3 4.9e-4 3.1e-5 3.8e-6 4.7e-7 -
GranularBitRound
Error Margin 1.4e-1 1.9e-2 2.2e-3 1.4e-4 1.8e-5 2.2e-6 -
```

If one defines decimal precision as in BitGroom, i.e. the introduced
relative error must not exceed half of the unit at the decimal place
NSD in the worst-case scenario, the following values of NSB should be
used for BitRound:

```
NSD 1 2 3 4 5 6 7
NSB 3 6 9 13 16 19 23
```

The resulting application of BitRound is as fast as BitGroom, and is
free from artifacts in multipoint statistics introduced by BitGroom
(see https://doi.org/10.5194/gmd-14-377-2021).

## Using the Quantize Feature

Turning on the quantize feature must be done on a per-variable basis,
Expand Down

0 comments on commit e45292f

Please sign in to comment.