Reduce large floating point accumulation error in high photon simulations #41
MCX has used atomic operations for fluence accumulation by default for several years. However, a drop in fluence intensity has been observed in large photon simulations. For example, running the script below with the current MCX GitHub code produces the plot below.
The drop in intensity is not caused by data races, as was the case when non-atomic operations were used, but by accumulated round-off error. In the region near the source, the energy deposited in a voxel quickly grows to a large value. When a new energy deposit (a very small value) is added on top of that large value, accuracy becomes a problem.
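To illustrate the effect (a standalone snippet, not MCX code), adding a typical small energy deposit to a single-precision accumulator that has already grown large can leave the accumulator unchanged:

```cuda
// Standalone illustration: a small deposit added to a large float accumulator
// is rounded away entirely, while the same deposit is retained by a small one.
#include <stdio.h>

int main(void) {
    float deposit   = 1e-4f;      /* a typical small per-step energy deposit */
    float small_acc = 10.0f;      /* voxel far from the source               */
    float large_acc = 1e5f;       /* voxel near the source                   */

    printf("%g\n", small_acc + deposit - small_acc);  /* ~1e-4: deposit kept */
    printf("%g\n", large_acc + deposit - large_acc);  /* 0: deposit lost     */
    return 0;
}
```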
This is a serious problem because, as GPU computing capacity increases, most users will choose to run large photon simulations. We must be able to run large photon counts without losing accuracy.
There are a few possible solutions to this problem.
The easiest solution is to change the energy storage to double precision. However, consumer GPUs have extremely poor double-precision performance, so switching to double-precision addition is likely to cause a drop in speed.
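For what it's worth, GPUs with compute capability 6.0 or newer provide a native double-precision `atomicAdd`; on older devices it is typically emulated with the compare-and-swap loop from the CUDA C Programming Guide, sketched below. Neither option avoids the double-precision throughput penalty on consumer cards:

```cuda
// Double-precision atomicAdd emulated with 64-bit atomicCAS, as described in
// the CUDA C Programming Guide, for devices older than compute capability 6.0.
__device__ double atomicAddDouble(double* address, double val) {
    unsigned long long int* addr_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *addr_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(addr_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);     /* retry if another thread updated the value */
    return __longlong_as_double(old);
}
```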
The standard way to add small values to a large floating-point sum is Kahan summation, which is what we use in MMC. However, it requires multi-step operations and additional storage. When combined with atomic operations, an atomic Kahan summation is very difficult to implement on the GPU.
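For reference, a minimal non-atomic Kahan update is sketched below (assuming the accumulator and its compensation term are stored side by side); the difficulty is that both values must be read and updated together atomically, which a single 32-bit `atomicAdd` cannot do:

```cuda
// Kahan (compensated) summation: the compensation term c captures the
// low-order bits lost when a small deposit is added to a large running sum.
// Must be compiled without fast-math, or the compensation step is optimized away.
__host__ __device__ void kahan_add(float* sum, float* c, float deposit) {
    float y = deposit - *c;   /* re-apply the bits lost in previous additions */
    float t = *sum + y;       /* big + small: low-order bits of y are dropped */
    *c = (t - *sum) - y;      /* recover exactly what was dropped             */
    *sum = t;                 /* new running sum                              */
}
```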
Another idea is to use repetitions (-r) to split a large simulation into smaller chunks and sum the solutions together. For example, for 1e9 photons with 10 respins, we run ten 1e8-photon simulations. This can reduce the round-off error, but repeatedly launching the kernel introduces a large overhead, sometimes significantly higher than the kernel execution time itself. In addition, even at 1e8 photons, the drop in intensity in the plot above remains noticeable.
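The numerical benefit of chunking can be seen in the standalone sketch below (not MCX code): each chunk's partial sum stays small enough for single precision, and only the few chunk totals are added to the large running total:

```cuda
// Standalone illustration of why respin/chunking reduces round-off error:
// the naive float sum stalls once it grows large, while per-chunk partial
// sums remain small and are combined with only a handful of large additions.
#include <stdio.h>

int main(void) {
    const long  n = 100000000;    /* 1e8 small deposits             */
    const float w = 1e-4f;        /* each deposit is 1e-4           */
    const int   nchunks = 10;

    float naive = 0.0f;
    for (long i = 0; i < n; i++)
        naive += w;               /* stalls once the sum is large   */

    float chunked = 0.0f;
    for (int r = 0; r < nchunks; r++) {
        float partial = 0.0f;     /* restart accumulation per chunk */
        for (long i = 0; i < n / nchunks; i++)
            partial += w;
        chunked += partial;       /* only nchunks large additions   */
    }
    printf("naive: %g  chunked: %g  expected: %g\n",
           naive, chunked, (double)n * 1e-4);
    return 0;
}
```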
A robust method is needed to obtain a stable and convergent solution, especially at large photon numbers.