partially update documentation

fangq · Oct 29, 2019 · 4d7d94f
1 parent a9c4732
commit 4d7d94f
Showing 5 changed files with 216 additions and 49 deletions.
diff --git a/ChangeLog.txt b/ChangeLog.txt
@@ -1,5 +1,138 @@
 Change Log
 
+== MMC 1.9 (v2019.10, Moon Cake - alpha), Qianqian Fang <q.fang at neu.edu> ==
+
+ 2019-10-29 [a9c4732] rename mmcl executable to mmc
+ 2019-10-29*[0c3f19f] Merge branch 'mmcl' to 'master', now mmcl is official!
+ 2019-10-27 [206bedf] fix skinvessel example
+ 2019-10-18 [a3823b8] add the missing -d 1 flag
+ 2019-10-14 [4940761] fix end-of-line markers
+ 2019-10-14 [2139721] fix example file permissions and end-of-line markers
+ 2019-10-12 [373459a] remove commented lines
+ 2019-10-12 [e321340] add initial element search for wide-field sources; update mmcl examples
+ 2019-10-11 [8e6f561] bug fixing; update mmclab examples
+ 2019-10-11*[1a8e81f] bug fixing; add support of surface diffuse reflectance for mmcl
+ 2019-10-07 [3262326] resolve some compiling issues,e.g. missing argument in functions; missing fields in data struct
+ 2019-10-07 [261fe69] manually resolve merge conflicts
+ 2019-09-09 [9a2ad2a] download colin27 mesh from github instead
+ 2019-08-31 [7e8ad7a] fix .mch file header due to wrong history data structure
+ 2019-08-24 [4dc1228] fix memory crash due to wrong output data length for plucker, havel & badouel ray-tracer when basisorder is 0
+ 2019-08-20 [ca4d675] allow photons that exit into 0-label elements to be detected
+ 2019-08-20 [26477c7] add gpu parameter specifier to make RGA happy
+ 2019-07-26*[9e800e0] fix output detected photon information for SSE-MMCL and GPU-MMCL
+ 2019-07-25 [a3b7714] fix maximum time gate rounding bug
+ 2019-07-24 [eb109e0] return detected photon info in mmclabcl,print progress bar
+ 2019-07-18 [7995941] compile on new mac
+ 2019-07-18 [9f50a7f] hacky workaround to avoid convert_float error for -1 returned by vectorized isgreater on Intel GPU
+ 2019-07-16 [ef0ef3f] use mmclab('gpuinfo') to query gpu devices
+ 2019-07-16 [a45f5a1] undo the revert
+ 2019-07-16 [cfe52b4] fix rng bug on mac
+ 2019-07-16 [72ce3a2] fix RNG error for SSE MMC on windows - long is 32bit on windows
+ 2019-07-16 [c242112] long is only 32bit on windows, fix incorrect mmc results
+ 2019-07-15 [c8c1cb9] Merge branch 'master' into mmcx
+ 2019-07-12 [04565c2] compile for mac with static gcc and gomp
+ 2019-07-12 [afcfda1] mac opencl does not accept more than 8 constant inputs
+ 2019-07-12 [57880f1] allow to compile on windows
+ 2019-07-12 [1e5455d] changes to compile on mac
+ 2019-07-12 [3621fc9] make mmcl compile on mac
+ 2019-07-12 [167ec74] output oct file with correct name
+ 2019-07-12 [95f65bf] disable dref demo as mmcl has not fully merged with master
+ 2019-07-08 [9e622d5] fix index issue for branchless ray-tracer 0-basisorder
+ 2019-07-05 [23f1159] merge with master
+ 2019-07-04 [a78760b] fix normalization indexing bug
+ 2019-07-03 [522e21b] add matlab scripts to create plots for the paper, paper ready to submit
+ 2019-07-02 [7779235] change line color
+ 2019-07-02 [9fc9d0a] revert the mua change made yesterday for dmmc, thanks to Shijie
+ 2019-07-01 [5f58a5f] update benchmark 4, correct alignment in benchmark 1
+ 2019-07-01 [d3ebb41] change prefix in mmclab printing
+ 2019-07-01 [a36c408] update run benchmark script
+ 2019-07-01 [e1e567f] update mmcl bench mmclab script
+ 2019-07-01 [f5aeac9] add benchmark scripts for mmcl
+ 2019-07-01 [c1ec5f1] group 1/mua to normalization
+ 2019-06-30 [5a29aa0] fix double summation and oldidx bug in method=elem
+ 2019-06-30*[4d9013f] mmclabcl is working
+ 2019-06-29 [acba8db] save nii for non-grid ray-tracers
+ 2019-06-29 [fe9c83d] add b2 run_mmc script
+ 2019-06-29 [727e89d] change b2 mesh
+ 2019-06-28 [e537021] further update benchmark script
+ 2019-06-28*[5b7e840] benchmark script to run on different host
+ 2019-06-28 [3fb9a9c] fix param priority from command line
+ 2019-06-28 [5325e53] fix cl build error
+ 2019-06-28 [4fd89e3] make dual-mode mmc again, remove unneeded registers
+ 2019-06-28 [ad13d2a] revert code back to 03/28 version
+ 2019-06-28 [fba4b95] update benchmark script
+ 2019-06-27 [420c0d4] b3 test script
+ 2019-06-27 [d2f193f] update script
+ 2019-06-27 [8b7e3ff] save output to bin
+ 2019-06-27 [827b30d] add mmcl benchmark master script
+ 2019-06-27 [d87e999] reduce colin27 photon
+ 2019-06-27 [7f1e3ee] add spherical_shell demo
+ 2019-06-27 [b971de3] add DMMC paper figure 1 mmclab demo script
+ 2019-06-27 [f1876ba] add dmmc example mesh file
+ 2019-06-27 [422a9da] add skinvessel mmc and dmmc example
+ 2019-06-27 [02f208a] add mmc2json script to convert mmclab cfg to json
+ 2019-06-27 [f0ff69b] fix the ray-tracer after dref related changes in the master branch
+ 2019-06-26 [454de4f] update code variant name
+ 2019-06-26*[d81fd9a] dualmode mmc - support both SSE4+cpu (-G -1) and CPU, rename to mmcl
+ 2019-06-26 [4c65a21] merge with the latest master branch
+ 2019-06-26 [90d0d20] fix outputtype=fluence and wp output, fix #36
+ 2019-06-26 [45d3711] make mmc functions compatible with mcx output
+ 2019-05-24 [0494f23] 2nd attempt to fix the reflection when mirror bc is used
+ 2019-05-24 [a7ac195] allow internal reflections when mirror bc is set
+ 2019-05-16*[24e2bb0] use isreflect=2 for total absorption on outer surf, 3 for perfect mirror
+ 2019-04-30 [a2dda44] allow point sources to use initial elements
+ 2019-04-30 [cbdcb1e] avoid initial elem search in cone/arcsine source launch
+ 2019-04-24 [caa0a65] fix output format in both dmmc and mmc mode
+ 2019-04-24 [cb93d6c] remove mac compilation error
+ 2019-04-22 [565ad68] fix bugs and finally get diffuse reflectance output to work
+ 2019-04-21 [5b79f08] save diffuse reflectance on surface
+ 2019-04-21 [44d1865] copy initial test from plucker to other 3 ray tracers
+ 2019-04-21 [ed8dbdd] saving dref on surface, support saveref option, feature incomplete
+ 2019-04-17 [f3623ff] minor update to function parameter type
+ 2019-04-16 [492aa3e] restore the capability to save mch files
+ 2019-04-02 [6407e74] update loadmch to support user defined output
+ 2019-03-28 [5312319] use DO_NOT_SAVE flag to remove memory operations
+ 2019-03-26 [22fe07f] remove unused variables
+ 2019-03-26 [eeccd6d] fix a bug found by Shijie
+ 2019-03-25 [fb51a41] fix the missing energy loss for the first step in new voxel
+ 2019-03-24 [78622ef] merge with master from github
+ 2019-03-24 [2508642] first step to make mmc cl kernel cuda compatible
+ 2019-03-23 [0e0f96f] fix bug in writing compression
+ 2019-03-23 [752753d] use atomic call to return raytet counts
+ 2019-03-23 [3676918] disable buffer fanning in the kernel
+ 2019-03-23 [3f52cb2] only write to memory when moving out of a voxel
+ 2019-03-23 [370e91d] use macro to dynamically select dmmc vs elem-mmc
+ 2019-03-21 [0b647e7] fix export data length
+ 2019-03-21 [109910e] use a better random number to distribute the writing location
+ 2019-03-21 [1b9d645] use --buffer to set copy of memory to reduce racing
+ 2019-03-21 [2f2aec3] matching the branchless badouel algorithm in mmc, thanks to Shijie
+ 2019-03-21 [0477ade] avoid mac compiler error
+ 2019-03-21 [9f5eaf9] fix a critical bug for dmmc ray-tracer
+ 2019-03-21 [c6c0721] disable volume saving if --save2pt is set to 0
+ 2019-03-21 [bc67cc8] fix incorrect results on AMD devices
+ 2019-03-21 [4ec96ed] update printed program name
+ 2019-03-20 [5dd80a1] no need to convert char lookup in string
+ 2019-03-20 [cfaf43b] merge with mmc v2019.3 master branch
+ 2019-03-20 [4efb7b7] fix OpenCL-precision-induced ray-tracing accuracy issue in Branchless-badouel ray-tracer
+ 2019-03-05 [ae6de41] update change log and README for v2019.3 release
+ 2019-03-05 [10f72ea] support mc2 and nii output for DMMC
+ 2019-03-04 [f403303] disable linking with iomp5 to avoid crash in older matlab
+ 2019-03-01 [996f765] add USC 19.5 atlas example, Fig9a in TranYan2019(submitted)
+ 2019-02-11 [8acc8d7] really reduce register count, fix DMMC output crash
+ 2019-02-10*[61ef773] fix dmmc, 5x speed increase from normal mmc
+ 2019-02-10 [2417416] fix infinite loop, thanks Shijie!
+ 2019-02-10 [5ef4499] return total ray-tet intersection counts
+ 2019-02-09 [7fab7dd] moving node,elem,type,facenb,normal,srcelem to constant mem
+ 2019-02-09 [ec0e183] optimized based on vtune profiling on intel cpu
+ 2019-02-09 [84e87e4] add xorshift128+ RNG, seed each thread by host RNG
+ 2019-02-06 [fe76503] convert output weight to double
+ 2019-01-31*[5c571cf] now can run on cuda and cpu
+ 2019-01-30 [29ae3e1] need debugging, but very close to bug free for the ray-tracing
+ 2019-01-27 [75fbcde] mmcx now can compile, no error
+ 2019-01-18 [53fda23] fix mmclab crash due to racing in multi-thread, similar to mcx issue #60
+ 2019-01-17 [2f61bfe] a very rough draft of the cl kernel, converted ray-tracer from SIMD to float3
+ 2019-01-14*[43aae8f] sync internal mmcx branch with master, mmcx branch was started in 2018
+
 == MMC 1.4.8-2 (v2019.4, Pork Rinds - beta, update 2), Qianqian Fang <q.fang at neu.edu> ==
 
  2019-04-24 [8270b96] fix #35 - incorect mch file header in photon-sharing implementation
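
Changelog entry [84e87e4] above adds an xorshift128+ RNG and seeds each thread from a host RNG. The sketch below illustrates that general idea only. It uses one commonly published xorshift128+ shift triple (23, 18, 5), which is an assumption, since the constants used in MMC are not shown in this diff. The function names and the use of Python's `random` module as the host RNG are also hypothetical.

```python
import random

MASK64 = (1 << 64) - 1  # emulate 64-bit unsigned wraparound


def xorshift128p_next(state):
    """Advance a two-word xorshift128+ state in place; return a 64-bit integer."""
    s1, s0 = state[0], state[1]
    result = (s0 + s1) & MASK64
    state[0] = s0
    s1 ^= (s1 << 23) & MASK64                      # shift a (assumed constant)
    state[1] = s1 ^ s0 ^ (s1 >> 18) ^ (s0 >> 5)    # shifts b, c (assumed constants)
    return result


def xorshift128p_uniform(state):
    """Map the top 53 bits of the next output to a float in [0, 1)."""
    return (xorshift128p_next(state) >> 11) * (1.0 / (1 << 53))


def seed_threads(host_seed, nthreads):
    """Draw one two-word state per thread from a host RNG (hypothetical helper)."""
    host = random.Random(host_seed)
    states = []
    for _ in range(nthreads):
        state = [host.getrandbits(64), host.getrandbits(64)]
        if state == [0, 0]:
            state[1] = 1  # the all-zero state never leaves zero, so avoid it
        states.append(state)
    return states


states = seed_threads(host_seed=12345, nthreads=4)
samples = [xorshift128p_uniform(s) for s in states]
```

Giving each thread its own state means threads never share or lock a generator, which is presumably why the host seeds them individually.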

diff --git a/README.txt b/README.txt
@@ -1,11 +1,11 @@
 ===============================================================================
 =                       Mesh-based Monte Carlo (MMC)                          =
-=                     Multi-threaded Edition with SSE4                        =
+=            Supporting both OpenCL and Multi-threading with SSE4             =
 ===============================================================================
 
 Author:  Qianqian Fang <q.fang at neu.edu>
 License: GNU General Public License version 3 (GPL v3), see License.txt
-Version: 1.4.8-2 (v2019.4, Pork Rinds - beta, update 2)
+Version: 1.9 (v2019.10, Moon Cake - alpha)
 URL:     http://mcx.space/mmc
 
 -------------------------------------------------------------------------------
@@ -27,40 +27,40 @@ VIII.Reference
 
 O.    What's New
 
-In MMC v2019.4 (1.4.8-2), the follow feature was added
+MMC v2019.10 (1.9) is a major update. For the first time, MMC supports
+GPUs through a newly implemented OpenCL version. The released package
+supports both CPU-only multi-threading with SSE4 (standard MMC) and
+OpenCL-based MMC on a wide variety of CPU/GPU devices across vendors.
+On up-to-date GPU hardware, MMC simulations run 100x to 400x faster
+than single-threaded SSE4-based MMC simulations. A detailed description
+of the GPU-accelerated MMC can be found in the in-press paper
+[Fang2019] listed below and in its online preprint.
 
-* Support -X/--saveref to save diffuse reflectance/transmittance on mesh surface
-* Speed up DMMC memory operations
+One can choose between the SSE4 and OpenCL based simulation modes using
+the -G or cfg.gpuid input options. A device ID of -1 enables SSE4 CPU based
+MMC, and a number of 1 or above selects a supported OpenCL device (use
+"mmc -L" or "mmclab('gpuinfo')" to list the available devices).
 
-It also fixed the below critical bugs:
+A detailed (long) list of updates can be found in ChangeLog.txt or in
+the GitHub commit history: https://github.com/fangq/mmc/commits/master
 
-* fix #35 - incorect mch file header in photon-sharing implementation
-* restore the capability to save mch files without needing --saveexit 1 
-* for Win64, use a newer version of libgomp-1.dll to run mmclab without dependency errors
+The most important updates are highlighted below:
 
+* Added GPU support using OpenCL in both the binary and mmclab
+* Rigorously validated GPU MMC (or MMCL) across a range of benchmarks
+* Characterized the speed improvement of MMCL simulations over standard MMC
+* Created official "mmc" and "octave-mmclab" Fedora packages, distributed via the Fedora repositories
+* Implemented an xorshift128+ RNG unit and made it the default for both CPU/GPU MMC
+* Fixed a list of bugs in both SSE4/OpenCL MMC
+* Created 6 standard benchmarks (B1:cube60, B1D:d-cube60, B2:sphshells, B2D:d-sphshells, B3:colin27, B4:skin-vessel) for comparisons
 
-Also, in MMC v2019.3 (1.4.8), we added a list of major new additions, including
+Please file bug reports at https://github.com/fangq/mmc/issues
 
-* Add 2 built-in complex domain examples - USC_19-5 brain atlas and mcxyz skin-vessel benchmark
-* Initial support of "photon sharing" - a fast approach to simultaneouly simulate multiple pattern src/det, as detailed in our Photoncs West 2019 talk by Ruoyang Yao/Shijie Yan [Yao&Yan2019]
-* Dual-grid MMC (DMMC) paper published [Yan2019], enabled by "-M G" or cfg.method='grid'
-* Add clang compiler support, nightly build compilation script, colored command line output, and more
+Reference:
 
-In addition, we also fixed a number of critical bugs, such as
-
-* fix mmclab gpuinfo output crash using multiple GPUs
-* disable linking to Intel OMP library (libiomp5) to avoid MATLAB 2016-2017 crash
-* fix a bug for doubling thread number every call to mmc, thanks to Shijie
-* fix mmclab crash due to photo sharing update
-
-'''[Yan2019]''' Shijie Yan, Anh Phong Tran, Qianqian Fang*, "A dual-grid mesh-based\
-Monte Carlo algorithm for efficient photon transport simulations in complex 3-D media,"\
-J. of Biomedical Optics, 24(2), 020503 (2019). URL: https://doi.org/10.1117/1.JBO.24.2.020503
-
-'''[Yao&Yan2019]''' Ruoyang Yao, Shijie Yan, Xavier Intes, Qianqian Fang,  \
-"Accelerating Monte Carlo forward model with structured light illumination via 'photon sharing'," \
-Photonics West 2019, paper#10874-11, San Francisco, CA, USA. \
-[https://www.spiedigitallibrary.org/conference-presentations/10874/108740B/Accelerating-Monte-Carlo-forward-model-with-structured-light-illumination-via/10.1117/12.2510291?SSO=1 Full presentation for our invited talk]
+'''[Fang2019]''' Qianqian Fang* and Shijie Yan, "GPU-accelerated mesh-based \
+Monte Carlo photon transport simulations," J. of Biomedical Optics, in press, 2019. \
+Preprint URL: https://www.biorxiv.org/content/10.1101/815977v1
 
 ------------------------------------------------------------------------------- 
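
The device-selection rule described in the "What's New" text can be sketched as a small validation helper. This is a minimal illustration, not part of MMC: the function names are hypothetical, and the treatment of 0 as "auto" is taken from the -G help text elsewhere in this diff rather than from the paragraph above.

```python
def describe_device(gpuid):
    """Classify a -G / cfg.gpuid value (hypothetical helper, not MMC code)."""
    if not isinstance(gpuid, int) or isinstance(gpuid, bool):
        raise TypeError("gpuid must be an integer")
    if gpuid == -1:
        return "SSE4 CPU (standard MMC)"
    if gpuid == 0:
        return "auto"                      # per the -G help text: "0 auto"
    if gpuid >= 1:
        return f"OpenCL device #{gpuid}"   # index as listed by "mmc -L"
    raise ValueError(f"unsupported device ID: {gpuid}")


def gpu_option(gpuid):
    """Return the command-line fragment selecting the device."""
    describe_device(gpuid)                 # validate before building the option
    return ["-G", str(gpuid)]


modes = {g: describe_device(g) for g in (-1, 0, 1, 2)}
```

For example, `gpu_option(-1)` yields the fragment for the SSE4 CPU mode, while `gpu_option(2)` would select the second OpenCL device in the listing.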
 
@@ -75,9 +75,10 @@ mesh to represent curved boundaries and complex structures, making it
 even more accurate, flexible, and memory efficient. MMC uses the
 state-of-the-art ray-tracing techniques to simulate photon propagation in 
 a mesh space. It has been extensively optimized for excellent computational
-efficiency and portability. MMC currently supports both multi-threaded 
-parallel computing and Single Instruction Multiple Data (SIMD) parallism 
-to maximize performance on a multi-core processor.
+efficiency and portability. MMC currently supports multi-threaded 
+parallel computing via OpenMP, Single Instruction Multiple Data (SIMD) 
+parallelism via SSE and, starting from v2019.10, OpenCL to support a wide
+range of CPUs/GPUs from nearly all vendors.
 
 To run an MMC simulation, one has to prepare an FE mesh first to
 discretize the problem domain. Image-based 3D mesh generation has been 
@@ -92,6 +93,13 @@ or even thousand-fold acceleration in speed similar to what we
 have observed in our GPU-accelerated Monte Carlo software (Monte Carlo 
 eXtreme, or MCX [2]).
 
+The most relevant publication describing this work is the GPU-accelerated
+MMC paper:
+
+  Qianqian Fang and Shijie Yan, "GPU-accelerated mesh-based Monte Carlo 
+  photon transport simulations," J. of Biomedical Optics, in press, 2019.
+  Preprint URL: https://www.biorxiv.org/content/10.1101/815977v1
+
 Please keep in mind that MMC is only a partial implementation of the 
 general Mesh-based Monte Carlo Method (MMCM). The limitations and issues
 you observed in the current software will likely be removed in the future
@@ -195,16 +203,16 @@ and type
 
   make
 
-this will create a fully optimized, multi-threaded and SSE4 enabled 
-mmc executable, located under the mmc/src/bin/ folder.
+this will create a fully optimized OpenCL based mmc executable, 
+located under the mmc/src/bin/ folder.
 
 Other compilation options include
 
+  make ssemath  # this uses SSE4 for both vector operations and math functions
   make omp      # this compiles a multi-threaded binary using OpenMP
   make release  # create a single-threaded optimized binary
   make prof     # this makes a binary to produce profiling info for gprof
   make sse      # this uses SSE4 for all vector operations (dot, cross), implies omp
-  make ssemath  # this uses SSE4 for both vector operations and math functions
 
 if you want to generate a portable binary that does not require external 
 library files, you may use (only works for Linux and Windows with gcc)
@@ -290,7 +298,7 @@ same direction. Otherwise, MMC will give incorrect results.
 The full command line options of MMC include the following:
 <pre>
 ###############################################################################
-#                         Mesh-based Monte Carlo (MMC)                        #
+#                     Mesh-based Monte Carlo (MMC) - OpenCL                   #
 #          Copyright (c) 2010-2019 Qianqian Fang <q.fang at neu.edu>          #
 #                            http://mcx.space/#mmc                            #
 #                                                                             #
@@ -299,7 +307,7 @@ The full command line options of MMC include the following:
 #                                                                             #
 #                Research funded by NIH/NIGMS grant R01-GM114365              #
 ###############################################################################
-$Rev::8270b9$2019.4 $Date::2019-04-24 14:18:58 -04$ by $Author::Qianqian Fang $
+$Rev::57e5d6$2019.10$Date::Qianqian Fang          $ by $Author::Qianqian Fang $
 ###############################################################################
 
 usage: mmc <param1> <param2> ...
@@ -321,7 +329,7 @@ where possible parameters include (the first item in [] is the default value)
                                to calculate the mua/mus Jacobian matrices
  -P [0|int]    (--replaydet)   replay only the detected photons from a given 
                                detector (det ID starts from 1), use with -E 
- -M [H|PHBSG] (--method)      choose ray-tracing algorithm (only use 1 letter)
+ -M [G|SG]     (--method)      choose ray-tracing algorithm (only use 1 letter)
                                P - Plucker-coordinate ray-tracing algorithm
 			       H - Havel's SSE4 ray-tracing algorithm
 			       B - partial Badouel's method (used by TIM-OS)
@@ -330,6 +338,11 @@ where possible parameters include (the first item in [] is the default value)
  -e [1e-6|float](--minenergy)  minimum energy level to trigger Russian roulette
  -V [0|1]      (--specular)    1 source located in the background,0 inside mesh
  -k [1|0]      (--voidtime)    when src is outside, 1 enables timer inside void
+ -A [0|int]    (--autopilot)   auto thread config:1 enable;0 disable
+ -G [0|int]    (--gpu)         specify which GPU to use, list GPU by -L; 0 auto
+      or
+ -G '1101'     (--gpu)         using multiple devices (1 enable, 0 disable)
+ -W '50,30,20' (--workload)    workload for active devices; normalized by sum
  --atomic [1|0]                1 use atomic operations, 0 use non-atomic ones
 
 == Output options ==
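
The new -G device mask and -W workload options in the hunk above can be illustrated with a short sketch. It assumes that mask positions map to 1-based device indices and that photons are divided in proportion to the normalized workload. The rounding scheme (largest remainder) and all function names are hypothetical; how MMC itself partitions photons is not shown in this diff.

```python
from fractions import Fraction


def active_devices(mask):
    """Return the 1-based indices enabled in a mask such as '1101'."""
    if not mask or set(mask) - {"0", "1"}:
        raise ValueError("mask must be a non-empty string of 0s and 1s")
    return [i + 1 for i, flag in enumerate(mask) if flag == "1"]


def normalize_workload(spec):
    """Turn '50,30,20' into exact fractions that sum to 1."""
    weights = [Fraction(w.strip()) for w in spec.split(",")]
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("workload entries must be non-negative with a positive sum")
    return [w / total for w in weights]


def split_photons(nphoton, fractions):
    """Split nphoton by the given fractions; leftovers go to the largest remainders."""
    exact = [nphoton * f for f in fractions]
    shares = [int(x) for x in exact]            # floor, since all values are >= 0
    leftover = nphoton - sum(shares)
    by_remainder = sorted(range(len(exact)),
                          key=lambda i: exact[i] - shares[i], reverse=True)
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares


def plan(mask, workload, nphoton):
    """Map each active device index to its photon count."""
    devices = active_devices(mask)
    fractions = normalize_workload(workload)
    if len(devices) != len(fractions):
        raise ValueError("need one workload entry per active device")
    return dict(zip(devices, split_photons(nphoton, fractions)))


result = plan("1101", "50,30,20", 1000000)
```

With the mask '1101', devices 1, 2 and 4 are active, so the three workload entries line up with them one to one.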
@@ -338,6 +351,7 @@ where possible parameters include (the first item in [] is the default value)
                                J - Jacobian, L - weighted path length, P -
                                weighted scattering count (J,L,P: replay mode)
  -d [0|1]      (--savedet)     1 to save photon info at detectors,0 not to save
+ -H [1000000] (--maxdetphoton) max number of detected photons
  -S [1|0]      (--save2pt)     1 to save the fluence field, 0 do not save
  -x [0|1]      (--saveexit)    1 to save photon exit positions and directions
                                setting -x to 1 also implies setting '-d' to 1