Developer Notes

Mo Tiwari edited this page Feb 19, 2023 · 17 revisions

Welcome to the BanditPAM wiki!

This is a space for code contributors to keep track of notes and learnings that don't belong in GitHub issues.

Highly Requested Features that Mo won't have time to work on:

  • An R implementation of BanditPAM
  • A MATLAB implementation of BanditPAM

Less Requested Features that Mo won't have the time to work on:

  • An integration with PySpark

Gotchas:

  • setuptools will always, at least partly, use the compiler that Python itself was compiled with. This causes errors, e.g., when trying to install clang-compiled BanditPAM on a gcc-compiled Python. This CANNOT be fixed by modifying the CC environment variable. See https://github.com/pypa/setuptools/issues/1732
  • You may occasionally get an error like (Producer: 'LLVM13.0.0' Reader: 'LLVM 12.0.0'). This happened in the base conda environment after uninstalling and reinstalling some brew packages. Weirdly, it was resolved by creating a new Python 3.8 conda environment, in which BanditPAM installed successfully; after that, the error was somehow (?!) also gone in base
  • Building the PyPy wheels on MacOS via cibuildwheel does not work properly; see install_mac.md. The Github Actions run fails with an error like the one below. I separately tried adding this gist to the .yml, as well as this suggestion, but neither worked. Possible next steps are to a) upgrade the Accelerate framework on the runner, b) avoid using the Accelerate framework for the PyPy builds, c) try a macos version on the runner that's later than macos 10.15 (though this might hurt backwards compatibility), d) try suggestions from here, like python -mpip install numpy, or e) try to modify the PyPy build's numpy installation once it has been instantiated
RuntimeError: Polyfit sanity test emitted a warning, most likely due to using a buggy Accelerate backend. If you compiled yourself, more information is available at https://numpy.org/doc/stable/user/building.html#accelerated-blas-lapack-libraries Otherwise report this to the vendor that provided NumPy.
    RankWarning: Polyfit may be poorly conditioned
  • It appears that CPython >= 3.10 is compiled with clang in cibuildwheel, whereas CPython <= 3.9 is compiled with gcc. This affects how libraries like omp vs. gomp should be linked.

Potential Cache Improvements:

  • Potentially transpose the cache to avoid false sharing
  • Move to a multi-producer, single-consumer queue for the cache so that the cache can be dynamically resized
  • Give each thread a local copy of the cache
  • Helpful resource: Lecture 9 of series in OMP
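
As a rough illustration of the false-sharing and per-thread-copy ideas above, the sketch below gives each loop iteration its own cache-line-sized slot, rather than packing partial results into adjacent doubles that several threads would write to. This is not BanditPAM's cache code: PaddedSlot, kCacheLineBytes, and slotted_sum are hypothetical names, and the 64-byte line size is an assumption that should be checked on the target machine. The pragma only takes effect when compiling with OpenMP enabled (e.g. -fopenmp); without it, the loop runs serially and gives the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ASSUMPTION: 64-byte cache lines (common on x86-64, not universal).
constexpr std::size_t kCacheLineBytes = 64;

// Hypothetical slot type: alignas pads each slot to a full cache line,
// so two slots never sit in the same line.
struct alignas(kCacheLineBytes) PaddedSlot {
  double value = 0.0;
};
static_assert(sizeof(PaddedSlot) == kCacheLineBytes,
              "each slot should occupy exactly one assumed cache line");

// Sums `data` in `num_slots` contiguous chunks. Each chunk is handled by one
// loop iteration (and hence one thread), which writes only to its own slot.
double slotted_sum(const std::vector<double>& data, int num_slots) {
  assert(num_slots > 0);
  std::vector<PaddedSlot> partial(static_cast<std::size_t>(num_slots));
  const long n = static_cast<long>(data.size());

  #pragma omp parallel for
  for (int s = 0; s < num_slots; ++s) {
    const long begin = n * s / num_slots;
    const long end = n * (s + 1) / num_slots;
    double local = 0.0;  // thread-local accumulator; one write to the slot at the end
    for (long i = begin; i < end; ++i) {
      local += data[static_cast<std::size_t>(i)];
    }
    partial[static_cast<std::size_t>(s)].value = local;
  }

  // Serial combine step after the parallel region.
  double total = 0.0;
  for (const PaddedSlot& slot : partial) {
    total += slot.value;
  }
  return total;
}
```

Calling slotted_sum on a vector with, say, 4 slots returns the same total as a plain serial sum; only the memory layout of the partial results differs. Whether the padding yields a measurable speedup depends on the hardware and on how often threads write to neighbouring entries.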

Potential OpenMP Improvements:

  • It is good practice to put default(none) on all omp parallel constructs, so that the sharing of every variable has to be declared explicitly
  • Prevent false sharing among threads for better speedups (this depends on the machine's cache line size and on the sizes of the datatypes involved)
  • Consider using loop reductions via OpenMP
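
To make the default(none) and reduction suggestions above concrete, here is a minimal sketch that combines them in one loop. The function sum_of_squares is a hypothetical example and is not taken from the BanditPAM sources. With default(none), the compiler rejects any variable used inside the region that is not listed in a data-sharing clause, and reduction(+ : total) gives each thread a private accumulator that OpenMP combines at the end. The pragma is ignored when compiling without OpenMP support, in which case the loop runs serially.

```cpp
#include <vector>

// Hypothetical example: sum of squares with an OpenMP loop reduction.
double sum_of_squares(const std::vector<double>& x) {
  // Plain (non-const) locals, so they can be listed in shared(...) without
  // running into compiler-specific rules about const-qualified variables.
  const double* ptr = x.data();
  long n = static_cast<long>(x.size());
  double total = 0.0;

  // default(none): every variable used in the region must appear in a clause.
  // reduction(+ : total): each thread sums into a private copy of `total`,
  // and the copies are added together when the loop finishes.
  #pragma omp parallel for default(none) shared(ptr, n) reduction(+ : total)
  for (long i = 0; i < n; ++i) {
    total += ptr[i] * ptr[i];
  }
  return total;
}
```

Because the reduction combines per-thread partial sums in an unspecified order, floating-point results can differ in the last bits from a serial loop; for inputs like {1, 2, 3, 4} the arithmetic is exact either way.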

GitHub Actions:

  • Right now, we compile with the system Python on the MacOS GitHub runners. It appears to work, though I'm not sure whether the runners are using gcc or clang -- or whether it matters, since setup.py should detect the compiler properly.

Potential frameworks to investigate:

C++ frameworks to investigate:

  • Eigen (pybind11 supports it out of the box, and we will likely no longer need carma or armadillo)
  • Boost
  • Folly