Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added vectorized pak version #72

Merged
merged 8 commits into from
Jul 19, 2022
Merged

added vectorized pak version #72

merged 8 commits into from
Jul 19, 2022

Conversation

diegodoimo
Copy link
Collaborator

I added a ''faster'' version of pak, vectorizing all the for loops and adding a cython version to optimize the likelihood of the full dataset and not just that of a single point. Unfortunately, the performance improvement can be appreciated only for relatively large data sizes:

15% faster for 90k points
50% faster for 250k points

@codecov-commenter
Copy link

codecov-commenter commented Jul 16, 2022

Codecov Report

Merging #72 (17c7f24) into main (f290feb) will decrease coverage by 1.81%.
The diff coverage is 76.19%.

@@            Coverage Diff             @@
##             main      #72      +/-   ##
==========================================
- Coverage   80.61%   78.79%   -1.82%     
==========================================
  Files          10       10              
  Lines        1145     1160      +15     
==========================================
- Hits          923      914       -9     
- Misses        222      246      +24     
Impacted Files Coverage Δ
dadapy/density_estimation.py 76.29% <75.00%> (-0.40%) ⬇️
dadapy/_utils/density_estimation.py 54.83% <76.47%> (-34.96%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f290feb...17c7f24. Read the comment docs.

@AldoGl
Copy link
Collaborator

AldoGl commented Jul 18, 2022

Thanks @diegodoimo this is a change that me and @imacocco had in mind for a long time. Apart from the small code improvements suggested I have the following curiosity: for small datasets do we see a near zero improvement or do we actually see worse results using the full Cython implementation?

@diegodoimo
Copy link
Collaborator Author

diegodoimo commented Jul 18, 2022

Better in any case. It can be easily tested as I left the flag 'optimized' to select the original implementation, when set to false. (I left it as I didn't want to remove a big piece of the code until we are sure the 'optimized' one can be fine)

@diegodoimo diegodoimo merged commit 9cffc40 into main Jul 19, 2022
@diegodoimo diegodoimo deleted the optimization_pak branch July 19, 2022 08:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants