Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add method(s) to support larger EKOs #244

Merged
merged 40 commits into from
Mar 4, 2024
Merged

Add method(s) to support larger EKOs #244

merged 40 commits into from
Mar 4, 2024

Conversation

cschwan
Copy link
Contributor

@cschwan cschwan commented Oct 10, 2023

TODO list:

  • add Python interfaces: started in commits 90d9efd and 1e417a0
  • test EKOs with scale-varied factorization scale (see commit 082a764): implemented in commit 5ae6f9c
  • move code in mod eko to EKO
  • write test case for migrated Grid::evolve: implemented in commit 0292ac8
  • deprecate Grid::evolve: commit 9124624
  • check whether for all PIDs used in the grid there's a non-zero operator; if not, warn (photon contributions from grids are often neglected)

@cschwan cschwan self-assigned this Oct 10, 2023
@cschwan cschwan linked an issue Oct 10, 2023 that may be closed by this pull request
@cschwan
Copy link
Contributor Author

cschwan commented Oct 10, 2023

Commit 53acd7f implements a first candidate for Grid::evolve_slice. The idea is first to reuse as much code as possible, write tests, then migrate Grid::evolve to use Grid::evolve and finally improve performance.

@cschwan
Copy link
Contributor Author

cschwan commented Oct 12, 2023

@alecandido @felixhekhorn please have a look at https://github.com/NNPDF/pineappl/pull/244/files#diff-b5009bd7e6df3b2ec6c5ae2bca2a3e23ae3402990d34a878da2f81f9ee21bd56R71-R255 which introduces two new types, EkoSlices and EkoSlicesIter.

You first

  1. create an EkoSlices object by calling EkoSlices::new() with the corresponding path for the tarball and additional information (renormalization scales, alphas and scale-variation factors). This reads just general information and doesn't yet read operators or slices of them.
  2. Then you call EkoSlices::iter_mut() on your object, which creates an Iterator that goes through the operator slices one after another. For each slice the iterator produces a tuple of OperatorSliceInfo and Array4<f64>, the corresponding operator slice.
  3. This is meant to be used with the newly created Grid::evolve_slice, which as stated above, is very similar to Grid::evolve except it only evolves the slice of the Grid that matches the factorization scale of the operator.
  4. By iterating through all slices, and checking that every slice of the Grid has been evolved, we reproduce what Grid::evolve does, but without having the entire 5-dimensional operator in memory.

I think these two types best belong into NNPDF/eko#260, but here's a good playground to test them. I'm not completely convinced of the names, if you have better suggestions let me know.

I've already written a modified version of pineappl evolve and confirmed that it's working in the one case I tested. However, I still want to support the older EKO formats; that's currently the missing bit.

@cschwan
Copy link
Contributor Author

cschwan commented Oct 12, 2023

In commits c8e85f8, 7935cea and 5d1a4e9 I added backwards compatibility, in the sense that previous file formats are now also supported.

Since commit 695e0d5 Grid::evolve_slice now also works for DIS grids, and since commit b94ceed pineappl evolve uses Grid:evolve_slice instead of Grid::evolve. Have a look at the function evolve_grid in pineappl_cli/src/evolve.rs to understand how Pineko would need to be changed.

It seems to work quite well already, but the evolution of E906 fails - some bins in the evolved grids are zero. I will investigate this tomorrow.

Comment on lines 69 to +71
V0(MetadataV0),
V1(MetadataV1),
V2(MetadataV2),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we please change these names and use, e.g., V012, V013 ? since eventually we may have an offical eko data version (if we find a way to stabilize 🙈 ): https://github.com/NNPDF/eko/blob/master/src/eko/version.py

Copy link
Contributor Author

@cschwan cschwan Oct 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, absolutely! You just need to tell me which ones they are. V2 is from v0.13, V1 already has the sliced operators, but has a different metadata.yaml format and V0 stores one big operator. See also the EKOs PineAPPL uses for testing: https://data.nnpdf.science/pineappl/test-data/.

@cschwan
Copy link
Contributor Author

cschwan commented Oct 13, 2023

Commit b7872a7 fixed the evolution test with E906, for which the grid was the 'culprit'; it had some duplicated bins with the same bin limits. See also #237.

@cschwan
Copy link
Contributor Author

cschwan commented Oct 13, 2023

Now that we have an Iterator over EKO slices, we can finally write the method that we wanted from the start:

impl Grid {
    pub fn evolve_with_slices<'a>(
        &self,
        eko_slices: impl IntoIterator<Item = (OperatorSliceInfo, ArrayView4<'a, f64>)>,
        order_mask: &[bool],
    ) -> Result<FkTable, GridError> {
        // implementation
    }
}

Instead of a callback we have an Iterator, of course. We can then replace Grid::evolve with Grid::evolve_with_slices, where we pass the EkoSlices object and the order mask to it. To make it work in Python we probably have to replace impl IntoIterator<...> with the actual type (EkoSlices).

Instead of using a cartesian product of two PIDs which is subsequently
filtered use two iterators of PIDs first filtered and subsequently
zipped. This is a change of O(n^2) to O(n) which makes a big difference
if n is large enough. This is the case for dijet grids, for example.
@cschwan
Copy link
Contributor Author

cschwan commented Oct 14, 2023

Before commit ef962fa, evolving CMS_2JET_7TEV.pineappl.lz4 with its 2.1-GByte-sized EKO on my laptop takes

real    95m34.725s
user    95m32.242s
sys     0m1.605s

and after it

real    28m26.322s
user    28m23.897s
sys     0m1.542s

That's an improvement by a factor of more than three!

Profiling shows that roughly 72% of the CPU cycles is spent on dot and another 15% on scaled_add, both in this line:

fk_table.scaled_add(factor, &opa.dot(&array.dot(&opb.t())));

that's pretty good as well; we want the time to be spent on doing the linear algebra, not on the rest of the code, which is only organizational.

For this grid memory consumption is less than 150 MByte (resident).

@cschwan
Copy link
Contributor Author

cschwan commented Oct 14, 2023

On my desktop PC the numbers are similar:

real    73m18.566s
user    73m17.138s
sys     0m1.340s

vs.

real    20m15.134s
user    20m13.712s
sys     0m1.380s

@cschwan cschwan changed the title Add method Grid::evolve_slice to support larger EKOs Add method(s) to support larger EKOs Oct 24, 2023
@alecandido alecandido removed their request for review January 19, 2024 17:40
@cschwan
Copy link
Contributor Author

cschwan commented Feb 25, 2024

Note to myself: adding a test with scale varied EKOs should be top priority, everything else can be added afterwards.

@cschwan
Copy link
Contributor Author

cschwan commented Mar 3, 2024

In commits d2a4f3d, b3450a5 and 6a86a88 I've removed the variables xir, xif, ren1 and alphas from OperatorSliceInfo, which reflects the changes between the old a new EKO versions. This should make the Rust EKO reader simpler.

@cschwan
Copy link
Contributor Author

cschwan commented Mar 4, 2024

Commits 90d9efd and 1e417a0 add a first Python interface for Grid::evolve_with_slice_iter, but I haven't been able to test it. @alecandido @felixhekhorn, if you'd like to go ahead with this please let me know, otherwise I will do it in a few days.

@alecandido
Copy link
Member

I'm not going to test myself in the short term, but thanks for this.

However, I still believe it's more promising to avoid passing arrays through the Python interface (as much as possible, of course). So, I'd rather invest in driving the Rust loader (differently from grids, the EKO is on disk, and this is all happening to avoid loading it entirely - so, communicating through files is not a limitation).

@cschwan cschwan merged commit 20ccea6 into master Mar 4, 2024
7 checks passed
@cschwan cschwan deleted the grid-evolve-slice branch March 4, 2024 15:50
@cschwan
Copy link
Contributor Author

cschwan commented Mar 4, 2024

I've just merged this branch. The Python interface will be added in a separate PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Evolution with large EKOs
3 participants