Evolution with large EKOs #209

Closed
alecandido opened this issue Feb 22, 2023 · 13 comments · Fixed by #244

alecandido (Member) commented Feb 22, 2023

As requested in #185 (comment), I'm opening a new issue for this problem.

This is not an issue in practice at the moment: the memory problem only appears for an EKO covering the whole NNPDF4.0 data set at once, but we always split by process. The processes requiring the biggest EKOs are jets, at roughly 4 GB per EKO.

However, at the moment PineAPPL needs to load the whole 4 GB (or whatever the size is) at once. This is only a problem on the PineAPPL side, since EKO can break the operator up and load one piece at a time (i.e. one scale at a time).


Actually, the format is already available. As soon as I have solved the current "top priority", I will step in and solve this myself, providing a lazy loader to reduce memory consumption (based on the current EKO loader and the new format).
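
For illustration, a minimal sketch of what such a lazy loader could look like on the Rust side, assuming (hypothetically) that the archive stores one lz4-compressed .npy file per evolution point inside a tar, and using the tar, lz4_flex and ndarray_npy crates; the actual layout and file names are not fixed here:

```rust
use std::fs::File;

use ndarray::Array4;
use ndarray_npy::ReadNpyExt;

/// Stream one rank-4 operator at a time out of an EKO tar archive, so that
/// only a single scale's operator is ever resident in memory.
fn for_each_scale_operator(
    path: &str,
    mut callback: impl FnMut(&str, Array4<f64>) -> anyhow::Result<()>,
) -> anyhow::Result<()> {
    let mut archive = tar::Archive::new(File::open(path)?);
    for entry in archive.entries()? {
        let entry = entry?;
        let name = entry.path()?.to_string_lossy().into_owned();
        // assumed layout: one `<...>.npy.lz4` member per evolution point
        if name.ends_with(".npy.lz4") {
            let decoder = lz4_flex::frame::FrameDecoder::new(entry);
            let operator = Array4::<f64>::read_npy(decoder)?;
            callback(&name, operator)?;
            // the operator is dropped here, before the next member is read
        }
    }
    Ok(())
}
```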

alecandido added the enhancement (New feature or request) label on Feb 22, 2023
alecandido self-assigned this on Feb 22, 2023
cschwan (Contributor) commented Feb 22, 2023

Do we have evidence that there really is a problem with Grid::evolve?

alecandido (Member, Author) replied:

> Do we have evidence that there really is a problem with Grid::evolve?

The problem is in the signature:

```rust
operator: &ArrayView5<f64>,
```

If you ask for a rank-5 array, that's the full operator, and we know how big the full EKO is for these processes.
Instead, we need to pass a callable returning an ArrayView4, so that the object for a given $Q^2$ is only loaded into memory when it is requested. This is available in Python, but not yet in Rust (however, it is just a matter of looking up the correct .np{y,z}.lz4 file inside a tar archive, so not much different from what you have already done).

cschwan (Contributor) commented Feb 22, 2023

I understand that the whole thing will be big, but have you ever 1) run Grid::evolve with a large EKO, 2) seen that it really is slow, and 3) verified that the culprit is passing the entire operator? From your description it seems that you anticipate a performance problem, but so far we don't have evidence for it.

alecandido (Member, Author) replied:

It is not a performance problem, just a memory consumption problem.

Passing a 5-dim operator constructed from the 4-dim ones, or a loader for the 4-dim operators, will have essentially the same performance (except that more pieces are involved in the first case, since you have to join them in Python, which might be slower).
However, no one is concerned about these performance differences; the concern is only that the 5-dim object might be too big for some specific data sets on less powerful machines, without a good reason to consume all that memory.

cschwan (Contributor) commented Feb 22, 2023

OK, I see! In that case, a straightforward optimization would need functions to perform the following operations:

- which (pid1, pid0) combinations have a non-zero operator? This is currently done here for the indices of the pids:

  ```rust
  // list of all non-zero PID indices
  let pid_indices: Vec<_> = (0..operator.dim().3)
      .cartesian_product(0..operator.dim().1)
      .filter(|&(pid0_idx, pid1_idx)| {
          // 1) at least one element of the operator must be non-zero, and 2) the pid must be
          // contained in the lumi somewhere
          operator
              .slice(s![.., pid1_idx, .., pid0_idx, ..])
              .iter()
              .any(|&value| value != 0.0)
              && pid1_nonzero(if gluon_has_pid_zero && info.pids1[pid1_idx] == 21 {
                  0
              } else {
                  info.pids1[pid1_idx]
              })
      })
      .collect();
  ```

  However, this goes over the entire operator, and I think EKO should already 'know' this after generation.

- for each non-zero PID combination found above we next need a subset of the remaining three-dimensional operator, see here:

  ```rust
  // create the corresponding operators accessible in the form [muf2, x0, x1]
  let operators: Vec<_> = pid_indices
      .iter()
      .map(|&(pid0_idx, pid1_idx)| {
          operator
              .slice(s![.., pid1_idx, .., pid0_idx, ..])
              .select(Axis(0), &fac1_indices)
              .select(Axis(1), &x1_indices)
              .permuted_axes([0, 2, 1])
              .as_standard_layout()
              .into_owned()
      })
      .collect();
  ```

  This slices the full operator along its muf2 and x1 axes. I think we can safely pass these 3D operators around without any memory problems.

alecandido (Member, Author) commented Feb 23, 2023

We are more or less slicing the other way round, as described above ($\mu_F^2 \mapsto O_4$).

The reason is that different scales are really independent, while in flavor basis there are in general no vanishing (pid0, pid1) pairs.
Indeed, the operator is block-diagonal in the evolution basis (QCD: singlet/gluon + diagonal; QCD+QED: singlet-u/singlet-d/gluon/photon + valence-u/valence-d + diagonal), but once you rotate even a single basis (input or output) to the flavor basis, all entries are filled.

The only special case is the intrinsic one: if you evolve charm below its matching scale (usually the charm mass), then the evolution is the identity in flavor basis. But this is really the only case.

Moreover, in order to construct a slice containing the elements for all $\mu_F^2$ values you have to load the full EKO (though not all at once, so there is no memory problem; it is just inefficient, because you load the same files from disk many times - and if you cache them, the memory issue is back).

cschwan (Contributor) commented Feb 23, 2023

> We are more or less slicing the other way round, as described above ($\mu_F^2 \mapsto O_4$).

This can be done as well; I'm not sure it's as efficient, but we'd simply have to try it out.

> The reason is that different scales are really independent, while in flavor basis there are in general no vanishing (pid0, pid1) pairs. Indeed, the operator is block-diagonal in the evolution basis (QCD: singlet/gluon + diagonal; QCD+QED: singlet-u/singlet-d/gluon/photon + valence-u/valence-d + diagonal), but once you rotate even a single basis (input or output) to the flavor basis, all entries are filled.

That's not entirely correct: for Nf=5 PDFs there are no top quarks at either the process or the fitting scale (PIDs -6 and 6), often there's no photon, and at the fitting scale there are no bottom quarks (PIDs -5 and 5). For some PDFs there aren't even charm quarks! For NNPDF4.0 we can decrease the dimensions from 14x14 to 9x11, which reduces the operator size by roughly 50%!
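
Spelled out, that estimate is just the ratio of the flavor-dimension entries:

$$\frac{9 \times 11}{14 \times 14} = \frac{99}{196} \approx 0.51,$$

i.e. the reduced operator has roughly half the entries of the full 14x14 one.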

alecandido (Member, Author) replied:

Ok, you're right.

But for most of the processes even the bottom is there; it is just a fraction of the DIS data that has no bottom, and almost never for all values of $\mu^2$, so you can't drop the slice.
The further limitations only apply when fitting a specific PDF set (e.g. Nf3/Nf4/...), but these are considered variations in NNPDF (and mostly by the other collaborations as well), so you only have a couple of them in a full release, and you usually develop only on the VFNS.

Moreover, as soon as we have QED (which is decided and almost done), the default fits and evolution will include the photon.
So you will have all flavors but bottom and top at the input scale, and all flavors but top at at least one output scale. Your typical operator will then be 10x12 (at least until someone proposes fitting the bottom, but for that we have time, since we're still digesting $c^-$).

> This can be done as well; I'm not sure it's as efficient, but we'd simply have to try it out.

This layout is extremely convenient for evolving PDFs, and it is the only way to compute an EKO (since we can solve DGLAP only for one final scale at a time, but for all x values and flavors at once).

However, that EKO loader will be available anyhow, with the $\mu^2 \mapsto O_4$ layout, because it is the one matching the new output format. If you wish, once it is ready you'll be able to test which way is most convenient (and decide whether to use it as-is, or to reconstruct the $O_3$ slices, joining different final scales).
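
For that second option, a rough sketch of a hypothetical helper that rebuilds one of the 3-dimensional [muf2, x0, x1] operators from per-scale rank-4 slices; the [pid1, x1, pid0, x0] axis ordering is assumed from the slicing code quoted earlier:

```rust
use ndarray::{s, stack, Array3, Array4, Axis};

/// Stack the (pid0, pid1) entry of each per-scale rank-4 operator into a
/// single [muf2, x0, x1] array, mirroring what Grid::evolve builds today
/// from the full rank-5 operator.
fn join_scales(
    slices: &[Array4<f64>], // one [pid1, x1, pid0, x0] operator per scale
    pid0_idx: usize,
    pid1_idx: usize,
) -> Array3<f64> {
    let views: Vec<_> = slices
        .iter()
        .map(|op| op.slice(s![pid1_idx, .., pid0_idx, ..]))
        .collect();
    // stacking adds a leading scale axis -> [muf2, x1, x0]
    stack(Axis(0), &views)
        .expect("all scales must share the same x grids")
        // reorder to [muf2, x0, x1]
        .permuted_axes([0, 2, 1])
        .as_standard_layout()
        .into_owned()
}
```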

cschwan (Contributor) commented Oct 10, 2023

@alecandido I thought a bit more about what interface we should have for this, and one idea would be a Grid::evolve_slice. This is basically Grid::evolve, but it evolves only one (factorization-scale) slice of the Grid, corresponding to a 4-dimensional EKO slice, and ignores everything else. The user can then combine Grid::evolve_slice and Grid::merge to sum up every evolved slice. In this way we should be able to avoid the questions about callbacks. We can also rewrite Grid::evolve in terms of Grid::evolve_slice. What do you think?
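
A sketch of how that pattern could look from the caller's side; since Grid::evolve_slice does not exist yet, its name and signature here are placeholders and the loop is written generically over closures:

```rust
/// Fold per-scale evolution results into a single object: evolve each rank-4
/// EKO slice separately and merge the outcomes, so that the full rank-5
/// operator never has to be materialized.
fn evolve_by_slices<Out, Err>(
    slices: impl IntoIterator<Item = (f64, ndarray::Array4<f64>)>,
    mut evolve_slice: impl FnMut(f64, &ndarray::Array4<f64>) -> Result<Out, Err>,
    mut merge: impl FnMut(&mut Out, Out) -> Result<(), Err>,
) -> Result<Option<Out>, Err> {
    let mut result: Option<Out> = None;
    for (fac1, operator) in slices {
        let evolved = evolve_slice(fac1, &operator)?;
        match result.as_mut() {
            // `merge` sums the evolved slices into the accumulator
            Some(acc) => merge(acc, evolved)?,
            None => result = Some(evolved),
        }
    }
    Ok(result)
}
```

With the real interface, evolve_slice would be played by Grid::evolve_slice and merge by Grid::merge.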

@cschwan cschwan added this to the v0.6.3 milestone Oct 10, 2023
alecandido (Member, Author) replied:

I believe it's a very good solution. The spirit was indeed to consume them one by one; I just never thought that the grid could be sliced as well.

For completeness: sliced EKOs will be the only EKOs from now on, so you would have to merge them anyhow to feed Grid::evolve. I would even propose deprecating it, and eventually we could make it a wrapper around Grid::evolve_slice once the EKO Rust loader is available.

The only unpleasant bit is that we should still improve the grid merging to support convolution with small EKOs fully in memory from Python (without writing an intermediate grid), because at the moment Python only has access to merge_from_file.

```rust
/// Note
/// ----
/// For a current limitation with the implementation of the bound object `Grid` is not possible
/// to operate with two `Grid`s in memory, since is not possible to pass a `Grid` by argument
pub fn merge_from_file(&mut self, path: PathBuf) -> PyResult<()> {
```

cschwan (Contributor) commented Oct 10, 2023

Grid is Clone now, so I don't think there's any showstopper to exposing Grid::merge in the Python interface.

alecandido (Member, Author) replied:

> Grid is Clone now, so I don't think there's any showstopper to exposing Grid::merge in the Python interface.

Ah, great, I just hadn't noticed that it happened. We can immediately add Grid.merge to the Python interface.
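
For concreteness, a minimal sketch of how such a binding might look with PyO3, assuming a PyGrid wrapper around pineappl::grid::Grid similar to the existing interface struct; the field name, error mapping, and exact Grid::merge signature are assumptions, not the actual implementation:

```rust
use pyo3::exceptions::PyValueError;
use pyo3::prelude::*;

/// Assumed wrapper type, analogous to the existing Python-interface struct.
#[pyclass]
pub struct PyGrid {
    pub(crate) grid: pineappl::grid::Grid,
}

#[pymethods]
impl PyGrid {
    /// Merge `other` into this grid entirely in memory; cloning the other
    /// grid is acceptable now that `Grid` implements `Clone`.
    pub fn merge(&mut self, other: PyRef<Self>) -> PyResult<()> {
        self.grid
            .merge(other.grid.clone())
            .map_err(|err| PyValueError::new_err(err.to_string()))
    }
}
```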

cschwan (Contributor) commented Oct 10, 2023

PR #244 will implement the necessary changes.

@cschwan cschwan linked a pull request Oct 10, 2023 that will close this issue