CMS: Fix relationship tag between NANO and MINI for 2016 collision datasets #3707

Open
tpmccauley opened this issue Dec 12, 2024 · 3 comments

Comments

@tpmccauley
Member

tpmccauley commented Dec 12, 2024

Looking at, for example, /Tau/Run2016H-UL2016_MiniAODv2-v1/MINIAOD, one can see a "ParentDataset:" tag that links to the corresponding NANOAOD dataset, /Tau/Run2016H-UL2016_MiniAODv2_NanoAODv9-v1/NANOAOD. There is no corresponding tag linking back to the MINIAOD dataset.

In the JSON this is governed by the isChildOf field. Strictly speaking, the relationship between NANOAOD and MINIAOD should be a sibling relationship, since they are both produced from the AOD(?), so the field in the JSON should be isSiblingOf. Regardless, the NANO is not the parent of the MINI.

We can either remove this altogether, since there are already links in the record, or use the isSiblingOf field properly.
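
A minimal sketch of these two options, assuming the record JSON carries a `relations` list whose entries have `description`, `recid` and `type` keys, as in the record JSON exports. The helper name `fix_nano_relation` and the substring match on "NANOAOD" are illustrative assumptions, not existing tooling:

```python
import copy


def fix_nano_relation(record, mode):
    """Return a copy of `record` with its NANOAOD relation removed or retyped.

    mode="remove"  -> drop the relation entirely
    mode="sibling" -> keep the relation, but mark it as isSiblingOf
    """
    if mode not in ("remove", "sibling"):
        raise ValueError(f"unknown mode: {mode!r}")

    fixed = copy.deepcopy(record)  # leave the caller's record untouched
    relations = []
    for rel in fixed.get("relations", []):
        # Assumption: the affected relations mention NANOAOD in the description.
        is_nano_link = "NANOAOD" in rel.get("description", "")
        if is_nano_link and rel.get("type") == "isChildOf":
            if mode == "remove":
                continue
            rel["type"] = "isSiblingOf"
        relations.append(rel)
    fixed["relations"] = relations
    return fixed


# Relation as it currently appears on the MINIAOD record.
mini_record = {
    "relations": [
        {
            "description": "The corresponding NANOAOD dataset:",
            "recid": "30565",
            "type": "isChildOf",
        }
    ]
}

print(fix_nano_relation(mini_record, "sibling")["relations"])
print(fix_nano_relation(mini_record, "remove")["relations"])
```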

@jmhogan
Contributor

jmhogan commented Dec 12, 2024

@tpmccauley When I've produced MC, the NANO step takes MiniAOD as its input (or they are run together sequentially in one job). But I agree with your idea -- it seems wrong as it is, and could be corrected by not using these flags, using sibling flags, or doing parent/child in the right order.

@tpmccauley
Member Author

It makes more sense to me that the MINIAOD isParentOf the NANOAOD. I would be happy with either fixing the relationship to reflect this or not using these fields at all. I'll see how and where they are generated.
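
As a rough sketch of the parent/child option, the reciprocal entries could be built as below. This assumes that `type` describes the record holding the relation (relative to the record named by `recid`) and that `isParentOf` is an accepted value. The helper name `link_parent_child` and the `MINI_RECID` placeholder are hypothetical; the MINIAOD record id is not quoted in this thread:

```python
def link_parent_child(parent_recid, child_recid, parent_label, child_label):
    """Build reciprocal relation entries for a parent record and its child.

    Assumed convention: "type" describes the record that holds the relation,
    relative to the record named by "recid".
    """
    on_parent = {
        "description": f"The corresponding {child_label} dataset:",
        "recid": str(child_recid),
        "type": "isParentOf",
    }
    on_child = {
        "description": f"The corresponding {parent_label} dataset:",
        "recid": str(parent_recid),
        "type": "isChildOf",
    }
    return on_parent, on_child


# "MINI_RECID" is a placeholder; "30565" is the NANOAOD record id.
mini_relation, nano_relation = link_parent_child(
    "MINI_RECID", "30565", "MINIAOD", "NANOAOD"
)
print(mini_relation)  # would go on the MINIAOD record
print(nano_relation)  # would go on the NANOAOD record, linking back
```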

@katilp
Member

katilp commented Dec 13, 2024

It looks like we are using the "qualifier" (i.e. type) in the relations JSON field to refer either to the dataset of the record itself or to the dataset mentioned in that relations field.
The intended meaning is defined in the description field.

The former is what we do for the derived datasets, e.g. https://opendata.cern.ch/record/31316/export/json

    "relations": [
      {
        "description": "This dataset was derived from:",
        "recid": "30516",
        "type": "isChildOf"
      }
    ],

But for MINI vs NANO we use the latter, i.e.:

    "relations": [
      {
        "description": "The corresponding NANOAOD dataset:",
        "recid": "30565",
        "type": "isChildOf"
      }
    ],

In this case, the tag "Parent Dataset:" pointing to the NANOAOD in the MINIAOD record header is certainly wrong.

In the 2015 collision data, we used isSiblingOf (because technically, as Tom mentioned, in the legacy processing, they were both produced from AOD).
E.g. https://opendata.cern.ch/record/24132:

    "relations": [
      {
        "description": "The corresponding AOD dataset:",
        "recid": "24115",
        "type": "isSiblingOf"
      }
    ],

In any case, it would be good to have the usage somewhat unified.
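
One way to find the inconsistent cases would be a small audit over the relations, sketched below. It assumes the "type describes the record itself" reading, so `isChildOf` is only expected together with a "derived from" description. The function name `audit_relations`, the prefix heuristic, and the `MINIAOD (recid not given)` label are illustrative assumptions:

```python
# Heuristic: under the "type describes the record itself" reading,
# isChildOf should accompany a description of this form.
DERIVED_PREFIX = "This dataset was derived from"


def audit_relations(records):
    """Return (record label, related recid) pairs that look inconsistent.

    `records` maps a record label to that record's "relations" list.
    """
    flagged = []
    for label, relations in records.items():
        for rel in relations:
            if rel.get("type") != "isChildOf":
                continue
            if not rel.get("description", "").startswith(DERIVED_PREFIX):
                flagged.append((label, rel.get("recid")))
    return flagged


# The three relations quoted in this thread.
records = {
    "31316": [
        {"description": "This dataset was derived from:",
         "recid": "30516", "type": "isChildOf"},
    ],
    "MINIAOD (recid not given)": [
        {"description": "The corresponding NANOAOD dataset:",
         "recid": "30565", "type": "isChildOf"},
    ],
    "24132": [
        {"description": "The corresponding AOD dataset:",
         "recid": "24115", "type": "isSiblingOf"},
    ],
}

for label, related in audit_relations(records):
    print(f"{label}: isChildOf -> {related} does not look like a derivation")
```

With these inputs only the MINI-to-NANO relation is flagged; the derived dataset and the 2015 sibling relation pass.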
