Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support limit and last_affected range events #470

Closed
nscuro opened this issue Jul 1, 2022 · 12 comments
Closed

Support limit and last_affected range events #470

nscuro opened this issue Jul 1, 2022 · 12 comments

Comments

@nscuro
Copy link

nscuro commented Jul 1, 2022

I came across the guava vulnerability GHSA-5mg8-w23w-74h3 for which GHSA declares the affected version range as <= 29.0.

In OSV however, this is represented as:

"ranges": [
    {
        "type": "ECOSYSTEM",
        "events": [
            {
                "introduced": "0"
            }
        ]
    }
],
"database_specific": {
    "last_known_affected_version_range": "<= 29.0"
}

Given the constraint <= 29.0, I would've expected the following:

"ranges": [
    {
        "type": "ECOSYSTEM",
        "events": [
            {
                "introduced": "0"
            },
            {
                "last_affected": "29.0"
            }
        ]
    }
]

The current situation makes automated processing unnecessarily hard. If I rely on the ECOSYSTEM range, I'll trigger lots of false positives due to it indicating a >0 constraint. database_specific is not intended to influence vulnerability evaluation according to the spec. This is also visible when inspecting the (auto-generated) Affected versions section on OSV's website: https://osv.dev/vulnerability/GHSA-5mg8-w23w-74h3

At the moment, there are about 1990 advisories affected by this:

$ rg -l '"last_known_affected_version_range"' advisory-database | wc -l
1990

google/osv.dev#474 (comment) already hinted that GHSA currently does not support the limit or last_affected events. Is it planned to be addressed anytime soon?

@chrisbloom7
Copy link
Contributor

@nscuro thank you for bringing this up. The last_affected field is relatively new to the spec and was introduced after our initial build out of our OSV transformers. We have a work item to address this. I've make a note of your ask in our internal issue tracking this work.

@chrisbloom7
Copy link
Contributor

For additional context, the work we have scheduled will allow us to transform upper bound inclusive ranges (i.e. <=) to use the last_affected range event in OSV, however the OSV schema still does not support upper bound exclusive ranges (i.e. <) so there will still be some ranges that continue to use the database specific last_known_affected_version_range when it is determined that using < best represents the upper bound of the advisory based on available info.

@chrisbloom7
Copy link
Contributor

chrisbloom7 commented Jul 1, 2022

We will also continue to use it for ranges where the upper bound inclusive <= x.y.z does not match the patched version when it's available. This is likely rare, and in this case we will include a fixed range event matching the patched version in the OSV file so you will be able to parse them correctly with automation, but we will still include the last_known_affected_version_range as a way to retain the context of the original advisory for internal purposes. Just want to be clear that there will still be some instances of last_known_affected_version_range remaining even after we switch to using last_affected.

@nscuro
Copy link
Author

nscuro commented Jul 2, 2022

@chrisbloom7 Thanks so much for the quick and thorough response! Good to know that last_known_affected_version_range will continue to be used in some cases still.

However, I'm wondering whether the ECOSYSTEM range should be omitted if last_known_affected_version_range is used? At least in scenarios like the one described above, where it's both redundant and incomplete next to <= 29.

@Shnatsel
Copy link

TL;DR: The non-standard field causes issues for anyone trying to produce or consume OSV data. Please prioritize the migration to the last_affected field from OSV 1.3.

Right now GHSA data exported to OSV causes numerous false positives, if parsed according to the spec. This makes all OSV data about crates.io unusable, unless special measures are taken to filter out GHSA data or support the custom extension field. Which in turn makes any tools consuming crates.io data from OSV unusable at present - which includes big-name tools such as Trivy.

@KateCatlin
Copy link
Collaborator

Hi all, thanks for your feedback on this!

We will be prioritizing this update in the coming weeks. I'll post again here when it is shipped!

@KateCatlin
Copy link
Collaborator

Hi all, this fix is now shipped!

Existing OSV JSON files won't have the new field yet until they are regenerated, but new ones will start to use it. I'll go ahead and close this issue as a result.

Thanks everyone for driving us forward in resolving this.

@Shnatsel
Copy link

Is re-generation of the existing files planned?

It's great that the new files are not affected, but the existing ones still cause false positives when parsed according to the OSV standard.

@KateCatlin
Copy link
Collaborator

@Shnatsel yes indeed, we have a nightly job that will resync 25,000 at a time so they'll all be regenerated with the new format over the next week.

@DmitriyLewen
Copy link

Hello @KateCatlin
Can you clarify - are there possible cases when only last_known_affected_version_range field is used? ( i mean without fixed or last_affected).
Or are these not resynchronized advisories?
example - https://github.com/github/advisory-database/blob/88a23ee5cad40396bcd47a136bd21753b96e86da/advisories/github-reviewed/2023/04/GHSA-32qq-m9fh-f74w/GHSA-32qq-m9fh-f74w.json

We have a problem with missing fixed/last_affected field.
So what I want to understand is: do we need to update our logic (add support of last_known_affected_version_range) or is this field not currently used separately (as in https://github.com/github/advisory-database/blob/main/advisories/github-reviewed/2017/10/GHSA-543v-gj2c-r3ch/GHSA-543v-gj2c-r3ch.json) and it's better to notify GitHub to just update this advisory?

Thanks in advance for your answer and thanks for what you do!

Best Regards, Dmitriy

@chrisbloom7
Copy link
Contributor

chrisbloom7 commented Mar 14, 2024

Hi @DmitriyLewen. @KateCatlin asked me to follow up with you. There are still two scenarios in which we must use that custom field, and the two example OSV files you cited hit both of them. It's helpful to preface any explanation by saying that the reason we use a database specific field at all is because we try to keep our translations between OSV and our internal format as lossless as possible since we both write to and read from that format for the purposes of community contributions, and because the OSV schema doesn't support all of the potential ways we represent version ranges internally.

In GHSA-32qq-m9fh-f74w, we use it because the affected range uses the upper bound exclusive operator < 3.5.3.1 which indicates that the last known affected version is not known1, only the first known good version2. Since that vulnerability also lacks a fix version it's unclear if 3.5.3.1 was the fix or some earlier version was, or if there ever was a fix. This is not equivalent to "last_affected": "3.5.3.1" which would translate to <= 3.5.3.1 meaning 3.5.3.1 was the last affected version, and OSV doesn't have native support for a last known good event3. If you know of any references that indicates a last affected version and/or that 3.5.3.1 was indeed the fixed version for this vulnerability then we'd welcome a community contribution that update the OSV to better represent this.

In GHSA-543v-gj2c-r3ch, we use it because the vulnerability's upper bound inclusive affected version range of >= 4.2.0, <= 4.2.5.0 is not deterministic even though a fix version is known4. In this case, the last_known_affected_version_range is there to help us retain this mismatched information when we pull community contributions back into our internal system. Since a fixed version is present here, you should prefer that and ignore the last_known_affected_version_range. If you know of any references that would clear this discrepancy up, a community contribution to correct it is again appreciated.

EDIT: Added then removed then re-added a 4th footnote; added some additional info about the ranges.

Footnotes

  1. The ecosystem's package manager may be able to say what the version prior to the first known good version is, but we can't reasonably run every package manager for every ecosystem we support so we don't attempt to reconcile things like this - our curators just go off of the information available in the canonical sources for the vulnerability.

  2. Perhaps it would be better if the database specific field was first_known_good_version_range in that case, but we continue to use last_known_affected_version_range for historic reasons.

  3. Some additional context here and here if you're interested

  4. We would expect to see this represented as >= 4.2.0, < 4.2.5.1 which is deterministic. Note that the upper bound exclusive < operator works here, as opposed to the previous example, because we do have a fix commit and because we also know when it was introduced, which can all be represented in native OSV syntax.

@DmitriyLewen
Copy link

Hello @chrisbloom7
Thanks for the quick and detailed answer

I got all needed information!
Thank you very mach!

Best Regards, Dmitriy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants