Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upload-sarif action doesn't seem to respect "uriBaseId" in SARIF files #2215

Open
scottjasso opened this issue Mar 28, 2024 · 7 comments
Open

Comments

@scottjasso
Copy link

scottjasso commented Mar 28, 2024

We're using "Android Lint" to generate a sarif file. The sarif locations use this pattern:

            "originalUriBaseIds": {
                "%SRCROOT%": {
                    "uri": "file:///runner/_work/myrepo/myrepo/some/repo/dir/"
                }
            },
            ...
                             "physicalLocation": {
                                "artifactLocation": {
                                    "uriBaseId": "%SRCROOT%",
                                    "uri": "src/main/kotlin/Foo.kt"
                                },

The artifact location is relative to this uriBaseId. This is a reference to originalUriBaseIds, which the SARIF spec says should be used by consumers to find the absolute path.

However, the upload-sarif action debug logs show lines such as this:

##[debug]Unable to compute fingerprint for non-existent file: /runner/_work/myrepo/myrepo/src/main/kotlin/Foo.kt

which shows that it's not resolving paths using that %SRCROOT% path -- the correct path would be /runner/_work/myrepo/myrepo/some/repo/dir/src/main/kotlin/Foo.kt. We also see that the code scanning page says src/main/kotlin/Foo.kt can't be found in our repo ("Sorry, we couldn't find this file in the repository.").

(Caveat: we're using v2 because we can't use node20 in our private runners yet)

@nickfyson
Copy link
Contributor

While originalUriBaseIds is indeed present in the SARIF spec, I’m afraid it is not supported by Code Scanning or the CodeQL Action. The Sorry, we couldn't find this file in the repository error is good example of how originalUriBaseId can cause confusion, as when loading code location for display in the GitHub UI the we don’t want to be using the arbitrary filesystem location for the file when the code was being analysed, but rather the path relative to the root of the repository. Use of originalUriBaseIds can also lead to leaking of personal/private information about the system on which the analysis was run.

In this case I’m afraid Android Lint isn’t producing SARIF that is compatible with Code Scanning, so I think you’ll need to open a ticket with them to address the problem. Documentation on supported SARIF properties can be found here.

@scottjasso
Copy link
Author

Thanks for the response!

The Sorry, we couldn't find this file in the repository error is good example of how originalUriBaseId can cause confusion, as when loading code location for display in the GitHub UI the we don’t want to be using the arbitrary filesystem location for the file when the code was being analysed, but rather the path relative to the root of the repository. Use of originalUriBaseIds can also lead to leaking of personal/private information about the system on which the analysis was run.

Well, the upload-sarif action already handles the case where the sarif has an absolute path, and makes it relative to the repo root. (In fact that's how we're working around this bug - we use uriBaseId to replace the uri field with an absolute path, and then upload-sarif works as expected.) It seems really easy then for upload-sarif to just merge uriBaseId and uri before making the paths relative.

@nickfyson
Copy link
Contributor

Sorry for the further delay in responding!

Perhaps what might help here is to specify the checkout path when calling upload-sarif? You can see that option detailed here...

https://github.com/github/codeql-action/blob/main/upload-sarif/action.yml#L12

That should hopefully allow you override the default (root of the repo checkout) and append an arbitrary/folder/path as needed.

@scottjasso
Copy link
Author

Well there are multiple lint "runs" in the sarif file. Each gradle subproject gets a different lint result, and each one has a different originalUriBaseId path.

@nickfyson
Copy link
Contributor

nickfyson commented Apr 16, 2024

In that case, if you're looking for a way to get quickly unblocked you could try post-processing the SARIF in order to ensure the uri for each artifactLocation has the correct path from the route of the repository. That might give you a way forward if you don't have any luck getting a compatibility fix from Android Lint.

Also, it's worth reiterating that even if Code Scanning did use originalUriBaseIds things would still be broken with SARIF output like this, as you would end up with the arbitrary path from the Actions runner rather than (as needed) the path within the repository, and hence the GitHub UI would remain broken. So compatibility with Code Scanning will definitely require changes on the Android Lint side, I'm afraid.

@scottjasso
Copy link
Author

scottjasso commented Apr 16, 2024

In that case, if you're looking for a way to get quickly unblocked you could try post-processing the SARIF in order to ensure the uri for each artifactLocation has the correct path from the route of the repository. That might give you a way forward if you don't have any luck getting a compatibility fix from Android Lint.

We're already doing that, as I mentioned above. Except we don't use relative paths - we specify absolute paths in the uri field, and it works just fine. Because upload sarif is designed to do that.

Also, it's worth reiterating that even if Code Scanning did use originalUriBaseIds things would still be broken with SARIF output like this, as you would end up with the arbitrary path from the Actions runner rather than (as needed) the path within the repository, and hence the GitHub UI would remain broken. So compatibility with Code Scanning will definitely require changes on the Android Lint side, I'm afraid.

This is not correct. I explained above that upload-sarif already handles absolute paths (specific to GHA). So the fact that originalUriBaseIds has an absolute path is not the issue. Clearly upload-sarif does some preprocessing of the SARIF file first, so why can't it handle uriBaseId?

It's quite simple - upload sarif only reads the "uri" field, which doesn't match the SARIF spec. Instead it should read originalUriBaseIds[uriBaseId] + uri - literally just concatenate the two paths. Our current workaround preprocesses the SARIF to do exactly that, and it works as expected, even though the resulting path is an absolute path specific to GHA.

Again, no changes are needed for Android Lint - they are the ones following the recommendations from the SARIF spec. It's upload-sarif that is not following the spec.

@nickfyson
Copy link
Contributor

We're already doing that, as I mentioned above.

Apologies, so you did. Returning sporadically to this issue was not a recipe for the most coherent response... 😬

explained above that upload-sarif already handles absolute paths (specific to GHA)

Ah, I have learned it is not in fact the Action that handles this, but in GitHub itself...

Code scanning interprets results that are reported with relative paths as relative to the root of the repository analyzed. If a result contains an absolute URI, the URI is converted to a relative URI. The relative URI can then be matched against a file committed to the repository.

This explains why the GitHub interface does in fact work, even though upload-sarif does not itself alter the absolute URIs.

As documented Code Scanning only commits to supporting a subset of the SARIF spec, but I'll create a feature request for adding support for originalUriBaseIds when processing SARIF. And glad you had already implemented a functional workaround for the mismatch between Android Lint and Code Scanning!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants