Can I contribute to "Repo lookup from PyPI too conservative to use" #3249

Closed
joshgc opened this issue Jul 7, 2023 · 3 comments · Fixed by #3423
Labels: kind/bug (Something isn't working), question (Further information is requested)

Comments

@joshgc
Contributor

joshgc commented Jul 7, 2023

Describe the bug
Many PyPI packages specify their GitHub source repository in ways that scorecard cannot parse. The issue seems to be that PyPI places no restriction on the key names in the project_urls map. Scorecard, however, looks at exactly one key and bails if it doesn't find a GitHub repo there.

Consider these very well supported projects, each of which uses a different set of keys:

$ curl --silent https://pypi.org/pypi/scipy/json | jq '.info.project_urls'
{
  "Documentation": "https://docs.scipy.org/doc/scipy/",
  "Download": "https://github.com/scipy/scipy/releases",
  "Homepage": "https://scipy.org/",
  "Source": "https://github.com/scipy/scipy",
  "Tracker": "https://github.com/scipy/scipy/issues"
}

$ curl --silent https://pypi.org/pypi/numpy/json | jq '.info.project_urls'
{
  "Bug Tracker": "https://github.com/numpy/numpy/issues",
  "Documentation": "https://numpy.org/doc/1.25",
  "Download": "https://pypi.python.org/pypi/numpy",
  "Homepage": "https://www.numpy.org",
  "Source Code": "https://github.com/numpy/numpy"
}

$ curl --silent https://pypi.org/pypi/tensorflow/json | jq '.info.project_urls'
{
  "Download": "https://github.com/tensorflow/tensorflow/tags",
  "Homepage": "https://www.tensorflow.org/"
}

Reproduction steps
scorecard --pypi numpy # This fails

Expected behavior
I think scorecard ought to be able to handle this non-uniformity and still find the GitHub repo.

Additional context
I'm happy to put in a PR to fix this (already started) if the reviewers agree with the approach (and will help me with my golang =D).

Proposal 1: Construct an ordered list of case- and whitespace-insensitive keys to check in project_urls. Use the first one whose value looks like a GitHub repo.
Proposal 2: Find all the GitHub repos listed in project_urls. Fail unless exactly one repo is found.
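
For illustration, Proposal 1 could be sketched roughly as below. This is Python rather than Go purely for brevity, and it is not scorecard's code. The key order in PREFERRED_KEYS, the helper names, and the choice to accept URLs that point below the repo root (such as /tags) are all assumptions that this issue does not settle.

```python
import re
from typing import Dict, Optional

# Hypothetical priority order; the issue does not say which keys to try first.
PREFERRED_KEYS = ["source", "sourcecode", "repository", "homepage", "download"]

# Matches https://github.com/<owner>/<repo>, optionally followed by more path.
GITHUB_REPO = re.compile(
    r"^https?://(?:www\.)?github\.com/([^/\s?#]+)/([^/\s?#]+)", re.IGNORECASE
)


def normalize_key(key: str) -> str:
    """Lowercase the key and drop all whitespace: 'Source Code' -> 'sourcecode'."""
    return "".join(key.split()).lower()


def repo_from_url(url: str) -> Optional[str]:
    """Return 'github.com/owner/repo' if the URL looks like a GitHub repo."""
    match = GITHUB_REPO.match(url.strip())
    if match is None:
        return None
    owner, repo = match.group(1), match.group(2)
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"github.com/{owner}/{repo}"


def find_repo_by_key_order(project_urls: Dict[str, str]) -> Optional[str]:
    """Proposal 1: check keys in priority order and use the first GitHub match."""
    by_key = {normalize_key(k): v for k, v in project_urls.items()}
    for key in PREFERRED_KEYS:
        repo = repo_from_url(by_key.get(key, ""))
        if repo is not None:
            return repo
    return None


if __name__ == "__main__":
    numpy_urls = {
        "Bug Tracker": "https://github.com/numpy/numpy/issues",
        "Homepage": "https://www.numpy.org",
        "Source Code": "https://github.com/numpy/numpy",
    }
    print(find_repo_by_key_order(numpy_urls))
```

With this key order, the scipy, numpy, and tensorflow maps above would each resolve to their repo, with tensorflow falling through to "Download".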

Thank you so much for this tool!

@joshgc joshgc added the kind/bug Something isn't working label Jul 7, 2023
@joshgc joshgc changed the title Repo lookup from PyPI too conservative to use Can I contribute to "Repo lookup from PyPI too conservative to use" Jul 17, 2023
@naveensrinivasan naveensrinivasan added the question Further information is requested label Jul 26, 2023
@spencerschrock
Member

> Proposal 1: Construct an ordered list of case- and whitespace-insensitive keys to check in project_urls. Use the first one whose value looks like a GitHub repo.
> Proposal 2: Find all the GitHub repos listed in project_urls. Fail unless exactly one repo is found.
>
> Thank you so much for this tool!

I think 1 could work, or a modified version of 2:

  1. Look for github/gitlab repos
  2. If GitHub, canonicalize to github.com/owner/repo. Unfortunately, GitLab supports some nesting, so we can't do the same there
  3. De-duplicate, and bail if more than 1.
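
The three steps above might look something like this minimal sketch (Python for brevity, not scorecard's Go code). The function names, the lowercasing used for de-duplication, returning None when nothing is found, and leaving GitLab paths uncanonicalized are all assumptions made for illustration.

```python
from typing import Dict, Optional
from urllib.parse import urlparse


def canonical_repo(url: str) -> Optional[str]:
    """Map a URL to a canonical repo string, or None if it is not GitHub/GitLab."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    if host == "github.com":
        # GitHub repos are always exactly owner/repo, so deeper paths are dropped.
        owner, repo = parts[0], parts[1]
        if repo.endswith(".git"):
            repo = repo[: -len(".git")]
        return f"github.com/{owner}/{repo}".lower()
    if host == "gitlab.com":
        # GitLab groups can nest, so the repo depth is unknown: keep the full path.
        return ("gitlab.com/" + "/".join(parts)).lower()
    return None


def find_single_repo(project_urls: Dict[str, str]) -> Optional[str]:
    """De-duplicate canonical repos; bail (raise) if more than one remains."""
    repos = {r for r in map(canonical_repo, project_urls.values()) if r}
    if len(repos) > 1:
        raise ValueError("ambiguous repos: " + ", ".join(sorted(repos)))
    return repos.pop() if repos else None


if __name__ == "__main__":
    scipy_urls = {
        "Download": "https://github.com/scipy/scipy/releases",
        "Homepage": "https://scipy.org/",
        "Source": "https://github.com/scipy/scipy",
        "Tracker": "https://github.com/scipy/scipy/issues",
    }
    print(find_single_repo(scipy_urls))
```

Because the releases, source, and tracker URLs all canonicalize to the same owner/repo, a map like scipy's collapses to a single candidate instead of being rejected as ambiguous.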

> Additional context
> I'm happy to put in a PR to fix this (already started) if the reviewers agree with the approach (and will help me with my golang =D).

happy to help review

@joshgc
Contributor Author

joshgc commented Aug 23, 2023

@spencerschrock Local commit is ready for review. Can you provide me access to make a PR here?

$ git push --set-upstream origin joshgc_pypi

ERROR: Permission to ossf/scorecard.git denied to joshgc.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Also, I'm not familiar with GitLab URLs, but if you have some examples I'm happy to add them to this PR.

@spencerschrock
Member

> @spencerschrock Local commit is ready for review. Can you provide me access to make a PR here?
>
> $ git push --set-upstream origin joshgc_pypi
>
> ERROR: Permission to ossf/scorecard.git denied to joshgc.
> fatal: Could not read from remote repository.
>
> Please make sure you have the correct access rights
> and the repository exists.
>
> Also, I'm not familiar with GitLab URLs, but if you have some examples I'm happy to add them to this PR.

You're trying to push your branch to our repo. You'll need to push the branch to your fork, and send the PR from the fork (relevant doc links: fork and PR).
