Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: allow proxies to supply only some modules #26334

Closed
arschles opened this issue Jul 11, 2018 · 35 comments
Closed

cmd/go: allow proxies to supply only some modules #26334

arschles opened this issue Jul 11, 2018 · 35 comments
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge modules NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@arschles
Copy link

This is a proposal for extending the vgo download API to add a mechanism to allow proxies to redirect vgo to a VCS. This functionality would be useful if a proxy has only a subset of packages in a go.mod file.

Currently, if someone adds two modules "a" v1.0.0 and "b" v1.0.0 to their go.mod file and they then run GOPROXY=myprox.com vgo install, everything works as expected if the proxy has both modules at the given versions. If not, the command fails.

It would be helpful to allow the proxy to tell vgo to fetch one or both of the modules from the VCS if it doesn't have them in its cache. This would be useful for proxy implementations where the proxy will not/cannot cache the module in its own storage or can but doesn't have that module/version in its cache. The Athens project is a present day use case for the latter - it fills its caches asynchronously.

One implementation possibility is adding a $GOPROXY/a/@v/v1.0.0?go-get=1 network request that expects the same output as already-existing ?go-get=1 requests define. This request could be made before starting the download protocol as it currently exists. The new mechanism would allow the proxy to choose to do one of the following for the given module and version identifier:

  1. Send vgo to the standard download API via the meta tag
  2. Send vgo to the VCS via the meta tag
  3. Return a 404
@arschles
Copy link
Author

arschles commented Jul 11, 2018

Update after slack discussion with @myitcv and @zeebo:

  1. we could certainly do cache fills synchronously
  2. there could be some "gotchas" with hosting the proxy, like timeouts when doing a cache fill for a large module (i.e. github.com/kubernetes/kubernetes)
  3. assuming the proxy synchronously fills caches, the user experience would be about the same as if vgo get downloaded from VCS
  4. if the proxy has a partial failure and cannot serve code, GOPROXY=myprox.com vgo get will always fail with the current behavior

Another concern that was brought up is that with GOPROXY set, the current behavior of GOPROXY=myprox.com vgo get github.com/kubernetes/kubernetes doesn't let the proxy redirect vgo to another location (i.e. a CDN) it still has to serve metadata and the zipped source code itself.

@marwan-at-work
Copy link
Contributor

marwan-at-work commented Jul 11, 2018

The proxy can redirect vgo to another URL as long as that URL implements the download protocol. That's because the net/http client handles redirect status codes internally (up to 10 times by default)

The big question here is: can the GOPROXY tell Vgo (during a build) to go use a VCS source instead of the proxy itself? Which is a little different than a simple redirect to another Download Protocol enabled URL.

@rsc rsc modified the milestones: vgo, Go1.11 Jul 12, 2018
@rsc rsc added the modules label Jul 12, 2018
@rsc rsc changed the title x/vgo: Allow proxies to send vgo to the VCS for a package cmd/go: Allow proxies to send vgo to the VCS for a package Jul 12, 2018
@marwan-at-work
Copy link
Contributor

marwan-at-work commented Jul 12, 2018

@rsc How do you envision the flow of the Proxy telling Vgo to switch to VCS?

I'm trying to follow the vgo code and it seems that on an initial build, the initial contact with the proxy's download protocol can be any of these endpoints: /@latest or /list or /@v/{certainRevision}.info.

This means that the proxy would need to return a consistent code (maybe a 404) on each of these endpoints to signal vgo to reconstruct the modfetch.Repo interface from a vcs source instead of the *proxyRepo one. Vgo would then have to back up a few steps and retry again.

Vgo can potentially always hit /list as the first entry point to the download protocol, and if the list is empty then switch to vcs. Or it can always hit @latest first and if it returns 404 then switch to vcs.

Another solution, is that the Download Protocol could implement a @probe call with a couple of parameters (module path and revision) and then the Proxy can early on tell vgo to just go for the VCS source for this particular module.

For efficiency, vgo can potentially send the entire list of modules it wants to probe to the proxy.

I haven't fully understood the last 2 days worth of changes to vgo so apologies if I'm a bit off.

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 16, 2018
@rsc
Copy link
Contributor

rsc commented Jul 17, 2018

I don't think it makes sense for a proxy to tell the go command "go to this VCS instead". We're trying to migrate to proxy by default and while VCS will probably always be with us, I'd rather not mix the two.

I do think it would probably be OK to let GOPROXY be preference list and to also allow some setting like GOPROXY=direct as an explicit name for what the default behavior is. So you could say GOPROXY=https://myproxy/,direct and just let myproxy return a 404 for the things it doesn't know about. Then the proxy isn't in charge of the actual redirect; it's only in charge of "it's not me".

@rsc rsc changed the title cmd/go: Allow proxies to send vgo to the VCS for a package cmd/go: allow proxies to supply only some modules Jul 17, 2018
@marwan-at-work
Copy link
Contributor

marwan-at-work commented Jul 17, 2018

@rsc that's about what I had in mind. The proxy shouldn't say "go to this vcs", it can just say "I don't have this module"

Does that mean the /@probe endpoint makes sense? Since that means Go can just ask the proxy whether it can work with a specific module before it even asks for @latest or /list.

I've altered the Go code if you'd like to look at a reference of what I'm suggesting: marwan-at-work@3767be8

@rsc
Copy link
Contributor

rsc commented Jul 18, 2018

@marwan-at-work I'd rather not have newProxyRepo make any network calls. It turns out to be important to delay those as long as possible. I'd rather have the existing GET paths return 404s and then have the methods be able to return some kind of recognizable "not found error" (maybe satisfying os.IsNotExist is enough) and then something at a higher level will try the next repo method down the list. I think we should wait until Go 1.12 regardless.

@rsc rsc modified the milestones: Go1.11, Go1.12 Jul 18, 2018
@marwan-at-work
Copy link
Contributor

@rsc That makes sense since cmd/go checks the cache before making network calls. I'm happy to take on this task if you'd like me to as I'll try to make it work for Athens in the near future.

Either way, I'm happy to know the Proxy can dynamically delegate modules fetching back to Go.

Thanks!

@marwan-at-work
Copy link
Contributor

@rsc I have another pass at making this work. This time, we won't hit the network until necessary, and we won't need a @probe endpoint either. I'm hoping to see if this change is not too invasive for 1.11

The idea is that a *proxyRepo can take an alternative Repo interface that it can switch to in case of 404 (or other future codes). Similar to how *cachingRepo works.

Feel free to take a look if you get the chance marwan-at-work@5117c8c

I see other ways of doing this, such as having a top level Repo interface that accepts a slice of Repos and just tries one at a time in order: (cache, proxy, vcs, etc)

So my solution above of course may still be not what how you'd like to solve this problem but would love to hear your thoughts

Thanks :)

@hyangah
Copy link
Contributor

hyangah commented Nov 2, 2018

Since this issue discusses a change to GOPROXY to allow the list of proxies or direct method -

A slightly different case I am thinking of is when the proxy server I use by default is temporarily unreachable or unavailable (possibly in the middle of fetching all dependencies) and even we can't get
404s. Reruning go get again with GOPROXY=direct upon network failure is an option when noticing this failure, but I would be happier if I can specify a set of proxy servers or even 'direct' fetch option for fallback.

But @bcmills raised a concern about leaking private package paths in the event of the private proxy failure if we just fall back to the next (direct) blindly.

@marwan-at-work
Copy link
Contributor

@hyangah I'm concerned with blindly falling back to git for two reasons:

  1. The proxy won't be able to block certain modules from being downloaded (maybe a company doesn't want their employees to use a certain open source package)
  2. If a proxy is down, maybe a build shouldn't happen at all so that the user is aware of what's happening. And of course, the user can switch to turning off the GOPROXY environment explicitly.

marwan-at-work added a commit to marwan-at-work/go that referenced this issue Nov 2, 2018
This change makes it so that a Go Modules proxy can return a 404
http status code which would signal cmd/go to end up fetching the
module from VCS instead. Note, any other status besides 404 and 200
will still fail.

Fixes golang#26334

Change-Id: Idcb0409d7fa52c8dc2601aec7f8a4274550afe3a
@hyangah
Copy link
Contributor

hyangah commented Nov 5, 2018

@marwan-at-work code freeze date is today. do you plan to mail in the change for review as described in the contribution guideline? https://golang.org/doc/contribute.html#sending_a_change_github

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/147177 mentions this issue: cmd/go: fallback to VCS if GOPROXY 404s

@spf13
Copy link
Contributor

spf13 commented Feb 22, 2019

Speaking with @hyangah yesterday, I'm concerned about the approach discussed here.

The scenario we discussed is offline yesterday was when a company wants to provide a proxy that serves private modules.

In this scenario the company should run an internal (intermediary) proxy that contains the information about the private modules. This proxy should be configured to have an upstream proxy that it uses for public requests.

If the company wants to have high availability it can run multiple internal proxies.

The client should only be configured to use the internal proxy(ies).

The company can also provide a whitelist / blacklist for the internal proxy for what upstream packages are allowed/disallowed.

@marwan-at-work
Copy link
Contributor

@spf13

  1. The use case for an internal company multi-proxy setup is to be able to switch not from one proxy to another, but from one Proxy to VCS. Where a Proxy is not trusted to have credentials but the machine where the Go command is run has VCS credentials. For example, when the Go Module Index comes out but the company is not yet ready to roll their own internal proxy for private modules yet, they should be able to do GOPROXY=moduleIndex,direct go build and trust that both public and private modules are provided.

  2. On another note, I'm also wondering how having multiple mirrors such as the Module Index (and potentially other companies) will play out? If the user can't do GOPROXY=mirror1,mirror2 go build, how would Go be able to get a module from one of many mirrors? Will every user need to have their own proxy implementation that fans out to different proxies? It sees much easier to just do the comma-separated command from the client side.

  3. With the Module Index being built to be the default public mirror, I imagine all other proxy implementations must be aware of it? Meaning, if a module does not exist, we need to redirect the user to the public proxy as a worst case scenario.

Thanks!

@hyangah
Copy link
Contributor

hyangah commented Feb 22, 2019

@marwan-at-work

  1. It will leak the company's private modules, packages and repo paths to the public Go Module Index, so it's best to avoid. I think it's better for the Go command itself to be configured to route the requests to the right proxies if an internal proxy is hard.

  2. This is a reasonable use case I can think of, but in this case, I think the primary purpose is the redundancy. In this case, 404 HTTP error code-based chaining doesn't seem right. (What if mirror1 is down and can't respond with 404?)

@marwan-at-work
Copy link
Contributor

@hyangah

  1. The only thing will leak is the module path and nothing else, not sure how bad this is but I can understand that it should be avoided.

  2. My thinking was that if your "first" proxy is down, it's best to stop the build. If your thinking is that we can have multiple proxies for redundancy reasons but not care about what the returned code is, then this is potentially bad because then a proxy will not be able to block a build for security reasons: for example if the first proxy did not allow anyone to download "github.com/malicious/package", it would return a 400 bad request, but then Go will just move on to the next proxy which might not have the same security rule.

@rsc
Copy link
Contributor

rsc commented Mar 12, 2019

I am a little confused about all this discussion. I thought we were going to do, for GOPROXY=proxy1,proxy2,proxy3:

  • Try proxy1. If proxy1 says 200, we're done. If proxy1 says anything but 404, we're also done (error). Otherwise, proxy1 said 404, so continue down the list.
  • Try proxy2. Anything but 404? Done.
  • Try proxy3.

It's an ordered list, not a parallel lookup. By saying GOPROXY=proxy1,proxy2,proxy3 you are directing the go command to send every import path to proxy1. If you should be splitting half your traffic to proxy1 and half to proxy2 and can't send the proxy2 paths to proxy1, then yes, you need a new proxy0 to split the traffic. But that is (1) fine and (2) not the envisioned use case.

The envisioned use case is some company has their own modules on an internal static file server that can pretend to be a Go proxy (because we made static file servers able to do that), and people use GOPROXY=,proxy1. Or another use case is people preferring GoCenter, but that's an incomplete module mirror, so it needs to be backed by a fallback, like "direct" or a more complete mirror. Again, if you care about not leaking paths for proxy2 to proxy1, you wouldn't do this. But there are other cases where you would.

Does anyone object to implementing the above semantics for GOPROXY=proxy1,proxy2,proxy3? If so, please explain why. Thanks.

@marwan-at-work
Copy link
Contributor

@rsc I'm not sure the above discussions were about splitting traffic or concurrently pinging the "proxy list". AFAIK, it was about whether we want to have GOPROXY be able to provide multiple URLs or provide only one highly available URL that takes care of proxying to other proxies if it needs to.

I'm happy either way, but the CL from above does what you suggested

@RohitRox
Copy link

What would be the best work around at the moment?
I have few private modules defined in go.mod and those are in my GOPROXY.
When I try go mod download, it fails for modules that are not in GOPROXY.
I can't make sure all the modules and version are in GOPROXY all the times.

@marwan-at-work
Copy link
Contributor

@RohitRox one work around is to have that proxy go mod download anything that's not existent and serve it back to the client. This way, it can provide all the modules and wouldn't 404.

@RohitRox
Copy link

@marwan-at-work That will be a chore for developers :|

@jorng
Copy link

jorng commented Apr 16, 2019

I've found this to be a big problem today when trying to setup a GOPROXY at my company.

There are many instances where there may be a mix of public and private dependencies. The problem we have is that not all of the repos on our internal GitHub instance can be made publicly accessible. If a project has any dependency that is "private", we cannot use GOPROXY at all.

@ankushchadha
Copy link

Tools like Athens and JFrog Artifactory can be used to store private Go modules and in addition, GoCenter can be used to fetch public Go modules.

@rsc
Copy link
Contributor

rsc commented Apr 23, 2019

@jorng, for Go 1.13 we expect to add a GONOPROXY environment variable that will let you set GOPROXY to a public proxy but avoid the proxy for modules matching a given pattern.

@jorng
Copy link

jorng commented Apr 23, 2019

@rsc: That will be very helpful, at least to work around the issue.

I think the GOAUTH stuff may be the best option, once implemented. I’m imagining setting up a custom proxy that can handle authentication (perhaps using our internal SSO) and gate access appropriately.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/173441 mentions this issue: cmd/go: add support for GOPROXY list

@storyicon
Copy link

I've found this to be a big problem today when trying to setup a GOPROXY at my company.

There are many instances where there may be a mix of public and private dependencies. The problem we have is that not all of the repos on our internal GitHub instance can be made publicly accessible. If a project has any dependency that is "private", we cannot use GOPROXY at all.

@jorng Do you have any help with gos? https://github.com/storyicon/gos

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/183845 mentions this issue: cmd/go/internal/modfetch: halt proxy fallback if the proxy returns a non-404/410 response for @latest

gopherbot pushed a commit that referenced this issue Jun 26, 2019
…non-404/410 response for @latest

The @latest proxy endpoint is optional. If a proxy returns a 404 for
it, and returns an @v/list with no matching versions, then we should
allow module lookup to try other module paths. However, if the proxy
returns some other error (say, a 403 or 505), then the result of the
lookup is ambiguous, and we should report the actual error rather than
"no matching versions for query".

(This fix was prompted by discussion with Dmitri on CL 183619.)

Updates #32715
Updates #26334

Change-Id: I6d510a5ac24d48d9bc5037c3c747ac50695c663f
Reviewed-on: https://go-review.googlesource.com/c/go/+/183845
Run-TryBot: Bryan C. Mills <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Jay Conrod <[email protected]>
@mikecook
Copy link

mikecook commented Sep 4, 2019

Any chance https://golang.org/cl/173441 can be backported into 1.12.x?

@dmitshur
Copy link
Contributor

dmitshur commented Sep 4, 2019

@mikecook The change to the GOPROXY behavior is too significant for a minor release. Minor releases are meant only for security fixes, serious issues with no workarounds, and documentation fixes. See https://golang.org/wiki/MinorReleases for more information.

You can get the new behavior by updating to Go 1.13.

@bcmills
Copy link
Contributor

bcmills commented Sep 4, 2019

No: there were a lot of interrelated changes in the fetch paths.

Besides, we don't generally backport features (only critical bug-fixes, which this was not).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge modules NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests