Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Splitting gcloud archiver out to a submodule, step 1 and guide #5584

Closed
wants to merge 4 commits into from

Conversation

Groxx
Copy link
Member

@Groxx Groxx commented Jan 8, 2024

Why

The recently-updated versions of the gcloud dependencies pulled in some breaking changes to other modules that our internal monorepo cannot upgrade, in particular google.golang.org/genproto which seems to break frequently.

The good news for us is that we don't use the gcloud archiver internally, so we can just remove that dependency and unblock ourselves.
The next PR will make that possible.

The good news for others (soon) is that quite possibly you have similar problems with this or other optional / pluggable dependencies.
E.g. maybe you don't use Cassandra.
This lays the groundwork for a repeatable pattern that we can/should/will? use to break apart such things, to reduce how many dependencies we force on our users / server operators.

Things missing from this PR

For the short term, these changes will break the gcloud archiver, because the code is not yet rewritten to support a clean submodule break.

That doesn't appear too hard, but in the interest of unblocking other upgrades, it is not covered in these two PRs.
These PRs are just setting the groundwork, and allowing us to remove the un-used breaking-dependency upgrades.

So, in PRs to be built after these two are made, we will need to build a registration-based system where this is currently hardcoded.
At a high level that will likely look like:

  • Get rid of the hardcoded schemes and config structures in github.com/uber/cadence/common/archiver
    • We do not yet have a pluggable config pattern, that will need to be figured out. Multiple options definitely exist, it's just a question of which is best.
  • Create a way for the existing archivers (gcloud first) to add themselves to the shared registry of archivers
  • gcloud archiver users will import _ "github.com/uber/cadence/common/archiver/gcloud" runs the init func to register in github.com/uber/cadence/archiver (or you import and call Register() or similar)
    • If you don't use it, you don't need to do this, and you won't be bound by those dependencies.

None of this particularly matters until a release is made anyway, as release-users will not notice until everything is back in place.

How

TL;DR steps:

  • Change code so submodule references parent, not ever the reverse
  • Add a go.mod to the submodule folder
  • Tidy the parent (removes the submodule's dependencies)
  • Push
  • Tidy the submodule (does not work until previous push)
  • Push

More detailed steps (as planned anyway):

1: #5584

  • Remove parent references to the to-be-submodule (else its dependencies cannot be removed from the parent)
  • Init the submodule with an empty go.mod
    • Submodule cannot yet tidy as there are now two modules providing the same code (contents of the new-submodule)
  • Tidy the parent to remove any no longer needed dependencies
  • Push the current code, but DO NOT tag it
    • Tagging will leave a small broken window, and it just doesn't have to be done yet

2: #5585

  • Reference the parent's newest commit (anything with the submodule) in the submodule (now there is no duplicate-provide)
    • Note that this means you cannot fully prepare this step ahead of time, as it needs the new parent-SHA from step 1
  • Tidy the submodule
    • I would recommend copy/pasting the parent module's requires into this mod, it seems to preserve more versions, possibly due to varying behavior around replaces
    • Make sure it builds, any added dependencies may mean raising minimum versions
  • Push the current code

3: #TBD

  • Make the archiver plugin system, so gcloud archiving can be restored

4:

  • Now a tag can be pushed to release the new module, which will release both modules
    • go get submod@latest will not be able to ensure the parent is at the latest code yet

5: (optional)

  • Update the submodule's go.mod to point to the parent's released tag
  • Push and tag another release, which will release both modules
    • The parent and child will now have a technically-broken vN, and a fully-working vN+1 (a complete no-op for the parent)
    • Redacting vN in the child (unnecessary in the parent) is fine for precision if you want, but it is likely unnecessary in practice (getting latest works fine)

Testing these steps

To test this with a fork and make sure we could still build everything afterward, push step 1 above to a fork, and:

  1. replace the parent in the submodule's go.mod
  2. tidy the submodule
  3. push again
  4. start a new project somewhere to make sure importing the forked changes works
  5. in that external go.mod, replace BOTH the parent and child import paths with SHA-refs to the pushed fork
  6. now go mod tidy and it should work (you may need to do the same copy/paste with replaces to get identical/compatible versions)

E.g. in the separate project, make a blank main.go with:

// main.go
package main
import (
	_ "github.com/uber/cadence/cmd/server"
	_ "github.com/uber/cadence/common/archiver/gcloud"
)
func main(){}

// go.mod, use a fork sha with both modules prepped
replace github.com/uber/cadence => github.com/Groxx/cadence 7b7e0d6c135da932c24a9a6858596c734e3a182a
replace github.com/uber/cadence/common/archiver/gcloud => github.com/Groxx/cadence/common/archiver/gcloud 7b7e0d6c135da932c24a9a6858596c734e3a182a

and then test with:

// server works with gcloud-imposed upgrades
go build github.com/uber/cadence/cmd/server
// gcloud builds with this server version
go build github.com/uber/cadence/common/archiver/gcloud/...

Consequences

Congrats! You now have two modules in the same repository.

In most ways this Just Works, but some scenarios are kinda tricky.

First and foremost though, note that doing this (or not-doing this and reverting the upgrade) does not restrict our options in the future (because we don't mind breaking changes).

Modules can be split, merged, moved to separate repositories, etc regardless of what we choose now or in the future. At worst it's roughly the same amount of work as here: change, push, change, push, done.
All of these kinds of changes are breaking, but this is basically an application, not a library. Operators can be expected to make changes when upgrading if they have customized things (though obviously we should aim to keep that effort low), so we essentially don't care if changes are breaking.

Submodule's parent-version will always be behind HEAD

The submodule can't reference a version of the parent that has yet to be pushed, so its reference will ALWAYS trail the parent.
This is fine, you just have to live with it. It doesn't prevent people from using both at the same version, it just doesn't guarantee that updating the submodule will also update the parent module to a compatible version... which is addressed next.

Breaking changes cannot be consistent in one commit

Breaking changes are a bit funky, but it very closely follows how breaking out the submodule worked in the first place.

Making an incompatible change in the parent and using that in the child can be developed fairly easily, but there is no way to update the submodule's parent-module version to point to a SHA that does not yet exist.
So to be "complete" you will have to:

  1. Make the change to the whole repository
  2. Push. Anyone who does a go get submodule@SHA-step-2 (without -u) at the current SHA or tag will NOT get the newest version of the parent, and their build will fail.
  3. Update the submodule's parent-version to point to the commit/tag at step 2.
  4. Push. go get submodule@SHA-step-4 will now upgrade the parent to SHA-step-2, which is compatible.

But this is broadly optional.

Anyone who does go get submodule@SHA-step-2 and go get mainmodule@SHA-step-2 will get the latest versions of both, which are compatible. This is very easy to automate and lint.
Module requires are only minimum versions, so updating either or both beyond what they need is entirely valid and stable as far as Go modules are concerned.

This kinda sucks in terms of cleanliness. But it does function, and there really isn't any option for single-repo submodules.
The main alternative is to host the submodule in a completely separate repository. Doing this lets you release vN of the parent, update the child to use vN, and then tag child-vN.
I believe this is possible to break apart afterward, but I have not yet attempted it, and it's possible it won't work with a github host.

Builds of packages no longer ensure builds together work

Generally this will work fine in practice, but since the submodule only depends on part of the parent module, building it does not ensure that a combination works.
It's possible for an unrelated section of the parent to have a dependency that does not work with the submodule's dependencies.

This is fairly easy to verify by hand though, and will be included in step 3's PR: #TBD

  1. Make a new module
  2. Import both:
    • all of the main module's main packages
    • and the submodule (registering is unnecessary, this is just checking builds)
  3. Build everything

For testing purposes:

  • E2E tests making new binaries need to be in an internal folder so they cannot be imported externally, and:
    • Add a go.mod (this prevents this module's dependencies from appearing in the parent go.mod, like our internal/tools/go.mod)
    • Add a main.go that creates your binary with the submodule's plugin added
    • No other changes should be necessary
  • Unit tests:
    • Internal to the submodule: no changes, they just remain in the submodule.
    • Using the parent module and the submodule together: follow the same pattern as E2E tests, internal + go.mod, but otherwise no changes.

There may also be an easier option with Go workspaces, but I'm not familiar enough with them yet.


Alternatives

1: Don't do this, revert the upgrade

Currently an option, but it's not like this is the only time it'll happen.
We might as well solve it now, and document how, so we can repeat it in the future.

2: Move gcloud archiver to a separate repository rather than keeping it here

This simplifies a fair number of things because it becomes an entirely normal library. We already know how to use those. The code changes needed are almost identical, they'll just get a different import path (we can't break this module apart with github hosting AFAIK).

But this also means separate github, CI, etc maintenance :\

3: Change Go modules to not exclusively use git tags

This is the core cause of why single-repo multi-module stuff is complicated.
I'd love it, but I doubt this will happen any time soon.

@coveralls
Copy link

Pull Request Test Coverage Report for Build 018cf9d0-6a34-43fe-81dc-33c742e52153

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.06%) to 62.269%

Totals Coverage Status
Change from base Build 018ceb07-c985-4bbe-a4d6-7e12c3d557d7: -0.06%
Covered Lines: 92840
Relevant Lines: 149096

💛 - Coveralls

Copy link
Member

@davidporter-id-au davidporter-id-au left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a great idea, it'll probably be a little confusing and may require some iteration, but well worth it

Groxx added 4 commits January 16, 2024 19:03
# Why

The recently-updated versions of the gcloud dependencies pulled in some breaking changes to other modules that our internal monorepo cannot upgrade, in particular google.golang.org/genproto .

The good news for us is that we don't use the gcloud archiver internally, so we can just remove that dependency and unblock ourselves.
The next PR will make that possible.

The good news for others (soon) is that quite possibly you have similar problems with this or other optional / pluggable dependencies.
E.g. maybe you don't use Cassandra.
This lays the groundwork for a repeatable pattern that we can/should/will? use to break apart such things, to reduce how many dependencies we force on our users / server operators.

# Things missing from this PR

For the short term, these changes will break the gcloud archiver, because the code is not yet rewritten to support a clean submodule break.

That doesn't appear too hard, but in the interest of unblocking other upgrades, it is not covered in these two PRs.
These PRs are just setting the groundwork, and allowing us to remove the un-used breaking-dependency upgrades.

So, in PRs to be built after these two are made, we will need to build a registration-based system where this is currently hardcoded.
At a high level that will likely look like:
- Get rid of the hardcoded schemes and config structures in `github.com/uber/cadence/common/archiver`
  - We do not yet have a pluggable config pattern, that will need to be figured out.  Multiple options definitely exist, it's just a question of which is best.
- Create a way for the existing archivers (gcloud first) to add themselves to the shared registry of archivers
- gcloud archiver users will `import _ "github.com/uber/cadence/common/archiver/gcloud"` runs the init func to register in `github.com/uber/cadence/archiver` (or you import and call `Register()` or similar)
  - If you don't use it, you don't need to do this, and you won't be bound by those dependencies.

None of this particularly matters until a release is made anyway, as release-users will not notice until everything is back in place.

# How

Steps (as planned anyway):

1:
- Remove parent references to the to-be-submodule (else its dependencies cannot be removed from the parent)
- Init the submodule with an empty go.mod
  - Submodule cannot yet tidy as there are now two modules providing the same code (contents of the new-submodule)
- Tidy the parent to remove any no longer needed dependencies
- Push the current code, but DO NOT tag it
  - Tagging will leave a small broken window, and it just doesn't have to be done yet

2:
- Reference the parent's newest commit (anything with the submodule) in the submodule (now there is no duplicate-provide)
  - Note that this means you cannot fully prepare this step ahead of time, as it needs the new parent-SHA from step 1
- Tidy the submodule
  - I would recommend copy/pasting the parent module's requires into this mod, it seems to preserve more versions, possibly due to varying behavior around replaces
  - Make sure it builds, any added dependencies may mean raising minimum versions
- Push the current code

3:
- Now a tag can be pushed to release the new module, which will release both modules
  - `go get submod@latest` will not be able to ensure the parent is at the latest code yet

4: (optional)
- Update the submodule's go.mod to point to the parent's released tag
- Push and tag another release, which will release both modules
  - The parent and child will now have a technically-broken vN, and a fully-working vN+1 (a complete no-op for the parent)
  - Redacting vN in the child (unnecessary in the parent) is fine for precision if you want, but it is likely unnecessary in practice (getting latest works fine)

# Testing

To test this with a fork, push step 1 above to a fork, and:

1. replace the parent in the child's go.mod
2. tidy
3. push
4. start a new project somewhere to make sure importing the forked changes works
5. in that external go.mod, replace BOTH the parent and child import paths with SHA-refs to the pushed fork
6. now `go mod tidy` and it should work (you may need to do the same copy/paste with replaces to get identical/compatible versions)

# Consequences

Congrats!  You now have two modules in the same repository.

In most ways this Just Works, but some scenarios are kinda tricky.

## Submodule's parent-version will always be behind HEAD

The child can't reference a version of the parent that has yet to be pushed, so its reference will ALWAYS trail the parent.
This is fine, you just have to live with it.

## Breaking changes cannot be consistent in one tag/commit

Breaking changes are a bit funky, but it very closely follows how breaking out the submodule worked in the first place.

Making an incompatible change in the parent and using that in the child can be *developed* fairly easily, but there is no way to update the submodule's parent-module version to point to a SHA that does not yet exist.
So to be "complete" you will have to:

1. Make the change to the whole repository
2. Push.  Anyone who does a `go get submodule@SHA-step-2` (without `-u`) at the current SHA or tag will NOT get the newest version of the parent, and their build will fail.
3. Update the submodule's parent-version to point to the commit/tag at step 2.
4. Push.  `go get submodule@SHA-step-4` will now work.

**But this is broadly optional**.

Anyone who does `go get submodule@SHA-step-2` *and* `go get mainmodule@SHA-step-2` will get the latest versions of both, which are compatible.
If your step 2 used release semver tags, anyone who does `go get -u submodule@vSemver-at-step-2` will implicitly also get the same version
Module requires are only *minimum* versions, so updating either or both beyond what they need is entirely valid and stable as far as Go modules are concerned.

This kinda sucks in terms of cleanliness.  But it does function, and there really isn't any option for single-repo submodules.
The main alternative is to host the submodule in a completely separate repository.  Doing this lets you release vN of the parent, update the child to use vN, and then tag child-vN.
I believe this is possible to break apart afterward, but I have not yet attempted it, and it's possible it won't work with a github host.

## Builds of packages no longer ensure builds together work

Generally this will work fine in practice, but since the submodule only depends on *part* of the parent module, building it does not ensure that a combination works.
It's possible for an unrelated section of the parent to have a dependency that does not work with the submodule's dependencies.

This is fairly easy to verify by hand though:

1. Make a new module
2. Import both all of the main module's `main` packages, and the submodule (registering is unnecessary, this is just checking builds)
3. Build everything

There may also be an easier option with Go workspaces, but I'm not familiar enough with them yet.
@Groxx
Copy link
Member Author

Groxx commented Jan 30, 2024

Closing in favor of the easier-to-maintain #5609

@Groxx Groxx closed this Jan 30, 2024
@Groxx Groxx deleted the submod branch January 31, 2024 01:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants