Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bootsnap compile cache isn't as useful as it could be on CI and production #336

Open
casperisfine opened this issue Nov 30, 2020 · 7 comments

Comments

@casperisfine
Copy link
Contributor

Problem 1: cache keys are mtime based

Bootsnap's compile cache is very efficient for development workflows, but on CI or production, it becomes almost unusable.

The main reason is that git (and most other VCS) don't store mtime, so on CI or production, unless your setup manage to preserve mtime, all the compile cache will be invalidated. And most CI / production systems start from a fresh clone of the repository.

The solution to this would be to use file digests instead of mtime, of course hashing a source file is slower than just accessing the mtime, but compared to parsing the Ruby source file, fast hash functions would still offer a major speed up.

Problem 2: the cache isn't self cleaning

The compile cache entries are stored based on the path of the source file. e.g. the cache for path/to.rb will be stored in <cache-dir>/<fnv1a_64(path)>. So if you keep persisting the cache between CI builds or production deploys, over time as you delete some source files, update gems etc, new entries will be created, but outdated ones won't be removed, which might lead to a very bloated cache.

Hence why we have a note in the README about regular flushing on the cache.

And the problem can be even worse with some deploy methods like capistrano, with which the real path of the source files change on every deploy.

So even if we were to fix the mtime issue, we'd need to address cache GC otherwise users would run into big troubles.

Here I'm not too sure what the best solution could be, but I have a few ideas

Solution 2.1: Splitting the cache

Assuming the biggest source of cache garbage is gem upgrades, we could have one compile cache directory per gem, e.g. we could store cache for $GEM_ROOT/gems/my-gem-1.2.3/lib/my-gem.rb in $GEM_ROOT/gems/my-gem-1.2.3/.bootsnap/<fnv1a_64(path)>, or even $GEM_ROOT/gems/my-gem-1.2.3/lib/my-gem.rb.bootsnap.

This way when you upgrade or remove a gem you automatically get rid of the old cache.

However:

  • This is assuming the gem directory is writable, that's not always the case.
  • It requires to lookup the gem root directory, which might be costly (unless we use the second path format)

I think that if we were to implement this, the vast majority of the GC problem would be solved, as path changes insides the application are much less likely to be frequent enough to produce the problem unless you keep the cache for a very long time.

Solution 2.2: bootsnap precompile --clean

This is much less of a general solution as I don't think is is likely that a large portion of users would integrate bootsnap precompile in their workflow, but in theory we could have it clean the outdated cache entries. Since it will go over all the source files to precompile them, it can make a list of up to date cache entries and delete the rest.

Thoughts

This two changes aren't necessarily that hard to implement, but they are a quite important change, likely justifying a major version bump. So rather than to start writing PRs head on, I'd like to have some feedback on the idea.

@burke I saw you removed yourself from the CODEOWNERS, but if you have a bit of time your insights here would be more than welcome.

@rafaelfranca @DazWorrall I think you may have opinions or hindsights on this.

@rafaelfranca
Copy link
Member

For problem 1 I think we should make the cache invalidation configurable so we can keep mtime in development.

For problem 2 I think the solution 2.2 is the less invasive and less prone to problems and less likely to add runtime penalty.

@casperisfine
Copy link
Contributor Author

I think we should make the cache invalidation configurable so we can keep mtime in development.

I think we should first measure how much slower the digest based cache would be.

I think the solution 2.2 is the less invasive and less prone to problems and less likely to add runtime penalty.

The two proposed solutions weren't exclusive. My issue with 2.2 is that it is harder to integrate.

@kimbilida
Copy link

For Problem 1 could we use something like https://github.com/rosylilly to restore the mtime from the latest github commit? @esnunes tried it locally and it was quite slow when going over all files but we could apply it on a subset of files.

@casperisfine
Copy link
Contributor Author

could we use something like

I suppose you mean: https://github.com/rosylilly/git-set-mtime

it was quite slow when going over all files

that's what I'd expect yes.

but we could apply it on a subset of files.

I don't think there is any subset, bootsnap has one entry for every single ruby source file. You might save a bit by skipping various text files etc, but it's not as reliable.

Also note that this issue is not just a Shopify thing, the idea is to improve the situation for all users, not just us.

@kimbilida
Copy link

It could be configurable to use the get set mtime so it could be used for CI environments. How would the speed compare to to using file digests?

@schneems
Copy link

Does it sound like this issue is the same as:

heroku/heroku-buildpack-ruby#979

I originally thought that this issue was due to a different dir at boot and runtime. But I tried to reproduce that issue locally and it looks like moving an app after the cache is generated works just fine. I'm not sure what conditions reproduce it other than deploying on Heroku.

@casperisfine
Copy link
Contributor Author

Does it sound like this issue is the same as:

Not quite. The issue I'm describing is not supposed to make the cache grow. But the issue you link is another interesting thing. Since bootsnap use realpaths, with buildpacks moving the code around I suppose even the cache generated by assets:precompile can't be re-used.

casperisfine pushed a commit that referenced this issue Jan 30, 2024
Ref: #336

Bootsnap was initially designed for improving boot time
in development, so it was logical to use `mtime` to detect changes
given that's reliable on a given machine.

But is just as useful on production and CI environments, however
there its hit rate can vary a lot because depending on how the
source code and caches are saved and restored, many if not all
`mtime` will have changed.

To improve this, we can first try to revalidate using the `mtime`,
and if it fails, fallback to compare a digest of the file content.
Digesting a file, even with `fnv1a_64` is of course an overhead,
but the assumption is that true misses should be relatively rare
and that digesting the file will always be faster than compiling it.
So even if it only improve the hit rate marginally, it should be
faster overall.

Also we only recompute the digest if the file mtime changed, but
its size remained the same, which should discard the overwhelming
majority of legitimate source file changes.

Co-authored-by: Jean Boussier <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants