Bootsnap compile cache isn't as useful as it could be on CI and production #336

casperisfine · 2020-11-30T14:19:57Z

Problem 1: cache keys are `mtime` based

Bootsnap's compile cache is very efficient for development workflows, but on CI or production, it becomes almost unusable.

The main reason is that git (and most other VCS) don't store mtime, so on CI or production, unless your setup manage to preserve mtime, all the compile cache will be invalidated. And most CI / production systems start from a fresh clone of the repository.

The solution to this would be to use file digests instead of mtime, of course hashing a source file is slower than just accessing the mtime, but compared to parsing the Ruby source file, fast hash functions would still offer a major speed up.

Problem 2: the cache isn't self cleaning

The compile cache entries are stored based on the path of the source file. e.g. the cache for path/to.rb will be stored in <cache-dir>/<fnv1a_64(path)>. So if you keep persisting the cache between CI builds or production deploys, over time as you delete some source files, update gems etc, new entries will be created, but outdated ones won't be removed, which might lead to a very bloated cache.

Hence why we have a note in the README about regular flushing on the cache.

And the problem can be even worse with some deploy methods like capistrano, with which the real path of the source files change on every deploy.

So even if we were to fix the mtime issue, we'd need to address cache GC otherwise users would run into big troubles.

Here I'm not too sure what the best solution could be, but I have a few ideas

Solution 2.1: Splitting the cache

Assuming the biggest source of cache garbage is gem upgrades, we could have one compile cache directory per gem, e.g. we could store cache for $GEM_ROOT/gems/my-gem-1.2.3/lib/my-gem.rb in $GEM_ROOT/gems/my-gem-1.2.3/.bootsnap/<fnv1a_64(path)>, or even $GEM_ROOT/gems/my-gem-1.2.3/lib/my-gem.rb.bootsnap.

This way when you upgrade or remove a gem you automatically get rid of the old cache.

However:

This is assuming the gem directory is writable, that's not always the case.
It requires to lookup the gem root directory, which might be costly (unless we use the second path format)

I think that if we were to implement this, the vast majority of the GC problem would be solved, as path changes insides the application are much less likely to be frequent enough to produce the problem unless you keep the cache for a very long time.

Solution 2.2: `bootsnap precompile --clean`

This is much less of a general solution as I don't think is is likely that a large portion of users would integrate bootsnap precompile in their workflow, but in theory we could have it clean the outdated cache entries. Since it will go over all the source files to precompile them, it can make a list of up to date cache entries and delete the rest.

Thoughts

This two changes aren't necessarily that hard to implement, but they are a quite important change, likely justifying a major version bump. So rather than to start writing PRs head on, I'd like to have some feedback on the idea.

@burke I saw you removed yourself from the CODEOWNERS, but if you have a bit of time your insights here would be more than welcome.

@rafaelfranca @DazWorrall I think you may have opinions or hindsights on this.

The text was updated successfully, but these errors were encountered:

rafaelfranca · 2020-11-30T22:17:03Z

For problem 1 I think we should make the cache invalidation configurable so we can keep mtime in development.

For problem 2 I think the solution 2.2 is the less invasive and less prone to problems and less likely to add runtime penalty.

casperisfine · 2020-12-01T07:39:09Z

I think we should make the cache invalidation configurable so we can keep mtime in development.

I think we should first measure how much slower the digest based cache would be.

I think the solution 2.2 is the less invasive and less prone to problems and less likely to add runtime penalty.

The two proposed solutions weren't exclusive. My issue with 2.2 is that it is harder to integrate.

kimbilida · 2020-12-01T16:36:16Z

For Problem 1 could we use something like https://github.com/rosylilly to restore the mtime from the latest github commit? @esnunes tried it locally and it was quite slow when going over all files but we could apply it on a subset of files.

casperisfine · 2020-12-01T16:40:20Z

could we use something like

I suppose you mean: https://github.com/rosylilly/git-set-mtime

it was quite slow when going over all files

that's what I'd expect yes.

but we could apply it on a subset of files.

I don't think there is any subset, bootsnap has one entry for every single ruby source file. You might save a bit by skipping various text files etc, but it's not as reliable.

Also note that this issue is not just a Shopify thing, the idea is to improve the situation for all users, not just us.

kimbilida · 2020-12-01T21:03:06Z

It could be configurable to use the get set mtime so it could be used for CI environments. How would the speed compare to to using file digests?

schneems · 2021-01-25T22:03:28Z

Does it sound like this issue is the same as:

heroku/heroku-buildpack-ruby#979

I originally thought that this issue was due to a different dir at boot and runtime. But I tried to reproduce that issue locally and it looks like moving an app after the cache is generated works just fine. I'm not sure what conditions reproduce it other than deploying on Heroku.

casperisfine · 2021-01-26T08:01:09Z

Does it sound like this issue is the same as:

Not quite. The issue I'm describing is not supposed to make the cache grow. But the issue you link is another interesting thing. Since bootsnap use realpaths, with buildpacks moving the code around I suppose even the cache generated by assets:precompile can't be re-used.

Ref: #336 Bootsnap was initially designed for improving boot time in development, so it was logical to use `mtime` to detect changes given that's reliable on a given machine. But is just as useful on production and CI environments, however there its hit rate can vary a lot because depending on how the source code and caches are saved and restored, many if not all `mtime` will have changed. To improve this, we can first try to revalidate using the `mtime`, and if it fails, fallback to compare a digest of the file content. Digesting a file, even with `fnv1a_64` is of course an overhead, but the assumption is that true misses should be relatively rare and that digesting the file will always be faster than compiling it. So even if it only improve the hit rate marginally, it should be faster overall. Also we only recompute the digest if the file mtime changed, but its size remained the same, which should discard the overwhelming majority of legitimate source file changes. Co-authored-by: Jean Boussier <[email protected]>

casperisfine mentioned this issue Jan 26, 2021

Bootsnap cache incorrect or not being used? heroku/heroku-buildpack-ruby#979

Closed

casperisfine mentioned this issue May 22, 2023

sharing remote caching bootsnap cache dir across machines #443

Closed

casperisfine mentioned this issue Jan 30, 2024

Revalidate cache based on source digest #468

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bootsnap compile cache isn't as useful as it could be on CI and production #336

Bootsnap compile cache isn't as useful as it could be on CI and production #336

casperisfine commented Nov 30, 2020

rafaelfranca commented Nov 30, 2020

casperisfine commented Dec 1, 2020

kimbilida commented Dec 1, 2020

casperisfine commented Dec 1, 2020

kimbilida commented Dec 1, 2020

schneems commented Jan 25, 2021

casperisfine commented Jan 26, 2021

Bootsnap compile cache isn't as useful as it could be on CI and production #336

Bootsnap compile cache isn't as useful as it could be on CI and production #336

Comments

casperisfine commented Nov 30, 2020

Problem 1: cache keys are mtime based

Problem 2: the cache isn't self cleaning

Solution 2.1: Splitting the cache

Solution 2.2: bootsnap precompile --clean

Thoughts

rafaelfranca commented Nov 30, 2020

casperisfine commented Dec 1, 2020

kimbilida commented Dec 1, 2020

casperisfine commented Dec 1, 2020

kimbilida commented Dec 1, 2020

schneems commented Jan 25, 2021

casperisfine commented Jan 26, 2021

Problem 1: cache keys are `mtime` based

Solution 2.2: `bootsnap precompile --clean`