Add in-memory caching #994

Closed
Swatinem opened this issue Jan 27, 2023 · 0 comments · Fixed by #1028
Comments

@Swatinem
Member

#979 changes the internals of the Cacher to use moka.

That crate also allows us to keep a number of cache items in memory for some time, avoiding disk access and re-loading of cache items, which, depending on the item type, can be expensive.

To fully take advantage of this, we would need to add the following things:

  • Properly implement RFC: Re-design cache keys (filesystem paths) #983 so that our caches are truly immutable.
  • Figure out how to "weigh" cache items, especially those that are variable-size (see the sketch after this list).
  • Decide on how to do expiration. We currently expose the `expiration_time` of the underlying cache file to the `CacheItemRequest`. The original idea was to combine that with the different sources that lead to the cache. But with RFC: Re-design cache keys (filesystem paths) #983, that might not be necessary anymore, as caches will be truly immutable. Therefore, we can keep the `expiration_time` completely internal to the `Cacher` abstraction and use that.
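As a rough illustration of how weighing and expiration could be wired up with moka's cache builder, here is a minimal sketch; the key/value types, the weight function, and the one-hour TTL are assumptions for this example, not the actual Symbolicator configuration:

```rust
use std::sync::Arc;
use std::time::Duration;

use moka::sync::Cache;

/// Hypothetical in-memory representation of a loaded cache item.
#[derive(Clone)]
struct InMemoryItem {
    bytes: Arc<[u8]>,
}

fn build_in_memory_cache() -> Cache<String, InMemoryItem> {
    Cache::builder()
        // Budget the cache by total weight rather than a fixed entry count.
        .max_capacity(100 * 1024)
        // Report a per-entry weight; variable-size items use their payload size.
        .weigher(|_key: &String, item: &InMemoryItem| {
            item.bytes.len().try_into().unwrap_or(u32::MAX)
        })
        // A coarse cache-wide TTL, standing in for the internal expiration logic.
        .time_to_live(Duration::from_secs(60 * 60))
        .build()
}
```

Keeping expiration internal to the `Cacher` would then mean the TTL (or a per-entry expiry) is derived from the cache file's own `expiration_time`, rather than being exposed to the `CacheItemRequest`.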

To inform the "weight" of each cache item, and to give guidance on how to size these caches, here is a table of the different caches we have and a few of their properties:

| Cache | Accesses | Cost to load | Weight |
| --- | --- | --- | --- |
| object_meta | very high: on every cfi/symcache/sourcebundle request, for every candidate | low: parsing tiny JSON | low: just a tiny JSON |
| objects | medium: when needed for conversion, for sourcebundles | mixed: depending on type; zlib, parsing manifest JSON for sourcebundle | mixed: an open FD, parsed obj container, parsed manifest JSON for sourcebundle |
| cfi | medium: for minidumps that have cfi | high: parsing the breakpad format | high: in-memory structures of parsed breakpad |
| symcache | high: for every symbolication request | low: mmap, validating headers | medium: an open FD, parsed header |
| auxdifs | mixed: for every existing symcache, depending on candidates | mixed: does XML parsing on every request | medium: an open FD, only parsed when needed |
| il2cpp | mixed: for every existing symcache, depending on candidates | low: just mmap | medium: an open FD, only parsed when needed |
| portable pdb | medium: just for .NET stack traces | low: mmap, validating headers | medium: an open FD, parsed header |

Some conclusions from this table:

The items most worth caching are object_meta, as they are accessed for every cfi/symcache/sourcebundle request, for every candidate. They are relatively cheap to parse and cheap to hold in memory.

CFI items might be worth caching, mostly because parsing them is expensive. CFI for public symbols (Microsoft, Apple) is also accessed very frequently.

Objects might not be worth caching. Objects used for cfi/symcache conversion are accessed infrequently. Sourcebundles are used a lot, but they are project-specific, and public sourcebundles shared across all requests pretty much don’t exist; however, they are quite expensive to parse.

SymCache / PortablePdb are optimized for fast parsing. Both can be shared across all requests for public symbols though.

Auxdifs / il2cpp: This needs more investigation. Probably low priority?


We should definitely cache object_meta, as it has by far the most accesses and a low in-memory weight.
CFI is worth caching, but needs more work to measure / expose its weight.

@ashwoods added the enhancement label Jan 27, 2023
Swatinem added a commit that referenced this issue Feb 13, 2023
Fixes #994 by adding the infrastructure and defaults for in-memory caching.

This implements weighing of cache items (based on a bunch of `size_of`s),
and per-item TTL based on the `ExpirationTime`.

Each cache defaults to ~100k in-memory size, which should be roughly ~1k items,
except for `object_meta`, which defaults to ~100M in-memory and is very hot,
and `cficaches`, which defaults to ~400M and is expensive to parse.
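A `size_of`-based weigher like the one this commit describes could look roughly like the following; the trait name, the example item, and the weight formula are hypothetical and only illustrate summing the fixed struct size plus its heap allocations:

```rust
use std::mem;

/// Hypothetical trait for estimating an item's in-memory footprint;
/// the name and shape are illustrative, not the actual code from #1028.
trait MemoryWeight {
    fn memory_weight(&self) -> u32;
}

/// Example item: parsed object metadata kept as a JSON string.
struct ObjectMetaItem {
    raw_json: String,
}

impl MemoryWeight for ObjectMetaItem {
    fn memory_weight(&self) -> u32 {
        // Fixed struct size plus the heap allocation behind the string.
        let bytes = mem::size_of::<Self>() + self.raw_json.capacity();
        bytes.try_into().unwrap_or(u32::MAX)
    }
}
```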