Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add global string de-duplication cache #3470

Closed
tiif opened this issue Apr 16, 2024 · 7 comments · Fixed by rust-lang/rust#127638
Closed

Add global string de-duplication cache #3470

tiif opened this issue Apr 16, 2024 · 7 comments · Fixed by rust-lang/rust#127638
Labels
A-interpreter Area: affects the core interpreter C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement E-good-first-issue A good way to start contributing, mentoring is available

Comments

@tiif
Copy link
Contributor

tiif commented Apr 16, 2024

Currently, in localtime_r and also for some cases where we raise panics, we allocate a fresh string each time. These strings never get deallocated so they just keep accumulating. Conceptually these are static strings so we can deduplicate them with a global cache.

The cache should be used everywhere that allocate_str is called with MiriMemoryKind::Machine.

@RalfJung RalfJung changed the title String de-duplication for localtime_r Add global string de-duplication cache Apr 16, 2024
@RalfJung
Copy link
Member

(I have updated the issue title and description.)

@RalfJung RalfJung added the E-good-first-issue A good way to start contributing, mentoring is available label Apr 16, 2024
@RalfJung RalfJung added C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement A-interpreter Area: affects the core interpreter labels Apr 18, 2024
@dev-ardi
Copy link

There exist many string interning crates, are you leaning towards any specific one?

@RalfJung
Copy link
Member

RalfJung commented Apr 19, 2024

None of them are usable I think, since the strings need to be interned in machine memory.

This should be just a FxHashMap<Vec<u8>, Pointer<Option<Provenance>>>.

@Cgettys
Copy link

Cgettys commented Jun 22, 2024

Isn't this also part of a more general gap in allocate_str? For example, certain string constants in rustc const evaluation?
https://github.com/rust-lang/rust/blob/f944afe3807104803976484e7ee3aff49b6ac070/compiler/rustc_const_eval/src/util/caller_location.rs#L24
"See rust-lang/rust#89920 (comment)"
I'm guessing the performance characteristics required are different, as well as what the cache "value" should be. But thought it was worth mentioning.

Also, though currently I don't think there any current MiriMemoryKind::Machine allocate_str calls where this isn't the case, isn't an additional necessary check for interning these strings that mutability is Mutability::Not? That "seems" obvious to me, but wanted to call it out.

@RalfJung
Copy link
Member

caller_location strings could also use the cache, yeah. In fact the entire caller_location could be cached, but it's harder to handle that with the same generic infrastructure.

And indeed, only immutable strings should be cached.

bors added a commit to rust-lang-ci/rust that referenced this issue Jul 15, 2024
Add cache for `allocate_str`

Best effort cache for string allocation in const eval.

Fixes [rust-lang/miri#3470](rust-lang/miri#3470).
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jul 15, 2024
Add cache for `allocate_str`

Best effort cache for string allocation in const eval.

Fixes [rust-lang/miri#3470](rust-lang/miri#3470).
bors added a commit to rust-lang-ci/rust that referenced this issue Jul 16, 2024
Add cache for `allocate_str`

Best effort cache for string allocation in const eval.

Fixes [rust-lang/miri#3470](rust-lang/miri#3470).
github-actions bot pushed a commit that referenced this issue Jul 16, 2024
Add cache for `allocate_str`

Best effort cache for string allocation in const eval.

Fixes [#3470](#3470).
lnicola pushed a commit to lnicola/rust-analyzer that referenced this issue Jul 16, 2024
Add cache for `allocate_str`

Best effort cache for string allocation in const eval.

Fixes [rust-lang/miri#3470](rust-lang/miri#3470).
@tiif
Copy link
Contributor Author

tiif commented Nov 6, 2024

Initially I am thinking of removing the FIXME in

miri/src/shims/time.rs

Lines 198 to 199 in 052bdcb

// FIXME: String de-duplication is needed so that we only allocate this string only once
// even when there are multiple calls to this function.

But it seems that this FIXME is not resolved yet. If I understand it correctly, rust-lang/rust#127638 only added de-duplication cache for allocate_str, but alloc_os_str_as_c_str is using allocate.

Is it possible to use allocate_str for alloc_os_str_as_c_str?

@RalfJung
Copy link
Member

RalfJung commented Nov 6, 2024

Is it possible to use allocate_str for alloc_os_str_as_c_str?

No. We should have allocate_bytes which allocates a byte slice, and that should use the cache. allocate_str can call that, and transmute the result to &str.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-interpreter Area: affects the core interpreter C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement E-good-first-issue A good way to start contributing, mentoring is available
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants