Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move InstKinds out to their own vector. #4683

Open
wants to merge 2 commits into
base: trunk
Choose a base branch
from

Conversation

jonmeow
Copy link
Contributor

@jonmeow jonmeow commented Dec 13, 2024

This is intended to make InstKind consume a quarter the current space (1 byte instead of 4). The expectation is this'll be roughly performance neutral, although we can start changing things to enable better performance for things that are happy to only examine the InstKind.

For a file generated by source_gen, before (with the mem usage fix):

sem_ir_.insts_.loc_ids_:
  used_bytes:      34694064
  reserved_bytes:  54525948
sem_ir_.insts_.values_:
  used_bytes:      138776256
  reserved_bytes:  218103792

After:

sem_ir_.insts_.loc_ids_:
  used_bytes:      34694064
  reserved_bytes:  54525948
sem_ir_.insts_.kinds_:
  used_bytes:      8673516
  reserved_bytes:  10747903
sem_ir_.insts_.types_and_args_:
  used_bytes:      104082192
  reserved_bytes:  163577844
...
Total:
  used_bytes:      493221460
  reserved_bytes:  650325525

And for bazel run -c opt :compile_benchmark -- --benchmark_filter=Check, before:

----------------------------------------------------------------------------------------------------------------------------
Benchmark                                                 Time             CPU   Iterations      Bytes      Lines     Tokens
----------------------------------------------------------------------------------------------------------------------------
BM_CompileAPIFileDenseDecls<Phase::Check>/256       1057130 ns      1056661 ns         1024 4.98457M/s 184.543k/s 1.04196M/s
BM_CompileAPIFileDenseDecls<Phase::Check>/1024      1960553 ns      1959331 ns         1024 16.5817M/s  499.66k/s  2.9357M/s
BM_CompileAPIFileDenseDecls<Phase::Check>/4096      5503917 ns      5500405 ns          256  25.795M/s  732.31k/s 4.33768M/s
BM_CompileAPIFileDenseDecls<Phase::Check>/16384    19112623 ns     19102923 ns           64  31.982M/s 853.953k/s 5.06682M/s
BM_CompileAPIFileDenseDecls<Phase::Check>/65536    79289121 ns     79233317 ns           16 31.8608M/s 826.168k/s 4.90462M/s
BM_CompileAPIFileDenseDecls<Phase::Check>/262144  346024763 ns    345771854 ns            8 29.5566M/s 758.098k/s 4.50093M/s

Without InstStore::Is changes:

----------------------------------------------------------------------------------------------------------------------------
Benchmark                                                 Time             CPU   Iterations      Bytes      Lines     Tokens
----------------------------------------------------------------------------------------------------------------------------
BM_CompileAPIFileDenseDecls<Phase::Check>/256       1079970 ns      1079691 ns         1024 4.87825M/s 180.607k/s 1.01974M/s
BM_CompileAPIFileDenseDecls<Phase::Check>/1024      2028099 ns      2027076 ns         1024 16.0275M/s 482.962k/s 2.83758M/s
BM_CompileAPIFileDenseDecls<Phase::Check>/4096      5584768 ns      5581419 ns          256 25.4206M/s  721.68k/s 4.27472M/s
BM_CompileAPIFileDenseDecls<Phase::Check>/16384    20294882 ns     20283454 ns           64 30.1206M/s 804.252k/s 4.77192M/s
BM_CompileAPIFileDenseDecls<Phase::Check>/65536    83486939 ns     83452746 ns           16 30.2499M/s 784.396k/s 4.65664M/s
BM_CompileAPIFileDenseDecls<Phase::Check>/262144  351386153 ns    351165409 ns            8 29.1027M/s 746.455k/s  4.4318M/s

With InstStore::Is changes:

----------------------------------------------------------------------------------------------------------------------------
Benchmark                                                 Time             CPU   Iterations      Bytes      Lines     Tokens
----------------------------------------------------------------------------------------------------------------------------
BM_CompileAPIFileDenseDecls<Phase::Check>/256       1077198 ns      1076381 ns         1024 4.89325M/s 181.163k/s 1.02287M/s
BM_CompileAPIFileDenseDecls<Phase::Check>/1024      2017288 ns      2015795 ns         1024 16.1172M/s 485.665k/s 2.85347M/s
BM_CompileAPIFileDenseDecls<Phase::Check>/4096      5578900 ns      5574719 ns          256 25.4511M/s 722.548k/s 4.27986M/s
BM_CompileAPIFileDenseDecls<Phase::Check>/16384    20038101 ns     20024204 ns           64 30.5106M/s 814.664k/s  4.8337M/s
BM_CompileAPIFileDenseDecls<Phase::Check>/65536    81702279 ns     81648274 ns           16 30.9184M/s 801.732k/s 4.75955M/s
BM_CompileAPIFileDenseDecls<Phase::Check>/262144  349500301 ns    349287794 ns            8 29.2591M/s 750.467k/s 4.45562M/s

Note, benchmarks on my machine are expected to be noisy. But it does look like an incremental performance decrease in this test case.

@jonmeow
Copy link
Contributor Author

jonmeow commented Dec 13, 2024

Note, I'll lean on @chandlerc to decide whether this is worth it. The reduction in memory locality might be a net loss, performance versus space.

@zygoloid
Copy link
Contributor

Yeah, this is what I was worried about -- a small performance decrease seems to be visible here, but we can probably get some performance benefits by avoiding looking at the operands of instructions in cases where we only need the kind. It's hard to know if that'll be a net benefit or regression unless we do the work to only look at the kind where possible.

Can you try changing InstStore::Is and InstStore::TryGetAs to look at the kind directly instead of first grabbing the whole Inst, and see if that makes any difference? (I'm not expecting a big change there, but you never know.)

@jonmeow
Copy link
Contributor Author

jonmeow commented Dec 13, 2024

Yeah, this is what I was worried about -- a small performance decrease seems to be visible here, but we can probably get some performance benefits by avoiding looking at the operands of instructions in cases where we only need the kind. It's hard to know if that'll be a net benefit or regression unless we do the work to only look at the kind where possible.

Can you try changing InstStore::Is and InstStore::TryGetAs to look at the kind directly instead of first grabbing the whole Inst, and see if that makes any difference? (I'm not expecting a big change there, but you never know.)

Should be done. It looks like a small improvement, but it's hard to tell.

Note, down this road, there's also similar methods on TypeStore. But those are harder to adjust quickly due to dependencies and the templating (since it's templated, whereas TypeStore currently relies on the cpp file for inst access methods)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants