Significant Overhaul of the Interpreter's Timing Model #2235

Jaklyy · 2024-12-13T22:01:06Z

Heavily reworks the ARM9 & ARM7 timing models to greatly improve accuracy (and slaughter performance).
Builds upon my work in #2125 and uses the excellent cache implementation found in #1955 (those two should probably be merged first). Hopefully building this PR on top of those two doesn't cause any stupid or weird issues with git...? Fingers crossed?

Implements:

Cache streaming
Write buffer
Bus cycle rounding
Main RAM contention
Improvements to certain instruction timings
Memory stage cycles are now distinguished from the execute stage
Interlocks
Improvements to memory access timings
Minor improvements to DMA timings
ARM9 now only stops for DMA when accessing the bus
Fix ExMemCnt having the incorrect default state (at least for direct boot; the non-direct-boot state shouldn't matter...?). Also prevents software from toggling certain bits.
Removes a few non-existent cp15 cache commands
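
As a rough illustration of the "Bus cycle rounding" item above, here is a minimal sketch, not the code from this PR. It assumes the ARM9 clock is a power-of-two multiple of the bus clock (2x is assumed here) and that a bus access can only begin on a bus-cycle boundary, so the ARM9 timestamp is rounded up before the access cost is added. The names `kClockShift`, `RoundUpToBusCycle` and `BusAccessDone` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: log2 of the ARM9-to-bus clock ratio (1 means ARM9 = 2x bus clock).
constexpr unsigned kClockShift = 1;

// Round an ARM9 timestamp (in ARM9 cycles) up to the next bus-cycle boundary.
constexpr std::uint64_t RoundUpToBusCycle(std::uint64_t arm9Cycles,
                                          unsigned shift = kClockShift)
{
    return (arm9Cycles + ((std::uint64_t{1} << shift) - 1))
           & ~((std::uint64_t{1} << shift) - 1);
}

// Start a bus access costing `busCycles` bus cycles at ARM9 time `arm9Cycles`;
// returns the ARM9 timestamp at which the access completes.
constexpr std::uint64_t BusAccessDone(std::uint64_t arm9Cycles,
                                      std::uint64_t busCycles,
                                      unsigned shift = kClockShift)
{
    return RoundUpToBusCycle(arm9Cycles, shift) + (busCycles << shift);
}

static_assert(RoundUpToBusCycle(4) == 4, "already aligned timestamps are unchanged");
static_assert(RoundUpToBusCycle(5) == 6, "odd timestamps wait for the next bus cycle");
```

With these assumptions, an access issued at ARM9 cycle 5 that costs one bus cycle finishes at ARM9 cycle 8: one cycle is lost to rounding, and two ARM9 cycles are spent on the bus.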

Known Issues:

JIT is completely broken and will most likely need a significant amount of effort to work again.
Write Buffer is very approximate; it needs a lot more work to really be accurate...
There are actually two different types of interlock; this treats all interlocks as identical, which is wrong.
Most DSi stuff has either not been implemented or not been extensively tested yet.
There are probably oodles of regressions, freezes, and crashes I have yet to spot.
Main RAM DMA Timings are slightly worse for long DMAs.
Interpreter is roughly half the speed. This is unfortunately just a consequence of chasing high levels of accuracy, and unlikely to be fixed.
ARM7 DMA has yet to be touched.
Full ExMemCnt defaults have yet to be validated; all I know for sure is that bit 15 should be set by default. (TwilightMenu++ relies on this to boot).
Write buffer also uses a shortcut of sorts: it doesn't actually use and increment the address value passed via the FIFO (should be the same as how hardware does it?). I'm not entirely sure why, but it caused issues.
Nothing is included in savestates yet, so they may be a little broken.
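
To make the interlock items above more concrete, here is a minimal sketch of one common way to model them: record, per register, the cycle at which its value becomes usable, and stall any instruction that reads the register before then. This is an illustration under stated assumptions, not the PR's implementation. `InterlockTracker`, its members and the one-cycle load latency used in the usage note are hypothetical, and the sketch treats every interlock identically, which is the same simplification listed under Known Issues.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical per-CPU interlock bookkeeping.
struct InterlockTracker
{
    std::array<std::uint64_t, 16> ReadyAt{}; // cycle at which each register is usable
    std::uint64_t Now = 0;                   // current execute-stage timestamp

    // An instruction needs `reg` as an input. If the value is not ready yet,
    // advance time to the point where it is and return the stall length.
    std::uint64_t Use(unsigned reg)
    {
        std::uint64_t stall = 0;
        if (ReadyAt[reg] > Now)
        {
            stall = ReadyAt[reg] - Now;
            Now = ReadyAt[reg];
        }
        return stall;
    }

    // Account for the execute-stage cost of the current instruction.
    void Execute(std::uint64_t cycles) { Now += cycles; }

    // The current instruction produces `reg` late (for example a load):
    // the value only becomes usable `latency` cycles after the execute stage.
    void LoadInto(unsigned reg, std::uint64_t latency) { ReadyAt[reg] = Now + latency; }
};
```

Usage: after `t.Execute(1); t.LoadInto(0, 1);`, an immediate `t.Use(0)` returns a one-cycle stall, while placing an unrelated one-cycle instruction in between (`t.Execute(1)`) hides the latency and `t.Use(0)` returns 0.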

JesseTG · 2024-12-13T22:05:18Z

src/ARM.h

@@ -171,20 +219,48 @@ class ARM
    u32 DataRegion;
    s32 DataCycles;

-    u32 R[16]; // heh
+    alignas(64) u32 R[16]; // heh


Did you mean u64 here?

Jaklyy · 2024-12-13

the alignas? no. i explicitly meant to align it to a host cacheline, which should be 64 bytes. it seemed to give a noticeable performance boost doing so in a few places (though maybe that was just luck?)

JesseTG · 2024-12-14

In that case, I think you might want std::hardware_destructive_interference_size or std::hardware_constructive_interference_size, so that you don't need to hardcode the cacheline size.

Jaklyy · 2024-12-14

what's the difference between the two?

JesseTG · 2024-12-14

From the linked reference page:

std::hardware_destructive_interference_size: Minimum offset between two objects to avoid false sharing. Guaranteed to be at least alignof(std::max_align_t)

std::hardware_constructive_interference_size: Maximum size of contiguous memory to promote true sharing. Guaranteed to be at least alignof(std::max_align_t)

It has details and examples.
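
A minimal sketch of what that suggestion could look like, assuming the goal is for the 16-entry register array (64 bytes) to sit inside a single host cacheline. The constructive size is used because the aim is to keep one object in one line rather than to separate two objects. `kCacheLine`, `Regs` and `kRegsFitOneLine` are hypothetical names, and the 64-byte fallback is an assumption about the host for standard libraries that do not provide the constants.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Prefer the standard constant when the library provides it (C++17);
// otherwise fall back to 64 bytes, which is an assumption, not a guarantee.
#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t kCacheLine = std::hardware_constructive_interference_size;
#else
constexpr std::size_t kCacheLine = 64;
#endif

// Hypothetical stand-in for the register file member in class ARM.
struct Regs
{
    alignas(kCacheLine) std::uint32_t R[16];
};

// True when the whole array fits in one line at this alignment.
constexpr bool kRegsFitOneLine = sizeof(Regs::R) <= kCacheLine;

static_assert(alignof(Regs) >= kCacheLine, "Regs must start on a cacheline boundary");
```

Note that the value is fixed at compile time for the target the compiler assumes, so it may still differ from the cacheline size of the machine the binary ends up running on.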

Jaklyy added 30 commits June 7, 2024 23:46

fully implement r15 stores being +12 of addr

2b0ed45

idk why it took me two tries to get these instructions to work properly

7350762

fix some more instructions?

0c88720

mcr is also affected

8191f92

fix bits fixed to 0 for pu region sizing being set

5f97dfc

most cpsr bits can't actually be updated (or at least can't be read?)

3699768

clarification

659763f

imma be real, i have no idea what is going on here

849d4e5

remove out of date comments

b846c6f

more weirdness

be60c68

what the actual F*** is going on

b90d5c2

it all makes sense now...

ae0824f

ldrd is just ldm

ca04710

verified

3ddccde

also remove no longer needed variable

swp/swpb jumps work on the arm 7?

048b0b8

verify writable msr bits

4221810

track interlock cycles for load instructions

5a174a2

track interlock cycles for the ALU

aa1217a

initial implementation of interlock cycles

a973c0b

don't do interlocks for the arm7

4495576

fix performance regression for disabling interlock emulation path

debaaa0

implement correct/guess interlocks for remaining instructions

5b37ca7

im smart

f00f1f6

implement two regs i missed

a9e2c7e

verify interlocks for alu and load/store

c5258d6

remove some checks for interlock that im pretty sure can't trigger

correct interlocked reg for umlal

e6ba407

implement configurable vram bus width

f1b71fe

not implemented for direct boot

disable interlock emulation, needs more research

3583d82

improve ldm timings

109bbed

I believe this also applies to other loads as well, but currently untested.

improve stm timings

dbe00e7

need to verify if they apply to all store instructions

Jaklyy added 22 commits December 8, 2024 11:19

only recalc mpu lut if it changed

bda05a7

jakly pls

8e6755c

fix emulator hanging under certain circumstances

91752c1

tweak scheduler for better performance

0df4369

might be less accurate

...removing the (s32) fixes sign extension? ig???

1a1934d

fix writeback when rn is also rd in ldr

7a4234d

something *has* to rely on this, as stupid as it seems

fix branches being able to break the queue system

f823a92

fixes bw2

optimize one of the main loops

aa2cdc3

avoid checking T bit every instruction

33f6218

actually those do literally nothing

fe9a9ee

cacheline align register array

cbdd6a0

IM SORRY GENERIC

cache line boundary align condition lut table

918df04

micro-optimization

0111ee7

probably faster to directly access main ram?

52e1461

fix a main loop freeze; exmemcnt bit 15 starts set

8382769

fixes twilight menu

improve ExMemCnt handling and defaults

b048e0c

implement bit 10 of exmemcnt

96c8f67

clarify some more write buffer details

feb1cd5

dma rewrite 1

d341260

tweak dmas to be more accurate (actually less?)

73be2f3

probably unborks gxfifo stalls

642f085

unbork gxfifo stalls

456d07d

JesseTG reviewed Dec 13, 2024

Jaklyy added 7 commits December 13, 2024 21:32

probably not any faster

cce5070

this makes a bit more sense

a445c0d

fix the system timestamp being run wayyyy too fast

ac1d790

oh no that was covering up SO many bugs hhhhsdfghhg

disable main ram contention for arm9 dma

610ac24

caused innumerable issues will need a more comprehensive rewrite later

hopefully reduce desync potential a little?

5e94566

minor fix(?)

4ea0e60

this should fix something?

implement MR cont. for arm7 dma; also a hack?

2051d41

the hack is to make arm9 dma contention work with prior improvements to synchronization
