[AIE] add StickyWAW mutator to postscheduler/postpipeliner #222

Open: wants to merge 3 commits into aie-public
6 changes: 2 additions & 4 deletions llvm/lib/Target/AIE/AIEBaseSubtarget.cpp
@@ -50,10 +50,6 @@ static cl::opt<bool> EnablePipelinerSchedPropagateIncomingLatencies(
 static cl::opt<bool> EnableWAWStickyRegisters(
 "aie-pipeliner-waw-sticky-registers", cl::Hidden, cl::init(true),
 cl::desc("Apply sticky registers WAW dependency removal"));
-static cl::opt<unsigned> WAWStickyRegistersMemOpsThreshold(
-"aie-waw-sticky-register-mem-threshold", cl::Hidden, cl::init(4),
-cl::desc("Number of memory instructions to enable the register exclusion "
-"heuristic in WAW sticky registers dep. removal"));

 // These are debugging/testing options.

@@ -629,6 +625,8 @@ AIEBaseSubtarget::getPostRAMutationsImpl(const Triple &TT) {
 std::vector<std::unique_ptr<ScheduleDAGMutation>> Mutations;
 Mutations.emplace_back(std::make_unique<LockDelays>());
 if (!TT.isAIE1()) {
+if (EnableWAWStickyRegisters)
+Mutations.emplace_back(std::make_unique<WAWStickyRegistersEdges>());
 Mutations.emplace_back(std::make_unique<RegionEndEdges>());
 Mutations.emplace_back(std::make_unique<MemoryEdges>());
 Mutations.emplace_back(std::make_unique<MachineSchedWAWEdges>());
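A self-contained sketch of the mutation list this hunk produces. It uses a hypothetical `Mutation` stand-in rather than LLVM's `ScheduleDAGMutation`, covers only the entries visible in the hunk (the real function may add more), and assumes the last three mutations sit inside the `!TT.isAIE1()` block, since the closing brace is not shown.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::ScheduleDAGMutation: it only records a
// name so the assembled list can be inspected.
struct Mutation {
  explicit Mutation(std::string N) : Name(std::move(N)) {}
  virtual ~Mutation() = default;
  std::string Name;
};

// Mirrors the visible part of getPostRAMutationsImpl after this change.
// IsAIE1 stands in for TT.isAIE1(); EnableWAWStickyRegisters for the
// -aie-pipeliner-waw-sticky-registers option (default true).
std::vector<std::unique_ptr<Mutation>>
buildPostRAMutations(bool IsAIE1, bool EnableWAWStickyRegisters) {
  std::vector<std::unique_ptr<Mutation>> Mutations;
  Mutations.emplace_back(std::make_unique<Mutation>("LockDelays"));
  if (!IsAIE1) {
    // Added by this PR: the sticky-WAW mutator, gated by the option.
    if (EnableWAWStickyRegisters)
      Mutations.emplace_back(
          std::make_unique<Mutation>("WAWStickyRegistersEdges"));
    Mutations.emplace_back(std::make_unique<Mutation>("RegionEndEdges"));
    Mutations.emplace_back(std::make_unique<Mutation>("MemoryEdges"));
    Mutations.emplace_back(std::make_unique<Mutation>("MachineSchedWAWEdges"));
  }
  return Mutations;
}
```

With the option on, a non-AIE1 target gets five entries with the sticky-WAW mutator second; with it off, four; AIE1 only gets `LockDelays`. Assuming the list is applied in order, as `ScheduleDAGMI::postProcessDAG` does, the sticky-WAW mutator runs before the other non-AIE1 edge mutations.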
7 changes: 5 additions & 2 deletions llvm/lib/Target/AIE/AIEMachineScheduler.cpp
@@ -1212,8 +1212,9 @@ void llvm::AIEPostRASchedStrategy::buildGraph(ScheduleDAGMI &DAG, AAResults *AA,
 assert(BS.getRegions().size() == 1);
 // Try to wrap the linear schedule within II.
 // We virtually unroll the body by the stagecount, computed from rounding
-// up the length divided by II.
-NCopies = (BS.getScheduleLength() + II - 1) / II;
+// up the length divided by II, adding one more stage to account for
+// the added resource contention
+NCopies = (BS.getScheduleLength() + II - 1) / II + 1;
 }
 DEBUG_BLOCKS(dbgs() << " buildGraph, NCopies=" << NCopies << "\n");
 for (int S = 0; S < NCopies; S++) {
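The updated stage-count formula can be checked in isolation. This is a standalone sketch; the helper name `computeNCopies` is hypothetical and the cycle counts below are made up for illustration.

```cpp
#include <cassert>

// Hypothetical helper reproducing the NCopies arithmetic from buildGraph:
// ceil(ScheduleLength / II) stages, plus one extra copy for the added
// resource contention.
int computeNCopies(int ScheduleLength, int II) {
  assert(II > 0 && "initiation interval must be positive");
  int Stages = (ScheduleLength + II - 1) / II; // integer ceiling division
  return Stages + 1;
}
```

For example, a 10-cycle linear schedule with II = 4 gives ceil(10/4) = 3 stages and therefore 4 copies, where the old formula gave 3. With the extra stage, any schedule of length at least 1 now yields at least two copies from this expression.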
@@ -1254,6 +1255,8 @@ void AIEScheduleDAGMI::schedule() {
 // If it succeeds, we need to implement it, if we fail we fall back on the
 // normal loop schedule
 SchedImpl->buildGraph(*this, AA);
+postProcessDAG();
Review comment (Collaborator):
CHECK: Why do we need the call to "postProcessDAG"?

Reply (Collaborator):
Some of the DAG mutations are required for correctness, e.g. to give memory edges the correct latency.

auto &PostSWP = BS.getPostSWP();
if (PostSWP.schedule(*this, BS.FixPoint.II)) {
BS.setPipelined();
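The reply about postProcessDAG can be illustrated with a toy model. In LLVM, `ScheduleDAGMI::postProcessDAG()` applies each registered DAG mutation in order, so a DAG built by `buildGraph` and handed straight to the post-pipeliner would miss those adjustments. The `Edge`/`MiniDAG` types, `fixMemoryLatency`, and the minimum latency of 2 are all hypothetical, chosen only to mimic a memory-edge latency fix.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical miniature of a scheduling DAG: just a list of edges.
struct Edge {
  int From, To;
  bool IsMemory;
  int Latency;
};

struct MiniDAG {
  std::vector<Edge> Edges;
  // Registered mutations, modelled as callables on the DAG.
  std::vector<std::function<void(MiniDAG &)>> Mutations;

  // Modelled on ScheduleDAGMI::postProcessDAG(): apply every mutation in order.
  void postProcessDAG() {
    for (auto &M : Mutations)
      M(*this);
  }
};

// Illustrative mutation: raise memory edges to a minimum latency of 2
// (an assumed value, not an AIE number).
void fixMemoryLatency(MiniDAG &DAG) {
  for (Edge &E : DAG.Edges)
    if (E.IsMemory && E.Latency < 2)
      E.Latency = 2;
}
```

Until `postProcessDAG()` runs, the memory edge keeps its unadjusted latency; anything scheduling from the raw graph would see the wrong value.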
8 changes: 4 additions & 4 deletions llvm/test/CodeGen/AIE/aie2/schedule/postpipeliner/crash.mir
@@ -8,7 +8,7 @@
 # RUN: llc --mtriple=aie2 -O2 --start-before=postmisched %s -o - | FileCheck %s

 # This crashed the postpipeliner because it reaches NCopies=1 which causes an out of
-# bound access when setting up LCD heuristics.
+# bound access when setting up LCD heuristics.
 # The filecheck reference is the unpipelined loop

 --- |
@@ -29,13 +29,13 @@
; CHECK-NEXT: .p2align 4
; CHECK-NEXT: .LBB0_1: // %for.body
; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
; CHECK-NEXT: nopb ; lda r0, [p2, #0]; nops ; nopxm ; nopv
; CHECK-NEXT: nopx
; CHECK-NEXT: nopb ; lda r0, [p2, #0]; nops ; nopx ; mov p2, p1; nopv
; CHECK-NEXT: nopa ; nopx
; CHECK-NEXT: nop
; CHECK-NEXT: nop
; CHECK-NEXT: nop
; CHECK-NEXT: nop
; CHECK-NEXT: nop
; CHECK-NEXT: mov p2, p1
; CHECK-NEXT: .L_LEnd0:
; CHECK-NEXT: nopb ; nopa ; st r0, [p0, #0]; nopxm ; nopv
; CHECK-NEXT: // %bb.2: // %for.cond.cleanup
36 changes: 22 additions & 14 deletions llvm/test/CodeGen/AIE/aie2/schedule/postpipeliner/round.mir
@@ -34,41 +34,49 @@
; CHECK-NEXT: nop // Delay Slot 2
; CHECK-NEXT: nop // Delay Slot 1
; CHECK-NEXT: // %bb.1: // %for.body.preheader
; CHECK-NEXT: add.nc lc, r0, #-1
; CHECK-NEXT: add.nc lc, r0, #-3
; CHECK-NEXT: movxm ls, #.LBB0_2
; CHECK-NEXT: movxm le, #.L_LEnd0
; CHECK-NEXT: nopb ; vlda.ups.s32.s8 cm0, s0, [p0], #32; nops ; nopxm ; nopv
; CHECK-NEXT: nopb ; vlda.ups.s32.s8 cm1, s0, [p0], #32; nops ; nopxm ; nopv
; CHECK-NEXT: nopb ; nopa ; nops ; nopxm ; nopv
; CHECK-NEXT: nopb ; nopa ; nops ; nopxm ; nopv
; CHECK-NEXT: nopb ; nopa ; nops ; nopxm ; nopv
; CHECK-NEXT: nopb ; nopa ; nops ; nopxm ; nopv
; CHECK-NEXT: nopa ; nopb ; nopxm
; CHECK-NEXT: nopb ; vlda.ups.s32.s8 cm0, s0, [p0], #32; nops ; nopxm ; nopv
; CHECK-NEXT: nopb ; vlda.ups.s32.s8 cm1, s0, [p0], #32; nops ; nopxm ; nopv
; CHECK-NEXT: nop
; CHECK-NEXT: nop
; CHECK-NEXT: vsrs.s8.s32 wh0, cm0, s1
; CHECK-NEXT: vlda.ups.s32.s8 cm0, s0, [p0], #32; vsrs.s8.s32 wh2, cm1, s1
; CHECK-NEXT: vlda.ups.s32.s8 cm1, s0, [p0], #32
; CHECK-NEXT: nop
; CHECK-NEXT: vups.s32.s8 cm2, wh0, s1
; CHECK-NEXT: vsrs.s8.s32 wh0, cm0, s1; vups.s32.s8 cm3, wh2, s1
; CHECK-NEXT: .p2align 4
; CHECK-NEXT: .LBB0_2: // %for.body
; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
; CHECK-NEXT: vlda.ups.s32.s8 cm0, s0, [p0], #32; nopb ; vsrs.s8.s32 wh0, cm0, s1
; CHECK-NEXT: vlda.ups.s32.s8 cm1, s0, [p0], #32; vsrs.s8.s32 wh2, cm1, s1
; CHECK-NEXT: nop
; CHECK-NEXT: nop
; CHECK-NEXT: nopb ; vlda.ups.s32.s8 cm0, s0, [p0], #32; vsrs.s8.s32 wh2, cm1, s1; nopxm ; nopv
; CHECK-NEXT: vlda.ups.s32.s8 cm1, s0, [p0], #32; nopb ; vst.srs.s8.s32 cm2, s0, [p1], #32
; CHECK-NEXT: vst.srs.s8.s32 cm3, s0, [p1], #32
; CHECK-NEXT: vups.s32.s8 cm2, wh0, s1
; CHECK-NEXT: vups.s32.s8 cm3, wh2, s1
; CHECK-NEXT: nop
; CHECK-NEXT: vst.srs.s8.s32 cm2, s0, [p1], #32
; CHECK-NEXT: .L_LEnd0:
; CHECK-NEXT: nopb ; nopa ; vst.srs.s8.s32 cm3, s0, [p1], #32; nopxm ; nopv
; CHECK-NEXT: nopb ; nopa ; vsrs.s8.s32 wh0, cm0, s1; nopx ; vups.s32.s8 cm3, wh2, s1; nopv
; CHECK-NEXT: // %bb.3: // %for.cond.cleanup
; CHECK-NEXT: vsrs.s8.s32 wh0, cm0, s1; nopx
; CHECK-NEXT: nopa ; vsrs.s8.s32 wh2, cm1, s1; nopx
; CHECK-NEXT: vst.srs.s8.s32 cm2, s0, [p1], #32
; CHECK-NEXT: vst.srs.s8.s32 cm3, s0, [p1], #32
; CHECK-NEXT: vups.s32.s8 cm2, wh0, s1
; CHECK-NEXT: vsrs.s8.s32 wh0, cm0, s1; vups.s32.s8 cm3, wh2, s1
; CHECK-NEXT: vsrs.s8.s32 wh2, cm1, s1
; CHECK-NEXT: nop
; CHECK-NEXT: nop
; CHECK-NEXT: vst.srs.s8.s32 cm2, s0, [p1], #32
; CHECK-NEXT: vst.srs.s8.s32 cm3, s0, [p1], #32
; CHECK-NEXT: vups.s32.s8 cm2, wh0, s1
; CHECK-NEXT: vups.s32.s8 cm3, wh2, s1
; CHECK-NEXT: nop
; CHECK-NEXT: vst.srs.s8.s32 cm2, s0, [p1], #32
; CHECK-NEXT: vst.srs.s8.s32 cm3, s0, [p1], #32
; CHECK-NEXT: nop
; CHECK-NEXT: nop
; CHECK-NEXT: .p2align 4
; CHECK-NEXT: .LBB0_4: // %for.cond.cleanup
; CHECK-NEXT: nopa ; ret lr
6 changes: 4 additions & 2 deletions llvm/test/CodeGen/AIE/aie2/schedule/status_regs/srWAW.mir
@@ -4,11 +4,13 @@
 #
 # (c) Copyright 2024 Advanced Micro Devices, Inc. or its affiliates

-# RUN: llc -mtriple=aie2 --run-pass=postmisched --issue-limit=1 -debug-only=machine-scheduler %s -o - 2>%t.log
-# RUN: cat %t.log | FileCheck %s --check-prefix=CHECK-WAW
+# RUN: llc -mtriple=aie2 --run-pass=postmisched --issue-limit=1 \
+# RUN: -debug-only=machine-scheduler --aie-pipeliner-waw-sticky-registers=0 \
+# RUN: %s -o /dev/null 2>&1 | FileCheck %s --check-prefix=CHECK-WAW
 # REQUIRES: asserts

 # This test checks the write-after-write(WAW) dependencies
+# We have disabled the sticky version, since its dump confuses FileChecking the debug output

 ---
 # Here we have two WAW dependencies with srcarry W1->W3, W2->W3, where W3 is a live write