[AIEX] Schedule SWP epilogue with "free" instructions #247

andcarminati · 2024-12-10T10:57:19Z

This PR adds support for epilogue scheduling. To get there, it also adds:

Support for top-down scheduling with explicit emission cycle.
Top-down logic for AIEMachineScheduler.

I recommend reviewing the pull request in the order of commits, although some of them are closely related, so I plan to combine them in the future.

Ongoing work related to EmitFixedSUnits: we currently add all WAR and RAW dependencies related to the top-insert and the rest. However, the bot-insert handling can be changed to use bot register events as well.

llvm/lib/CodeGen/MachineScheduler.cpp

andcarminati · 2024-12-10T12:08:07Z

llvm/lib/Target/AIE/AIEBaseSubtarget.cpp

+      for (const auto &Bundle : Bundles) {
+        for (MachineInstr *SrcMI : Bundle.getInstrs()) {
+          for (unsigned OpNum = 0; OpNum < SrcMI->getNumOperands(); OpNum++) {
+            unsigned SrcClass = SrcMI->getDesc().getSchedClass();


nit: const.

nit: invariant, can be declared one loop level up.

andcarminati · 2024-12-10T12:08:22Z

llvm/lib/Target/AIE/AIEBaseSubtarget.cpp

+            std::optional<unsigned> OptSrcCycle =
+                InstrItins->getOperandCycle(SrcClass, OpNum);
+            assert(OptSrcCycle);
+            int Latency = *OptSrcCycle;


nit: const.
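A minimal sketch of how the hunk could look with the nits on it applied (const locals, the per-instruction scheduling class hoisted out of the operand loop). Instr and getOperandCycle are simplified, hypothetical stand-ins for MachineInstr and the itinerary lookup, not the real LLVM API.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-in for MachineInstr.
struct Instr {
  unsigned SchedClass;
  unsigned NumOperands;
};

// Hypothetical lookup: operand OpNum of class SchedClass is accessed in cycle
// SchedClass + OpNum; class 0 is assumed to have no itinerary data.
std::optional<unsigned> getOperandCycle(unsigned SchedClass, unsigned OpNum) {
  if (SchedClass == 0)
    return std::nullopt;
  return SchedClass + OpNum;
}

// Returns the largest operand cycle found in the bundle.
int maxOperandCycle(const std::vector<Instr> &Bundle) {
  int Max = 0;
  for (const Instr &MI : Bundle) {
    // Invariant across the operands of MI, so computed once per instruction.
    const unsigned SrcClass = MI.SchedClass;
    for (unsigned OpNum = 0; OpNum < MI.NumOperands; ++OpNum) {
      const std::optional<unsigned> OptSrcCycle =
          getOperandCycle(SrcClass, OpNum);
      assert(OptSrcCycle && "missing itinerary data");
      const int Latency = static_cast<int>(*OptSrcCycle);
      if (Latency > Max)
        Max = Latency;
    }
  }
  return Max;
}
```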

andcarminati · 2024-12-10T12:11:09Z

llvm/lib/Target/AIE/AIEMachineScheduler.h

@@ -170,6 +175,11 @@ class AIEPostRASchedStrategy : public PostGenericScheduler {
  // After scheduling a block, fill in nops, apply bundling, etc.
  void commitBlockSchedule(MachineBasicBlock *BB);

+  // This function returns true when it is possible to continue
+  // with top-down without entering in loop because all remaining instructions


nit: infinite loop.

Referring to an earlier commit, you say 'all remaining instructions'. Implementing it that way rather than focusing on the N=1 case with a delayslot would improve readability.

martien-de-jong · 2024-12-10T12:03:19Z

llvm/lib/CodeGen/MachineScheduler.cpp

+  // We want to insert above it.
+  return std::lower_bound(IsTopNode ? begin() : bottom(),
+                          IsTopNode ? top() : end(), *EmissionCycle,
+                          HasGreaterOrLessOrEqEmissionCycle);


yeah just split in two lower_bound calls.
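One way the split could look, as a sketch only. Node, the two helper names and the assumed ordering of each zone (increasing cycles in the top part, decreasing cycles in the bottom part) are hypothetical and not taken from the PR.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, simplified stand-in for a scheduled node.
struct Node {
  int EmissionCycle;
};

using NodeIter = std::vector<Node>::const_iterator;

// Top zone, assumed sorted by increasing emission cycle: returns the first
// node whose cycle is not smaller than Cycle.
NodeIter findTopInsertPos(NodeIter Begin, NodeIter Top, int Cycle) {
  return std::lower_bound(
      Begin, Top, Cycle,
      [](const Node &N, int C) { return N.EmissionCycle < C; });
}

// Bottom zone, assumed sorted by decreasing emission cycle: returns the first
// node whose cycle is not greater than Cycle.
NodeIter findBotInsertPos(NodeIter Bottom, NodeIter End, int Cycle) {
  return std::lower_bound(
      Bottom, End, Cycle,
      [](const Node &N, int C) { return N.EmissionCycle > C; });
}
```

Each call then carries a single, plainly named comparison, instead of one comparator that has to encode both directions.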

martien-de-jong · 2024-12-10T12:38:34Z

llvm/unittests/CodeGen/ScheduleDAGMITestUtils.cpp

-      IsPreRA(IsPreRA), SchedZone(SchedBoundary::BotQID, "Zone") {}
+      IsPreRA(IsPreRA),
+      SchedZone(IsTopDown ? SchedBoundary::TopQID : SchedBoundary::BotQID,
+                "Zone") {}


It looks as if "Zone" could be more descriptive

martien-de-jong · 2024-12-10T12:44:27Z

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

+  const BlockState &LBS = getBlockState(Loop);
+
+  // Epilogues of pipelined loops should emit the bundles swp epilog.
+  // in a dedicated exit. If there isn't one, spawn a new block,


nit: emit the bundles of the swp epilog in a dedicated exit.

martien-de-jong · 2024-12-10T13:01:13Z

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

+        if (getBlockState(S).Kind == BlockType::Loop) {
+          getBlockState(L).Kind = BlockType::Epilogue;
+        }
+      });


as discussed, perhaps something along the lines of: if (!any_of(predecessors, IsLoop)) BS.Kind = BlockType::Regular;
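A self-contained sketch of that suggestion, assuming a block has been tentatively marked as Epilogue and gets demoted when none of its predecessors is a loop. Block and demoteIfNoLoopPredecessor are hypothetical stand-ins for the block state and its update.

```cpp
#include <algorithm>
#include <vector>

enum class BlockType { Regular, Loop, Epilogue };

// Hypothetical, simplified block with its kind and predecessor list.
struct Block {
  BlockType Kind;
  std::vector<const Block *> Preds;
};

// Resets the kind to Regular unless at least one predecessor is a loop.
void demoteIfNoLoopPredecessor(Block &B) {
  const auto IsLoop = [](const Block *Pred) {
    return Pred->Kind == BlockType::Loop;
  };
  if (!std::any_of(B.Preds.begin(), B.Preds.end(), IsLoop))
    B.Kind = BlockType::Regular;
}
```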

martien-de-jong · 2024-12-10T13:04:42Z

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

-    ArrayRef<MachineBundle> TopFixedBundles;
+    ArrayRef<MachineBundle> TopFixedBundles =
+        RegionBegin == BB->begin() ? ArrayRef<MachineBundle>(BS.TopInsert)
+                                   : ArrayRef<MachineBundle>();


Check: TopFixedBundles was empty before, triggering no further action.

martien-de-jong · 2024-12-10T13:19:23Z

llvm/lib/Target/AIE/AIEMachineScheduler.cpp

-    const int DeltaCycles = CurrCycle - BotReadyCycle;
-    return FixedSU == &SU && DeltaCycles >= MinDelta;
+    if (Zone.isTop()) {
+      return FixedSU == &SU && CurrCycle == TopReadyCycle;


nit: early return false on FixedSU != &SU
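A sketch of the early-return shape, with the scheduler state flattened into a hypothetical FixedQuery struct. The bottom-up branch is reconstructed from the removed lines of the hunk and is an assumption about what the else path looks like.

```cpp
// Hypothetical, simplified stand-in for SUnit.
struct SUnit {};

// Hypothetical bundle of the values the predicate needs.
struct FixedQuery {
  const SUnit *FixedSU;
  bool IsTop;
  int CurrCycle;
  int TopReadyCycle;
  int BotReadyCycle;
  int MinDelta;
};

bool isFixedAndReady(const SUnit &SU, const FixedQuery &Q) {
  // Bail out first: everything below only matters for the fixed SU itself.
  if (Q.FixedSU != &SU)
    return false;
  if (Q.IsTop)
    return Q.CurrCycle == Q.TopReadyCycle;
  const int DeltaCycles = Q.CurrCycle - Q.BotReadyCycle;
  return DeltaCycles >= Q.MinDelta;
}
```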

gbossu · 2024-12-10T13:43:40Z

llvm/lib/Target/AIE/AIEMachineScheduler.cpp

@@ -639,7 +640,7 @@ void AIEPostRASchedStrategy::commitBlockSchedule(MachineBasicBlock *BB) {

  // Safety margin, swp epilogue
  // Note that the prologue is handled in a different way. See enterMBB.


Comment is out of date, we now only handle the safety margin here.

Curious: Can we have both a safety margin and a top-fixed region? If not, can we assert it doesn't happen?

We cannot have both! If we need to supply a safety margin for an SWP loop, it means the schedule is incorrect. We cannot calculate the safety margin for an SWP loop without triggering this assert:

llc: ../llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp:903: auto llvm::AIE::InterBlockScheduling::getCyclesToRespectTiming(const llvm::AIE::BlockState &, const llvm::AIE::BlockState &)::(anonymous class)::operator()(const llvm::AIE::Region &) const: Assertion `R.top_fixed_instrs().empty() && "SWP epilogue already emitted?"' failed. + /scratch/llvm-aie/build-public-mem/bin/FileCheck /scratch/llvm-aie/llvm/test/CodeGen/AIE/aie2/schedule/postpipeliner/add-store.mir

martien-de-jong · 2024-12-10T14:16:35Z

llvm/lib/Target/AIE/AIEMachineScheduler.cpp

+// This function returns false when the available queue is empty and there is a
+// single instruction in the pending queue that has a delay slot. Continuing
+// with a top-down approach in this scenario would lead to an infinite loop,
+// since instructions with delay slots are never available for the top zones.


nit: I think the last observation is more important. In fact, progress is blocked if no instruction in the pending queue can become available in top down. The fact that currently only delayslot instructions apply and that we can only have one delay slot instruction in a region is a coincidence

I'm wondering if we should remove the instruction from the pending queue altogether. Basically never have it in the queues of the Top zone.

martien-de-jong · 2024-12-10T14:18:22Z

llvm/lib/Target/AIE/AIEMachineScheduler.cpp

+// single instruction in the pending queue that has a delay slot. Continuing
+// with a top-down approach in this scenario would lead to an infinite loop,
+// since instructions with delay slots are never available for the top zones.
+bool AIEPostRASchedStrategy::canContinueTopDown() {


nit: I would revert the logic value, e.g. mustSwitchToBottomUp

martien-de-jong · 2024-12-10T14:25:19Z

llvm/lib/Target/AIE/AIEMachineScheduler.cpp

@@ -570,6 +595,10 @@ bool AIEPostRASchedStrategy::isAvailableNode(SUnit &SU, SchedBoundary &Zone,
  if (isFixedSU(SU, !Zone.isTop()))
    return false;

+  // Instruction with delay slot should bever be scheduled in top-down.


nit: Instructions

martien-de-jong · 2024-12-10T14:34:18Z

llvm/lib/Target/AIE/AIEMachineScheduler.cpp

+  // Instruction with delay slot should bever be scheduled in top-down.
+  if (Zone.isTop() && SU.getInstr()->hasDelaySlot())
+    return false;
+


Nit: It would be nice to have a named predicate for this, like 'doesNotProgress' that is used both here and in the logic to switch to BottomUp
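A rough sketch of how one named predicate could serve both places. Instr and the function names are hypothetical, and the queues are reduced to plain vectors; the only "never available in top-down" condition modelled is the delay slot, as in the current PR.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, simplified stand-in for a schedulable instruction.
struct Instr {
  bool HasDelaySlot;
};

// The single place that knows which instructions can never be scheduled
// top-down. Today this is only "has a delay slot".
bool doesNotProgressTopDown(const Instr &I) { return I.HasDelaySlot; }

// Use 1: availability check for the top zone.
bool isAvailableTopDown(const Instr &I) { return !doesNotProgressTopDown(I); }

// Use 2: top-down is stuck when nothing is available and nothing in the
// pending queue can ever become available.
bool mustSwitchToBottomUp(const std::vector<Instr> &Available,
                          const std::vector<Instr> &Pending) {
  if (!Available.empty() || Pending.empty())
    return false;
  return std::all_of(Pending.begin(), Pending.end(), doesNotProgressTopDown);
}
```

Phrased this way, the switch condition no longer depends on there being exactly one delay-slot instruction in the region.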

martien-de-jong · 2024-12-10T14:47:06Z

llvm/lib/Target/AIE/AIEBaseSubtarget.cpp

+    unsigned getMaxSrcOperandLatency(const MachineInstr &MI) const {
+      unsigned MaxLatency = 0;
+      for (const MachineOperand &MO : MI.all_uses()) {
+        if (!MO.isReg() || !MO.isUse())


Can one of all_uses() be anything else?

martien-de-jong · 2024-12-10T14:54:28Z

llvm/lib/Target/AIE/AIEBaseSubtarget.cpp


    // First, create SUnits for all "fixed" instructions
    // Those will be chained from/to the EntrySU/ExitSU to ensure they are
    // placed in the correct cycle. The scheduler will enforce that these fixed
    // SUnits get placed exactly at their depth (for the Top zone) or height
    // (for the Bot zone).
+    SUnit *Pred = &DAG->EntrySU;
+    for (MachineInstr &MI : CurRegion.top_fixed_instrs()) {


Perhaps mention that we iterate over bundles

llvm/lib/Target/AIE/AIEMachineScheduler.cpp

gbossu · 2024-12-10T15:04:27Z

llvm/lib/Target/AIE/AIEBaseSubtarget.cpp

@@ -359,6 +442,65 @@ class EmitFixedSUnits : public ScheduleDAGMutation {
          AIE::maxLatency(&MI, *TII, *ItinData, /*IncludeStages=*/true));
      FixedDepSU->addPred(Dep, /*Required=*/true);
    }
+
+    // We only need to focus on top-fixed instructions when there is an Epilog


Nit: Epilogue

martien-de-jong · 2024-12-10T15:00:06Z

llvm/lib/Target/AIE/AIEBaseSubtarget.cpp

+    RAT.computeAvailabilityCycles(LoopTimedBundles, /*PastTheEndCycles*/ true);
+
+    auto IsNotTopFixedSU = [Scheduler](const SUnit &SU) {
+      return !Scheduler->isFixedSU(SU, true);


nit: /*IsTop=*/true ?

martien-de-jong · 2024-12-10T15:05:15Z

llvm/lib/Target/AIE/AIEBaseSubtarget.cpp

+      const MachineInstr &MI = *FixedSU.getInstr();
+      if (const unsigned Latency = RAT.getMaxSrcOperandLatency(MI)) {
+        SDep Dep(&FixedSU, SDep::Artificial);
+        int latency =


nit: Latency

martien-de-jong · 2024-12-10T15:16:52Z

llvm/lib/Target/AIE/Utils/AIELoopUtils.cpp

+  }
+  // Otherwise, the loop is the fallthrough predecessor by construction
+  for (auto *Pred : MBB.predecessors()) {
+    if (Pred->isLayoutSuccessor(&MBB)) {


Hmm. 'By Construction' only holds true for the InterBlock construction.

gbossu · 2024-12-10T15:16:58Z

llvm/lib/Target/AIE/AIEBaseSubtarget.cpp

    const BlockState &BS =
        Scheduler->getInterBlock().getBlockState(DAG->getBB());
    const Region &CurRegion = BS.getCurrentRegion();
+    RegAvailabilityTracker RAT{ItinData, TRI};


Love the name 🤣

It is funny indeed.... ;-)

andcarminati · 2024-12-10T15:20:32Z

llvm/lib/Target/AIE/AIEBaseSubtarget.cpp

+    // separate mutator, doing so could be costly, as it would prevent the
+    // creation of multiple edges from EntrySU to each free instruction that
+    // depends on both timed regions (TopFixed and LoopTimed).
+    RAT.computeAvailabilityCycles(LoopTimedBundles, /*PastTheEndCycles*/ true);


Fact: we don't need a full sweep in LoopTimedBundles.

gbossu · 2024-12-10T15:21:57Z

llvm/lib/Target/AIE/AIEBaseSubtarget.cpp

@@ -314,20 +315,102 @@ class RegionEndEdges : public ScheduleDAGMutation {
 /// "fixed" SUnits.
 class EmitFixedSUnits : public ScheduleDAGMutation {
 public:
+  struct RegAvailabilityTracker {


I would even go a bit further and compute a "register event view". As a first conservative version:

Every instruction would create "read" events in its first cycle for every input operand

and "write" events in its last cycle (determined from maxLatency()) for every output operand

This would allow us to use the view for all cases:

Deps between a non-pipelined loop and its epilogue

Deps between top fixed and free instructions

Deps between free and bottom fixed instructions

etc.

What do you think? This could later become the base of timing-aware live ranges if we ever do register allocation after scheduling.

The downside would be: it cannot be called RAT anymore. REV is still quite cool though.
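To make the conservative version concrete, here is a small sketch of how such events could be generated. All types are hypothetical, and placing the write at IssueCycle + MaxLatency is an assumption about what "last cycle" means.

```cpp
#include <string>
#include <vector>

enum class EventKind { Read, Write };

// One register access at a given cycle.
struct RegEvent {
  std::string Reg;
  int Cycle;
  EventKind Kind;
};

// Hypothetical, simplified instruction: issue cycle, a stand-in for the
// value of maxLatency(), and its register uses and defs.
struct Instr {
  int IssueCycle;
  int MaxLatency;
  std::vector<std::string> Uses;
  std::vector<std::string> Defs;
};

// Conservative view: reads happen in the first cycle, writes in the last.
std::vector<RegEvent> buildRegisterEvents(const std::vector<Instr> &Instrs) {
  std::vector<RegEvent> Events;
  for (const Instr &I : Instrs) {
    for (const std::string &R : I.Uses)
      Events.push_back({R, I.IssueCycle, EventKind::Read});
    const int LastCycle = I.IssueCycle + I.MaxLatency;
    for (const std::string &R : I.Defs)
      Events.push_back({R, LastCycle, EventKind::Write});
  }
  return Events;
}
```

In this sketch an instruction with two outputs gets both writes at the same late cycle, even if one of them is produced earlier.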

Humm, interesting. But I miss the point about the usage of maxLatency() here. For example, a post-inc load produces two outputs in different cycles, so I think it will be too pessimistic.

Hi @gbossu, I believe we should adopt a KISS approach for this REV, given our current requirements. Specifically, we need to have a clear understanding of:

The last cycle in which a register is defined - top and bot part.

The first cycle in which a register is read - top and bot part.

With this information, we can replace the findEarliestRef method by directly connecting free SUs to ExitSU. Although this may slightly increase the cost for isAvailableNode due to a higher number of pending SUs, it will simplify the process of comparing all free instructions against the BotFixed bundles to identify the first reference.

I propose creating this event view as a separate class, outside of the subtarget, so that we can extend it as needed in the future.

What are your thoughts?
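A possible shape for such a standalone class, as a sketch only. RegisterEventView and its methods are hypothetical names; registers are plain strings here, and one instance per fixed part (top and bot) is assumed.

```cpp
#include <algorithm>
#include <map>
#include <optional>
#include <string>

// Keeps only the two facts listed above, per register: the last cycle in
// which it is defined and the first cycle in which it is read.
class RegisterEventView {
  std::map<std::string, int> LastDef;
  std::map<std::string, int> FirstRead;

public:
  void recordDef(const std::string &Reg, int Cycle) {
    auto [It, Inserted] = LastDef.try_emplace(Reg, Cycle);
    if (!Inserted)
      It->second = std::max(It->second, Cycle);
  }

  void recordRead(const std::string &Reg, int Cycle) {
    auto [It, Inserted] = FirstRead.try_emplace(Reg, Cycle);
    if (!Inserted)
      It->second = std::min(It->second, Cycle);
  }

  // Empty when the register is never defined in this part.
  std::optional<int> lastDefCycle(const std::string &Reg) const {
    const auto It = LastDef.find(Reg);
    if (It == LastDef.end())
      return std::nullopt;
    return It->second;
  }

  // Empty when the register is never read in this part.
  std::optional<int> firstReadCycle(const std::string &Reg) const {
    const auto It = FirstRead.find(Reg);
    if (It == FirstRead.end())
      return std::nullopt;
    return It->second;
  }
};
```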

martien-de-jong · 2024-12-10T15:22:34Z

llvm/lib/Target/AIE/Utils/AIELoopUtils.h

@@ -48,6 +48,10 @@ getSingleBlockLoopMBBs(const MachineFunction &MF);
 /// Check if this block is a single block loop.
 bool isSingleMBBLoop(const MachineBasicBlock *MBB);

+/// Considering that MBB has a single predecessor that is a loop
+/// and also layout predecessor, return it.


Actually, it can return a layout predecessor that is not a loop, a unique predecessor that is not a loop or a null pointer (which is also not a loop)

This was just a refactoring. Any free use is dangerous, so I put this comment. Maybe we should name it as getLayoutPredecessor and then we don't care if it is a loop or not. In this case, it is not a loop utils function anymore.... Any suggestion?

Discussed offline: it is enough to assert isSingleMBBLoop.
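For illustration, a sketch of the helper with the assertion placed inside it. Whether the assert belongs in the helper or at the call site is not settled by the discussion, and Block and getLoopPredecessor are hypothetical simplifications.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for MachineBasicBlock.
struct Block {
  bool IsSingleBlockLoop = false;
  std::vector<Block *> Preds;
  Block *LayoutPred = nullptr;
};

// Returns the unique predecessor, else the layout (fallthrough) predecessor,
// else nullptr. Any block returned is asserted to be a single-block loop.
Block *getLoopPredecessor(const Block &MBB) {
  Block *Candidate = nullptr;
  if (MBB.Preds.size() == 1) {
    Candidate = MBB.Preds.front();
  } else {
    for (Block *Pred : MBB.Preds)
      if (Pred == MBB.LayoutPred)
        Candidate = Pred;
  }
  assert((!Candidate || Candidate->IsSingleBlockLoop) &&
         "expected a single-block loop predecessor");
  return Candidate;
}
```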

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

llvm/test/CodeGen/AIE/aie2/schedule/postpipeliner/load-add-store-renamed.mir

andcarminati · 2024-12-11T10:07:20Z

llvm/lib/Target/AIE/AIEBaseSubtarget.cpp

+      for (const MachineOperand &MO : MI.all_uses()) {
+        if (!MO.isReg() || !MO.isUse())
+          continue;
+        for (MCRegAliasIterator Ali(MO.getReg(), TRI, true); Ali.isValid();


Don't use the alias iterator here, because we populated RegisterToCycle with all aliases, so we are creating false alias cases, like bmh0 aliasing to bml0. Even better is to not populate RegisterToCycle with aliases and use the alias iterator here.
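A toy example of the false alias. The alias table is hypothetical: it assumes bml0 and bmh0 each alias a common register bm0 but not each other, and every list includes the register itself.

```cpp
#include <algorithm>
#include <string>
#include <vector>

using Reg = std::string;

// Hypothetical alias table (self included).
std::vector<Reg> aliasesOf(const Reg &R) {
  if (R == "bm0")
    return {"bm0", "bml0", "bmh0"};
  if (R == "bml0")
    return {"bml0", "bm0"};
  if (R == "bmh0")
    return {"bmh0", "bm0"};
  return {R};
}

// Def recorded under all of its aliases AND use looked up through all of its
// aliases: the two expansions meet in the shared register.
bool conflictsDoubleExpansion(const Reg &Def, const Reg &Use) {
  const std::vector<Reg> Recorded = aliasesOf(Def);
  for (const Reg &A : aliasesOf(Use))
    if (std::find(Recorded.begin(), Recorded.end(), A) != Recorded.end())
      return true;
  return false;
}

// Only the def itself recorded, use looked up through its aliases.
bool conflictsSingleExpansion(const Reg &Def, const Reg &Use) {
  const std::vector<Reg> UseAliases = aliasesOf(Use);
  return std::find(UseAliases.begin(), UseAliases.end(), Def) !=
         UseAliases.end();
}
```

With this table, a def of bmh0 and a use of bml0 conflict under the double expansion but not under the single one, while a def of bm0 still conflicts with a use of bml0 in both.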

Similar to what is done for bottom-up.

Only for free instructions.

This commit prepares to schedule top fixed bundles. We also create dedicated loop exits early, handling new blocks along with their corresponding block states.

If we have TopFixed instructions, we start top-down and switch to bottom-up once we have filled as many of the slots related to those instructions as possible. Special care is needed for instructions with delay slots and for bottom-fixed instructions.

andcarminati · 2024-12-12T13:55:41Z

llvm/lib/Target/AIE/AIEBaseSubtarget.cpp

+    // into account MaxLatency.
+    for (SUnit &FixedSU : make_filter_range(DAG->SUnits, IsTopFixedSU)) {
+      const MachineInstr &MI = *FixedSU.getInstr();
+      if (const unsigned Latency = RAT.getMaxSrcOperandLatency(MI)) {


This if was wrongly copy pasted. We need just maxLatency here.

Also add all related dependencies to have safety margins.

gbossu · 2024-12-13T13:21:50Z

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

+    if (DedicatedExit == BB) {
+
+      // Trim excedent empty bundles.
+      while (BS.TopInsert.back().empty()) {


Note: With the latest changes to the post-pipeliner, it seems we can end up with a pipeline of 1 stage, essentially meaning the loop isn't pipelined. It's probably an oversight, and the loop should not have been considered as isPipelined(). But still, it makes the BS.TopInsert.back() code above crash.

I'd suggest understanding the root cause (you can check llvm/test/CodeGen/AIE/aie2/schedule/postpipeliner/crash.mir which now crashes again 😆), and adding an assert in e.g. isPipelined() that if a loop is pipelined, it has non-empty top and bottom inserts.
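Independently of the root cause, the trimming loop itself can be made safe against an empty TopInsert. A minimal sketch, with Bundle reduced to a hypothetical vector of instruction ids:

```cpp
#include <vector>

// Hypothetical stand-in for MachineBundle: a list of instruction ids.
using Bundle = std::vector<int>;

// Drops trailing empty bundles. Checking empty() first avoids calling back()
// on an empty vector, which is undefined behaviour.
void trimTrailingEmptyBundles(std::vector<Bundle> &TopInsert) {
  while (!TopInsert.empty() && TopInsert.back().empty())
    TopInsert.pop_back();
}
```

This only hides the symptom; the suggested assert in isPipelined() would still be needed to catch the 1-stage case.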

andcarminati requested review from abhinay-anubola, abnikant, gbossu, khallouh, konstantinschwarz, martien-de-jong, SagarMaheshwari99 and stephenneuendorffer as code owners December 10, 2024 10:57

andcarminati marked this pull request as draft December 10, 2024 10:57

andcarminati force-pushed the andreu.swp.epilogue.scheduling branch 2 times, most recently from a905dec to 4776f1d Compare December 10, 2024 12:05

andcarminati added 5 commits December 11, 2024 12:16

[CodeGen][NFC] Add support for explicit emission cycle for top-down

13a77ec

Similar to what is done for bottom-up.

[AIEX] Add support for TD DeltaCycles in AIEMachineScheduler

78c4564

Only for free instructions.

[AIEX] Add TopInsert bundles to the MBB before the scheduling

db236fe

This commit prepares to schedule top fixed bundles. We also create dedicated loop exits early, handling new blocks along with their corresponding block states.

[AIEX] Extend isAvailableNode to handle top-fixed SUs

2327401

andcarminati force-pushed the andreu.swp.epilogue.scheduling branch from 4776f1d to cc37cc3 Compare December 12, 2024 09:20

andcarminati force-pushed the andreu.swp.epilogue.scheduling branch from cc37cc3 to 4168aba Compare December 12, 2024 16:05

andcarminati added 2 commits December 12, 2024 16:10

[AIEX] Create SUs for top-fixed instructions

d003841

Also add all related dependencies to have safety margins.

[AIEX] Trim unnecessary empty bundles in TopInsert

c940c2e

andcarminati force-pushed the andreu.swp.epilogue.scheduling branch from 4168aba to c940c2e Compare December 12, 2024 16:11

Xilinx deleted a comment from martien-de-jong Dec 13, 2024
