Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster Lynx E-STOP #1

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -146,6 +146,10 @@ protected void assertOpen()
assertOpen();
delegate.failSafe();
}
@Override public void lockNetworkLockAcquisitions()
{
delegate.lockNetworkLockAcquisitions();
}
@Override public void changeModuleAddress(LynxModule module, int newAddress, Runnable runnable)
{
assertOpen();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -694,6 +694,11 @@ protected void pingAndQueryKnownInterfaces() throws RobotCoreException, Interrup
// Transmitting and receiving
//----------------------------------------------------------------------------------------------

@Override public void lockNetworkLockAcquisitions()
{
this.networkTransmissionLock.lockAcquisitions();
}

protected void resetNetworkTransmissionLock() throws InterruptedException
{
this.networkTransmissionLock.reset();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,8 @@ are permitted (subject to the limitations in the disclaimer below) provided that
import com.qualcomm.robotcore.util.ElapsedTime;
import com.qualcomm.robotcore.util.RobotLog;

import org.firstinspires.ftc.robotcore.internal.opmode.OpModeManagerImpl;

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
Expand All @@ -54,6 +56,8 @@ public class MessageKeyedLock

private final String name;
private final Lock lock;
private final Lock acquisitionsLock;
private volatile boolean tryingToHangAcquisitions;
private final Condition condition;
private volatile LynxMessage lockOwner;
private int lockCount;
Expand All @@ -71,8 +75,23 @@ public MessageKeyedLock(String name)

public MessageKeyedLock(String name, int msAquisitionTimeout)
{
this.name = name;
/**
* Note: this lock needs to be "fair" (threads are served in a first-come, first-serve manner)
* to prevent a rogue user OpMode from continually "barging in" on the lock and thus never letting
* anyone else grab it. The primary use case for this lock needing to be fair is to allow the code
* that gives the robot controller a "lethal injection" (I love Bob's comments :D) to bust its way
* in here and try to send failsafe commands to all the Lynx modules as a last ditch effort to
* shutdown the robot before the app is restarted. The Java documentation states that a "fair" lock
* is considerably slower than a non-fair one, but I tested loop speed both before this modification
* and after, and found no quantifiable difference.
*
* Also see: {@link OpModeManagerImpl.OpModeStuckCodeMonitor.Runner#run()} // Arrghh 'protected' breaks JavaDocs
* {@link #lockAcquisitions()}
*/
this.acquisitionsLock = new ReentrantLock(true);
this.tryingToHangAcquisitions = false;
this.lock = new ReentrantLock();
this.name = name;
this.condition = this.lock.newCondition();
this.lockOwner = null;
this.lockCount = 0;
Expand Down Expand Up @@ -112,7 +131,35 @@ public void acquire(@NonNull LynxMessage message) throws InterruptedException
{
if (message == null) throw new IllegalArgumentException("MessageKeyedLock.acquire: null message");

this.lock.lockInterruptibly();
/*
* We check whether we're supposed to lock 'acquisitionsLock' interruptibly or
* not. Basically, we want user code to be able to gracefully exit when interrupted
* whenever possible. However, in the case of a rogue OpMode (that we're attempting
* to block from sending further commands to the Lynx module), rather than letting
* it burn a ridiculous amount of CPU cycles continually eating interrupted exceptions,
* we choose to instead lock un-interruptibly, so as to hang it here peacefully until
* the JVM kills it when we restart the app.
*
*/
if(tryingToHangAcquisitions)
{
this.acquisitionsLock.lock();
}
else
{
this.acquisitionsLock.lockInterruptibly();
}

try
{
this.lock.lockInterruptibly();
}
catch (Exception e)
{
this.acquisitionsLock.unlock();
throw e;
}

try {
if (this.lockOwner != message)
{
Expand Down Expand Up @@ -144,13 +191,31 @@ public void acquire(@NonNull LynxMessage message) throws InterruptedException
finally
{
this.lock.unlock();
this.acquisitionsLock.unlock();
}
}

public void release(@NonNull LynxMessage message) throws InterruptedException
{
if (message == null) throw new IllegalArgumentException("MessageKeyedLock.release: null message");

/**
* NOTE: because lockInterruptibly() is used, this has the potential to cause
* a message to never release its lock, because if we were interrupted while a
* command was in flight, then ann InterruptedException will be thrown and we
* will not ever get to the part where we clear the lock owner and release.
*
* This is never really an issue, since any messages after interruption are
* simply suppressed - see https://github.com/FIRST-Tech-Challenge/SkyStone/issues/20
* (Since the lockInterruptibly() call in {@link #acquire(LynxMessage)} would immediately throw).
*
* However, now that we try to send a failsafe command restarting the app if the user code
* doesn't exit on time, (see {@link OpModeManagerImpl.OpModeStuckCodeMonitor.Runner#run()}),
* one might think this would be an issue. (I.e. that the last ditch effort failsafe would
* have to wait until an old lock was abandoned). However, that is actually not the case,
* because the last ditch effort failsafe only runs 1000ms after a stop is requested, and
* the timeout for abandoning a lock is 500ms, so the lock will be immediately abandoned.
*/
this.lock.lockInterruptibly();
try {
if (this.lockOwner == message)
Expand Down Expand Up @@ -190,4 +255,15 @@ public void release(@NonNull LynxMessage message) throws InterruptedException
this.lock.unlock();
}
}

public void lockAcquisitions()
{
if(LynxUsbDeviceImpl.DEBUG_LOG_DATAGRAMS_LOCK)
{
logv("***ALL FUTURE ACQUISITION ATTEMPTS FROM THREADS OTHER THAN %s WILL NOW HANG!***", Thread.currentThread().getName());
}
this.acquisitionsLock.lock();
this.tryingToHangAcquisitions = true;
}

}
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ are permitted (subject to the limitations in the disclaimer below) provided that
public interface RobotCoreLynxUsbDevice
{
void failSafe();
void lockNetworkLockAcquisitions();
LynxModuleMetaList discoverModules() throws RobotCoreException, InterruptedException;
void close();
}
Original file line number Diff line number Diff line change
Expand Up @@ -477,6 +477,63 @@ protected class Runner implements Runnable {
errorWasSet = RobotLog.setGlobalErrorMsg(message);
RobotLog.e(message);

try
{
/*
* We make a last ditch effort to put any Lynx modules into failsafe
* mode before we restart the app. This has the effect of the modules
* entering failsafe mode up to 2500ms (though in reality >3000ms has
* been observed) before they otherwise would as induced by the no-comms
* timeout.
*/

final CountDownLatch lastDitchEffortFailsafeDone = new CountDownLatch(1);

new Thread(new Runnable() {
@Override
public void run() {
for (RobotCoreLynxUsbDevice dev : hardwareMap.getAll(RobotCoreLynxUsbDevice.class)) {
/*
* First, we lock network lock acquisitions. This has the effect
* of blocking any other threads from being able to acquire the network
* lock, and thus preventing anyone else from being able to send commands
* to the module behind our back.
*
* Once that's done (and that may take a bit, since there may be other
* guys queued ahead of us), we send failsafe commands to all the modules
* attached to this LynxUsbDevice (well, it does that internally, all we
* have to do is call failsafe())
*
* Then, notice that we DO NOT unlock network lock acquisitions. If we
* did, the rogue OpMode could just get right back in there and send, say,
* another setPower command after we *JUST* put the module into failsafe
* mode!
*/
dev.lockNetworkLockAcquisitions();
dev.failSafe();
}
lastDitchEffortFailsafeDone.countDown();
}
}).start();

/*
* We only wait 250ms before proceeding with the restart anyway. That way if
* for some reason the transmission code is in deadlock (or some other condition
* has happened which would cause the the above to hang) we won't HANG the code
* that's supposed to restart the app BECAUSE something hung :)
*/
if(lastDitchEffortFailsafeDone.await(250, TimeUnit.MILLISECONDS))
{
RobotLog.e("Successfully sent failsafe commands to Lynx modules before app restart");
}
else
{
RobotLog.e("Timed out while sending failsafe commands to Lynx modules before app restart");
}
}
/* Paranoia (in honor of Bob :D) */
catch (Exception ignored){}

/*
* We are giving the Robot Controller a lethal injection, try to help its operator figure out why.
*/
Expand Down