Improve performance of unwaited condvar_wake_one()/all()
Previously, condvar_wake_one()/all() took the condvar's internal lock before testing whether anyone was waiting. A condvar_wake with nobody waiting therefore cost a mutex_lock()+mutex_unlock() pair: 26 ns on my machine when there is no contention, but much higher (involving a context switch) when several CPUs call condvar_wake concurrently.

In this patch, we first test whether the queue head is null, and acquire the lock only if it is not. Now the condvar_wake-on-an-empty-queue micro-benchmark (see next patch) takes less than 3 ns, regardless of how many CPUs run it concurrently.

Note that the queue head we test is NOT atomic, and we do not use any memory fences. If we read the queue head and see 0, it is safe to decide that nobody is waiting and do nothing. But if we read a value != 0, we cannot do anything with it: it might be only half-set (if pointer writes are not atomic on this architecture), or it might be set while the object it points to is not yet visible (we used no memory fence to enforce any ordering). So if we see that the head is != 0, we must acquire the lock (which also provides the required memory visibility and ordering) and test the head again under the lock.
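
The check-then-lock pattern described above can be sketched roughly as follows. This is an illustration only, not the patched code: the names sketch_condvar, wait_record, enqueue() and the woken flag are hypothetical, and the actual thread wake-up is replaced by setting a flag. One deliberate difference from the patch: the sketch reads the head with a relaxed std::atomic load instead of a plain read, because an unsynchronized plain read is a data race in standard C++. A relaxed load adds no fence, so the fast path stays a single load and compare.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a per-waiter queue entry.
struct wait_record {
    wait_record* next = nullptr;
    bool woken = false;  // stands in for actually waking a thread
};

// Hypothetical condvar-like queue; the head is only written under the mutex.
struct sketch_condvar {
    std::mutex m;
    std::atomic<wait_record*> head{nullptr};
    wait_record* tail = nullptr;

    // What a waiter would do before going to sleep.
    void enqueue(wait_record* wr) {
        assert(wr != nullptr);
        std::lock_guard<std::mutex> guard(m);
        wr->next = nullptr;
        if (head.load(std::memory_order_relaxed) == nullptr) {
            head.store(wr, std::memory_order_relaxed);
        } else {
            tail->next = wr;
        }
        tail = wr;
    }

    // Returns true if a waiter was dequeued and marked woken.
    bool wake_one() {
        // Fast path: no lock, no fence. A null head means nobody is waiting.
        if (head.load(std::memory_order_relaxed) == nullptr) {
            return false;
        }
        // Non-null: the value read above is not trusted. Take the lock,
        // which provides visibility and ordering, and read the head again.
        std::lock_guard<std::mutex> guard(m);
        wait_record* wr = head.load(std::memory_order_relaxed);
        if (wr == nullptr) {
            return false;  // the queue was emptied in the meantime
        }
        head.store(wr->next, std::memory_order_relaxed);
        if (wr->next == nullptr) {
            tail = nullptr;
        }
        wr->woken = true;
        return true;
    }

    // Returns the number of waiters dequeued and marked woken.
    int wake_all() {
        if (head.load(std::memory_order_relaxed) == nullptr) {
            return 0;  // same unlocked fast path as wake_one()
        }
        std::lock_guard<std::mutex> guard(m);
        int count = 0;
        wait_record* wr = head.load(std::memory_order_relaxed);
        while (wr != nullptr) {
            wait_record* next = wr->next;  // read before marking woken
            wr->woken = true;
            ++count;
            wr = next;
        }
        head.store(nullptr, std::memory_order_relaxed);
        tail = nullptr;
        return count;
    }
};
```

In this sketch the value seen on the fast path is only compared against null and never dereferenced; everything that follows the pointer happens after the lock is held and the head has been read again.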