replication: Don't use majority rule for old entries #302
Conversation
Force-pushed from 4c5f25c to 8ff7e21.
I've been thinking about a different approach to this. In normal operation no entries are appended during term 1, because all nodes start out as followers and the first leader will have incremented its term to 2. If we could rely on this always being the case, we could omit the barrier at the beginning of term 2, because there are guaranteed to be no entries before that (except the dummy entry at index 0, which all nodes share). That would mean that all of the integration (and fuzzy) tests that only cover one election -- that is, most of them -- wouldn't need updating, because there wouldn't be an extra barrier entry in play. Convenient!

I tried implementing this, and ran into the problem that some of the tests write entries to the log before starting the cluster (see raft/test/integration/test_replication.c, lines 460 to 481 at 24465ee). When the cluster starts, the leader will have an entry from term 1, which breaks the reasoning above. I could modify this and the other tests that write to the log directly, but the question remains whether we're okay imposing the "no entries in term 1" condition on all raft consumers. @MathieuBordere @freeekanayaka thoughts?
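For concreteness, a minimal sketch in C of the idea being floated here, under the "no entries in term 1" assumption; `needs_initial_barrier()` and `last_log_index()` are hypothetical names, not functions in the raft codebase:

```c
#include <raft.h>
#include <stdbool.h>

/* Hypothetical helper -- not part of the raft library. */
static raft_index last_log_index(const struct raft *r);

/* Sketch: a new leader could skip the initial no-op barrier when it starts
 * term 2, because the only possible earlier entry is the dummy entry at
 * index 0 that all nodes share -- provided nothing was ever appended during
 * term 1. For any later term, a barrier is wanted whenever there may be
 * entries that are not yet known to be committed. */
static bool needs_initial_barrier(const struct raft *r)
{
    if (r->current_term == 2) {
        return false; /* Relies on the "no entries in term 1" assumption. */
    }
    return last_log_index(r) > r->commit_index;
}
```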
I'm not sure what you mean by "consumers" -- basically the tests? Because, as you point out, in real-world operation there should be no entries at term 1 for any real-world "consumer". That being said, if you are exploiting this property exclusively in order to fix a lot of tests, perhaps it'd be better to bite the bullet and fix the tests even if it's laborious. I don't have a clear idea of the type of failure that occurs though; if you can paste an example that might help.
(GitHub ate this comment the first time around, ugh.)
Right, I guess I just wanted to check that I wasn't missing some way to smuggle in log entries at term 1 using the public API.
That's fair, it's definitely a bit of a hack. The test failures that crop up when adding a barrier at term 2 are mostly in places where we call CLUSTER_STEP_UNTIL_APPLIED with a hard-coded index, for example:

```diff
TEST(membership, addNonVoting, setup, tear_down, 0, _params)
{
    struct fixture *f = data;
    const struct raft_server *server;
    struct raft *raft;
    CLUSTER_ADD(&f->req);
-   CLUSTER_STEP_UNTIL_APPLIED(CLUSTER_LEADER, 2, 2000);
+   CLUSTER_STEP_UNTIL_APPLIED(CLUSTER_LEADER, 3, 2000);
    /* Then promote it. */
    CLUSTER_ASSIGN(&f->req, RAFT_STANDBY);
-   CLUSTER_STEP_UNTIL_APPLIED(CLUSTER_N, 3, 2000);
+   CLUSTER_STEP_UNTIL_APPLIED(CLUSTER_N, 4, 2000);
    raft = CLUSTER_RAFT(CLUSTER_LEADER);
    server = &raft->configuration.servers[CLUSTER_N - 1];
    munit_assert_int(server->id, ==, CLUSTER_N);
    return MUNIT_OK;
}
```

These changes aren't complicated (and I have Mathieu's old branch to work from), but I found it somewhat taxing to convince myself that each one was correct and that I wasn't missing any (due to tests passing spuriously), so cutting down the number of tests that needed updating seemed appealing. But you might be right that it's better to stick with the original approach.
Codecov Report

```
@@            Coverage Diff             @@
##           master     #302      +/-   ##
==========================================
+ Coverage   82.82%   83.67%   +0.84%
==========================================
  Files          49       49
  Lines        8707     9243     +536
  Branches     2181     2476     +295
==========================================
+ Hits         7212     7734     +522
- Misses        856      954      +98
+ Partials      639      555      -84
```
Force-pushed from 1958bb6 to 5d3a689.
Okay, the latest version of this branch has the term 2 barrier and passes tests. Definitely would benefit from a second (and third) pair of eyes on it. I also would like to add at least one test to check that the new barriers are doing their job. @MathieuBordere mentioned that the dqlite test suite will also need updating -- I'll get on that tomorrow.
So yes, I remember there being a non-trivial issue in the workings of dqlite (or go-dqlite). We'd better not merge this until we figure out again what it was. I will convert it to draft temporarily so we can't merge it by accident, because e.g. LXD builds from the raft master branch and we don't want to start breaking tests if we can avoid it.
Ok, I believe I understand a little bit better now. I didn't realize that the need to change the tests is due (at least in part) to the newly introduced barrier.

Although it's true that the raft paper says that a leader should commit a no-op entry at the beginning of its term, I think this is not really a hard requirement, as its purpose is only to find out what the last committed index is. There are cases where passively waiting for a new real-world entry to be submitted is enough for the consumer. In our raft implementation that choice is in some sense left to the consumer of the library. The dqlite consumer already takes care of applying a barrier if needed (see the barrier logic there).

Unless I'm missing something, we could just keep this approach and still be 100% safe and correct. What happens to the tests if you remove the barrier call that you introduce? I'd suggest removing the barrier from the code, and possibly adding a dedicated barrier to the tests that need it for some reason, or something along those lines. It might be that basically all the tests you've touched here need that barrier, but that would be fine. We make that explicit, in the same way we expect our consumers to be explicit about it (as dqlite is).
I think we hit the issue from Figure 8 of the raft paper. I mean, it's possible that the entries involved are barrier entries, and that they are harmless, but it sure becomes tricky; I don't think it's a good idea to leave it to the user to use barriers where appropriate.
I don't fully understand your argument here. Strictly speaking the barrier has no effect on commitment, in the sense that with the logic that @cole-miller has put in place in this branch no entry (either regular or barrier) will ever be committed if it's from an older term (unless of course there's an entry from the current term that is committed). So none of the scenarios in Figure 8 can happen, I think? Regardless of whether you use barriers or not.

Barriers are useful basically if you want/need to "block" until you know what the latest commit index is. This is not always a requirement; for example, I believe it's not for dqlite. In your example, if in step … the … That is independent from the fix in this branch, which basically just delays commitment for entries from older terms. For instance, with this fix in place, in step 2, the …

Unless I'm missing something.
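To make the commitment rule concrete, here is a hedged sketch in C of the behaviour described above (delayed commitment for old-term entries); `maybe_advance_commit_index()`, `highest_index_on_majority()` and `log_term_at()` are hypothetical names, not the actual code in this branch:

```c
#include <raft.h>

/* Hypothetical helpers -- not part of the raft library. */
static raft_index highest_index_on_majority(const struct raft *r);
static raft_term log_term_at(const struct raft *r, raft_index index);

/* The leader only advances its commit index to an entry from its own current
 * term; entries from older terms are then committed indirectly, which rules
 * out the scenarios of Figure 8 in the raft paper. */
static void maybe_advance_commit_index(struct raft *r)
{
    raft_index index = highest_index_on_majority(r);

    if (index <= r->commit_index) {
        return; /* Nothing new is replicated on a majority. */
    }
    if (log_term_at(r, index) != r->current_term) {
        /* Entry from an older term: never commit it by majority count
         * alone, even though it sits on a majority of servers. */
        return;
    }
    /* Committing this entry also commits every earlier entry, including the
     * outstanding ones from previous terms. */
    r->commit_index = index;
}
```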
Oh sorry, I didn't understand you then. Yes, with this fix in place Figure 8 can't happen; I understood that you thought this fix wasn't needed. edit: After rereading, I now understand that you only meant that the no-op barrier might not be needed upon election.
Ah right, just a misunderstanding then :)
Exactly, I think it's not needed, and to me it feels that it's actually better to leave it to consumers to decide what to do exactly, based on their requirements (as dqlite does).
Where in the code does that happen?
See the needsBarrier logic in dqlite.
Thanks! Let me check my understanding. Currently, our raft leaders (1) will commit entries from previous terms by majority, and (2) don't automatically append a barrier entry at the beginning of their terms. (1) is definitely a bug and is fixed by the new quorum logic in this branch. Assuming that's correct… my preference would be for raft to handle (2) internally with an automatic barrier, rather than pushing it onto consumers. It seems like the needsBarrier logic in dqlite wouldn't be needed in that case?
It is correct! Slight amendment: "the leader won't try to commit any outstanding entries from previous terms until an entry from the current term is appended to its log and replicated to a majority of nodes (at that point it commits that entry and all the outstanding ones from previous terms)".
What I was trying to say is that the needsBarrier logic won't be the same for every consumer, because it really depends on your application. For example, the automatic barrier in this branch is not equivalent to the one that dqlite runs, because the barrier in this branch runs every time a server becomes leader, while the one in dqlite is a bit more subtle and runs in slightly different (although overlapping) situations. Even if you put the automatic barrier here, you'd still need the one in dqlite (but not the other way round: you can remove the automatic barrier here, leave the one in dqlite, and be safe). I can elaborate on that if it's not clear, but see my other comment above about the scenario in Figure 8 of the raft paper.
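As an illustration of the kind of consumer-side check being contrasted here, a hedged sketch: it assumes raft's public raft_last_applied()/raft_last_index() getters, while `needs_barrier()` itself is hypothetical and only approximates what dqlite actually does:

```c
#include <raft.h>
#include <stdbool.h>

/* Hypothetical consumer-side check in the spirit of dqlite's needsBarrier
 * logic: a barrier is submitted only when the FSM may be behind the log,
 * i.e. when there are known entries that have not been applied yet. */
static bool needs_barrier(struct raft *r)
{
    return raft_last_applied(r) < raft_last_index(r);
}
```

The point made above is that when and where such a check runs is application-specific, so an automatic barrier at the start of every term wouldn't make it redundant.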
Ah, this might be what I'm missing -- why is that the case?
Basically because you don't want to start any transaction against an outdated FSM, and a new leader is only one of the cases where that can happen (there could simply be another operation that you are waiting for).
Note that the general point is still that having the leader commit a no-op at the beginning of its term is not a requirement for all applications. It's not for dqlite, because if the leader's FSM is recent enough there's no need to commit a no-op; a regular transaction entry can be proposed immediately in a safe way.
Hi @freeekanayaka, thanks for the explanation.
How does dqlite know the FSM is recent enough? (Does the new leader finish a read transaction instead of a no-op log entry?) Correct me if I'm wrong: without finishing at least one log entry, the new leader won't be able to know that the FSM is fresh?
It can be sure that there is no index greater than 3 that is waiting to be committed by this leader itself. For dqlite's needs that's enough, and a new transaction index can be created without any need for a no-op in between.
Thanks very much for the explanation. So I guess in all other cases, for a leader in its new term (with no log entry committed in that term yet), a no-op log entry is needed.
Signed-off-by: Cole Miller <[email protected]>
Okay, I've pushed an updated branch without the automatic barrier. A few tests needed to be updated because they were relying on the old behavior. Still need to test this branch against dqlite and go-dqlite before it's ready to merge.
Signed-off-by: Cole Miller <[email protected]>
dqlite tests are passing, but there's an issue with go-dqlite/test/roles.sh -- I'm looking into it, probably just needs an explicit barrier somewhere.
I think I've finally figured out what's going on with go-dqlite's roles test. We have a leader that does roughly the following: (1) it appends a configuration change entry, (2) the entry is replicated and committed, and (3) the leader transfers leadership and drops out.
And the problem is that the config change log entry from (1) gets replicated and committed by the original leader, but it doesn't have the chance to communicate the new commit index to its successor before dropping out (3). The new leader doesn't apply any kind of barrier, so with the new quorum rules the config change just sticks around uncommitted and prevents any other config changes from getting started.

A narrow fix for this would be to update the raft_timeout_now RPC to include the leader's commit index (like the raft_append_entries RPC already does). I've implemented that locally and it's enough for the roles.sh test to pass. But it has some holes -- what if the old leader just crashed without sending raft_timeout_now? And I'm not confident that there aren't more places where we're implicitly relying on some old entries (in particular, RAFT_CHANGE entries) to be committed "on their own".

I'm coming back around to the view that an automatic barrier at the beginning of every term is the safe way to deal with this, but short of that, maybe we could have leaders apply a barrier when they start their term with configuration_uncommitted_index != 0?
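A hedged sketch of what the narrow fix could look like; the struct layout and field names below are illustrative and do not match raft's actual message definitions:

```c
#include <raft.h>

/* Hypothetical shape of a TimeoutNow message extended with the sender's
 * commit index, mirroring what AppendEntries already carries. The successor
 * could adopt this commit index immediately instead of waiting for an entry
 * from its own term to be committed. */
struct timeout_now_sketch
{
    raft_term term;            /* Sender's current term. */
    raft_index last_log_index; /* Index of the sender's last log entry. */
    raft_term last_log_term;   /* Term of the sender's last log entry. */
    raft_index commit_index;   /* NEW: sender's commit index. */
};
```

As noted above, this only helps when the old leader actually gets to send the RPC, which is part of why the automatic barrier ends up looking safer.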
Ah, there's another new test failure with this branch, TestIntegration_HighAvailability in go-dqlite/driver. I'll look into that one.
This was easier to figure out. The test looks like this: … The second … runs before we call … Edit: oh, this is also canonical/dqlite#210, right?
I also get the following failure with this branch.
I think that should take care of it.
I did some work on this in the past, never finished it though -> https://github.com/MathieuBordere/dqlite/commits/commit-previous-term It could give you some inspiration in case you run into issues.
Yep, looking at the test this seems to be another case of needing a barrier before sqlite3_prepare_v2.
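To illustrate the ordering being described, a hedged sketch; `barrier_wait()` and the `struct leader` handle are hypothetical placeholders, not dqlite's actual asynchronous barrier API:

```c
#include <sqlite3.h>

struct leader; /* hypothetical handle for the leader-side raft state */

/* Hypothetical blocking helper standing in for dqlite's barrier machinery:
 * it returns once every committed log entry has been applied to the FSM. */
int barrier_wait(struct leader *leader);

/* Prepare a statement only after the FSM has caught up, so that
 * sqlite3_prepare_v2() never reads a stale schema. */
static int prepare_with_barrier(struct leader *leader, sqlite3 *db,
                                const char *sql, sqlite3_stmt **stmt)
{
    int rv = barrier_wait(leader);
    if (rv != 0) {
        return rv;
    }
    return sqlite3_prepare_v2(db, sql, -1, stmt, NULL);
}
```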
I'm going to add some tests in this PR that are able to detect the misbehavior we're trying to fix, taking inspiration from the go-dqlite test failures I was diagnosing last week.
Signed-off-by: Cole Miller <[email protected]>
Added a regression test -- it's a bit awkward, but does pass on this branch and fail on master (you can cherry-pick the commit to confirm this).
@freeekanayaka, any new thoughts on the basic approach here? I'm asking because you said in #311 that you were rethinking how we might want to do barriers generally.
I haven't been able to go through this yet, sorry about that. I understand it might be becoming kind of high-priority on your side; if so, please by all means don't block on me. I still plan to take a look at this, but I've been busy on other fronts.
Mathieu is going to pick this up and implement the "automatic barrier at the beginning of every term" strategy, since it's relevant for his work on fixing #250. dqlite will still run its own barriers in the middle of a term to ensure that SQL requests don't run against a stale database. I'm going to close some other PRs that will be unnecessary once the automatic barrier is in place.
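For reference, a hedged sketch of the strategy being handed over; `append_barrier_entry()` and `replicate_to_followers()` are hypothetical helper names, not raft's actual internals:

```c
#include <raft.h>

/* Hypothetical helpers -- not raft's actual internals. */
static int append_barrier_entry(struct raft *r, raft_term term);
static void replicate_to_followers(struct raft *r);

/* Sketch: on conversion to leader, immediately append a no-op barrier entry
 * in the new term. Once it reaches a majority it can be committed, and
 * committing it also commits every outstanding entry from previous terms,
 * without relying on the consumer to submit anything. */
static int on_convert_to_leader(struct raft *r)
{
    int rv = append_barrier_entry(r, r->current_term);
    if (rv != 0) {
        return rv;
    }
    replicate_to_followers(r);
    return 0;
}
```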
It is needed to determine whether a configuration loaded from disk at startup is committed or not, as we can't know. Because we can't know, we have to set configuration_uncommitted_index.
Replaced by #336
WIP. I expect some failing tests at first and will add further commits addressing them.
Closes #220.
Signed-off-by: Cole Miller <[email protected]>