replication: Don't use majority rule for old entries #302
Conversation
Force-pushed from 4c5f25c to 8ff7e21.
I've been thinking about a different approach to this. In normal operation no entries are appended during term 1, because all nodes start out as followers and the first leader will have incremented its term to 2. If we could rely on this always being the case, we could omit the barrier at the beginning of term 2, because there are guaranteed to be no entries before that (except the dummy entry at index 0, which all nodes share). That would mean that all of the integration (and fuzzy) tests that only cover one election -- that is, most of them -- wouldn't need updating, because there wouldn't be an extra barrier entry in play. Convenient!

I tried implementing this, and ran into the problem that some of the tests write entries to the log before starting the cluster (see raft/test/integration/test_replication.c, lines 460 to 481 at 24465ee). When the cluster starts, the leader will have an entry from term 1, which breaks the reasoning above. I could modify this and the other tests that write to the log directly, but the question remains whether we're okay imposing the "no entries in term 1" condition on all raft consumers. @MathieuBordere @freeekanayaka thoughts?
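For concreteness, a minimal sketch in C of the idea being floated here, under the "no entries in term 1" assumption; `needs_initial_barrier()` and `last_log_index()` are hypothetical names, not functions in the raft codebase:

```c
#include <raft.h>
#include <stdbool.h>

/* Hypothetical helper -- not part of the raft library. */
static raft_index last_log_index(const struct raft *r);

/* Sketch: a new leader could skip the initial no-op barrier when it starts
 * term 2, because the only possible earlier entry is the dummy entry at
 * index 0 that all nodes share -- provided nothing was ever appended during
 * term 1. For any later term, a barrier is wanted whenever there may be
 * entries that are not yet known to be committed. */
static bool needs_initial_barrier(const struct raft *r)
{
    if (r->current_term == 2) {
        return false; /* Relies on the "no entries in term 1" assumption. */
    }
    return last_log_index(r) > r->commit_index;
}
```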
I'm not sure what you mean by "consumers" -- basically the tests? Because, as you point out, in real-world operation there should be no entries at term 1 for any real-world "consumer". That being said, if you are exploiting this property exclusively in order to fix a lot of tests, perhaps it'd be better to bite the bullet and fix the tests even if it's laborious. I don't have a clear idea of the type of failure that occurs though; if you can paste an example that might help.
(GitHub ate this comment the first time around, ugh.)
Right, I guess I just wanted to check that I wasn't missing some way to smuggle in log entries at term 1 using the public API.
That's fair, it's definitely a bit of a hack. The test failures that crop up when adding a barrier at term 2 are mostly in places where we call CLUSTER_STEP_UNTIL_APPLIED with a hard-coded index, for example:

```diff
TEST(membership, addNonVoting, setup, tear_down, 0, _params)
{
    struct fixture *f = data;
    const struct raft_server *server;
    struct raft *raft;
    CLUSTER_ADD(&f->req);
-   CLUSTER_STEP_UNTIL_APPLIED(CLUSTER_LEADER, 2, 2000);
+   CLUSTER_STEP_UNTIL_APPLIED(CLUSTER_LEADER, 3, 2000);
    /* Then promote it. */
    CLUSTER_ASSIGN(&f->req, RAFT_STANDBY);
-   CLUSTER_STEP_UNTIL_APPLIED(CLUSTER_N, 3, 2000);
+   CLUSTER_STEP_UNTIL_APPLIED(CLUSTER_N, 4, 2000);
    raft = CLUSTER_RAFT(CLUSTER_LEADER);
    server = &raft->configuration.servers[CLUSTER_N - 1];
    munit_assert_int(server->id, ==, CLUSTER_N);
    return MUNIT_OK;
}
```

These changes aren't complicated (and I have Mathieu's old branch to work from), but I found it somewhat taxing to convince myself that each one was correct and that I wasn't missing any (due to tests passing spuriously), so cutting down the number of tests that needed updating seemed appealing. But you might be right that it's better to stick with the original approach.
Codecov Report

```
@@            Coverage Diff             @@
##           master     #302      +/-   ##
==========================================
+ Coverage   82.82%   83.67%   +0.84%
==========================================
  Files          49       49
  Lines        8707     9243     +536
  Branches     2181     2476     +295
==========================================
+ Hits         7212     7734     +522
- Misses        856      954      +98
+ Partials      639      555      -84
```
Force-pushed from 1958bb6 to 5d3a689.
Okay, the latest version of this branch has the term 2 barrier and passes tests. Definitely would benefit from a second (and third) pair of eyes on it. I also would like to add at least one test to check that the new barriers are doing their job. @MathieuBordere mentioned that the dqlite test suite will also need updating -- I'll get on that tomorrow.
So yes, I remember there being a non-trivial issue in the workings of dqlite (or go-dqlite). We'd better not merge this until we figure out again what it was. I will convert it to draft temporarily so we can't merge it by accident, because e.g. LXD builds from the raft master branch and we don't want to start breaking tests if we can avoid it.
Ok, I believe I understand a little bit better now. I didn't realize that the need to change the tests is due (at least in part) to the newly introduced barrier.

Although it's true that the raft paper says that a leader should commit a no-op entry at the beginning of its term, I think this is not really a hard requirement, as its purpose is only to find out what the last committed index is. There are cases where passively waiting for a new real-world entry to be submitted is enough for the consumer. In our raft implementation that choice is in some sense left to the consumer of the library. The dqlite consumer already takes care of applying a barrier if needed (see the barrier logic there).

Unless I'm missing something, we could just keep this approach and still be 100% safe and correct. What happens to the tests if you remove the barrier call that you introduce? I'd suggest removing the barrier from the code, and possibly adding a dedicated barrier to the tests that need it for some reason, or something along those lines. It might be that basically all the tests you've touched here need that barrier, but that would be fine. We make that explicit, in the same way we expect our consumers to be explicit about it (as dqlite is).
I think we hit the issue from Figure 8 of the raft paper. I mean, it's possible that the entries involved are barrier entries, and that they are harmless, but it sure becomes tricky; I don't think it's a good idea to leave it to the user to use barriers where appropriate.
I don't fully understand your argument here. Strictly speaking the barrier has no effect on commitment, in the sense that with the logic that @cole-miller has put in place in this branch no entry (either regular or barrier) will ever be committed if it's from an older term (unless of course there's an entry from the current term that is committed). So none of the scenarios in Figure 8 can happen, I think? Regardless of whether you use barriers or not.

Barriers are useful basically if you want/need to "block" until you know what the latest commit index is. This is not always a requirement; for example, I believe it's not for dqlite. In your example, if in step … the … That is independent from the fix in this branch, which basically just delays commitment for entries from older terms. For instance, with this fix in place, in step 2, the …

Unless I'm missing something.
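To make the commitment rule concrete, here is a hedged sketch in C of the behaviour described above (delayed commitment for old-term entries); `maybe_advance_commit_index()`, `highest_index_on_majority()` and `log_term_at()` are hypothetical names, not the actual code in this branch:

```c
#include <raft.h>

/* Hypothetical helpers -- not part of the raft library. */
static raft_index highest_index_on_majority(const struct raft *r);
static raft_term log_term_at(const struct raft *r, raft_index index);

/* The leader only advances its commit index to an entry from its own current
 * term; entries from older terms are then committed indirectly, which rules
 * out the scenarios of Figure 8 in the raft paper. */
static void maybe_advance_commit_index(struct raft *r)
{
    raft_index index = highest_index_on_majority(r);

    if (index <= r->commit_index) {
        return; /* Nothing new is replicated on a majority. */
    }
    if (log_term_at(r, index) != r->current_term) {
        /* Entry from an older term: never commit it by majority count
         * alone, even though it sits on a majority of servers. */
        return;
    }
    /* Committing this entry also commits every earlier entry, including the
     * outstanding ones from previous terms. */
    r->commit_index = index;
}
```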
Oh sorry, I didn't understand you then. Yes, with this fix in place Figure 8 can't happen; I understood that you thought this fix wasn't needed. edit: After rereading, I now understand that you only meant that the no-op barrier might not be needed upon election.
Ah right, just a misunderstanding then :)
Exactly, I think it's not needed, and to me it feels that it's actually better to leave it to consumers to decide what to do exactly, based on their requirements (as dqlite does).
Where in the code does that happen?
See the needsBarrier logic in dqlite.
Thanks! Let me check my understanding. Currently, our raft leaders (1) will commit entries from previous terms by majority, and (2) don't automatically append a barrier entry at the beginning of their terms. (1) is definitely a bug and is fixed by the new quorum logic in this branch. Assuming that's correct… my preference would be for raft to handle (2) internally with an automatic barrier, rather than pushing it onto consumers. It seems like the needsBarrier logic in dqlite wouldn't be needed in that case?
It is correct! Slight amendment: "the leader won't try to commit any outstanding entries from previous terms until an entry from the current term is appended to its log and replicated to a majority of nodes (at that point it commits that entry and all the outstanding ones from previous terms)".
What I was trying to say is that the needsBarrier logic won't be the same for every consumer, because it really depends on your application. For example, the automatic barrier in this branch is not equivalent to the one that dqlite runs, because the barrier in this branch runs every time a server becomes leader, while the one in dqlite is a bit more subtle and runs in slightly different (although overlapping) situations. Even if you put the automatic barrier here, you'd still need the one in dqlite (but not the other way round: you can remove the automatic barrier here, leave the one in dqlite, and be safe). I can elaborate on that if it's not clear, but see my other comment above about the scenario in Figure 8 of the raft paper.
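As an illustration of the kind of consumer-side check being contrasted here, a hedged sketch: it assumes raft's public raft_last_applied()/raft_last_index() getters, while `needs_barrier()` itself is hypothetical and only approximates what dqlite actually does:

```c
#include <raft.h>
#include <stdbool.h>

/* Hypothetical consumer-side check in the spirit of dqlite's needsBarrier
 * logic: a barrier is submitted only when the FSM may be behind the log,
 * i.e. when there are known entries that have not been applied yet. */
static bool needs_barrier(struct raft *r)
{
    return raft_last_applied(r) < raft_last_index(r);
}
```

The point made above is that when and where such a check runs is application-specific, so an automatic barrier at the start of every term wouldn't make it redundant.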
Ah, this might be what I'm missing -- why is that the case?
Basically because you don't want to start any transaction against an outdated FSM, and a new leader is only one of the cases where that can happen (there could simply be another operation that you are waiting for).
Note that the general point is still that having the leader commit a no-op at the beginning of its term is not a requirement for all applications. It's not for dqlite, because if the leader's FSM is recent enough there's no need to commit a no-op; a regular transaction entry can be proposed immediately in a safe way.
Hi @freeekanayaka, thanks for the explanation.
How does dqlite know the FSM is recent enough? (Does the new leader finish a read transaction instead of a no-op log entry?) Correct me if I'm wrong: without finishing at least one log entry, the new leader won't be able to know that the FSM is fresh?
It can be sure that there is no index greater than 3 that is waiting to be committed by this leader itself. For dqlite's needs that's enough, and a new transaction index can be created without any need for a no-op in between.
Thanks very much for the explanation. So I guess in all other cases, for a leader in its new term (with no log entry committed in that term yet), a no-op log entry is needed.
Signed-off-by: Cole Miller <[email protected]>
Okay, I've pushed an updated branch without the automatic barrier. A few tests needed to be updated because they were relying on the old behavior. Still need to test this branch against dqlite and go-dqlite before it's ready to merge.
Signed-off-by: Cole Miller <[email protected]>
dqlite tests are passing, but there's an issue with go-dqlite/test/roles.sh -- I'm looking into it, probably just needs an explicit barrier somewhere.
I think I've finally figured out what's going on with go-dqlite's roles test. We have a leader that does roughly the following: (1) it appends a configuration change entry, (2) the entry is replicated and committed, and (3) the leader transfers leadership and drops out.
And the problem is that the config change log entry from (1) gets replicated and committed by the original leader, but it doesn't have the chance to communicate the new commit index to its successor before dropping out (3). The new leader doesn't apply any kind of barrier, so with the new quorum rules the config change just sticks around uncommitted and prevents any other config changes from getting started.

A narrow fix for this would be to update the raft_timeout_now RPC to include the leader's commit index (like the raft_append_entries RPC already does). I've implemented that locally and it's enough for the roles.sh test to pass. But it has some holes -- what if the old leader just crashed without sending raft_timeout_now? And I'm not confident that there aren't more places where we're implicitly relying on some old entries (in particular, RAFT_CHANGE entries) to be committed "on their own".

I'm coming back around to the view that an automatic barrier at the beginning of every term is the safe way to deal with this, but short of that, maybe we could have leaders apply a barrier when they start their term with configuration_uncommitted_index != 0?
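A hedged sketch of what the narrow fix could look like; the struct layout and field names below are illustrative and do not match raft's actual message definitions:

```c
#include <raft.h>

/* Hypothetical shape of a TimeoutNow message extended with the sender's
 * commit index, mirroring what AppendEntries already carries. The successor
 * could adopt this commit index immediately instead of waiting for an entry
 * from its own term to be committed. */
struct timeout_now_sketch
{
    raft_term term;            /* Sender's current term. */
    raft_index last_log_index; /* Index of the sender's last log entry. */
    raft_term last_log_term;   /* Term of the sender's last log entry. */
    raft_index commit_index;   /* NEW: sender's commit index. */
};
```

As noted above, this only helps when the old leader actually gets to send the RPC, which is part of why the automatic barrier ends up looking safer.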
Ah, there's another new test failure with this branch, TestIntegration_HighAvailability in go-dqlite/driver. I'll look into that one.
This was easier to figure out. The test looks like this: … The second … runs before we call … Edit: oh, this is also canonical/dqlite#210, right?
I also get the following failure with this branch.
I think that should take care of it.
I did some work on this in the past, never finished it though -> https://github.com/MathieuBordere/dqlite/commits/commit-previous-term It could give you some inspiration in case you run into issues.
Yep, looking at the test this seems to be another case of needing a barrier before sqlite3_prepare_v2.
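To illustrate the ordering being described, a hedged sketch; `barrier_wait()` and the `struct leader` handle are hypothetical placeholders, not dqlite's actual asynchronous barrier API:

```c
#include <sqlite3.h>

struct leader; /* hypothetical handle for the leader-side raft state */

/* Hypothetical blocking helper standing in for dqlite's barrier machinery:
 * it returns once every committed log entry has been applied to the FSM. */
int barrier_wait(struct leader *leader);

/* Prepare a statement only after the FSM has caught up, so that
 * sqlite3_prepare_v2() never reads a stale schema. */
static int prepare_with_barrier(struct leader *leader, sqlite3 *db,
                                const char *sql, sqlite3_stmt **stmt)
{
    int rv = barrier_wait(leader);
    if (rv != 0) {
        return rv;
    }
    return sqlite3_prepare_v2(db, sql, -1, stmt, NULL);
}
```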
I'm going to add some tests in this PR that are able to detect the misbehavior we're trying to fix, taking inspiration from the go-dqlite test failures I was diagnosing last week.
Signed-off-by: Cole Miller <[email protected]>
Added a regression test -- it's a bit awkward, but does pass on this branch and fail on master (you can cherry-pick the commit to confirm this).
@freeekanayaka, any new thoughts on the basic approach here? I'm asking because you said in #311 that you were rethinking how we might want to do barriers generally.
I haven't been able to go through this yet, sorry about that. I understand it might be becoming kind of high-priority on your side; if so, please by all means don't block on me. I still plan to take a look at this, but I've been busy on other fronts.
Mathieu is going to pick this up and implement the "automatic barrier at the beginning of every term" strategy, since it's relevant for his work on fixing #250. dqlite will still run its own barriers in the middle of a term to ensure that SQL requests don't run against a stale database. I'm going to close some other PRs that will be unnecessary once the automatic barrier is in place.
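For reference, a hedged sketch of the strategy being handed over; `append_barrier_entry()` and `replicate_to_followers()` are hypothetical helper names, not raft's actual internals:

```c
#include <raft.h>

/* Hypothetical helpers -- not raft's actual internals. */
static int append_barrier_entry(struct raft *r, raft_term term);
static void replicate_to_followers(struct raft *r);

/* Sketch: on conversion to leader, immediately append a no-op barrier entry
 * in the new term. Once it reaches a majority it can be committed, and
 * committing it also commits every outstanding entry from previous terms,
 * without relying on the consumer to submit anything. */
static int on_convert_to_leader(struct raft *r)
{
    int rv = append_barrier_entry(r, r->current_term);
    if (rv != 0) {
        return rv;
    }
    replicate_to_followers(r);
    return 0;
}
```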
It is needed to determine whether a configuration loaded from disk at startup is committed or not, as we can't know. Because we can't know, we have to set configuration_uncommitted_index.
Replaced by #336
WIP. I expect some failing tests at first and will add further commits addressing them.
Closes #220.
Signed-off-by: Cole Miller <[email protected]>