This repository has been archived by the owner on Mar 4, 2024. It is now read-only.

replication: Don't use majority rule for old entries #302

Closed
wants to merge 3 commits into master from quorum

Conversation

cole-miller
Contributor

@cole-miller cole-miller commented Aug 29, 2022

WIP. I expect some failing tests at first and will add further commits addressing them.

Closes #220.

Signed-off-by: Cole Miller [email protected]

@cole-miller cole-miller force-pushed the quorum branch 6 times, most recently from 4c5f25c to 8ff7e21 Compare August 29, 2022 20:53
@cole-miller
Contributor Author

I've been thinking about a different approach to this. In normal operation no entries are appended during term 1, because all nodes start out as followers and the first leader will have incremented its term to 2. If we could rely on this always being the case, we could omit the barrier at the beginning of term 2, because there are guaranteed to be no entries before that (except the dummy entry at index 0, which all nodes share). That would mean that all of the integration (and fuzzy) tests that only cover one election -- that is, most of them -- wouldn't need updating, because there wouldn't be an extra barrier entry in play. Convenient!

I tried implementing this, and ran into the problem that some of the tests write entries to the log before starting the cluster:

TEST(replication, recvMissingEntries, setUp, tearDown, 0, NULL)
{
    struct fixture *f = data;
    struct raft_entry entry;
    CLUSTER_BOOTSTRAP;
    /* Server 0 has an entry that server 1 doesn't have */
    entry.type = RAFT_COMMAND;
    entry.term = 1;
    FsmEncodeSetX(1, &entry.buf);
    CLUSTER_ADD_ENTRY(0, &entry);
    /* Server 0 wins the election because it has a longer log. */
    CLUSTER_START;
    CLUSTER_STEP_UNTIL_HAS_LEADER(5000);
    munit_assert_int(CLUSTER_LEADER, ==, 0);
    /* The first server replicates missing entries to the second. */
    CLUSTER_STEP_UNTIL_APPLIED(1, 2, 3000);
    return MUNIT_OK;
}

When the cluster starts, the leader will have an entry from term 1, which breaks the reasoning above. I could modify this and other tests that write to the log directly, but the question remains whether we're okay imposing the "no entries in term 1" condition on all raft consumers. @MathieuBordere @freeekanayaka thoughts?
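For concreteness, the "no entries in term 1" property could be captured as a small invariant check. This is an illustrative sketch with made-up names (`struct entry`, `log_has_no_term1_entries`), not raft's actual log representation:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative type only: raft's real log lives in log.c and uses an
 * offset-based representation. Slot 0 here stands in for the dummy
 * entry that every node shares. */
struct entry
{
    uint64_t term;
};

/* True if no real entry (index > 0) carries term 1 or lower, which is
 * what normal operation guarantees: all nodes start as followers and
 * the first elected leader has already bumped its term to 2. */
bool log_has_no_term1_entries(const struct entry *log, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (log[i].term <= 1) {
            return false;
        }
    }
    return true;
}
```

A test that calls CLUSTER_ADD_ENTRY with entry.term = 1, like recvMissingEntries above, produces exactly a log for which this check fails.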

@freeekanayaka
Contributor

> I've been thinking about a different approach to this. In normal operation no entries are appended during term 1, because all nodes start out as followers and the first leader will have incremented its term to 2. [...] I could modify this and other tests that write to the log directly, but the question remains whether we're okay imposing the "no entries in term 1" condition on all raft consumers. @MathieuBordere @freeekanayaka thoughts?

I'm not sure what you mean by "consumers"; basically the tests? Because, as you point out, in real-world operation there should be no entries at term 1 for any real-world "consumer".

That being said, if you are exploiting this property exclusively in order to fix a lot of tests, perhaps it'd be better to bite the bullet and fix the tests even if it's laborious. I don't have a clear idea of the type of failure that occurs, though; if you can paste an example, that might help.

@cole-miller
Contributor Author

cole-miller commented Aug 30, 2022

(GitHub ate this comment the first time around, ugh.)

> I'm not sure what you mean by "consumers"; basically the tests? Because, as you point out, in real-world operation there should be no entries at term 1 for any real-world "consumer".

Right, I guess I just wanted to check that I wasn't missing some way to smuggle in log entries at term 1 using the public API.

> That being said, if you are exploiting this property exclusively in order to fix a lot of tests, perhaps it'd be better to bite the bullet and fix the tests even if it's laborious. [...]

That's fair, it's definitely a bit of a hack. The test failures that crop up when adding a barrier at term 2 are mostly in places where we call ASSERT_CONFIGURATION_INDICES, CLUSTER_STEP_UNTIL_APPLIED, or CLUSTER_LAST_APPLIED with hardcoded log indices -- those hardcoded values generally need to be bumped by 1 (or more, if the test covers further elections) to account for the barrier(s). membership/addNonVoting in test/fuzzy/test_membership.c is typical; here's the diff:

TEST(membership, addNonVoting, setup, tear_down, 0, _params)
{
    struct fixture *f = data;
    const struct raft_server *server;
    struct raft *raft;

    CLUSTER_ADD(&f->req);
-   CLUSTER_STEP_UNTIL_APPLIED(CLUSTER_LEADER, 2, 2000);
+   CLUSTER_STEP_UNTIL_APPLIED(CLUSTER_LEADER, 3, 2000);

    /* Then promote it. */
    CLUSTER_ASSIGN(&f->req, RAFT_STANDBY);

-   CLUSTER_STEP_UNTIL_APPLIED(CLUSTER_N, 3, 2000);
+   CLUSTER_STEP_UNTIL_APPLIED(CLUSTER_N, 4, 2000);

    raft = CLUSTER_RAFT(CLUSTER_LEADER);

    server = &raft->configuration.servers[CLUSTER_N - 1];
    munit_assert_int(server->id, ==, CLUSTER_N);
    return MUNIT_OK;
}

These changes aren't complicated (and I have Mathieu's old branch to work from), but I found it somewhat taxing to convince myself that each one was correct and that I wasn't missing any (due to tests passing spuriously), so cutting down the number of tests that needed updating seemed appealing. But you might be right that it's better to stick with the original approach.

@codecov-commenter

codecov-commenter commented Aug 30, 2022

Codecov Report

Merging #302 (5d3a689) into master (24465ee) will increase coverage by 0.84%.
The diff coverage is 81.81%.

@@            Coverage Diff             @@
##           master     #302      +/-   ##
==========================================
+ Coverage   82.82%   83.67%   +0.84%     
==========================================
  Files          49       49              
  Lines        8707     9243     +536     
  Branches     2181     2476     +295     
==========================================
+ Hits         7212     7734     +522     
- Misses        856      954      +98     
+ Partials      639      555      -84     
Impacted Files Coverage Δ
src/election.c 84.10% <0.00%> (+2.16%) ⬆️
src/convert.c 86.56% <72.72%> (+3.08%) ⬆️
src/replication.c 76.44% <80.95%> (-0.46%) ⬇️
src/fixture.c 92.60% <100.00%> (+0.59%) ⬆️
src/snapshot.c 92.30% <0.00%> (-3.53%) ⬇️
src/uv_work.c 75.55% <0.00%> (-3.02%) ⬇️
src/uv_writer.c 77.44% <0.00%> (-3.00%) ⬇️
src/entry.c 86.00% <0.00%> (-2.89%) ⬇️
src/start.c 75.72% <0.00%> (-2.85%) ⬇️
src/uv_prepare.c 84.43% <0.00%> (-1.65%) ⬇️
... and 39 more


@cole-miller cole-miller force-pushed the quorum branch 2 times, most recently from 1958bb6 to 5d3a689 Compare August 30, 2022 21:37
@cole-miller cole-miller marked this pull request as ready for review August 30, 2022 21:43
@cole-miller
Contributor Author

cole-miller commented Aug 30, 2022

Okay, the latest version of this branch has the term 2 barrier and passes tests. It would definitely benefit from a second (and third) pair of eyes. I'd also like to add at least one test to check that the new barriers are doing their job.

@MathieuBordere mentioned that the dqlite test suite will also need updating -- I'll get on that tomorrow.

@MathieuBordere
Contributor

MathieuBordere commented Aug 31, 2022

So yes, I remember there being a non-trivial issue in the workings of dqlite (or go-dqlite). We'd better not merge this until we figure out again what it was; I will convert it to a draft temporarily so we can't merge it by accident, because e.g. LXD builds from the raft master branch and we don't want to start breaking tests if we can avoid it.

@MathieuBordere MathieuBordere marked this pull request as draft August 31, 2022 07:01
@freeekanayaka
Contributor

freeekanayaka commented Aug 31, 2022

Ok, I believe I understand a little bit better now. I didn't realize that the need to change the tests is due (at least in part) to the newly introduced barrier.

Although it's true that the raft paper says that a leader should commit a no-op entry at the beginning of its term, I think this is not really a hard requirement, as its purpose is only to find out what the last committed index is. There are cases where passively waiting for a new real-world entry to be submitted is enough for the consumer.

In our raft implementation that choice is in some sense left to the consumer of the library. The dqlite consumer already takes care of applying a barrier if needed; see the needsBarrier function in leader.c.

Unless I'm missing something, we could just keep this approach and still be 100% safe and correct.

What happens to the tests if you remove the barrier call that you introduce?

I'd suggest removing the barrier from the code, and possibly adding a dedicated barrier to the tests that need it for some reason, or something along those lines.

It might be that basically all the tests you've touched here need that barrier, but that would be fine. We make that explicit, in the same way we expect our consumers to be explicit about that (as dqlite is).

@MathieuBordere
Contributor

MathieuBordere commented Aug 31, 2022

> Ok, I believe I understand a little bit better now. I didn't realize that the need to change the tests is due (at least in part) to the newly introduced barrier. [...] It might be that basically all the tests you've touched here need that barrier, but that would be fine. We make that explicit, in the same way we expect our consumers to be explicit about that (as dqlite is).

I think we hit the issue in Figure 8 of the raft paper even with the dqlite barriers in place.

In step (c) of Figure 8, imagine we are replicating the barrier entry with ID 4 at index 3. But once server 3 has finished replicating the entry with ID 2 at index 2, we will commit it, and that's an error.

It's possible that those entries are barrier entries and thus harmless, but it sure becomes tricky; I don't think it's a good idea to leave it to the user to use barriers where appropriate.

edit: In Figure 8 step (b), server 5 won't have requested a barrier, because last_applied == last_log_index, so the entry with ID 3 could be a write transaction.

edit2: In Figure 8, the log entry with ID 2 could also be a non-barrier entry, I think.

@freeekanayaka
Contributor

> I think we hit the issue in Figure 8 of the raft paper even with the dqlite barriers in place. [...] edit: In Figure 8 step (b), server 5 won't have requested a barrier, because last_applied == last_log_index, so the entry with ID 3 could be a write transaction. edit2: In Figure 8, the log entry with ID 2 could also be a non-barrier entry, I think.

I don't fully understand your argument here. Strictly speaking the barrier has no effect on commitment, in the sense that with the logic @cole-miller has put in place in this branch, no entry (either regular or barrier) will ever be committed if it's from an older term (unless of course an entry from the current term is committed). So none of the scenarios in Figure 8 can happen, I think, regardless of whether you use barriers or not.

Barriers are useful basically if you want/need to "block" until you know what the latest commit index is. This is not always a requirement; for example, I believe it's not one for dqlite.

In your example, if in step (b) server 5 does not request a barrier, because last_applied == last_log_index, and the entry with ID 3 is a write transaction, that's perfectly fine. Server 5 is (or believes it is) the leader, and from its point of view there can't be any committed entry that hasn't yet been applied to the FSM (because last_applied == last_log_index); that means the FSM is at its latest state and it's legitimate to start a write transaction against it. The worst that can happen is that the entry for the write transaction does not get committed, either because server 5 crashes or because it loses leadership.

The last_applied == last_log_index check is there just to be sure that the FSM is at its latest state from the point of view of the leader that is initiating the write transaction (otherwise you'd run it against an outdated database).

That is independent from the fix in this branch that basically just delays commitment for entries from older terms.

For instance, with this fix in place, in step 2, the last_applied == last_log_index check would certainly fail, because entry 2 at index 2 can't possibly be committed (since it's from an older term) and so the FSM is behind the leader's latest index. In that case the leader would request a barrier, which would cause entry 2 at index 2 to be committed (once the barrier gets replicated to a majority) and at that point the FSM is recent enough that a write transaction can be started.

Unless I'm missing something.
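The check being described can be sketched as follows. This is a hedged illustration of the idea behind dqlite's needsBarrier; the field and function names here are invented for the sketch, not dqlite's actual ones:

```c
#include <stdbool.h>
#include <stdint.h>

/* Invented names for illustration; dqlite's real check is the
 * needsBarrier function in leader.c. */
struct leader_view
{
    uint64_t last_applied;   /* last entry applied to the FSM */
    uint64_t last_log_index; /* last entry in the local log */
};

/* A write transaction is only safe against an FSM that reflects every
 * entry in the leader's log. If entries beyond last_applied exist, some
 * may be uncommitted entries from older terms, and a barrier is needed
 * to force them to commit before touching the database. */
bool needs_barrier(const struct leader_view *v)
{
    return v->last_applied < v->last_log_index;
}
```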

@MathieuBordere
Contributor

MathieuBordere commented Aug 31, 2022

> I don't fully understand your argument here. Strictly speaking the barrier has no effect on commitment, in the sense that with the logic @cole-miller has put in place in this branch, no entry (either regular or barrier) will ever be committed if it's from an older term (unless of course an entry from the current term is committed). So none of the scenarios in Figure 8 can happen, I think, regardless of whether you use barriers or not. [...] Unless I'm missing something.

Oh sorry, I didn't understand you then. Yes, with this fix in place Figure 8 can't happen; I understood that you thought this fix wasn't needed.

edit: After rereading, I now understand that you only meant that the no-op barrier might not be needed upon election.

@freeekanayaka
Contributor

> Oh sorry, I didn't understand you then. Yes, with this fix in place Figure 8 can't happen; I understood that you thought this fix wasn't needed.

Ah right, just a misunderstanding then :)

> edit: After rereading, I now understand that you only meant that the no-op barrier might not be needed upon election.

Exactly. I think it's not needed, and to me it feels that it's actually better to leave it to consumers to decide what to do, based on their requirements (as dqlite does).

@cole-miller
Contributor Author

Where in the code does that last_applied == last_log_index check live?

@freeekanayaka
Contributor

> needsBarrier function in leader.c

See the needsBarrier function in leader.c in dqlite.

@cole-miller
Contributor Author

cole-miller commented Aug 31, 2022

Thanks! Let me check my understanding. Currently, our raft leaders (1) will commit entries from previous terms by majority, and (2) don't automatically append a barrier entry at the beginning of their terms. (1) is definitely a bug and is fixed by changing the logic of replicationQuorum. (2) is more of a gray area. With the fix for (1), leaders won't try to commit any outstanding entries from previous terms until an entry from the current term is appended to their log. That could be taken care of by an automatic barrier in raft, or we could require raft consumers to introduce their own barriers as needed (which dqlite already does).

Assuming that's correct… my preference would be for raft to handle (2) internally with an automatic barrier, rather than pushing it onto consumers. It seems like the needsBarrier logic will be the same for every consumer, so why not avoid the duplication by adopting the whitepaper's strategy here?
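The fix for (1) amounts to the commitment rule from section 5.4.2 of the Raft paper. A minimal sketch, with invented names rather than replicationQuorum's actual signature:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the "no majority rule for old entries" check; names are
 * illustrative, not raft's internals. */
bool can_commit(uint64_t entry_term,
                uint64_t current_term,
                unsigned n_acked, /* nodes that have stored the entry */
                unsigned n_nodes)
{
    if (entry_term != current_term) {
        /* Entries from older terms are never committed by counting
         * replicas; they commit only indirectly, once a current-term
         * entry that follows them commits (the Figure 8 hazard). */
        return false;
    }
    return n_acked > n_nodes / 2;
}
```

With this rule, a leader that inherits uncommitted old-term entries makes no commit progress until something (a barrier or a regular entry) is appended in its own term, which is exactly why the barrier question matters.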

@freeekanayaka
Contributor

> Thanks! Let me check my understanding. Currently, our raft leaders (1) will commit entries from previous terms by majority, and (2) don't automatically append a barrier entry at the beginning of their terms. [...] That could be taken care of by an automatic barrier in raft, or we could require raft consumers to introduce their own barriers as needed (which dqlite already does).

> Assuming that's correct…

It is correct! Slight amendment: "the leader won't try to commit any outstanding entries from previous terms until an entry from the current term is appended to its log and replicated to a majority of nodes (at that point it commits that entry and all the outstanding ones from previous terms)".

> my preference would be for raft to handle (2) internally with an automatic barrier, rather than pushing it onto consumers. It seems like the needsBarrier logic will be the same for every consumer, so why not avoid the duplication by adopting the whitepaper's strategy here?

What I was trying to say is that the needsBarrier logic won't be the same for every consumer, because it really depends on your application. For example, the automatic barrier in this branch is not equivalent to the one that dqlite runs: the barrier in this branch runs every time a server becomes leader, while the one in dqlite is a bit more subtle and runs in slightly different although overlapping situations. Even if you put the automatic barrier here, you'd still need the one in dqlite (but not the other way round: you can remove the automatic barrier here, leave the one in dqlite, and be safe). I can elaborate on that if it's not clear, but see my other comment above about the scenario in Figure 8 of the raft paper.

@cole-miller
Contributor Author

> Even if you put the automatic barrier here, you'd still need the one in dqlite

Ah, this might be what I'm missing -- why is that the case?

@freeekanayaka
Contributor

> Even if you put the automatic barrier here, you'd still need the one in dqlite
>
> Ah, this might be what I'm missing -- why is that the case?

Basically because you don't want to start any transaction against an outdated FSM, and a new leader is only one of the cases where that can happen (there could simply be another operation that you are waiting for).

@freeekanayaka
Contributor

Note that the general point is still that having the leader commit a no-op at the beginning of its term is not a requirement for all applications. It's not for dqlite: if the leader's FSM is recent enough, there's no need to commit a no-op, and a regular transaction entry can be proposed immediately in a safe way.

@calvin2021y

Hi, @freeekanayaka

thanks for the explanation.

> It's not for dqlite: if the leader's FSM is recent enough, there's no need to commit a no-op, and a regular transaction entry can be proposed immediately in a safe way.

How does the leader know the FSM is recent enough? (Does dqlite have the new leader finish a read transaction instead of a no-op log entry?)

Correct me if I'm wrong: without committing at least one log entry, won't the new leader be unable to know whether the FSM is fresh?

@freeekanayaka
Contributor

> Thanks for the explanation. I think what you mean is the case where the leader already knows there is a log entry committed in its term.
>
> For example: in this case the leader (at term 2) checks the last committed log ID (say term=1, index=3), and this log matches its local record (its highest known log index).
>
> In this case the leader can be sure the FSM is up to date?

It can be sure that there is no index greater than 3 that is waiting to be committed by this leader itself. For dqlite's needs that's enough, and a new transaction index can be created without any need for a no-op in between.

@calvin2021y

Thanks very much for the explanation.

So I guess in all other cases, when a leader is in a new term with no log entry committed yet, a no-op entry is needed.

@cole-miller
Contributor Author

Okay, I've pushed an updated branch without the automatic barrier. A few tests needed to be updated because they were relying on the old behavior. Still need to test this branch against dqlite and go-dqlite before it's ready to merge.

@cole-miller cole-miller marked this pull request as ready for review September 2, 2022 21:30
@cole-miller cole-miller marked this pull request as draft September 2, 2022 21:30
@cole-miller
Contributor Author

dqlite tests are passing, but there's an issue with go-dqlite/test/roles.sh -- I'm looking into it, probably just needs an explicit barrier somewhere.

@cole-miller
Contributor Author

I think I've finally figured out what's going on with go-dqlite's roles test. We have a leader that does the following:

  1. promote another node from spare to voter
  2. pick a voter and transfer leadership to it
  3. take itself offline

And the problem is that the config-change log entry from (1) gets replicated and committed by the original leader, but it doesn't have the chance to communicate the new commit index to its successor before dropping out (3). The new leader doesn't apply any kind of barrier, so with the new quorum rules the config change just sticks around uncommitted and prevents any other config changes from getting started.

A narrow fix for this would be to update the raft_timeout_now RPC to include the leader's commit index (like the raft_append_entries RPC already does). I've implemented that locally and it's enough for the roles.sh test to pass. But it has some holes -- what if the old leader just crashed without sending raft_timeout_now? And I'm not confident that there aren't more places where we're implicitly relying on some old entries (in particular, RAFT_CHANGE entries) to be committed "on their own". I'm coming back around to the view that an automatic barrier at the beginning of every term is the safe way to deal with this, but short of that, maybe we could have leaders apply a barrier when they start their term with configuration_uncommitted_index != 0?
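The narrower rule floated at the end could look something like this at the point where a candidate converts to leader. Purely illustrative: the names (struct raft_state, should_append_barrier_on_election) are invented for this sketch, and raft's real conversion code lives in convert.c:

```c
#include <stdbool.h>
#include <stdint.h>

/* Invented type standing in for the relevant slice of raft's state. */
struct raft_state
{
    /* Index of a pending configuration change, or 0 if none. */
    uint64_t configuration_uncommitted_index;
};

/* Under the new quorum rule, an uncommitted RAFT_CHANGE entry from an
 * older term would stay uncommitted forever unless the new leader
 * appends something in its own term; a barrier forces the issue. */
bool should_append_barrier_on_election(const struct raft_state *r)
{
    return r->configuration_uncommitted_index != 0;
}
```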

@cole-miller
Contributor Author

Ah, there's another new test failure with this branch, TestIntegration_HighAvailability in go-dqlite/driver. I'll look into that one.

@cole-miller
Contributor Author

cole-miller commented Sep 7, 2022

> Ah, there's another new test failure with this branch, TestIntegration_HighAvailability in go-dqlite/driver. I'll look into that one.

This was easier to figure out. The test looks like this:

https://github.com/canonical/go-dqlite/blob/c510f9c4547598bf13b4d1cf0ac96401efc6591b/driver/integration_test.go#L240-L263

The second db.Exec(...) fails at the sqlite3 call here:

https://github.com/canonical/dqlite/blob/98efc202e1c32af38b8c3ae39548bc8a69d96a5f/src/gateway.c#L526

That runs before we call leader__barrier, so with the quorum fix it's not guaranteed that the first db.Exec(...) is committed everywhere. The dqlite-only fix would be to call leader__barrier earlier in handle_exec_sql (and friends).

Edit: oh, this is also canonical/dqlite#210, right?

@MathieuBordere
Contributor

MathieuBordere commented Sep 8, 2022

I also get the following failure with this branch.

(base) mathieu@anna:~/code/go-dqlite/driver (master)$ go test -run "TestIntegration_LeadershipTransfer"
--- FAIL: TestIntegration_LeadershipTransfer (2.21s)
    func.go:15: DEBUG: attempt 1: server @1: connected
    func.go:15: DEBUG: leadership lost (10250 - not leader)
    func.go:15: DEBUG: attempt 1: server @1: connect to reported leader @2
    func.go:15: DEBUG: attempt 1: server @1: connected
    integration_test.go:276: 
                Error Trace:    integration_test.go:276
                Error:          Received unexpected error:
                                no such table: test
                Test:           TestIntegration_LeadershipTransfer
FAIL
exit status 1
FAIL    github.com/canonical/go-dqlite/driver   4.450s

This failure also occurs without the quorum change, though.

@MathieuBordere
Contributor

> ... but short of that, maybe we could have leaders apply a barrier when they start their term with configuration_uncommitted_index != 0?

I think that should take care of it.

@MathieuBordere
Contributor

> This was easier to figure out. [...] That runs before we call leader__barrier, so with the quorum fix it's not guaranteed that the first db.Exec(...) is committed everywhere. The dqlite-only fix would be to call leader__barrier earlier in handle_exec_sql (and friends).

I did some work on this in the past but never finished it -> https://github.com/MathieuBordere/dqlite/commits/commit-previous-term It could give you some inspiration in case you run into issues.

@cole-miller
Contributor Author

@MathieuBordere

> TestIntegration_LeadershipTransfer [...] This failure also occurs without the quorum change though

Yep, looking at the test this seems to be another case of needing a barrier before sqlite3_prepare_v2.

@cole-miller
Contributor Author

I'm going to add some tests in this PR that are able to detect the misbehavior we're trying to fix, taking inspiration from the go-dqlite test failures I was diagnosing last week.

@cole-miller
Contributor Author

Added a regression test -- it's a bit awkward, but it passes on this branch and fails on master (you can cherry-pick the commit to confirm this).

@cole-miller
Contributor Author

@freeekanayaka, any new thoughts on the basic approach here? I'm asking because you said in #311 that you were rethinking how we might want to do barriers generally.

@freeekanayaka
Contributor

> @freeekanayaka, any new thoughts on the basic approach here? I'm asking because you said in #311 that you were rethinking how we might want to do barriers generally.

I haven't been able to go through this yet, sorry about that. I understand it might be becoming high-priority on your side; if so, please by all means don't block on me. I still plan to take a look at this, but I've been busy on other fronts.

@cole-miller
Contributor Author

cole-miller commented Dec 8, 2022

Mathieu is going to pick this up and implement the "automatic barrier at the beginning of every term" strategy, since it's relevant for his work on fixing #250. dqlite will still run its own barriers in the middle of a term to ensure that SQL requests don't run against a stale database. I'm going to close some other PRs that will be unnecessary once the automatic barrier is in place.

@MathieuBordere
Contributor

> Mathieu is going to pick this up and implement the "automatic barrier at the beginning of every term" strategy, since it's relevant for his work on fixing #250. [...]

It's needed to determine whether a configuration loaded from disk at startup is committed or not, which we can't know. Because we can't know, we have to set configuration_uncommitted_index, and to clear it we have to try to commit a log entry.

@MathieuBordere
Contributor

Replaced by #336


Successfully merging this pull request may close these issues.

Committing entries from previous terms
5 participants