Preserve Detached Client's Lamport in Version Vector (#1090)
Following the GC error discovered in #1089, we've temporarily modified the
version vector handling to retain the detached client's lamport across
local and minimum version vectors. This change provides a stopgap solution
while we investigate a more comprehensive approach to garbage collection.

---------

Co-authored-by: Youngteac Hong <[email protected]>
JOOHOJANG and hackerwins authored Dec 6, 2024
1 parent 571cf4b commit 7ad9e71
Showing 7 changed files with 131 additions and 205 deletions.
108 changes: 1 addition & 107 deletions design/garbage-collection.md
@@ -50,9 +50,7 @@ Conceptually, min version vector is version vector that is uniformly applied to
if (removedAt.lamport <= minVersionVector[removedAt.actor]) {
runGC()
} else if (removedAt.lamport < minVersionVector.minLamport()) {
runGC()
}
}
```

```
@@ -68,110 +66,6 @@ min([c1:2, c2:3, c3:4], [c1:3, c2:1, c3:5, c4:3])
=> [c1:2, c2:1, c3:4, c4:0]
```
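
The element-wise minimum in the example above can be sketched in Go as follows. This is a minimal illustration, not the project's implementation: the string-keyed `VersionVector` type and the free function `Min` are assumptions made for this sketch, and an actor missing from either vector is treated as lamport 0, matching the `c4:0` entry in the example.

```go
package main

import "fmt"

// VersionVector maps an actor ID to the latest lamport seen from that actor.
// Using plain strings as actor IDs is an assumption of this sketch.
type VersionVector map[string]int64

// Min returns the element-wise minimum of a and b.
// An actor that is missing from either vector counts as lamport 0.
func Min(a, b VersionVector) VersionVector {
	out := make(VersionVector, len(a)+len(b))
	for actor, lamport := range a {
		// Reading a missing key from a Go map yields 0, which is the
		// value we want for an actor the other vector has never seen.
		if other := b[actor]; other < lamport {
			out[actor] = other
		} else {
			out[actor] = lamport
		}
	}
	for actor := range b {
		if _, ok := a[actor]; !ok {
			out[actor] = 0
		}
	}
	return out
}

func main() {
	a := VersionVector{"c1": 2, "c2": 3, "c3": 4}
	b := VersionVector{"c1": 3, "c2": 1, "c3": 5, "c4": 3}
	fmt.Println(Min(a, b)) // map[c1:2 c2:1 c3:4 c4:0]
}
```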
### How we handle a min version vector that includes a detached client's lamport
We have to remove the detached client's lamport from every version vector in the DB and from every other client's local version vector.
![detached-user-version-vector](media/detached-user-version-vector.jpg)

For example,
```
// initial state
c1.versionVector = {c1:4, c2:3, c3: 4}
c2.versionVector = {c1:3, c2:4, c3: 5}
c3.versionVector = {c1:3, c2:3, c3: 6}
db.versionVector = {
c1: {c1:4, c2:3, c3: 4},
c2: {c1:3, c2:4, c3: 5},
c3: {c1:3, c2:3, c3: 6}
}
// process
1. c1 detaches and its version vector is removed from db.
db.versionVector = {
c2: {c1:3, c2:4, c3: 5},
c3: {c1:3, c2:3, c3: 6}
}
2. compute minVersionVector
min(c2.vv, c3.vv) = min({c1:3, c2:4, c3: 5}, {c1:3, c2:3, c3: 6}) = {c1:3, c2:3, c3:5}
```
As shown above, c1's lamport is still present in minVersionVector, and also in every client's local version vector.

So we need to filter the detached client's lamport from:
1. the version vectors stored in the DB (db.versionVector)
2. the other clients' local version vectors.

However, removing the lamport from every entry of db.versionVector causes an N+1 query problem. So we chose to remove only the detached client's own version vector from the table and to filter minVersionVector by the active clients.

```
// initial state
db.versionVector = {
c1: {c1:3, c2:4, c3: 5},
c2: {c1:3, c2:3, c3: 6}
}
min(c1.vv, c2.vv) = min({c1:3, c2:4, c3: 5}, {c1:3, c2:3, c3: 6}) =
{c1:3, c2:3, c3:5}
c1 and c2 are active (attached); c3 is detached.
minVersionVector = {c1:3, c2:3, c3:5}.Filter([c1, c2]) = {c1:3, c2:3}
```
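
The filtering step can be sketched in Go as below. This is only an illustration under stated assumptions: the string-keyed `VersionVector` type and this `Filter` method are hypothetical stand-ins written for the example, and the method keeps only the actors passed in as attached.

```go
package main

import "fmt"

// VersionVector maps an actor ID to a lamport value.
// String actor IDs are an assumption of this sketch.
type VersionVector map[string]int64

// Filter returns a copy of v that keeps only the given attached actors.
// Actors absent from v are skipped rather than added with a zero lamport.
func (v VersionVector) Filter(attached []string) VersionVector {
	filtered := make(VersionVector, len(attached))
	for _, actor := range attached {
		if lamport, ok := v[actor]; ok {
			filtered[actor] = lamport
		}
	}
	return filtered
}

func main() {
	minVV := VersionVector{"c1": 3, "c2": 3, "c3": 5}
	// c3 has detached, so only c1 and c2 are passed as attached actors.
	fmt.Println(minVV.Filter([]string{"c1", "c2"})) // map[c1:3 c2:3]
}
```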

After a client receives this minVersionVector, it filters its own version vector to remove the detached client's lamport.
The next pushpull request will then contain the filtered version vector, so that db.versionVector eventually stores only the attached clients' version vectors.
![filter-version-vector](media/filter-version-vector.jpg)



### Why `removedAt.lamport <= minVersionVector[removedAt.actor]` is not enough to run GC
Let's consider the following scenario.

Users A, B, and C are participating in a collaborative editing session. User C deletes a specific node and immediately detaches from the document. In this situation, the node deleted by C remains in the document as a tombstone, and C's version vector is removed from the version vector table in the database.

Previously, we stated that to find the minimum version vector, we query all vectors in the version vector table in the database and take the minimum value. After C detaches, if we create the minimum version vector by querying the version vector table, the resulting minimum version vector will not contain C’s information.

Our existing garbage collection (GC) algorithm performs GC when the condition `removedAt.lamport <= minVersionVector[removedAt.actor]` is satisfied. However, if the actor who deleted the node does not exist in the minimum version vector, there is no entry to compare against, so this condition alone never triggers GC for that node.

Therefore, the algorithm needs to be designed so that GC is performed even in situations where the actor who deleted the node is not present in the minimum version vector.

### Is it safe to run GC under the condition `removedAt.lamport < minVersionVector.minLamport()`?
We can understand this by considering the definitions of the version vector and the minimum version vector.

A version vector indicates the editing progress of a user’s document, including how much of other users’ edits have been incorporated. For example, if A’s version vector is `[A:5, B:4, C:2]`, it means that A’s document reflects B’s changes up to lamport 4 and C’s changes up to lamport 2.

Expanding this further, let’s assume three users have the following version vectors:

- A: `[A:5, B:4, C:2]`
- B: `[A:3, B:4, C:2]`
- C: `[A:2, B:1, C:3]`

We assume that C deleted a specific node with their last change.

In this situation, if C detaches from the document, only A’s and B’s version vectors remain, and the minimum version vector would become `[A:3, B:4]`. When can we perform garbage collection (GC) to delete the node removed by C at `[A:2, B:1, C:3]`?

By examining the minimum version vector at this point, we can consider two scenarios:

1. Only A and B were participating in the editing from the beginning.
2. There was another user besides A and B, but that user has now detached.

In the first scenario, the existing algorithm that operates when `removedAt.lamport <= minVersionVector[removedAt.actor]` applies, so we don’t need to address it further.

The second scenario presents a potential issue, as a node removed by someone else remains as a tombstone. To remove this tombstone, we need a minimum guarantee.

If we express the execution criterion of the GC algorithm in semantic terms, it would be:

> "The point at which all users are aware that a specific node has been removed."

From the moment C detaches, information about C is removed from each version vector. So, how can we know that C deleted a specific node? Since there’s no direct way to determine this in the minimum version vector due to the lack of information, we need to verify this fact indirectly.

From the perspective of the version vector and the minimum version vector, this means that the minimum lamport value in the minimum version vector should be greater than `removedAt.lamport`.

Of course, it’s possible for a specific client to have a timestamp greater than `removedAt` without knowing that C deleted the node. However, this case is addressed by using the minimum lamport value in the minimum version vector rather than any single client’s value.

What’s essential here is having a consistent criterion. If we take the node’s `removedAt` as this criterion, and every lamport value in the minimum version vector is greater than it, then it is safe to delete the node.

![remove-detached-clients-tombstone](media/remove-datached-clients-tombstone.jpg)
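
The two conditions discussed in this section can be combined into a single check, sketched in Go below. The `Ticket` and `VersionVector` types and the `CanCollect` and `minLamport` names are hypothetical and exist only for this illustration. The sketch uses the per-actor comparison when the removing actor is present in the minimum version vector and falls back to the minimum lamport when the actor has detached.

```go
package main

import (
	"fmt"
	"math"
)

// VersionVector maps an actor ID to a lamport value (string IDs are assumed).
type VersionVector map[string]int64

// Ticket records who removed a node and at which lamport.
type Ticket struct {
	Actor   string
	Lamport int64
}

// minLamport returns the smallest lamport in vv.
// Starting from math.MaxInt64 means an empty vector places no bound.
func minLamport(vv VersionVector) int64 {
	lowest := int64(math.MaxInt64)
	for _, lamport := range vv {
		if lamport < lowest {
			lowest = lamport
		}
	}
	return lowest
}

// CanCollect reports whether a tombstone removed at removedAt may be purged.
func CanCollect(minVV VersionVector, removedAt Ticket) bool {
	if lamport, ok := minVV[removedAt.Actor]; ok {
		// The removing actor is still attached: compare against its own entry.
		return removedAt.Lamport <= lamport
	}
	// The removing actor has detached: every remaining client must have
	// moved strictly past the removal.
	return removedAt.Lamport < minLamport(minVV)
}

func main() {
	removedAt := Ticket{Actor: "C", Lamport: 3}
	fmt.Println(CanCollect(VersionVector{"A": 3, "B": 4}, removedAt)) // false: 3 < 3 fails
	fmt.Println(CanCollect(VersionVector{"A": 5, "B": 4}, removedAt)) // true: 3 < 4
}
```

With the vectors from the scenario above, the tombstone left by C is not collectable while the minimum version vector is `[A:3, B:4]`, and becomes collectable once A's entry advances past 3.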

## An example of garbage collection:
### State 1
12 changes: 1 addition & 11 deletions pkg/document/document.go
@@ -221,17 +221,7 @@ func (d *Document) ApplyChangePack(pack *change.Pack) error {
d.GarbageCollect(pack.VersionVector)
}

// 05. Remove detached client's lamport from version vector if it exists
if pack.VersionVector != nil && !hasSnapshot {
actorIDs, err := pack.VersionVector.Keys()
if err != nil {
return err
}

d.doc.changeID = d.doc.changeID.SetVersionVector(d.doc.changeID.VersionVector().Filter(actorIDs))
}

// 06. Update the status.
// 05. Update the status.
if pack.IsRemoved {
d.SetStatus(StatusRemoved)
}
10 changes: 0 additions & 10 deletions pkg/document/internal_document.go
@@ -188,16 +188,6 @@ func (d *InternalDocument) ApplyChangePack(pack *change.Pack, disableGC bool) er
}
}

// 04. Remove detached client's lamport from version vector if it exists
if pack.VersionVector != nil && !hasSnapshot {
actorIDs, err := pack.VersionVector.Keys()
if err != nil {
return err
}

d.changeID = d.changeID.SetVersionVector(d.changeID.VersionVector().Filter(actorIDs))
}

return nil
}

18 changes: 1 addition & 17 deletions pkg/document/time/version_vector.go
@@ -18,7 +18,6 @@ package time

import (
"bytes"
"math"
"sort"
"strconv"
"strings"
@@ -118,9 +117,7 @@ func (v VersionVector) EqualToOrAfter(other *Ticket) bool {
clientLamport, ok := v[other.actorID.bytes]

if !ok {
minLamport := v.MinLamport()

return minLamport > other.lamport
return false
}

return clientLamport >= other.lamport
@@ -176,19 +173,6 @@ func (v VersionVector) Max(other VersionVector) VersionVector {
return maxVV
}

// MinLamport returns min lamport value in version vector.
func (v VersionVector) MinLamport() int64 {
var minLamport int64 = math.MaxInt64

for _, value := range v {
if value < minLamport {
minLamport = value
}
}

return minLamport
}

// MaxLamport returns max lamport value in version vector.
func (v VersionVector) MaxLamport() int64 {
var maxLamport int64 = -1
35 changes: 7 additions & 28 deletions server/backend/database/memory/database.go
@@ -1347,47 +1347,27 @@ func (d *DB) UpdateAndFindMinSyncedVersionVector(
docRefKey types.DocRefKey,
versionVector time.VersionVector,
) (time.VersionVector, error) {
// 01. Prepare attachedActorIDs including the current client.
// If some clients are detached, we should remove them from the min version vector.
// For this, we use attachedActorIDs to filter the min version vector.
var attachedActorIDs []*time.ActorID
attached, err := clientInfo.IsAttached(docRefKey.DocID)
if err != nil {
return nil, err
}
// TODO(JOOHOJANG): We have to consider removing detached client's lamport
// from min version vector.

if attached {
actorID, err := clientInfo.ID.ToActorID()
if err != nil {
return nil, err
}
attachedActorIDs = append(attachedActorIDs, actorID)
}

// 02. Find all version vectors of the given document from DB.
// 01. Find all version vectors of the given document from DB.
txn := d.db.Txn(false)
defer txn.Abort()
iterator, err := txn.Get(tblVersionVectors, "doc_id", docRefKey.DocID.String())
if err != nil {
return nil, fmt.Errorf("find all version vectors: %w", err)
}

// 03. Compute min version vector.
// 02. Compute min version vector.
var minVersionVector time.VersionVector

// 03-1. Compute min version vector of other clients and collect attachedActorIDs.
// 02-1. Compute min version vector of other clients and collect attachedActorIDs.
for raw := iterator.Next(); raw != nil; raw = iterator.Next() {
vvi := raw.(*database.VersionVectorInfo)
if clientInfo.ID == vvi.ClientID {
continue
}

actorID, err := vvi.ClientID.ToActorID()
if err != nil {
return nil, err
}
attachedActorIDs = append(attachedActorIDs, actorID)

if minVersionVector == nil {
minVersionVector = vvi.VersionVector
continue
@@ -1399,11 +1379,10 @@ func (d *DB) UpdateAndFindMinSyncedVersionVector(
minVersionVector = versionVector
}

// 03-2. Compute min version vector with current client's version vector and filter detached clients.
// 02-2. Compute min version vector with current client's version vector.
minVersionVector = minVersionVector.Min(versionVector)
minVersionVector = minVersionVector.Filter(attachedActorIDs)

// 04. Update current client's version vector. If the client is detached, remove it.
// 03. Update current client's version vector. If the client is detached, remove it.
// This is only for the current client and does not affect the version vector of other clients.
if err = d.UpdateVersionVector(ctx, clientInfo, docRefKey, versionVector); err != nil {
return nil, err
36 changes: 7 additions & 29 deletions server/backend/database/mongo/client.go
@@ -1235,26 +1235,11 @@ func (c *Client) UpdateAndFindMinSyncedVersionVector(
docRefKey types.DocRefKey,
versionVector time.VersionVector,
) (time.VersionVector, error) {
// TODO(JOOHOJANG): We have to consider removing detached client's lamport
// from min version vector.
var versionVectorInfos []database.VersionVectorInfo

// 01. Prepare attachedActorIDs including the current client.
// If some clients are detached, we should remove them from the min version vector.
// For this, we use attachedActorIDs to filter the min version vector.
var attachedActorIDs []*time.ActorID
attached, err := clientInfo.IsAttached(docRefKey.DocID)
if err != nil {
return nil, err
}

if attached {
currentActorID, err := clientInfo.ID.ToActorID()
if err != nil {
return nil, err
}
attachedActorIDs = append(attachedActorIDs, currentActorID)
}

// 02. Find all version vectors of the given document from DB.
// 01. Find all version vectors of the given document from DB.
cursor, err := c.collection(ColVersionVectors).Find(ctx, bson.M{
"project_id": docRefKey.ProjectID,
"doc_id": docRefKey.DocID,
@@ -1267,21 +1252,15 @@ func (c *Client) UpdateAndFindMinSyncedVersionVector(
return nil, fmt.Errorf("decode version vectors: %w", err)
}

// 03. Compute min version vector.
// 02. Compute min version vector.
var minVersionVector time.VersionVector

// 03-1. Compute min version vector of other clients and collect attachedActorIDs.
// 02-1. Compute min version vector of other clients and collect attachedActorIDs.
for _, vvi := range versionVectorInfos {
if clientInfo.ID == vvi.ClientID {
continue
}

actorID, err := vvi.ClientID.ToActorID()
if err != nil {
return nil, err
}
attachedActorIDs = append(attachedActorIDs, actorID)

if minVersionVector == nil {
minVersionVector = vvi.VersionVector
continue
@@ -1293,11 +1272,10 @@ func (c *Client) UpdateAndFindMinSyncedVersionVector(
minVersionVector = versionVector
}

// 03-2. Compute min version vector with current client's version vector and filter detached clients.
// 02-2. Compute min version vector with current client's version vector.
minVersionVector = minVersionVector.Min(versionVector)
minVersionVector = minVersionVector.Filter(attachedActorIDs)

// 04. Update current client's version vector. If the client is detached, remove it.
// 03. Update current client's version vector. If the client is detached, remove it.
// This is only for the current client and does not affect the version vector of other clients.
if err = c.UpdateVersionVector(ctx, clientInfo, docRefKey, versionVector); err != nil {
return nil, err