Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VAULT-28677: Fix dangling entity-aliases in MemDB after invalidation #27750

Merged
merged 10 commits into from
Jul 25, 2024
3 changes: 3 additions & 0 deletions changelog/27750.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
```release-note:bug
core/identity: Fixed an issue where deleted/reassigned entity-aliases were not removed from in-memory database.
```
52 changes: 33 additions & 19 deletions vault/identity_store.go
Original file line number Diff line number Diff line change
Expand Up @@ -721,34 +721,48 @@ func (i *IdentityStore) invalidateEntityBucket(ctx context.Context, key string)
}
}

// If the entity is not in MemDB or if it is but differs from the
// state that's in the bucket storage entry, upsert it into MemDB.

// We've considered the use of github.com/google/go-cmp here,
// but opted for sticking with reflect.DeepEqual because go-cmp
// is intended for testing and is able to panic in some
// situations.
if memDBEntity == nil || !reflect.DeepEqual(memDBEntity, bucketEntity) {
// The entity is not in MemDB, it's a new entity. Add it to MemDB.
err = i.upsertEntityInTxn(ctx, txn, bucketEntity, nil, false)
if err != nil {
i.logger.Error("failed to update entity in MemDB", "entity_id", bucketEntity.ID, "error", err)
if memDBEntity != nil && reflect.DeepEqual(memDBEntity, bucketEntity) {
// No changes on this entity, move on to the next one.
continue
}

// If the entity exists in MemDB it must differ from the entity in
// the storage bucket because of above test. Blindly delete the
// current aliases associated with the MemDB entity. The correct set
// of aliases will be created in MemDB by the upsertEntityInTxn
// function. We need to do this because the upsertEntityInTxn
// function does not delete those aliases, it only creates missing
// ones.
if memDBEntity != nil {
biazmoreira marked this conversation as resolved.
Show resolved Hide resolved
if err := i.deleteAliasesInEntityInTxn(txn, memDBEntity, memDBEntity.Aliases); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcboudreau @elliesterner brought up a valid point while we were talking about entity merge prevention. Do you think not deleting the entity from memdb might cause an automatic merge to be triggered? if you could write a test for that, that would be awesome. we would like to prevent further merges from happening.

Copy link
Contributor Author

@marcboudreau marcboudreau Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From looking at the code in (*IdentityStore).upsertEntityInTxn, there are 2 circumstances that lead to (*IdentityStore).mergeEntityAsPartOfUpsert being called:

  1. previousEntity is not nil and entity has an alias in its Aliases field that exists in MemDB and whose CanonicalID field is set to the value of previousEntity.ID
  2. entity has an alias in its Aliases field that exists in MemDB and whose CanonicalID field is set to a value that is different than entity.ID.

In the (*IdentityStore).invalidateEntityBucket function, when upsertEntityInTxn is called, the previousEntity argument is always nil, so that rules out circumstance 1.

And by pre-deleting the aliases, we ensure that circumstance 2 cannot happen either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of clarity, pre-deleting the entity from MemDB and then calling (*IdentityStore).upsertEntityInTxn won't prevent an entity merge from happening, since the logic that decides that doesn't take into account whether the entity exists in MemDB or not. I think the only way to prevent an entity merge from happening, would be to scan each of the aliases associated with the entity (instead of pre-deleted them) and search for any alias in MemDB with a matching alias name and mount accessor and delete those. That would make it impossible for circumstance 2 to happen.

i.logger.Error("failed to remove entity aliases from changed entity", "entity_id", memDBEntity.ID, "error", err)
return
}
}

// If this is a performance secondary, the entity created on
// this node would have been cached in a local cache based on
// the result of the CreateEntity RPC call to the primary
// cluster. Since this invalidation is signaling that the
// entity is now in the primary cluster's storage, the locally
// cached entry can be removed.
if i.localNode.ReplicationState().HasState(consts.ReplicationPerformanceSecondary) && i.localNode.HAState() == consts.Active {
if err := i.localAliasPacker.DeleteItem(ctx, bucketEntity.ID+tmpSuffix); err != nil {
i.logger.Error("failed to clear local alias entity cache", "error", err, "entity_id", bucketEntity.ID)
return
}
err = i.upsertEntityInTxn(ctx, txn, bucketEntity, nil, false)
if err != nil {
i.logger.Error("failed to update entity in MemDB", "entity_id", bucketEntity.ID, "error", err)
return
}

// If this is a performance secondary, the entity created on
// this node would have been cached in a local cache based on
// the result of the CreateEntity RPC call to the primary
// cluster. Since this invalidation is signaling that the
// entity is now in the primary cluster's storage, the locally
// cached entry can be removed.
if i.localNode.ReplicationState().HasState(consts.ReplicationPerformanceSecondary) && i.localNode.HAState() == consts.Active {
if err := i.localAliasPacker.DeleteItem(ctx, bucketEntity.ID+tmpSuffix); err != nil {
i.logger.Error("failed to clear local alias entity cache", "error", err, "entity_id", bucketEntity.ID)
return
Comment on lines +758 to +767
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole dance is fascinating. I'm a little curious how you discovered it here - it seems like this being missing is an unrelated bug to the regression right?

@biazmoreira you probably know all about this already, is this another place that breaks the mental model of all global writes go to primary because we updated memdb with a global thing outside of replication? I think we saw places like that with standbys but this was new to me that we have perf secondaries updating their memdb outside of replication for replicated state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For anyone who reads this later, I realised this wasn't new code - just moved down from further up (see lines 739 and on in the before part of this diff).

}
}

}
}

Expand Down
126 changes: 126 additions & 0 deletions vault/identity_store_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -992,6 +992,132 @@ func TestIdentityStoreInvalidate_Entities(t *testing.T) {
txn.Commit()
}

// TestIdentityStoreInvalidate_EntityAliasDelete verifies that the
// invalidateEntityBucket method properly cleans up aliases from
// MemDB that are no longer associated with the entity in the
// storage bucket.
func TestIdentityStoreInvalidate_EntityAliasDelete(t *testing.T) {
ctx := namespace.ContextWithNamespace(context.Background(), namespace.RootNamespace)
c, _, root := TestCoreUnsealed(t)

// Enable a No-Op auth method
c.credentialBackends["noop"] = func(context.Context, *logical.BackendConfig) (logical.Backend, error) {
return &NoopBackend{
BackendType: logical.TypeCredential,
}, nil
}
mountAccessor1 := "noop-accessor1"
mountAccessor2 := "noop-accessor2"
mountAccessor3 := "noon-accessor3"

createMountEntry := func(path, uuid, mountAccessor string, local bool) *MountEntry {
return &MountEntry{
Table: credentialTableType,
Path: path,
Type: "noop",
UUID: uuid,
Accessor: mountAccessor,
BackendAwareUUID: uuid + "backend",
NamespaceID: namespace.RootNamespaceID,
namespace: namespace.RootNamespace,
Local: local,
}
}

c.auth = &MountTable{
Type: credentialTableType,
Entries: []*MountEntry{
createMountEntry("/noop1", "abcd", mountAccessor1, false),
createMountEntry("/noop2", "ghij", mountAccessor2, false),
createMountEntry("/noop3", "mnop", mountAccessor3, true),
},
}

require.NoError(t, c.setupCredentials(context.Background()))

// Create an entity
req := &logical.Request{
ClientToken: root,
Operation: logical.UpdateOperation,
Path: "entity",
Data: map[string]interface{}{
"name": "alice",
},
}

resp, err := c.identityStore.HandleRequest(ctx, req)
require.NoError(t, err)
require.NotNil(t, resp)
require.Contains(t, resp.Data, "id")

entityID := resp.Data["id"].(string)

// Create two entity-aliases
createEntityAlias := func(name, mountAccessor string) string {
req = &logical.Request{
ClientToken: root,
Operation: logical.UpdateOperation,
Path: "entity-alias",
Data: map[string]interface{}{
"name": name,
"canonical_id": entityID,
"mount_accessor": mountAccessor,
},
}

resp, err = c.identityStore.HandleRequest(ctx, req)
require.NoError(t, err)
require.NotNil(t, resp)
require.Contains(t, resp.Data, "id")

return resp.Data["id"].(string)
}

alias1ID := createEntityAlias("alias1", mountAccessor1)
alias2ID := createEntityAlias("alias2", mountAccessor2)
alias3ID := createEntityAlias("alias3", mountAccessor3)

// Update the entity in storage only to remove alias2 then call invalidate
bucketKey := c.identityStore.entityPacker.BucketKey(entityID)
bucket, err := c.identityStore.entityPacker.GetBucket(context.Background(), bucketKey)
require.NoError(t, err)
require.NotNil(t, bucket)

bucketEntityItem := bucket.Items[0] // since there's only 1 entity
bucketEntity, err := c.identityStore.parseEntityFromBucketItem(context.Background(), bucketEntityItem)
require.NoError(t, err)
require.NotNil(t, bucketEntity)

replacementAliases := make([]*identity.Alias, 1)
for _, a := range bucketEntity.Aliases {
if a.ID != alias2ID {
replacementAliases[0] = a
break
}
}

bucketEntity.Aliases = replacementAliases

bucketEntityItem.Message, err = anypb.New(bucketEntity)
require.NoError(t, err)

require.NoError(t, c.identityStore.entityPacker.PutItem(context.Background(), bucketEntityItem))

c.identityStore.Invalidate(context.Background(), bucketKey)

alias1, err := c.identityStore.MemDBAliasByID(alias1ID, false, false)
assert.NoError(t, err)
assert.NotNil(t, alias1)

alias2, err := c.identityStore.MemDBAliasByID(alias2ID, false, false)
assert.NoError(t, err)
assert.Nil(t, alias2)

alias3, err := c.identityStore.MemDBAliasByID(alias3ID, false, false)
assert.NoError(t, err)
assert.NotNil(t, alias3)
}

// TestIdentityStoreInvalidate_LocalAliasesWithEntity verifies the correct
// handling of local aliases in the Invalidate method.
func TestIdentityStoreInvalidate_LocalAliasesWithEntity(t *testing.T) {
Expand Down
1 change: 1 addition & 0 deletions website/content/docs/release-notes/1.16.1.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ description: |-
| 1.16.1 - 1.16.3 | [New nodes added by autopilot upgrades provisioned with the wrong version](/vault/docs/upgrading/upgrade-to-1.15.x#new-nodes-added-by-autopilot-upgrades-provisioned-with-the-wrong-version) |
| 1.15.8+ | [Autopilot upgrade for Vault Enterprise fails](/vault/docs/upgrading/upgrade-to-1.15.x#autopilot) |
| 1.16.5 | [Listener stops listening on untrusted upstream connection with particular config settings](/vault/docs/upgrading/upgrade-to-1.16.x#listener-proxy-protocol-config) |
| 1.16.3 - 1.16.6 | [Vault standby nodes not deleting removed entity-aliases from in-memory database](/vault/docs/upgrade-to-1.16.x#dangling-entity-alias-in-memory) |

## Vault companion updates

Expand Down
1 change: 1 addition & 0 deletions website/content/docs/release-notes/1.17.0.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ description: |-
| Known issue (1.17.0) | [Vault Agent and Vault Proxy consume excessive amounts of CPU](/vault/docs/upgrading/upgrade-to-1.17.x#agent-proxy-cpu-1-17) |
| Known issue (1.15.8+) | [Autopilot upgrade for Vault Enterprise fails](/vault/docs/upgrading/upgrade-to-1.15.x#autopilot) |
| Known issue (1.17.1) | [Listener stops listening on untrusted upstream connection with particular config settings](/vault/docs/upgrading/upgrade-to-1.17.x#listener-proxy-protocol-config) |
| Known issue (1.17.0 - 1.17.2) | [Vault standby nodes not deleting removed entity-aliases from in-memory database](/vault/docs/upgrade-to-1.17.x#dangling-entity-alias-in-memory)

## Vault companion updates

Expand Down
2 changes: 2 additions & 0 deletions website/content/docs/upgrading/upgrade-to-1.16.x.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -115,3 +115,5 @@ decides to trigger the flag. More information can be found in the
@include 'known-issues/1_16_secrets-sync-chroot-activation.mdx'

@include 'known-issues/config_listener_proxy_protocol_behavior_issue.mdx'

@include 'known-issues/dangling-entity-aliases-in-memory.mdx'
2 changes: 2 additions & 0 deletions website/content/docs/upgrading/upgrade-to-1.17.x.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -90,3 +90,5 @@ incorrectly. For additional details, refer to the
@include 'known-issues/config_listener_proxy_protocol_behavior_issue.mdx'

@include 'known-issues/transit-input-on-cmac-response.mdx'

@include 'known-issues/dangling-entity-aliases-in-memory.mdx'
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
<a id="dangling-entity-alias-in-memory" />

### Deleting an entity-aliases does not remove it from the in-memory database on standby nodes

#### Affected versions

##### Vault Community Edition

- 1.16.3
- 1.17.0 - 1.17.2

##### Enterprise

- 1.16.3+ent - 1.16.6+ent
- 1.17.0+ent - 1.17.2+ent

#### Issue

Entity-aliases deleted from Vault are not removed from the in-memory database on
standby nodes within a cluster. As a result, API requests to create a new
entity-alias with the same mount accessor and name sent to a standby node will
fail.

This bug is fixed in Vault 1.16.7+ent, 1.17.3, 1.17.3+ent and later.
Loading