I'd like some clarification on how dedup works on an old pool (which had dedup enabled) that was updated to use fast dedup. The Fast Dedup Review Guide says the old tables "will continue to work as they always have."

Can someone clarify this? It sounds like the old dedup table is left as-is, and now there's a second dedup table, so you actually end up with more memory usage. 😮 Is that right? (If it's not, maybe #16481 should just be closed?) And if there are two dedup tables, how do writes work? Do new writes check for dupes against both tables, or only against the new one? Do entries in the old table just remain read-only, or do they get updated at some point? Surely if a block referenced by the old table is freed, that old table must be updated? Could the same block end up with entries in both dedup tables, and thus one copy on disk for each table it appears in? Or does "the old ones will continue to work as they always have" simply mean that an old pool which does not enable the new feature just keeps behaving as before?
ZFS has separate dedup tables for each checksum algorithm. If you upgrade while already having dedup table(s) for some algorithm(s), those tables will keep using the old dedup format. But if you delete all of your data using some algorithm, or switch to a previously unused algorithm, then its new dedup table will use the new format. Obviously, data using different checksum algorithms cannot be deduplicated against each other, but there is no double accounting.
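To make that per-algorithm behavior concrete, here is a small Python sketch. This is a toy model, not ZFS's actual data structures or code: it just encodes the rule described above, that each checksum algorithm gets its own table, a table's on-disk format is fixed when the table is first created (not when the pool is upgraded), and a write only ever consults the table for its own algorithm.

```python
# Toy model (NOT ZFS source code) of per-algorithm dedup tables and
# format selection at table-creation time.
from dataclasses import dataclass, field

@dataclass
class DedupTable:
    algorithm: str
    format: str                                  # "legacy" or "fast" (FDT)
    entries: dict = field(default_factory=dict)  # checksum -> refcount

class Pool:
    def __init__(self, fast_dedup_enabled: bool = False):
        self.fast_dedup_enabled = fast_dedup_enabled
        self.tables: dict[str, DedupTable] = {}

    def table_for(self, algorithm: str) -> DedupTable:
        # An existing table keeps its original format forever; only a
        # table created *after* the upgrade uses the new fast format.
        if algorithm not in self.tables:
            fmt = "fast" if self.fast_dedup_enabled else "legacy"
            self.tables[algorithm] = DedupTable(algorithm, fmt)
        return self.tables[algorithm]

    def write(self, algorithm: str, checksum: bytes) -> None:
        # A write only consults the table for its own checksum algorithm,
        # so blocks hashed with different algorithms never dedup together
        # and a block is never accounted for in two tables at once.
        table = self.table_for(algorithm)
        table.entries[checksum] = table.entries.get(checksum, 0) + 1

pool = Pool(fast_dedup_enabled=False)
pool.write("sha256", b"aa")     # creates a legacy-format sha256 table
pool.fast_dedup_enabled = True  # "upgrade" the pool
pool.write("sha256", b"aa")     # still lands in the legacy sha256 table
pool.write("blake3", b"bb")     # previously unused algorithm -> fast table
print({a: t.format for a, t in pool.tables.items()})
# {'sha256': 'legacy', 'blake3': 'fast'}
```

So in this model (and, per the answer above, in ZFS), upgrading does not create a second table for data you already have; a fast-format table only appears once a previously unused checksum algorithm starts being deduped.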
Exactly correct.
Every block in ZFS has a block pointer, containing the location on disk and a bunch of metadata about the block. All block pointers …
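For context, here is a minimal Python sketch of the dedup-relevant block pointer fields. It is heavily simplified and hypothetical as written; the real `blkptr_t` in the ZFS source carries much more (multiple DVAs, birth txgs, compression, embedded data, and so on).

```python
# Simplified, illustrative model of a ZFS block pointer's
# dedup-relevant fields -- not the actual blkptr_t layout.
from dataclasses import dataclass

@dataclass
class BlockPointer:
    dva: tuple[int, int]   # (vdev, offset): where the block lives on disk
    checksum_algo: str     # e.g. "sha256", "blake3"
    checksum: bytes        # the block's checksum value
    dedup: bool            # D-bit: this block participates in dedup

# Because the checksum algorithm is recorded in every block pointer,
# a lookup on write or free knows exactly which per-algorithm dedup
# table to consult -- which is why tables for different algorithms
# never mix.
bp = BlockPointer(dva=(0, 0x2000), checksum_algo="sha256",
                  checksum=b"\x12\x34", dedup=True)
print(bp.checksum_algo)  # -> "sha256"
```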