HashSet<string> breaks if copied from one that uses a non-singleton default equality comparer #107222
Comments
Tagging subscribers to this area: @dotnet/area-system-collections
Can the bug be reproduced without serialization?
In this scenario, I think EqualityComparersAreEqual would return true, and the constructor would call …
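It seems like this is the case. I tried a few brute-force attempts at hitting the … If I trigger the code path by reflection, it does seem to cause the issue:

```csharp
// Assumes MSTest; the containing class name is illustrative.
using System;
using System.Collections.Generic;
using System.Reflection;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class HashSetCopyTests
{
#if !NETFRAMEWORK
    [TestMethod]
    public void HashSet_BreaksAfterRandomizeHashes()
    {
        HashSet<string> set = new HashSet<string>() { "a", "b" };

        // Invoke the private Resize(int, bool) with `true` to force the set onto
        // randomized string hash codes (an internal implementation detail).
        MethodInfo method = typeof(HashSet<string>).GetMethod(
            "Resize", BindingFlags.NonPublic | BindingFlags.Instance,
            new Type[] { typeof(int), typeof(bool) })!;
        method.Invoke(set, new object[] { 4, true });
        Assert.IsTrue(set.Contains("a"));

        // Copy the set; the clone should find the same items.
        HashSet<string> setClone = new HashSet<string>(set);
        Assert.IsTrue(set.Contains("a"));
        Assert.IsTrue(setClone.Contains("a"), "Fails!");
    }
#endif
}
```

The copy constructor notably has to not only head through …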
Copying a Dictionary<TKey, TValue> uses AddRange, which compares …
If I'm understanding the issue correctly, the problem lies with the …
I've marked this as up-for-grabs, as it seems like a straightforward change that should be accompanied by relevant testing. Perhaps it might be worth auditing other potential misuses of the …
@wnayes Thank you for the exceptionally detailed issue and the investigation you put into this! We will backport this fix into .NET 9 RC2, which releases in October. .NET 9's GA release is in November. Will you be able to target .NET 9 for your scenario once that release is available?
Glad it is being fixed! I was able to work around the particular instance of the issue I had encountered by updating an in-house serializer, but it will be good to not have to worry about the issue occurring in other ways. The product I work on targets LTS releases, so I would be looking forward to it in .NET 10.
Description
There seems to be a hidden assumption within HashSet<string> that, if broken, leads to a set containing strings that cannot be found by lookups. The assumption is that the instance returned by EqualityComparer<string>.Default is the only instance of its type.
This is a simple reproduction of the issue:
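A minimal sketch of such a reproduction without serialization, assuming the second comparer instance is created with reflection (Activator.CreateInstance on the runtime type of EqualityComparer<string>.Default) instead of by a serializer. The class and method names are illustrative. Going by the analysis in this issue, on .NET 8 the source set is expected to find "a" while the copy is not.

```csharp
using System;
using System.Collections.Generic;

static class SecondDefaultComparerRepro
{
    // Creates a second instance of the runtime type behind EqualityComparer<string>.Default
    // (GenericEqualityComparer<string> on .NET Core). Reflection stands in for whatever
    // produced the extra instance in practice, such as a serializer.
    public static IEqualityComparer<string> CreateSecondDefaultComparer() =>
        (IEqualityComparer<string>)Activator.CreateInstance(
            EqualityComparer<string>.Default.GetType(), nonPublic: true)!;

    // Builds a set with the non-singleton comparer, copies it with the copy
    // constructor, and reports whether each set can still find "a".
    public static (bool SourceHasA, bool CopyHasA) Run()
    {
        // Not reference-equal to the singleton, so this set is not switched to the
        // non-randomized string comparer.
        var source = new HashSet<string>(CreateSecondDefaultComparer()) { "a", "b" };

        // The copy does use the non-randomized comparer, yet the unwrapped comparers
        // compare equal, so the copy constructor may reuse source's bucket data.
        var copy = new HashSet<string>(source);

        return (source.Contains("a"), copy.Contains("a"));
    }

    static void Main()
    {
        var (sourceHasA, copyHasA) = Run();
        Console.WriteLine($"source.Contains(\"a\"): {sourceHasA}");
        Console.WriteLine($"copy.Contains(\"a\"):   {copyHasA}");
    }
}
```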
Of course, the question would then be: why would there ever be a second instance of the type backing EqualityComparer<string>.Default?
The NonRandomizedStringEqualityComparer, in its ISerializable implementation, indicates via GetObjectData that it should be serialized as a GenericEqualityComparer<string>.
So a serializer that respects ISerializable would probably create a new instance of GenericEqualityComparer<string> when round-tripping NonRandomizedStringEqualityComparer (the default comparer that HashSet<string> seems to use, at least in my testing).
In other words, a realistic impact of this bug is: if you round-trip a HashSet<string> through an ISerializable-aware serializer and then copy it via the copy constructor, the copy will be broken. Here is a demonstration of this:
Reproduction Steps
Run either of the above test cases in .NET 8.
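To see where a second instance can come from without a full serializer, the sketch below imitates what an ISerializable-aware serializer might do with the comparer a default HashSet<string> holds internally: call GetObjectData, read the type it asks to be serialized as, and create a fresh instance of that type on the way back. It assumes the private field is named _comparer (as referenced in this issue), and it uses formatter-era types (SerializationInfo, FormatterConverter) that recent .NET versions mark obsolete, so compiler warnings are expected. Class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.Serialization;

static class SerializerRoundTripSketch
{
    // Reads HashSet<string>'s private _comparer field (an implementation detail).
    public static object GetInternalComparer(HashSet<string> set) =>
        typeof(HashSet<string>)
            .GetField("_comparer", BindingFlags.NonPublic | BindingFlags.Instance)!
            .GetValue(set)!;

    // Imitates an ISerializable-aware serializer: ask the comparer which type it
    // wants to be serialized as, then "deserialize" by creating a new instance of it.
    public static IEqualityComparer<string> RoundTripComparer(object comparer)
    {
        var info = new SerializationInfo(comparer.GetType(), new FormatterConverter());
        ((ISerializable)comparer).GetObjectData(info, default);
        return (IEqualityComparer<string>)Activator.CreateInstance(info.ObjectType, nonPublic: true)!;
    }

    static void Main()
    {
        var original = new HashSet<string> { "a", "b" };
        object internalComparer = GetInternalComparer(original);
        IEqualityComparer<string> restored = RoundTripComparer(internalComparer);

        Console.WriteLine($"internal comparer: {internalComparer.GetType().FullName}");
        Console.WriteLine($"restored comparer: {restored.GetType().FullName}");
        Console.WriteLine($"restored is the singleton: {ReferenceEquals(restored, EqualityComparer<string>.Default)}");

        // Rebuild the set item by item with the restored comparer, then copy it.
        var deserialized = new HashSet<string>(restored);
        foreach (string item in original) deserialized.Add(item);
        var copy = new HashSet<string>(deserialized);
        Console.WriteLine($"copy.Contains(\"a\"): {copy.Contains("a")}");
    }
}
```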
Expected behavior
The HashSet<string> should always report that it contains the strings that are in it. The tests should pass on both .NET Framework and .NET 8.
Actual behavior
The tests pass on .NET Framework, and fail on .NET 8. (I did not try .NET 9, but I have been looking at source.dot.net when investigating this, and don't see any indication it would be fixed.)
Regression?
This was a regression noticed when migrating to .NET 8 from .NET Framework.
Known Workarounds
A serializer can special-case these three singletons:
EqualityComparer<string>.Default
StringComparer.Ordinal
StringComparer.OrdinalIgnoreCase
If a serializer recognizes any of these when serializing and, upon deserializing, returns the original singleton, the assumption holds.
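A minimal sketch of that workaround for a hypothetical custom serializer: when writing, recognize the three singletons by reference (the same test HashSet<string> applies), write a small tag instead of the comparer, and when reading, map the tag back to the original singleton. The helper class, its method names, and the tag scheme are illustrative assumptions, not part of .NET.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for a custom serializer (not part of .NET).
static class StringComparerSingletons
{
    // The three singletons that HashSet<string> recognizes by reference.
    private static readonly object[] Singletons =
    {
        EqualityComparer<string>.Default,
        StringComparer.Ordinal,
        StringComparer.OrdinalIgnoreCase,
    };

    // Serializing: returns a tag if `comparer` IS one of the singletons
    // (reference equality), otherwise null.
    public static int? TryGetTag(object comparer)
    {
        for (int i = 0; i < Singletons.Length; i++)
        {
            if (ReferenceEquals(comparer, Singletons[i]))
            {
                return i;
            }
        }
        return null;
    }

    // Deserializing: map the tag back to the original singleton instead of
    // constructing a new comparer instance.
    public static IEqualityComparer<string> FromTag(int tag) =>
        (IEqualityComparer<string>)Singletons[tag];

    static void Main()
    {
        var set = new HashSet<string> { "a", "b" };

        // For a default set, the Comparer property unwraps to EqualityComparer<string>.Default.
        int? tag = TryGetTag(set.Comparer);
        Console.WriteLine($"tag: {tag}");

        if (tag is int t)
        {
            // Rebuild with the singleton, so HashSet<string> applies its usual wrapping.
            var restored = new HashSet<string>(FromTag(t));
            foreach (var item in set) restored.Add(item);
            var copy = new HashSet<string>(restored);
            Console.WriteLine($"copy.Contains(\"a\"): {copy.Contains("a")}");
        }
    }
}
```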
Configuration
.NET 8 / Windows 11; unlikely to be platform-specific.
Other information
I'm not exactly sure how you would fix this, or whether it would be realistic to do so at this point.
My understanding of the problem is that HashSet<string> ends up copying its backing bucket data incorrectly when the copy is made. It goes through this optimized code path when it shouldn't:
In the problem circumstance, this is the new copy HashSet<string> being created, and it uses a NonRandomizedStringEqualityComparer instance obtained from GetStringComparer. The NonRandomizedStringEqualityComparer essentially "wraps" EqualityComparer<string>.Default, which is a GenericEqualityComparer<string>.
The otherAsHashSet is in the unexpected state where its comparer is a non-singleton GenericEqualityComparer<string>, which isn't wrapped. The comparer isn't wrapped because GetStringComparer wraps only three specific singletons, and only when the given comparer is reference-equal to one of them. (This is the assumption I refer to at the start of the issue.)
The problem arises because EqualityComparersAreEqual is implemented as set1.Comparer.Equals(set2.Comparer), and the Comparer property "unwraps" a wrapped comparer. this.Comparer returns an unwrapped GenericEqualityComparer<string>, and otherAsHashSet.Comparer returns a GenericEqualityComparer<string> that was never wrapped. I think these two non-reference-equal instances of GenericEqualityComparer<string> are considered equal by Equals, leading to the bucket data copy in ConstructFrom. But it's not safe to copy bucket data, since the two sets actually use comparers that produce different hash codes.
For the bucket data copy scenario, I think it would be more correct to check set1._comparer.Equals(set2._comparer), so the unwrapping doesn't hide the fact that the comparers are actually different.
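The difference between the two checks can be shown with a small model. NonRandomizedWrapper below is a hypothetical stand-in for NonRandomizedStringEqualityComparer (same equality, different hash function; FNV-1a is an arbitrary choice), and the two methods contrast comparing unwrapped comparers, as the Comparer-based check does, with comparing the comparers actually stored, as suggested with _comparer. It sketches the reasoning only; it is not the runtime's code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for NonRandomizedStringEqualityComparer: delegates equality
// to the wrapped comparer but hashes with its own deterministic function.
sealed class NonRandomizedWrapper : IEqualityComparer<string>
{
    public NonRandomizedWrapper(IEqualityComparer<string> underlying) => Underlying = underlying;

    public IEqualityComparer<string> Underlying { get; }

    public bool Equals(string? x, string? y) => Underlying.Equals(x, y);

    // FNV-1a over UTF-16 code units: stable across processes, unlike the
    // randomized string.GetHashCode() used by the underlying comparer.
    public int GetHashCode(string s)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (char c in s)
            {
                hash ^= c;
                hash *= 16777619;
            }
            return (int)hash;
        }
    }
}

static class ComparerCheck
{
    // Mirrors the Comparer property: hide the wrapper, expose the underlying comparer.
    static IEqualityComparer<string> Unwrap(IEqualityComparer<string> comparer) =>
        comparer is NonRandomizedWrapper wrapper ? wrapper.Underlying : comparer;

    // Comparer-based check: unwrapping hides that the two sets hash differently.
    public static bool UnwrappedEqual(IEqualityComparer<string> a, IEqualityComparer<string> b) =>
        Unwrap(a).Equals(Unwrap(b));

    // _comparer-based check: compares the comparers that actually produced the hash codes.
    public static bool StoredEqual(IEqualityComparer<string> a, IEqualityComparer<string> b) =>
        a.Equals(b);

    static void Main()
    {
        IEqualityComparer<string> defaultComparer = EqualityComparer<string>.Default;
        var second = (IEqualityComparer<string>)Activator.CreateInstance(
            defaultComparer.GetType(), nonPublic: true)!;

        // The copy's comparer: the singleton, wrapped. The source's comparer: a bare second instance.
        var copyComparer = new NonRandomizedWrapper(defaultComparer);

        Console.WriteLine($"unwrapped check: {UnwrappedEqual(copyComparer, second)}"); // True
        Console.WriteLine($"stored check:    {StoredEqual(copyComparer, second)}");    // False
    }
}
```

In this model the unwrapped check reports a match even though the two sides would hash "a" differently, which is the situation in which reusing bucket data goes wrong.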