Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AnnotatedString equality contradicts docs #55244

Open
aplavin opened this issue Jul 25, 2024 · 14 comments
Open

AnnotatedString equality contradicts docs #55244

aplavin opened this issue Jul 25, 2024 · 14 comments
Labels
equality Issues relating to equality relations: ==, ===, isequal strings "Strings!"

Comments

@aplavin
Copy link
Contributor

aplavin commented Jul 25, 2024

Docs:

  ==(x, y)

  Generic equality operator. Falls back to ===. Should be implemented for all types with a notion of equality, based on the abstract value
  that an instance represents. For example, all numeric types are compared by numeric value, ignoring type. Strings are compared as
  sequences of characters, ignoring encoding. 
<...>

  ==(a::AbstractString, b::AbstractString) -> Bool

  Test whether two strings are equal character by character (technically, Unicode code point by code point).

Actual behavior:

julia> s1 = Base.AnnotatedString("abc def", [(1:1, :x => :y)])
"abc def"

julia> s2 = "abc def"
"abc def"

julia> s1 == s2
false

Either the docs or the implementation are clearly wrong...

@nsajko nsajko added the equality Issues relating to equality relations: ==, ===, isequal label Jul 25, 2024
@tecosaur
Copy link
Contributor

In your example s1 and s2 are not equal because the first character of s1 has an annotation. This lines up with "Test whether two strings are equal character by character", but not the "(technically... )" bit.

Initially, I didn't have annotations matter when comparing an AnnotatedString with an unannotated string, but that makes string equality non-transitive, which is a big no-no.

@aplavin
Copy link
Contributor Author

aplavin commented Jul 25, 2024

Not sure how annotations affect code points (docs: "Unicode code point by code point").
Also, the straightforward interpretation of "Strings are compared as sequences of characters, ignoring encoding" (from the same excerpt) seems to imply comparing, well, characters themselves :)
No matter how you encode it (with or without annotations) it still is the same character, right?

  The AbstractChar type is the supertype of all character implementations in
  Julia. A character represents a Unicode code point, and can be converted to
  an integer via the codepoint function in order to obtain the numerical value
  of the code point, or constructed from the same integer. These numerical
  values determine how characters are compared with < and ==, for example. New
  T <: AbstractChar types should define a codepoint(::T) method and a
  T(::UInt32) constructor, at minimum

that makes string equality non-transitive, which is a big no-no

Sure! I expected == to ignore annotations, following all these explanations in the docs.

@tecosaur
Copy link
Contributor

Not sure how annotations affect code points

They don't, but the reasoning I put forwards is predicated on a character with annotations not being equal to a character without annotations.

Sure! I expected == to ignore annotations, following all these explanations in the docs.

We could do this, but I'd expect this behavior to be potentially surprising.

I think the situation can be summed up like so:

Current behavior

A string without annotations is equal to an unannotated string.

Annotated strings are equal when the underlying string is equal, as are annotations.

Option 1

Loosen the "technically" to something like "often" in the ==(::AbstractString, ::AbstractString) docstring.

Option 2

Ignore annotations entirely when comparing AnnotatedStrings and other strings with ==.

@fatteneder
Copy link
Member

fatteneder commented Jul 26, 2024

Option 3

Just document

==(a::AnnotatedString, b::AnnotatedString) -> Bool

  Test whether their strings and annotations are equal.

@tecosaur
Copy link
Contributor

Ah yep, though maybe that would need to explicitly mention

==(a::AnnotatedString, b::AnnotatedString)
==(a::AbstractString, b::AnnotatedString)
==(a::AnnotatedString, b::AbstractString)

@aplavin
Copy link
Contributor Author

aplavin commented Aug 7, 2024

Yeah the current situtation is definitely weird:

julia> a = Base.AnnotatedString("abc", [(2:3, :x => 123)])
"abc"

julia> a[1] == 'a' && a[2] == 'b' && a[3] == 'c'
true

julia> a == "abc"
false

@tecosaur
Copy link
Contributor

Ahh, I think that the AnnotatedChar equality should be updated to match.

@nhz2
Copy link
Contributor

nhz2 commented Aug 13, 2024

Option 4

Don't make AnnotatedString a subtype of AbstractString

This way AnnotatedString is free to define whatever equality methods it wants without being constrained by the existing AbstractString interface.

@tecosaur
Copy link
Contributor

Half the point of AnnotatedString is that it can be passed to functions that are written for an AbstractString and don't know what to do with annotations and it "just works", automatically falling back to a plain un-annotated string 🙃

@nhz2
Copy link
Contributor

nhz2 commented Aug 13, 2024

The issue is that as a Julia user, I have no idea which functions that use AbstractString are going to suddenly have different behavior when passed an AnnotatedString because of some strange overload or piracy. If you want a function to have fundamentally different behavior for AnnotatedString why not just give the function a new name, for example, instead of == define isequal_annotated(a::AbstractString, b::AbstractString) = a == b && annotations(a) == annotations(b)

And define a fall back: annotations(::AbstractString) = Tuple{UnitRange{Int64}, Pair{Symbol, Any}}[]

@tecosaur
Copy link
Contributor

Mmmm, I think there's an argument to be made for abandoning the inclusion of annotations in the concept of string equality, and require them to be checked separately if you do care.

I'm not sure if this is the path we should take, but that is the alternative to the current situation.

@aplavin
Copy link
Contributor Author

aplavin commented Sep 11, 2024

Judging by the AbstractChar / AbstractString docs, there are only two choices to keep everything consistent:

  • AnnotatedChar/String <: AbstractChar/String and equality is decided by the codeunit(s) ignoring annotations
  • !(AnnotatedChar/String <: AbstractChar/String)

@aplavin aplavin changed the title AnnotatedString equality contradicts docs AnnotatedChar/String equality contradicts docs Oct 8, 2024
@aplavin aplavin changed the title AnnotatedChar/String equality contradicts docs AnnotatedString equality contradicts docs Oct 8, 2024
@aplavin
Copy link
Contributor Author

aplavin commented Oct 8, 2024

This issue remains in the actual release. Still, as I understand all this Annotated/Styled Strings/Chars stuff is officially considered experimental, and any future fixes won't be breaking.

@tecosaur
Copy link
Contributor

tecosaur commented Oct 8, 2024

Indeed, this still needs to be worked out, but we've added the "Experimental!" note to give ourselves the space to do so.

@nsajko nsajko added the strings "Strings!" label Oct 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
equality Issues relating to equality relations: ==, ===, isequal strings "Strings!"
Projects
None yet
Development

No branches or pull requests

5 participants