keyword patterns in push rules apply against apostrophised words #7862
Comments
Thanks for the report. Do you have any suggestions for how to improve this? Unfortunately, figuring out what is and isn't a word boundary isn't a trivial task.
I've also got to ask: where is this code used? In other words, what user-visible symptoms does this cause?
@richvdh This is used to determine whether a username or display name appears in a body of text. The user impact is mostly erratic notification behavior. I would imagine the worst case would be a user whose username or display name is "matrix" in a room with many people on the matrix.org homeserver. @clokep I am unsure. Perhaps it can be reworked to handle contractions. I guess that with Unicode, \W is not simply [^0-9a-zA-Z_].
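To illustrate the Unicode point: in Python 3, `\W` in a `str` pattern is Unicode-aware by default, so accented and non-Latin letters count as word characters, while the `re.ASCII` flag restores the `[^0-9a-zA-Z_]` meaning. A minimal sketch; the sample string is made up:

```python
import re

# Hypothetical sample text mixing an apostrophe, a hyphen and accented letters.
sample = "D'Angelo café naïve-test"

# Default (Unicode) mode: accented letters are word characters, so only
# the apostrophe, the spaces and the hyphen are non-word characters.
unicode_non_word = re.findall(r"\W", sample)

# ASCII mode: \W is equivalent to [^0-9a-zA-Z_], so accented letters
# are treated as boundaries too.
ascii_non_word = re.findall(r"\W", sample, flags=re.ASCII)

print(unicode_non_word)  # ["'", ' ', ' ', '-']
print(ascii_non_word)    # ["'", ' ', 'é', ' ', 'ï', '-']
```

In other words, Unicode mode makes `\W` match fewer characters, not more; apostrophes and URL punctuation are non-word characters in both modes.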
@Torwegia, would you mind updating the issue summary so that it describes the symptoms rather than the cause? It'll help us establish how to prioritise it. User mentions are due for a rework anyway; they are known to be brittle for many reasons. See also https://github.com/matrix-org/matrix-doc/issues/1549.
Certainly, I should have some time tomorrow! Thank you both for your time!
Intentional mentions should make this much better for users whose names commonly trigger these false matches; see #15487 for the tracking issue.
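For context, intentional mentions move the decision out of the message body: the sender lists the mentioned users explicitly in an `m.mentions` object in the event content. The sketch below assumes the `m.mentions` → `user_ids` shape from the Matrix specification; the helper `is_intentionally_mentioned` and the sample events are hypothetical and are not Synapse's API:

```python
from collections.abc import Mapping
from typing import Any, Dict


def is_intentionally_mentioned(content: Dict[str, Any], user_id: str) -> bool:
    """Hypothetical helper: True if user_id is listed in content["m.mentions"]["user_ids"].

    No regex is run over the message body, so apostrophes or URLs in the
    text cannot trigger a mention on their own.
    """
    mentions = content.get("m.mentions")
    if not isinstance(mentions, Mapping):
        return False
    user_ids = mentions.get("user_ids")
    if not isinstance(user_ids, list):
        return False
    return user_id in user_ids


# Made-up events: the first body only contains "Angelo" inside "D'Angelo".
event_without_mention = {"body": "Ask D'Angelo about it", "m.mentions": {}}
event_with_mention = {
    "body": "Angelo, can you look?",
    "m.mentions": {"user_ids": ["@angelo:example.org"]},
}

print(is_intentionally_mentioned(event_without_mention, "@angelo:example.org"))  # False
print(is_intentionally_mentioned(event_with_mention, "@angelo:example.org"))  # True
```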
Description
Currently, a regex is used to search message bodies for display names and usernames to decide whether to notify a user that they were mentioned. However, this regex incorrectly matches usernames and display names inside things like URLs or words containing apostrophes. In some cases, this makes the notification system unusable.
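The mechanism described above can be approximated as follows. This is a simplified sketch, not Synapse's code: the `(^|\W)...(\W|$)` wrapper is an assumption modelled on the `\W`-based word-boundary approach discussed in this issue, and `mentions_name` and the sample messages are hypothetical:

```python
import re


def mentions_name(name: str, body: str) -> bool:
    """Hypothetical helper: case-insensitive whole-word search for name in body.

    A "word boundary" here is any non-word character or the start/end of the string.
    """
    pattern = r"(^|\W)%s(\W|$)" % re.escape(name)
    return re.search(pattern, body, flags=re.IGNORECASE) is not None


# An ordinary mention matches, as intended.
print(mentions_name("Angelo", "Hi Angelo, welcome!"))  # True
# False positives: the apostrophe and the URL punctuation count as boundaries.
print(mentions_name("Angelo", "Have you heard D'Angelo's new album?"))  # True
print(mentions_name("matrix", "See https://matrix.org/docs for details"))  # True
```

The last line shows why a user named "matrix" would be notified by any message containing a matrix.org link or user ID.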
The regex used to determine word boundaries breaks words in some places where it should not. The regex in question can be found here. This logic splits words on characters such as apostrophes, so some words are incorrectly broken up.
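One possible direction, sketched purely as an assumption rather than as an existing or proposed Synapse change: keep treating punctuation as a boundary, but refuse to start a match right after an apostrophe that is itself attached to a word character. With Python lookbehind assertions, `D'Angelo` then no longer matches `Angelo`, while `Angelo's` and a quoted `'Angelo'` still do. The helper name `mentions_name_strict` is hypothetical:

```python
import re


def mentions_name_strict(name: str, body: str) -> bool:
    """Hypothetical whole-word, case-insensitive name search that does not break on elisions."""
    pattern = (
        r"(?<!\w)"   # not directly preceded by a word character
        r"(?<!\w')"  # not preceded by an apostrophe glued to a word, as in D'Angelo
        + re.escape(name)
        + r"(?!\w)"  # not directly followed by a word character
    )
    return re.search(pattern, body, flags=re.IGNORECASE) is not None


print(mentions_name_strict("Angelo", "Have you heard D'Angelo's new album?"))  # False
print(mentions_name_strict("Angelo", "Angelo's idea was good"))  # True
print(mentions_name_strict("Angelo", "I said 'Angelo' twice"))  # True
```

Lookarounds are used instead of `(^|\W)` because they check neighbouring characters without consuming them. This variant does nothing for URLs; those would need separate handling.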
From what I can tell, the core of the issue is that \W makes the word-boundary logic overly eager to split up words. Common things like URLs are treated as many separate words by this logic.
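Splitting on runs of `\W` makes the effect visible: every piece of URL punctuation becomes a boundary, so one URL turns into several "words", any of which can coincide with a username or display name. The sample strings are made up:

```python
import re

# Split on runs of non-word characters, which is how a \W-based boundary sees the text.
url_words = re.split(r"\W+", "https://matrix.org/docs/spec")
elided_words = re.split(r"\W+", "D'Angelo")

print(url_words)     # ['https', 'matrix', 'org', 'docs', 'spec']
print(elided_words)  # ['D', 'Angelo']
```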
Steps to reproduce
In this example, Angelo should not be matched.
Version information
A modular.im hosted server.
Version:
Synapse version: 1.17.0
Python version: 3.7.8
Install method: Irrelevant
Platform: Irrelevant