Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

keyword patterns in push rules apply against apostrophised words #7862

Closed
Torwegia opened this issue Jul 16, 2020 · 6 comments
Closed

keyword patterns in push rules apply against apostrophised words #7862

Torwegia opened this issue Jul 16, 2020 · 6 comments
Labels
A-Push Issues related to push/notifications z-bug (Deprecated Label) z-p2 (Deprecated Label)

Comments

@Torwegia
Copy link

Torwegia commented Jul 16, 2020

Description

Currently a regex is used to search for display and usernames to determine whether or not to notify the user that they were mentioned in a message. This regex however incorrectly will determine that usernames/displaynames are in things like urls or words containing appostrophes. This leads to cases where the notification system is unable to be used.

The regex used to determine word boundaries is missing some cases where it should not be breaking. The regex in question can be found here. The logic here breaks words on things like apostrophes meaning some words are incorrectly broken up.

The core of the issue from what I can tell is that \W causes the word boundary logic to be overly eager to split up words. Commonly used things like urls are considered to be many words by this logic.

Steps to reproduce

  • Use _re_word_boundary to wrap the string "Angelo"
  • Compile resulting regex with flags=re.IGNORECASE
  • Using the compiled regex search through a string like "D'Angelo rode the bus home yesterday."
  • Observe a match.

In this example Angelo should not be matched.

Version information

  • Homeserver:

A modular.im hosted server.

  • Version:
    Versions:
    Synapse version:1.17.0
    Python version:3.7.8

  • Install method:
    Irrelevant

  • Platform:
    Irrelevant

@clokep clokep added z-bug (Deprecated Label) z-p2 (Deprecated Label) labels Jul 16, 2020
@clokep
Copy link
Member

clokep commented Jul 16, 2020

Thanks for the report. Do you have any suggestions of how to improve this? Figuring out what is and isn't a word boundary isn't a trivial task unfortunately.

@richvdh
Copy link
Member

richvdh commented Jul 16, 2020

I've also got to ask: where is this code used? In other words, what user-visible symptoms does this cause?

@Torwegia
Copy link
Author

@richvdh This is used to determine if a username or display name is in a body of text. The user impact is mostly erratic notification behavior. I would imagine the worst cause would be a user with the username/display name of matrix being in a room with many people on the matrix.org home server.

@clokep I am unsure. Perhaps it can be reworked to handle contractions. I guess with unicode \W is much more than just [^0-9a-zA-F_].

@richvdh
Copy link
Member

richvdh commented Jul 16, 2020

@Torwegia would you mind updating the issue summary so that it describes the symptoms rather than the cause? it'll help us establish how to prioritise it.

User mentions are due for a rework anyway; they are known to be brittle for many reasons. See also https://github.com/matrix-org/matrix-doc/issues/1549.

@Torwegia
Copy link
Author

Certainly I should have some time tomorrow!

Thank you both for your time!

@Torwegia Torwegia changed the title _re_word_boundary splits words when it shouldn't _re_word_boundary splits words when it shouldn't causing notifications for users when not intended. Jul 29, 2020
@richvdh richvdh changed the title _re_word_boundary splits words when it shouldn't causing notifications for users when not intended. keyword patterns in push rules apply against apostrophised words Aug 11, 2020
@richvdh richvdh added the A-Push Issues related to push/notifications label Aug 11, 2020
@reivilibre
Copy link
Contributor

Intentional mentions should make this much better for someone whose name is often in this situation; see #15487 for the tracking issue.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Push Issues related to push/notifications z-bug (Deprecated Label) z-p2 (Deprecated Label)
Projects
None yet
Development

No branches or pull requests

4 participants