Add domain validation in DomainSpecificString #4351

interru · 2019-01-05T04:08:42Z

FIXES #4088

This adds an additional validation check whether the domain/hostname meets the criteria of a valid hostname

Signed-off-by: Mathias Koehler <[email protected]>

codecov-io · 2019-01-05T04:36:01Z

Codecov Report

Merging #4351 into develop will decrease coverage by <.01%.
The diff coverage is 80.95%.

@@             Coverage Diff             @@
##           develop    #4351      +/-   ##
===========================================
- Coverage    73.73%   73.72%   -0.01%     
===========================================
  Files          300      300              
  Lines        29752    29957     +205     
  Branches      4879     4933      +54     
===========================================
+ Hits         21937    22086     +149     
- Misses        6390     6430      +40     
- Partials      1425     1441      +16

Impacted Files	Coverage Δ
synapse/crypto/context_factory.py	`87.5% <100%> (+2.25%)`	⬆️
synapse/types.py	`79.65% <78.94%> (-0.86%)`	⬇️
synapse/rest/client/v2_alpha/account_data.py	`56% <0%> (-29.37%)`	⬇️
synapse/util/file_consumer.py	`80.7% <0%> (-1.76%)`	⬇️
synapse/handlers/search.py	`80.24% <0%> (ø)`	⬆️
synapse/handlers/federation.py	`61.72% <0%> (ø)`	⬆️
synapse/api/filtering.py	`97.75% <0%> (+0.47%)`	⬆️
synapse/python_dependencies.py	`50.76% <0%> (+3.15%)`	⬆️
synapse/config/server.py	`68.5% <0%> (+4.4%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 29f20a8...3daf185. Read the comment docs.

richvdh

A couple of thoughts here:

we already have synapse.http.endpoint.parse_and_validate_server_name, which does some of this stuff. We should be using (and possibly extending) that rather than inventing a new set of checks.
The code you are changing here is on a hot path - we'll end up spending a non-trivial amount of resources validating server names which are already known to be valid. I'd prefer it if we validated identifiers at the point of entry to the system - for example, createRoom fails with a 500 error if a badly-formatted mxid is in the invitees list #4088 could be fixed with better checks in the RoomCreationHandler.

richvdh · 2019-01-07T16:50:00Z

synapse/types.py

+        try:
+            localpart, domain = s[1:].split(':', 1)
+            if ':' in domain:
+                domain, port = domain.split(':')


this looks like it will fail on IPv6 addresses.

Good point! Completely forgot IPv6.

interru · 2019-01-07T17:06:06Z

Wrong button ;)

interru · 2019-01-07T19:04:59Z

The code you are changing here is on a hot path - we'll end up spending a non-trivial amount of resources validating server names which are already known to be valid.

Thats raises the question why UserIDs (& co) get converted back to an normal string and parsed a second time?! The only places where we should get it as string and not as an instance is on IO. The only IO you can trust is the Database (assuming the data in the database was already validated).

In my opinion we also should only handle the domain in one form (IDNA or unicode). Currently both forms are accepted which could perhaps cause weird issues if someone mixes them.

I need to make myself more familiar with the code, spec and ecosystem. Could also be that I currently might seeing problems because of Dunning-Kruger effect. ;)

richvdh · 2019-01-07T20:38:17Z

Thats raises the question why UserIDs (& co) get converted back to an normal string and parsed a second time?! The only places where we should get it as string and not as an instance is on IO. The only IO you can trust is the Database (assuming the data in the database was already validated).

well yes, it's specifically the path from the database that I was thinking of. However, I don't think it's as simple as you imagine: some parts of the code expect strings, and some expect objects, which naturally leads to a reasonable amount of conversion between the two. You could make this more consistent, but it's not obvious to me that doing so would reduce the amount of conversion to be done. For instance, there are a lot of simple CRUD paths which would end up converting string to instance and back to string again for no real value.

In my opinion we also should only handle the domain in one form (IDNA or unicode). Currently both forms are accepted which could perhaps cause weird issues if someone mixes them.

https://matrix.org/docs/spec/appendices.html#server-name is pretty clear that they should be ascii (ie, IDNA.) Anywhere that accepts them as unicode is probably a bug.

richvdh · 2019-02-13T16:13:30Z

appears abandoned by contributor

interru added 6 commits January 5, 2019 01:03

Add domain validation in DomainSpecificString

75d028b

Signed-off-by: Mathias Koehler <[email protected]>

Change DomainSpecific to more pythonic syntax

a3f81eb

Signed-off-by: Mathias Koehler <[email protected]>

Fix DomainSpecifcString parsing with port

a944c97

Fallback to pythons bultin idna

34bd885

Use generic exception because of fallback

2d97bdb

Signed-off-by: Mathias Koehler <[email protected]>

Add suggested changes from the linter

3daf185

richvdh suggested changes Jan 7, 2019

View reviewed changes

interru closed this Jan 7, 2019

interru reopened this Jan 7, 2019

richvdh closed this Feb 13, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add domain validation in DomainSpecificString #4351

Add domain validation in DomainSpecificString #4351

interru commented Jan 5, 2019

codecov-io commented Jan 5, 2019 •

edited

Loading

richvdh left a comment

richvdh Jan 7, 2019

interru Jan 7, 2019

interru commented Jan 7, 2019 •

edited

Loading

interru commented Jan 7, 2019

richvdh commented Jan 7, 2019

richvdh commented Feb 13, 2019

Add domain validation in DomainSpecificString #4351

Add domain validation in DomainSpecificString #4351

Conversation

interru commented Jan 5, 2019

codecov-io commented Jan 5, 2019 • edited Loading

Codecov Report

richvdh left a comment

Choose a reason for hiding this comment

richvdh Jan 7, 2019

Choose a reason for hiding this comment

interru Jan 7, 2019

Choose a reason for hiding this comment

interru commented Jan 7, 2019 • edited Loading

interru commented Jan 7, 2019

richvdh commented Jan 7, 2019

richvdh commented Feb 13, 2019

codecov-io commented Jan 5, 2019 •

edited

Loading

interru commented Jan 7, 2019 •

edited

Loading