Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Add domain validation in DomainSpecificString #4351

Closed
wants to merge 6 commits into from

Conversation

interru
Copy link

@interru interru commented Jan 5, 2019

FIXES #4088

This adds an additional validation check whether the domain/hostname meets the criteria of a valid hostname

@codecov-io
Copy link

codecov-io commented Jan 5, 2019

Codecov Report

Merging #4351 into develop will decrease coverage by <.01%.
The diff coverage is 80.95%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #4351      +/-   ##
===========================================
- Coverage    73.73%   73.72%   -0.01%     
===========================================
  Files          300      300              
  Lines        29752    29957     +205     
  Branches      4879     4933      +54     
===========================================
+ Hits         21937    22086     +149     
- Misses        6390     6430      +40     
- Partials      1425     1441      +16
Impacted Files Coverage Δ
synapse/crypto/context_factory.py 87.5% <100%> (+2.25%) ⬆️
synapse/types.py 79.65% <78.94%> (-0.86%) ⬇️
synapse/rest/client/v2_alpha/account_data.py 56% <0%> (-29.37%) ⬇️
synapse/util/file_consumer.py 80.7% <0%> (-1.76%) ⬇️
synapse/handlers/search.py 80.24% <0%> (ø) ⬆️
synapse/handlers/federation.py 61.72% <0%> (ø) ⬆️
synapse/api/filtering.py 97.75% <0%> (+0.47%) ⬆️
synapse/python_dependencies.py 50.76% <0%> (+3.15%) ⬆️
synapse/config/server.py 68.5% <0%> (+4.4%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 29f20a8...3daf185. Read the comment docs.

Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of thoughts here:

  • we already have synapse.http.endpoint.parse_and_validate_server_name, which does some of this stuff. We should be using (and possibly extending) that rather than inventing a new set of checks.
  • The code you are changing here is on a hot path - we'll end up spending a non-trivial amount of resources validating server names which are already known to be valid. I'd prefer it if we validated identifiers at the point of entry to the system - for example, createRoom fails with a 500 error if a badly-formatted mxid is in the invitees list #4088 could be fixed with better checks in the RoomCreationHandler.

try:
localpart, domain = s[1:].split(':', 1)
if ':' in domain:
domain, port = domain.split(':')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like it will fail on IPv6 addresses.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Completely forgot IPv6.

@interru interru closed this Jan 7, 2019
@interru
Copy link
Author

interru commented Jan 7, 2019

Wrong button ;)

@interru interru reopened this Jan 7, 2019
@interru
Copy link
Author

interru commented Jan 7, 2019

The code you are changing here is on a hot path - we'll end up spending a non-trivial amount of resources validating server names which are already known to be valid.

Thats raises the question why UserIDs (& co) get converted back to an normal string and parsed a second time?! The only places where we should get it as string and not as an instance is on IO. The only IO you can trust is the Database (assuming the data in the database was already validated).

In my opinion we also should only handle the domain in one form (IDNA or unicode). Currently both forms are accepted which could perhaps cause weird issues if someone mixes them.

I need to make myself more familiar with the code, spec and ecosystem. Could also be that I currently might seeing problems because of Dunning-Kruger effect. ;)

@richvdh
Copy link
Member

richvdh commented Jan 7, 2019

Thats raises the question why UserIDs (& co) get converted back to an normal string and parsed a second time?! The only places where we should get it as string and not as an instance is on IO. The only IO you can trust is the Database (assuming the data in the database was already validated).

well yes, it's specifically the path from the database that I was thinking of. However, I don't think it's as simple as you imagine: some parts of the code expect strings, and some expect objects, which naturally leads to a reasonable amount of conversion between the two. You could make this more consistent, but it's not obvious to me that doing so would reduce the amount of conversion to be done. For instance, there are a lot of simple CRUD paths which would end up converting string to instance and back to string again for no real value.

In my opinion we also should only handle the domain in one form (IDNA or unicode). Currently both forms are accepted which could perhaps cause weird issues if someone mixes them.

https://matrix.org/docs/spec/appendices.html#server-name is pretty clear that they should be ascii (ie, IDNA.) Anywhere that accepts them as unicode is probably a bug.

@richvdh
Copy link
Member

richvdh commented Feb 13, 2019

appears abandoned by contributor

@richvdh richvdh closed this Feb 13, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants