We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The following URLs are incorrectly canonicalized with SURT as "com)/".
SURT = "com)/" 1. https://www1355544.com/ 2. https://www3288.com/ 3. https://www504778.com/ 4. https://www556798.com/ 5. https://www57912.com/
Problem is that 'www\d*.' is removed from the URL, which leaves "com)/" as the SURT.
surt/surt/IAURLCanonicalizer.py
Line 127 in 6934c32
the other instance where the URLs were correctly canonicalized by removing 'www\d*.'
SURT = "jp,co,daily)/" 1. https://daily.co.jp/ 2. https://www4.daily.co.jp/ 3. https://www8.daily.co.jp/ 4. https://www9.daily.co.jp/
We proposed to update the regex where it should only remove the 'www\d*.' if at least two dots are present in the hostname.
The text was updated successfully, but these errors were encountered:
Don't perform www. prefix removal on 2nd level domains. Fixes interne…
25d87fb
…tarchive#28
Successfully merging a pull request may close this issue.
The following URLs are incorrectly canonicalized with SURT as "com)/".
Problem is that 'www\d*.' is removed from the URL, which leaves "com)/" as the SURT.
surt/surt/IAURLCanonicalizer.py
Line 127 in 6934c32
the other instance where the URLs were correctly canonicalized by removing 'www\d*.'
We proposed to update the regex where it should only remove the 'www\d*.' if at least two dots are present in the hostname.
The text was updated successfully, but these errors were encountered: