-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EmbedAPIv3: updates after QA with sphinx-hoverxref #8521
Conversation
Add extra domains for external intersphinx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a few small suggestions.
@@ -71,12 +70,15 @@ def _download_page_content(self, url): | |||
|
|||
response = requests.get(url, timeout=settings.RTD_EMBED_API_DEFAULT_REQUEST_TIMEOUT) | |||
if response.ok: | |||
# NOTE: we use ``response.content`` to get its binary | |||
# representation. Then ``selectolax`` is in charge to auto-detect | |||
# its encoding. We trust more in selectolax for this than in requests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we trust it more? Is there a specific issue we've seen? I'm guessing that requests has a lot more eyes on it, but perhaps we're doing something different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't have a good amount of data tested yet. However, I've already find myself doing the solution from this SO question: https://stackoverflow.com/a/52615216
r = requests.get("https://martin.slouf.name/")
# override encoding by real educated guess as provided by chardet
r.encoding = r.apparent_encoding
# access the data
r.text
Then, I read that selectolax also uses the meta-tags from the HTML (https://selectolax.readthedocs.io/en/latest/parser.html#selectolax.parser.HTMLParser) to guess the encoding and it always worked without manual interaction as with requests
.
Anyways, I don't have strong opinions here. I suppose whatever we choose here, it will fail in some case that the other library wouldn't 🙃
@@ -818,6 +818,8 @@ def DOCKER_LIMITS(self): | |||
r'docs\.python\.org', | |||
r'docs\.scipy\.org', | |||
r'docs\.sympy\.org', | |||
r'www.sphinx-doc.org', | |||
r'numpy\.org', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is likely something that should be defined in the DB, or somewhere else that doesn't require a deploy to modify. It's definitely something we're going to need to iterate on pretty frequently I'm guessing, and perhaps even support customization at a project or organization level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. We do have an issue opened for Django settings in the DB at https://github.com/readthedocs/readthedocs-ops/issues/1004 but I don't think that will progress too much. If that's the case, we can create a simple EmbedDomain
model that we can relate to a Project and/or Organization as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #8530 to continue the conversation there.
Co-authored-by: Eric Holscher <[email protected]>
Updates required by readthedocs/sphinx-hoverxref#146 found while doing QA.