Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Merge different Resource implementation classes #7732

Merged
merged 14 commits into from
Jul 3, 2020

Conversation

erikjohnston
Copy link
Member

@erikjohnston erikjohnston commented Jun 22, 2020

We currently have two Resource classes that work in slightly different ways, this is an attempt to de-duplicate some of the code and make things more consistent (ironically by adding more classes).

Concretely, this fixes a bug where trace_servlet would trigger a "Tried to close a non-active scope!" error each time a request was processed in DirectServeResource if opentracing is enabled. This is because the trace_servlet wrapper was added after the wrap_async_request_handler, which meant that logcontexts would be forcibly dropped before the handling in trace_servlet happened. This is handled correctly by JsonResource, so by refactoring things so that they share the same code we can fix that bug.

@erikjohnston erikjohnston force-pushed the erikj/merge_resources branch from 1bbdbf2 to f5b7968 Compare June 22, 2020 15:23
@erikjohnston erikjohnston requested a review from a team June 22, 2020 15:47
@clokep clokep self-assigned this Jun 23, 2020
synapse/http/server.py Outdated Show resolved Hide resolved
synapse/rest/key/v2/remote_key_resource.py Show resolved Hide resolved
synapse/rest/media/v1/thumbnail_resource.py Outdated Show resolved Hide resolved
@erikjohnston erikjohnston requested review from clokep and richvdh June 23, 2020 14:11
synapse/http/server.py Outdated Show resolved Hide resolved
synapse/http/server.py Outdated Show resolved Hide resolved
synapse/http/server.py Show resolved Hide resolved
synapse/http/server.py Outdated Show resolved Hide resolved
synapse/rest/key/v2/remote_key_resource.py Show resolved Hide resolved
synapse/http/server.py Outdated Show resolved Hide resolved
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so... I've got one of my annoying opinions: as discussed briefly in #synapse-dev, your _AsyncResource feels like it is mixing separate concerns, and additionally it feels like the DirectServeResources are just doing a subset of what JsonResource needs to do anyway. (I've always felt like this: DirectServeResource should have been AsyncResource to my mind.

My (current) suggestion for a class heirarchy:

                Resource
                    ^
                    |
               AsyncResource
              ^    ^    ^        
          +---/     |    \------+
AsyncHtmlResource   |       AsyncJsonResource
                    |            ^
            CallbackResource     |
                    ^            |
                    |        +---/
                JsonServletResource 

AsyncResource overrides render to look for async_render, async_render_GET, etc; it also has stub functions for _handle_error (drops the connection and logs by default?) and _handle_result (to handle a non-empty result from async_render_*: probably does nothing default, but might even raise a not-implemented exception?)

{Json,Html}Resource can provide implementations of _handle_error and _handle_result which return something to the client, taking the place of DirectServe{Json,Html}Resource.

CallbackResource has an async_render which does the stuff you currently have inside the try block of _AsyncResource (https://github.com/matrix-org/synapse/pull/7732/files#diff-c2761a2c74c6c134fb39216aef055306R229-R256).

JsonServletResource is a renaming of the current JsonResource.

(It's unclear if we need both CallbackResource and JsonServletResource: perhaps we can combine them back into a single class, thus avoiding the multiple inheritance?)

I think you can also hook whatever it is trace_servlet is supposed to be doing at AsyncResource? Though I must admit I've failed to grok the bug you're trying to fix here.

@erikjohnston
Copy link
Member Author

I think that's basically what we have today, modulo abusing the @wrap_*_request_handler to choose between the HTML vs JSON formatting. AFAICT the only code that JsonServletResource and AsyncResource would share would be the two line render(..) function that delegates to _async_render (if that)? The JsonServletResource doesn't really want to be looking for async_render_* and co (though I suppose there is no harm in it).

I guess ideally all I want is to separate the routing from the rendering, although this PR gets a bit unstuck trying to support HTML vs JSON. One possible way of doing this is to have an SynapseResource that does the async stuff as well as trace_servlet etc, then that just delegates to a router that is passed in as an arg (i.e. it just calls await self.router.render(request)). Then the difference between JsonResource and DirectServeResource is just what router that is passed into SynapseResource.

Now, the problem with that is the error handling, as now SynapseResource no longer knows the format involved (unless you tell it, in which case you're effectively back to where we are with this PR, modulo a bit of shuffling). At which point all it can do is kill the request connection if an exception happens that the router(?) doesn't handle (which feels like a small step backwards from where we are today).

I think you can also hook whatever it is trace_servlet is supposed to be doing at AsyncResource? Though I must admit I've failed to grok the bug you're trying to fix here.

TLDR: the current code ends up effectively calling preserve_fn(trace_servlet(callback)) for JsonResource but trace_servlet(preserve_fn(callback)) for DirectServeResource. This breaks as trace_servlet assumes the given function correctly follows log context rules. This stems from the fact that the @wrap_*_request_handler are put at different levels in the two different classes

@richvdh
Copy link
Member

richvdh commented Jun 23, 2020

AFAICT the only code that JsonServletResource and AsyncResource would share would be the two line render(..) function that delegates to _async_render (if that)?

There's a bunch of stuff that AsyncResource can perform that will be useful for any subclass:

  • preserve_fn
  • request.processing
  • trace_request
  • call _async_render
  • return NOT_DONE_YET
  • call one of _handle_error or _handle_result

I call that at least six lines :)

AsyncResource._async_render can be the thing that goes looking for async_render_* and co, so could be shortcut by subclasses which know they don't need to delegate to it, BUT if we can do this in a way that fixes #6746 (and maybe #6894), that would be good.

One possible way of doing this is to have an SynapseResource that does the async stuff as well as trace_servlet etc, then that just delegates to a router that is passed in as an arg (i.e. it just calls await self.router.render(request)

I'm not sure I love this plan.

@clokep clokep removed their request for review June 23, 2020 17:47
@richvdh
Copy link
Member

richvdh commented Jun 23, 2020

I call that at least six lines :)

It's also annoying boilerplatey boilerplate, to get all the exception handling right.

@erikjohnston
Copy link
Member Author

I am a bit confused how your classes are different than mine, in terms of avoiding mixing of concerns. Each layer seems to be doing pretty much the same things, just implementing things a bit differently (or having JsonServletResource inherit from AsyncJsonResource).

I think having the render function call out to a function that subclasses implement to handle the request is nicer than it calling a function to get the callback is a nicer way of doing things. (Though its worth noting that doing it like that means we call trace_servlet(callback) each request, rather than wrapping the callbacks on startup).

I don't really understand why we would implement stub functions for _handle_error and _handle_result, everything has a concrete implementation of those already that produce better results.

AsyncResource._async_render can be the thing that goes looking for async_render_* and co, so could be shortcut by subclasses which know they don't need to delegate to it, BUT if we can do this in a way that fixes #6746 (and maybe #6894), that would be good.

I think we can make that work either way TBH.

One possible way of doing this is to have an SynapseResource that does the async stuff as well as trace_servlet etc, then that just delegates to a router that is passed in as an arg (i.e. it just calls await self.router.render(request)

I'm not sure I love this plan.

Fair, it was an attempt to try and separate out the concerns between rendering and routing more explicitly.

@richvdh
Copy link
Member

richvdh commented Jun 23, 2020

I think having the render function call out to a function that subclasses implement to handle the request is nicer than it calling a function to get the callback is a nicer way of doing things

This sentence doesn't make any sense, which is a shame because on closer inspection I think it's basically the nub of the difference between our approaches.

In my approach, rather than having AsyncResource call _get_handler_for_request, it would instead look for _async_render* methods itself - thus removing the point of DirectRenderResource. I think you're saying that you prefer one approach over the other, but I can't really tell which you prefer.

(Though its worth noting that doing it like that means we call trace_servlet(callback) each request, rather than wrapping the callbacks on startup).

trace_servlet doesn't need to be implemented as a wrapper... you could do what it's doing inline instead.

I don't really understand why we would implement stub functions for _handle_error and _handle_result, everything has a concrete implementation of those already that produce better results.

fair, scratch that part of the plan; keep them as abstract functions.

@richvdh
Copy link
Member

richvdh commented Jun 24, 2020

Fair, it was an attempt to try and separate out the concerns between rendering and routing more explicitly.

Incidentally, I think this would be better solved by using the resource tree as it is intended (ie, having individual Resources for each REST endpoint, which are registered up front), and thus fixing #5118 into the bargain. This whole business where the rendering layer calls out to a routing layer rather than the other way around feels wrong. I don't suggest we attempt to completely change all that now though.

@erikjohnston
Copy link
Member Author

I think it would be helpful if you could answer what you're objections are to the current approach that you're trying to fix.

I do think that your AsyncResource is nicer than the one in the PR, but these seem mainly to be cosmetic changes rather than fixing any mixing of concerns.

@richvdh
Copy link
Member

richvdh commented Jun 24, 2020

Two things, basically:

  • your _AsyncResource is mixing together (1) the handling of async things, and (2) the handling of servlet path registration. This is what I'm saying about mixing concerns.
  • Support for _async_render_METHOD methods should be in _AsyncResource, largely for symmetry with Resource itself; I'm not convinced that looking for the methods during construction and sticking them in a dict is a win.

@erikjohnston
Copy link
Member Author

Ok, thanks. I guess I just see it all as routing, but am happy to rejig.

@erikjohnston
Copy link
Member Author

The problem with the new approach is that now we need to move the trace_servlet and request.metrics.name into the sub classes since servlet name depends on the routing. At which point we're back to square one where we're duplicating the code.

I suppose the correct, ish, way of fixing this is to have each servlet be a Resource, or something, but that is not a change I'm going to make now.

Thoughts on how to get out of this welcome, though at this point I might just give up on the idea of de-duplicating the code and just fix the specific case of opentracing and move on

@richvdh
Copy link
Member

richvdh commented Jun 24, 2020

The problem with the new approach is that now we need to move the trace_servlet and request.metrics.name into the sub classes since servlet name depends on the routing.

oh, balls.

request.metrics.name isn't so bad because you can update it later (provided you get there before the first await), but trace_servlet is indeed problematic. Is it practical to update the span in some way after it is started (add the servlet name as some sort of metadata instead?)

@richvdh
Copy link
Member

richvdh commented Jun 24, 2020

(otherwise yes, that is a major flaw in my plan, and it's up to you whether you stick with your proposal or patch the current situation up)

@erikjohnston
Copy link
Member Author

erikjohnston commented Jun 24, 2020

Aha, actually looks like we can change the operation name, which isn't possible using the standard opentracing APIs but is possible in the jaeger specific APIs. In which case we can actually just set the operation name based on the request name at the end of the rendering, rather than duplicating that logic.

(What you can and can't change is quite arbitrary....)

@richvdh
Copy link
Member

richvdh commented Jun 24, 2020

In which case we can actually just set the operation name based on the request name at the end of the rendering

to confirm: this won't confuse nested spans that have come and gone by the time we finish rendering?

@erikjohnston
Copy link
Member Author

In which case we can actually just set the operation name based on the request name at the end of the rendering

to confirm: this won't confuse nested spans that have come and gone by the time we finish rendering?

I don't believe so, as they all use opaque IDs internally. Will test on jki.re though.

@erikjohnston erikjohnston force-pushed the erikj/merge_resources branch from 23399f3 to 40b84a0 Compare June 25, 2020 14:40
@erikjohnston
Copy link
Member Author

Ok, I've rejigged everything as discussed. Sorry that it's all in a single commit

@erikjohnston erikjohnston requested a review from richvdh June 25, 2020 15:10
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm much happier with this now; thanks!

have you checked that OPTIONS works everywhere we could conceivably have broken it?

synapse/http/server.py Outdated Show resolved Hide resolved
synapse/http/server.py Show resolved Hide resolved
synapse/http/server.py Outdated Show resolved Hide resolved
synapse/http/server.py Show resolved Hide resolved
synapse/http/server.py Outdated Show resolved Hide resolved
with scope:
result = func(request, *args, **kwargs)
if opentracing is None:
yield
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kinda wondering about the overhead here. There's a _NullContextManager in generic_worker.py which we could pull out and use...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single if statement per request doesn't fill me with terror, if I'm honest.

Copy link
Member

@richvdh richvdh Jul 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not the if statement I'm worrying about, but more the infrastructure involved in instantiating a generator, constructing a GeneratorContextManager to wrap it, and then doing the callbacks.

I'm inclined to agree it's not worth lying awake at night over though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right I see, I forgot that we'd need to generate a context manager object each time

synapse/logging/opentracing.py Show resolved Hide resolved
synapse/replication/http/__init__.py Outdated Show resolved Hide resolved
synapse/http/server.py Outdated Show resolved Hide resolved
synapse/http/server.py Show resolved Hide resolved
@erikjohnston erikjohnston force-pushed the erikj/merge_resources branch from af3a628 to f0f2baf Compare June 30, 2020 10:52
@erikjohnston
Copy link
Member Author

have you checked that OPTIONS works everywhere we could conceivably have broken it?

Most of the OPTIONS are handled by a separate dedicated resource. For the places that handle options via _async_render_OPTIONS themselves we have a test for the url preview case and I've manually tested the other two

@erikjohnston erikjohnston requested a review from richvdh June 30, 2020 13:06
@clokep
Copy link
Member

clokep commented Jul 2, 2020

@erikjohnston Looks like I bitrotted this. There are conflicts now! 😢

synapse/http/server.py Show resolved Hide resolved
synapse/http/server.py Show resolved Hide resolved
with scope:
result = func(request, *args, **kwargs)
if opentracing is None:
yield
Copy link
Member

@richvdh richvdh Jul 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not the if statement I'm worrying about, but more the infrastructure involved in instantiating a generator, constructing a GeneratorContextManager to wrap it, and then doing the callbacks.

I'm inclined to agree it's not worth lying awake at night over though.

Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple of minor things, generally lgtm

tests/http/test_additional_resource.py Outdated Show resolved Hide resolved
synapse/http/server.py Outdated Show resolved Hide resolved
@erikjohnston erikjohnston requested a review from richvdh July 3, 2020 15:28
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢

@erikjohnston erikjohnston merged commit 5cdca53 into develop Jul 3, 2020
@erikjohnston erikjohnston deleted the erikj/merge_resources branch July 3, 2020 18:02
babolivier pushed a commit that referenced this pull request Sep 1, 2021
* commit '5cdca53aa':
  Merge different Resource implementation classes (#7732)
  Fix inconsistent handling of upper and lower cases of email addresses. (#7021)
  Allow YAML config file to contain None (#7779)
  Fix a typo.
  Move 1.15.2 after 1.16.0rc2.
  1.16.0rc2
  Remove an extraneous space.
  Add links to the fixes.
  Fix tense in the release notes.
  Hack to add push priority to push notifications (#7765)
  Add early returns to `_check_for_soft_fail` (#7769)
  Use symbolic names for replication stream names (#7768)
  Type checking for `FederationHandler` (#7770)
  Fix new metric where we used ms instead of seconds (#7771)
  Fix incorrect error message when database CTYPE was set incorrectly. (#7760)
  Pin link in CHANGES.md
  Fixes to CHANGES.md
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants