Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use invalid to check asyncState #27

Merged
merged 2 commits into from
Jan 27, 2021

Conversation

zijin-m
Copy link
Contributor

@zijin-m zijin-m commented Jan 27, 2021

After testing, if two requests are repeatedly sent in a short time, the subsequent requests may use the AsyncState initialized by the previous request. It is temporarily unclear why store.enterWith(undefined as unknown as AsyncState) in the clear method not work(The store cannot be reset), I tried to add an invalid flag, which works normally, I hope to discuss the feasibility of this implementation with you, thx

@kezhenxu94
Copy link
Member

@tom-pytel can you please take a look?

@zijin-m
Copy link
Contributor Author

zijin-m commented Jan 27, 2021

see also #22

@tom-pytel
Copy link
Contributor

After testing, if two requests are repeatedly sent in a short time, the subsequent requests may use the AsyncState initialized by the previous request. It is temporarily unclear why store.enterWith(undefined as unknown as AsyncState) in the clear method not work(The store cannot be reset), I tried to add an invalid flag, which works normally, I hope to discuss the feasibility of this implementation with you, thx

So if I understand this correctly it is a fix for a bug in AsyncLocalStorage where getStore() may return the previous context after enterWith() sets a new one?

@tom-pytel
Copy link
Contributor

After testing, if two requests are repeatedly sent in a short time, the subsequent requests may use the AsyncState initialized by the previous request. It is temporarily unclear why store.enterWith(undefined as unknown as AsyncState) in the clear method not work(The store cannot be reset), I tried to add an invalid flag, which works normally, I hope to discuss the feasibility of this implementation with you, thx

Also, I don't see asyncState.context.invalid being cleared again after this, which would mean that once a ContextManager is marked invalid then all new subsequent Spans will get a clean SpanContext and the parent chain will be broken?

@zijin-m
Copy link
Contributor Author

zijin-m commented Jan 27, 2021

After testing, if two requests are repeatedly sent in a short time, the subsequent requests may use the AsyncState initialized by the previous request. It is temporarily unclear why store.enterWith(undefined as unknown as AsyncState) in the clear method not work(The store cannot be reset), I tried to add an invalid flag, which works normally, I hope to discuss the feasibility of this implementation with you, thx

So if I understand this correctly it is a fix for a bug in AsyncLocalStorage where getStore() may return the previous context after enterWith() sets a new one?

According to @Qard's reply comment this feature should be regarded as a feature rather than a bug. Maybe this is the expected behavior of enterWith. I guess it may be related to the order of calling getStore and enterWith, get asyncState() did not get it To the value of enterWith set (undefined), but the previous value, @Qard mentioned The "value" of the context between before and after hooks should be whatever the value was at the time the resource init event happened.

@zijin-m
Copy link
Contributor Author

zijin-m commented Jan 27, 2021

After testing, if two requests are repeatedly sent in a short time, the subsequent requests may use the AsyncState initialized by the previous request. It is temporarily unclear why store.enterWith(undefined as unknown as AsyncState) in the clear method not work(The store cannot be reset), I tried to add an invalid flag, which works normally, I hope to discuss the feasibility of this implementation with you, thx

Also, I don't see asyncState.context.invalid being cleared again after this, which would mean that once a ContextManager is marked invalid then all new subsequent Spans will get a clean SpanContext and the parent chain will be broken?

According to my understanding, it will not be broken in theory, because it will be marked only when the trace ends. The subsequent span should use the new trace. Could it be said that after the segment-finished event is triggered, there are sub spans that have not ended or were create?

Similarly, I think the role of the invalid tag is the same as store.enterWith(undefined as unknown as AsyncState), both get a clean SpanContext , Is it my understanding?

@tom-pytel
Copy link
Contributor

According to my understanding, it will not be broken in theory, because it will be marked only when the trace ends. The subsequent span should use the new trace. Could it be said that after the segment-finished event is triggered, there are sub spans that have not ended or were create?

Similarly, I think the role of the invalid tag is the same as store.enterWith(undefined as unknown as AsyncState), both get a clean SpanContext , Is it my understanding?

Ok, I understand what you are getting at but in that case you should be adding the invalid flag to the asyncState object and not the SpanContext object that it contains since that SpanContext may be used by other things (like parents of this Span).

@zijin-m
Copy link
Contributor Author

zijin-m commented Jan 27, 2021

According to my understanding, it will not be broken in theory, because it will be marked only when the trace ends. The subsequent span should use the new trace. Could it be said that after the segment-finished event is triggered, there are sub spans that have not ended or were create?
Similarly, I think the role of the invalid tag is the same as store.enterWith(undefined as unknown as AsyncState), both get a clean SpanContext , Is it my understanding?

Ok, I understand what you are getting at but in that case you should be adding the invalid flag to the asyncState object and not the SpanContext object that it contains since that SpanContext may be used by other things (like parents of this Span).

OK, I will update later

@zijin-m
Copy link
Contributor Author

zijin-m commented Jan 27, 2021

According to my understanding, it will not be broken in theory, because it will be marked only when the trace ends. The subsequent span should use the new trace. Could it be said that after the segment-finished event is triggered, there are sub spans that have not ended or were create?
Similarly, I think the role of the invalid tag is the same as store.enterWith(undefined as unknown as AsyncState), both get a clean SpanContext , Is it my understanding?

Ok, I understand what you are getting at but in that case you should be adding the invalid flag to the asyncState object and not the SpanContext object that it contains since that SpanContext may be used by other things (like parents of this Span).

OK, I will update later

@tom-pytel I have pushed changes, please take some time to review, thx

@tom-pytel
Copy link
Contributor

@tom-pytel I have pushed changes, please take some time to review, thx

Looks good.

@kezhenxu94 kezhenxu94 added the bug Something isn't working label Jan 27, 2021
@kezhenxu94 kezhenxu94 added this to the 0.2.0 milestone Jan 27, 2021
@kezhenxu94 kezhenxu94 merged commit b2ef178 into apache:master Jan 27, 2021
@tom-pytel
Copy link
Contributor

Actually there may still be a potential problem since the asyncState may be inherited and invalidating the state may affect a parent. I may need to dup the invalid flag like the spans are duped in spansDup() in order to prevent a child being invalidated also invalidating the parent. Need to study this some more, all thanks to that wonderful "feature" of AsyncLocalStorage.

@tom-pytel
Copy link
Contributor

Actually there may still be a potential problem since the asyncState may be inherited and invalidating the state may affect a parent. I may need to dup the invalid flag like the spans are duped in spansDup() in order to prevent a child being invalidated also invalidating the parent. Need to study this some more, all thanks to that wonderful "feature" of AsyncLocalStorage.

Never mind, unnecessary since the entire asyncState object is recreated in spansDup() there is no chance of a child invalidation affecting a parent.

@zijin-m zijin-m deleted the feature/fix-aysncState branch January 28, 2021 02:13
@Qard
Copy link

Qard commented Feb 20, 2021

Just saw I was tagged in here. I just wanted to say, as the author of store.enterWith(...), I highly recommend against using it in this way. It has one very specific use case, and is marked as such in the docs, which is to inject it before an http handler to have the entire remainder of the currently executing tick to be attributed to the given context. This is almost never what you actually want. In looking at how you're using it here, it should pretty much always be done with store.run(...) to properly escape the context attribution at the correct boundary. Otherwise you are likely to leak the context into places you might not expect or to switch contexts more than once in the same context which could also have strange behaviour.

The official interface is store.run(...) and I would highly recommend only ever using that unless you are absolutely certain store.enterWith(...) is the only solution and you fully understand the implications of using it.

@tom-pytel
Copy link
Contributor

Just saw I was tagged in here. I just wanted to say, as the author of store.enterWith(...), I highly recommend against using it in this way. It has one very specific use case, and is marked as such in the docs, which is to inject it before an http handler to have the entire remainder of the currently executing tick to be attributed to the given context. This is almost never what you actually want. In looking at how you're using it here, it should pretty much always be done with store.run(...) to properly escape the context attribution at the correct boundary. Otherwise you are likely to leak the context into places you might not expect or to switch contexts more than once in the same context which could also have strange behaviour.

The official interface is store.run(...) and I would highly recommend only ever using that unless you are absolutely certain store.enterWith(...) is the only solution and you fully understand the implications of using it.

Well, apart from this bug so far it seems to be working ok after some stress testing we did. store.run() doesn't work in the manner we need this to work, I am not trying to run a subtask but set a new context. The idea being that at the point where store.enterWith() is executed the context variable for this task and all asynchronous child tasks created chronologically after the enterWith() get the new context variables. I understand it may have been created for the niche case you describe, but are you saying that it will break for the desired behavior I described in some cases?

@Qard
Copy link

Qard commented Feb 21, 2021

Quite possibly, yes.

As indicated in the docs, it applies for the entire remainder of the current tick, which isn't necessarily the span you think it is. In cases like event batching it can cause all sorts of havoc because, as shown in the docs, it will leak from one event emission into the next if they are triggered within the same tick. There's no logical boundary around store.enterWith(...) only a tick transition. The primary risk here is that even if it may work correctly now you are relying on unspecified behaviour that may not remain true forever. For example, streams may choose to make some optimizations to batch reads together and emit them in one tick--that would completely break this. It's best to provide your own strict encapsulation otherwise you're relying on a moving target for when the "end" of a context happens.

@tom-pytel
Copy link
Contributor

tom-pytel commented Feb 22, 2021

As indicated in the docs, it applies for the entire remainder of the current tick, which isn't necessarily the span you think it is.

I understand what you mean by this and I did test to verify how it behaves (event handlers and all). I take this into account and specifically code around it by doing things like cloning the context object in child tasks, including a mutable array of spans in order to not modify the parent. Also explicitly removing the current span from the list of spans if the span code is about to relinquish control to be regained in a new async task and re-adding to the list of spans there in order that chronologically subsequent siblings in the same task do not become children (does that explanation make sense?). It seems to work.

In cases like event batching it can cause all sorts of havoc because, as shown in the docs, it will leak from one event emission into the next if they are triggered within the same tick. There's no logical boundary around store.enterWith(...) only a tick transition.

The logical boundary in this case is somewhat convoluted because of the open-ended control flow with span.start() and span.stop() which can be called anywhere, so I take care of enforcing the boundary. It is not as strict as run(), but I don't need that, just something that works like Python's contextvars.ContextVar.

The primary risk here is that even if it may work correctly now you are relying on unspecified behaviour that may not remain true forever.

This is a risk, but I do not have much choice here. Like I said, run() is not really an option because of the open-ended control flow. If the behavior of enterWith() changes in the future I will either have to adapt to whatever the new situation is, or will have to fall back to what I am currently doing for Node 10 which is to emulate a ContextVar-ish AsyncLocalStorage using asyc_hooks.createHook() - which is obviously not ideal for performance reasons.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants