
Reduce unnecessary producerStateTable publish notifications #258

Open · jipanyang wants to merge 4 commits into master from producerState_publish

Conversation

@jipanyang (Contributor) commented Feb 1, 2019

Signed-off-by: Jipan Yang [email protected]

The typical use case this PR is trying to address is route flapping.

Assuming such a scenario:

<1> 4K routes learned on an interface.

<2> Interface goes down, the 4K routes are withdrawn.

<3> fpmsyncd inserts the 4K route keys into the producerStateTable keyset and generates 4K redis publish notifications.

<4> Interface goes up

<4.A> The interface may come up and learn the 4K routes back before orchagent has had a chance to pick up any notification and process the keyset. That case is fine: the change in https://github.com/Azure/sonic-swss-common/pull/257 handles it, and no new redis publish notification is created since the route keys already exist in the producerStateTable keyset and are still waiting to be processed.

<4.B> The interface may come up and learn the 4K routes back after orchagent has processed some but not all of the redis notifications (say, 1K notifications processed). With the first notification, the consumer in orchagent batch-popped all 4K route keys from the producerStateTable keyset (orchagent sets popBatchSize to 8192); for the other 999 notifications processed, orchagent just called consumer_state_table_pops.lua 999 times without seeing any real data.
4K - 1K = 3K more redis notifications are now pending in the redis queue.

<5> Interface goes down

<5.A> If <4.A> was hit during the previous interface up, good: no new redis publish notifications are generated.

<5.B> If <4.B> was hit during the previous interface up, 4K more redis notifications are put into the redis queue.

<6> Interface goes up, loop back to <4>

The exact scenarios vary, but from <4> to <6> the number of pending notifications keeps increasing.

The issue is that orchagent is typically able to handle all the route changes in a single batch pop (popBatchSize = 8192), yet a redis publish notification was generated for each key, and under the current select framework orchagent has to respond to every one of those notifications even though most of them carry no data.
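To make the waste concrete, here is a minimal Python/redis-py sketch of the pre-change behavior. It is not the actual swss C++/Lua code; the keyset/channel names, the route keys, and the local redis instance are placeholder assumptions. Every SET publishes, but only the first consumer wake-up finds any data:

# Sketch of the pre-change behavior (assumption: local redis, placeholder names).
import redis

r = redis.Redis(host="127.0.0.1", port=6379, db=0)
KEYSET = "ROUTE_TABLE_KEY_SET"       # placeholder keyset name
CHANNEL = "ROUTE_TABLE_CHANNEL"      # placeholder channel name
POP_BATCH_SIZE = 8192                # orchagent's batch size

# Producer side: one publish per key, so 4K routes -> 4K notifications.
for i in range(4000):
    r.sadd(KEYSET, "1.1.%d.%d/32" % (i // 256, i % 256))
    r.publish(CHANNEL, "G")

# Consumer side: each notification triggers one batch pop of the keyset.
def pop_batch():
    return r.spop(KEYSET, POP_BATCH_SIZE)

print(len(pop_batch()))  # first notification: all 4000 keys popped
print(len(pop_batch()))  # remaining 3999 notifications: nothing left to pop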

The changes in this PR have two parts:

<1> Only generate a redis publish notification when absolutely necessary.
In producerStateTable, generate a notification only for the first key added to an empty keyset.
In consumerStateTable, consume all the data in the keyset upon a redis notification. If the number of keys in the keyset happens to be larger than popBatchSize, generate one notification to itself to come back again. (A sketch of this flow appears right after this list.)

<2> consumerStateTable.pop() processing
The pop() method is not used by orchagent, only by some unit test cases. A user of consumerStateTable.pop() processes one key per notification, so the workaround is to explicitly create m_buffer.size() redis publish notifications so that the select() framework picks up all the buffered keys.
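As referenced above, a minimal Python/redis-py sketch of the gated behavior in <1>, with the same placeholder names as before. The real implementation performs these steps atomically inside the embedded Lua scripts, so treat this only as a model of the control flow:

# Sketch of the gated publish (<1>); the real code does this atomically in Lua.
import redis

r = redis.Redis(host="127.0.0.1", port=6379, db=0)
KEYSET = "ROUTE_TABLE_KEY_SET"       # placeholder keyset name
CHANNEL = "ROUTE_TABLE_CHANNEL"      # placeholder channel name
POP_BATCH_SIZE = 8192

def producer_set(key):
    # Publish only when this SET made the keyset go from empty to one key.
    added = r.sadd(KEYSET, key)
    if added > 0 and r.scard(KEYSET) == 1:
        r.publish(CHANNEL, "G")

def consumer_on_notification():
    # Drain up to POP_BATCH_SIZE keys; if more remain, notify self to come back.
    keys = r.spop(KEYSET, POP_BATCH_SIZE)
    if r.scard(KEYSET) > 0:
        r.publish(CHANNEL, "G")
    return keys

With this gating, the route-flap scenario above generates one notification per keyset transition instead of one per key.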

@jipanyang (Contributor, Author)

This is the implementation, done together with @zhenggen-xu.

@lguohan requested a review from qiluo-msft on February 1, 2019.
@lguohan (Contributor) commented Feb 2, 2019

@jipanyang, can you tell us which one you prefer?

@jipanyang (Contributor, Author) commented Feb 2, 2019

@lguohan This one will probably be able to handle the most extreme redis usage scenario, but the change requires more careful review.

I need to check and fix the pyext part; I forgot to run that test locally.

@qiluo-msft (Contributor)

Could you explain the problem and the high-level design in this PR? That may help us understand your code.

@jipanyang force-pushed the producerState_publish branch from 29d86b7 to c3341c3 on February 6, 2019.
@jipanyang (Contributor, Author)

@qiluo-msft problem description and high-level design updated.

"for i = 0, #KEYS - 3 do\n"
" redis.call('HSET', KEYS[3 + i], ARGV[3 + i * 2], ARGV[4 + i * 2])\n"
"end\n"
" if added > 0 then \n"
"local num = redis.call('SCARD', KEYS[2])\n"
Contributor review comment:

It is probably better to move this check before we add the current key itself (i.e., before "redis.call('SADD', KEYS[2], ARGV[2])\n") and check whether num == 0. With the code as is, in the case where the current key is the same as a key that already exists in the keyset, we could end up publishing more notifications than needed.
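In other words, the suggested ordering samples the keyset size before the SADD and publishes only when the keyset was empty. A minimal sketch in the same simplified Python model used earlier in this thread (the actual change would live inside the Lua script so the check-and-add stays atomic; names are placeholders):

# Sketch of the suggested ordering: check emptiness before SADD (illustrative,
# not the PR's actual Lua); avoids publishing again for an already-present key.
import redis

r = redis.Redis(host="127.0.0.1", port=6379, db=0)
KEYSET = "ROUTE_TABLE_KEY_SET"       # placeholder keyset name
CHANNEL = "ROUTE_TABLE_CHANNEL"      # placeholder channel name

def producer_set_check_first(key):
    was_empty = r.scard(KEYSET) == 0   # check before adding the current key
    added = r.sadd(KEYSET, key)
    if added > 0 and was_empty:
        r.publish(CHANNEL, "G")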

"redis.call('SADD', KEYS[4], ARGV[2])\n"
"redis.call('DEL', KEYS[3])\n"
"if added > 0 then \n"
"local num = redis.call('SCARD', KEYS[2])\n"
Contributor review comment:

Same comment as before: check the keyset count before we add the new key, to be more precise.

"for i = 0, #KEYS - 3 do\n"
" redis.call('HSET', KEYS[3 + i], ARGV[3 + i * 2], ARGV[4 + i * 2])\n"
"end\n"
" if added > 0 then \n"
"local num = redis.call('SCARD', KEYS[2])\n"
" if num == 1 then \n"
Contributor review comment:

Also, since we now have only one notification on the producer side per keyset, we might want to be cautious about the case where that notification gets dropped. In such a case the keyset would never be served to the consumer even though we keep updating keys/values in it, because we will never generate another notification at the producer. Some mechanism might be needed here to prevent such cases.

@jipanyang (Contributor, Author) commented Feb 25, 2019

It is unlikely that redis will drop the notification; if that happens, a lot of bad things probably occurred before the error.

But I agree this is a valid concern; we should avoid a single point of failure here. Currently I'm considering this extra check:

1. On the producerStateTable side, together with the notification, it also leaves one field/value pair under _PRODUCESTATE_PUBLISH_REC, e.g.:

127.0.0.1:6380> hset _PRODUCESTATE_PUBLISH_REC ROUTE_TABLE_CHANNEL 1
(integer) 1
127.0.0.1:6380> hset _PRODUCESTATE_PUBLISH_REC PORT_TABLE_CHANNEL 1
(integer) 1
127.0.0.1:6380> hgetall _PRODUCESTATE_PUBLISH_REC
1) "ROUTE_TABLE_CHANNEL"
2) "1"
3) "PORT_TABLE_CHANNEL"
4) "1"

2. On the consumerStateTable side, after consuming all the data, it deletes the field/value pair:

127.0.0.1:6380> hdel _PRODUCESTATE_PUBLISH_REC ROUTE_TABLE_CHANNEL
(integer) 1
127.0.0.1:6380> hdel _PRODUCESTATE_PUBLISH_REC PORT_TABLE_CHANNEL
(integer) 1

3. The caller of consumerStateTable, which is orchagent, does a check of the _PRODUCESTATE_PUBLISH_REC key after each Select::TIMEOUT event:

127.0.0.1:6380> hgetall _PRODUCESTATE_PUBLISH_REC
(empty list or set)

If it is not empty, the notification has likely been lost; generate a new channel notification for each FV under the key so that the pending data can be picked up.

Due to a race condition, there is a very small possibility of a false positive. In that case we generate one extra notification, which is ok.
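A minimal Python/redis-py sketch of this proposed record/check cycle; the function names and the split of responsibilities are illustrative assumptions, not the actual swss/orchagent code, and the port simply mirrors the redis-cli examples above:

# Sketch of the proposed _PRODUCESTATE_PUBLISH_REC safety net (illustrative only).
import redis

r = redis.Redis(host="127.0.0.1", port=6380, db=0)
PUBLISH_REC = "_PRODUCESTATE_PUBLISH_REC"

def producer_publish(channel):
    # Producer: record the outstanding notification, then publish it.
    r.hset(PUBLISH_REC, channel, 1)
    r.publish(channel, "G")

def consumer_drained(channel):
    # Consumer: clear the record once the keyset has been fully consumed.
    r.hdel(PUBLISH_REC, channel)

def on_select_timeout():
    # Orchagent, after a Select::TIMEOUT: any channel still recorded here most
    # likely had its notification lost, so publish a fresh one for it.
    for channel in r.hgetall(PUBLISH_REC):
        r.publish(channel.decode(), "G")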

Contributor replied:

The mechanism looks ok to me. A few questions:

1. "If it is not empty, the notification has likely been lost; generate a new channel notification for each FV under the key so that the pending data can be picked up."
Is there any reason we do a notification per FV? I would think we should generate the notification per keyset, as before, for the consumer to pick up.

2. "Due to a race condition"
Where exactly is the race condition? If, after the time-out, there could be a notification arriving before we check _PRODUCESTATE_PUBLISH_REC, could we check the notification together with it there atomically?

@jipanyang (Contributor, Author) replied:

The FV here is the FV under _PRODUCESTATE_PUBLISH_REC, e.g.:

1) "ROUTE_TABLE_CHANNEL"
2) "1"
3) "PORT_TABLE_CHANNEL"
4) "1"

Each FV represents one keyset which has pending data.

For the race condition, consider such a scenario:

1. Orchagent has been idle for 1 second, so the timeout signal will kick in.

2. neighsyncd has discovered a new neighbor and programmed NEIGH_TABLE, then generated a redis notification.

3. At that exact moment, if the timeout signal is processed by orchagent before the redis notification while neighsyncd has already put data under the _PRODUCESTATE_PUBLISH_REC key, orchagent will create one extra redis publish notification for NEIGH_TABLE, even though immediately afterwards the real notification from neighsyncd will be processed.

This is a really rare scenario, and the extra notification is harmless other than causing orchagent to perform one more check.
