[Issue 456] Support chunking for big messages. #805

Merged
merged 44 commits into from
Oct 25, 2022

Contributor

@Gleiphir2769 Gleiphir2769 commented Jul 10, 2022

Contribution Checklist

Master Issue: #456

Motivation

Make the Pulsar Go client support chunking so that it can produce and consume big messages. The earlier implementation (#717) missed many details, so I decided to reimplement it.

Modifications

  • Add internalSingleSend to send a message without batching, because a batched message cannot be delivered in chunks.
  • Move the BlockIfQueueFull check from internalSendAsync to internalSend (canAddQueue) so that blocking still works normally when chunking.
  • Make the producer send big messages by chunking.
  • Add chunkedMsgCtxMap to store the metadata and data of chunked messages.
  • Enable the consumer to receive chunks and consume the big message.
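
The producer and consumer steps above can be sketched roughly as follows. This is a minimal illustration of the idea, not the code in this PR: the chunk struct, splitIntoChunks, and chunkAssembler are hypothetical names, and the assembler only loosely mirrors the role that chunkedMsgCtxMap plays on the consumer side.

```go
package main

import (
	"bytes"
	"fmt"
)

// chunk is a hypothetical stand-in for one chunk of a big message.
type chunk struct {
	uuid      string // identifies the original message
	chunkID   int    // position of this chunk, starting at 0
	numChunks int    // total number of chunks in the message
	payload   []byte
}

// splitIntoChunks cuts payload into pieces of at most maxChunkSize bytes.
func splitIntoChunks(uuid string, payload []byte, maxChunkSize int) []chunk {
	if maxChunkSize <= 0 {
		return nil
	}
	total := (len(payload) + maxChunkSize - 1) / maxChunkSize
	if total == 0 {
		total = 1 // an empty payload still travels as one chunk
	}
	chunks := make([]chunk, 0, total)
	for i := 0; i < total; i++ {
		start := i * maxChunkSize
		end := start + maxChunkSize
		if end > len(payload) {
			end = len(payload)
		}
		chunks = append(chunks, chunk{uuid: uuid, chunkID: i, numChunks: total, payload: payload[start:end]})
	}
	return chunks
}

// chunkedMsgCtx buffers the chunks received so far for one message.
type chunkedMsgCtx struct {
	numChunks   int
	lastChunkID int
	buf         bytes.Buffer
}

// chunkAssembler keeps one context per in-flight chunked message (assumed design).
type chunkAssembler struct {
	ctxs map[string]*chunkedMsgCtx
}

func newChunkAssembler() *chunkAssembler {
	return &chunkAssembler{ctxs: make(map[string]*chunkedMsgCtx)}
}

// add appends c to its context and returns the full payload once the last
// chunk has arrived. A missing or out-of-order chunk discards the partial message.
func (a *chunkAssembler) add(c chunk) ([]byte, bool) {
	ctx, ok := a.ctxs[c.uuid]
	if !ok {
		if c.chunkID != 0 {
			return nil, false // the first chunk was missed
		}
		ctx = &chunkedMsgCtx{numChunks: c.numChunks, lastChunkID: -1}
		a.ctxs[c.uuid] = ctx
	}
	if c.chunkID != ctx.lastChunkID+1 {
		delete(a.ctxs, c.uuid)
		return nil, false
	}
	ctx.buf.Write(c.payload)
	ctx.lastChunkID = c.chunkID
	if c.chunkID == ctx.numChunks-1 {
		delete(a.ctxs, c.uuid)
		return ctx.buf.Bytes(), true
	}
	return nil, false
}

func main() {
	a := newChunkAssembler()
	for _, c := range splitIntoChunks("msg-1", []byte("0123456789"), 4) {
		if full, done := a.add(c); done {
			fmt.Printf("reassembled %q\n", full)
		}
	}
}
```

The sketch ignores expiry of incomplete messages and limits on the number of pending contexts, which a real consumer would also need.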

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

  • Add TestProducerChunking to verify sending big messages by chunking.
  • Add message_chunking_test to verify message chunking.

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): no
  • The public API: no
  • The schema: no
  • The default values of configurations: no
  • The wire protocol: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? not yet

Contributor

@zzzming zzzming left a comment

Based on the Java implementation (ConsumerImpl.java), chunking also requires changes on the consumer side to assemble chunks into the original message. Are you going to add support on the consumer side?

@Gleiphir2769
Contributor Author

Based on the Java implementation (ConsumerImpl.java), chunking also requires changes on the consumer side to assemble chunks into the original message. Are you going to add support on the consumer side?

The consumer-side implementation is already planned. It will be committed in another PR.

…g of sequenceID generate; fix the bug where batch messages are compressed twice; add internal error handle, fix inaccurate PublishErrorsMsgTooLarge metric; fix the incorrect producer queueFullBlock
@Gleiphir2769 Gleiphir2769 changed the title [Issue 456] Support chunking to produce big messages. [Issue 456] Support chunking for big messages. Aug 6, 2022
@Gleiphir2769
Contributor Author

Based on the Java implementation (ConsumerImpl.java), chunking also requires changes on the consumer side to assemble chunks into the original message. Are you going to add support on the consumer side?

Hi, the consumer side is implemented here. Looking forward to your review. @zzzming

@Gleiphir2769
Contributor Author

Gleiphir2769 commented Oct 16, 2022

This PR may fix issue #447.

@Gleiphir2769
Contributor Author

/pulsarbot run-failure-checks

Member

@nodece nodece left a comment

Your PR seems to include improvements unrelated to chunking. If so, I suggest you open a separate PR for them.


if mid.consumer != nil {
return mid.Ack()
if err := c.checkMsgIDPartition(msgID); err != nil {
Member

The conversion of msgID from MessageID to the trackingMessageID type is missing. I'm not sure whether we need it.

Why not use messageID()? What did I miss?

Contributor Author

trackingMessageID does not record chunking information.

For example, calling Ack() on a big message needs to ack all of its chunks. With trackingMessageID, there is no way to figure out which chunks (message IDs) need to be acked.

trackingMessageID is designed for tracking batch messages, so it should not be the message ID type accepted by the methods exposed by partitionConsumer. I think the better way would be to accept MessageID as the message ID type in partitionConsumer methods. However, to keep the changes minimal, only the necessary interfaces have been modified (Ack, NAck and Seek).

@@ -164,7 +167,11 @@ func (bc *batchContainer) hasSpace(payload []byte) bool {
 		return true
 	}
 	msgSize := uint32(len(payload))
-	return bc.numMessages+1 <= bc.maxMessages && bc.buffer.ReadableBytes()+msgSize <= uint32(bc.maxBatchSize)
+	expectedSize := bc.buffer.ReadableBytes() + msgSize
Member

Please revert this code (lines 170-174).

Contributor Author

L171 compares bc.buffer.ReadableBytes() + msgSize with bc.maxMessageSize, which the original code does not.

It is used to make sure that the size of one batch does not exceed maxMessageSize. It is part of correctly calculating whether a message is too large.

By the way, the comparison is too long to be written inline.

Member

@nodece nodece Oct 24, 2022

Good catch, but writing it inline looks clearer, like this:

return bc.numMessages+1 <= bc.maxMessages && expectedSize <= uint32(bc.maxBatchSize) && expectedSize <= bc.maxMessageSize

Member

@nodece This PR introduces maxMessageSize, so I think these changes are related to this PR.

Contributor Author

@Gleiphir2769 Gleiphir2769 Oct 24, 2022

Good catch, but writing it inline looks clearer, like this:

return bc.numMessages+1 <= bc.maxMessages && expectedSize <= uint32(bc.maxBatchSize) && expectedSize <= bc.maxMessageSize

Done, thx.
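
For reference, the check agreed on in this thread can be sketched in isolation as follows. This is a simplified illustration, not the merged code: batchLimits is a hypothetical stand-in for batchContainer, bufferedBytes replaces bc.buffer.ReadableBytes(), and the early return for an empty batch is assumed from the diff context.

```go
package main

import "fmt"

// batchLimits is a hypothetical, simplified stand-in for batchContainer.
type batchLimits struct {
	numMessages    uint
	maxMessages    uint
	bufferedBytes  uint32 // stands in for bc.buffer.ReadableBytes()
	maxBatchSize   uint
	maxMessageSize uint32
}

// hasSpace reports whether payload can be added to the current batch without
// exceeding the message count, the batch size, or the maximum message size.
func (b *batchLimits) hasSpace(payload []byte) bool {
	if b.numMessages == 0 {
		// Assumed from the diff context: an empty batch accepts one message.
		return true
	}
	msgSize := uint32(len(payload))
	expectedSize := b.bufferedBytes + msgSize
	return b.numMessages+1 <= b.maxMessages &&
		expectedSize <= uint32(b.maxBatchSize) &&
		expectedSize <= b.maxMessageSize
}

func main() {
	b := batchLimits{
		numMessages:    1,
		maxMessages:    10,
		bufferedBytes:  900,
		maxBatchSize:   2048,
		maxMessageSize: 1000,
	}
	fmt.Println(b.hasSpace(make([]byte, 50)))  // 950 bytes fits under both limits
	fmt.Println(b.hasSpace(make([]byte, 200))) // 1100 bytes fits the batch size but exceeds maxMessageSize
}
```

The second call shows the case the extra comparison is meant to catch: a batch that stays within maxBatchSize but would still be larger than the maximum message size.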

@nodece nodece requested review from wolfstudy and merlimat October 24, 2022 09:50
@Gleiphir2769
Contributor Author

/pulsarbot run-failure-checks

@maraiskruger1980
Contributor

Does chunking work with shared subscriptions, as introduced in Pulsar 2.11?

@Gleiphir2769
Contributor Author

Does chunking work with shared subscriptions, as introduced in Pulsar 2.11?

Hi @maraiskruger1980, this PR was finished before Pulsar 2.11 was released, so it doesn't support chunking on shared subscriptions.

I think I can spend some time on it. You are welcome to follow the progress.

@maraiskruger1980
Contributor

It would be great if it could support shared subscriptions.

@Gleiphir2769
Contributor Author

It would be great if it could support shared subscriptions.

Hi @maraiskruger1980. After checking, I found that the consumer places no restriction on chunking when the subscription is shared. This means you can safely consume chunked messages in a shared subscription if your Pulsar version is >= 2.11.
