Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Different MessageID implementations for message Production and Consumption #324

Merged
merged 1 commit into from
Jul 24, 2020

Conversation

dferstay
Copy link
Contributor

@dferstay dferstay commented Jul 14, 2020

This change splits the MessageID implementation in two:

  1. messageID - A 24 byte structure that contains message identification
    information only; to be used during message production
  2. trackingMessageID - A 72 byte structucture that shares the same
    message identification information as messageID
    and adds ackTracker, acker, and receivedTime
    fields; to be used during message consumption

Micro benchmarks show that passing arguments by value that are less-than
four words of memory are optimized by the Go runtime. Results from the
pulsar/impl_message_bench_test.go module are below.

name            time/op
ProducerCall    1.46ns ± 5%
ProducerCall-4  1.47ns ± 5%
ConsumerCall    7.62ns ± 1%
ConsumerCall-4  7.53ns ± 5%

Motivation

The messageID structure in pulsar-client-go has the following fields:

type messageID struct {
	ledgerID     int64
	entryID      int64
	batchIdx     int32
	partitionIdx int32

	tracker      *ackTracker
	consumer     acker
	receivedTime time.Time
}

The above consumes 72 bytes.

Consider the following:

  • The Go runtime optimizes copying of values that are less than or equal to 4 words (32 bytes on 64-bit architectures).
  • The tracker (8 bytes), consumer (16 bytes), and receivedTime (24 bytes) fields are not used when producing messages.

It would be advantageous to split the MessageID implementation into two structures: one used for message production, the other used for message consumption

Modifications

Split the MessageID implementation into two structures; one used during message production:

type messageID struct {
	ledgerID     int64
	entryID      int64
	batchIdx     int32
	partitionIdx int32

One used during message consumption:

type trackingMessageID struct {
	messageID

	tracker      *ackTracker
	consumer     acker
	receivedTime time.Time
}

Verifying this change

  • Make sure that the change passes the CI checks.

This change is already covered by existing tests, such as:

  • pulsar/consumer_multitopic_test.go
  • pulsar/consumer_partition_test.go
  • pulsar/consumer_regex_test.go
  • pulsar/consumer_test.go
  • pulsar/impl_message_test.go
  • pulsar/negative_acks_tracker_test.go
  • pulsar/producer_test.go
  • pulsar/reader_test.go

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API: no
  • The schema: no
  • The default values of configurations: no
  • The wire protocol: no

Documentation

  • Does this pull request introduce a new feature? no

@wolfstudy wolfstudy requested review from merlimat and wolfstudy July 14, 2020 02:05
@wolfstudy wolfstudy added this to the 0.2.0 milestone Jul 14, 2020
@wolfstudy
Copy link
Member

wolfstudy commented Jul 14, 2020

Thanks @dferstay work for this. When we first added other fields to the messageID structure, I had the same considerations. This is the result of our final discussion. Reference to #82

@cckellogg What do you think of it?

…ption

This change splits the `MessageID` implementation in two:
1. `messageID` - A 24 byte structure that contains message identification
                 information only; to be used during message production
2. `trackingMessageID` - A 72 byte structucture that shares the same
                         message identification information as `messageID`
                         and adds `ackTracker`, `acker`, and `receivedTime`
                         fields; to be used during message consumption

Micro benchmarks show that passing arguments by value that are less-than
four words of memory are optimized by the Go runtime.  Results from the
pulsar/impl_message_bench_test.go module are below.

```
name            time/op
ProducerCall    1.46ns ± 5%
ProducerCall-4  1.47ns ± 5%
ConsumerCall    7.62ns ± 1%
ConsumerCall-4  7.53ns ± 5%
```
@dferstay dferstay force-pushed the split-message-id-impl branch from e8409ef to 593d7c1 Compare July 14, 2020 02:21
Copy link
Contributor

@merlimat merlimat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 We saw the actual difference with this approach.

@merlimat merlimat merged commit c6d905d into apache:master Jul 24, 2020
BewareMyPower added a commit to BewareMyPower/pulsar-client-go that referenced this pull request Feb 27, 2023
### Motivation

Currently there are three implementations of the `MessageID` interface:
- `messageID`: 24 bytes
- `trackingMessageID`: 64 bytes
- `chunkMessageID`: 80 bytes

However, for all methods of them, the receiver is a value rather than a
pointer. It's inefficient because each time a method is called, the copy
would happen.

Reference: https://go.dev/tour/methods/8

### Modifications

- Change the receiver from value to pointer for all `MessageID`
  implementations.
- Use pointers as the returned values and function parameters for these
  implementations everywhere.

The `trackingMessageID.Undefined` method is removed because it's never
used now. Though it's a public method, the struct and its factory
function are not exposed, so I think it's reasonable.

Remove the benchmark added in
apache#324. The result is
obvious and this test is meaningless. I tried passing the
`trackingMessageID` by pointer and the result reduced from 8.548 ns/op
to 1.628 ns/op. It's obvious because a pointer is only 8 bytes while a
`trackingMessageID` is 64 bytes. The overhead of accessing by pointers
is far less than copying the extra bytes.
BewareMyPower added a commit to BewareMyPower/pulsar-client-go that referenced this pull request Feb 27, 2023
### Motivation

Currently there are three implementations of the `MessageID` interface:
- `messageID`: 24 bytes
- `trackingMessageID`: 64 bytes
- `chunkMessageID`: 80 bytes

However, for all methods of them, the receiver is a value rather than a
pointer. It's inefficient because each time a method is called, the copy
would happen.

Reference: https://go.dev/tour/methods/8

### Modifications

- Change the receiver from value to pointer for all `MessageID`
  implementations.
- Use pointers as the returned values and function parameters for these
  implementations everywhere.

The `trackingMessageID.Undefined` method is removed because it's never
used now. Though it's a public method, the struct and its factory
function are not exposed, so I think it's reasonable.

Remove the benchmark added in
apache#324. The result is
obvious and this test is meaningless. I tried passing the
`trackingMessageID` by pointer and the result reduced from 8.548 ns/op
to 1.628 ns/op. It's obvious because a pointer is only 8 bytes while a
`trackingMessageID` is 64 bytes. The overhead of accessing by pointers
is far less than copying the extra bytes.
BewareMyPower added a commit to BewareMyPower/pulsar-client-go that referenced this pull request Feb 27, 2023
### Motivation

Currently there are three implementations of the `MessageID` interface:
- `messageID`: 24 bytes
- `trackingMessageID`: 64 bytes
- `chunkMessageID`: 80 bytes

However, for all methods of them, the receiver is a value rather than a
pointer. It's inefficient because each time a method is called, the copy
would happen.

Reference: https://go.dev/tour/methods/8

### Modifications

- Change the receiver from value to pointer for all `MessageID`
  implementations.
- Use pointers as the returned values and function parameters for these
  implementations everywhere.

The `trackingMessageID.Undefined` method is removed because it's never
used now. Though it's a public method, the struct and its factory
function are not exposed, so I think it's reasonable.

Remove the benchmark added in
apache#324. The result is
obvious and this test is meaningless. I tried passing the
`trackingMessageID` by pointer and the result reduced from 8.548 ns/op
to 1.628 ns/op. It's obvious because a pointer is only 8 bytes while a
`trackingMessageID` is 64 bytes. The overhead of accessing by pointers
is far less than copying the extra bytes.
BewareMyPower added a commit to BewareMyPower/pulsar-client-go that referenced this pull request Feb 28, 2023
### Motivation

Currently there are three implementations of the `MessageID` interface:
- `messageID`: 24 bytes
- `trackingMessageID`: 64 bytes
- `chunkMessageID`: 80 bytes

However, for all methods of them, the receiver is a value rather than a
pointer. It's inefficient because each time a method is called, the copy
would happen.

Reference: https://go.dev/tour/methods/8

### Modifications

- Change the receiver from value to pointer for all `MessageID`
  implementations.
- Use pointers as the returned values and function parameters for these
  implementations everywhere.

The `trackingMessageID.Undefined` method is removed because it's never
used now. Though it's a public method, the struct and its factory
function are not exposed, so I think it's reasonable.

Remove the benchmark added in
apache#324. The result is
obvious and this test is meaningless. I tried passing the
`trackingMessageID` by pointer and the result reduced from 8.548 ns/op
to 1.628 ns/op. It's obvious because a pointer is only 8 bytes while a
`trackingMessageID` is 64 bytes. The overhead of accessing by pointers
is far less than copying the extra bytes.
BewareMyPower added a commit to BewareMyPower/pulsar-client-go that referenced this pull request Feb 28, 2023
### Motivation

Currently there are three implementations of the `MessageID` interface:
- `messageID`: 24 bytes
- `trackingMessageID`: 64 bytes
- `chunkMessageID`: 80 bytes

However, for all methods of them, the receiver is a value rather than a
pointer. It's inefficient because each time a method is called, the copy
would happen.

Reference: https://go.dev/tour/methods/8

### Modifications

- Change the receiver from value to pointer for all `MessageID`
  implementations.
- Use pointers as the returned values and function parameters for these
  implementations everywhere.

The `trackingMessageID.Undefined` method is removed because it's never
used now. Though it's a public method, the struct and its factory
function are not exposed, so I think it's reasonable.

Remove the benchmark added in
apache#324. The result is
obvious and this test is meaningless. I tried passing the
`trackingMessageID` by pointer and the result reduced from 8.548 ns/op
to 1.628 ns/op. It's obvious because a pointer is only 8 bytes while a
`trackingMessageID` is 64 bytes. The overhead of accessing by pointers
is far less than copying the extra bytes.
BewareMyPower added a commit to BewareMyPower/pulsar-client-go that referenced this pull request Feb 28, 2023
### Motivation

Currently there are three implementations of the `MessageID` interface:
- `messageID`: 24 bytes
- `trackingMessageID`: 64 bytes
- `chunkMessageID`: 80 bytes

However, for all methods of them, the receiver is a value rather than a
pointer. It's inefficient because each time a method is called, the copy
would happen.

Reference: https://go.dev/tour/methods/8

### Modifications

- Change the receiver from value to pointer for all `MessageID`
  implementations.
- Use pointers as the returned values and function parameters for these
  implementations everywhere.

The `trackingMessageID.Undefined` method is removed because it's never
used now. Though it's a public method, the struct and its factory
function are not exposed, so I think it's reasonable.

Remove the benchmark added in
apache#324. The result is
obvious and this test is meaningless. I tried passing the
`trackingMessageID` by pointer and the result reduced from 8.548 ns/op
to 1.628 ns/op. It's obvious because a pointer is only 8 bytes while a
`trackingMessageID` is 64 bytes. The overhead of accessing by pointers
is far less than copying the extra bytes.
BewareMyPower added a commit to BewareMyPower/pulsar-client-go that referenced this pull request Mar 1, 2023
### Motivation

Currently there are three implementations of the `MessageID` interface:
- `messageID`: 24 bytes
- `trackingMessageID`: 64 bytes
- `chunkMessageID`: 80 bytes

However, for all methods of them, the receiver is a value rather than a
pointer. It's inefficient because each time a method is called, the copy
would happen.

Reference: https://go.dev/tour/methods/8

### Modifications

- Change the receiver from value to pointer for all `MessageID`
  implementations.
- Use pointers as the returned values and function parameters for these
  implementations everywhere.

The `trackingMessageID.Undefined` method is removed because it's never
used now. Though it's a public method, the struct and its factory
function are not exposed, so I think it's reasonable.

Remove the benchmark added in
apache#324. The result is
obvious and this test is meaningless. I tried passing the
`trackingMessageID` by pointer and the result reduced from 8.548 ns/op
to 1.628 ns/op. It's obvious because a pointer is only 8 bytes while a
`trackingMessageID` is 64 bytes. The overhead of accessing by pointers
is far less than copying the extra bytes.
BewareMyPower added a commit that referenced this pull request Mar 1, 2023
…#968)

### Motivation

Currently there are three implementations of the `MessageID` interface:
- `messageID`: 24 bytes
- `trackingMessageID`: 64 bytes
- `chunkMessageID`: 80 bytes

However, for all methods of them, the receiver is a value rather than a
pointer. It's inefficient because each time a method is called, the copy
would happen.

Reference: https://go.dev/tour/methods/8

### Modifications

- Change the receiver from value to pointer for all `MessageID`
  implementations.
- Use pointers as the returned values and function parameters for these
  implementations everywhere.

The `trackingMessageID.Undefined` method is removed because it's never
used now. Though it's a public method, the struct and its factory
function are not exposed, so I think it's reasonable.

Remove the benchmark added in
#324. The result is
obvious and this test is meaningless. I tried passing the
`trackingMessageID` by pointer and the result reduced from 8.548 ns/op
to 1.628 ns/op. It's obvious because a pointer is only 8 bytes while a
`trackingMessageID` is 64 bytes. The overhead of accessing by pointers
is far less than copying the extra bytes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants