-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PIP-85: [pulsar-io] pass pulsar client via context to connector #11056
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for bringing this patch to the community.
There has been a long discussion recently about adding this feature and in general to add support for accessing the Pulsar Client I to Pulsar IO.
There are some security concerns because the client we be able to leverage all of the resources of the client.
Also we have to ensure that the API that we expose will work correctly.
This patch also touches the Kafka integration part, probably it is better to split the patch into two parts.
I appreciate your initiative but this is kind of an hot topic in the community and it deserves a little bit of discussion.
During the past months we added new API like this one (see the Pulsar Admin API support) and it was a pain.
There are pending works about allowing to use the Pulsar Client API inside functions and IO connectors because we have classpath problems.
My suggestion here is to:
- Start a discussion on the dev@
- define clearly how functions and IO connectors must refer to the Pulsar API, for instance currently many connectors must bundle Pulsar client jars inside the nar file
- add integration tests that cover the list of supported API in the client (do we have to fully support the client API?)
@merlimat @jerrypeng @codelipenghui @lhotari @dlg99 please take a look as you are interested in this topic.
I am overall okay in adding this feature but this time we have to think more about the design of such feature
...a-connect-adaptor/src/test/java/org/apache/pulsar/io/kafka/connect/KafkaConnectSinkTest.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a useful thing to have, some minor nits to clean up.
I share @eolivelli 's concerns and looking forward for the consistent way forward for the Client/Admin in the context API that is agreed on by the community.
...nnect-adaptor/src/main/java/org/apache/pulsar/io/kafka/connect/PulsarOffsetBackingStore.java
Outdated
Show resolved
Hide resolved
pulsar-functions/api-java/src/main/java/org/apache/pulsar/functions/api/BaseContext.java
Outdated
Show resolved
Hide resolved
also it broke the tests
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your active review. I'll put this PR into review once all issues and failed tests are fixed.
...a-connect-adaptor/src/test/java/org/apache/pulsar/io/kafka/connect/KafkaConnectSinkTest.java
Show resolved
Hide resolved
...nnect-adaptor/src/main/java/org/apache/pulsar/io/kafka/connect/PulsarOffsetBackingStore.java
Outdated
Show resolved
Hide resolved
What are the security concerns? I think we have discussed that in one of the community meetings talking about the PulsarAdmin interface. The function uses the credentials/token to access the Pulsar resources. The credentials already limit the access on what resources that the function can access. I don't see why and how expose PulsarAdmin or PulsarClient can introduce security issues. Also we have to ensure that the API that we expose will work correctly.
The main reason for this change is to solve the problem of the Debezium connector. A Debezium connector is a wrapper over Kafka connector. I don't see why we need to split that into two parts.
We have fixed the class loading problem. If we don't encounter class loading problems, I don't know why this would be a concern.
I don't see how is that related to this pull request. These are two different issues. Please don't couple them together.
This change is used for replacing the Pulsar client in the Pulsar offset store, which already points to the same Pulsar cluster. So it is already being tested as part of debezium connector integration tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@eolivelli the auth data the pulsar client that is instantiated in a pulsar instance has should be properly scoped. Usually it is inherits the credentials of the the source/sink/function submitter. Thus, the instance will only be able to perform the operations that it is authorized to perform. |
@jerrypeng thanks for your clarification. @sijie But my point is that here we are adding again a new powerful API to the Pulsar Functions/IO system. We said many times during the past months (cc @merlimat) that we should use better the tool of the PIPs and we should add new APIs more carefully, being sure that the community is aware of what is happening. |
Hi, seems the CI check failed for some weird reason. Can anyone re-run all the test? |
/pulsarbot rerun-failure-checks |
e6879f0
to
8991fdf
Compare
Agree on having a PIP. but it doesn't mean that we shouldn't include the Pulsar client API. Class loading is a different problem that we need to solve. We don't necessarily need to couple different issues into one discussion. @nlu90 Can you send out a PIP to dev@ mailing list for this new API? |
Agreed. Thanks |
/pulsarbot run-failure-checks |
yeah, I just sent the PIP to dev mailing list. |
@eolivelli Can you please review and unblock this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much @nlu90
I have restarted the failed CI job, we can merge as soon as CI is green.
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
b894ebe
to
90b844b
Compare
…y database (#11251) ### Motivation The Debezium requires pulsar a service URL for history database usage. In #11056 , the `service.url` field from `PulsarKafkaWorkerConfig` is no longer available. And the value is also deleted from multiple yaml config files in this [commit](3ce24c9). This causes the integration test for Debezium connector to fail. Based on the Debezium [paradigm](https://debezium.io/documentation/reference/1.5/connectors/mysql.html#debezium-mysql-connector-database-history-configuration-properties), all configurations should be passed as strings. There's no easy way to inject a PulsarClient via configuration. We need to ask user to provide the pulsar url explicitly and probably auth info also. ### Modifications 1. Make the `database.history.pulsar.service.url` field required 2. Add the config value back to example yaml files 3. Update the integration test config ### Verifying this change - [ ] Make sure that the change passes the CI checks.
…he#11056) ### Motivation Fixes apache#8668 ### Modifications Expose `PulsarClient` via `BaseContext`, and allow connectors to use the inherited pulsar client from function worker to produce/consume messages. ### Verifying this change - [ ] Make sure that the change passes the CI checks. This change is already covered by existing tests, such as: - PulsarOffsetBackingStoreTest - KafkaConnectSourceTest - KafkaConnectSinkTest ### Does this pull request potentially affect one of the following parts: - The public API: `SourceContext` and `SinkContext` need to implement the `getPulsarClient` method
…y database (apache#11251) ### Motivation The Debezium requires pulsar a service URL for history database usage. In apache#11056 , the `service.url` field from `PulsarKafkaWorkerConfig` is no longer available. And the value is also deleted from multiple yaml config files in this [commit](apache@3ce24c9). This causes the integration test for Debezium connector to fail. Based on the Debezium [paradigm](https://debezium.io/documentation/reference/1.5/connectors/mysql.html#debezium-mysql-connector-database-history-configuration-properties), all configurations should be passed as strings. There's no easy way to inject a PulsarClient via configuration. We need to ask user to provide the pulsar url explicitly and probably auth info also. ### Modifications 1. Make the `database.history.pulsar.service.url` field required 2. Add the config value back to example yaml files 3. Update the integration test config ### Verifying this change - [ ] Make sure that the change passes the CI checks.
…he#11056) ### Motivation Fixes apache#8668 ### Modifications Expose `PulsarClient` via `BaseContext`, and allow connectors to use the inherited pulsar client from function worker to produce/consume messages. ### Verifying this change - [ ] Make sure that the change passes the CI checks. This change is already covered by existing tests, such as: - PulsarOffsetBackingStoreTest - KafkaConnectSourceTest - KafkaConnectSinkTest ### Does this pull request potentially affect one of the following parts: - The public API: `SourceContext` and `SinkContext` need to implement the `getPulsarClient` method
…y database (apache#11251) ### Motivation The Debezium requires pulsar a service URL for history database usage. In apache#11056 , the `service.url` field from `PulsarKafkaWorkerConfig` is no longer available. And the value is also deleted from multiple yaml config files in this [commit](apache@3ce24c9). This causes the integration test for Debezium connector to fail. Based on the Debezium [paradigm](https://debezium.io/documentation/reference/1.5/connectors/mysql.html#debezium-mysql-connector-database-history-configuration-properties), all configurations should be passed as strings. There's no easy way to inject a PulsarClient via configuration. We need to ask user to provide the pulsar url explicitly and probably auth info also. ### Modifications 1. Make the `database.history.pulsar.service.url` field required 2. Add the config value back to example yaml files 3. Update the integration test config ### Verifying this change - [ ] Make sure that the change passes the CI checks.
Cherry pick #11293 ## Motivation The Debezium requires pulsar a service URL for history database usage. In #11056 , the service.url field from PulsarKafkaWorkerConfig is no longer available. And the value is also deleted from multiple yaml config files in this commit. This causes the integration test for Debezium connector to fail. Based on the Debezium paradigm, all configurations should be passed as strings. There's no easy way to inject a PulsarClient via configuration. We need to ask user to provide the pulsar url explicitly and probably auth info also. ## Modifications Make the database.history.pulsar.service.url field required Add the config value back to example yaml files Update the integration test config
…he#11056) Fixes apache#8668 Expose `PulsarClient` via `BaseContext`, and allow connectors to use the inherited pulsar client from function worker to produce/consume messages. - [ ] Make sure that the change passes the CI checks. This change is already covered by existing tests, such as: - PulsarOffsetBackingStoreTest - KafkaConnectSourceTest - KafkaConnectSinkTest - The public API: `SourceContext` and `SinkContext` need to implement the `getPulsarClient` method (cherry picked from commit cb2ba71)
…he#11056) ### Motivation Fixes apache#8668 ### Modifications Expose `PulsarClient` via `BaseContext`, and allow connectors to use the inherited pulsar client from function worker to produce/consume messages. ### Verifying this change - [ ] Make sure that the change passes the CI checks. This change is already covered by existing tests, such as: - PulsarOffsetBackingStoreTest - KafkaConnectSourceTest - KafkaConnectSinkTest ### Does this pull request potentially affect one of the following parts: - The public API: `SourceContext` and `SinkContext` need to implement the `getPulsarClient` method
…y database (apache#11251) ### Motivation The Debezium requires pulsar a service URL for history database usage. In apache#11056 , the `service.url` field from `PulsarKafkaWorkerConfig` is no longer available. And the value is also deleted from multiple yaml config files in this [commit](apache@3ce24c9). This causes the integration test for Debezium connector to fail. Based on the Debezium [paradigm](https://debezium.io/documentation/reference/1.5/connectors/mysql.html#debezium-mysql-connector-database-history-configuration-properties), all configurations should be passed as strings. There's no easy way to inject a PulsarClient via configuration. We need to ask user to provide the pulsar url explicitly and probably auth info also. ### Modifications 1. Make the `database.history.pulsar.service.url` field required 2. Add the config value back to example yaml files 3. Update the integration test config ### Verifying this change - [ ] Make sure that the change passes the CI checks.
Motivation
Fixes #8668
Modifications
Expose
PulsarClient
viaBaseContext
, and allow connectors to use the inherited pulsar client from function worker to produce/consume messages.Verifying this change
This change is already covered by existing tests, such as:
Does this pull request potentially affect one of the following parts:
SourceContext
andSinkContext
need to implement thegetPulsarClient
method