feat: add profiler for request execution details for write api connection worker #2555
Conversation
The usage of the new API will be added in the next PR.
Force-pushed from b4e467d to ef12805
So if I understand correctly, once this profiler is available, subsequent commits will then start using it in various places within the codebase. There is definitely overlap with the metrics being generated for OpenTelemetry. Perhaps eventually we could automatically profile those metrics with this tool.
private static final Logger log = Logger.getLogger(RequestProfiler.class.getName());

// Control per how many requests we log one time for a dropped operation.
I'm not clear on what is meant by "dropped". Does this mean a request was rejected?
When we cache too many requests we would drop the new ones; added a comment.
* </pre>
*/
public class RequestProfiler {
  enum OperationName {
Why does the name of the operation need to be pre-defined or known in advance to the profiler? Is this just to ensure consistency in how it is used?
Yeah, consistency, so that we wouldn't have to deal with random names.
Let's ensure there is consistency between the metric names identified here, and those currently used and future ones to be added for OpenTelemetry metrics. The latter are defined in this document: http://go/writeapi-telemetry. See the names used under "List of instruments".
If there is a need for profiling metrics there is likely a need for the same OpenTelemetry metric.
I think the current overlap is the backend latency; let's change it to NETWORK_RESPONSE_LATENCY.
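For reference, a minimal sketch of what the OperationName enum could look like with that rename applied, using the operation set shown in the diff snippets in this PR. The getter and the final constant names are illustrative and may not match the merged code.

```java
enum OperationName {
  // Json to proto conversion time.
  JSON_TO_PROTO_CONVERSION("json_to_proto_conversion"),
  // Time spent to fetch the table schema when the user didn't provide it.
  SCHEMA_FETCHING("schema_fetching"),
  // Time spent within the wait queue before the request gets picked up.
  WAIT_QUEUE("wait_queue"),
  // Time between sending the request and receiving the response; name per the rename suggested above.
  NETWORK_RESPONSE_LATENCY("network_response_latency");

  private final String operationName;

  OperationName(String operationName) {
    this.operationName = operationName;
  }

  String getOperationName() {
    return operationName;
  }
}
```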
// Control per how many requests we log one time for a dropped operation.
// An operation can be dropped if we are caching too many requests (by default 100000) in memory. If we are in
// that state, any new operation for new requests would be dropped.
private static final int LOG_PER_DROPPED_OPERATION = 50;
OK. Now I understand this constant better. It really just helps with reducing the volume of log messages generated around dropped operations. Maybe it would be more clear if you defined this constant after defining the MAX_CACHED_REQUEST. Perhaps a better name would be DROPPED_OPERATION_LOG_FREQUENCY.
I think let's just remove this variable and the related logic. The upper cache bound is really large; I would say safeguarding against such an extreme case does not add much value.
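For context, a rough sketch of how the constant was intended to be used before this thread settled on removing it: a counter rate-limits the dropped-operation log lines. The class name, MAX_CACHED_REQUEST, and droppedOperationCount below are taken from the surrounding discussion and are illustrative only.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

class DroppedOperationLoggingSketch {
  private static final Logger log =
      Logger.getLogger(DroppedOperationLoggingSketch.class.getName());

  // Upper bound on in-memory cached requests (100000 by default, per the code comment above).
  private static final int MAX_CACHED_REQUEST = 100000;
  // Log only once per this many dropped operations, to keep log volume manageable.
  private static final int LOG_PER_DROPPED_OPERATION = 50;

  private final AtomicLong droppedOperationCount = new AtomicLong();

  void dropOperation(String requestUniqueId) {
    long dropped = droppedOperationCount.incrementAndGet();
    if (dropped % LOG_PER_DROPPED_OPERATION == 0) {
      log.warning(
          String.format(
              "Dropped operation for request %s; %d operations dropped so far.",
              requestUniqueId, dropped));
    }
  }
}
```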
// Json to proto conversion time.
JSON_TO_PROTO_CONVERSION("json_to_proto_conversion"),
// Time spent to fetch the table schema when user didn't provide it.
SCHEMA_FECTCHING("schema_fetching"),
typo here, should be SCHEMA_FETCHING
Done
Force-pushed from 082c376 to a8f8d5a
I'm wondering if tracking every single request will result in excessive memory consumption. To deal with this, you may need to:
(a) use histogram buckets to break down individual values into aggregate values, or
(b) dynamically keep track of only the K longest-latency requests per type (see the sketch below).
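A minimal sketch of what option (b) could look like, assuming a bounded min-heap per operation type; this approach was ultimately not adopted (see the resolution below), and all names and sizes here are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Bounded min-heap that keeps only the K largest latencies seen so far.
class TopKLatencyTracker {
  private final int k;
  // Min-heap: the smallest tracked latency sits at the head and is evicted first when a
  // slower request arrives while the heap is full.
  private final PriorityQueue<Long> latenciesMillis = new PriorityQueue<>();

  TopKLatencyTracker(int k) {
    this.k = k;
  }

  synchronized void record(long latencyMillis) {
    if (latenciesMillis.size() < k) {
      latenciesMillis.offer(latencyMillis);
    } else if (!latenciesMillis.isEmpty() && latenciesMillis.peek() < latencyMillis) {
      latenciesMillis.poll();
      latenciesMillis.offer(latencyMillis);
    }
  }

  // Returns the tracked latencies, slowest first.
  synchronized List<Long> snapshotSlowestFirst() {
    List<Long> result = new ArrayList<>(latenciesMillis);
    result.sort(Collections.reverseOrder());
    return result;
  }
}
```

This bounds memory at K entries per operation type, at the cost of losing the full distribution of request latencies.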
Force-pushed from daac318 to f0cbac8
Discussed offline that it's not easy to come up with a clean solution that would keep only the top k requests at all times to save memory. Since it's a debugging-only class and each flush releases the memory of past requests, we think it's fine to continue with the current solution.
+ "seen before, this is possible when " | ||
+ "we are recording too much ongoing requests. So far we has dropped %s operations.", | ||
requestUniqueId, droppedOperationCount)); | ||
droppedOperationCount.incrementAndGet(); |
Should this be incremented here as well? What if the calling code always calls startOperation() and endOperation()?
Done
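A hedged sketch of the fix being asked for above: count a drop on the endOperation() path too when the request id was never registered. The field and type names below are placeholders, not the actual implementation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class EndOperationSketch {
  private final ConcurrentHashMap<String, Object> idToIndividualOperation = new ConcurrentHashMap<>();
  private final AtomicLong droppedOperationCount = new AtomicLong();

  void endOperation(String requestUniqueId) {
    Object tracked = idToIndividualOperation.get(requestUniqueId);
    if (tracked == null) {
      // The request was never registered (for example, it was dropped earlier because too many
      // requests were already cached), so count it as dropped here as well, mirroring the
      // startOperation() path.
      droppedOperationCount.incrementAndGet();
      return;
    }
    // ... otherwise record the end timestamp for the tracked request ...
  }
}
```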
* -----------------------------
* Request uuid: request_1 with total time 1000 milliseconds
* Operation name json_to_proto_conversion starts at: 1720566109971, ends at: 1720566109971, total time: 200 milliseconds
* Operation name backend_latency starts at: 1720566109971, ends at: 1720566109971, total time: 800 milliseconds
I am wondering if we can name this better; this is not pure backend_latency, it is backend_latency + network_latency.
Another problem is that we still cannot join this with full backend performance stats.
The current way we can join is through timestamps, unless this request id can be passed to the backend.
Let's call it response_latency to make it clear.
* ...
* </pre>
*/
public class RequestProfiler {
Should this have package visibility? Maintaining a public class is a lot more difficult.
Done
I don't see the change? It is still a public class.
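For clarity, package visibility in Java just means dropping the public modifier from the class declaration, e.g.:

```java
// Package-private: visible only within its own package, so the profiler does not become part of
// the library's supported public surface.
class RequestProfiler {
  // ...
}
```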
// Time spent to fetch the table schema when user didn't provide it.
SCHEMA_FECTCHING("schema_fetching"),
// Time spent within wait queue before it gets picked up.
WAIT_QUEUE("wait_queue"),
I am wondering if additional stats, such as wait queue length, can be attached.
We could, but that requires a really hard interface change; I want to shrink the scope to time-related profiling for now.
feat: add profiler for request execution details for write api connection worker (googleapis#2555) * Add profiler for request execution details. The usage of the new API will be added in the next PR
Add a profiler to generate a periodic report.
In the next PR, we will start adding the points of interest at which we want to examine performance.
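A hypothetical usage sketch of how callers might instrument a request once the profiler is wired in, assuming startOperation()/endOperation() keyed by a request id (as mentioned in the review thread) and a periodic flush that emits the report; exact method names and signatures may differ from the merged code.

```java
// Hypothetical usage; method names and signatures are assumptions, not the merged API.
RequestProfiler profiler = new RequestProfiler();
String requestUniqueId = "request_1";

profiler.startOperation(RequestProfiler.OperationName.JSON_TO_PROTO_CONVERSION, requestUniqueId);
// ... convert the JSON rows to proto ...
profiler.endOperation(RequestProfiler.OperationName.JSON_TO_PROTO_CONVERSION, requestUniqueId);

profiler.startOperation(RequestProfiler.OperationName.WAIT_QUEUE, requestUniqueId);
// ... request waits in the queue until the connection worker picks it up ...
profiler.endOperation(RequestProfiler.OperationName.WAIT_QUEUE, requestUniqueId);

// Periodic flush: emits a report like the example in the class javadoc and releases the memory
// held for past requests. The method name here is a placeholder.
profiler.flushReport();
```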