-
Notifications
You must be signed in to change notification settings - Fork 336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SAMZA-2800] A new Control Group Metric for Samza #1699
Conversation
@@ -47,7 +47,8 @@ | |||
scalaTestVersion = "3.0.1" | |||
snappyVersion = "1.1.8.4" | |||
slf4jVersion = "1.7.7" | |||
yarnVersion = "2.10.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to split up the Hadoop version bump (and the related hadoop zookeeper gradle exclude) to a separate PR? It feels like it may be significant enough to stand alone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in the next push
if (o == null || getClass() != o.getClass()) { | ||
return false; | ||
} | ||
org.apache.samza.container.host.LinuxCgroupStatistics that = (org.apache.samza.container.host.LinuxCgroupStatistics) o; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be fully qualified here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in the next push
* under the License. | ||
*/ | ||
|
||
package org.apache.samza.container.host; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line break after package please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in the next push
} | ||
|
||
private double getCPUStat() { | ||
if (this.containerID.equals("NOT_DETECTED")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is NOT_DETECTED defined? Should this be a new const?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in the next push
// return a sentinel value to signal this is not running on Hadoop | ||
return -2.0; | ||
} | ||
String[] controllers = {"cpu", "cpuacct", "cpu,cpuacct" }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order isn't guaranteed here. Please add "cpuacct,cpu".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I didn't think about directory name ordering. Updated in the next push
package org.apache.samza.container.host; | ||
|
||
import org.junit.Test; | ||
import static org.junit.Assert.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't wildcard import here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to figure out the IDE setting that does this. Updated in the next push.
package org.apache.samza.container.host; | ||
|
||
|
||
import static org.junit.Assert.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't wildcard import here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to figure out the IDE setting that does this. Updated in the next push.
I'm not happy with this approach after reviewing the changes. As these are YARN metrics, they should be part of the samza-yarn subproject and not part of samza-core. I'm going to close this PR and open a new one to simplify the change list. |
Introduction
Hadoop clusters have the ability to restrict CPU usage for Samza applications by utilizing Control Groups, (Cgroups).
Before enabling CPU enforcement on Hadoop clusters, application owners must have a way of knowing when their application is being throttled by Cgroups. This PR will add a new Cgroup metric that makes application owners aware if containers CPU usage is being throttled by control groups & whether the application needs to request additional resources.
Implementation
The Linux kernel reports when applications within a Cgroup has been throttled by writing values to a file named cpu.stat. cpu.stat contains two fields named nr_periods & nr_throttled. nr_periods represents the number of enforcement periods that elapsed. nr_thorttled represents the number of times the group has been throttled. We can treat these fields as a ratio that shows the number of times applications has been throttled over a number of enforcement periods. The proposal is to have the running container locate the cpu.stat file by reading property values from Hadoop's YARN config.
Implementation details
Considered Alternatives
I’m unaware of alternatives but reading values from cpu.stat is a pattern which appears in the Runc project. Runc is the underlying library for ContainerD which is used by both Docker & Kubernetes.
The metric needs to be emitted from the Samza container itself. Using a system daemon or sidecar application complicates deployments & creates data consistency issues when the sidecar process isn’t running.
External references