speed up analyze-outcomes stage #144
Conversation
Signed-off-by: Dave Rodgman <[email protected]>
Force-pushed from f6500ab to c662496
OK to switch to grep, but only if the limitation is documented.
Not ok to use more threads than we have cores.
vars/analysis.groovy (outdated):

    # Compress the failure list if it is large (for some value of large)
    if [ "$(wc -c <failures.csv)" -gt 99999 ]; then
    -    xz failures.csv
    +    xz -0 -T8 failures.csv
Why -T8? AFAIK we don't have 8-core executors on the CI.

Beware of insisting on too much parallelism: this can create a load spike that slows down other executors, resulting in worse performance overall. You can't measure the impact just by running one job.

> Using -T8 (8 threads) reduces it further to 2.6s locally (expect ~5s on CI). There's no penalty for going too big on thread count, but no benefit beyond about 8-16.

Measuring on a machine with a different number of cores is not useful for guessing the performance on the CI.
Well, -T1000 does not degrade performance locally, so my assumption is that -T8 is fine and will benefit if the CI improves in the future. It looks like xz caps threads at the CPU count (locally, I see improvement up to -T8), so my expectation is that this will work well on the CI.

I am aware of the load-spike concern, but as this is a serial stage, it is a win to steal CPU capacity from the parallel stages.
I still don't see a reason to use -T8. Why not -T0?

As far as I understand, the actual number of threads is limited by memory consumption (which I don't expect to matter at -0), by the -T value, and by the file size, but not by the number of cores. Looking at the xz source code, lzma_cputhreads() is called to determine what hardware_threads_get() will return when using -T0, and then the encoder relies only on hardware_threads_get() and not lzma_cputhreads().

Since 8 is an arbitrary number here, I'd want to see a benchmark that justifies it. Otherwise please use -T0 (the number of cores seems like a sensible default choice).
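As a minimal sketch of the -T0 behaviour discussed here (the payload below is a stand-in, not the real pipeline file):

```shell
# Stand-in payload; the real input is the failures.csv produced by the CI.
head -c 2000000 /dev/zero > failures.csv

# With -T0, xz derives the thread count from the available cores at run
# time, instead of hard-coding a guess such as -T8. -0 is the fast,
# low-compression preset. On success, xz replaces the input file with
# failures.csv.xz.
xz -0 -T0 failures.csv

ls -l failures.csv.xz
```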
This is still unaddressed.
vars/analysis.groovy (outdated):

    # Compress the failure list if it is large (for some value of large)
    if [ "$(wc -c <failures.csv)" -gt 99999 ]; then
    -    xz failures.csv
    +    xz -0 -T8 failures.csv
> lzma, gzip, bzip2, zip are all worse.

I'd expect them to result in worse compression, but maybe we're OK with less compression but less CPU consumption?
The trade-off looked unfavourable in all cases; the closest was probably gzip, which is faster (in single-threaded mode; it doesn't have a multi-threaded mode) but produces output almost twice as big, so xz -T<n> wins overall.
Multithreaded mode may be counterproductive anyway, but have you tried pigz? (In a realistic scenario, not on your machine!)
pigz -1 does look considerably better for both perf (0.36s locally) and size (25 MB, about 2x better).
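A sketch of what the pigz variant would look like. gzip is used below as a stand-in so the snippet runs where pigz is not installed; pigz accepts the same flags and adds -p<n> to set the thread count:

```shell
# Stand-in payload; the real input is the failures.csv from the pipeline.
head -c 1000000 /dev/urandom > sample.csv

# "gzip -1" stands in for "pigz -1" here (same interface, single-threaded).
# -1 is the fastest compression level; -k keeps the input file around so
# both files can be compared.
gzip -1 -k sample.csv

ls -l sample.csv sample.csv.gz
```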
By the way, we need to keep in mind that another element of the compromise is storage. That's the reason we compress, not to save on download time/bandwidth. The vast majority of outcome files are never consumed, but all are stored for some time. I don't know how much we're spending on storage, and in particular how the (compressed) outcome files compare with logs.
We can always up the storage budget if we need to. As it stands, an extra 18 MB / PR run is unlikely to cause big budget issues IMO.
I'm tracking the pigz suggestion (thank you) as a separate follow-up in #150, since that requires additional CI work beyond the scope of this PR.
Signed-off-by: Dave Rodgman <[email protected]>
LGTM
CI is green. @gilles-peskine-arm, are you happy enough with this now to approve?
Signed-off-by: Dave Rodgman <[email protected]>
@gilles-peskine-arm I have updated to use
Looks good to me on code inspection.
Ok to merge with a passing CI run (either release or PR will do).
Improve perf of the analyse-outcomes stage of CI, which especially matters because it's a serial stage of the CI pipeline.
Currently, it takes around 247s to run xz on the CI.
Testing locally, the current xz command takes 130s. With -0, it takes 15s (so expect ~28.5s on CI). This increases the compressed size from 38 MB to 54 MB.
Using -T8 (8 threads) reduces it further to 2.6s locally (expect ~5s on CI). There's no penalty for going too big on thread count, but no benefit beyond about 8-16.
lzma, gzip, bzip2, zip are all worse.
Also, use grep instead of awk to extract failures, which saves a few more seconds.
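A hedged sketch of the grep-for-awk swap. The outcomes field layout below (semicolon-separated with a PASS/FAIL column) is assumed for illustration, not taken from the real pipeline:

```shell
# Hypothetical outcome lines: semicolon-separated, result in field 5.
printf '%s\n' \
  'platform;config;suite;case1;PASS;' \
  'platform;config;suite;case2;FAIL;oops' \
  > outcomes_sample.csv

# awk: splits every line into fields and tests one field exactly.
awk -F';' '$5 == "FAIL"' outcomes_sample.csv > failures_awk.csv

# grep: a plain substring match, faster on large files, at the cost of
# matching ';FAIL;' anywhere in the line -- the limitation the review
# asked to have documented.
grep ';FAIL;' outcomes_sample.csv > failures_grep.csv

cmp failures_awk.csv failures_grep.csv && echo 'same output'
```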
CI run: https://jenkins-mbedtls.oss.arm.com/blue/organizations/jenkins/mbed-tls-pr-head/detail/PR-8530-head/9/pipeline/