
Automatically set heap size hint on workers #270

Merged: 34 commits merged into master from auto-heap-size on Dec 24, 2023

Conversation

MilesCranmer (Owner) commented Dec 22, 2023

Changes:

  • Introduce heap_size_hint_in_bytes parameter for setting up distributed workers.
  • Switch from `print` to `@info` for printing updates on the search progress.
  • Various refactoring to break the main methods into smaller pieces.

Hopefully this should fix issues like:

MilesCranmer/PySR#490 (@eelregit and @paulomontero)

which is due to poor garbage collection in Julia when using distributed processing:

JuliaLang/julia#50673

The workaround I have added here is basically to hint to every worker process what memory limit it should target before garbage collecting aggressively. I automatically select that hint as the total system memory divided by the number of processes, but the user can pass a per-worker hint of their own (useful for multi-node setups, where per-node memory is much less than the total memory across all nodes).

This seems to work well for preventing OOM errors when I tried it.
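
For illustration, here is a minimal sketch of the idea (a sketch only, not the package's internal code; the worker count is a placeholder, and the `--heap-size-hint` flag requires Julia 1.9+):

```julia
using Distributed

# Give each worker a heap-size hint of roughly
# (total system memory) / (number of workers), so each worker's GC
# collects before the node as a whole runs out of memory.
num_workers = 4                                    # placeholder worker count
hint_bytes = div(Sys.total_memory(), num_workers)  # per-worker share, in bytes
hint_mb = div(hint_bytes, 1024^2)

# Presumably heap_size_hint_in_bytes ends up as Julia's `--heap-size-hint`
# flag on each worker; this is the manual equivalent.
addprocs(num_workers; exeflags="--heap-size-hint=$(hint_mb)M")
```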

TODO:

  • Document in docstring
  • Add option to SRRegressor
  • Unittest
    • (Unclear how to test for this, so skipping a specific test)

This also removes the explicit precompilation script as I think it didn't help much.

github-actions bot (Contributor) commented Dec 22, 2023

Benchmark Results

| Benchmark | master | 0becbf4... | t[master] / t[0becbf4...] |
|---|---|---|---|
| search/multithreading | 27.6 ± 3.9 s | 27.9 ± 3.5 s | 0.99 |
| search/serial | 31.5 ± 0.62 s | 31.6 ± 0.67 s | 0.998 |
| utils/best_of_sample | 0.791 ± 0.22 μs | 0.772 ± 0.24 μs | 1.02 |
| utils/check_constraints_x10 | 12.7 ± 3.2 μs | 12.6 ± 3.2 μs | 1 |
| utils/compute_complexity_x10/Float64 | 2.31 ± 0.11 μs | 2.33 ± 0.12 μs | 0.988 |
| utils/compute_complexity_x10/Int64 | 2.29 ± 0.11 μs | 2.28 ± 0.12 μs | 1 |
| utils/compute_complexity_x10/nothing | 1.51 ± 0.11 μs | 1.55 ± 0.11 μs | 0.974 |
| utils/optimize_constants_x10 | 29 ± 6.5 ms | 29.4 ± 6.7 ms | 0.985 |
| time_to_load | 1.95 ± 0.003 s | 2.04 ± 0.015 s | 0.959 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@MilesCranmer MilesCranmer merged commit c878d66 into master Dec 24, 2023
34 checks passed
@MilesCranmer MilesCranmer deleted the auto-heap-size branch December 24, 2023 05:06
MilesCranmer added a commit that referenced this pull request Dec 25, 2023
[Diff since v0.22.5](v0.22.5...v0.23.0)

**Merged pull requests:**
- Automatically set heap size hint on workers (#270) (@MilesCranmer)

**Closed issues:**
- How do I set up a basis function consisting of three different inputs x, y, z? (#268)
eelregit commented Jan 4, 2024

Thanks Miles!

I am running the new versions on 4 rusty icelake nodes, and find that the memory usage of each node is only 56 GB out of 1 TB. This is surprising because I have increased max_size from 25~35 to 64, yet with the new versions the memory usage decreases. I wonder if this is expected?

MilesCranmer (Owner, Author):

Is it slower at all?

It doesn’t actually need the memory. But letting it use more memory can make the garbage collection more efficient, since it can do it in batches. If it really doesn’t need all of the RAM, it’s not a big issue.

eelregit commented Jan 5, 2024

Is it slower at all?

Previously: 1 node, max_size of 25 or 35, about 1e5 expr/sec.
Now: 4 nodes, max_size of 64, about 5e4 expr/sec.
Order-of-magnitude-wise, if the time complexity is $\mathcal{O}(\mathrm{maxsize}^2)$, it doesn't seem to be slower. Maybe running with multiple nodes hurts the speed a bit?
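
(A rough check, assuming maxsize ≈ 30 for the earlier runs: quadratic scaling would predict about $4 \times 10^5 \times (30/64)^2 \approx 9 \times 10^4$ expr/sec across the 4 nodes, within a factor of two of the observed $5 \times 10^4$.)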

BTW, I have a simple script to plot the log-log pareto front and its lower convex hull.
I can add it as PySRRegressor.pareto_plot if you think it's useful.

hall_of_fame_2024-01-03_102321.228.pdf

The convex hull shows the tradeoffs in power law forms: $\mathrm{loss} \cdot \mathrm{complexity}^\alpha = \mathrm{const}$.
The typical behavior of the Pareto curves is like trajectories of balls bouncing down the convex hull;
the model right before or at each landing is the economical one for that bounce.
The range of complexity is naturally divided by those bounces.
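
For reference, a minimal sketch of the lower-convex-hull part in log-log space (not eelregit's actual script; the Pareto-front points are made up for illustration):

```julia
# Toy Pareto front: (complexity, loss) pairs, with loss decreasing in complexity.
pareto = [(1, 3.2), (3, 1.1), (5, 0.40), (8, 0.12), (12, 0.09), (20, 0.03)]

# Work in log-log space, where a power law  loss * complexity^alpha = const
# is a straight line of slope -alpha.
pts = [(log(c), log(l)) for (c, l) in pareto]

# 2D cross product of (o→a) and (o→b); > 0 means a left (counter-clockwise) turn.
cross2d(o, a, b) = (a[1] - o[1]) * (b[2] - o[2]) - (a[2] - o[2]) * (b[1] - o[1])

# Andrew's monotone-chain lower hull (points are already sorted by complexity).
hull = empty(pts)
for p in pts
    while length(hull) >= 2 && cross2d(hull[end-1], hull[end], p) <= 0
        pop!(hull)
    end
    push!(hull, p)
end

# Slopes of the hull segments give the power-law exponents alpha
# in  loss * complexity^alpha ≈ const.
alphas = [-(hull[i+1][2] - hull[i][2]) / (hull[i+1][1] - hull[i][1]) for i in 1:length(hull)-1]
```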

I think it'd be cool to have tensorboard tracking these figures over time.
And the equations too, but I don't know how to add the LaTeX table to tensorboard.

eelregit commented Jan 5, 2024

It doesn’t actually need the memory. But letting it use more memory can make the garbage collection more efficient, since it can do it in batches. If it really doesn’t need all of the RAM, it’s not a big issue.

Forgot to ask: I guess that means Julia's garbage collection starts working quite hard even with a little nudge?

MilesCranmer (Owner, Author):

Previously: 1 node, max_size is 25 or 35, about 1e5 expr/sec
Now: 4 nodes, max_size is 64, about 5e4 expr/sec

From the settings you described, this scaling sounds fine – I don't think there's any slowdown from the change in this PR.

I'm surprised you are only getting 5e4 expr/sec though. Typically on 4 rusty nodes I can get 5e6+ expr/sec for maxsize 50 and 100 datapoints. How many datapoints are you running? Is the CPU load reasonable on all nodes?

Do you want to open an issue here or a discussion thread on the PySR forums to debug this? https://github.com/MilesCranmer/PySR/discussions

eelregit commented Jan 5, 2024

Opened a discussion: MilesCranmer/PySR#518

And the equations too, but I don't know how to add the LaTeX table to tensorboard.

Also, tensorboard can actually log text: https://www.tensorflow.org/tensorboard/text_summaries
So if you want, I can add that as an option, to accompany the huge slurm progress log with a better web interface.

MilesCranmer (Owner, Author):

That’s a great idea! I see there’s a Julia plugin as well: https://github.com/JuliaLogging/TensorBoardLogger.jl
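
A minimal sketch of what that could look like, assuming TensorBoardLogger.jl's standard use as a Logging backend (the group names and values below are placeholders):

```julia
using TensorBoardLogger, Logging

lg = TBLogger("tb_logs/sr_run")  # writes TensorBoard event files under this directory

with_logger(lg) do
    # Numeric values show up in TensorBoard's "Scalars" tab...
    @info "search" best_loss = 0.031 evaluations = 1_000_000
    # ...and string values should land in the "Text" tab, which could hold the
    # hall-of-fame equations, e.g. as a Markdown table.
    @info "equations" hall_of_fame = "| complexity | equation |\n| --- | --- |\n| 5 | x1*x2 + cos(x3) |"
end
```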

eelregit commented Jan 5, 2024

Ah right, this has to be done in the Julia loop. I see that matplotlib can be called from there as well: https://github.com/JuliaPy/PyPlot.jl
But maybe this is too circular? And I don't really know much Julia 😅

MilesCranmer (Owner, Author):

Actually it's okay, because they are already talking to each other, so you can totally call Python stuff from the Julia loop.
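
For example, a minimal sketch of driving matplotlib from Julia via PyPlot.jl (placeholder data; in practice the points would come from the hall of fame inside the search loop):

```julia
using PyPlot  # wraps Python's matplotlib.pyplot

# Placeholder Pareto-front data: complexity vs. loss.
complexity = [1, 3, 5, 8, 12, 20]
loss = [3.2, 1.1, 0.40, 0.12, 0.09, 0.03]

figure()
loglog(complexity, loss, "o-")  # log-log axes, markers joined by lines
xlabel("complexity")
ylabel("loss")
savefig("pareto_front.png")     # the saved figure could then be logged to tensorboard
```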
