
Automatically set heap size hint on workers #270

Merged: 34 commits merged into master from auto-heap-size on Dec 24, 2023

Conversation

MilesCranmer (Owner) commented Dec 22, 2023

Changes:

  • Introduce heap_size_hint_in_bytes parameter for setting up distributed workers.
  • Switch from `print` to `@info` for printing updates on the search progress.
  • Various refactoring to break the main methods into smaller pieces.

Hopefully this should fix issues like:

MilesCranmer/PySR#490 (@eelregit and @paulomontero)

which is due to poor garbage collection in Julia when using distributed processing:

JuliaLang/julia#50673

The workaround I have added here is basically to hint to every worker process what memory limit it should target before garbage collecting aggressively. I automatically select that hint as the total system memory divided by the number of processes, but the user can pass a per-worker hint of their own (useful for multi-node setups, where per-node memory is much less than the total memory across all nodes).

This seems to work well for preventing OOM errors when I tried it.
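
For illustration, here is a minimal sketch of the idea (a sketch only, not the package's internal code; the worker count is a placeholder, and the `--heap-size-hint` flag requires Julia 1.9+):

```julia
using Distributed

# Give each worker a heap-size hint of roughly
# (total system memory) / (number of workers), so each worker's GC
# collects before the node as a whole runs out of memory.
num_workers = 4                                    # placeholder worker count
hint_bytes = div(Sys.total_memory(), num_workers)  # per-worker share, in bytes
hint_mb = div(hint_bytes, 1024^2)

# Presumably heap_size_hint_in_bytes ends up as Julia's `--heap-size-hint`
# flag on each worker; this is the manual equivalent.
addprocs(num_workers; exeflags="--heap-size-hint=$(hint_mb)M")
```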

TODO:

  • Document in docstring
  • Add option to SRRegressor
  • Unittest
    • (Unclear how to test for this, so skipping a specific test)

This also removes the explicit precompilation script as I think it didn't help much.

github-actions bot (Contributor) commented Dec 22, 2023

Benchmark Results

| Benchmark | master | 0becbf4... | t[master] / t[0becbf4...] |
|---|---|---|---|
| search/multithreading | 27.6 ± 3.9 s | 27.9 ± 3.5 s | 0.99 |
| search/serial | 31.5 ± 0.62 s | 31.6 ± 0.67 s | 0.998 |
| utils/best_of_sample | 0.791 ± 0.22 μs | 0.772 ± 0.24 μs | 1.02 |
| utils/check_constraints_x10 | 12.7 ± 3.2 μs | 12.6 ± 3.2 μs | 1 |
| utils/compute_complexity_x10/Float64 | 2.31 ± 0.11 μs | 2.33 ± 0.12 μs | 0.988 |
| utils/compute_complexity_x10/Int64 | 2.29 ± 0.11 μs | 2.28 ± 0.12 μs | 1 |
| utils/compute_complexity_x10/nothing | 1.51 ± 0.11 μs | 1.55 ± 0.11 μs | 0.974 |
| utils/optimize_constants_x10 | 29 ± 6.5 ms | 29.4 ± 6.7 ms | 0.985 |
| time_to_load | 1.95 ± 0.003 s | 2.04 ± 0.015 s | 0.959 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@MilesCranmer MilesCranmer merged commit c878d66 into master Dec 24, 2023
34 checks passed
@MilesCranmer MilesCranmer deleted the auto-heap-size branch December 24, 2023 05:06
MilesCranmer added a commit that referenced this pull request Dec 25, 2023
[Diff since v0.22.5](v0.22.5...v0.23.0)

**Merged pull requests:**
- Automatically set heap size hint on workers (#270) (@MilesCranmer)

**Closed issues:**
- How do I set up a basis function consisting of three different inputs x, y, z? (#268)
eelregit commented Jan 4, 2024

Thanks Miles!

I am running the new versions on 4 rusty icelake nodes, and find that the memory usage of each node is only 56 GB out of 1 TB. This is surprising because I have increased max_size from 25~35 to 64, yet with the new versions the memory usage decreases. I wonder if this is expected?

MilesCranmer (Owner, Author):

Is it slower at all?

It doesn’t actually need the memory. But letting it use more memory can make the garbage collection more efficient, since it can do it in batches. If it really doesn’t need all of the RAM, it’s not a big issue.

eelregit commented Jan 5, 2024

Is it slower at all?

Previously: 1 node, max_size of 25 or 35, about 1e5 expr/sec.
Now: 4 nodes, max_size of 64, about 5e4 expr/sec.
Order-of-magnitude-wise, if the time complexity is $\mathcal{O}(\mathrm{maxsize}^2)$, it doesn't seem to be slower. Maybe running with multiple nodes hurts the speed a bit?
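
(A rough check, assuming maxsize ≈ 30 for the earlier runs: quadratic scaling would predict about $4 \times 10^5 \times (30/64)^2 \approx 9 \times 10^4$ expr/sec across the 4 nodes, within a factor of two of the observed $5 \times 10^4$.)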

BTW, I have a simple script to plot the log-log pareto front and its lower convex hull.
I can add it as PySRRegressor.pareto_plot if you think it's useful.

hall_of_fame_2024-01-03_102321.228.pdf

The convex hull shows the tradeoffs in power law forms: $\mathrm{loss} \cdot \mathrm{complexity}^\alpha = \mathrm{const}$.
The typical behavior of the Pareto curves is like trajectories of balls bouncing down the convex hull;
the model right before or at each landing is the economical one for that bounce.
The range of complexity is naturally divided by those bounces.
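
For reference, a minimal sketch of the lower-convex-hull part in log-log space (not eelregit's actual script; the Pareto-front points are made up for illustration):

```julia
# Toy Pareto front: (complexity, loss) pairs, with loss decreasing in complexity.
pareto = [(1, 3.2), (3, 1.1), (5, 0.40), (8, 0.12), (12, 0.09), (20, 0.03)]

# Work in log-log space, where a power law  loss * complexity^alpha = const
# is a straight line of slope -alpha.
pts = [(log(c), log(l)) for (c, l) in pareto]

# 2D cross product of (o→a) and (o→b); > 0 means a left (counter-clockwise) turn.
cross2d(o, a, b) = (a[1] - o[1]) * (b[2] - o[2]) - (a[2] - o[2]) * (b[1] - o[1])

# Andrew's monotone-chain lower hull (points are already sorted by complexity).
hull = empty(pts)
for p in pts
    while length(hull) >= 2 && cross2d(hull[end-1], hull[end], p) <= 0
        pop!(hull)
    end
    push!(hull, p)
end

# Slopes of the hull segments give the power-law exponents alpha
# in  loss * complexity^alpha ≈ const.
alphas = [-(hull[i+1][2] - hull[i][2]) / (hull[i+1][1] - hull[i][1]) for i in 1:length(hull)-1]
```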

I think it'd be cool to have tensorboard tracking these figures over time.
And the equations too, but I don't know how to add the LaTeX table to tensorboard.

eelregit commented Jan 5, 2024

It doesn’t actually need the memory. But letting it use more memory can make the garbage collection more efficient, since it can do it in batches. If it really doesn’t need all of the RAM, it’s not a big issue.

Forgot to ask: I guess that means Julia's garbage collection starts working quite hard even with a little nudge?

MilesCranmer (Owner, Author):

Previously: 1 node, max_size is 25 or 35, about 1e5 expr/sec
Now: 4 nodes, max_size is 64, about 5e4 expr/sec

From the settings you described, this scaling sounds fine – I don't think there's any slowdown from the change in this PR.

I'm surprised you are only getting 5e4 expr/sec though. Typically on 4 rusty nodes I can get 5e6+ expr/sec for maxsize 50 and 100 datapoints. How many datapoints are you running? Is the CPU load reasonable on all nodes?

Do you want to open an issue here or a discussion thread on the PySR forums to debug this? https://github.com/MilesCranmer/PySR/discussions

eelregit commented Jan 5, 2024

Opened a discussion: MilesCranmer/PySR#518

And the equations too, but I don't know how to add the LaTeX table to tensorboard.

Also, tensorboard can actually log text: https://www.tensorflow.org/tensorboard/text_summaries
So if you want, I can add that as an option, to accompany the huge slurm progress log with a better web interface.

MilesCranmer (Owner, Author):

That’s a great idea! I see there’s a Julia plugin as well: https://github.com/JuliaLogging/TensorBoardLogger.jl
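
A minimal sketch of what that could look like, assuming TensorBoardLogger.jl's standard use as a Logging backend (the group names and values below are placeholders):

```julia
using TensorBoardLogger, Logging

lg = TBLogger("tb_logs/sr_run")  # writes TensorBoard event files under this directory

with_logger(lg) do
    # Numeric values show up in TensorBoard's "Scalars" tab...
    @info "search" best_loss = 0.031 evaluations = 1_000_000
    # ...and string values should land in the "Text" tab, which could hold the
    # hall-of-fame equations, e.g. as a Markdown table.
    @info "equations" hall_of_fame = "| complexity | equation |\n| --- | --- |\n| 5 | x1*x2 + cos(x3) |"
end
```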

eelregit commented Jan 5, 2024

Ah right, this has to be done in the Julia loop. I see that matplotlib can be called from there as well: https://github.com/JuliaPy/PyPlot.jl
But maybe this is too circular? And I don't really know much Julia 😅

MilesCranmer (Owner, Author):

Actually it's okay, because they are already talking to each other, so you can totally call Python stuff from the Julia loop.
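
For example, a minimal sketch of driving matplotlib from Julia via PyPlot.jl (placeholder data; in practice the points would come from the hall of fame inside the search loop):

```julia
using PyPlot  # wraps Python's matplotlib.pyplot

# Placeholder Pareto-front data: complexity vs. loss.
complexity = [1, 3, 5, 8, 12, 20]
loss = [3.2, 1.1, 0.40, 0.12, 0.09, 0.03]

figure()
loglog(complexity, loss, "o-")  # log-log axes, markers joined by lines
xlabel("complexity")
ylabel("loss")
savefig("pareto_front.png")     # the saved figure could then be logged to tensorboard
```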
