Explicit GC.gc call does not reclaim memory on 1.11 and master #54020

Open
Liozou opened this issue Apr 10, 2024 · 10 comments
Labels
GC Garbage collector regression Regression in behavior compared to a previous version

Comments

@Liozou
Member

Liozou commented Apr 10, 2024

Here is a simple setup that makes julia leak memory until the OS kills it (x86 linux, glibc), on a single thread:

function bar(n)
    x = [collect(1:n) for _ in 1:(900_000_000÷n)]
    first(last(x))
end

I have 8 GiB of RAM available after executing the above. Calling bar(n) once uses around 6.7GiB for any value of n between 1000 and 100_000. That's fine, and this memory used to create and fill x should be reclaimable by GC. But calling bar(n) a second time then crashes julia, even if the two calls are separated by GC.gc(false); GC.gc(true). This occurs on v1.11.0-alpha2 and master (d183ee1), but not on 1.10.2.
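
As a rough check on the 6.7GiB figure, the payload can be estimated from the element count. This is a back-of-the-envelope sketch, not a measurement: it assumes 8-byte Int64 elements (the default on 64-bit julia) and ignores per-array headers. The helper name `payload_gib` is made up for illustration.

```shell
# Estimate the raw size of x, assuming 8-byte Int64 elements
# and ignoring per-array headers and allocator overhead.
payload_gib() {
    awk -v e="$1" -v b="$2" \
        'BEGIN { printf "%.2f GiB\n", e * b / (1024 * 1024 * 1024) }'
}

# 900_000_000 elements in total, regardless of n
payload_gib 900000000 8
```

Under those assumptions the estimate comes out at about 6.71 GiB, in line with the observed usage.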

liozou@liozou:~$ julia-master/usr/bin/julia -t1 --startup-file=no -E 'bar(n) = (x = [collect(1:n) for _ in 1:(900_000_000÷n)]; first(last(x))); bar(9000);'  # master: one call is fine
1

liozou@liozou:~$ julia-master/usr/bin/julia -t1 --startup-file=no -E 'bar(n) = (x = [collect(1:n) for _ in 1:(900_000_000÷n)]; first(last(x))); bar(9000); GC.gc(false); GC.gc(true); bar(9000)'  # master: two calls crash
Killed

liozou@liozou:~$ julia-1.11.0-alpha2/bin/julia -t1 --startup-file=no -E 'bar(n) = (x = [collect(1:n) for _ in 1:(900_000_000÷n)]; first(last(x))); bar(9000);'  # v1.11: one call is fine
1

liozou@liozou:~$ julia-1.11.0-alpha2/bin/julia -t1 --startup-file=no -E 'bar(n) = (x = [collect(1:n) for _ in 1:(900_000_000÷n)]; first(last(x))); bar(9000); GC.gc(false); GC.gc(true); bar(9000)'  # v1.11: two calls crash
Killed

liozou@liozou:~$ julia-1.10.2/bin/julia -t1 --startup-file=no -E 'bar(n) = (x = [collect(1:n) for _ in 1:(900_000_000÷n)]; first(last(x))); bar(9000); bar(9000)'  # v1.10 is fine even with two calls
1

(For the sake of completeness: this also used to work on v1.6.6 and v1.7.3; it crashed on v1.8.5 unless an explicit GC.gc() was called between the two bar(9000) calls; and it crashed on v1.9.4, which might be related to #50345.)

Irrespective of it being a regression without calling GC.gc, the fact that an explicit GC.gc call does not reclaim the free memory looks like a bug to me. See also #51818, but here function bar exited before the call to GC.gc.

@Liozou Liozou added regression Regression in behavior compared to a previous version GC Garbage collector labels Apr 10, 2024
@giordano giordano added this to the 1.11 milestone Apr 10, 2024
@vtjnash
Member

vtjnash commented Apr 10, 2024

Explicit GC doesn't reset the heuristics. So even if we collect a lot of the memory at once (which will start to reduce the thresholds somewhat, based on a discounted moving average), we will continue to assume you have enough swap available to handle the transient memory load without needing to reduce overall performance to stay within the memory budget. (2x memory probably used to be enough swap, but with SSD speeds these days it might need to be a lot more.)

@vtjnash
Member

vtjnash commented Apr 10, 2024

What does /usr/bin/time -v report for your maxrss values?

@Liozou
Member Author

Liozou commented Apr 10, 2024

I have no swap, could that be the reason?

/usr/bin/time -v reports the following (I shortened to relevant info), first is one call to bar, second is two calls, third is with explicit GC:

liozou@lzs510u:~$ /usr/bin/time -v julia -t1 --startup-file=no -E 'bar(n) = (x = [collect(1:n) for _ in 1:(900_000_000÷n)]; first(last(x))); bar(9000);'
1
	Command being timed: "julia -t1 --startup-file=no -E bar(n) = (x = [collect(1:n) for _ in 1:(900_000_000÷n)]; first(last(x))); bar(9000);"
	Maximum resident set size (kbytes): 7249696
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 4
	Minor (reclaiming a frame) page faults: 1785192
	Voluntary context switches: 14
	Involuntary context switches: 50

liozou@lzs510u:~$ /usr/bin/time -v julia -t1 --startup-file=no -E 'bar(n) = (x = [collect(1:n) for _ in 1:(900_000_000÷n)]; first(last(x))); bar(9000); bar(9000)'
Command terminated by signal 9
	Command being timed: "julia -t1 --startup-file=no -E bar(n) = (x = [collect(1:n) for _ in 1:(900_000_000÷n)]; first(last(x))); bar(9000); bar(9000)"
	Maximum resident set size (kbytes): 10161020
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 71
	Minor (reclaiming a frame) page faults: 2543658
	Voluntary context switches: 427
	Involuntary context switches: 3251

liozou@lzs510u:~$ /usr/bin/time -v julia -t1 --startup-file=no -E 'bar(n) = (x = [collect(1:n) for _ in 1:(900_000_000÷n)]; first(last(x))); bar(9000); GC.gc(false); GC.gc(true); bar(9000)'
Command terminated by signal 9
	Command being timed: "julia -t1 --startup-file=no -E bar(n) = (x = [collect(1:n) for _ in 1:(900_000_000÷n)]; first(last(x))); bar(9000); GC.gc(false); GC.gc(true); bar(9000)"
	Maximum resident set size (kbytes): 10193724
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 283
	Minor (reclaiming a frame) page faults: 3412660
	Voluntary context switches: 434
	Involuntary context switches: 2340
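
For easier comparison with the 8 GiB of RAM mentioned earlier, the three maxrss values can be converted to GiB. A minimal sketch, assuming GNU time reports "kbytes" in units of 1024 bytes; `kb_to_gib` is a made-up helper name:

```shell
# Convert a "Maximum resident set size (kbytes)" value to GiB,
# assuming 1 kbyte = 1024 bytes.
kb_to_gib() {
    awk -v kb="$1" 'BEGIN { printf "%.2f\n", kb / (1024 * 1024) }'
}

kb_to_gib 7249696    # one call to bar
kb_to_gib 10161020   # two calls
kb_to_gib 10193724   # two calls separated by explicit GC
```

That gives roughly 6.91, 9.69 and 9.72 GiB respectively.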

@vtjnash
Member

vtjnash commented Apr 10, 2024

Ah yeah, if you don't have swap, then your system is likely to run a lot slower and crash more.

@Liozou
Member Author

Liozou commented Apr 10, 2024

Ah, well, shame. But in any case, regarding your initial comment:

> Explicit GC doesn't reset the heuristics, so even if we collect a lot of the memory at once [...]

The issue here is precisely that no memory is collected at all whereas it should when explicitly calling GC.gc(), right?

@nsajko
Contributor

nsajko commented Apr 15, 2024

> That's fine, and this memory used to create and fill x should be reclaimable by GC. But calling bar(n) a second time then crashes julia

Definitely seems like a bug.

> even if the two calls are separated by GC.gc(false); GC.gc(true)

I'm not sure if this is a bug. It's always been my understanding that a single gc call is not guaranteed to free anything; you need multiple calls to free things. Indeed, I just tried this in the REPL, and calling gc in a loop between invocations of bar prevents OOM.

> Ah yeah, if you don't have swap, then your system is likely to run a lot slower and crash more

This is off-topic and also quite rude??!

@Liozou
Member Author

Liozou commented Apr 15, 2024

> > Ah yeah, if you don't have swap, then your system is likely to run a lot slower and crash more
>
> This is off-topic and also quite rude??!

No no, this is relevant: if julia expects the user to have swap, then the entire issue could be moot, because then there is no bug in julia but a system incompatibility instead. As far as I know, using julia does not require swap space, though; having it is simply better for the GC heuristics. If it were needed, it should be in the documentation, or even in the requirements in the README.
I also don't feel it's rude at all, and I don't really see why it would be. I wasn't aware that having no swap on my system could lead to crashes or slowness, so it's actually valuable information for me.

@gbaraldi
Member

gbaraldi commented Apr 23, 2024

I don't consider this an issue. This workload requires about 6.8GB in memory at once, and given that GCs are expected to have some space overhead, it will use more than that.
The --heap-size-hint flag is designed to keep julia under a specified amount.
On my machine, if I run several of these calls without --heap-size-hint, julia will use up to 16GB of RAM, but if I set a heap size hint of 8GB, julia only uses 8.7GB (the flag isn't a strict limit; it just does its best).

The issue with running without swap is that it doesn't allow for any overhead and makes the OS keep everything in memory. If you don't want to use swap, then I would recommend setting the flag to a safe amount.
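
As an illustration of the flag, the reproducer from the top of the thread could be wrapped as below. This is only a sketch: the helper name `run_with_heap_hint` is made up, and the right hint value depends on the machine, so it is passed as an argument rather than fixed here.

```shell
# Hypothetical wrapper: run the two-call reproducer with a caller-supplied
# heap size hint (e.g. 8G). The hint is a target for the GC, not a hard limit.
run_with_heap_hint() {
    julia -t1 --startup-file=no --heap-size-hint="$1" -E '
        bar(n) = (x = [collect(1:n) for _ in 1:(900_000_000÷n)]; first(last(x)))
        bar(9000)
        bar(9000)
    '
}
```

For example, `run_with_heap_hint 8G` would mirror the 8GB setting mentioned above; running it under /usr/bin/time -v allows the maxrss to be compared with the earlier runs.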

@KristofferC KristofferC removed this from the 1.11 milestone Apr 23, 2024
@gbaraldi
Member

And btw @Liozou, if you want to see how much the GC is freeing, GC.enable_logging() will show it.
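
A possible way to apply this to the reproducer, sketched under the assumption that logging is switched on before the workload runs (the helper name `run_with_gc_logging` is made up; the log format is not shown here because it varies between julia versions):

```shell
# Hypothetical wrapper: enable GC logging, then run the reproducer with the
# explicit collections in between. Each collection is reported on stderr.
run_with_gc_logging() {
    julia -t1 --startup-file=no -e '
        GC.enable_logging(true)
        bar(n) = (x = [collect(1:n) for _ in 1:(900_000_000÷n)]; first(last(x)))
        bar(9000)
        GC.gc(false); GC.gc(true)
        bar(9000)
    '
}
```

Calling `run_with_gc_logging 2> gc.log` would keep the GC reports in a separate file for inspection.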

@Liozou
Member Author

Liozou commented Apr 23, 2024

I'll test this and let you know later when I can get back to my computer.

6 participants