Problem: possible memory leak #519
Comments
If it can help, I've generated a pprof profile of the node.
pprof.chain-maind.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz |
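For anyone trying to reproduce this: the profile assumes the node already exposes Go's pprof HTTP endpoint on 127.0.0.1:6060 (on a Tendermint-based node this is typically the pprof_laddr setting in config.toml, though the exact key can vary by version). A rough sketch for capturing both a raw profile, for later diffing, and a rendered PNG:
# Assumed: pprof listener enabled in config.toml, e.g. pprof_laddr = "127.0.0.1:6060"
# Save the raw allocs profile so it can be compared against a later snapshot
curl -s -o allocs-$(date +%s).pb.gz http://127.0.0.1:6060/debug/pprof/allocs
# Render the allocation profile as a PNG (same idea as the command quoted below)
go tool pprof -png /srv/chain/chain-maind http://127.0.0.1:6060/debug/pprof/allocs > allocs.png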
Yes, huge leak!!! Lock down.
…On Sat, May 1, 2021, 18:05 Galadrin ***@***.***> wrote:
If it can help I've generate a pprof profile of the node.
sudo -u chain /usr/local/go/bin/go tool pprof -png /srv/chain/chain-maind 'http://127.0.0.1:6060/debug/pprof/allocs' > /tmp/allocs.png
Fetching profile over HTTP from http://127.0.0.1:6060/debug/pprof/allocs
Saved profile in /srv/chain/pprof/pprof.chain-maind.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz
pprof.chain-maind.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz
<https://github.com/crypto-org-chain/chain-main/files/6410355/pprof.chain-maind.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz>
[image: allocs]
<https://user-images.githubusercontent.com/5019960/116796077-0d707b80-aada-11eb-8d58-9d7e832e5468.png>
|
@Galadrin thanks!! big help! |
@tomtau after going through the code and the chain-main err-log, this looks very much like the same LevelDB issues as syndtr/goleveldb#290 and syndtr/goleveldb#353 (less likely this one). Someone in the discussion provided a possibly useful patch: syndtr/goleveldb#290 (comment). Do you think we should patch LevelDB and test it a bit? The goleveldb repo looks slow to maintain. If we can confirm the patch solves the issue, maybe we suggest Tendermint Core switch to a better LevelDB implementation if one exists, or patch it themselves? |
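If we want to trial that patch without waiting for upstream, one low-friction option is a module replace pointing at a patched fork. A sketch only; the fork path and revision are placeholders, not an actual patched goleveldb, and the package path for the daemon may differ:
# Point the goleveldb dependency at a patched fork/commit, then rebuild and soak-test
go mod edit -replace github.com/syndtr/goleveldb=github.com/<your-fork>/goleveldb@<patched-commit>
go mod tidy
go build -o chain-maind ./cmd/chain-maind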
I'm using cleveldb (but didn't see any difference in performance). |
@JayT106 it'd be better to confirm the root cause -- @Galadrin mentions they recompiled the binary for cleveldb, so it shouldn't be affected by the goleveldb issues (unless it was compiled, such that goleveldb is still included for some parts). |
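One quick way to see which LevelDB implementations actually ended up in a given binary is the module list Go embeds at build time (a sketch; it assumes a module-aware build and that cleveldb support comes in via a levigo-style cgo wrapper):
# List the modules compiled into the binary and look for the LevelDB implementations
go version -m ./chain-maind | grep -Ei 'goleveldb|levigo'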
@tomtau Sure, the IAVL issue looks more like a performance issue impacting the DB layer, but perhaps it also relates to the memory leak.
|
Yes, it's from the same machine, but not the same time.
|
For "cleveldb", afaik one needs to recompile the binary as well: |
sure, it's locally rebuilt with |
Aside from adding |
@Galadrin Is it okay to confirm your node can execute with |
The library is linked and loaded in the proc maps.
How can I check that my node executes with
ok, so I need to add |
If the execution log doesn't show the db backend info, maybe you can kill the PID to retrieve the log (like you did in the Additional context). |
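Two quick checks that can be run against the live process (a sketch, assuming a Linux host and a single chain-maind process):
# Confirm the C LevelDB shared library is actually mapped into the running process
grep -i leveldb /proc/$(pidof chain-maind)/maps
# SIGABRT (signal 6) makes the Go runtime dump all goroutine stacks to the log as it exits,
# which is how the err-log in the Additional context was produced
kill -6 $(pidof chain-maind)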
ok, I have rebuilt with the following changes:
then ran it again on the node.
|
After 7 hours of uptime, the swap usage continues to grow. Do you want a new pprof or stack log? |
@Galadrin Thanks for the report. Is it okay to observe it longer, and perhaps gather the process RAM usage info? If we can see a huge memory leak, we can expect the process to hit OOM in the long term. |
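A lightweight way to collect that, without the overhead of repeated profiling, is to sample the process's resident and swapped memory periodically (a sketch, Linux-only; adjust the interval and paths):
# Log a timestamped RSS/swap sample every 5 minutes; VmRSS = resident RAM, VmSwap = swapped-out pages
while true; do
  echo "$(date -Is) $(grep -E 'VmRSS|VmSwap' /proc/$(pidof chain-maind)/status | tr '\n' ' ')" >> /tmp/chain-maind-mem.log
  sleep 300
done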
I'm OK to turn swap off. The swap is increasing but the RAM usage is stable. After 14h uptime, I'm using 4GB (50%) of RAM and 688MB of swap. |
Our monitoring started warning about low memory on the server. I'm not sure I have enough memory available for a pprof snapshot. |
@Galadrin Thanks! Could you 1. keep the current data you collected, 2. restart the system with |
@JayT106 what do you mean by keeping the data? |
will add |
do a pprof snapshot; the snapshot shouldn't use too much RAM to dump the info to the file. |
|
@JayT106 The memory limitation does its job :) |
@Galadrin Do you mean the process did a GC to free the RAM, or the process hit OOM? If it's OOM, is there any dump left that we can analyze? Thanks. |
Yes, the OOM killer triggered.
|
The profiling result showed the different PATH in |
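Rather than comparing the PNGs by eye, two saved snapshots from the same binary can be diffed directly with pprof (a sketch; the file names are placeholders for the profiles saved earlier):
# Show only what grew between the first and the second snapshot
go tool pprof -top -base=profile-001.pb.gz profile-002.pb.gz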
@Galadrin from the RAM monitoring data on our seed node, we can see the RAM usage increased from 4.7G (Apr 26) to around 7.3G (now). But if the node is a sentry node, the RAM usage is around 2GB. |
it's a sentry node, but we have seed and snapshot enabled. |
After the investigation by @JayT106 and @allthatjazzleo , they didn't manage to reproduce the issue and didn't notice any memory leak-like behaviour -- the only thing noticed was the increased memory usage due to IAVL. If this issue reappears or there's a reliable way to reproduce it, feel free to reopen the issue. |
We also hit this problem when the key/value size is big. We found there are two hard-coded cache sizes, in iavl/store.go and cache/cache.go; changing the cache size helps. |
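For anyone wanting to try that, the constants can first be located in the downloaded dependency sources (a sketch; the grep pattern and paths are guesses and may differ across cosmos-sdk/iavl versions, and actually changing them needs a fork plus a go.mod replace, as suggested for goleveldb above):
# Locate the hard-coded cache sizes mentioned above in the module cache sources
grep -rni "cachesize" $(go env GOMODCACHE)/github.com/cosmos/cosmos-sdk@*/store/ | head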
Thanks for sharing, we found out the recent Cronos is using |
Describe the bug
A sentry node used with seed enabled and snapshot enabled uses a lot of swap memory over time.
Swap is used when memory pages have not been modified for a long time.
To Reproduce
Expected behavior
no or limited swap usage
Screenshots
Additional context
issue detected on the mainnet
Error log attached is from kill -6 PID after 22h runtime: chain-maind-errlog.txt