Responsible Memory Usage #562

gavofyork · 2020-03-02T14:00:00Z

News: https://hackmd.io/pvj8fsC3QbSc7o54ZH4qJQ

There are memory leaks in the codebase. Not only do we need to fix them, we need a proper attitude to ensure that they do not happen again.

Thus we need to achieve three things:

1. A solid and well-documented approach to heap tracing.
2. To rewrite our optional allocations under an attitude of zero-cost optimisations and, in general, ensure that all dynamic allocations are bounded.
3. To integrate all dynamic allocations into our triad of reporting apparatus: prometheus, telemetry and informant.

Heap tracing

It should not take days of research to figure out which lines in the codebase are responsible for so much memory being allocated. We need a clear, well-maintained document on how to set up and run Substrate (and derivative clients like Polkadot) in order to get full traces of allocations.

There are several tools that may be used (heaptrack, jemalloc, Massif/Valgrind, ...) and it is not clear which are viable, and/or sensible.
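Until such a document exists, a coarse in-process check can at least show whether memory is growing at all. The sketch below is not one of the tools named above and does not give per-line traces; it wraps the system allocator through Rust's standard `GlobalAlloc` trait and keeps running byte totals. The names `CountingAlloc`, `total_allocated`, `total_freed` and `live_bytes` are hypothetical and do not exist in Substrate.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper around the system allocator that counts bytes.
pub struct CountingAlloc;

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);
static FREED: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            // Only count allocations that actually succeeded.
            ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        FREED.fetch_add(layout.size(), Ordering::Relaxed);
    }
}

// Route every heap allocation in the process through the counter.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Total bytes ever requested from the allocator (monotonic).
pub fn total_allocated() -> usize {
    ALLOCATED.load(Ordering::Relaxed)
}

/// Total bytes ever returned to the allocator (monotonic).
pub fn total_freed() -> usize {
    FREED.load(Ordering::Relaxed)
}

/// Approximate number of bytes currently live on the heap.
pub fn live_bytes() -> usize {
    total_allocated().saturating_sub(total_freed())
}
```

Sampling `live_bytes()` periodically shows whether the heap keeps growing in stable state; finding the responsible lines still requires a real heap tracer.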

Zero-cost optimisations

This principle is a little like the "pay for what you use" mantra of Zero-Cost Abstractions that Rust gives. Essentially, it should be possible to "turn off" or reduce non-mandatory memory allocations to a large degree. This would come at a potentially large performance cost, but we will assume for now that under some circumstances this is acceptable, and in any case, having the ability to tune how much memory is allowed to be allocated in each of the various dynamic subsystems is useful.

We need to introduce CLI options such that, at the extreme end, a user (or developer) can minimise expected memory usage. In many cases, this memory usage (given by buffer sizes, cache limits, queue capacities and pre-allocation values) is hard-coded. In other cases, it's unbounded. It should, at all times, be bounded, with the bound configurable by the user over the CLI.

It should be configurable for all dynamically allocating subsystems. Every buffer, queue and cache. Every time something is allocated and kept around for later use. Every allocation that isn't strictly required for normal operation. If a buffer or queue is automatically grown under some circumstance, it should be instantly shrunk again once the need is gone.
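As a minimal sketch of that behaviour, assuming a simple FIFO queue: the hypothetical `BoundedQueue` below (not an existing Substrate type) refuses to grow past a configured limit and hands spare capacity back as it drains. It shrinks only once the queue is at most a quarter full, which is an assumption made here to avoid reallocating on every pop.

```rust
use std::collections::VecDeque;

/// Hypothetical queue with a hard upper bound on its length.
pub struct BoundedQueue<T> {
    items: VecDeque<T>,
    max_len: usize,
}

impl<T> BoundedQueue<T> {
    /// `max_len` would come from user configuration; zero disables queueing.
    pub fn new(max_len: usize) -> Self {
        Self { items: VecDeque::new(), max_len }
    }

    /// Rejects the item (handing it back) instead of growing past the bound.
    pub fn push(&mut self, item: T) -> Result<(), T> {
        if self.items.len() >= self.max_len {
            return Err(item);
        }
        self.items.push_back(item);
        Ok(())
    }

    /// Pops the oldest item and releases spare capacity once the queue
    /// is at most a quarter full.
    pub fn pop(&mut self) -> Option<T> {
        let item = self.items.pop_front();
        if self.items.len() <= self.items.capacity() / 4 {
            self.items.shrink_to(self.items.len() * 2);
        }
        item
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }

    pub fn capacity(&self) -> usize {
        self.items.capacity()
    }
}
```

Returning the rejected item from `push` leaves the caller to decide whether to drop it, apply back-pressure or retry later.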

One example would be the 1 GB that is given to RocksDB's cache size: it should be possible to lower this, all the way to zero, ideally. There are, I'm sure, many other examples.
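To illustrate what such options could look like, here is a rough sketch using only the standard library. The flag names `--db-cache-mib` and `--import-queue-len`, the `MemoryLimits` struct and the default queue length are all made up for this example and are not existing Substrate CLI options; only the 1024 MiB default mirrors the 1 GB figure mentioned above.

```rust
/// Hypothetical set of user-tunable memory bounds.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MemoryLimits {
    /// Database cache size in MiB; zero disables the cache.
    pub db_cache_mib: usize,
    /// Maximum number of queued items (illustrative subsystem).
    pub import_queue_len: usize,
}

impl Default for MemoryLimits {
    fn default() -> Self {
        // 1024 MiB mirrors the 1 GB cache mentioned in the issue;
        // the queue length is an arbitrary placeholder.
        Self { db_cache_mib: 1024, import_queue_len: 256 }
    }
}

/// Parses `--flag value` pairs into limits, starting from the defaults.
pub fn parse_limits<I, S>(args: I) -> Result<MemoryLimits, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut limits = MemoryLimits::default();
    let mut args = args.into_iter();
    while let Some(flag) = args.next() {
        let flag = flag.as_ref();
        let value = args
            .next()
            .ok_or_else(|| format!("missing value for {}", flag))?;
        let parsed: usize = value
            .as_ref()
            .parse()
            .map_err(|e| format!("invalid value for {}: {}", flag, e))?;
        match flag {
            "--db-cache-mib" => limits.db_cache_mib = parsed,
            "--import-queue-len" => limits.import_queue_len = parsed,
            other => return Err(format!("unknown flag {}", other)),
        }
    }
    Ok(limits)
}
```

Every limit has a default, so existing setups keep working, and every limit accepts zero, so a user can trade performance for the smallest possible footprint.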

Items that allocate large amounts but that are used transiently (such as the memory slab for wasm execution) should be configurable so that they remain allocated only as long as they are being actively used. This will mean that every execution will need to allocate afresh, but we don't care.
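A minimal sketch of that trade-off, with a plain byte buffer standing in for the wasm memory slab: the hypothetical `ScratchSlab` type below either keeps its buffer between runs or frees it as soon as each run finishes, depending on a configuration flag. Neither the type nor the `keep_allocated` switch is taken from the actual executor.

```rust
/// Hypothetical scratch buffer standing in for a wasm memory slab.
pub struct ScratchSlab {
    buf: Option<Vec<u8>>,
    keep_allocated: bool,
}

impl ScratchSlab {
    /// `keep_allocated = false` trades speed for a smaller idle footprint.
    pub fn new(keep_allocated: bool) -> Self {
        Self { buf: None, keep_allocated }
    }

    /// Runs `f` with a zeroed buffer of `size` bytes.
    pub fn run<R>(&mut self, size: usize, f: impl FnOnce(&mut [u8]) -> R) -> R {
        // Reuse the retained buffer if there is one, otherwise start empty.
        let mut buf = self.buf.take().unwrap_or_default();
        buf.clear();
        buf.resize(size, 0);
        let result = f(buf.as_mut_slice());
        if self.keep_allocated {
            self.buf = Some(buf);
        }
        // When not retained, `buf` is dropped here and its memory is freed.
        result
    }

    /// Bytes held while idle, i.e. between executions.
    pub fn retained_bytes(&self) -> usize {
        self.buf.as_ref().map_or(0, |b| b.capacity())
    }
}
```

With `keep_allocated` set to `false`, `retained_bytes()` is zero between runs, at the cost of one allocation per execution.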

The end result of this effort is to be able to lower the expected amount of memory used in stable-state (i.e. when not importing or authoring) to around 250 MB, even with dozens of peers.

Footprint reporting

In addition to this, every dynamically allocated queue, buffer and cache should be continuously reporting its size and footprint out to telemetry, informant and/or prometheus. We should obviously begin with the lower-hanging fruit - those that we suspect of being bad behavers, but eventually we should have an attitude of reporting everything.
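One possible shape for this, sketched under assumptions: each subsystem implements a small trait describing its current footprint, and a collector renders the values as gauges in the Prometheus text exposition format. The trait `ReportFootprint`, the `NamedQueue` wrapper and the metric names `footprint_items` and `footprint_bytes` are hypothetical, and the byte figure covers only the queue's own backing storage, not heap data owned by its elements.

```rust
use std::collections::VecDeque;
use std::mem::size_of;

/// Hypothetical trait for anything that holds dynamically allocated items.
pub trait ReportFootprint {
    fn name(&self) -> &str;
    /// Number of items currently held.
    fn items(&self) -> usize;
    /// Bytes reserved by the container itself.
    fn bytes(&self) -> usize;
}

/// Illustrative queue that knows its own subsystem name.
pub struct NamedQueue<T> {
    name: &'static str,
    queue: VecDeque<T>,
}

impl<T> NamedQueue<T> {
    pub fn new(name: &'static str) -> Self {
        Self { name, queue: VecDeque::new() }
    }

    pub fn push(&mut self, item: T) {
        self.queue.push_back(item);
    }
}

impl<T> ReportFootprint for NamedQueue<T> {
    fn name(&self) -> &str {
        self.name
    }

    fn items(&self) -> usize {
        self.queue.len()
    }

    fn bytes(&self) -> usize {
        // Reserved capacity, not just the occupied part.
        self.queue.capacity() * size_of::<T>()
    }
}

/// Renders one items gauge and one bytes gauge per source.
pub fn render_prometheus(sources: &[&dyn ReportFootprint]) -> String {
    let mut out = String::new();
    for s in sources {
        out.push_str(&format!(
            "footprint_items{{subsystem=\"{}\"}} {}\n",
            s.name(),
            s.items()
        ));
        out.push_str(&format!(
            "footprint_bytes{{subsystem=\"{}\"}} {}\n",
            s.name(),
            s.bytes()
        ));
    }
    out
}
```

The same trait values could equally be forwarded to telemetry or printed by the informant; only the rendering step is specific to Prometheus.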

We probably want to do this only in a certain mode of operation, to avoid performance costs under normal operation.
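A sketch of how such a mode could keep the normal path cheap, assuming a runtime switch: the measurement is passed as a closure and only evaluated when reporting is enabled, so a disabled reporter costs one atomic load per call. `FootprintReporter` and its methods are hypothetical names, not part of any existing API.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Hypothetical reporter that only measures when switched on.
pub struct FootprintReporter {
    enabled: AtomicBool,
    reports_sent: AtomicUsize,
}

impl FootprintReporter {
    pub const fn new(enabled: bool) -> Self {
        Self {
            enabled: AtomicBool::new(enabled),
            reports_sent: AtomicUsize::new(0),
        }
    }

    /// Could be driven by a CLI flag or toggled at runtime.
    pub fn set_enabled(&self, on: bool) {
        self.enabled.store(on, Ordering::Relaxed);
    }

    /// Runs `measure` only when reporting is enabled and returns its result.
    pub fn report(&self, measure: impl FnOnce() -> usize) -> Option<usize> {
        if !self.enabled.load(Ordering::Relaxed) {
            // Normal operation: skip the (possibly expensive) measurement.
            return None;
        }
        let bytes = measure();
        self.reports_sent.fetch_add(1, Ordering::Relaxed);
        Some(bytes)
    }

    pub fn reports_sent(&self) -> usize {
        self.reports_sent.load(Ordering::Relaxed)
    }
}
```

A real implementation would forward the measured value to telemetry, informant or prometheus rather than just returning it.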

Related: paritytech/substrate#4679

gnunicorn · 2020-04-06T13:50:16Z

@burdges could you elaborate? I can't find that crate in our dependency tree.

darkfriend77 · 2021-02-16T16:05:29Z

What tools are suggested for investigating the bad memory behaviour in v3.0.0?

paritytech/substrate#8117

heaptrack didn't work out for me here.

bkchr · 2023-02-28T12:43:21Z

Thought this is an open project. Who deletes valid comments in such a nontransparent manner?

No one should have this right. Solana outages happen because of such arrogant behavior, especially by mvines (systemic-flaws/dapp-platforms#5).

Just going around spamming issues is a reason to have comments removed. You can look around and you will find, from time to time, comments that have been removed. No comments are being removed that are about the topic of the issue. Sorry if that is against your beliefs.

bkchr · 2023-02-28T13:36:40Z

At that point you had commented on multiple issues, which looked like spam.

ghost · 2023-03-05T14:45:23Z

(just for the sake of completeness)

@gavofyork, this is actually an issue suitable for a bounty:

Experience Level: high
Bounty Amount: in the range of $3K to $10K
- You work things out alone (no Q&A with core-devs, they are busy with other work)
- Payout Rules: we like it, we use it, we pay you
Task: provide a mechanism to detect and prevent memory leaks/over-usage early
- Should be usable by all ParityTech rust repositories
- Can depend on paid services (an open-source option must exist though)

Place such a simple(!) bounty (without essays about processes etc.; it is a standard problem), and you should have a result soon (instead of having the issue open for a further 2 years).

Triage

Needs to be removed from https://github.com/paritytech/substrate/milestone/10 (closed milestone should not have issues assigned)

gavofyork added I8-footprint labels Mar 2, 2020

bkchr mentioned this issue Mar 2, 2020

Expose state-db memory info paritytech/substrate#5110

Merged

twittner mentioned this issue Mar 5, 2020

Add more prometheus metrics to network::Protocol. paritytech/substrate#5145

Merged

gavofyork mentioned this issue Mar 18, 2020

Prepare for Polkadot launch and Substrate 2.0 freeze paritytech/substrate#4961

Closed

24 tasks

tomaka mentioned this issue May 20, 2020

Increase network buffer sizes even more paritytech/substrate#6080

Merged

paritytech deleted a comment Feb 24, 2023

bkchr removed the U0-drop_everything label Feb 28, 2023

altonen transferred this issue from paritytech/substrate Aug 24, 2023

the-right-joyce removed the I8-footprint label Aug 25, 2023

claravanstaden pushed a commit to Snowfork/polkadot-sdk that referenced this issue Dec 8, 2023

Assets V2 (paritytech#562)

f27b762

Co-authored-by: sumitsnk <[email protected]>

helin6 pushed a commit to boolnetwork/polkadot-sdk that referenced this issue Feb 5, 2024

Add DefaultFeePerGas associated type to pallet-base-fee (parityte…

3652409

…ch#562)

bkchr pushed a commit that referenced this issue Apr 10, 2024

Bump structopt from 0.3.20 to 0.3.21 (#562)

c155c31

This was referenced Jun 5, 2024

Update polkadot-sdk from v1.7.0 to v1.11.0 moondance-labs/tanssi#573

Closed

Update polkadot-sdk from v1.10.0 to v1.11.0 moondance-labs/tanssi#577

Closed
