IO resource exhaustion due to LevelDB compaction #10987

Closed
mdyring opened this issue Jan 20, 2022 · 10 comments
Labels
T: Performance Performance improvements

Comments

@mdyring

mdyring commented Jan 20, 2022

Summary of Bug

It seems like LevelDB compaction is still a major source of IOPS and resource exhaustion on nodes. Related to #2131.

I've taken the liberty of classifying this as a bug, as it severely affects performance around the epoch on Osmosis.

I still believe the main reason for this is that IAVL constantly changes primary keys, resulting in reorganisation at the LevelDB layer. See #2131 (comment).
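A toy model can make this hypothesis concrete. The Go sketch below is not IAVL or goleveldb code, and every name in it is hypothetical. It splits one level into tables by key range and counts how many existing tables a batch of new writes overlaps. When keys are hash-like and land anywhere in the key space, almost every table is overlapped and would have to be rewritten by a compaction; when new keys sort after everything already stored, the batch overlaps none of them.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// level models one LevelDB level as tables that partition a key range:
// table i holds keys in [starts[i], starts[i+1]); the last one ends at maxKey.
type level struct {
	starts []uint64
	maxKey uint64
}

// buildLevel sorts keys and cuts them into tables of perTable keys each.
func buildLevel(keys []uint64, perTable int) level {
	sorted := append([]uint64(nil), keys...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	var lv level
	for i := 0; i < len(sorted); i += perTable {
		lv.starts = append(lv.starts, sorted[i])
	}
	lv.maxKey = sorted[len(sorted)-1]
	return lv
}

// tablesTouched counts the distinct existing tables whose key range
// contains at least one key from the batch.
func (lv level) tablesTouched(batch []uint64) int {
	touched := make(map[int]bool)
	for _, k := range batch {
		if k < lv.starts[0] || k > lv.maxKey {
			continue // outside every existing table: can go into a brand-new table
		}
		// First table starting after k; k belongs to the table just before it.
		i := sort.Search(len(lv.starts), func(j int) bool { return lv.starts[j] > k }) - 1
		touched[i] = true
	}
	return len(touched)
}

func main() {
	rng := rand.New(rand.NewSource(1))
	hashLike := func() uint64 { return rng.Uint64() >> 1 } // headroom below 2^64

	existing := make([]uint64, 100000)
	for i := range existing {
		existing[i] = hashLike()
	}
	lv := buildLevel(existing, 1000) // 100 tables

	randomBatch := make([]uint64, 1000)
	appendBatch := make([]uint64, 1000)
	for i := range randomBatch {
		randomBatch[i] = hashLike()                // lands anywhere in the key space
		appendBatch[i] = lv.maxKey + 1 + uint64(i) // sorts after every existing key
	}
	fmt.Printf("hash-like batch overlaps %d of %d tables\n", lv.tablesTouched(randomBatch), len(lv.starts))
	fmt.Printf("append-only batch overlaps %d of %d tables\n", lv.tablesTouched(appendBatch), len(lv.starts))
}
```

Real compaction chooses its inputs by level scores and seek statistics, so this only illustrates why key locality matters; it does not measure actual write amplification.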

This is the output of "tail -F application.db/LOG", run inside the Osmosis data directory. Compaction never ends.

14:27:50.398490 table@remove removed @578813422
14:27:50.398656 table@remove removed @578813423
14:27:50.398812 table@remove removed @578813424
14:27:50.423933 table@build created L4@578936704 N·16730 S·2MiB "s/k..\x9fͧ,v15278708771":"s/k..\xbe\xfaM,v14154351544"
14:27:50.451567 table@build created L4@578936705 N·16790 S·2MiB "s/k..J\x04\x1c,v14470627550":"s/k..\x18>%,v13869343316"
14:27:50.493837 table@build created L4@578936706 N·16854 S·2MiB "s/k..\r\b\xa2,v13806765219":"s/k..\x94\x9e\x90,v13792776153"
14:27:50.524111 table@build created L4@578936707 N·16734 S·2MiB "s/k..\x82\x9d\xc2,d14646985255":"s/k..\xeb\xf4\x99,v13830988726"
14:27:50.560716 table@build created L4@578936708 N·16726 S·2MiB "s/k..M<\x89,d15457203810":"s/k..\x9c{\x16,v14040882491"
14:27:50.588833 table@build created L4@578936709 N·16723 S·2MiB "s/k..\xaa5j,v15136954798":"s/k..b\x1f\xd8,v13912212433"
14:27:50.618200 table@build created L4@578936710 N·16656 S·2MiB "s/k..\xa6`N,v14269629201":"s/k..G\xf8D,v14965908327"
14:27:50.659704 table@build created L4@578936711 N·16830 S·2MiB "s/k..\xcf\xd8s,d15219954542":"s/k..\xa3\x1e\x1a,v15454875630"
14:27:50.689781 table@build created L4@578936712 N·16720 S·2MiB "s/k..\x18\xd4\xf0,d15403026974":"s/k..\xe5\xdcU,d14744029808"
14:27:50.718397 table@build created L4@578936713 N·16738 S·2MiB "s/k..gK\xa9,v14337882857":"s/k..\x92\xec\xa1,v15082124250"
14:27:50.737218 table@build created L4@578936714 N·12653 S·1MiB "s/k..\xaf\xa5\xed,v13945212050":"s/k..x\xb8\x90,v15543504764"
14:27:50.748085 version@stat F·[2 63 524 6866 61137 111106] S·316GiB[4MiB 98MiB 1000MiB 9GiB 97GiB 208GiB] Sc·[0.50 0.99 1.00 1.00 1.00 0.21]
14:27:50.748411 table@compaction committed F-1 S-41KiB Ke·0 D·139 T·356.557019ms
14:27:50.748480 table@compaction L3·1 -> L4·11 S·20MiB Q·15663152286
14:27:50.750988 table@remove removed @578929937
14:27:50.751155 table@remove removed @578813425
14:27:50.751324 table@remove removed @578813426
14:27:50.752215 table@remove removed @578813427
14:27:50.753175 table@remove removed @578813428
14:27:50.754112 table@remove removed @578813429
14:27:50.754281 table@remove removed @578813430
14:27:50.754418 table@remove removed @578813431
14:27:50.754533 table@remove removed @578813432
14:27:50.754656 table@remove removed @578813433
14:27:50.754843 table@remove removed @578813434
14:27:50.755169 table@remove removed @578813435
14:27:50.782922 table@build created L4@578936715 N·16757 S·2MiB "s/k..\xd7a\u007f,v14773042553":"s/k..\x9fdž,v14895895544"
14:27:50.824974 table@build created L4@578936716 N·16691 S·2MiB "s/k..\xbfG\xa7,d14854839068":"s/k..\xce*\x14,v13844684335"
14:27:50.853902 table@build created L4@578936717 N·16659 S·2MiB "s/k..\xb2\x06\x9a,v15178258742":"s/k..\x1c(L,v15638391037"
14:27:50.883319 table@build created L4@578936718 N·16711 S·2MiB "s/k..jϨ,v15081971365":"s/k..\xd6#c,v14395926200"
14:27:50.921047 table@build created L4@578936719 N·16779 S·2MiB "s/k..AFT,d15547278614":"s/k..)zF,v13970205403"
14:27:50.962081 table@build created L4@578936720 N·16607 S·2MiB "s/k..U\xfa:,d14976839257":"s/k..\x93\x13\xf3,v15220886009"
14:27:50.994548 table@build created L4@578936721 N·16667 S·2MiB "s/k..My\x83,v14149929774":"s/k..!\x05{,v14389380771"
14:27:51.023897 table@build created L4@578936722 N·16606 S·2MiB "s/k..\x8en\x12,v13788835958":"s/k..˥\xac,v13835114743"
14:27:51.052407 table@build created L4@578936723 N·16731 S·2MiB "s/k..\xa2%@,v14599773418":"s/k..o\u007f\xd0,v15452226796"
14:27:51.081226 table@build created L4@578936724 N·16663 S·2MiB "s/k..\xc5|\xd2,v15345762999":"s/k..=\x8e\x8c,v14770478362"
14:27:51.087563 table@build created L4@578936725 N·3561 S·438KiB "s/k..\x9e\"%,v14734243923":"s/k..MG\xa3,v14936293158"
14:27:51.098908 version@stat F·[2 63 524 6865 61137 111106] S·316GiB[4MiB 98MiB 1000MiB 9GiB 97GiB 208GiB] Sc·[0.50 0.99 1.00 1.00 1.00 0.21]
14:27:51.099212 table@compaction committed F-1 S-7KiB Ke·0 D·123 T·350.693837ms
14:27:51.099282 table@compaction L3·1 -> L4·11 S·21MiB Q·15663152286
14:27:51.100698 table@remove removed @578929938
14:27:51.100878 table@remove removed @578813436
14:27:51.101001 table@remove removed @578813437
14:27:51.101684 table@remove removed @578813438
14:27:51.102007 table@remove removed @578813439
14:27:51.102459 table@remove removed @578813440
14:27:51.102893 table@remove removed @578813441
14:27:51.103765 table@remove removed @578813442
14:27:51.104239 table@remove removed @578813443
14:27:51.106133 table@remove removed @578813444
14:27:51.106567 table@remove removed @578813445
14:27:51.107025 table@remove removed @578813446

(Video showing just how bad it is:)
https://twitter.com/mdyring/status/1484169718378287107?s=20

Version

All networks / versions. Dates back to early Cosmos-SDK days.

Steps to Reproduce

Run any Cosmos-SDK-enabled chain. It is easily observable on the busier ones, such as Osmosis or the Cosmos Hub.

Use tail -F application.db/LOG in the data directory.


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@tac0turtle
Member

I have a branch that adds some things that could fix this; I can spend some time getting it finished.

@tac0turtle tac0turtle added the T: Performance Performance improvements label Jan 20, 2022
@faddat
Contributor

faddat commented Jan 20, 2022

@marbar3778, do the changes in your branch also affect RocksDB, or are they goleveldb-specific?

@tac0turtle
Member

They are goleveldb optimisations. I would need to read more about RocksDB to better understand how it works.

@tac0turtle tac0turtle moved this from Icebox to Backlog in Cosmos SDK Maintenance Mar 11, 2022
@GrapeBaBa

@marbar3778 What is the progress on this issue? We have run into it too.

@tac0turtle
Member

tac0turtle commented Sep 25, 2022

We tested the change and saw far less compaction. The downside is a possible increase in disk size, but it was marginal. The changes are in cosmos-db, which we will migrate to after 0.47, in 1 to 1.5 months.

This, alongside the key format changes in IAVL, will produce far less compaction.

@yihuang
Collaborator

yihuang commented Sep 26, 2022

> we tested change and got way less compaction but the down side is possible increase in disk size, but it was marginal. the changes are in cosmos-db which we will migrate to post 0.47. in 1-1.5 months

Interesting, may I know what the change is?

> This along side key format changes in iavl will produce way less compaction

This one, right? To my understanding, this should help with compaction.

@tac0turtle
Member

> Interesting, may I know what the change is?

disabling seek compaction
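Seek compaction is LevelDB's heuristic for compacting tables that make reads expensive. Each table gets a budget of "wasted" seeks (in LevelDB's design, roughly one per 16 KiB of table data, with a floor of 100). A lookup that checks a table, misses, and has to continue to another file charges that table one seek, and when the budget runs out the table is scheduled for compaction. goleveldb exposes a DisableSeeksCompaction field on opt.Options to turn this off; we assume that is the switch meant here. The Go code below is a simplified model of the budget, not goleveldb's implementation, and its names are hypothetical.

```go
package main

import "fmt"

// table tracks the remaining seek budget of one SSTable in this model.
type table struct {
	seekLeft int64
}

// newTable allots about one wasted seek per 16 KiB of data, minimum 100,
// following the LevelDB heuristic described above.
func newTable(sizeBytes int64) *table {
	allowed := sizeBytes / 16384
	if allowed < 100 {
		allowed = 100
	}
	return &table{seekLeft: allowed}
}

// chargeSeek records one lookup that checked t, missed, and had to read
// another file. It reports whether t is now due for a seek-triggered
// compaction. With disableSeeks set, the budget is never charged.
func chargeSeek(t *table, disableSeeks bool) bool {
	if disableSeeks {
		return false
	}
	t.seekLeft--
	return t.seekLeft <= 0
}

// readsUntilCompaction returns how many wasted seeks it takes before a table
// of the given size is scheduled for compaction, or -1 if that does not
// happen within maxReads.
func readsUntilCompaction(sizeBytes int64, disableSeeks bool, maxReads int) int {
	t := newTable(sizeBytes)
	for i := 1; i <= maxReads; i++ {
		if chargeSeek(t, disableSeeks) {
			return i
		}
	}
	return -1
}

func main() {
	fmt.Println(readsUntilCompaction(2<<20, false, 1000)) // 2 MiB table: 128
	fmt.Println(readsUntilCompaction(1<<20, false, 1000)) // 1 MiB table: floor of 100
	fmt.Println(readsUntilCompaction(2<<20, true, 1000))  // seek compaction disabled: -1
}
```

With 2 MiB tables like those in the log above, the model schedules a compaction after 128 wasted seeks per table, which can add up quickly on a read-heavy node across the roughly 180,000 tables counted in the version@stat lines.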

@elias-orijtech
Copy link
Contributor

What's the status of this issue, and what work is needed to close it? The referenced issue was closed without being merged.

@tac0turtle
Copy link
Member

With the upcoming IAVL changes, we hope this will be more efficient. We plan to test it this coming week.

@tac0turtle
Copy link
Member

We have been profiling the new IAVL changes, and they show a significant reduction in compaction, so I am closing this issue. Look for the Eden release to use these changes.

@github-project-automation github-project-automation bot moved this from Backlog to Done in Cosmos SDK Maintenance May 29, 2023

6 participants