
High CPU usage on 1-node CockroachDB #25346

Closed
gomezjdaniel opened this issue May 7, 2018 · 9 comments
Assignees
Labels
C-performance Perf of queries or internals. Solution not expected to change functional behavior. O-community Originated from the community

Comments

@gomezjdaniel
gomezjdaniel commented May 7, 2018

Hello,

BUG REPORT

  1. Please supply the header (i.e. the first few lines) of your most recent
    log file for each node in your cluster.
I180507 07:24:56.878569 1 util/log/clog.go:1104  [config] file created at: 2018/05/07 07:24:56
I180507 07:24:56.878569 1 util/log/clog.go:1104  [config] running on machine: 964ac46d307a
I180507 07:24:56.878569 1 util/log/clog.go:1104  [config] binary: CockroachDB CCL v2.0.1 (x86_64-unknown-linux-gnu, built 2018/04/23 18:39:21, go1.10)
I180507 07:24:56.878569 1 util/log/clog.go:1104  [config] arguments: [/cockroach/cockroach start --http-port 26256 --insecure]

debug.zip
profile.zip

  2. Please describe the issue you observed:
  • What did you do?

I have been developing a project (in Go) that uses CockroachDB for a long time. Every time I run the tests, the database is dropped, the tables are created again, and the queries are executed.

  • What did you expect to see?

I expect them to run smoothly and take a consistent amount of time.

  • What did you see instead?

For no apparent reason, the same package's test cases sometimes run in X seconds, and when running them again minutes later they take twice, three times, or much longer, even though the tests are exactly the same (recreating the database and schema is apparently where things get slower).

Meanwhile, as the top output shows, CockroachDB is using a lot of CPU, both while the tests are running and while it is idle, i.e. when no test or process using CockroachDB is running.

(screenshot: top output showing high CPU usage by the cockroach process)

@tbg
Member

tbg commented May 7, 2018

Cc @jordanlewis for triage. Maybe cache hit for @andreimatei.

@tbg tbg closed this as completed May 7, 2018
@tbg tbg reopened this May 7, 2018
@gomezjdaniel
Author

Maybe related to #24762 and #20753

@nvanbenschoten
Member

@gomezjdaniel thank you for including the CPU profile in the initial bug report!

Yes, I think you are correct that it's related to both of those issues. As discussed, CockroachDB does not currently handle large numbers of tables (O(1k)) well. Unfortunately, because CockroachDB uses an MVCC scheme under the hood, even tables that have been deleted will stick around for 24 hours by default before being removed completely and no longer counting towards this limit. This is necessary to allow historical queries on these now-deleted tables, but it exacerbates the issue here.

One way to help get around this is to drop your garbage collection TTL, as explained here in the docs. This isn't a great solution, because it means that you won't be able to perform historical queries as far back in time, but it will allow CockroachDB to clean up these deleted tables earlier.
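For reference, a sketch of lowering the GC TTL on a v2.0 single-node cluster. This assumes the `cockroach zone set` subcommand available in the 2.0 release (it was later superseded by `ALTER ... CONFIGURE ZONE`); the value `600` (seconds) matches what was tried later in this thread:

```shell
# Lower the garbage-collection TTL of the default zone to 10 minutes,
# so dropped tables are cleaned up sooner (at the cost of a shorter
# window for historical AS OF SYSTEM TIME queries).
echo 'gc: {ttlseconds: 600}' | cockroach zone set .default --insecure -f -
```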

This is a problem that we are aware of and we plan to make progress on for our 2.1 release cycle. I'm going to close this because it is a duplicate of the other two issues listed, but I appreciate you filing so that we know that this is affecting other users.

@gomezjdaniel
Author

For the record,

I have entirely deleted my cockroach-data/ folder and set gc.ttlseconds to 600 in the .default replication zone, and I still get 200% CPU peaks while running tests :(

@nvanbenschoten
Member

@gomezjdaniel do you have an estimate on the number of tables your test suite is creating and deleting?

@gomezjdaniel
Author

Yes,

Right now 26 tables are created; then they are deleted with DROP DATABASE XXX CASCADE

I thought you could access the databases in the debug.zip file. If so, you can see the schema in the databases is_development_tests or is_store_tests
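The per-run reset described above might look roughly like the following SQL (the table definition is a hypothetical placeholder; the real suite creates 26 tables). It is this repeated drop-and-recreate cycle that leaves dropped tables awaiting GC:

```sql
-- Tear down and recreate the test database on every run.
DROP DATABASE IF EXISTS is_development_tests CASCADE;
CREATE DATABASE is_development_tests;

-- Placeholder for one of the ~26 tables the suite recreates each run.
CREATE TABLE is_development_tests.users (
    id   UUID PRIMARY KEY,
    name STRING
);
```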

@vivekmenezes vivekmenezes reopened this May 17, 2018
@vivekmenezes vivekmenezes self-assigned this May 17, 2018
@vivekmenezes vivekmenezes added A-bulkio-schema-changes O-community Originated from the community C-performance Perf of queries or internals. Solution not expected to change functional behavior. labels May 17, 2018
@vivekmenezes
Contributor

@gomezjdaniel this is probably the same as #24762 . While you are creating only 26 tables, the fact that you're doing it in a loop means that there are more than 26 tables needing to be GC'ed later, which release 2.0 is bad at. We're going to publish a new alpha release in early June with the fix; is this something you could use, or do you need the fix backported to the 2.0 release? Thanks!

@gomezjdaniel
Author

We'd appreciate it if you could backport the fix to the 2.0 release 👍

@vivekmenezes
Contributor

@gomezjdaniel the fix will be in our next 2.0 release; mid June. Thanks!
