
Shrinking database/Blocking database operations Give False Downtime #2470

Open · 2 tasks done
JacksonChen666 opened this issue Dec 25, 2022 · 20 comments
Labels
area:core (issues describing changes to the core of uptime kuma), bug (Something isn't working)

Comments

@JacksonChen666

⚠️ Please verify that this bug has NOT been raised before.

  • I checked and didn't find a similar issue

🛡️ Security Policy

Description

One of my monitors said it was down because: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?

I was deleting a monitor, probably one with a lot of data (I had history set to 365 days previously. Insane, I know), so deleting took a long time, which then caused the monitor to be reported as down.

The issue can also be caused by a manually triggered shrink database operation.

Related

👟 Reproduction steps

  1. Have some monitors (any should be fine, just with a short enough interval, e.g. at most 20 seconds)
  2. Have a large database (say >512MB)
  3. Shrink database (Settings > Monitor History > Shrink database)
  4. Experience behavior

👀 Expected behavior

The monitors will continue as "up" and save the correct data later (if needed).

😓 Actual Behavior

The monitors are considered "down" because a blocking database operation is happening.
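
To illustrate the failure mode, here is a minimal sketch outside Uptime Kuma (not its actual configuration) using knex with the sqlite3 client: a single-connection pool held by a long-running VACUUM makes any concurrent query time out while waiting for a connection. The pool size, timeout, and database path below are assumptions for illustration only.

```js
// Minimal sketch: a blocking VACUUM starves a one-connection knex pool.
// Assumes a SQLite file large enough that VACUUM outlasts the acquire timeout.
const knex = require("knex")({
    client: "sqlite3",
    connection: { filename: "./kuma.db" },    // path assumed for illustration
    useNullAsDefault: true,
    pool: { min: 1, max: 1 },                 // single connection, like a single-writer SQLite setup
    acquireConnectionTimeout: 5000,           // fail fast for the demo
});

(async () => {
    // Start the long blocking operation; VACUUM rewrites the whole database file
    // and holds the pool's only connection until it finishes.
    const vacuum = knex.raw("VACUUM").then(() => console.log("VACUUM finished"));

    // A concurrent "heartbeat" query now has to wait for a free connection.
    try {
        await knex.raw("SELECT 1");
    } catch (err) {
        // If VACUUM takes longer than acquireConnectionTimeout, this is the
        // KnexTimeoutError from the logs below: "Timeout acquiring a connection.
        // The pool is probably full. Are you missing a .transacting(trx) call?"
        console.error(err.message);
    }

    await vacuum;
    await knex.destroy();
})();
```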

🐻 Uptime-Kuma Version

1.19.0

💻 Operating System and Arch

macOS 13.1

🌐 Browser

LibreWolf 108.0.1-1

🐋 Docker Version

No response

🟩 NodeJS Version

v16.18.1

📝 Relevant log output

Dec 25 12:36:43 laptop-server npm[1618271]: 2022-12-25T11:36:43Z [RATE-LIMIT] INFO: remaining requests: 20
Dec 25 12:37:06 laptop-server npm[1618271]: 2022-12-25T11:37:06Z [MONITOR] WARN: Monitor #6 'mastodon/mcrblgng (micro.)': Pending: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? | Max retries: 12 | Retry: 1 | Retry Interval: 60 seconds | Type: keyword
Dec 25 12:37:10 laptop-server npm[1618271]: 2022-12-25T11:37:10Z [MONITOR] WARN: Monitor #7 'peertube (videos.)': Pending: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? | Max retries: 12 | Retry: 1 | Retry Interval: 30 seconds | Type: keyword
Dec 25 12:37:10 laptop-server npm[1618271]: 2022-12-25T11:37:10Z [MONITOR] WARN: Monitor #40 'conduit (conduit.hazmat.)': Pending: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? | Max retries: 15 | Retry: 1 | Retry Interval: 60 seconds | Type: keyword
Dec 25 12:37:11 laptop-server npm[1618271]: 2022-12-25T11:37:11Z [MONITOR] WARN: Monitor #36 'unbound DNS server (telemetry)': Pending: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? | Max retries: 2 | Retry: 1 | Retry Interval: 30 seconds | Type: keyword
Dec 25 12:37:12 laptop-server npm[1618271]: Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
Dec 25 12:37:12 laptop-server npm[1618271]:     at Client_SQLite3.acquireConnection (/home/uptime/uptime-kuma/node_modules/knex/lib/client.js:305:26)
Dec 25 12:37:12 laptop-server npm[1618271]:     at async Runner.ensureConnection (/home/uptime/uptime-kuma/node_modules/knex/lib/execution/runner.js:259:28)
Dec 25 12:37:12 laptop-server npm[1618271]:     at async Runner.run (/home/uptime/uptime-kuma/node_modules/knex/lib/execution/runner.js:30:19)
Dec 25 12:37:12 laptop-server npm[1618271]:     at async RedBeanNode.findOne (/home/uptime/uptime-kuma/node_modules/redbean-node/dist/redbean-node.js:515:19)
Dec 25 12:37:12 laptop-server npm[1618271]:     at async Function.handleStatusPageResponse (/home/uptime/uptime-kuma/server/model/status_page.js:23:26)
Dec 25 12:37:12 laptop-server npm[1618271]:     at async /home/uptime/uptime-kuma/server/routers/status-page-router.js:16:5 {
Dec 25 12:37:12 laptop-server npm[1618271]:   sql: undefined,
Dec 25 12:37:12 laptop-server npm[1618271]:   bindings: undefined
Dec 25 12:37:12 laptop-server npm[1618271]: }
Dec 25 12:37:12 laptop-server npm[1618271]:     at process.<anonymous> (/home/uptime/uptime-kuma/server/server.js:1779:13)
Dec 25 12:37:12 laptop-server npm[1618271]:     at process.emit (node:events:513:28)
Dec 25 12:37:12 laptop-server npm[1618271]:     at emit (node:internal/process/promises:140:20)
Dec 25 12:37:12 laptop-server npm[1618271]:     at processPromiseRejections (node:internal/process/promises:274:27)
Dec 25 12:37:12 laptop-server npm[1618271]:     at processTicksAndRejections (node:internal/process/task_queues:97:32)
Dec 25 12:37:13 laptop-server npm[1618271]: If you keep encountering errors, please report to https://github.com/louislam/uptime-kuma/issues
Dec 25 12:37:13 laptop-server npm[1618271]: 2022-12-25T11:37:13Z [MONITOR] WARN: Monitor #44 'prometheus (prometheus.)': Pending: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? | Max retries: 12 | Retry: 1 | Retry Interval: 60 seconds | Type: keyword
Dec 25 12:37:14 laptop-server npm[1618271]: 2022-12-25T11:37:14Z [AUTH] INFO: Successfully logged in user jackson. IP=176.241.52.131
Dec 25 12:37:15 laptop-server npm[1618271]: 2022-12-25T11:37:15Z [RATE-LIMIT] INFO: remaining requests: 20
Dec 25 12:37:19 laptop-server npm[1618271]: 2022-12-25T11:37:19Z [MONITOR] WARN: Monitor #34 'ntfy localhost': Failing: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? | Interval: 20 seconds | Type: http | Down Count: 0 | Resend Interval: 15
Dec 25 12:37:43 laptop-server npm[1618271]: 2022-12-25T11:37:43Z [RATE-LIMIT] INFO: remaining requests: 20
@JacksonChen666 JacksonChen666 added the bug Something isn't working label Dec 25, 2022
@louislam
Owner

louislam commented Jan 1, 2023

Due to the limitations of SQLite, it may be an unsolvable bug.

Maybe I will change to MySQL in 2.0.0.

@JacksonChen666
Author

@louislam if you're considering supporting other databases, I would personally suggest considering PostgreSQL and MySQL and weighing the pros/cons.

I don't have MySQL on my server, because nothing uses it. Pretty much everything I run (PeerTube, Mastodon, Synapse (a Matrix homeserver)) uses Postgres.

Anyways, here are some already existing issues/comments:

@kosssi
Contributor

kosssi commented Jan 19, 2023

I'm in the same configuration as @JacksonChen666.
I really hope that Uptime Kuma 2.0 will not be limited to MySQL and will be compatible with Postgres.
But given the ideas for version 2 on your dashboard, I'm afraid that's not the case.
Did you make a decision on it @louislam and if so could you explain why?

PS: Thank you very much for creating and maintaining this tool; I deploy it as part of the CHATONS in France.

@manuelkamp

Is there an update on that issue? Or any workaround? For me, after some months of flawless usage, it is now unusable, because for every action a lot of monitors show this error and are wrongly reported as down.

@CommanderStorm
Collaborator

@JacksonChen666 This will be resolved in the 1.23-release given that #2800 and #3380 were merged.

@harryzcy
Contributor

harryzcy commented Sep 15, 2023

MariaDB support is already merged, and I'm submitting Postgres support in #3748

@CommanderStorm CommanderStorm added the area:core issues describing changes to the core of uptime kuma label Dec 7, 2023
@Saibamen
Contributor

Saibamen commented Feb 19, 2024

I think this is already resolved in the 1.23-release given that #2800 and #3380 were merged

@CommanderStorm: No. Today I read your comment here and clicked the Shrink database button. After a few minutes I was able to see the frontend page without the backend-connection bug, but the list of monitors still took 3-4 minutes to load (I only have 53 monitors), and after that our Slack was spammed with 🔴 DOWN messages for all monitors. My customer almost got a heart attack because of this...

DB size before shrink: 860 MB
DB size after shrink: 838.4 MB
History data retention: 30 days (I changed it today from 90)
Kuma Version: 1.23.11

@CommanderStorm
Collaborator

Yeah, indeed, shrinking is not the same as deleting monitors. I should have read more carefully, sorry about that.

@Saibamen
Contributor

Saibamen commented Feb 20, 2024

FYI: Today, after the clear-old-data job (reminder: I changed history retention from 90 to 30 days), my DB size went from ~838.4 MB to 780.6 MB

And I wonder if this text is correct:

Trigger database VACUUM for SQLite. If your database is created after 1.10.0, AUTO_VACUUM is already enabled and this action is not needed.

But I checked the changes in 1.10.0 (here), and await R.exec("PRAGMA auto_vacuum = FULL"); was added in PR #794 to the connect() function here, so I think recreating the database from scratch is not needed, because AUTO_VACUUM will be applied right after the backend starts (or after every Kuma version update).

Please correct me if I'm wrong

@chakflying
Collaborator

The description is not entirely accurate. I don't remember exactly, but there is some slight difference between a manually triggered VACUUM and AUTO_VACUUM. But I also don't know how to write a better description, so it is how it is.

@Saibamen
Contributor

I've just checked the database: PRAGMA auto_vacuum; returns 2, so this is set to INCREMENTAL, and there's no need to recreate the database file as you might think after reading the description under the Shrink Database button.
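
For anyone who wants to reproduce that check, here is a rough sketch using knex directly against the SQLite file (the file path is an assumption, and the exact shape of the result returned by a raw query is dialect-specific):

```js
// Rough sketch: read the current auto_vacuum mode from a SQLite file.
// Per the SQLite pragma docs: 0 = NONE, 1 = FULL, 2 = INCREMENTAL.
const knex = require("knex")({
    client: "sqlite3",
    connection: { filename: "./data/kuma.db" },  // path assumed
    useNullAsDefault: true,
});

(async () => {
    const rows = await knex.raw("PRAGMA auto_vacuum;");
    console.log(rows);  // e.g. [ { auto_vacuum: 2 } ] -> INCREMENTAL, matching the value above
    await knex.destroy();
})();
```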

@Saibamen
Contributor

But I also don't know how to write a better description so it is how it is.

Maybe just this for now:

Trigger database VACUUM for SQLite. AUTO_VACUUM is already enabled and this action is not needed in most cases.


OK, in the documentation (https://www.sqlite.org/pragma.html#pragma_auto_vacuum) we have this:

Auto-vacuum does not defragment the database nor repack individual database pages the way that the VACUUM command does. In fact, because it moves pages around within the file, auto-vacuum can actually make fragmentation worse.

IMO, we can write:

Trigger VACUUM for SQLite to defragment and repack database. Remember, AUTO_VACUUM is already enabled, but this does not defragment the database nor repack individual database pages.

or just:

Trigger database VACUUM for SQLite. AUTO_VACUUM is already enabled but this does not defragment the database nor repack individual database pages the way that the VACUUM command does.

@chakflying
Collaborator

The last one sounds pretty good to me.

@Saibamen
Contributor

OK, I will create a PR to change en.js for this - tonight or tomorrow
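
For reference, the change would presumably be a one-string edit in the English locale file, roughly along these lines (the key name shrinkDatabaseDescription and the surrounding file structure are assumptions; see the linked PR below for the actual change):

```js
// src/lang/en.js (sketch only; key name and file structure assumed)
export default {
    // ...
    shrinkDatabaseDescription:
        "Trigger database VACUUM for SQLite. AUTO_VACUUM is already enabled but this " +
        "does not defragment the database nor repack individual database pages the " +
        "way that the VACUUM command does.",
    // ...
};
```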

@Saibamen
Contributor

Done in #4508

@k-matti

k-matti commented Dec 26, 2024

"Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?" still occurs for me.
