Uptime calculation improvement and 1-year uptime #2750
Conversation
Just a wild suggestion: in theory, maybe we can cache the window sums so that each update for a particular window only does one read and one write.
I haven't tried this and don't know if there are edge cases to handle, though. Edit: Oops, I realized that if the beat leaving the window has a longer duration than the current rate, it would lead to the same beat being subtracted multiple times... I guess if we can somehow handle these two cases, it would work?
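The cached sliding-window idea could be sketched roughly like this (hypothetical names and structure, not the actual uptime-kuma code). The key point is that each evicted beat is subtracted exactly once, which sidesteps the double-subtraction edge case described above:

```javascript
// Sketch of a cached sliding-window uptime counter (hypothetical API).
class SlidingUptime {
    constructor(windowMs) {
        this.windowMs = windowMs;
        this.beats = [];      // beats currently inside the window: { time, up, duration }
        this.upTime = 0;      // cached sum of "up" durations
        this.totalTime = 0;   // cached sum of all durations
    }

    // Amortized O(1): add the new beat, then evict beats that left the window.
    push(beat, now = beat.time) {
        this.beats.push(beat);
        if (beat.up) {
            this.upTime += beat.duration;
        }
        this.totalTime += beat.duration;

        const cutoff = now - this.windowMs;
        while (this.beats.length > 0 && this.beats[0].time < cutoff) {
            const old = this.beats.shift();
            // Each beat is removed from the array when it is subtracted,
            // so it can never be subtracted twice.
            if (old.up) {
                this.upTime -= old.duration;
            }
            this.totalTime -= old.duration;
        }
    }

    uptime() {
        return this.totalTime > 0 ? this.upTime / this.totalTime : 1;
    }
}
```

This still stores every beat inside the window, but each uptime query is a single cached division instead of a full re-sum.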
Good point. I also rechecked the current logic, and it seems the edge cases are not handled correctly there either. But I am not quite sure. I think for this part it may be good to start by writing test cases first.

uptime-kuma/server/model/monitor.js Lines 992 to 1020 in e241728
Force-pushed from b404af6 to 7231763
The uptime calculation actually gives me a bit of a headache, because when I tried to look into it, there were a lot of weird cases to handle. I am rethinking the time-series database option. QuestDB seems quite promising, because it can sum up a large set of data with a simple SQL query. If it really gains a lot of performance, it may be an ultimate solution for #1740 too. But I don't know the RAM usage, so I will try to import 1,000,000 heartbeat records into QuestDB and test it.
After some tests, I think QuestDB is really the way to go. For example, I tried to sum up 30-day uptime:

QuestDB result (Oracle Cloud free instance, 2 cores + 1 GB RAM): execution time around 2 ms - 7 ms.

SQLite result (my notebook, 11th-gen i7, 8 cores + 16 GB RAM): execution time around 69 ms - 75 ms. I don't know how to make the sqlite command display the execution time, so I ran it on my PC.

So even though the Oracle Cloud instance is weak, it is still faster than SQLite on my PC.
Hello, I find it strange that QuestDB is so resource intensive, because their presentation (Docker Hub introduction) says: "QuestDB is an open-source database designed to make time-series lightning fast and easy." Is it possible to test with QuestDB in an external Docker container? What is the final strategy: a TSDB plus a relational database (external MariaDB or MySQL/SQLite)? Otherwise, on my side, I find Redis TSDB / InfluxDB / QuestDB very good, in addition to improving the graphing part.
I will likely stick with SQLite/MariaDB, as the setup is easier and it won't use a lot of RAM. I may look into a sliding-window or rolling-window algorithm later.
By using PostgreSQL we could use TimescaleDB as an extension and activate it for some tables only; also, I love Postgres more than MySQL 🙂
Do you already have a timeframe for when this feature will be released?
See #2720 (comment)
I think most time-series databases use too much memory, which is not ideal for Uptime Kuma (I hope Uptime Kuma can stay relatively lightweight), so I went back to my original plan: aggregate tables. It is going well; getting a 1-year uptime takes less than 1 ms.
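A rough sketch of the aggregate-table idea (hypothetical schema and function names; the real PR's tables may differ): store one row of up/down counts per monitor per day, so a 1-year uptime is a sum over at most ~365 small rows instead of every raw heartbeat.

```javascript
// Sketch: compute uptime from a daily aggregate table (hypothetical structure).
// Each row holds the heartbeat counts for one monitor on one day,
// e.g. { day: "2023-01-01", up: 4300, down: 20 }.
function uptimeFromDailyRows(rows) {
    let up = 0;
    let total = 0;
    for (const row of rows) {
        up += row.up;
        total += row.up + row.down;
    }
    // Convention assumed here: no data means 100% uptime.
    return total > 0 ? up / total : 1;
}

// ~365 rows instead of ~1.5 million raw heartbeats for a 1-year window.
const rows = [
    { day: "2023-01-01", up: 4300, down: 20 },
    { day: "2023-01-02", up: 4320, down: 0 },
];
console.log(uptimeFromDailyRows(rows)); // a ratio in [0, 1]
```

Because the rows are just small integer counters, the whole year's data fits comfortably in a single indexed scan.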
Also, I added support for the native Node.js test runner along with this PR.
The size of the database will get bigger and bigger soon, I think 🤔 But query performance is great...
I tried to make these tables as small as possible; the columns are all int and float. Also, I will try to save some space by eliminating duplicate strings in #3595.
I think only one record per monitor in the aggregated tables is enough. Can you tell me if we need more than that?
For that we could use a custom …
Table size is a decent downside if performance is great.
Big tables may make select queries slower...
Isn't the whole point of a database to have big tables?
Is this WIP or already a feature? I can't find a setting to change the status page uptime to monthly etc. in the latest version.
There is no such feature available.
Try to improve the uptime calculation performance with an aggregate table.
However the definition of uptime will be a little bit different:
Assume that the heartbeat interval is 20 seconds.
Before:
The current best/worst case is 3 * 60 * 24 * 30 = 129,600 heartbeats for 30-day uptime, which means it sums up 129,600 numbers. This process is triggered 3 times per minute for each monitor.
After:
The worst case of summation is the 1-year window: 3 * 60 * 24 + 364 = 4,684 numbers. The best case (at 00:00) is 29 numbers for 1-month / 364 numbers for 1-year. It should be a lot faster.
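The before/after worst cases above can be double-checked with a quick calculation:

```javascript
// Worst-case number of values summed per uptime query, assuming a 20-second
// heartbeat interval (3 heartbeats per minute).
const before30d = 3 * 60 * 24 * 30; // every raw heartbeat in the 30-day window
const after1y = 3 * 60 * 24 + 364;  // today's raw heartbeats + 364 daily aggregate rows

console.log(before30d); // 129600
console.log(after1y);   // 4684
```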
Using an aggregate table was actually suggested by ChatGPT; it also suggested a time-series database, which I did not consider first.