Node benchmarking utility #6198

urtho · 2024-12-16T12:43:39Z

go-algorand could use a standardized way to compare and benchmark the underlying hardware, ideally with a repeatable workload that closely matches a real scenario.

Users could compare their results online and make sure their hardware's performance is above the median so that network peak performance can grow with the number of new nodes.

The catchpointdump utility is the perfect first candidate for such a tool:

- It is already in the repo.
- It can closely simulate a fast catchup procedure in a repeatable setting.

This patch adds a bench command to the utility that combines the network and file restore scenarios, so download, SQLite loading and Merkle tree build can all be benchmarked in one go.
It reuses dependencies that are already in go.mod to collect information about the hardware, at least on Linux.

Results are optionally dumped to a JSON file and ready for submission to some central benchmark repository.
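For readers who want a feel for how a stage could be measured, here is a minimal sketch that records both wall-clock and process CPU time around a unit of work. It assumes a Unix-like platform and uses `syscall.Getrusage`, in line with the `syscall.Rusage` usage visible in the reviewed `GetCPU` snippet; the names `stageResult`, `processCPUTime` and `measureStage` are hypothetical and are not taken from the patch.

```go
package main

import (
	"fmt"
	"syscall"
	"time"
)

// stageResult is a hypothetical record for one benchmark stage.
type stageResult struct {
	name     string
	duration time.Duration // wall-clock time
	cpuTime  time.Duration // user + system CPU time of the whole process
}

// processCPUTime returns the user+system CPU time consumed by this process so far.
func processCPUTime() (time.Duration, error) {
	var ru syscall.Rusage
	if err := syscall.Getrusage(syscall.RUSAGE_SELF, &ru); err != nil {
		return 0, err
	}
	return time.Duration(ru.Utime.Nano() + ru.Stime.Nano()), nil
}

// measureStage runs work and reports how much wall and CPU time it took.
func measureStage(name string, work func() error) (stageResult, error) {
	cpuStart, err := processCPUTime()
	if err != nil {
		return stageResult{}, err
	}
	wallStart := time.Now()
	if err := work(); err != nil {
		return stageResult{}, err
	}
	wall := time.Since(wallStart)
	cpuEnd, err := processCPUTime()
	if err != nil {
		return stageResult{}, err
	}
	return stageResult{name: name, duration: wall, cpuTime: cpuEnd - cpuStart}, nil
}

func main() {
	// Stand-in workload; a real stage would download, load SQLite, or hash.
	res, err := measureStage("digest", func() error {
		sum := 0
		for i := 0; i < 50_000_000; i++ {
			sum += i % 7
		}
		_ = sum
		return nil
	})
	if err != nil {
		fmt.Println("measurement failed:", err)
		return
	}
	fmt.Printf("stage:%s wall:%v cpu:%v\n", res.name, res.duration, res.cpuTime)
}
```

Because the CPU figure covers every thread in the process, it can exceed the wall-clock duration for multi-threaded stages, which is consistent with the sample reports where cpu_sec is larger than duration_sec.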

Examples

Simple Network, SSD and CPU test

Known catchpoint label, sourced from a random relay/archiver:

./catchpointdump bench -r 41600000 -n mainnet.algorand.network

# Benchmark report:
# >> stage:network duration_sec:89.1 duration_min:1.5 cpu_sec:101
# >> stage:database duration_sec:648.8 duration_min:10.8 cpu_sec:507
# >> stage:digest duration_sec:385.1 duration_min:6.4 cpu_sec:550

SSD and CPU test with local file

Benchmark only the disk and CPU parts, using an already downloaded ledger snapshot:

./catchpointdump bench -n mainnet.algorand.network -t mainnet/snap/41600000.tar

Full benchmark with JSON report and hosted snapshot

A repeatable benchmark with a Cloudflare-hosted catchpoint and a report dump:

./catchpointdump bench -r 41600000 -n mainnet.algorand.network -p snap.nodely.io -j report.json

Report file

Sample report.json:

{
    "report": "a193cbc7-6e6a-732b-93cf-36f0c0589864",
    "stages": [
        {
            "stage": "network",
            "duration_sec": 39,
            "cpu_time_sec": 59
        },
        {
            "stage": "database",
            "duration_sec": 795,
            "cpu_time_sec": 629
        },
        {
            "stage": "digest",
            "duration_sec": 363,
            "cpu_time_sec": 482
        }
    ],
    "host": {
        "cores": 20,
        "log_cores": 20,
        "base_mhz": 2500,
        "max_mhz": 3500,
        "cpu_name": "13th Gen Intel(R) Core(TM) i5-13500",
        "cpu_vendor": "Intel",
        "mem_mb": 64105,
        "os": "linux",
        "uuid": "c3acdb4e-3937-a9a6-2266-d80ce615ef45"
    }
}

The file can be uploaded to a third-party benchmark site, for example:

curl -X POST https://benchmarks.nodely.io/api/report -d @report.json
#{"success":true,"goto":"https://benchmarks.nodely.io/edit/a193cbc7-6e6a-732b-93cf-36f0c0589864"}
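A consumer of report.json could decode it along these lines. This is only a sketch: the JSON keys are taken from the sample report, but the Go type names, the choice of `float64` for durations, and the `totalDurationSec` helper are assumptions and may not match the types used in the patch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical types mirroring the keys of the sample report.json.
type stage struct {
	Stage       string  `json:"stage"`
	DurationSec float64 `json:"duration_sec"`
	CPUTimeSec  float64 `json:"cpu_time_sec"`
}

type host struct {
	Cores    int    `json:"cores"`
	LogCores int    `json:"log_cores"`
	BaseMHz  int    `json:"base_mhz"`
	MaxMHz   int    `json:"max_mhz"`
	CPUName  string `json:"cpu_name"`
	MemMB    uint64 `json:"mem_mb"`
	OS       string `json:"os"`
}

type report struct {
	Report string  `json:"report"`
	Stages []stage `json:"stages"`
	Host   host    `json:"host"`
}

// parseReport decodes a JSON report; keys not listed above are ignored.
func parseReport(data []byte) (report, error) {
	var r report
	err := json.Unmarshal(data, &r)
	return r, err
}

// totalDurationSec sums the wall-clock duration of all stages.
func totalDurationSec(r report) float64 {
	total := 0.0
	for _, s := range r.Stages {
		total += s.DurationSec
	}
	return total
}

// sampleJSON is a trimmed copy of the sample report.
const sampleJSON = `{
  "report": "a193cbc7-6e6a-732b-93cf-36f0c0589864",
  "stages": [
    {"stage": "network", "duration_sec": 39, "cpu_time_sec": 59},
    {"stage": "database", "duration_sec": 795, "cpu_time_sec": 629},
    {"stage": "digest", "duration_sec": 363, "cpu_time_sec": 482}
  ],
  "host": {"cores": 20, "log_cores": 20, "mem_mb": 64105, "os": "linux"}
}`

func main() {
	r, err := parseReport([]byte(sampleJSON))
	if err != nil {
		fmt.Println("invalid report:", err)
		return
	}
	fmt.Printf("%d stages, %.0f s total on %d cores\n",
		len(r.Stages), totalDurationSec(r), r.Host.Cores)
}
```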

codecov · 2024-12-16T13:08:53Z

Codecov Report

Attention: Patch coverage is 0% with 168 lines in your changes missing coverage. Please review.

Project coverage is 51.74%. Comparing base (f87ae8a) to head (9436a7c).

Files with missing lines	Patch %	Lines
cmd/catchpointdump/bench.go	0.00%	96 Missing ⚠️
cmd/catchpointdump/bench_report.go	0.00%	71 Missing ⚠️
cmd/catchpointdump/commands.go	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6198      +/-   ##
==========================================
- Coverage   51.85%   51.74%   -0.11%     
==========================================
  Files         639      641       +2     
  Lines       85508    85676     +168     
==========================================
- Hits        44336    44335       -1     
- Misses      38356    38524     +168     
- Partials     2816     2817       +1


urtho · 2024-12-18T00:11:35Z

Submitting a report might be fun:

algorandskiy

Good work! I left a few comments.
The PR will need an update after #6177 gets merged.

algorandskiy · 2024-12-18T00:53:29Z

cmd/catchpointdump/bench_report.go

+	return fmt.Sprintf(">> stage:%s duration_sec:%.1f duration_min:%.1f cpu_sec:%d", bs.stage, bs.duration.Seconds(), bs.duration.Minutes(), bs.cpuTimeNS/1000000000)
+}
+
+func maybeGetTotalMemory() uint64 {


consider moving to util/util.go
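To illustrate what a best-effort memory probe of this kind might look like as a shared helper, here is a standard-library-only sketch that reads MemTotal from /proc/meminfo on Linux. It is not the patch's implementation, which according to the description relies on dependencies already in go.mod; the name `parseMemTotalMB` and the choice of MiB as the unit are assumptions.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseMemTotalMB extracts the MemTotal value from /proc/meminfo-style text
// and converts it from kB to MiB. The boolean is false if no value was found.
func parseMemTotalMB(meminfo string) (uint64, bool) {
	sc := bufio.NewScanner(strings.NewReader(meminfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "MemTotal:" {
			kb, err := strconv.ParseUint(fields[1], 10, 64)
			if err != nil {
				return 0, false
			}
			return kb / 1024, true
		}
	}
	return 0, false
}

func main() {
	// /proc/meminfo exists on Linux only; elsewhere the probe simply gives up.
	data, err := os.ReadFile("/proc/meminfo")
	if err != nil {
		fmt.Println("total memory unavailable:", err)
		return
	}
	if mb, ok := parseMemTotalMB(string(data)); ok {
		fmt.Println("mem_mb:", mb)
	}
}
```

Keeping the parsing separate from the file read makes the helper testable on any platform and lets callers treat a missing value as "unknown" rather than as an error.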

algorandskiy · 2024-12-18T00:55:30Z

cmd/catchpointdump/bench.go

+	benchCmd.Flags().IntVarP(&round, "round", "r", 0, "Specify the round number ( i.e. 7700000 )")
+	benchCmd.Flags().StringVarP(&relayAddress, "relay", "p", "", "Relay address to use ( i.e. r-ru.algorand-mainnet.network:4160 )")
+	benchCmd.Flags().StringVarP(&catchpointFile, "tar", "t", "", "Specify the catchpoint file (either .tar or .tar.gz) to process")
+	benchCmd.Flags().StringVarP(&reportJsonPath, "report", "j", "", "Specify the file to save the Json formatted report to")


Suggested change

-	benchCmd.Flags().StringVarP(&reportJsonPath, "report", "j", "", "Specify the file to save the Json formatted report to")
+	benchCmd.Flags().StringVarP(&reportJsonPath, "report", "j", "", "Specify the file to save the JSON formatted report to")

algorandskiy · 2024-12-18T00:56:29Z

cmd/catchpointdump/bench_report.go

+}
+
+func GetCPU() int64 {
+	usage := new(syscall.Rusage)


same, move to util

algorandskiy · 2024-12-18T00:58:03Z

cmd/catchpointdump/bench.go

+		addrs = []string{relayAddress}
+	} else {
+		//append relays
+		dnsaddrs, err := tools.ReadFromSRV(context.Background(), "algobootstrap", "tcp", networkName, "", false)


"algobootstrap" probably should not be used here, since those nodes are not obliged to have catchpoints other than the few most recent ones.

feat: cachpointdump benchmark

9436a7c

algorandskiy reviewed Dec 18, 2024
