Node benchmarking utility #6198

Open: wants to merge 1 commit into master

@urtho (Contributor) commented Dec 16, 2024

go-algorand could use a standardized way to compare and benchmark the underlying hardware, ideally with a repeatable workload that closely matches a real scenario.

Users could compare their results online and check that their hardware performs above the median, so that the network's peak performance can grow as new nodes join.

The catchpointdump utility is a natural first candidate for such a benchmark:

  • It is already in the repo
  • It can closely simulate the fast catchup procedure in a repeatable setting

This patch adds a bench command to the utility that combines the network and file restore scenarios, so download, SQLite loading, and Merkle tree building can all be benchmarked in one go.
It reuses some of the dependencies already in go.mod to gather hardware information (at least on Linux).

Results can optionally be dumped to a JSON file, ready for submission to a central benchmark repository.

Examples

Simple network, SSD and CPU test

Known catchpoint round, sourced from a random relay/archiver:

./catchpointdump bench -r 41600000 -n mainnet.algorand.network

# Benchmark report:
# >> stage:network duration_sec:89.1 duration_min:1.5 cpu_sec:101
# >> stage:database duration_sec:648.8 duration_min:10.8 cpu_sec:507
# >> stage:digest duration_sec:385.1 duration_min:6.4 cpu_sec:550

SSD and CPU test with a local file

Benchmarking only the disk and CPU stages, using an already downloaded ledger snapshot:

./catchpointdump bench -n mainnet.algorand.network -t mainnet/snap/41600000.tar

Full benchmark with JSON report and hosted snapshot

A repeatable benchmark using a Cloudflare-hosted catchpoint, with the report dumped to JSON:

./catchpointdump bench -r 41600000 -n mainnet.algorand.network -p snap.nodely.io -j report.json

Report file

Sample report.json:

{
    "report": "a193cbc7-6e6a-732b-93cf-36f0c0589864",
    "stages": [
        {
            "stage": "network",
            "duration_sec": 39,
            "cpu_time_sec": 59
        },
        {
            "stage": "database",
            "duration_sec": 795,
            "cpu_time_sec": 629
        },
        {
            "stage": "digest",
            "duration_sec": 363,
            "cpu_time_sec": 482
        }
    ],
    "host": {
        "cores": 20,
        "log_cores": 20,
        "base_mhz": 2500,
        "max_mhz": 3500,
        "cpu_name": "13th Gen Intel(R) Core(TM) i5-13500",
        "cpu_vendor": "Intel",
        "mem_mb": 64105,
        "os": "linux",
        "uuid": "c3acdb4e-3937-a9a6-2266-d80ce615ef45"
    }
}

The file can be uploaded to a third-party benchmark site, for example:

curl -X POST https://benchmarks.nodely.io/api/report -d @report.json
#{"success":true,"goto":"https://benchmarks.nodely.io/edit/a193cbc7-6e6a-732b-93cf-36f0c0589864"}

codecov bot commented Dec 16, 2024

Codecov Report

Attention: Patch coverage is 0% with 168 lines in your changes missing coverage. Please review.

Project coverage is 51.74%. Comparing base (f87ae8a) to head (9436a7c).

Files with missing lines               Patch %   Lines
cmd/catchpointdump/bench.go            0.00%     96 Missing ⚠️
cmd/catchpointdump/bench_report.go     0.00%     71 Missing ⚠️
cmd/catchpointdump/commands.go         0.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6198      +/-   ##
==========================================
- Coverage   51.85%   51.74%   -0.11%     
==========================================
  Files         639      641       +2     
  Lines       85508    85676     +168     
==========================================
- Hits        44336    44335       -1     
- Misses      38356    38524     +168     
- Partials     2816     2817       +1     


@urtho (Contributor Author) commented Dec 18, 2024

Submitting a report might be fun:

[image]

@algorandskiy (Contributor) left a comment

Good work! I left a few comments.
The PR will need an update after #6177 gets merged.

return fmt.Sprintf(">> stage:%s duration_sec:%.1f duration_min:%.1f cpu_sec:%d", bs.stage, bs.duration.Seconds(), bs.duration.Minutes(), bs.cpuTimeNS/1000000000)
}

func maybeGetTotalMemory() uint64 {

consider moving to util/util.go

benchCmd.Flags().IntVarP(&round, "round", "r", 0, "Specify the round number ( i.e. 7700000 )")
benchCmd.Flags().StringVarP(&relayAddress, "relay", "p", "", "Relay address to use ( i.e. r-ru.algorand-mainnet.network:4160 )")
benchCmd.Flags().StringVarP(&catchpointFile, "tar", "t", "", "Specify the catchpoint file (either .tar or .tar.gz) to process")
benchCmd.Flags().StringVarP(&reportJsonPath, "report", "j", "", "Specify the file to save the Json formatted report to")

Suggested change
benchCmd.Flags().StringVarP(&reportJsonPath, "report", "j", "", "Specify the file to save the Json formatted report to")
benchCmd.Flags().StringVarP(&reportJsonPath, "report", "j", "", "Specify the file to save the JSON formatted report to")

}

func GetCPU() int64 {
usage := new(syscall.Rusage)

same, move to util

addrs = []string{relayAddress}
} else {
//append relays
dnsaddrs, err := tools.ReadFromSRV(context.Background(), "algobootstrap", "tcp", networkName, "", false)

"algobootstrap" relays probably should not be used here, since they are not obliged to keep catchpoints other than the few most recent ones.
