# Continuously run stable forge tests against the latest main branch.
name: Continuous Forge Tests - Stable
# We have various Forge Stable tests here that test out different situations and workloads.
#
# Dashboard showing historical results: https://grafana.aptoslabs.com/d/bdnt45ggsg000f/forge-stable-performance?orgId=1
# Tests are named based on how they are set up; some of the common flavors are:
# * "realistic-env" - tests with "realistic-env" in their name try to make the network and hardware environment
#   more realistic. They use "wrap_with_realistic_env", which sets:
#   * MultiRegionNetworkEmulationTest, which splits nodes into 4 "regions" that have different
#     cross-region and in-region latencies and reliability rates
#   * CpuChaosTest, which tries to give nodes heterogeneous hardware by fully loading a few cores
#     on a few nodes. But this is not too helpful, as block execution time variance is minimal
#     (as we generally have a few idle cores, and real variance mostly comes from variance in CPU speed instead)
# * sweep - means running multiple tests within a single test, keeping everything the same except for one
#   thing - i.e. the thing we sweep over. There are two main dimensions we "sweep" over:
#   * load sweep - this generally uses a const TPS workload, and varies the load across the tests (i.e. 10 vs 100 vs 1000 TPS)
#   * workload sweep - this varies the transaction type being submitted, trying to test how the system behaves
#     when different parts of the system are stressed (i.e. low vs high output sizes, good vs bad gas calibration, parallel vs sequential, etc)
# * graceful - tests where we overload the system - i.e. submit more transactions than we expect the system to handle -
#   and see how it behaves. Overall e2e latency is then high, but we can test that only the validator -> block proposal stage has increased.
#   Additionally, we generally add small-TPS high-fee traffic in these tests, to confirm it is unaffected by the high load.
# * changing-working-quorum - tests where we intentionally make nodes unreachable (cut their network), bring them back,
#   and then cut the network on the next set of nodes - requiring state-sync to catch up, and consensus to work with a different set of
#   nodes being required to form consensus. During each iteration, we test that enough progress was made.
#
# Main success criteria used across the tests are:
# * throughput and expiration/rejection rate
# * latency (avg / p50 / p90 / p99)
# * latency breakdown across the components - currently within a validator alone:
#   batch -> pos -> proposal -> ordered -> committed
#   TODO: we should add the other stages - before batch and after committed - to the success criteria, to be able to track them better
# * chain progress - checking the longest pause between blocks (both in number of failed rounds and in absolute time) across the run
# * catchup - that all nodes can get past the same version at the end of the load (i.e. catching whether any individual validator got stuck)
# * no restarts - fails if any node has restarted within the test
# * system metrics - checks CPU and RAM utilization during the test
#   TODO: we should add network bandwidth and disk IOPS as system metrics checks as well
#
# You can find more details about the individual tests below.
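#
# To run a single test manually via the GitHub CLI (illustrative invocation; any TEST_NAME from the options list below works):
#   gh workflow run forge-stable.yaml -f TEST_NAME=consensus-stress-test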
permissions:
  issues: write
  pull-requests: write
  contents: read
  id-token: write
  actions: write # required for workflow cancellation via check-aptos-core
on:
  # Allow triggering manually
  workflow_dispatch:
    inputs:
      IMAGE_TAG:
        required: false
        type: string
        description: The docker image tag to test. This may be a git SHA1, or a tag like "<branch>_<git SHA1>". If not specified, Forge will find the latest build based on the git history (starting from the GIT_SHA input)
      GIT_SHA:
        required: false
        type: string
        description: The git SHA1 to checkout. This affects the Forge test runner that is used. If not specified, the latest main will be used
      TEST_NAME:
        required: true
        type: choice
        description: The specific stable test to run. If 'all', all stable tests will be run
        default: 'all'
        options:
          - all
          - framework-upgrade-test
          # Varies the load (i.e. sending 10, 100, 1000, 5000, etc. TPS), then measures
          # on-chain TPS, expiration rate, and p50/p90/p99 latency, among other things,
          # checking that we don't degrade performance at low, mid, or high loads.
          - realistic-env-load-sweep
          # Varies the workload across some basic workloads (i.e. some cheap, some expensive),
          # and checks throughput and performance across the different stages.
          - realistic-env-workload-sweep
          # Sends a ConstTps workload above what the system can handle, while additionally sending
          # non-small high-fee traffic (1000 TPS), and measures overall system performance.
          - realistic-env-graceful-overload
          # Varies the workload (opposite ends of gas calibration, high and low output sizes,
          # sequential / parallel, etc), and sends ConstTps for each above what the system can handle,
          # while sending a low TPS of high-fee transactions. Primarily confirms that high-fee traffic
          # has predictably low latency, and that the execution pipeline doesn't get backed up.
          - realistic-env-graceful-workload-sweep
          # Varies the workload (user contracts) such that max throughput goes from high to mid to low,
          # while testing that unrelated transactions paying the same gas price are able to go through.
          - realistic-env-fairness-workload-sweep
          # Tunes all configurations for the largest throughput possible (potentially sacrificing latency a bit).
          - realistic-network-tuned-for-throughput
          # Runs a small-ish load, but checks that all nodes are making progress at all times,
          # catching any unexpected unreliability/delays in consensus.
          - consensus-stress-test
          # Sends a mix of different workloads, to catch issues with different interactions of workloads.
          # This is MUCH MUCH less comprehensive than replay-verify, but the best we can do with the txn emitter at the moment.
          - workload-mix-test
          - single-vfn-perf
          - fullnode-reboot-stress-test
          - compat
          # Sends low TPS (100 TPS).
          # Cuts the network on enough nodes that all others are needed for consensus. Then brings a few back, and cuts
          # the same number of new ones - requiring all that were brought back to state-sync and continue executing.
          # Checks that in each iteration we were able to make meaningful progress.
          - changing-working-quorum-test
          # Same as changing-working-quorum-test above, just sending a bit higher load - 500 TPS.
          # TODO - we should probably increase load here significantly
          - changing-working-quorum-test-high-load
          - pfn-const-tps-realistic-env
          # Runs a production config (same as the land-blocking run) at max load (via mempool backlog), but for 2 hours,
          # to check reliability and consistency of the network.
          - realistic-env-max-load-long
      JOB_PARALLELISM:
        required: false
        type: number
        description: The number of test jobs to run in parallel. If not specified, defaults to 1
        default: 1
  # NOTE: to support testing different branches on different schedules, you need to specify the cron schedule in the 'determine-test-branch' step below as well
  # Reference: https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#schedule
  schedule:
    - cron: "0 22 * * 0,2,4" # The main branch cadence. This runs every Sun, Tue, Thu
  pull_request:
    paths:
      - ".github/workflows/forge-stable.yaml"
      - "testsuite/find_latest_image.py"
concurrency:
  group: forge-stable-${{ format('{0}-{1}-{2}-{3}', github.ref_name, inputs.GIT_SHA, inputs.IMAGE_TAG, inputs.TEST_NAME) }}
  cancel-in-progress: true
env:
  AWS_ACCOUNT_NUM: ${{ secrets.ENV_ECR_AWS_ACCOUNT_NUM }}
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  IMAGE_TAG: ${{ inputs.IMAGE_TAG }} # this is only used for workflow_dispatch, otherwise defaults to empty
  AWS_REGION: us-west-2
jobs:
  generate-matrix:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
      imageVariants: ${{ steps.set-matrix.outputs.imageVariants }}
    steps:
      - name: Compute matrix
        id: set-matrix
        uses: actions/github-script@v7
        env:
          TEST_NAME: ${{ inputs.TEST_NAME }}
        with:
          result-encoding: string
          script: |
            const testName = process.env.TEST_NAME || 'all';
            console.log(`Running job: ${testName}`);
            const tests = [
              { TEST_NAME: 'framework-upgrade-test', FORGE_RUNNER_DURATION_SECS: 7200, FORGE_TEST_SUITE: 'framework_upgrade' },
              { TEST_NAME: 'realistic-env-load-sweep', FORGE_RUNNER_DURATION_SECS: 1800, FORGE_TEST_SUITE: 'realistic_env_load_sweep' },
              { TEST_NAME: 'realistic-env-workload-sweep', FORGE_RUNNER_DURATION_SECS: 2000, FORGE_TEST_SUITE: 'realistic_env_workload_sweep' },
              { TEST_NAME: 'realistic-env-graceful-overload', FORGE_RUNNER_DURATION_SECS: 1200, FORGE_TEST_SUITE: 'realistic_env_graceful_overload' },
              { TEST_NAME: 'realistic-env-graceful-workload-sweep', FORGE_RUNNER_DURATION_SECS: 2100, FORGE_TEST_SUITE: 'realistic_env_graceful_workload_sweep' },
              { TEST_NAME: 'realistic-env-fairness-workload-sweep', FORGE_RUNNER_DURATION_SECS: 900, FORGE_TEST_SUITE: 'realistic_env_fairness_workload_sweep' },
              { TEST_NAME: 'realistic-network-tuned-for-throughput', FORGE_RUNNER_DURATION_SECS: 900, FORGE_TEST_SUITE: 'realistic_network_tuned_for_throughput', FORGE_ENABLE_PERFORMANCE: true },
              { TEST_NAME: 'consensus-stress-test', FORGE_RUNNER_DURATION_SECS: 2400, FORGE_TEST_SUITE: 'consensus_stress_test' },
              { TEST_NAME: 'workload-mix-test', FORGE_RUNNER_DURATION_SECS: 900, FORGE_TEST_SUITE: 'workload_mix' },
              { TEST_NAME: 'single-vfn-perf', FORGE_RUNNER_DURATION_SECS: 480, FORGE_TEST_SUITE: 'single_vfn_perf' },
              { TEST_NAME: 'fullnode-reboot-stress-test', FORGE_RUNNER_DURATION_SECS: 1800, FORGE_TEST_SUITE: 'fullnode_reboot_stress_test' },
              { TEST_NAME: 'compat', FORGE_RUNNER_DURATION_SECS: 300, FORGE_TEST_SUITE: 'compat' },
              { TEST_NAME: 'changing-working-quorum-test', FORGE_RUNNER_DURATION_SECS: 1200, FORGE_TEST_SUITE: 'changing_working_quorum_test', FORGE_ENABLE_FAILPOINTS: true },
              { TEST_NAME: 'changing-working-quorum-test-high-load', FORGE_RUNNER_DURATION_SECS: 900, FORGE_TEST_SUITE: 'changing_working_quorum_test_high_load', FORGE_ENABLE_FAILPOINTS: true },
              { TEST_NAME: 'pfn-const-tps-realistic-env', FORGE_RUNNER_DURATION_SECS: 900, FORGE_TEST_SUITE: 'pfn_const_tps_with_realistic_env' },
              { TEST_NAME: 'realistic-env-max-load-long', FORGE_RUNNER_DURATION_SECS: 7200, FORGE_TEST_SUITE: 'realistic_env_max_load_large' }
            ];
            const matrix = testName !== 'all' ? tests.filter(test => test.TEST_NAME === testName) : tests;
            core.debug(`Matrix: ${JSON.stringify(matrix)}`);
            core.summary.addHeading('Forge Stable Run');
            const testsToRunNames = matrix.map(test => `${test.TEST_NAME} (Needs Failpoint: ${test.FORGE_ENABLE_FAILPOINTS || false}, Needs Performance: ${test.FORGE_ENABLE_PERFORMANCE || false})`);
            core.summary.addRaw("The following tests will be run:", true);
            core.summary.addList(testsToRunNames);
            await core.summary.write();
            const needsFailpoints = matrix.some(test => test.FORGE_ENABLE_FAILPOINTS);
            const needsPerformance = matrix.some(test => test.FORGE_ENABLE_PERFORMANCE);
            let requiredImageVariants = [];
            if (needsFailpoints) {
              requiredImageVariants.push('failpoints');
            }
            if (needsPerformance) {
              requiredImageVariants.push('performance');
            }
            core.setOutput('matrix', JSON.stringify({ include: matrix }));
            core.setOutput('imageVariants', requiredImageVariants.join(' '));
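            // For example, TEST_NAME='compat' produces (illustrative):
            //   matrix        -> {"include":[{"TEST_NAME":"compat","FORGE_RUNNER_DURATION_SECS":300,"FORGE_TEST_SUITE":"compat"}]}
            //   imageVariants -> "" (compat needs neither the failpoints nor the performance image variant)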
  # This job determines the image tag and branch to test, and passes them to the other jobs
  # NOTE: this may be better as a separate workflow, as the logic is quite complex but generalizable
  determine-test-metadata:
    runs-on: ubuntu-latest
    needs: ["generate-matrix"]
    outputs:
      IMAGE_TAG: ${{ steps.get-docker-image-tag.outputs.IMAGE_TAG }}
      IMAGE_TAG_FOR_COMPAT_TEST: ${{ steps.get-last-released-image-tag-for-compat-test.outputs.IMAGE_TAG }}
      BRANCH: ${{ steps.determine-test-branch.outputs.BRANCH }}
      BRANCH_HASH: ${{ steps.hash-branch.outputs.BRANCH_HASH }}
    steps:
      - uses: actions/checkout@v4
      - name: Determine branch based on cadence
        id: determine-test-branch
        # NOTE: the schedule cron MUST match the one in the 'on.schedule.cron' section above
        run: |
          if [[ "${{ github.event_name }}" == "schedule" ]]; then
            if [[ "${{ github.event.schedule }}" == "0 22 * * 0,2,4" ]]; then
              echo "Branch: main"
              echo "BRANCH=main" >> $GITHUB_OUTPUT
            else
              echo "Unknown schedule: ${{ github.event.schedule }}"
              exit 1
            fi
          elif [[ "${{ github.event_name }}" == "push" ]]; then
            echo "Branch: ${{ github.ref_name }}"
            echo "BRANCH=${{ github.ref_name }}" >> $GITHUB_OUTPUT
          # on workflow_dispatch, this will simply use the given inputs.GIT_SHA (or the default)
          elif [[ -n "${{ inputs.GIT_SHA }}" ]]; then
            echo "BRANCH=${{ inputs.GIT_SHA }}" >> $GITHUB_OUTPUT
          # if GIT_SHA is not provided, use the branch the workflow runs on
          else
            echo "BRANCH=${{ github.head_ref }}" >> $GITHUB_OUTPUT
          fi
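          # Net effect of the branches above: schedule -> main; push -> github.ref_name;
          # workflow_dispatch with GIT_SHA -> that SHA; otherwise -> github.head_ref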
      # Use the branch hash instead of the full branch name to stay under the kubernetes namespace length limit
      - name: Hash the branch
        id: hash-branch
        run: |
          # Hash the branch name
          echo "BRANCH_HASH=$(echo -n "${{ steps.determine-test-branch.outputs.BRANCH }}" | sha256sum | cut -c1-10)" >> $GITHUB_OUTPUT
      - uses: aptos-labs/aptos-core/.github/actions/check-aptos-core@main
        with:
          cancel-workflow: ${{ github.event_name == 'schedule' }} # Cancel the workflow if it is scheduled on a fork
      # actions/get-latest-docker-image-tag requires docker utilities and authentication to internal docker image registries
      - uses: aptos-labs/aptos-core/.github/actions/docker-setup@main
        id: docker-setup
        with:
          GCP_WORKLOAD_IDENTITY_PROVIDER: ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }}
          GCP_SERVICE_ACCOUNT_EMAIL: ${{ secrets.GCP_SERVICE_ACCOUNT_EMAIL }}
          EXPORT_GCP_PROJECT_VARIABLES: "false"
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_DOCKER_ARTIFACT_REPO: ${{ secrets.AWS_DOCKER_ARTIFACT_REPO }}
          GIT_CREDENTIALS: ${{ secrets.GIT_CREDENTIALS }}
      - uses: aptos-labs/aptos-core/.github/actions/get-latest-docker-image-tag@main
        id: get-docker-image-tag
        with:
          branch: ${{ steps.determine-test-branch.outputs.BRANCH }}
          variants: ${{ needs.generate-matrix.outputs.imageVariants }}
      - uses: aptos-labs/aptos-core/.github/actions/determine-or-use-target-branch-and-get-last-released-image@main
        id: get-last-released-image-tag-for-compat-test
        with:
          base-branch: ${{ steps.determine-test-branch.outputs.BRANCH }}
          variants: ${{ needs.generate-matrix.outputs.imageVariants }}
      - name: Write summary
        run: |
          IMAGE_TAG=${{ steps.get-docker-image-tag.outputs.IMAGE_TAG }}
          IMAGE_TAG_FOR_COMPAT_TEST=${{ steps.get-last-released-image-tag-for-compat-test.outputs.IMAGE_TAG }}
          TARGET_BRANCH_TO_FETCH_IMAGE_FOR_COMPAT_TEST=${{ steps.get-last-released-image-tag-for-compat-test.outputs.TARGET_BRANCH }}
          BRANCH=${{ steps.determine-test-branch.outputs.BRANCH }}
          if [ -n "${BRANCH}" ]; then
            echo "BRANCH: [${BRANCH}](https://github.com/${{ github.repository }}/tree/${BRANCH})" >> $GITHUB_STEP_SUMMARY
          fi
          echo "IMAGE_TAG: [${IMAGE_TAG}](https://github.com/${{ github.repository }}/commit/${IMAGE_TAG})" >> $GITHUB_STEP_SUMMARY
echo "To cancel this job, do `pnpm infra ci cancel-workflow ${{ github.run_id }}` from internal-ops" >> $GITHUB_STEP_SUMMARY
  run:
    name: forge-${{ matrix.TEST_NAME }}
    needs: [determine-test-metadata, generate-matrix]
    if: ${{ github.event_name != 'pull_request' }}
    strategy:
      fail-fast: false
      max-parallel: ${{ inputs.JOB_PARALLELISM && fromJson(inputs.JOB_PARALLELISM) || 1 }}
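      # The '&&'/'||' chain above acts as a ternary: when JOB_PARALLELISM is unset (e.g. on scheduled runs), max-parallel falls back to 1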
      matrix: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}
    uses: aptos-labs/aptos-core/.github/workflows/workflow-run-forge.yaml@main
    secrets: inherit
    with:
      IMAGE_TAG: ${{ ((matrix.FORGE_TEST_SUITE == 'compat' && needs.determine-test-metadata.outputs.IMAGE_TAG_FOR_COMPAT_TEST) || needs.determine-test-metadata.outputs.IMAGE_TAG) }}
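      # Same ternary pattern: the compat suite runs against the last released image; all other suites use the freshly built IMAGE_TAG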
      FORGE_NAMESPACE: forge-${{ matrix.TEST_NAME }}-${{ needs.determine-test-metadata.outputs.BRANCH_HASH }}
      FORGE_TEST_SUITE: ${{ matrix.FORGE_TEST_SUITE }}
      FORGE_RUNNER_DURATION_SECS: ${{ matrix.FORGE_RUNNER_DURATION_SECS }}
      FORGE_ENABLE_PERFORMANCE: ${{ matrix.FORGE_ENABLE_PERFORMANCE || false }}
      FORGE_ENABLE_FAILPOINTS: ${{ matrix.FORGE_ENABLE_FAILPOINTS || false }}
      POST_TO_SLACK: true
      SEND_RESULTS_TO_TRUNK: true