Performance testing next steps #1093

swissspidy · 2024-03-26T21:51:41Z

From the below, top of mind are currently:

- Lack of care around the dashboard (no alerts, people not checking it)
- Outdated test suite (still testing TT3, not testing TT5, baseline is 6.1)
- Stability
- Project ownership

Improve core performance tests

Core
- Unclear ownership / maintenance
- Measure interactivity metrics in performance tests
- Measure layout stability metrics in performance tests
- Add tests for Twenty Twenty-Five
- Test more combinations (e.g. Multisite, object cache enabled)
  - Expand performance tests setup wordpress-develop#6515
- Outdated baseline (6.1)
Gutenberg
- Measure more frontend metrics
Tests quality and stability
- Alignment between core and Gutenberg
  - Core relies on GitHub Actions, while GB uses a sophisticated node script for running tests
  - GB regressions are often only found after merging into trunk (too late)
- Improve demo content
  - Consider covering specific realistic use cases (e.g., homepage, page with large LCP image, etc)
  - Relevant for testing interactivity and layout stability
- Performance Metrics Stabilization #849
  - Use WordPress Playground for more stability
    - Just for context, this was something that @dmsnell brought up, most recently in this Slack message
    - We could simulate different I/O latencies and intercept every DB and file operation
    - wp-now has some design problems that make it less favorable for testing (needs some refactoring first?)
Dashboard
- Current custom dashboard is limited
- Maybe build a more powerful dashboard using Grafana or similar
- Lack of alerts for significant changes or broken workflow

Adoption

- Run performance tests at scale using Tide?
- More outreach (1:1, developer blog, etc.)
- Directly reach out to a few bigger plugins to help them set up performance testing, setting a precedent for others
- Consider moving the GitHub Action to the WordPress GitHub org to make it more official

Typical reasons why plugin developers do not adopt similar performance tests:

- They use a different testing setup (Codeception, Cypress, Puppeteer, k6) or CI (Bitbucket, GitLab)
- Lack of test setup in general (e.g. not even unit tests in place)
- Focus is more on backend performance rather than frontend performance
- Lack of resources for implementing additional kinds of testing
- No testing strategy, e.g. even if they had the whole setup, they wouldn't know where to start
- Metrics stability concerns
  - Makes it especially difficult to tie a metrics change to a specific code change
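
To make the stability concern concrete, here is a minimal sketch (not taken from any of the existing workflows) of comparing repeated measurements for a base and a head commit. It uses the median so that a single noisy run does not dominate, and only flags a change above a noise threshold. The function names, the sample values, and the 5% threshold are all assumptions for illustration.

```python
from statistics import median


def relative_change(base: list[float], head: list[float]) -> float:
    """Relative change of the head median compared to the base median."""
    base_median = median(base)
    return (median(head) - base_median) / base_median


def is_regression(base: list[float], head: list[float], threshold: float = 0.05) -> bool:
    """Flag a regression only if the median got slower by more than `threshold`."""
    return relative_change(base, head) > threshold


# Hypothetical timings in milliseconds; 130.0 is an outlier the median ignores.
base = [100.0, 102.0, 98.0, 101.0, 130.0]
head = [109.0, 111.0, 110.0, 108.0, 112.0]

print(f"{relative_change(base, head):.1%}", is_regression(base, head))
```

The noisier the environment, the higher the threshold has to be, which is exactly why small regressions are hard to attribute to a single change.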

joemcgill · 2024-03-29T20:50:26Z

Thanks for kicking this discussion off, @swissspidy.

I appreciate the distinction between two main objectives. For now, I'm going to limit my thoughts to the first objective, "Improve Core/GB performance testing".

One of the things that I observed during the 6.5 release is that finding the source of a server timing regression was challenging when that regression was committed to the Core repo as part of a larger Gutenberg sync. I think we can improve this somewhat by updating the performance tests in the Gutenberg repo to include the same server timing metrics that we record for each commit to the WP Core repo (i.e., wpTotal, wpBeforeTemplate, and wpTemplate). While TTFB is a close proxy, the additional noise from network requests and from calculating the metric in a headless browser makes pinpointing potential regressions more difficult.
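
As a rough illustration of what collecting these metrics involves, the sketch below parses a `Server-Timing` response header into a name-to-duration map. It is not code from either repo. The kebab-case metric names in the sample header are an assumption about how wpTotal, wpBeforeTemplate, and wpTemplate appear on the wire, and the naive comma split would break on a quoted description that contains a comma.

```python
def parse_server_timing(header: str) -> dict[str, float]:
    """Parse a Server-Timing header value into {metric name: duration in ms}."""
    metrics: dict[str, float] = {}
    for entry in header.split(","):
        parts = [part.strip() for part in entry.split(";")]
        name = parts[0]
        if not name:
            continue
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() != "dur":
                continue
            try:
                metrics[name] = float(value.strip().strip('"'))
            except ValueError:
                pass  # Ignore malformed durations instead of failing the run.
    return metrics


# Hypothetical header value; metric names and numbers are made up.
header = "wp-before-template;dur=42.1, wp-template;dur=58.3, wp-total;dur=100.4"
timings = parse_server_timing(header)
print(timings)
```

Because these values are measured on the server, they avoid the network and headless-browser noise that affects TTFB.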

I also strongly agree with your suggestion to improve our demo content, and think this applies as much to the Core tests as to the Gutenberg tests. Currently in Core, we are only testing the default homepage for the Twenty Twenty-One and Twenty Twenty-Three themes after importing the Theme Test Data from this commit. While this keeps test content consistent over time, the approach has a number of limitations, including the fact that some specific use cases that we care about (e.g., measuring the effect of image optimizations on LCP) are not covered by our current test content.

From a visualization point of view, our current dashboards at codevitals.run have become harder to use over time as we've added additional metrics. I'd love to investigate improving or replacing these dashboards with a system where we could more granularly filter results by metric, theme, template, object cache, etc. In doing so, we should also evaluate how we normalize and store the raw data: it currently gets normalized before it's saved to the dashboard's database, which means we can't build new reports from the original unfiltered data.
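
To sketch what "keeping the raw data" could look like: store every individual sample with its dimensions, and aggregate only at query time. The `Sample` record, its field names, and the example values below are hypothetical and do not describe the current codevitals.run schema.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Sample:
    """One raw, un-normalized measurement (hypothetical schema)."""
    commit: str
    metric: str
    theme: str
    object_cache: bool
    value_ms: float


def median_by_commit(samples, *, metric, theme, object_cache):
    """Filter raw samples by dimension, then aggregate per commit at query time."""
    grouped: dict[str, list[float]] = {}
    for sample in samples:
        if (sample.metric, sample.theme, sample.object_cache) == (metric, theme, object_cache):
            grouped.setdefault(sample.commit, []).append(sample.value_ms)
    return {commit: median(values) for commit, values in grouped.items()}


samples = [
    Sample("abc123", "wpTotal", "twentytwentyfour", False, 101.0),
    Sample("abc123", "wpTotal", "twentytwentyfour", False, 99.0),
    Sample("abc123", "wpTotal", "twentytwentyfour", False, 140.0),
    Sample("abc123", "wpTotal", "twentytwentyone", False, 80.0),
    Sample("def456", "wpTotal", "twentytwentyfour", False, 110.0),
]
report = median_by_commit(
    samples, metric="wpTotal", theme="twentytwentyfour", object_cache=False
)
print(report)
```

With the raw samples retained, a different aggregation (mean, percentiles, another baseline) can be added later without re-running the tests.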

One last idea for now: when we introduced these tests, we used Twenty Twenty-One and Twenty Twenty-Three as representative classic and block themes, but that has proven to be overly simplistic, as some performance regressions are only visible depending on a theme's characteristics (e.g., how many template part variations it registers). At a minimum, we should add Twenty Twenty-Four to our test matrix in both repos.

TL;DR:

- Add server timing metrics to the GB performance workflow
- Improve demo content for the performance workflow in both repos
- Add specific use cases (e.g., homepage, page with large LCP image, etc) to our workflows for both repos
- Improve the dashboard for tracking performance over time so it's more user friendly
- Improve data collection process to make raw data queryable in the future
- Add tests for Twenty Twenty-Four

swissspidy · 2024-04-03T14:15:59Z

That all makes sense to me. I think most of those suggestions are already mentioned in some place or another. I also previously explored a Grafana-based dashboard that would be more user friendly and could be fed raw data.

Some more thoughts on adoption:

Let's start with directly reaching out to some top plugins that we think could benefit from performance testing but don't leverage that yet. Find out why, figure out what's missing, and help them get started.
This way, we can iterate on the tooling before publishing another blog post or improving the GitHub Action.

swissspidy · 2024-07-01T09:53:09Z

Improve Core/GB performance testing

Mentioned above:

Improve demo content
- Needs an owner
- Priority?
Add specific use cases (e.g., homepage, page with large LCP image, etc)
- Also relevant for things like interactivity and layout stability metrics
- Needs some fleshing out
- Priority?
Better dashboard with raw data
- Needs an owner
- Priority?
Add tests for Twenty Twenty-Four
- This was added in one of the recent refactorings. 2021, 2023, and 2024 are now tested.
Alignment between core and Gutenberg
- Core relies on GitHub Actions, while GB uses a sophisticated node script for running tests.
- Should we have 1 script for both that could also be used by the ecosystem via npm or similar? Or is a GitHub Action good enough?
Use WordPress Playground for more stability
- Just for context, this was something that @dmsnell brought up, most recently in this Slack message
- We could simulate different I/O latencies and intercept every DB and file operation
- wp-now has some design problems that make it less favorable for testing (needs some refactoring first?)

More thoughts:

Test more combinations (e.g. Multisite, object cache enabled)
- Expand performance tests setup wordpress-develop#6515

Adoption

Let's start with directly reaching out to some top plugins that we think could benefit from performance testing but don't leverage that yet. Find out why, figure out what's missing, and help them get started.

The main points I've heard so far in various discussions:

- They use a different testing setup (Codeception, Cypress, Puppeteer, k6) or CI (Bitbucket, GitLab)
- Lack of test setup in general (e.g. not even unit tests in place)
- Focus is more on backend performance rather than frontend performance
- Lack of resources for implementing additional kinds of testing
- No testing strategy, e.g. even if they had the whole setup, they wouldn't know where to start
- Metrics stability concerns
  - Makes it especially difficult to tie a metrics change to a specific code change

Mentioned above:

GitHub Action
- Great, but lack of setup step for plugins
  - Add blueprint support so that plugins can easily prepare their testing environment
    - Blueprint support swissspidy/wp-performance-action#71
    - @adamziel Curious to hear your thoughts on that one
  - Concern: wp-now appears to be unsuitable for testing like this :( (see above)
- Once this is in place we can think about further blog posts
All-in-one npm package for running tests & doing comparisons
- Does it make sense? Usually not advisable to run tests on your own machine vs. a dedicated one

adamziel · 2024-07-01T22:36:00Z

I have a small backlog but I just added this to our board to make sure this gets a follow-up.

joemcgill · 2024-07-03T19:50:38Z

Thanks for the updates, @swissspidy. This looks like a good list to me. I do think that separating the Core/GB work and the Adoption work into two separate epics likely makes sense, and will make it easier to track progress of the two initiatives, even if there is some overlap.

From my perspective, improving and aligning the metrics that are taken, and the way the dashboards display them, between the WP and GB repos should be a high priority, so that we can more quickly identify the likely cause of performance regressions from server timing data earlier in feature development in the GB repo. We also need to fix some existing issues with the WP Performance workflows that have caused big jumps to show up in the dashboard due to the way the baseline checks were taken, and add metrics for the TT4 theme to our WP dashboard view.

adamziel · 2024-07-16T16:59:15Z

> Great, but lack of setup step for plugins
> Add blueprint support so that plugins can easily prepare their testing environment
> swissspidy/wp-performance-action#71
> @adamziel Curious to hear your thoughts on that one

@swissspidy do you mean preparing a generic GitHub action that plugin authors could use to get a performance testing link on their PRs? If yes, that sounds fantastic. There's a related Playground issue and here's some inspiration to reuse:

If, on the other hand, that's about running E2E tests locally or in CI, @WunderBart's explorations here are highly relevant:

If I misunderstood, would you share some more details about that question?

swissspidy · 2024-07-16T17:17:02Z

@adamziel The idea is to provide a generic GitHub Action (https://github.com/swissspidy/wp-performance-action) that automatically runs e2e performance tests in CI. It already works (using wp-env) but blueprint support would be a killer feature. This way, projects can declaratively add test data/fixtures.

@dmsnell originally suggested using Playground as we could then have full I/O and DB control, which would hopefully help with getting stable results and would be a nice side effect.

WordPress/gutenberg#62692 comes close, it's just not stable enough :-)
Previously I used wp-now to avoid building Playground myself, but using Playground directly of course also works.

Regarding wp-now, Dennis mentioned some architectural issues that "make it less favorable for testing"

I also saw https://github.com/adamziel/wordpress-php-blueprints/ but not exactly sure what it does 🤔 Could I leverage this to use blueprints with an arbitrary setup? This way, I could already use blueprints while Playground is still being improved upon.

swissspidy · 2024-07-19T09:05:24Z

@joemcgill Extracted core improvements to #1380 now, we can keep this one for the adoption part

swissspidy · 2024-07-22T15:00:49Z

Here's a first successful attempt at using Playground for the testing environment: swissspidy/wp-performance-action#173

adamziel · 2024-09-11T11:43:53Z

> @adamziel The idea is to provide a generic GitHub Action (swissspidy/wp-performance-action) that automatically runs e2e performance tests in CI. It already works (using wp-env) but blueprint support would be a killer feature. This way, projects can declaratively add test data/fixtures.

I saw you got Blueprints to work in swissspidy/wp-performance-action#173 via @wp-playground/cli, yay! 🎉 Are they providing the value you were counting on? Or could they be improved in some ways?

> @dmsnell originally suggested using Playground as we could then have full I/O and DB control, which would hopefully help with getting stable results and would be a nice side effect.

Yes! This control enables explorations like HTTPS support, and with more work there could be knobs like "throttle network", "throttle DB", "log the data accessed by the plugin to provide an iOS-like permissions modal" etc.

> WordPress/gutenberg#62692 comes close, it's just not stable enough :-)

I know @WunderBart keeps exploring this. A perfect conclusion would be a browser extension to record and replay test runs in Playground :)

> I also saw adamziel/wordpress-php-blueprints but not exactly sure what it does 🤔 Could I leverage this to use blueprints with an arbitrary setup? This way, I could already use blueprints while Playground is still being improved upon.

That is an old repo where I initially developed https://github.com/WordPress/blueprints-library/. The intention is to eventually run Blueprints via a PHP library that would run on any webhost where WordPress can run. I've put that work on hold as there were too many unanswered questions, e.g. how is a Blueprint different from a runtime configuration? Now that Playground is about to support multiple sites and needs its own runtime configuration format, we're in a good place to start looking for answers. I expect parts of the PHP library to start landing in Playground relatively soon, but getting to a full-featured PHP CLI tool might still take a while.

swissspidy · 2024-09-11T18:27:35Z

> Are they providing the value you were counting on? Or could they be improved in some ways?

Once I got everything to work, they worked well; there were only occasional quirks like timeouts. The format is perhaps not the most intuitive, and it lacks things like a step to copy an entire directory. I also had to figure out how to manually merge two blueprints together, maybe that could be built in :-)
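
For illustration, a manual merge of two blueprints could look roughly like the sketch below. It is only one possible approach under simple assumptions and not how Playground or the action handles it: `steps` arrays are concatenated in order, and for every other top-level key the second blueprint wins. The `merge_blueprints` helper and the example blueprints are hypothetical.

```python
import copy


def merge_blueprints(base: dict, extra: dict) -> dict:
    """Merge two blueprint dicts: append `steps`, let `extra` override other keys."""
    merged = copy.deepcopy(base)
    for key, value in extra.items():
        if key != "steps":
            merged[key] = copy.deepcopy(value)
    steps = list(base.get("steps", [])) + list(extra.get("steps", []))
    if steps:
        merged["steps"] = copy.deepcopy(steps)
    return merged


# Hypothetical inputs: a default blueprint plus a project-specific one.
default_blueprint = {
    "landingPage": "/",
    "steps": [{"step": "login", "username": "admin", "password": "password"}],
}
project_blueprint = {
    "landingPage": "/sample-page/",
    "steps": [{"step": "activatePlugin", "pluginPath": "my-plugin/my-plugin.php"}],
}

merged = merge_blueprints(default_blueprint, project_blueprint)
print([step["step"] for step in merged["steps"]], merged["landingPage"])
```

A built-in merge would still need to decide how to handle conflicting keys and steps that depend on each other, which this sketch does not attempt.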

> A perfect conclusion would be a browser extension to record and replay test runs in Playground :)

Yeah, something like that would be nice to get started with (performance) tests. https://playwright.dev/docs/codegen comes close, but that spits out JS code and not something more declarative. I don't really wanna execute arbitrary user-provided code :-) A blueprint-like format would be easier to validate. Related: swissspidy/wp-performance-action#174
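
To show why a declarative format is easier to validate than generated JS, here is a minimal sketch of an allow-list validator. The step format (`action` plus a fixed set of string fields) is entirely hypothetical; it is not an existing Playwright, Playground, or wp-performance-action format.

```python
# Hypothetical allow-list: each action and the exact fields it requires.
ALLOWED_ACTIONS = {
    "visit": {"url"},
    "click": {"selector"},
    "fill": {"selector", "value"},
}


def validate_scenario(steps: list) -> list[str]:
    """Return a list of error messages; an empty list means the scenario is valid."""
    errors: list[str] = []
    for index, step in enumerate(steps):
        if not isinstance(step, dict):
            errors.append(f"step {index}: must be an object")
            continue
        action = step.get("action")
        if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
            errors.append(f"step {index}: unknown action {action!r}")
            continue
        fields = set(step) - {"action"}
        required = ALLOWED_ACTIONS[action]
        if fields != required:
            errors.append(f"step {index}: expected fields {sorted(required)}, got {sorted(fields)}")
        elif not all(isinstance(step[name], str) for name in required):
            errors.append(f"step {index}: all fields must be strings")
    return errors


valid = [
    {"action": "visit", "url": "/"},
    {"action": "fill", "selector": "#search", "value": "hello"},
    {"action": "click", "selector": "button[type=submit]"},
]
invalid = [{"action": "eval", "code": "process.exit(1)"}]

print(validate_scenario(valid), validate_scenario(invalid))
```

Anything outside the allow-list is rejected before a test runner sees it, which is not practical with arbitrary generated code.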

adamziel · 2024-09-16T19:48:38Z

> I also had to figure out how to manually merge two blueprints together, maybe that could be built in :-)

I would love that! :) I just replied in WordPress/wordpress-playground#1631

> A blueprint-like format would be easier to validate

💡 interesting! I wonder what formats like that already exist specifically for E2E tests 🤔

swissspidy · 2024-09-24T10:16:11Z

> > A blueprint-like format would be easier to validate
>
> 💡 interesting! I wonder what formats like that already exist specifically for E2E tests 🤔

So far I only found https://www.npmjs.com/package/declarative-e2e-test

brandonpayton · 2024-09-30T12:38:20Z

@swissspidy, is there anything more to do here from the perspective of WordPress Playground? We are tracking this issue on our project board and want to be sure you get what you need.

swissspidy · 2024-09-30T13:30:38Z

@brandonpayton Thanks for staying on top of things, appreciate it!

I tried running multiple blueprints but couldn't get that to work, see WordPress/wordpress-playground#1631 (reply in thread)

I have a workaround, so it's not urgent, but it would be nice if that worked reliably.

Apart from that I don't think there's a concrete urgent missing piece.

swissspidy · 2024-12-12T17:06:46Z

Split some quick wins into a new issue: #1735

swissspidy added the [Focus] Measurement and Needs Discussion labels Mar 26, 2024

sstopfer removed the [Focus] Measurement label Jun 9, 2024

joemcgill added this to WP Performance 2024 Jun 26, 2024

joemcgill moved this to Definition ✏️ in WP Performance 2024 Jun 26, 2024

joemcgill added the [Issue] Overview label Jun 26, 2024

adamziel added this to Playground Board Jul 1, 2024

adamziel moved this to Needs Triage/Our Reply in Playground Board Jul 1, 2024

sstopfer assigned swissspidy Jul 16, 2024

adamziel moved this from Needs Triage/Our Reply to Needs Author's Reply in Playground Board Jul 16, 2024

adamziel moved this from Needs Author's Reply to Inbox in Playground Board Jul 16, 2024

swissspidy mentioned this issue Jul 19, 2024

Core performance tests improvements #1380

Open

5 tasks

swissspidy changed the title ~~Performance testing next steps~~ Performance testing next steps (adoption) Jul 19, 2024

bgrgicak moved this from Inbox to Needs Triage/Our Reply in Playground Board Jul 22, 2024

adamziel moved this from Needs Triage/Our Reply to Needs Author's Reply in Playground Board Sep 11, 2024

adamziel moved this from Needs Author's Reply to Inbox in Playground Board Sep 11, 2024

adamziel moved this from Inbox to Needs Author's Reply in Playground Board Sep 16, 2024

adamziel moved this from Needs Author's Reply to Inbox in Playground Board Sep 24, 2024

brandonpayton moved this from Inbox to Needs Author's Reply in Playground Board Sep 30, 2024

adamziel moved this from Needs Author's Reply to Inbox in Playground Board Sep 30, 2024

bgrgicak moved this from Inbox to Needs Triage/Our Reply in Playground Board Oct 8, 2024

swissspidy removed their assignment Dec 5, 2024

swissspidy changed the title ~~Performance testing next steps (adoption)~~ Performance testing next steps Dec 5, 2024

swissspidy mentioned this issue Dec 12, 2024

Core performance tests improvements (quick wins) #1735

Open

9 tasks
