
Fact Table Query Optimization #1923

Merged: 23 commits into main from fact-table-optimization on Jan 13, 2024

Conversation

@jdorn (Member) commented Dec 6, 2023

Summary

If multiple metrics from the same Fact Table are added to an experiment, combine them into a single SQL query for increased performance.


Instead of the query returning columns for a single metric (e.g. main_sum, main_sum_squares), it will return multiple sets of columns with a separate prefix for each metric (e.g. m0_main_sum, m0_main_sum_squares, m1_main_sum, etc.). In addition, each prefix will also have an id column so we can identify which fact metric it belongs to.

Before calling the stats engine with the query result, we split it back into multiple metrics. This avoids needing to update the Python code.
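
For illustration, here is a minimal sketch of that splitting step, assuming the m0_/m1_ prefix scheme and a per-prefix id column described above. The type and function names are hypothetical, not the actual stats.ts code:

```ts
// Hypothetical shape of one row of the combined (wide) query result.
type WideRow = Record<string, string | number>;

// Fan a wide row with prefixed columns (m0_, m1_, ...) back out into one
// row per metric, keyed by the fact metric id carried in the <prefix>_id column.
function splitRowByMetric(row: WideRow): Map<string, Record<string, string | number>> {
  const byMetric = new Map<string, Record<string, string | number>>();
  const prefixed = /^m(\d+)_(.+)$/;

  // First pass: find the fact metric id for each prefix (e.g. m0 -> "fact__abc123")
  const idForPrefix = new Map<string, string>();
  for (const [col, value] of Object.entries(row)) {
    const match = col.match(prefixed);
    if (match && match[2] === "id") {
      idForPrefix.set(`m${match[1]}`, String(value));
    }
  }

  // Second pass: copy each prefixed column onto its metric's row,
  // dropping the prefix so downstream code still sees main_sum, etc.
  for (const [col, value] of Object.entries(row)) {
    const match = col.match(prefixed);
    if (!match || match[2] === "id") continue;
    const metricId = idForPrefix.get(`m${match[1]}`);
    if (!metricId) continue;
    const target = byMetric.get(metricId) ?? {};
    target[match[2]] = value;
    byMetric.set(metricId, target);
  }

  return byMetric;
}
```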

Current State: Happy path works end-to-end. Need lots of testing, review, and error handling.

Changes

  • New Integration method: getExperimentFactMetricsQuery. Very similar to the existing experiment metric query method, but it accepts an array of fact metrics and returns a wide table with prefixed columns for each metric
  • Update ExperimentResultsQueryRunner to group related fact metrics together and call the new Integration method when applicable. Respects a MAX_METRICS_PER_QUERY setting (see the grouping sketch after this list)
  • Update stats.ts to break grouped results back into multiple separate metrics before calling the stats engine
  • Improve the View Queries modal: when it sees a Fact Metric id in the results, it shows a badge with the metric name instead of the opaque id
  • Add a data source property to indicate that MySQL has an inefficient percentile calculation
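
As a rough illustration of the grouping step, here is a minimal sketch assuming a simplified metric shape. The interface and function names are hypothetical, not GrowthBook's actual types:

```ts
// Field names here are assumptions for illustration; the real fact metric
// objects carry many more fields.
interface FactMetricLike {
  id: string;
  factTableId: string;
}

// Group metrics that read from the same fact table so each group can share
// one combined query; metrics from other sources keep using the existing
// single-metric query path.
function groupByFactTable(metrics: FactMetricLike[]): Map<string, FactMetricLike[]> {
  const groups = new Map<string, FactMetricLike[]>();
  for (const metric of metrics) {
    const group = groups.get(metric.factTableId) ?? [];
    group.push(metric);
    groups.set(metric.factTableId, group);
  }
  return groups;
}
```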

TODO:

Research

Some engines have strict limits on the number of columns in a query result, some as low as 1,000. A single metric may need up to 10 columns (regression adjustment, denominator, capping, etc.), so we need to chunk metrics to limit how many are included in a single query.
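
A minimal sketch of that chunking, assuming a 1,000-column engine limit, up to 10 columns per metric, and an arbitrary MAX_METRICS_PER_QUERY value. All constants here are illustrative, not the values used in this PR:

```ts
const ENGINE_COLUMN_LIMIT = 1000;   // strictest engine limit mentioned above (assumed)
const MAX_COLUMNS_PER_METRIC = 10;  // regression adjustment, denominator, capping, ...
const MAX_METRICS_PER_QUERY = 20;   // assumed value of the setting

// Split one fact-table group of metric ids into batches small enough to stay
// under both the column budget and the per-query setting.
function chunkMetricIds(metricIds: string[]): string[][] {
  const byColumns = Math.floor(ENGINE_COLUMN_LIMIT / MAX_COLUMNS_PER_METRIC);
  const batchSize = Math.max(1, Math.min(MAX_METRICS_PER_QUERY, byColumns));

  const chunks: string[][] = [];
  for (let i = 0; i < metricIds.length; i += batchSize) {
    chunks.push(metricIds.slice(i, i + batchSize));
  }
  return chunks;
}
```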


github-actions bot commented Dec 6, 2023

Your preview environment pr-1923-bttf has been deployed.

Preview environment endpoints are available at:

@jdorn changed the title from Fact Table Query Optimization to [WIP] Fact Table Query Optimization on Dec 9, 2023
jdorn and others added 10 commits December 8, 2023 20:02
* first attempt

* Second attempt

* Notebook generating

* lint

* Typo

* pyright and finish migration

* Update version, fix notebook

* Reformat

* Fix manual

* Add jstat declaration

* Fix manual snapshot issue; remove manual snapshot preview

* Fix var_id_map in notebook

* Create return types

* Remove unused import
…ion, show optimized badge in View Queries modal
@jdorn marked this pull request as ready for review January 12, 2024 06:47

github-actions bot commented Jan 12, 2024

Deploy preview for docs ready!

✅ Preview
https://docs-91fneymtv-growthbook.vercel.app

Built with commit 9f7b36b.
This pull request is being automatically deployed with vercel-action

@jdorn merged commit 46d2c52 into main Jan 13, 2024
6 checks passed
@jdorn deleted the fact-table-optimization branch January 13, 2024 02:18