
Fact Table Query Optimization #1923

Merged: 23 commits into main from fact-table-optimization on Jan 13, 2024

Conversation

@jdorn (Member) commented Dec 6, 2023

Summary

If multiple metrics from the same Fact Table are added to an experiment, combine them into a single SQL query for increased performance.


Instead of the query returning columns for a single metric (e.g. main_sum, main_sum_squares), it will return multiple sets of columns with a separate prefix for each metric (e.g. m0_main_sum, m0_main_sum_squares, m1_main_sum, etc.). In addition, each prefix will also have an id column so we can identify which fact metric it belongs to.

Before calling the stats engine with the query result, we split it back into multiple metrics. This avoids needing to update the Python code.
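
For illustration, here is a minimal sketch of that splitting step, assuming the m0_/m1_ prefix scheme and a per-prefix id column described above. The type and function names are hypothetical, not the actual stats.ts code:

```ts
// Hypothetical shape of one row of the combined (wide) query result.
type WideRow = Record<string, string | number>;

// Fan a wide row with prefixed columns (m0_, m1_, ...) back out into one
// row per metric, keyed by the fact metric id carried in the <prefix>_id column.
function splitRowByMetric(row: WideRow): Map<string, Record<string, string | number>> {
  const byMetric = new Map<string, Record<string, string | number>>();
  const prefixed = /^m(\d+)_(.+)$/;

  // First pass: find the fact metric id for each prefix (e.g. m0 -> "fact__abc123")
  const idForPrefix = new Map<string, string>();
  for (const [col, value] of Object.entries(row)) {
    const match = col.match(prefixed);
    if (match && match[2] === "id") {
      idForPrefix.set(`m${match[1]}`, String(value));
    }
  }

  // Second pass: copy each prefixed column onto its metric's row,
  // dropping the prefix so downstream code still sees main_sum, etc.
  for (const [col, value] of Object.entries(row)) {
    const match = col.match(prefixed);
    if (!match || match[2] === "id") continue;
    const metricId = idForPrefix.get(`m${match[1]}`);
    if (!metricId) continue;
    const target = byMetric.get(metricId) ?? {};
    target[match[2]] = value;
    byMetric.set(metricId, target);
  }

  return byMetric;
}
```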

Current State: Happy path works end-to-end. Need lots of testing, review, and error handling.

Changes

  • New Integration method: getExperimentFactMetricsQuery. Very similar to the existing experiment metric query method, but it accepts an array of fact metrics and returns a wide table with prefixed columns for each metric
  • Update ExperimentResultsQueryRunner to group related fact metrics together and call the new Integration method when applicable. Respects a MAX_METRICS_PER_QUERY setting (see the grouping sketch after this list)
  • Update stats.ts to break grouped results back into multiple separate metrics before calling the stats engine
  • Improve the View Queries modal: when it sees a Fact Metric id in the results, it shows a badge with the metric name instead of the opaque id
  • Add a data source property to indicate that MySQL has an inefficient percentile calculation
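
As a rough illustration of the grouping step, here is a minimal sketch assuming a simplified metric shape. The interface and function names are hypothetical, not GrowthBook's actual types:

```ts
// Field names here are assumptions for illustration; the real fact metric
// objects carry many more fields.
interface FactMetricLike {
  id: string;
  factTableId: string;
}

// Group metrics that read from the same fact table so each group can share
// one combined query; metrics from other sources keep using the existing
// single-metric query path.
function groupByFactTable(metrics: FactMetricLike[]): Map<string, FactMetricLike[]> {
  const groups = new Map<string, FactMetricLike[]>();
  for (const metric of metrics) {
    const group = groups.get(metric.factTableId) ?? [];
    group.push(metric);
    groups.set(metric.factTableId, group);
  }
  return groups;
}
```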

TODO:

Research

Some engines have strict limits on the number of columns in a query result, some as low as 1,000. A single metric may need up to 10 columns (regression adjustment, denominator, capping, etc.), so we need to chunk metrics to limit how many are included in a single query.
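
A minimal sketch of that chunking, assuming a 1,000-column engine limit, up to 10 columns per metric, and an arbitrary MAX_METRICS_PER_QUERY value. All constants here are illustrative, not the values used in this PR:

```ts
const ENGINE_COLUMN_LIMIT = 1000;   // strictest engine limit mentioned above (assumed)
const MAX_COLUMNS_PER_METRIC = 10;  // regression adjustment, denominator, capping, ...
const MAX_METRICS_PER_QUERY = 20;   // assumed value of the setting

// Split one fact-table group of metric ids into batches small enough to stay
// under both the column budget and the per-query setting.
function chunkMetricIds(metricIds: string[]): string[][] {
  const byColumns = Math.floor(ENGINE_COLUMN_LIMIT / MAX_COLUMNS_PER_METRIC);
  const batchSize = Math.max(1, Math.min(MAX_METRICS_PER_QUERY, byColumns));

  const chunks: string[][] = [];
  for (let i = 0; i < metricIds.length; i += batchSize) {
    chunks.push(metricIds.slice(i, i + batchSize));
  }
  return chunks;
}
```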


github-actions bot commented Dec 6, 2023

Your preview environment pr-1923-bttf has been deployed.

Preview environment endpoints are available at:

@jdorn changed the title from Fact Table Query Optimization to [WIP] Fact Table Query Optimization on Dec 9, 2023
jdorn and others added 10 commits December 8, 2023 20:02
* first attempt

* Second attempt

* Notebook generating

* lint

* Typo

* pyright and finish migration

* Update version, fix notebook

* Reformat

* Fix manual

* Add jstat declaration

* Fix manual snapshot issue; remove manual snapshot preview

* Fix var_id_map in notebook

* Create return types

* Remove unused import
…ion, show optimized badge in View Queries modal
@jdorn marked this pull request as ready for review January 12, 2024 06:47

github-actions bot commented Jan 12, 2024

Deploy preview for docs ready!

✅ Preview
https://docs-91fneymtv-growthbook.vercel.app

Built with commit 9f7b36b.
This pull request is being automatically deployed with vercel-action

@jdorn merged commit 46d2c52 into main Jan 13, 2024
6 checks passed
@jdorn deleted the fact-table-optimization branch January 13, 2024 02:18