Validating and deriving clinical-decision rules. Work-in-progress.
This is a collaborative repository intended to validate and derive clinical-decision rules. We use a unified pipeline across a variety of contributed datasets to vet previous modeling practices for clinical decision rules. Additionally, we hope to externally validate the rules under study here with data from UCSF.
Dataset | Task | Size | References | Processed |
---|---|---|---|---|
iai_pecarn | Predict intra-abdominal injury requiring acute intervention before CT | 12,044 patients, 203 with IAI-I | 📄, 🔗 | ✅ |
tbi_pecarn | Predict traumatic brain injuries before CT | 42,412 patients, 376 with ciTBI | 📄, 🔗 | ✅ |
csi_pecarn | Predict cervical spine injury in children | 3,314 patients, 540 with CSI | 📄, 🔗 | ✅ |
tig_pecarn | Predict bacterial/non-bacterial infections in febrile infants from RNA transcriptional biosignatures | 279 patients, ? with infection | 🔗 | ❌ |
exxagerate | Predict 30-day mortality for acute exacerbations of chronic obstructive pulmonary disease (AECOPD) | 1,696 patients, 17 mortalities | 📄, 🔗 | ❌ |
heart_disease_uci | Predict heart disease presence from basic attributes / screening | 920 patients, 509 with heart disease | 📄, 🔗 | ❌ |
Research paper 📄, Data download link 🔗
Datasets are all tabular (or at least have interpretable input features), reasonably large (e.g. have at least 100 positive and negative cases), and have a binary outcome. For PECARN datasets, please read and agree to the research data use agreement on the PECARN website.
Possible data sources: PECARN datasets | Kaggle datasets | MDCalc | UCI | OpenML | MIMIC | UCSF De-ID. We may later expand to other high-stakes datasets (e.g. COMPAS, loan risk).
To contribute a new project (e.g. a new dataset + modeling), create a pull request following the steps below. The easiest way to do this is to copy-paste an existing project (e.g. iai_pecarn) into a new folder and then edit that one.
Helpful docs: Collaboration details | Lab writeup | Slides
- Repo set up
  - Create a fork of this repo (see the tutorial on forking/merging here)
  - Install the repo as shown below
  - Select a dataset - once you've selected, open an issue in this repo with the name of the dataset + a brief description so others don't work on the same dataset
  - Assign a `project_name` to the new project (e.g. `iai_pecarn`)
- Data preprocessing
  - Download the raw data into `data/{project_name}/raw`
    - Don't commit any very large files
  - Copy the template files from `rulevetting/projects/iai_pecarn` to a new folder `rulevetting/projects/{project_name}`
  - Rewrite the functions in `dataset.py` for processing the new dataset (e.g. see the dataset for iai_pecarn)
    - Document any judgement calls you aren't sure about using the `dataset.get_judgement_calls_dictionary` function
    - See the template file for documentation of each function, or the API documentation
  - Notebooks / helper functions are optional; all files should be within `rulevetting/projects/{project_name}`
- Data description
  - Describe each feature in the processed data in a file named `data_dictionary.md`
  - Summarize the data and the prediction task in a file named `readme.md`. This should include basic details of data collection (who, how, when, where), why the task is important, and how a clinical decision rule may be used in this context. It should also include your names/affiliations.
- Modeling
  - Baseline model - implement `baseline.py` for predicting given a baseline rule (e.g. from the existing paper)
    - Should override the model template in a class named `Baseline`
  - New model - implement `model_best.py` for making predictions using your newly derived best model
    - Should also override the model template in a class named `Model`
- Lab writeup (see instructions)
  - Save the writeup into `writeup.pdf` and include source files
  - Should contain details on exploratory analysis, modeling, validation, comparisons with baseline, etc.
- Submitting
  - Ensure that all tests pass by running `pytest --project {project_name}` from the repo directory
  - Open a pull request and it will be reviewed / merged
- Reviewing submissions
  - Each pull request will be reviewed by others before being merged
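Concretely, the two main files a project contributes might look like the minimal sketch below. The method bodies, signatures, and feature columns (`clean_data`, `AbdTenderness`, `GCSScore_low`) are illustrative assumptions, not the actual template API — check `rulevetting/projects/iai_pecarn` and the template file for the real function signatures.

```python
import numpy as np
import pandas as pd


class Dataset:
    """Sketch of dataset.py for a new project (names are hypothetical)."""

    def clean_data(self, data_path: str, **judgement_calls) -> pd.DataFrame:
        # Load the raw files from data/{project_name}/raw and merge them
        # into a single dataframe; honor any passed judgement calls.
        raise NotImplementedError

    def get_judgement_calls_dictionary(self) -> dict:
        # Document preprocessing choices you aren't sure about: each key
        # maps to the list of values to try (first value = default).
        return {
            'clean_data': {'drop_missing_outcome': [True, False]},
        }


class Baseline:
    """Sketch of baseline.py: the rule from the original paper."""

    def predict(self, df: pd.DataFrame) -> np.ndarray:
        # Example rule: flag a patient as high-risk if any rule feature fires.
        risk_features = ['AbdTenderness', 'GCSScore_low']  # hypothetical columns
        return (df[risk_features].sum(axis=1) > 0).astype(int).values
```

The judgement-calls dictionary is what lets the shared pipeline perturb your preprocessing choices and check the stability of the resulting rules.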
Note: requires python 3.7 and pytest (for running the automated tests). It is best practice to create a venv or pipenv for this project.

```bash
python -m venv rule-env
source rule-env/bin/activate
```

Then, clone the repo and install the package and its dependencies.

```bash
git clone https://github.com/Yu-Group/rule-vetting
cd rule-vetting
pip install -e .
```

Now run the automated tests to ensure everything works (warnings are fine as long as all tests pass).

```bash
pytest --project iai_pecarn
```

To use with jupyter, you might have to add this venv as a jupyter kernel.

```bash
python -m ipykernel install --user --name=rule-env
```
Dataset | Task | Size | References | Processed |
---|---|---|---|---|
bronch_pecarn | Effectiveness of oral dexamethasone for acute bronchiolitis | 600 patients, 50% control | 📄, 🔗 | ❌ |
gastro_pecarn | Impact of Emergency Department Probiotic Treatment of Pediatric Gastroenteritis | 886 patients, 50% control | 📄, 🔗 | ❌ |
Research paper 📄, Data download link 🔗
Background reading
- Be familiar with the imodels package
- See the TRIPOD statement on medical reporting
- See the Veridical data science paper
Related packages
- imodels: rule-based modeling
- veridical-flow: stability-based analysis
- gplearn: symbolic regression/classification
- pygam: generalized additive models
- interpretml: boosting-based GAMs
Updates
- For updates, star the repo, see this related repo, or follow @csinva_
- Please make sure to give authors of original datasets appropriate credit!
- Contributing: pull requests very welcome!
Related open-source collaborations
- The imodels package maintains many of the rule-based models here
- Inspired by the BIG-bench effort.
- See also NL-Augmenter and NLI-Expansion