{farrago} is an R package that collects tools for data workflows and analysis, with a focus on health surveillance data. Although {farrago} primarily serves as a personal collection of odds and ends picked up or created over the past several years, it may be useful to wider audiences as well. The package is organized by general purpose/functionality, and these groups may eventually be separated into discrete packages.
{farrago} is only available from GitHub; the latest version can be installed with:
# install.packages("devtools")
devtools::install_github("al-obrien/farrago")
{farrago} has a variety of functions available. They are roughly organized into the following categories:
- Calculations: provides algorithms for some routine processes such as…
  - Determining pregnancy trimesters
  - Assigning episode periods (e.g. for repeat infections)
  - Collapsing time-steps
  - Determining overlaps in time
  - Basic metrics such as rates, max/min, etc.
- Conversions: helper functions to convert between common formats in epidemiology
  - Replace all blank values with NA (e.g. when importing data from SAS)
  - Switch between flu and calendar weeks
  - Determine flu season from date
  - Quickly convert a table to an image (png)
  - Basic conversions such as number to percent, number to factor, etc.
- Creation: generate new content
  - Make multi-level factors similar to SAS ‘multi-label’ functionality
  - Create hypercubes (i.e. n-dimensional tables including group summaries and totals)
  - Determine break points from a set of values
- Transferal: methods to move objects and data
  - Easily `stow()` and `retrieve()` data sets to make efficient use of RAM
  - File transfer using a WinSCP wrapper
  - Locate files
  - Pass code to and retrieve data from SAS (primarily for use with Classic 9.4)
- Plotting: helper functions for shared legends and less common plots such as bulls-eye charts and X-splines
- Miscellaneous
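
To make the blank-to-NA conversion listed under Conversions more concrete, here is a minimal base R sketch of the general idea. `blank_to_na()` is a hypothetical helper written for this illustration and is not the {farrago} implementation; treating whitespace-only strings as blank is also an assumption.

```r
# Hypothetical helper for illustration; not the {farrago} implementation.
# Replaces empty (or whitespace-only, by assumption) strings with NA
# in every character column of a data frame.
blank_to_na <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(col) {
    col[!is.na(col) & trimws(col) == ""] <- NA_character_
    col
  })
  df
}

# Small made-up data frame, similar in shape to a SAS import
demo <- data.frame(id = 1:3,
                   condition = c("alive", "", "  "),
                   stringsAsFactors = FALSE)
cleaned <- blank_to_na(demo)
print(cleaned)
```

Non-character columns are left untouched, so numeric and date columns keep their types.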
This is a basic example using a subset of functions from {farrago}…
# Load libraries
library(farrago)
library(magrittr)
library(dplyr)
library(lubridate)
# Download from configured SFTP location
transfer_winscp(file = 'my_rmt_file.csv',
                direction = 'download',
                connection = 'sftp://myusername:[email protected]/',
                rmt_path = './location/',
                drop_location = 'C:/PATH/TO/DESIRED/FOLDER/')
# Nonsense data for example
my_rmt_file <- tibble::tribble(~grp_id, ~date, ~date_of_birth, ~condition, ~date_of_birth_child,
1, '2020-01-01', '1970-06-04', 'alive', '1991-01-01',
1, '2020-01-01', '1980-04-05', '', '1990-02-04',
1, '2020-01-03', '1930-04-05', 'alive', '',
1, '2020-01-04', '1967-04-05', 'alive', '1998-01-21',
2, '2020-01-01', '1978-04-05', 'alive', '1998-06-21',
2, '2020-09-10', '1970-04-05', 'alive', '1992-09-13',
2, '2020-09-21', '1949-04-05', 'dead', '1987-01-03',
3, '2020-01-01', '1977-04-05', '', '1992-01-21',
3, '2020-01-02', '1944-04-05', 'alive', '',
3, '2020-01-21', '1943-06-05', 'alive', '1967-09-12',
3, '2020-01-22', '1969-07-05', 'alive', '2006-12-21',
3, '2020-04-22', '', NA, NA,
3, '2021-06-09', '1978-09-21', 'dead', '1992-01-21') %>%
  dplyr::mutate(across(contains('date'), ymd))
# Remove blanks
my_rmt_file <- convert_blank2NA(my_rmt_file)
# Determine episode period based on first date by group
my_rmt_file$episode <- assign_episode(data = my_rmt_file,
grp_id = grp_id,
date = date,
threshold = 10)
# Determine age and age group from date
my_rmt_file$age <- calculate_age(my_rmt_file$date_of_birth)
my_rmt_file$age_grp <- create_breaks(my_rmt_file$age, breaks = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90), format = TRUE)
# Calculate trimester based on dob (of child)
my_rmt_file <- calculate_trimesters(my_rmt_file, date_of_birth_child)
#> Warning in calculate_trimesters(my_rmt_file, date_of_birth_child): No variable
#> for gestation length was provided, all pregnancies will assume the average
#> pregnancy length of: 40
# View final dataset
knitr::kable(my_rmt_file)
grp_id | date | date_of_birth | condition | date_of_birth_child | episode | age | age_grp | tri1_s | tri1_e | tri2_s | tri2_e | tri3_s | preterm |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2020-01-01 | 1970-06-04 | alive | 1991-01-01 | 1 | 51 | 50-59 | 1990-03-27 | 1990-06-26 | 1990-06-27 | 1990-09-26 | 1990-09-27 | 0 |
1 | 2020-01-01 | 1980-04-05 | NA | 1990-02-04 | 1 | 41 | 40-49 | 1989-04-30 | 1989-07-30 | 1989-07-31 | 1989-10-30 | 1989-10-31 | 0 |
1 | 2020-01-03 | 1930-04-05 | alive | NA | 1 | 91 | >=90 | NA | NA | NA | NA | NA | NA |
1 | 2020-01-04 | 1967-04-05 | alive | 1998-01-21 | 1 | 54 | 50-59 | 1997-04-16 | 1997-07-16 | 1997-07-17 | 1997-10-16 | 1997-10-17 | 0 |
2 | 2020-01-01 | 1978-04-05 | alive | 1998-06-21 | 1 | 43 | 40-49 | 1997-09-14 | 1997-12-14 | 1997-12-15 | 1998-03-16 | 1998-03-17 | 0 |
2 | 2020-09-10 | 1970-04-05 | alive | 1992-09-13 | 2 | 51 | 50-59 | 1991-12-08 | 1992-03-08 | 1992-03-09 | 1992-06-08 | 1992-06-09 | 0 |
2 | 2020-09-21 | 1949-04-05 | dead | 1987-01-03 | 3 | 72 | 70-79 | 1986-03-29 | 1986-06-28 | 1986-06-29 | 1986-09-28 | 1986-09-29 | 0 |
3 | 2020-01-01 | 1977-04-05 | NA | 1992-01-21 | 1 | 44 | 40-49 | 1991-04-16 | 1991-07-16 | 1991-07-17 | 1991-10-16 | 1991-10-17 | 0 |
3 | 2020-01-02 | 1944-04-05 | alive | NA | 1 | 77 | 70-79 | NA | NA | NA | NA | NA | NA |
3 | 2020-01-21 | 1943-06-05 | alive | 1967-09-12 | 2 | 78 | 70-79 | 1966-12-06 | 1967-03-07 | 1967-03-08 | 1967-06-07 | 1967-06-08 | 0 |
3 | 2020-01-22 | 1969-07-05 | alive | 2006-12-21 | 2 | 52 | 50-59 | 2006-03-16 | 2006-06-15 | 2006-06-16 | 2006-09-15 | 2006-09-16 | 0 |
3 | 2020-04-22 | NA | NA | NA | 3 | NA | NA | NA | NA | NA | NA | NA | NA |
3 | 2021-06-09 | 1978-09-21 | dead | 1992-01-21 | 4 | 43 | 40-49 | 1991-04-16 | 1991-07-16 | 1991-07-17 | 1991-10-16 | 1991-10-17 | 0 |
# Save file for easy retrieval later
my_rmt_file_stowed <- stow(my_rmt_file, cleanup = TRUE)
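
The `episode` column in the table above is consistent with a simple gap rule: within each group, a record starts a new episode when it falls more than `threshold` days after the previous record. The base R sketch below illustrates that idea only. `sketch_episode()` is a hypothetical helper, and `assign_episode()` may use a different rule (for example, measuring from the first date of each episode), so treat this as an assumption rather than a description of the package internals.

```r
# Hypothetical helper for illustration; not the {farrago} implementation.
# Assumes a gap-based rule and no missing dates.
sketch_episode <- function(grp_id, date, threshold = 10) {
  stopifnot(length(grp_id) == length(date), length(date) > 0)
  date <- as.Date(date)

  # Sort by group, then date, so gaps are measured between neighbours
  ord <- order(grp_id, date)
  g <- grp_id[ord]
  d <- date[ord]

  # Days since the previous record (Inf for the very first record)
  gap <- c(Inf, as.numeric(diff(d)))
  first_in_grp <- c(TRUE, g[-1] != g[-length(g)])

  # A new episode starts at the first record of a group or after a long gap
  new_episode <- first_in_grp | gap > threshold
  episode_sorted <- ave(as.integer(new_episode), g, FUN = cumsum)

  # Return episode numbers in the original row order
  episode <- integer(length(date))
  episode[ord] <- episode_sorted
  episode
}

# Groups 2 and 3 from the example data above
grp <- c(2, 2, 2, 3, 3, 3, 3, 3, 3)
dates <- c('2020-01-01', '2020-09-10', '2020-09-21',
           '2020-01-01', '2020-01-02', '2020-01-21',
           '2020-01-22', '2020-04-22', '2021-06-09')
ep <- sketch_episode(grp, dates, threshold = 10)
print(data.frame(grp, dates, ep))
```

For these rows the sketch gives episodes 1, 2, 3 for group 2 and 1, 1, 2, 2, 3, 4 for group 3, matching the `episode` values shown in the table.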