R package to add 'dplyr'-like Syntax for Summary Statistics of Survey Data
srvyr focuses on calculating summary statistics from survey data, such as the mean, total or quantile. It allows for the use of many dplyr verbs, such as
mutate, the convenience of pipe-able functions, rlang’s style of non-standard evaluation and more consistent return types than the survey package.
You can try it out:
install.packages("srvyr")
# or for development version
# devtools::install_github("gergness/srvyr")
First, describe the variables that define the survey’s structure with the function
as_survey(), passing the bare names of the columns that you would otherwise supply to the design functions of the survey package.
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)
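To make the "bare column names" point concrete, here is a minimal sketch that builds the same stratified design twice: once with srvyr and once with survey::svydesign(), which takes one-sided formulas instead. It assumes the apistrat data frame from the survey package's api dataset; the object names design_srvyr, design_survey and design_converted are made up for this illustration.

```r
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

# srvyr: columns are given as bare names
design_srvyr <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# survey: the same columns are given as one-sided formulas;
# ids = ~1 states that there is no cluster sampling
design_survey <- survey::svydesign(
  ids = ~1, strata = ~stype, weights = ~pw, data = apistrat
)

# An existing survey design object can also be converted to a srvyr object
design_converted <- as_survey(design_survey)
```

Either route should give a design that the dplyr-style verbs can work with.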
Now many of the dplyr verbs are available.
mutate() adds or modifies a variable.
dstrata <- dstrata %>%
  mutate(api_diff = api00 - api99)
summarise() calculates summary statistics such as mean, total, quantile or ratio.
dstrata %>%
  summarise(api_diff = survey_mean(api_diff, vartype = "ci"))
#> # A tibble: 1 x 3
#>   api_diff api_diff_low api_diff_upp
#> 1     32.9         28.8         37.0
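The other statistics mentioned above follow the same pattern. The sketch below is an illustration rather than part of the original example: it rebuilds the design under the hypothetical name strat_design (assuming the apistrat data from the survey package) and asks for a total, a median and a ratio in a single summarise() call. The output column names are arbitrary.

```r
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

# Hypothetical name; same stratified design as in the example above
strat_design <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

other_stats <- strat_design %>%
  summarise(
    enroll_total = survey_total(enroll, vartype = "se"),  # weighted total enrollment
    api00_median = survey_median(api00, vartype = "ci"),  # weighted median with CI
    api_ratio    = survey_ratio(api00, api99)             # ratio of api00 to api99
  )

other_stats
```

Each survey_*() call contributes the estimate plus one or more uncertainty columns, depending on vartype.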
group_by() followed by summarise() creates summaries by groups.
dstrata %>%
  group_by(stype) %>%
  summarise(api_diff = survey_mean(api_diff, vartype = "ci"))
#> Warning: The `add` argument of `group_by()` is deprecated as of dplyr 1.0.0.
#> Please use the `.add` argument instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_warnings()` to see where this warning was generated.
#> # A tibble: 3 x 4
#>   stype api_diff api_diff_low api_diff_upp
#> 1 E        38.6         33.1          44.0
#> 2 H         8.46         1.74         15.2
#> 3 M        26.4         20.4          32.4
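Grouping is also a convenient way to estimate proportions. As a hedged sketch (assuming the apistrat data and its awards column from the survey package; strat_design and award_props are names invented here), calling survey_mean() and survey_total() with no variable inside a grouped summarise() should estimate each group's share and size:

```r
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

# Hypothetical name; same stratified design as in the examples above
strat_design <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

award_props <- strat_design %>%
  group_by(awards) %>%
  summarise(
    proportion = survey_mean(),   # estimated share of schools in each group
    total      = survey_total()   # estimated number of schools in each group
  )

award_props
```

The proportions across the groups are expected to sum to one.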
Functions from the survey package are still available:
my_model <- survey::svyglm(api99 ~ stype, dstrata)
summary(my_model)
#> Call:
#> svyglm(formula = api99 ~ stype, design = dstrata)
#>
#> Survey design:
#> Called via srvyr
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)   635.87      13.34  47.669   <2e-16 ***
#> stypeH        -18.51      20.68  -0.895    0.372
#> stypeM        -25.67      21.42  -1.198    0.232
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for gaussian family taken to be 16409.56)
#>
#> Number of Fisher Scoring iterations: 2
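The same holds for other survey functions. A small sketch, assuming the apistrat data from the survey package and using the made-up names strat_design, api00_mean and enroll_by_type: a design created with srvyr is passed straight to survey::svymean() and survey::svyby(), which expect one-sided formulas rather than bare names.

```r
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

# Hypothetical name; same stratified design as in the examples above
strat_design <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# Weighted mean of api00 over the whole design
api00_mean <- survey::svymean(~api00, strat_design)

# Weighted total enrollment within each school type
enroll_by_type <- survey::svyby(~enroll, ~stype, strat_design, survey::svytotal)

api00_mean
enroll_by_type
```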
[srvyr] lets us use the survey library’s functions within a data analysis pipeline in a familiar way.
– Kieran Healy, in Data Visualization: A practical introduction
– Thomas Lumley, in the Biased and Inefficient blog
I do appreciate bug reports, suggestions and pull requests! I started this as a way to learn about R package development, and am still learning, so you’ll have to bear with me. Please review the Contributor Code of Conduct, as all participants are required to abide by its terms.