Posts + Papers
This is Statistics | American Statistical Association
Many students don’t realize how much they love statistics until they take their first class. This was the case for Dr. Eric J. Daza, a health data scientist with over two decades of experience. Throughout his educational and professional experiences, Eric has learned that statistics can be a framework for life.
Among his many accomplishments, Eric has contributed to the evolution of personalized health data analysis. His innovative work at Evidation Health demonstrates how data scientists and statisticians can actually change the world for the better.
Eric is also at the forefront of the American Statistical Association (ASA) efforts to improve diversity and inclusion practices across the profession as the Professional Development Chair of ASA’s Justice, Equity, Diversity, and Inclusion (JEDI) Outreach Group.
How long have you been in tech?
I’ve been in health tech for more than four years. This is my second career, having previously spent seven years in clinical trials biostatistics.
How did you know you wanted to get into tech?
I got into health tech through a few statistically significant events. ...
I failed or almost failed out of a few core STEM courses in my biology undergrad — including calculus. I barely passed my core major GPA requirement to graduate, and scored a middling overall GPA.
Today, I hold a doctorate in biostatistics from a top public health school, and completed a postdoc in health behavior at a top medical school. And in 2022, I was recognized by both Forbes and Fortune for my innovative work in health data science.
How did I get here?
K Brabaw. Fortune.
Fortune has a long history of highlighting innovative leaders. That tradition continues with our inaugural list spotlighting 10 people and teams creating the future of healthcare. Many of this year’s winners are finding creative solutions to systemic healthcare problems, from dramatically lowering the costs of prescription medication, and creating greater access to mental health services for communities of color, to building an easy-to-access opioid addiction recovery program with incredible retention rates. They’re business leaders, entrepreneurs, inventors, influencers, educators, and problem solvers. Each finalist has had a major accomplishment over the last year and is using their influence to increase health and wellness access and equity. Learn more about them below.
A Holzwarth. Forbes.
These 16 innovators are changing the face of healthcare. They are behavioral scientists, product leaders, academic researchers, statisticians, physicians, clinicians, strategists, experts in diversity equity and inclusion, implementation scientists, and tech leaders. They cross boundaries and repel traditional healthcare models. They are on the front lines and behind the scenes. And what they all have in common is their passion and commitment to innovation in health.
EJ Daza, A Holzwarth. Pattern Health.
A Moment with Eric Daza is part of our interview series featuring thought leaders in research and healthcare. Each interview includes 7 short and stimulating questions.
Sneak peek at the full (free-to-read) interview:
1. Tell us something we don’t know. (Anything!)
The American English word “boondocks” (as in “the boonies”) comes from the Filipino Tagalog word “bundok” (boon-DOCK). ...
2. Which fiction book would you recommend to researchers and innovators in healthcare, and why?
I recommend reading the Foundation Trilogy (if not Series) by Isaac Asimov. I had no idea I’d go into statistics as a career! ...
"Why does 'consistency' matter in causal inference?" EJ Daza. Towards Data Science.
Consistency just says that the outcome you observe is exactly the outcome you thought you would observe. You want to be sure you’re measuring what you think you’re measuring.
There’s [a] mundane violation of consistency. ... You may have been prescribed the wrong dose of medication… [or] you accidentally took two pills a day instead of one.
This happens in an RCT, too. ... Dr. A and the other physicians intentionally committed a medication error against the study protocol. And patient B and the other participants were nonadherent to the assigned treatment.
'For Eric J. Daza, “how you sell your work matters in setting you up for success.”' EJ Daza, B Huberman. Towards Data Science.
In the Author Spotlight series, TDS Editors chat with members of our community about their career path in data science, their writing, and their sources of inspiration. Today, we’re thrilled to share Eric J. Daza, DrPH, MPS’s conversation with Ben Huberman.
"Report your modeling strategy or statistical analysis plan before seeing any data". EJ Daza. Towards Data Science.
If your model isn’t performing well in prod on new data, untracked HARKing might be why. (tweet)
Imagine calling your shot in pool after you made it! That’s HARKing — a bad research habit. Preregistration is when you call each shot even before stepping up to the table. (tweet)
"But keep statistical evidence. How? A statistician shares a writing sample". EJ Daza. Towards Data Science.
“significant” p-value ≠ “significant” finding: Significance of evidence is not evidence of significance. (tweet)
"significant" p-value = "discernible" finding: Significance of evidence is evidence of discernibility.
EJ Daza. LAist.
Guest contributor Eric Daza writes about his journey from the Filipino friend who blended in and bit his tongue when encountering casual racism to embracing his own Brown-ness — and with that, calling out racism.
"Two common wrong phrases about statistical significance". EJ Daza. Towards Data Science.
There was a significant decrease of D in the outcome.
There was no significant association between variables X and Y.
"Significance does not imply importance — but you need it to judge quality". EJ Daza. Towards Data Science.
Ask yourself if a randomized controlled trial’s reported effect size estimate is meaningful, regardless of sample size.
Train yourself to internalize that significance does not imply importance.
Remember that sample size does not correlate with effect size.
Never just say “significant” when you really mean “statistically significant”. You will be misunderstood as saying “important”. Instead, always say or write out the whole phrase “statistically significant”.
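A minimal sketch of the point above, not taken from the article: the same tiny effect can be statistically non-significant in a small trial and overwhelmingly significant in a huge one. The effect size, standard deviation, and sample sizes are hypothetical, and a two-sample z-test with a known common standard deviation is assumed for simplicity.

```python
import math

def two_sided_p_value(mean_diff, sd, n_per_arm):
    """Two-sided z-test p-value for a difference in means between
    two equal-size arms, assuming a known common standard deviation."""
    standard_error = sd * math.sqrt(2.0 / n_per_arm)
    z = mean_diff / standard_error
    # Two-sided standard normal tail probability via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))

effect = 0.02  # hypothetical effect: 2% of one standard deviation

p_small = two_sided_p_value(effect, sd=1.0, n_per_arm=100)
p_large = two_sided_p_value(effect, sd=1.0, n_per_arm=1_000_000)

print(f"n = 100 per arm:       p = {p_small:.3f}")
print(f"n = 1,000,000 per arm: p = {p_large:.3g}")
```

The effect is identical in both runs; only the sample size changes. Whether a 0.02 standard deviation difference matters is a separate, substantive judgment that the p-value cannot make.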
"Causal inference tutorial in R using synthetic data (Part 2)". EJ Daza. Towards Data Science.
We would overstate our health app’s effectiveness by claiming it reduces the risk of new coronavirus infections by 16.9% — when in fact it will only reduce this risk by 3.1%.
But we can re-weight our real-world evidence results to provide more accurate risk-reduction estimates of either 2.3% or 2.2%.
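The idea of re-weighting can be sketched with inverse-probability weighting on a small table of counts. The tutorial itself uses R and its own synthetic data; the Python sketch below uses made-up counts with a single binary confounder and is only meant to show why a naive comparison can overstate an effect.

```python
# Hypothetical counts: (confounder level, used app?, number of people, number infected)
rows = [
    (0, 1, 800, 40),
    (0, 0, 200, 16),
    (1, 1, 200, 40),
    (1, 0, 800, 184),
]

def naive_risk(rows, treated):
    """Crude infection risk among people with the given treatment status."""
    n = sum(r[2] for r in rows if r[1] == treated)
    events = sum(r[3] for r in rows if r[1] == treated)
    return events / n

def ipw_risk(rows, treated):
    """Risk after weighting each group by 1 / P(treatment status | confounder)."""
    stratum_totals = {}
    for c, _, n, _ in rows:
        stratum_totals[c] = stratum_totals.get(c, 0) + n
    weighted_n = 0.0
    weighted_events = 0.0
    for c, t, n, events in rows:
        if t != treated:
            continue
        propensity = n / stratum_totals[c]
        weighted_n += n / propensity
        weighted_events += events / propensity
    return weighted_events / weighted_n

naive_rd = naive_risk(rows, 1) - naive_risk(rows, 0)
ipw_rd = ipw_risk(rows, 1) - ipw_risk(rows, 0)

print(f"naive risk difference:       {naive_rd:+.3f}")
print(f"re-weighted risk difference: {ipw_rd:+.3f}")
```

In these made-up counts, app users are concentrated in the low-risk stratum, so the crude comparison makes the app look about four times more protective than it is within either stratum.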
"Causal inference tutorial in R using synthetic data (Part 1)". EJ Daza. Towards Data Science.
We would overstate our telemedicine app’s effectiveness by claiming it reduces the risk of new coronavirus infections by 16.9% — when in fact it will only reduce this risk by 3.1%.
EJ Daza. Towards Data Science.
This is exactly the time to temper the sprinting agility of data science with the scientifically rigorous methodology of biostatistics.
I Matias, EJ Daza, K Wac. Digital Health.
Heart rate (HR), especially at nighttime, is an important biomarker for cardiovascular health. It is known to be influenced by overall physical fitness, as well as daily life physical or psychological stressors like exercise, insufficient sleep, excess alcohol, certain foods, socialization, or air travel causing physiological arousal of the body. However, the exact mechanisms by which these stressors affect nighttime HR are unclear and may be highly idiographic (i.e. individual-specific). A single-case or “n-of-1” observational study (N1OS) is useful in exploring such suggested effects by examining each subject's exposure to both stressors and baseline conditions, thereby characterizing suggested effects specific to that individual.
Our objective was to test and generate individual-specific N1OS hypotheses of the suggested effects of daily life stressors on nighttime HR. As an N1OS, this study provides conclusions for each participant, thus not requiring a representative population.
We studied three healthy, nonathlete individuals, collecting the data for up to four years. Additionally, we evaluated model-twin randomization (MoTR), a novel Monte Carlo method facilitating the discovery of personalized interventions on stressors in daily life.
We found that physical activity can increase the nighttime heart rate amplitude, whereas there were no strong conclusions about its suggested effect on total sleep time. Self-reported states such as exercise, yoga, and stress were associated with increased (for the first two) and decreased (last one) average nighttime heart rate.
This study implemented the MoTR method evaluating the suggested effects of daily stressors on nighttime heart rate, sleep time, and physical activity in an individualized way: via the N-of-1 approach. A Python implementation of MoTR is freely available.
Aug 2022 (In Preparation)
Model-Twin Randomization (MoTR): A Monte Carlo Method for Estimating the Within-Individual Average Treatment Effect Using Wearable Sensors (Pre-Print)
EJ Daza, L Schneider. arXiv.
Temporally dense single-person "small data" have become widely available thanks to mobile apps and wearable sensors. Many caregivers and self-trackers want to use these data to help a specific person change their behavior to achieve desired health outcomes. Ideally, this involves discerning possible causes from correlations using that person's own observational time series data. In this paper, we estimate within-individual average treatment effects of physical activity on sleep duration, and vice-versa. We introduce the model twin randomization (MoTR; "motor") method for analyzing an individual's intensive longitudinal data. Formally, MoTR is an application of the g-formula (i.e., standardization, back-door adjustment) under serial interference. It estimates stable recurring effects, as is done in n-of-1 trials and single case experimental designs. We compare our approach to standard methods (with possible confounding) to show how to use causal inference to make better personalized recommendations for health behavior change, and analyze 222 days of Fitbit sleep and steps data for one of the authors.
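The g-formula step mentioned in the abstract can be illustrated with a deliberately simplified sketch. This is not the authors' MoTR implementation: it ignores serial dependence and the Monte Carlo step, uses stratified means as the outcome model, and runs on hypothetical daily records (workday flag, active flag, hours slept).

```python
from collections import defaultdict

# Hypothetical single-person records: (workday, active, sleep_hours)
days = (
    [(1, 1, 6.5)] * 10
    + [(1, 0, 6.0)] * 40
    + [(0, 1, 7.5)] * 30
    + [(0, 0, 7.0)] * 10
)

def fit_outcome_model(days):
    """Outcome model: mean sleep within each (workday, active) cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for workday, active, sleep in days:
        sums[(workday, active)] += sleep
        counts[(workday, active)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

def naive_effect(days):
    """Unadjusted difference in mean sleep, active days minus inactive days."""
    active = [s for _, a, s in days if a == 1]
    inactive = [s for _, a, s in days if a == 0]
    return sum(active) / len(active) - sum(inactive) / len(inactive)

def g_formula_effect(days):
    """Predict every day under active=1 and active=0, then average the difference."""
    model = fit_outcome_model(days)
    diffs = [model[(workday, 1)] - model[(workday, 0)] for workday, _, _ in days]
    return sum(diffs) / len(diffs)

naive = naive_effect(days)
adjusted = g_formula_effect(days)
print(f"naive: {naive:.2f} h, g-formula: {adjusted:.2f} h")
```

Because active days cluster on non-workdays in this toy data, the unadjusted difference mixes the effect of activity with the effect of not working; standardizing over the observed workday pattern removes that mix.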
Estimating the Burden of Influenza-like Illness on Daily Activity at the Population Scale Using Commercial Wearable Sensors
A Mezlini, A Shapiro, EJ Daza, E Caddigan, E Ramirez, T Althoff, L Foschini. JAMA Network Open.
Question: How can the true burden of influenza-like illnesses (ILIs) be estimated given that most cases of ILIs are mild and go undocumented?
Findings: This cohort study of 15 122 adults who reported ILI symptoms and had data from wearable sensors at symptom onset found an overall reduction in mobility equivalent to 15% of the active US population becoming completely immobilized for 1 day. More than 60% of this reduction occurred among persons who had sought no medical care.
Meaning: This study suggests that the burden of ILIs is much greater than had previously been understood.
Editors: J Nikles, EJ Daza, S McDonald, E Hekler, NJ Schork. Frontiers in Psychiatry, Psychology, Digital Health, Neurology, Public Health, and Sociology.
N-of-1 randomized controlled trials (RCTs) provide an opportunity to evaluate individual patient response to interventions, by randomly allocating different time periods within an individual to repeated intervention and control conditions and comparing responses. N-of-1 observational studies involve the repeated measurement of an outcome (e.g. pain) in a patient over time, but with no intervention implemented, in order to draw conclusions about naturally-occurring patterns and predictors of outcomes over time.
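As a rough illustration of randomly allocating time periods within one person, the sketch below builds a schedule in which each block of two periods contains the intervention ("A") and the control ("B") once, in random order. Blocking in pairs, the labels, and the function name are assumptions made for this example; actual N-of-1 designs vary.

```python
import random

def randomize_periods(n_blocks, seed=None):
    """Return a schedule of 2 * n_blocks periods with one 'A' and one 'B'
    per block, ordered at random within each block."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_blocks):
        block = ["A", "B"]
        rng.shuffle(block)
        schedule.extend(block)
    return schedule

schedule = randomize_periods(n_blocks=4, seed=2024)
print(" ".join(schedule))
```

Pairing the conditions within blocks keeps the two conditions balanced over time, so a slow drift in the outcome is less likely to be mistaken for a treatment effect.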
Both N-of-1 RCTs and observational studies can have a ‘self-study’ design, where an individual conducts the study on themselves, to answer research questions they have generated themselves. N-of-1 RCTs and observational studies provide individualized findings that can be aggregated to produce results equivalent to those found in traditional group-based RCTs and population-level epidemiological studies, respectively, but requiring fewer patients for the same power. N-of-1 RCTs and observational studies are well-suited to complement, strengthen, and generate advances in precision medicine, patient-centred healthcare, and personalised health. Since 2015, the number of N-of-1 articles has doubled annually.
Similarly, digital health is an exploding field, with over 1,000 studies registered on clinicaltrials.gov. Digital health, and digital therapeutics in particular, complement N-of-1 RCTs and observational studies by providing relevant individualized health data from, for example, worn sensors, implants, regular lab assays, or -omics sequencing. Such data can be compared to population-health databases to target a patient’s strongest possible treatment option (as in cancer-risk studies) and, in turn, inform the design of an N-of-1 RCT to evaluate it. Digital health data can also be continuously monitored during the study itself and used to help tailor a treatment to the needs and preferences of patients in real time.
This Research Topic will cover digital health applications, delivery, and analysis of N-of-1 RCTs and observational studies (including self-studies) in any health discipline. The focus is on:
mobile health (mHealth) and applications (apps),
wearable devices, sensors and implants,
real-time tracking, data analytics and online registries,
patient experience of digital health and mobile health, patients as collaborators in personalised medicine, self-tracking in citizen science, etc.
The articles can be original research, methodology pieces, opinion pieces, reviews, systematic reviews, protocols, short reports, or case studies.
Effects of sleep deprivation on blood glucose, food cravings, and affect in a non-diabetic: An n-of-1 randomized pilot study
EJ Daza, K Wac, M Oppezzo. Healthcare.
Sleep deprivation is a prevalent and rising health concern, one with known effects on blood glucose (BG) levels, mood, and calorie consumption. However, the mechanisms by which sleep deprivation affects calorie consumption (e.g., measured via self-reported types of craved food) are unclear, and may be highly idiographic (i.e., individual specific). Single-case or “n-of-1” randomized trials (N1RT) are useful in exploring such effects by exposing each subject to both sleep deprivation and baseline conditions, thereby characterizing effects specific to that individual. We had two objectives: (1) To test and generate individual-specific N1RT hypotheses of the effects of sleep deprivation on next-day BG level, mood, and food cravings in two non-diabetic individuals; (2) To refine and guide a future n-of-1 study design for testing and generating such idiographic hypotheses for personalized management of sleep behavior in particular, and for chronic health conditions more broadly. We initially did not find evidence for an idiographic effect of sleep deprivation, but better-refined post hoc findings indicate that sleep deprivation may have increased BG fluctuations, cravings, and negative emotions. We also introduce an application of mixed-effects models and pancit plots to assess idiographic effects over time.
Jan 2019 (In Preparation)
Person as population: A longitudinal view of single-subject causal inference for analyzing self-tracked health data (Pre-Print)
EJ Daza. arXiv.
Single-subject health data are becoming increasingly available thanks to advances in self-tracking technology (e.g., mobile devices, apps, sensors, implants). Many users and health caregivers seek to use such observational time series data to recommend changing health practices in order to achieve desired health outcomes. However, there are few available causal inference approaches that are flexible enough to analyze such idiographic data. We develop a recently introduced framework, and implement a flexible random-forests g-formula approach to estimating a recurring individualized effect called the "average period treatment effect". In the process, we argue that our approach essentially resembles that of a longitudinal study by partitioning a single time series into periods taking on binary treatment levels. We analyze six years of the author's own self-tracked physical activity and weight data to demonstrate our approach, and compare the results of our analysis to one that does not properly account for confounding.
EJ Daza. Methods of Information in Medicine.
I'm very proud of this piece. It's clunky, lumbering, and overwrought. Still, I hope I did the source material justice in my first true (impostor-syndromic) attempt at telling an honest story of a single person's health habits through the language of doubt.
Conclusions. Causal analysis of an individual's time series data can be facilitated by an n-of-1 randomized trial counterfactual framework. However, for inference to be valid, the veracity of certain key assumptions must be assessed critically, and the hypothesized causal models must be interpretable and meaningful.
Thyroid cancer mortality is higher in Filipinos in the United States: An analysis using National Mortality Records from 2003 through 2012
ML Nguyen, J Hu, K Hastings, E Daza, M Cullen, L Orloff, L Palaniappan. Cancer.
Conclusions. Negative prognostic factors for thyroid cancer traditionally include age >45 years and male sex. The results of the current study demonstrate that Filipinos die of thyroid cancer at higher rates than NFA and NHW individuals of similar ages. Highly educated Filipinos and Filipino women may be especially at risk of poor thyroid cancer outcomes. Filipino ethnicity should be factored into clinical decision making in the management of patients with thyroid cancer.
Estimating inverse-probability weights for longitudinal data with dropout or truncation: The xtrccipw command
EJ Daza, MG Hudgens, AH Herring. The Stata Journal.
Individuals may drop out of a longitudinal study, rendering their outcomes unobserved but still well defined. However, they may also undergo truncation (for example, death), beyond which their outcomes are no longer meaningful. Kurland and Heagerty (2005, Biostatistics 6: 241–258) developed a method to conduct regression conditioning on nontruncation, that is, regression conditioning on continuation (RCC), for longitudinal outcomes that are monotonically missing at random (for example, because of dropout). This method first estimates the probability of dropout among continuing individuals to construct inverse-probability weights (IPWs), then fits generalized estimating equations (GEE) with these IPWs. In this article, we present the xtrccipw command, which can both estimate the IPWs required by RCC and then use these IPWs in a GEE estimator by calling the glm command from within xtrccipw. In the absence of truncation, the xtrccipw command can also be used to run a weighted GEE analysis. We demonstrate the xtrccipw command by analyzing an example dataset and the original Kurland and Heagerty (2005) data. We also use xtrccipw to illustrate some empirical properties of RCC through a simulation study.
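A generic sketch of how inverse-probability weights are commonly built under monotone dropout: the weight at each visit is the inverse of the cumulative probability of still being in the study. This illustrates the general idea only and does not reproduce the internals of xtrccipw; the function name and the visit-specific probabilities are hypothetical, and in practice those probabilities would come from a fitted dropout model.

```python
def cumulative_ipw(stay_probs):
    """Given per-visit probabilities of remaining in the study (each conditional
    on having remained so far), return the inverse-probability weight per visit."""
    weights = []
    cumulative = 1.0
    for p in stay_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError("each probability must be in (0, 1]")
        cumulative *= p  # probability of remaining through this visit
        weights.append(1.0 / cumulative)
    return weights

# Hypothetical probabilities for one participant at visits 1 to 3.
weights = cumulative_ipw([1.0, 0.8, 0.5])
print([round(w, 3) for w in weights])
```

A participant observed at a visit that few similar participants reach gets a larger weight, standing in for those who dropped out earlier.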
AP Keil, EJ Daza, SM Engel, JP Buckley, JK Edwards. Statistical Methods in Medical Research.
Epidemiologists often wish to estimate quantities that are easy to communicate and correspond to the results of realistic public health interventions. Methods from causal inference can answer these questions. We adopt the language of potential outcomes under Rubin’s original Bayesian framework and show that the parametric g-formula is easily amenable to a Bayesian approach. We show that the frequentist properties of the Bayesian g-formula suggest it improves the accuracy of estimates of causal effects in small samples or when data are sparse. We demonstrate an approach to estimate the effect of environmental tobacco smoke on body mass index among children aged 4–9 years who were enrolled in a longitudinal birth cohort in New York, USA. We provide an algorithm and supply SAS and Stan code that can be adopted to implement this computational approach more generally.
CG Brown-Johnson, A Burbank, EJ Daza, A Wassmann, A Chieng, GW Rutledge, JJ Prochaska. American Journal of Preventive Medicine.
Conclusions. Examination of online patient–provider communications provides insight into consumer health experience with emerging alternative tobacco products. Patient concerns largely related to harms and safety, and patients preferred provider responses positively inclined toward e-cigarettes. Lacking conclusive evidence of e-cigarette safety or efficacy, healthcare providers encouraged smoking cessation and recommended first-line cessation treatment approaches.
Likelihood of unemployed smokers vs nonsmokers attaining reemployment in a one-year observational study
JJ Prochaska, AK Michalek, C Brown-Johnson, EJ Daza, M Baiocchi, N Anzai, A Rogers, M Grigg, A Chieng. JAMA Internal Medicine.
Conclusions and Relevance. To our knowledge, this is the first study to prospectively track reemployment success by smoking status. Smokers had a lower likelihood of reemployment at 1 year and were paid significantly less than nonsmokers when reemployed. Treatment of tobacco use in unemployment service settings is worth testing for increasing reemployment success and financial well-being.