The intern will construct and quality-control longitudinal electronic health record (EHR) analytic files and conduct descriptive analyses of patient demographics and clinical characteristics. They will also produce documented code, draft tables and figures, and contribute to written deliverables for the chronic kidney disease (CKD) study.
About the Project
The CKD study uses 5 years of structured EHR data from a large private nephrology practice with over 50,000 patients. The study aims to:
- Build standardized, analysis-ready analytic files (SAFs) spanning 2021–2025
- Assess feasibility of longitudinal data elements (labs, prescriptions, disease history)
- Characterize CKD patients using contemporary clinical and treatment data
- Evaluate the availability of specific variables (imaging, genetics, family history) in unstructured clinical records
The intern will be embedded in an active project team that includes biostatisticians, epidemiologists, data scientists, and clinical nephrologists, and will contribute to analytic work from day one.
Key Responsibilities
- Contribute to construction and QC of longitudinal electronic health record (EHR) analytic files using structured data
- Conduct descriptive analyses of patient demographics, lab values, medication use, and clinical characteristics
- Summarize data availability, follow-up patterns, and measurement frequency across CKD subgroups
- Support feasibility assessments by generating counts, proportions, and distributional summaries
- Produce clean, well-documented analytic code and contribute to draft tables and figures
- Participate in biweekly internal team meetings and client meetings, and contribute to written deliverables
Qualifications
Required:
- Currently enrolled in a graduate program (MPH, MS, PhD, or equivalent) in biostatistics, epidemiology, data science, health informatics, or a related field
- Proficiency in Python, R, or SAS for data manipulation and descriptive analysis
- Comfort working with large, messy, real-world datasets
- Strong attention to detail and ability to write clean, reproducible, well-commented code
- Ability to work independently with remote supervision
- Comfort using AI-assisted coding tools (e.g., Claude, GitHub Copilot)
Preferred:
- Familiarity with EHR data or claims-based data
- Experience with longitudinal data structures (e.g., repeated lab measurements, time-to-event)
- Experience with version control (Git)
Position Details
- Duration: approximately July 1 – August 29, 2026 (flexible start; contingent on contract execution)
- Hours: full-time (~40 hrs/week) or near full-time
- Location: fully remote; no travel required
- Compensation: paid internship (rate commensurate with experience)
- Supervisor: Brian Bieber, MS, Research Scientist, Data Science
How to Apply
Submit a CV and a brief cover letter (1 page max) describing your relevant experience and availability. Applications will be reviewed on a rolling basis; early submission is strongly encouraged given the July start date.
Pay
$27 USD per hour