Quick Links
Figure 1: Number of AFC patients by county
Figure 2: Comparison of factor weights for 6 social deprivation measures
About the Data
Data Overview
The American Family Cohort (AFC) data are derived from the American Board of Family Medicine (ABFM) PRIME Registry. The PRIME Registry is a Qualified Clinical Data Registry (QCDR) for primary care and provides tools to evaluate primary care practice performance, support population health and risk stratification, improve primary care practice as well as patient outcomes, and alleviate Centers for Medicare and Medicaid Services (CMS) reporting burden for their payment programs. As of January 1, 2023, the PRIME Registry, certified by CMS in 2016, includes over 3,000 active clinicians across 50 states and data on over eight million patients, dating back to 2010. The PRIME Registry is the largest clinical registry for primary care in the nation.
PRIME includes both structured and unstructured data, typical of disparate EHRs. Data elements include patient demographics, diagnoses, interventions such as medications and therapies, encounter-specific data, patient-reported outcomes (PROs), and some limited clinician-specific details. All data are collected during the routine assessment of clinical care, the main goals of which are to support practice-specific quality improvement activities and CMS-specific quality reporting for payment. The PRIME Registry includes National Quality Forum (NQF)-endorsed measures and a patient-reported outcome (PRO) measure tool that aids in tracking practice performance.
The American Family Cohort or AFC is a research dataset derived from the PRIME Registry. There are several versions curated for different research use cases.
Detailed sociodemographic data
Racially and ethnically diverse data. As of May 1, 2023, AFC includes over 500,000 Black patients, 150,000 Asian patients, 50,000 Native American and Alaska Native patients and 16,000 Native Hawaiian and Pacific Islander patients. The remaining 4.8 million patients are White, and over 750,000 patients have identified as Hispanic or Latino. The diversity is a major strength for addressing research questions which focus on underserved and marginalized populations. AFC also includes disaggregated racial and ethnic data.
AFC contains data on nearly 1 million children, including a large number from minority populations from practices in rural areas of the US, allowing for analyses across the life course, for various types of family units and across generations in diverse groups.
Granular time and geography variables: A particular strength of AFC is the inclusion of granular time and geographic variables which give researchers the ability to explore questions about social, economic, environmental and structural determinants of health in a rigorous way.
Ancillary Datasets
The AFC Consortium has prepared five social deprivation indexes and made them available with accompanying crosswalk files so that they can be overlaid by geographic unit (census tract, ZIP code or PUMA) on health or other outcome data.
Although there is considerable overlap between the factors used in these measures, there are important differences both in which elements are used and in how they are weighted. The AFC Consortium can help you select the right index for your project.
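As a minimal sketch of how such an overlay might work, the following Python example carries a tract-level deprivation score over to ZIP codes through a crosswalk and then attaches it to patient-level outcome records. All layouts, column names, identifiers and values are hypothetical and for illustration only; they do not reflect the AFC Consortium's actual crosswalk files or index definitions.

```python
from collections import defaultdict

# Hypothetical crosswalk rows: ZIP code -> census tract, with the share of the
# ZIP's residents living in that tract (made-up identifiers and weights).
crosswalk = [
    {"zip": "94305", "tract": "06085511600", "weight": 0.75},
    {"zip": "94305", "tract": "06085511700", "weight": 0.25},
    {"zip": "27701", "tract": "37063002200", "weight": 1.0},
]

# Hypothetical tract-level deprivation scores (higher = more deprived).
tract_index = {
    "06085511600": 10.0,
    "06085511700": 30.0,
    "37063002200": 60.0,
}

def zip_level_index(crosswalk, tract_index):
    """Return the weighted average of tract scores for each ZIP code."""
    totals = defaultdict(float)
    weights = defaultdict(float)
    for row in crosswalk:
        score = tract_index.get(row["tract"])
        if score is None:
            continue  # tract has no index value; leave it out of the average
        totals[row["zip"]] += row["weight"] * score
        weights[row["zip"]] += row["weight"]
    return {z: totals[z] / weights[z] for z in totals if weights[z] > 0}

def overlay(records, zip_index):
    """Attach a deprivation score to each record by ZIP code (None if unmatched)."""
    return [{**r, "deprivation": zip_index.get(r["zip"])} for r in records]

# Hypothetical outcome records keyed by ZIP code.
patients = [
    {"patient_id": 1, "zip": "94305", "has_diabetes": True},
    {"patient_id": 2, "zip": "27701", "has_diabetes": False},
    {"patient_id": 3, "zip": "00000", "has_diabetes": False},
]

zip_index = zip_level_index(crosswalk, tract_index)
linked = overlay(patients, zip_index)
```

Records whose ZIP code is missing from the crosswalk keep a score of None rather than being dropped, so unmatched patients remain visible when checking linkage coverage.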
Data Elements
The AFC data include key patient and primary care attributes of interest. These can be divided into five major categories: (1) health-need attributes, (2) race/ethnicity, (3) social deprivation indices, (4) regional and county-level resource indicators, and (5) clinical presentation and treatment.
Health-need Attributes: Patient attributes include sex at birth, age, diagnoses of chronic conditions via encounter-specific diagnosis codes, and visit history. Diagnoses of chronic conditions are assigned using International Classification of Diseases, 10th Revision (ICD-10) diagnosis codes.
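To illustrate how chronic conditions might be flagged from encounter-level ICD-10 codes, here is a minimal Python sketch. The prefix-to-condition mapping and the encounter rows are assumptions made for this example; they are not the condition definitions used in AFC.

```python
# Illustrative mapping from ICD-10 code prefixes to chronic condition labels.
# This is an assumption for the example, not the AFC condition definitions.
CHRONIC_PREFIXES = {
    "E11": "type 2 diabetes",
    "I10": "hypertension",
    "J45": "asthma",
}

def normalize(code):
    """Uppercase a code and drop the dot, e.g. 'e11.9' -> 'E119'."""
    return code.strip().upper().replace(".", "")

def chronic_conditions(encounters):
    """Map each patient_id to the set of chronic conditions seen in any encounter."""
    flags = {}
    for enc in encounters:
        code = normalize(enc["icd10"])
        for prefix, label in CHRONIC_PREFIXES.items():
            if code.startswith(prefix):
                flags.setdefault(enc["patient_id"], set()).add(label)
    return flags

# Hypothetical encounter-level diagnosis rows.
encounters = [
    {"patient_id": 1, "icd10": "E11.9"},
    {"patient_id": 1, "icd10": "I10"},
    {"patient_id": 2, "icd10": "j45.909"},
    {"patient_id": 3, "icd10": "Z00.00"},
]

flags = chronic_conditions(encounters)
```

Patients with no matching code, such as patient 3 here, simply do not appear in the result.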
American Family Cohort Versions
PHS models our data naming conventions and standards after those used by the Centers for Medicare and Medicaid Services and the Observational Medical Outcomes Partnership (OMOP). The OMOP Common Data Model (CDM) is an open community data standard, designed to standardize the structure and content of observational data and to enable efficient analyses that can produce reliable evidence. We encourage the use of OMOP versions of AFC to aid in research reproducibility.
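As a rough illustration of what working with OMOP-structured data can look like, the sketch below builds a tiny in-memory SQLite database with simplified versions of two OMOP CDM tables, person and condition_occurrence, and counts distinct patients with a given condition concept. Only a few columns of each table are included, the rows and concept IDs are placeholder values, and the helper function name is hypothetical; nothing here reflects the actual AFC schema or access environment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified subsets of two OMOP CDM tables (only a few columns shown).
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        year_of_birth INTEGER
    );
    CREATE TABLE condition_occurrence (
        condition_occurrence_id INTEGER PRIMARY KEY,
        person_id INTEGER,
        condition_concept_id INTEGER,
        condition_start_date TEXT
    );
""")

# Made-up patients for the example.
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [(1, 1960), (2, 1985), (3, 2010)],
)

# Placeholder concept ID; a real analysis would look it up in the OMOP vocabulary.
EXAMPLE_CONCEPT_ID = 12345

conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
    [
        (10, 1, EXAMPLE_CONCEPT_ID, "2019-03-01"),
        (11, 1, EXAMPLE_CONCEPT_ID, "2021-07-15"),  # same patient, second record
        (12, 2, EXAMPLE_CONCEPT_ID, "2020-01-20"),
        (13, 3, 99999, "2022-05-05"),               # a different placeholder concept
    ],
)

def count_patients_with_condition(conn, concept_id, born_before):
    """Count distinct persons born before a given year with the given condition concept."""
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT p.person_id)
        FROM person AS p
        JOIN condition_occurrence AS c ON c.person_id = p.person_id
        WHERE c.condition_concept_id = ? AND p.year_of_birth < ?
        """,
        (concept_id, born_before),
    ).fetchone()
    return row[0]

n_patients = count_patients_with_condition(conn, EXAMPLE_CONCEPT_ID, 2000)
```

Because the table and column names follow a shared standard, a query written this way could in principle be rerun against other OMOP-formatted datasets with little change, which is the reproducibility benefit described above.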
Strengths of the American Family Cohort Data
The AFC data are ideal for projects in primary care that require a national focus, particularly around policy, chronic disease or other common conditions. The granular geography and work conducted by the AFC Consortium in preparing data for research on social and environmental exposures make these data ideal for studying health impacts of climate change, air quality and other environmental exposures.
Sociodemographic data, including race and ethnicity, allow researchers to conduct research in health equity, access to care and other similar topics.
Versions of AFC
As noted above, our data categorizations are modeled on the Centers for Medicare and Medicaid Services categories and requirements for use. The de-identified AFC data in OMOP are sufficient for most projects. If you require additional variables, we will work with you to ensure you are able to complete your project.
AFC OMOP DID: These data have been transformed into the OMOP common data model and de-identified. They may be accessed by any researcher who has completed all requirements, including approvals and security and regulatory certifications. These data are sufficient for most research projects.
AFC OMOP LDS: These data have additional variables not available in the DID data, such as more granular date and geography variables. These data are suitable for studies that require overlay of social, environmental or similar exposures.
AFC OMOP RIF: These data have had person-readable identifiers removed but retain other information such as granular dates, geography and detailed health data. These data are highly restricted and require significant additional regulatory and security steps to access. Access to select variables may be granted as necessary.
AFC LDS and RIF including de-identified notes: For research which requires data that have not been transformed to OMOP, there are versions of the data that map to these categories. Please schedule an appointment with the AFC Consortium to discuss your research question: phsofficehours.stanford.edu
Getting access to the AFC Data
You can find detailed information on how to get access to the AFC data here: