It can be hard to navigate the different guidelines for labeling. There isn’t one “right answer” for how to label demographic variables. It’s important for researchers to think through what is right for their specific project.
This guide provides reflection questions that may help with this process.
Remember: demographic labeling is a choice with real consequences.
Biorepository research can affect medical care, drug development, public policy, and how resources are shared. Researchers should work to label and sort groups with care and attention.
Do you need to use social or demographic labels at all?
The National Academies of Sciences, Engineering, and Medicine (NASEM) suggests that researchers should “directly evaluate the environmental factors or exposures” that are relevant to the study. They should do this instead of using population descriptors as proxies, or stand-ins, for those factors (1).
What labeling schema was used in the original dataset(s)?
Could this schema lack accuracy or detail?
For example, maybe the dataset lists people only as “male” or “female.” Think about how this schema could create stigma or make research less useful for people whose gender does not fit those labels.
Can you update the schema to use community-informed approaches?
Can you do this without affecting the integrity of the dataset?
Researchers reusing repository data may feel limited by the original labeling schema. But there may be methodologically sound ways to improve it.
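One possible approach is to keep the original label exactly as it was recorded and add the updated label in a separate field. The sketch below shows the idea. It is only an illustration: the field names (`gender`, `gender_revised`, `gender_revised_source`) and the function `add_revised_label` are made up for this example and do not come from any real repository or standard.

```python
from typing import Optional


def add_revised_label(record: dict, self_reported: Optional[str] = None) -> dict:
    """Return a copy of a record with a revised label added alongside the original.

    Assumptions (hypothetical, for illustration only):
    - the original schema stores its label under the key "gender"
    - newer, community-informed information may arrive as a self-reported value
    """
    updated = dict(record)  # work on a copy so the source record is not changed

    if self_reported is not None:
        # Newer information exists: store it next to the original label.
        updated["gender_revised"] = self_reported
        updated["gender_revised_source"] = "self-report"
    else:
        # No newer information: say so, rather than guessing from the old label.
        updated["gender_revised"] = "not collected"
        updated["gender_revised_source"] = "original schema only"

    return updated


# Example use with a made-up record.
original = {"id": "P001", "gender": "female"}
print(add_revised_label(original))
print(add_revised_label(original, self_reported="nonbinary"))
```

In this sketch the original value is never overwritten, so other researchers can still see what the dataset first recorded. Missing information is marked as missing instead of being inferred from the old label.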
What labeling schema is the most relevant to your research question?
For example, there may or may not be reasons to subdivide the demographic label “women” into “cisgender” and “transgender.” Your choice will depend on your specific research question. If hormone use, age at puberty, or certain social factors matter to your study, you might need to subdivide this demographic label.
How will you acknowledge your limitations?
If you plan to publish your study, think about how you can clearly and openly explain the limitations of your labeling schema.