Population genetics
Insight into the role of genetics in disease mechanisms initially relies on knowledge of the influence of genetics in healthy or undiagnosed individuals, that is, in the population as a whole. Population genetics attempts to unravel the genetic composition of populations1. Knowledge of the genetic variability in a population is essential for understanding the effect of genetic alterations on pathology2. Adequately modelling the full breadth of genetic variance in a population therefore requires very large samples. In addition, considerable resources are required to phenotype these samples across the range of domains that researchers need in order to study the trait of interest.
One of the main challenges in population genetics is how to define a “population”. Should one study only the population from which the target sample is drawn (e.g. SCZ patients in Norway), or should the definition follow cultural boundaries, geopolitical borders, ethnic boundaries, or some other criterion3? In the absence of unlimited resources, these choices are mostly driven by economic, political, and practical considerations. For instance, the government of a European country may be more willing to allocate resources to genotyping and phenotyping people living within its own borders than people living in a country on another continent. Inequality in the availability of population samples therefore follows the same patterns as those highlighted in psychiatry and mental health, and the largest population samples are still concentrated in the Global North. In the early years of GWAS4, and even today5,6, nearly 90% of studies focused on samples of White European ancestry, and more than 70% of studies used participants recruited in the United States, the United Kingdom, or Iceland4. As the field stands now, researchers discussing population genetics are most often discussing the genetics of a population of predominantly White European ancestry7,8.
The implication of these unbalanced samples is that findings do not necessarily generalize to the population as a whole. However, obtaining balanced samples that cover the diversity of a population demands a systematic reallocation of resources, and the imbalance will take many years to overcome. In the meantime, studies using samples that comprise predominantly White Europeans need to acknowledge that their findings cannot automatically be translated to other populations. That said, until resources are more equally allocated, the important work of studying human genetics relies on the largest and best-phenotyped resources currently available, such as the UK Biobank (UKB)9,10, HUNT11, and the FinnGen study12, although phenotypes in these resources can also deviate from those of the general population13.
Resources such as the UKB aim to capture the full scope of a population through pseudo-random selection of participants. The UKB attempts to attenuate inclusion bias by recruiting through National Health Service records and by operating assessment centers across the country with good public transport access and flexible opening hours14. Since its conception in 2007, the UKB has provided invaluable insight into disease progression in a number of domains15,16, including cardiology17, infectious diseases18, and mental health19,20. Recruitment focused on individuals aged 40 to 69, both to increase the likelihood that any potential diagnosis in a participant is recognized and registered and to limit the extent to which adverse health outcomes affect inclusion, e.g. through survival bias10. Nonetheless, the UKB sample is healthier than the general population on a number of measures, including body weight, alcohol intake, and tobacco use, and has lower mortality at ages 70 to 7413. The UKB collects a wide variety of data types, including genetics, questionnaires, blood markers, environmental factors, and neuroimaging scans using structural and functional sequences. Over the years, it has become one of the primary resources for genetic studies on a wide variety of phenotypes. The vast majority of applications to use UKB data request the genetic data, and mental health is listed among the top areas of interest in these applications21.