It is commonly known that most genomic databases are biased toward people with European ancestry. Scientists have warned that leaving out other populations could skew results in areas such as drug development, diagnostic testing, and polygenic risk scores, which look at many genetic variations in a person's DNA to predict their disease risk.
Now, researchers at the Institute for Genome Sciences (IGS) at the University of Maryland School of Medicine (UMSOM) have developed a broad and deep genomic database of Latin Americans. The database gathers genome-wide data on Latin American populations into a single source and allows other scientists to easily add that population to their own research studies without straining budgets. Cell Genomics published their work on Oct. 31.
The researchers define Latin Americans as those with heritage from Spanish- or Portuguese-speaking countries in the Americas. In the United States, Latin Americans make up 18% of the population and are the fastest-growing group, but they remain understudied in biomedical research, leading to health disparities in the population. Worldwide, the Latin American population is 656 million, or about 8.5% of the world's population.
The Genetics of Latin American Diversity Database (GLADdb) includes genome-wide information from almost 54,000 Latin Americans representing 46 geographical regions. The new resource explores patterns of Latin American population structure and allows other scientists to match genes to their external samples through GLAD-match, a new web tool the researchers developed. GLAD-match provides a strong foundation for the human genetics community as it moves from categorizing people broadly to viewing ancestry as a continuum.
"Treating Latin Americans as one homogenous group over-simplifies their genetic diversity and hinders efforts to improve population health and clinical treatment for many diseases," said Timothy O'Connor, PhD, the lead author of the paper, scientist at IGS, and Associate Professor of Medicine at UMSOM. "By promoting genomic research in Latin Americans, we are contributing to the promise of precision medicine for more people."
The team built GLADdb and GLAD-match by pulling Latin Americans' data from whole genome sequencing projects across the Americas, as well as from dbGaP (the database of Genotypes and Phenotypes), which was developed by the National Human Genome Research Institute (NHGRI) to archive and distribute data and results from genotype-phenotype studies in humans. Most genomic studies funded by NHGRI are required to submit their data to dbGaP upon publication of the results.
"Since we define 'heritage' as including culture, geography, and genetics, one of the most interesting parts of this research is that we were able to explore the distant genetic relatedness among Latin American countries through population structure and migration patterns," said Victor Borda, PhD, corresponding author on the paper and an IGS Research Associate. "For example, in the early 1900s social and economic problems caused a large migration out of Puerto Rico. We found that those who migrated to Hawaii were recognized as countryside people who traditionally farmed, while those who left Puerto Rico for New York represented a cross-section of economic and social classes."
Since Latin Americans represent only about 0.38% of participants in genome-wide association studies (GWAS), the researchers say that GLADdb should contribute significantly to the understanding of Latin American population genetics and genetic epidemiology when comparing external samples to those within the database.
"Our lack of knowledge of Latin American genetic diversity and environmental factors influencing the health has limited medicine's understanding of complex traits in this population," Dr. O'Connor added. "Our hope is that other scientists use GLADdb and GLAD-match in their research to help deepen our understanding and lessen disparities in the Latin American population during the evolving age of precision medicine."