TY - JOUR
T1 - Establishment of a standardized system to perform population structure analyses with limited sample size or with different sets of SNP genotypes
AU - Kumasaka, Natsuhiko
AU - Yamaguchi-Kabata, Yumi
AU - Takahashi, Atsushi
AU - Kubo, Michiaki
AU - Nakamura, Yusuke
AU - Kamatani, Naoyuki
PY - 2010/8
Y1 - 2010/8
N2 - Recent studies have demonstrated that principal component analysis (PCA) can detect the presence of population mixture and admixture in a sample and thus can be used to correct population stratification in genome-wide association studies (GWAS). We propose a complementary approach to PCA that compensates for potential weaknesses associated with PCA, so that one can perform population structure analyses using limited numbers of subjects and single-nucleotide polymorphisms (SNPs). Our method first requires a PCA of the largest reference sample from a population to standardize the system. Once the system is established, it can perform PCA for each individual with a much smaller number of SNPs drawn from the same population. This is because of the introduction of the probabilistic PCA, so that the prediction of the principal components (PCs) is performed under a rigorous probabilistic framework. The subsequent linear discriminant analysis also helps to understand from which ancestries or subpopulations a given individual is more likely to derive, in terms of posterior probabilities given the predicted PCs. A real-world prototype of the system for the Japanese population is developed based on 19 260 subjects, which illustrates the potential usefulness of the system as an aid in the detection of population structures in validation samples, or to help with the correction of population stratification in GWAS.
KW - genome-wide SNP genotypes
KW - linear discriminant analysis
KW - population structure
KW - prediction
KW - probabilistic PCA
UR - http://www.scopus.com/inward/record.url?scp=77957563048&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77957563048&partnerID=8YFLogxK
U2 - 10.1038/jhg.2010.63
DO - 10.1038/jhg.2010.63
M3 - Article
C2 - 20555335
AN - SCOPUS:77957563048
SN - 1434-5161
VL - 55
SP - 525
EP - 533
JO - Journal of Human Genetics
JF - Journal of Human Genetics
IS - 8
ER -