Understanding the structure and frequencies of haplotypes is important for associating genetic polymorphisms with a given trait and for inferring the genetic genealogy of alleles in a population. Single nucleotide polymorphism (SNP) haplotypes can be determined without ambiguity when an individual does not have more than one heterozygous site in a given genomic region. Using genome-wide SNP genotypes for 3397 individuals from the Japanese population, we detected SNP homozygotes in the genomic regions of 1955 genes, determined haplotypes, and examined the efficiency of haplotype frequency estimation based on the proportion of SNP homozygotes in the sample. The estimated haplotype frequencies were very similar to the frequencies obtained by two statistical methods, PHASE and SNPHAP. We applied this approach to the genomic regions of 11 351 genes, and the results suggested that the sum of the frequencies of unobserved haplotypes is negligible for an analysis of a 100 kb genomic region with 20 SNPs. Determination of haplotypes from homozygotes using genotype data from thousands of individuals, without a long computation time, appears to be useful for detecting real haplotypes including some low-frequency haplotypes. In addition, the unambiguously determined haplotypes with their estimated frequencies can be used as a catalog of haplotypes for the population, which is useful for the design of genome-wide association studies.
- haplotype frequency
- linkage disequilibrium
- single-nucleotide polymorphisms