The human genome is estimated to contain one single nucleotide polymorphism (SNP) every 300 base pairs. Linkage disequilibrium (LD) between SNP markers can be exploited to reduce genotyping cost through appropriate SNP tagging strategies, whereas absent or low LD between markers generally increases genotyping cost. A large proportion of the tagging SNPs in a tagging scheme commonly turn out to be singleton SNPs, that is, SNPs that tag only themselves rather than contributing power to the rest of a region. If genotyping cost is a major concern, as is often the case at present for genome-wide association studies, these singleton tagging SNPs are the primary candidates for removal from genotyping. It is important, however, to understand the characteristics of such SNPs and to estimate the impact of removing them from a study. Using HapMap genotype data and genome-wide expression data, we assessed the distribution and functional implications of singleton SNPs in the human genome. Our results demonstrated that SNPs of potentially higher functional importance (e.g., nonsynonymous SNPs, SNPs in splice sites, and SNPs in 5′ and 3′ UTRs) are more likely to be singleton SNPs than SNPs in intronic and intergenic regions. We further assessed whether singleton SNPs can be tagged using haplotypes of tagSNPs on three genome-wide chips, namely the Affymetrix GeneChip 500k and the Illumina HumanHap300 and HumanHap550, and discussed the general implications for genetic association studies.
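To make the notion of a singleton SNP concrete, the sketch below flags SNPs that have no partner in high LD, since such SNPs can only tag themselves under any pairwise tagging scheme. It is an illustration under stated assumptions, not the procedure used in the study: the r² threshold of 0.8, the use of genotype allele counts (0/1/2) as a stand-in for phased haplotypes, the function names, and the toy SNP identifiers are all hypothetical choices made for this example.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors.

    Genotypes are coded as minor-allele counts (0, 1, 2). Using genotype
    correlation instead of haplotype-based r^2 is an assumption of this sketch.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def singleton_snps(genotypes, threshold=0.8):
    """Return the SNPs that reach r^2 >= threshold with no other SNP.

    `genotypes` maps a SNP id to its genotype vector across samples.
    The default threshold of 0.8 is an assumed, illustrative cutoff.
    """
    has_partner = {snp: False for snp in genotypes}
    for a, b in combinations(genotypes, 2):
        if r_squared(genotypes[a], genotypes[b]) >= threshold:
            has_partner[a] = has_partner[b] = True
    return [snp for snp, paired in has_partner.items() if not paired]


# Hypothetical toy data: eight samples, three SNPs.
toy_genotypes = {
    "snp_a": [0, 1, 2, 0, 1, 2, 0, 1],
    "snp_b": [0, 1, 2, 0, 1, 2, 0, 1],  # identical to snp_a, so in perfect LD
    "snp_c": [2, 0, 1, 1, 0, 2, 1, 0],  # nearly uncorrelated with the others
}

print(singleton_snps(toy_genotypes))  # ['snp_c']
```

In this toy region, either `snp_a` or `snp_b` could serve as a tag for the other, whereas `snp_c` would have to be genotyped directly to be captured. This is the cost trade-off the abstract describes.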
Journal: European Journal of Human Genetics
Published: Apr 2008