Because dbSNP contains Dr. Jim Mullikin’s double-hit SNPs, can you tell me what method he uses to determine these double-hits?
In an email, Dr. Mullikin described his double-hit method (reprinted with permission): First, I align the following sequences to the human reference sequence: all human traces from the trace archive; all clone sequences not used in the reference sequence; cDNA sequence; the Celera WGSA assembly; and Celera reads from non-donor B individuals. For any rsIDs, I look at the alignment, and count how many times I see each allele. If I see each allele two or more times in different DNAs, I classify it as a double-hit SNP. I also use chimp to promote an allele from a count of one to two. For example, let’s say for an A/G SNP, A is seen in human DNA seven times, and G once. Then, if chimp is a G, it becomes a double-hit SNP. Also, if the chimp sequence is polymorphic, or does not agree with either human allele, the chimp allele(s) is not used. For human DNA, if the sequence comes from a single individual, I do not allow that individual to contribute to the allele counts more than once per allel
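The counting rules in the description above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Dr. Mullikin's actual code: the function names (`allele_counts`, `is_double_hit`), the `(donor_id, allele)` input format, and the donor labels are hypothetical, and the alignment step itself is assumed to have already produced the per-read allele observations. The sketch also assumes the chimp allele is applied only when an allele's human count is exactly one, as the wording "from a count of one to two" suggests.

```python
from collections import defaultdict

def allele_counts(snp_alleles, human_obs, chimp_alleles=()):
    """Count each SNP allele across distinct human DNAs, then apply the chimp rule.

    snp_alleles   -- the alleles of the SNP, e.g. ("A", "G")
    human_obs     -- iterable of (donor_id, allele) pairs (hypothetical format)
    chimp_alleles -- alleles seen in chimp at this position (may be empty)
    """
    # A set per allele means one individual counts at most once per allele.
    donors = defaultdict(set)
    for donor_id, allele in human_obs:
        if allele in snp_alleles:
            donors[allele].add(donor_id)
    counts = {a: len(donors[a]) for a in snp_alleles}

    # Chimp is used only if it is not polymorphic and matches a human allele.
    chimp = set(chimp_alleles)
    if len(chimp) == 1:
        (chimp_allele,) = chimp
        # Assumption: promotion applies only to a count of exactly one.
        if counts.get(chimp_allele) == 1:
            counts[chimp_allele] = 2
    return counts

def is_double_hit(snp_alleles, human_obs, chimp_alleles=()):
    """True if every allele is seen two or more times after the chimp rule."""
    counts = allele_counts(snp_alleles, human_obs, chimp_alleles)
    return all(n >= 2 for n in counts.values())

# The A/G example from the text: A seen in seven different DNAs, G in one.
obs = [(f"donor{i}", "A") for i in range(7)] + [("donor7", "G")]
print(is_double_hit(("A", "G"), obs, chimp_alleles=["G"]))       # True
print(is_double_hit(("A", "G"), obs))                            # False
print(is_double_hit(("A", "G"), obs, chimp_alleles=["A", "G"]))  # False
```

Keeping a set of donor identifiers per allele, rather than a running tally, is one simple way to make repeated reads from the same individual collapse into a single observation.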