SNP rs4247888 maps to two positions, but the dbSNP XML files show this SNP in ds_ch4.xml and not in ds_chMulti.xml. What am I missing?
If a SNP hits once or twice on the same chromosome, it is assigned to the file for that chromosome; if it hits more than twice, or hits on different chromosomes, it goes to ds_chMulti.xml. This rule was designed to account for possible fragment redundancy in unfinished parts of the genomic sequence, which is why a SNP such as rs4247888, with two hits on chromosome 4, appears in ds_ch4.xml rather than in ds_chMulti.xml. There is one notable exception: SNPs that hit in the pseudo-autosomal region on Y are recorded in both the X and Y chromosome files. We also track hits to the reference genome as well as to the alternate assemblies and haplotypes. When a SNP hits on several alternate scaffolds, we record it several times, but we consider only the number of distinct loci when deciding whether to assign it to chMulti.
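The assignment rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not dbSNP's actual code. The `Hit` class, the `in_par` flag, and the `assign_files` function are hypothetical names; the sketch assumes that hits on alternate scaffolds of the same locus have already been given the same coordinate, and that file names follow the `ds_ch<chromosome>.xml` pattern seen in the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One mapping of a SNP (hypothetical structure, for illustration only)."""
    chrom: str            # chromosome label, e.g. "4", "X", "Y"
    pos: int              # locus coordinate; assumed identical for hits on
                          # alternate scaffolds that represent the same locus
    in_par: bool = False  # assumed flag: hit lies in the pseudo-autosomal region


def assign_files(hits):
    """Return the list of XML file names a SNP would be written to."""
    if not hits:
        raise ValueError("a SNP needs at least one hit")

    # Only distinct loci count, however many scaffolds each one was seen on.
    loci = {(h.chrom, h.pos) for h in hits}
    chroms = {chrom for chrom, _ in loci}

    # Exception: pseudo-autosomal SNPs are recorded in both X and Y files.
    if chroms == {"X", "Y"} and all(h.in_par for h in hits):
        return ["ds_chX.xml", "ds_chY.xml"]

    # One or two distinct loci on a single chromosome: that chromosome's file.
    if len(chroms) == 1 and len(loci) <= 2:
        return [f"ds_ch{next(iter(chroms))}.xml"]

    # More than two loci, or loci on different chromosomes.
    return ["ds_chMulti.xml"]


# Illustrative coordinates only; they are not real dbSNP positions.
print(assign_files([Hit("4", 1000), Hit("4", 2000)]))
print(assign_files([Hit("4", 1000), Hit("7", 5000)]))
```

Under these assumptions, a SNP with two distinct loci on chromosome 4 is assigned to ds_ch4.xml, while the same SNP with a third locus, or with a locus on another chromosome, would go to ds_chMulti.xml.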
Related Questions
- I’m using the data exchange format of dbSNP’s XML files and noticed that there are two allele values ("N" and "+") that I haven’t seen before. What do these values represent?
- I have just downloaded the b125 XML files from the dbSNP ftp site and can’t find the population/frequency information. Why did you take it out?
- In the dbSNP XML files, where do I find the number of SNPs in coding vs. non-coding regions?