In the dbSNP XML files, where do I find the number of SNPs in coding vs. non-coding regions?
The gene information is encoded in the XML only for those genes to which we are able to map SNPs in your organism of interest. Use the function class annotation described in the docsum_2005 schema to determine whether a SNP is coding. To get the coding counts, group the coding-synon, coding-nonsynon, and reference function classes together. To get the non-coding counts, group the mrna-utr, intron, splice-site, and locus-region function classes together. You can also obtain these counts for all genes in the human genome using Entrez SNP: select the Preview/Index function, then set the Function Class limits at the top of the form, under the Limits link.
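The grouping described above can be sketched in Python with the standard library's streaming XML parser. This is a minimal sketch, not an NCBI tool. It assumes that each SNP is an `Rs` element carrying an `rsId` attribute and that its function classes are stored in a `fxnClass` attribute on nested `FxnSet` elements. Check these element and attribute names against the docsum_2005 schema for your files before relying on the counts. The helper names `local_name` and `count_coding_noncoding` are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Function-class groupings taken from the FAQ answer.
CODING = {"coding-synon", "coding-nonsynon", "reference"}
NON_CODING = {"mrna-utr", "intron", "splice-site", "locus-region"}


def local_name(tag):
    """Strip a '{namespace}' prefix so matching works with or without a namespace."""
    return tag.rsplit("}", 1)[-1]


def count_coding_noncoding(source):
    """Return (coding, non_coding) counts of distinct rs IDs.

    ASSUMPTION: each SNP is an <Rs rsId="..."> element, and its function
    classes appear on descendant <FxnSet fxnClass="..."> elements.
    A SNP is counted in a group if at least one of its annotations falls
    in that group, so a SNP that is both coding and intronic counts in both.
    """
    coding, non_coding = set(), set()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "Rs":
            continue
        rs_id = elem.get("rsId")
        classes = {e.get("fxnClass") for e in elem.iter() if local_name(e.tag) == "FxnSet"}
        if classes & CODING:
            coding.add(rs_id)
        if classes & NON_CODING:
            non_coding.add(rs_id)
        elem.clear()  # release the finished SNP record; chromosome files are large
    return len(coding), len(non_coding)


if __name__ == "__main__":
    # Hypothetical, heavily simplified records for illustration only.
    sample = b"""<ExchangeSet>
      <Rs rsId="1"><FxnSet fxnClass="coding-nonsynon"/><FxnSet fxnClass="intron"/></Rs>
      <Rs rsId="2"><FxnSet fxnClass="mrna-utr"/></Rs>
      <Rs rsId="3"><FxnSet fxnClass="reference"/></Rs>
    </ExchangeSet>"""
    print(count_coding_noncoding(io.BytesIO(sample)))  # (2, 2)
```

Counting distinct rs IDs rather than raw annotations avoids inflating the totals when a SNP falls in several transcripts. If you prefer each SNP to land in exactly one group, give coding classes priority and use `elif` for the non-coding check.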