I’m using the data exchange format of dbSNP’s XML files and noticed that there are two allele values (“N” and “+”) that I haven’t seen before. What do these values represent?

April 26, 2017

1 Answer

“N” has two meanings, depending on the context in which it is used.

The first context is allele frequency, where “N” means “indeterminate frequency”. For example, a submitter with a sample size of 120 chromosomes may submit A=40/C=78/N=2.

The second context is SNP FASTA sequence. Here, a variation is represented by one of the IUPAC letters A, C, M, G, R, S, V, T, W, Y, H, K, D, B, and N. If the variation cannot be represented by one of the first 14 letters, it is written as “N”. For example, all indels, microsatellites, and named variations are expressed as “N” in SNP FASTA sequences.

Some submitters in the past have used “+” to represent the insertion part of an indel SNP. You can get the real inserted sequence (relative to the deletion) from the variation in the SNP assay. We realize that allowing “+” may confuse users, so we are in the process of substituting the real “insertion” sequence for the “+” currently avai
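
The two uses of “N” described in this answer can be illustrated with a minimal Python sketch. It is not dbSNP code. The function names `fasta_symbol` and `parse_counts` are hypothetical, and treating the A=40/C=78/N=2 notation as a parseable string is an assumption made only for illustration. The lookup table is the standard IUPAC nucleotide ambiguity table.

```python
# Standard IUPAC ambiguity codes for one to three bases (the "first 14 letters").
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AC"): "M", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("CT"): "Y", frozenset("GT"): "K",
    frozenset("ACG"): "V", frozenset("ACT"): "H",
    frozenset("AGT"): "D", frozenset("CGT"): "B",
}


def fasta_symbol(alleles):
    """Return the single letter for a variation in a FASTA-style sequence.

    Hypothetical helper: any allele that is not a single A/C/G/T base
    (an indel, a repeat, a named variation) falls back to "N", as does
    any combination not covered by the 14 letters above.
    """
    alleles = [a.upper() for a in alleles]
    if any(len(a) != 1 or a not in "ACGT" for a in alleles):
        return "N"
    return IUPAC.get(frozenset(alleles), "N")


def parse_counts(text):
    """Parse a frequency string such as "A=40/C=78/N=2" into a dict.

    The string layout is assumed from the example in the answer.
    """
    counts = {}
    for part in text.split("/"):
        allele, number = part.split("=")
        counts[allele] = int(number)
    return counts


counts = parse_counts("A=40/C=78/N=2")
total = sum(counts.values())    # all chromosomes in the sample
indeterminate = counts["N"]     # chromosomes with indeterminate frequency
```

With these definitions, `fasta_symbol(["A", "G"])` gives "R", while an indel such as `fasta_symbol(["-", "AT"])` gives "N". The parsed counts for the example add up to the 120 chromosomes in the sample.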