I’m using the data exchange format of dbSNP’s XML files and noticed that there are two allele values (“N” and “+”) that I haven’t seen before. What do these values represent?
“N” has two meanings, depending on the context in which it is used.

The first context is allele frequency, where “N” means “indeterminate frequency”. For example, a submitter with a sample size of 120 chromosomes may submit A=40/C=78/N=2.

The second context is SNP FASTA sequence. Here, a variation is represented by one of the IUPAC letters A, C, M, G, R, S, V, T, W, Y, H, K, D, B, and N. If the variation cannot be represented by one of the first 14 letters, it is written as “N”. For example, all indels, microsatellites, and named variations are expressed as “N” in SNP FASTA sequences.

Some submitters in the past have used “+” to represent the insertion part of an indel SNP. You can get the actual inserted sequence (relative to the deletion) from the variation given in the SNP assay. We realize that allowing “+” may confuse users, so we are in the process of substituting the real “insertion” sequence for the “+” currently avai
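The two uses of “N” described above can be illustrated with a small Python sketch. It is not dbSNP code: the function names `parse_allele_counts` and `fasta_code` are hypothetical, the input string simply follows the A=40/C=78/N=2 example from the answer, and the lookup table is the standard IUPAC nucleotide ambiguity code.

```python
# Standard IUPAC nucleotide codes, keyed by the set of bases each letter stands for.
IUPAC_CODES = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AC"): "M", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("CT"): "Y", frozenset("GT"): "K",
    frozenset("ACG"): "V", frozenset("ACT"): "H",
    frozenset("AGT"): "D", frozenset("CGT"): "B",
    frozenset("ACGT"): "N",
}


def parse_allele_counts(text):
    """Parse a string such as 'A=40/C=78/N=2' into {'A': 40, 'C': 78, 'N': 2}.

    The format is assumed from the example in the answer; 'N' here is the
    count of chromosomes whose allele is indeterminate.
    """
    counts = {}
    for part in text.split("/"):
        allele, count = part.split("=")
        counts[allele.strip()] = int(count)
    return counts


def fasta_code(alleles):
    """Return the single FASTA letter for a variation's alleles.

    Any allele that is not a single base (indel, microsatellite, named
    variation, '+', '-') makes the whole variation 'N'.
    """
    if any(allele not in ("A", "C", "G", "T") for allele in alleles):
        return "N"
    return IUPAC_CODES.get(frozenset(alleles), "N")


counts = parse_allele_counts("A=40/C=78/N=2")
total = sum(counts.values())  # 120 chromosomes in the example
frequencies = {allele: n / total for allele, n in counts.items()}

print(total, frequencies)
print(fasta_code(["A", "G"]))   # a two-allele SNP maps to one ambiguity letter
print(fasta_code(["-", "AT"]))  # an indel falls back to N
```

In the first context “N” is a bucket for chromosomes of unknown allele, so it still counts toward the sample size; in the second it is the fallback letter for any variation that the other 14 IUPAC letters cannot express.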