What is the evidence for defining a CDS as a real protein?
Most protein sequences are derived from translations of CoDing Sequences (CDSs) obtained from gene predictions. A CDS is a region of DNA or RNA whose sequence determines the sequence of amino acids in a protein. It should not be confused with an Open Reading Frame (ORF), which is a series of DNA codons that does not contain any STOP codons. All CDSs are ORFs, but not all ORFs are CDSs.

Some of the predicted CDSs exhibit strong sequence similarity to known proteins in closely related species. For other proteins there is experimental evidence, such as Edman sequencing, clear identification by mass spectrometry (MS), X-ray or NMR structure, detection by antibodies, etc. However, for some other proteins there is no evidence at all. To indicate these different levels of evidence for the existence of a protein, we have introduced the PE (Protein Existence) line (see the protein existence criteria). Note that the PE line does not describe the accuracy or correctness of a seq