What is the evidence for defining a CDS as not a real protein?
The performance of gene prediction depends largely on current biological knowledge. We use bioinformatics tools to align the proposed CDS with the latest version of nucleic acid sequences (genomic and RNA/ESTs). We sometimes find that ORFs or CDSs have been wrongly predicted to code for proteins, mainly because of the presence of new longer or shorter RNAs (indicating a split or fused predicted gene), the absence of RNA (even in other species) and/or wrong intron/exon boundaries (in Eukaryota). Some other protein sequences may have been identified as pseudogenes in the literature. When there is enough evidence that these CDSs do not encode real proteins, we remove the corresponding entries from UniProtKB.