How has GenBank learned from its mistakes?
Currently the database is full of evidence of the perils of over-confident gene characterization. It takes some work to interpret a sequence annotated as containing the “complete cds” of a gene when another record for the same gene has an even longer coding region. Do these sequences represent different splice variants, or is the shorter one just wrong? Even more curious are sequences that cannot be aligned to the reference genome. It is possible that both are correct, and everything will become clear as we learn more about genome variation, but what should GenBank do for now? Officially, submitters themselves should make any needed revisions or updates, but what if they don’t? This is an issue for all publicly submitted data, so perhaps GenBank is doing about as well as its cousins. Except in the early days, when GenBank curators did the sequence-gathering, each sequence was contributed by a member of the research community, not by GenBank. If GenBank’s purpose is to include all