Why isn't there a 1-1 relationship between RM repeats and Repbase repeats?
Currently, RepeatMasker’s library and Repbase are not equivalent. RepeatMasker’s library contains a further level of curation in which sequences are optimized for searching with RepeatMasker. This optimization improves search time, selectivity, sensitivity, and clarity of annotation. LINE fragmentation is one example of this divergence. Since LINE copies tend to be 5′ truncated, full-length models of the detailed subfamily structure apparent in the well-represented and fast-evolving 3′ end are difficult to obtain. Rather than comparing the query to a large number of full-length (6-8 kb) consensus sequences that are identical except for the very 3′ end, we often fragment LINE models into domains (e.g. 5′ end, ORF2 region, and 3′ end), which are transparently merged in the RepeatMasker annotation as if the match had been made to a full-length consensus. Our nomenclature and fragmentation thus create a many-to-one relationship with Repbase full-length entries.
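As a rough illustration of the many-to-one idea, the Python sketch below maps several fragment models onto one full-length name and joins neighbouring hits into a single annotation. It is not RepeatMasker's actual merging logic, which is not described here: the model names (`L1Example_5end` and so on), the `Hit` record, the `merge_fragment_hits` function, and the `max_gap` threshold are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass

# Hypothetical many-to-one table: fragment model name -> full-length entry name.
# These names are invented for illustration and are not real library entries.
FRAGMENT_TO_FULL = {
    "L1Example_5end": "L1Example",
    "L1Example_orf2": "L1Example",
    "L1Example_3end": "L1Example",
}


@dataclass
class Hit:
    model: str  # name of the matched model
    start: int  # start position in the query (1-based, inclusive)
    end: int    # end position in the query (inclusive)


def merge_fragment_hits(hits, max_gap=50):
    """Join neighbouring hits whose models map to the same full-length entry.

    max_gap is an assumed threshold (in bases) for treating two hits as
    parts of one element; it is not a RepeatMasker parameter.
    """
    merged = []
    for hit in sorted(hits, key=lambda h: h.start):
        # Models without a table entry keep their own name.
        name = FRAGMENT_TO_FULL.get(hit.model, hit.model)
        previous = merged[-1] if merged else None
        if previous and previous.model == name and hit.start - previous.end <= max_gap:
            # Same full-length entry and close enough: extend the annotation.
            previous.end = max(previous.end, hit.end)
        else:
            merged.append(Hit(name, hit.start, hit.end))
    return merged


hits = [
    Hit("L1Example_orf2", 901, 4200),
    Hit("L1Example_5end", 100, 900),
    Hit("L1Example_3end", 4201, 5100),
    Hit("AluExample", 9000, 9300),
]
merged = merge_fragment_hits(hits)
for hit in merged:
    print(hit.model, hit.start, hit.end)
```

In this toy input, the three fragment hits are reported as one `L1Example` annotation, while the unrelated hit is left unchanged. A real implementation would also need to consider strand and the positions within the consensus, which this sketch ignores.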