How does NimbleGen address repetitive elements in the genome for DNA methylation designs?
We have developed our own method of repeat masking, which is based on the mean frequency of the 15-mers that make up each oligo. First, a table is built containing the count of every 15-mer that appears in the genome, on both strands. A 15-mer window is then slid along each oligo, the count of each 15-mer is looked up in the table, and the average count is calculated. A threshold is set, usually 100 for large eukaryotic genomes, and any probe whose average count exceeds that threshold is eliminated from further consideration. Depending on the region of the genome being evaluated, approximately 20-25% of the DNA is excluded. For some designs we use conventional repeat masking, as done by the RepeatMasker program (http://www.repeatmasker.org/). However, NimbleGen has no access to the repeat libraries necessary to use this application, so we rely on third parties to supply this type of masked sequence. We also find that RepeatMasker is often overly aggressive and can mask 50-55% of human DNA sequence. See the follo
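
As an illustration only, the 15-mer frequency filter described in this answer might be sketched in Python as follows. This is not NimbleGen's implementation: the function names, the in-memory Counter table, the handling of non-ACGT characters, and the choice to keep a probe whose average exactly equals the threshold are all assumptions made for the example. A table for a large eukaryotic genome would need a far more compact data structure than the one shown here.

```python
from collections import Counter

# Translation table for complementing A/C/G/T; other characters (e.g. N)
# are left unchanged. This handling is an assumption of the sketch.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_kmer_table(genome, k=15):
    """Count every k-mer in the genome, on both strands."""
    counts = Counter()
    for strand in (genome, reverse_complement(genome)):
        for i in range(len(strand) - k + 1):
            counts[strand[i:i + k]] += 1
    return counts


def mean_kmer_count(oligo, table, k=15):
    """Slide a k-mer window along the oligo and average the table counts."""
    n_windows = len(oligo) - k + 1
    if n_windows <= 0:
        raise ValueError("oligo is shorter than k")
    # Counter returns 0 for k-mers that are absent from the table.
    total = sum(table[oligo[i:i + k]] for i in range(n_windows))
    return total / n_windows


def passes_repeat_filter(oligo, table, threshold=100.0, k=15):
    """Keep the probe unless its mean k-mer count exceeds the threshold."""
    return mean_kmer_count(oligo, table, k) <= threshold


def filter_probes(probes, table, threshold=100.0, k=15):
    """Return only the probes that pass the repeat filter."""
    return [p for p in probes if passes_repeat_filter(p, table, threshold, k)]
```

In use, build_kmer_table would be run once per genome, and filter_probes would then be applied to the candidate oligos with the chosen threshold (100 in the answer above for large eukaryotic genomes).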