How does Roche NimbleGen address repetitive elements in the genome for ChIP-chip designs?
When available, we use conventional repeat masking, as done by the RepeatMasker program (http://www.repeatmasker.org/). NimbleGen has no access to the repeat libraries necessary to run this application, so we rely on third parties to supply this type of masked sequence. We also find that RepeatMasker is often overly aggressive and can mask 50%-55% of human DNA sequence. We have therefore developed our own method of repeat masking, which is based on the mean frequency of the 15mers that make up each 50mer oligo. A table is made of the count of all 15mers that appear in the genome, from both strands. Then a 15mer window is slid along each oligo, looking up the count of each 15mer in the table, and calculating the average count. A threshold is set, usually 100 for large eukaryotic genomes, and any probe whose average count exceeds that threshold is eliminated from further consideration. Depending on the region of the genome being evaluated, approximately 20-25% of the DNA is excluded. A si