How is text search implemented?
The system tries to match the query string to a unique GeneHub gene index. Once the best match is found, that gene index is used to retrieve the pre-computed GEPIS result. Gene attributes and synonyms are stored in two tables, GENE and GENE_SYNONYMS, respectively. Cross-references between GeneHub gene indexes and database records are stored in the DBXREF table. The DBXREF and GENE_SYNONYMS tables are consulted in turn to find an exact match to the query string. If the first round finds no exact match, a begin-search (matching entries that begin with the query string) is performed automatically.
MySQL text search has several limitations:
• It does not support function-based indexes.
• Hyphenated words are treated as two words.
• MySQL comes with a default stop word list, and numbers in the query are ignored by default.
To overcome these limitations and make text search case-insensitive and consistent (e.g. IL-8, il 8 and IL8 should all return the same result), we added additional columns, SEARCH_TEXT and XREF_ID_SEARCH in the GENE_SYNON
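
The lookup order described above (exact match against DBXREF, then GENE_SYNONYMS, then a begin-search) can be sketched in Python. This is only an illustration under stated assumptions, not the actual GeneHub code: it uses an in-memory SQLite database as a stand-in for MySQL, and the normalize function, the GENE_INDEX, SYNONYM and XREF_ID column names, the placement of SEARCH_TEXT in GENE_SYNONYMS and XREF_ID_SEARCH in DBXREF, and all sample rows are hypothetical. Taking the first row in sorted order is a simplification of whatever "best match" rule the real system applies.

```python
import re
import sqlite3


def normalize(text):
    """Lowercase and drop everything except letters and digits (assumed rule)."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def find_gene_index(conn, query):
    """Return a gene index for the query, or None if nothing matches."""
    key = normalize(query)
    if not key:
        return None
    # Tables are consulted in turn; the table/column pairing is an assumption.
    lookups = [("DBXREF", "XREF_ID_SEARCH"), ("GENE_SYNONYMS", "SEARCH_TEXT")]
    # Round 1 is an exact match; round 2 is the begin-search (prefix match).
    for operator, pattern in (("=", key), ("LIKE", key + "%")):
        for table, column in lookups:
            row = conn.execute(
                f"SELECT GENE_INDEX FROM {table} WHERE {column} {operator} ? "
                f"ORDER BY {column} LIMIT 1",
                (pattern,),
            ).fetchone()
            if row:
                return row[0]
    return None


# Hypothetical schema and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE GENE_SYNONYMS (GENE_INDEX INTEGER, SYNONYM TEXT, SEARCH_TEXT TEXT);
    CREATE TABLE DBXREF (GENE_INDEX INTEGER, XREF_ID TEXT, XREF_ID_SEARCH TEXT);
""")
for gene_index, synonym in [(1, "IL-8"), (2, "TNF-alpha")]:
    # The search column is filled once at load time with the normalized text.
    conn.execute(
        "INSERT INTO GENE_SYNONYMS VALUES (?, ?, ?)",
        (gene_index, synonym, normalize(synonym)),
    )
conn.execute(
    "INSERT INTO DBXREF VALUES (?, ?, ?)",
    (1, "XREF-0001", normalize("XREF-0001")),
)

for q in ["IL-8", "il 8", "IL8", "tnf", "no-such-gene"]:
    print(q, "->", find_gene_index(conn, q))
```

Because the normalized text is stored in an ordinary column, it can be indexed directly, and the query only has to be normalized once before the comparison. That is one way to work around the lack of function-based indexes while keeping hyphens, spaces, case and digits from affecting the match.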