How are the substructure and similarity search performed?
The core substructure search functionality (subgraph isomorphism) is provided either by the CDK cheminformatics library http://cdk.sourceforge.net or by a faster algorithm developed for AMBIT. Subgraph isomorphism is an NP-hard problem in general, which means that the worst-case cost of the search grows rapidly with the size of the molecule. To speed up substructure searching in large datasets, one usually uses precalculated fingerprints to identify the structures that could potentially contain the substructure, so that the expensive atom-by-atom matching is run only on those candidates. The AMBIT database and software combine this technique with fast relational database queries, which results in very fast substructure searching in huge datasets. In addition, fingerprints are a standard tool for assessing similarity, by calculating the Tanimoto coefficient between the fingerprints of two compounds. AMBIT also allows querying the database by SMARTS, accelerated by several kinds of precalculated data.
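The fingerprint prescreen and the Tanimoto coefficient described above can be sketched in a few lines of Python. This is a minimal illustration, not AMBIT or CDK code: the function names `may_contain` and `tanimoto`, the 5-bit toy fingerprints, and the compound labels are all assumptions made for the example. Fingerprints are modelled as integers whose set bits mark the structural features present.

```python
def may_contain(query_fp: int, target_fp: int) -> bool:
    """Prescreen: a target can contain the query substructure only if
    every bit set in the query fingerprint is also set in the target's.
    A True result is only a candidate; it still needs a full
    subgraph-isomorphism check."""
    return query_fp & target_fp == query_fp


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either."""
    common = bin(fp_a & fp_b).count("1")
    total = bin(fp_a | fp_b).count("1")
    return common / total if total else 0.0


# Hypothetical toy fingerprints (real ones are typically much longer).
query = 0b00101
targets = {"A": 0b10111, "B": 0b01001, "C": 0b00101}

# Only the candidates that pass the prescreen go on to exact matching.
candidates = [name for name, fp in targets.items() if may_contain(query, fp)]
similarities = {name: tanimoto(query, fp) for name, fp in targets.items()}
```

Here compounds A and C pass the prescreen and B is rejected without any graph matching, which is where the speed-up comes from. The prescreen can produce false positives but never false negatives, provided the fingerprint bits are derived from substructural features.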