
How are the substructure and similarity search performed?

April 26, 2017

Tags: performed, search, similarity, substructure


1 Answer


The core substructure search functionality (subgraph isomorphism) is provided either by the CDK cheminformatics library (http://cdk.sourceforge.net) or by a faster algorithm developed for AMBIT. Substructure search is an NP-hard problem: no algorithm is known whose worst-case running time is polynomial in the size of the molecules, so the cost can grow rapidly as structures get larger. To speed up substructure searching in large datasets, one usually uses precalculated fingerprints to identify the structures that potentially contain the substructure, and runs the full match only on those candidates. The AMBIT database and software combine this technique with fast relational database queries, which results in very fast substructure searching in huge datasets. In addition, fingerprints are a standard tool for assessing similarity: one calculates the Tanimoto coefficient between the fingerprints of two compounds. AMBIT also allows querying the database by SMARTS, accelerated by several kinds of precalculated data.
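To make the fingerprint idea concrete, here is a minimal Python sketch of the two uses described above: screening for substructure candidates and scoring similarity with the Tanimoto coefficient. It is not AMBIT's or CDK's code. The feature list, the molecule entries, and the function names (`fingerprint`, `may_contain`, `tanimoto`) are all made up for illustration, and real fingerprints are computed from the molecular graph rather than assigned by hand.

```python
# Hypothetical feature set: each bit marks the presence of one structural feature.
FEATURES = ["aromatic ring", "hydroxyl", "carbonyl", "amine", "halogen"]

def fingerprint(present):
    """Pack the names of the present features into an integer used as a bit set."""
    bits = 0
    for name in present:
        bits |= 1 << FEATURES.index(name)
    return bits

def may_contain(target_fp, query_fp):
    """Screening test: every bit set in the query must also be set in the target.

    A pass only means the target *might* contain the substructure; a real
    search would still run a full subgraph match on each candidate.
    """
    return (target_fp & query_fp) == query_fp

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    common = bin(fp_a & fp_b).count("1")
    total = bin(fp_a | fp_b).count("1")
    # Returning 0.0 for two empty fingerprints is a choice made for this sketch.
    return common / total if total else 0.0

# Hand-assigned toy fingerprints (illustrative only).
database = {
    "phenol": fingerprint(["aromatic ring", "hydroxyl"]),
    "acetone": fingerprint(["carbonyl"]),
    "4-chlorophenol": fingerprint(["aromatic ring", "hydroxyl", "halogen"]),
    "aniline": fingerprint(["aromatic ring", "amine"]),
}

# Query: structures that may contain an aromatic ring with a hydroxyl group.
query = fingerprint(["aromatic ring", "hydroxyl"])
candidates = [name for name, fp in database.items() if may_contain(fp, query)]
print(candidates)

# Similarity of phenol to every entry in the toy database.
for name, fp in database.items():
    print(name, round(tanimoto(database["phenol"], fp), 3))
```

The screening step is cheap because it is a single bitwise comparison per structure, which is why it pairs well with database queries; the expensive exact match is then needed only for the few structures that pass.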