
Tanimoto Coefficient and Fingerprint Generation


A number of thresholds or measures are available for Similarity searching. The higher the threshold, the closer the target structures are to the query structure. By default, the Similarity search within SureChEMBL uses the Tanimoto coefficient to calculate the degree of similarity between the query and the target structures. The Tanimoto coefficient takes two arguments:

  • The fingerprint of the query structure
  • The fingerprint of the target structure

A fingerprint encodes a list of predefined structure fragments or features found within a structure. Each feature that is present is represented as “on” by setting its bit to 1.
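As a rough illustration of the "one bit per feature" idea, the sketch below maps a list of features to a 0/1 vector. The feature names and the `key_fingerprint` helper are hypothetical and chosen only for this example; they are not the feature set used by SureChEMBL.

```python
# Hypothetical feature dictionary: each predefined feature owns one bit position.
FEATURES = ["aromatic ring", "carboxylic acid", "hydroxyl", "amine", "halogen"]


def key_fingerprint(present_features):
    """Return a list of 0/1 bits, one per predefined feature."""
    present = set(present_features)
    # A bit is "on" (1) only when its feature occurs in the structure.
    return [1 if feature in present else 0 for feature in FEATURES]


fp = key_fingerprint(["aromatic ring", "hydroxyl"])
print(fp)  # [1, 0, 1, 0, 0]
```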

Tanimoto coefficient formula

$$T = \frac{N_{A\&B}}{N_A + N_B - N_{A\&B}}$$

$N_A$ represents the number of "on" features (bits) in structure A.

$N_B$ represents the number of "on" features (bits) in structure B.

$N_{A\&B}$ represents the number of "on" features (bits) common to both fingerprints A and B.

T ranges from 0 (no "on" bits in common) to 1 (identical fingerprints).
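The formula can be checked with a few lines of code. The following is a minimal sketch, assuming fingerprints are equal-length lists of 0/1 values; the `tanimoto` function name, the example bit patterns, and the 0.7 threshold are illustrative assumptions, not SureChEMBL's implementation or defaults.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two equal-length 0/1 fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    n_a = sum(fp_a)                                # "on" bits in A
    n_b = sum(fp_b)                                # "on" bits in B
    n_ab = sum(a & b for a, b in zip(fp_a, fp_b))  # "on" bits common to both
    denominator = n_a + n_b - n_ab
    # Assumed convention: two all-zero fingerprints score 0.0.
    return n_ab / denominator if denominator else 0.0


query = [1, 1, 0, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 0, 1, 1, 0]

score = tanimoto(query, target)  # 3 / (4 + 4 - 3)
print(score)                     # 0.6

threshold = 0.7                  # hypothetical similarity threshold
print(score >= threshold)        # False: this target would not be returned
```

Here both fingerprints have four "on" bits and share three of them, so the target falls below the assumed threshold and would be excluded from the hit list.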

The hashed binary chemical fingerprint of a molecule is a bit string (a sequence of "0" and "1" digits) that contains information on the structure. The process of fingerprint generation is as follows:

 

  1. All linear paths (linear patterns) consisting of bonds and atoms of a structure are detected, up to a given length in bonds.

  2. Branching points at the end of each linear pattern are also detected.

  3. All cycles (cyclic patterns) are detected.

  4. Using a proprietary hashing method, a given number of bits in the bit string are set for each pattern. It is possible that the same bit is set by multiple patterns. This phenomenon is called bit collision. A few bit collisions in the fingerprint are tolerable, but too many may result in losing information in the fingerprint.
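The path enumeration and hashing steps can be sketched in a few lines. This is a toy illustration only: it covers linear paths and hashing (steps 1 and 4), omits branching points and cycle detection, ignores bond types, and uses MD5 as a stand-in because the actual hashing method is proprietary. The function names, the 64-bit length, and the two bits per pattern are assumptions made for the example.

```python
import hashlib


def linear_paths(atoms, bonds, max_bonds):
    """Enumerate simple linear paths of up to max_bonds bonds as strings."""
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    patterns = set()

    def walk(path):
        forward = "-".join(atoms[i] for i in path)
        backward = "-".join(atoms[i] for i in reversed(path))
        patterns.add(min(forward, backward))  # same path read in either direction
        if len(path) - 1 == max_bonds:
            return
        for nxt in neighbours[path[-1]]:
            if nxt not in path:               # no atom is revisited
                walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return patterns


def hashed_fingerprint(patterns, length=64, bits_per_pattern=2):
    """Set bits_per_pattern bits per pattern; count bit collisions."""
    bits = [0] * length
    collisions = 0
    for pattern in sorted(patterns):
        for k in range(bits_per_pattern):
            # MD5 is only a stand-in for the proprietary hash.
            digest = hashlib.md5(f"{pattern}:{k}".encode()).digest()
            position = int.from_bytes(digest[:4], "big") % length
            if bits[position]:
                collisions += 1               # bit was already set: a collision
            bits[position] = 1
    return bits, collisions


# Heavy atoms of ethanol: C-C-O
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

patterns = linear_paths(atoms, bonds, max_bonds=2)
print(sorted(patterns))  # ['C', 'C-C', 'C-C-O', 'C-O', 'O']

bits, collisions = hashed_fingerprint(patterns)
print("".join(map(str, bits)), "collisions:", collisions)
```

Shortening `length` in this sketch makes collisions more frequent, which is the information loss the last step warns about.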

Example

[Figure: Hashed fingerprint]

Reference websites

 
Details presented within this documentation were kindly provided by and reproduced from ChemAxon (www.chemaxon.com).

For more general chemistry information, please see http://www.chemaxon.com/jchem/doc/user/query_searchtypes.html#full.

For a more thorough explanation of the Tanimoto coefficient, please see http://www.qsarworld.com/files/tamimoto_coefficient-1.pdf.
 
 

 

 


©EMBL-EBI 2014 | EBI is an outstation of the European Molecular Biology Laboratory

SureChEMBL is a trademark of EMBL

Funding for SureChEMBL is provided by Wellcome Trust, Open PHACTS, NIH and EMBL