Export Schema
The following table describes each column included in a SureChEMBL CSV or XML export.
Header | Description |
---|---|
Patent ID | The Patent ID, or SureChEMBL Patent Number (SCPN), is a unique patent identifier used by SureChEMBL in which hyphens are inserted after the country code and before the kind code, i.e. CC-nnnnn-KK, where CC is the 2-letter country code, nnnnn is the patent number and KK is the 1- or 2-character kind code. An SCPN always includes a kind code. |
Annotation Reference | Either the name of a chemical entity recognised within the document text, or the code of a supplemental image/chemical file from which chemical structure data has been extracted. |
Chemical ID | The unique identifier for a chemical structure. |
SMILES | SMILES (simplified molecular-input line-entry system) is a canonical isomeric string representation of the chemical structure associated with the annotation reference. Where chemical structure data is unavailable, the column will be left blank. |
Type | This is either text, image or mol attachment, depending on the exact source of the chemical structure in the associated patent document. |
Chemical Document Count | The frequency of the chemical structure in the particular patent document. |
Annotation Document Count | The frequency of the chemical annotation (named entity) in the particular patent document. |
Title Count | The number of times the annotation name occurs in the title section of the associated SCPN. The value indicates both how common an annotation is relative to other annotations and whether it is present in that document section. The title count applies only when 'type' is text, and covers occurrences in the title section only. |
Abstract Count | The number of times the annotation name occurs in the abstract section of the associated SCPN. The value indicates both how common an annotation is relative to other annotations and whether it is present in that document section. The abstract count applies only when 'type' is text, and covers occurrences in the abstract section only. |
Claims Count | The number of times the annotation name occurs in the claims section of the associated SCPN. The value indicates both how common an annotation is relative to other annotations and whether it is present in that document section. The claims count applies only when 'type' is text, and covers occurrences in the claims section only. |
Description Count | The number of times the annotation name occurs in the description section of the associated SCPN. The value indicates both how common an annotation is relative to other annotations and whether it is present in that document section. The description count applies only when 'type' is text, and covers occurrences in the description section only. |
Chemical Corpus Count | A count of the number of occasions the chemical structure has been observed within the whole available chemical corpus of SureChEMBL, i.e. across all sections, images and attachments of all annotated patent documents. This property may be used to evaluate the novelty or triviality of a particular compound. |
Annotation Corpus Count | A count of the number of occasions the annotation reference has been observed within the whole annotation corpus of SureChEMBL. |
Molecular Weight | The theoretical average molecular mass, in daltons (Da), calculated using the standard atomic weights found on a typical periodic table. |
Med Chem Alert | A chemical structure is flagged as potentially non "medchem friendly" if one or more sub-structural features have been identified as defined by a SMARTS description. See here for the full set of SMARTS definitions. |
Log P | The predicted octanol/water partition coefficient for the structure represented by the SMILES, calculated using the Viswanadhan and Ghose ALogP method. |
Donor Count | The number of atoms considered to be Hydrogen bond donating in the structure represented by the SMILES. |
Acceptor Count | The number of atoms considered to be Hydrogen bond accepting in the structure represented by the SMILES. |
Ring Count | The number of distinct atom ring arrangements found in the structure represented by the SMILES. |
Rotatable Bond Count | The number of bonds considered to be rotatable in the structure represented by the SMILES. |
Radical | 1 or 0, depending on whether the chemical structure is a radical or not. |
Fragment | 1 or 0, depending on whether the chemical structure is a fragment or not. |
Connected | 1 or 0, depending on whether the chemical structure has just one component or more than one. |
Singleton | 1 or 0, depending on whether the chemical structure consists of one heavy atom only. |
Simple | 1 or 0, depending on whether the chemical structure consists of two heavy atoms only. |
Lipinski | 1 or 0, depending on whether the chemical structure complies with Lipinski's rule-of-5 criteria: MW <= 500 AND logP <= 5 AND donorCount <= 5 AND acceptorCount <= 10 |
Lead Likeness | 1 or 0, depending on whether the chemical structure complies with a common definition of lead-likeness: MW <= 450 AND logD(7.4) >= -4 AND logD(7.4) <= 4 AND ringCount <= 4 AND rotatableBondCount <= 10 AND donorCount <= 5 AND acceptorCount <= 8 |
Bio Availability | 1 or 0, depending on whether the chemical structure complies with at least 6 of the following 7 criteria:
|
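The Patent ID format and the Lipinski criteria described in the table can be checked against an exported row. The Python sketch below is illustrative only. It assumes a comma-delimited CSV export whose header names match the Header column of the table. The helper names `split_scpn` and `passes_lipinski` are hypothetical, the regular expression is one plausible reading of the CC-nnnnn-KK description, and the sample row contains made-up values rather than real SureChEMBL data.

```python
import csv
import io
import re

# Assumed SCPN shape: 2-letter country code, patent number,
# then a 1- or 2-character kind code, separated by hyphens.
SCPN_PATTERN = re.compile(r"^([A-Z]{2})-([0-9A-Z]+)-([A-Z][0-9]?)$")


def split_scpn(scpn):
    """Split an SCPN into (country code, patent number, kind code)."""
    match = SCPN_PATTERN.match(scpn)
    if match is None:
        raise ValueError(f"not a recognised SCPN: {scpn!r}")
    return match.groups()


def passes_lipinski(row):
    """Apply the rule-of-5 criteria from the Lipinski row of the table."""
    return (
        float(row["Molecular Weight"]) <= 500
        and float(row["Log P"]) <= 5
        and float(row["Donor Count"]) <= 5
        and float(row["Acceptor Count"]) <= 10
    )


# Made-up sample export with only the columns this sketch needs.
SAMPLE_CSV = (
    "Patent ID,Molecular Weight,Log P,Donor Count,Acceptor Count\n"
    "US-1234567-A1,180.16,1.3,1,4\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
row = rows[0]
country, number, kind = split_scpn(row["Patent ID"])
print(country, number, kind, passes_lipinski(row))
```

A flag recomputed this way may differ from the exported Lipinski column if the export rounds the underlying values. The Lead Likeness criteria cannot be recomputed in the same way, because they depend on logD(7.4), which is not among the columns listed above.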