Lucene Query Parser Details
Lucene is a high-performance, full-featured text search engine library. It suits applications that require full-text search, particularly cross-platform ones. SureChEMBL relies on Lucene because of the extensive full text found within patent literature.
Query Elements
All queries are built using some basic components: Terms, Phrases, Fields, and Operators. The exact method for combining the components may vary from one query language to the next. Because SureChEMBL is based on the Lucene Query Parser, the following details will assist you in building complex queries in SureChEMBL.
Terms
There are two types of terms: words and phrases. A word is a continuous string of characters with no spaces, such as "gleevec" or "kinase".
A phrase is a group of words treated as an individual unit. For Lucene to locate the phrase exactly as it is written, the phrase must be surrounded by double quotes.
Example:
"kinase inhibitor"
Fields
Lucene supports field-specific data. A field is a holder for a particular kind of data, for example patent numbers or abstracts. When performing a search you may specify a field. In some parts of SureChEMBL the field names are provided by the application itself. In the SureChEMBL Query field you may choose to provide the specific fields. You can search a field by typing the field name followed by a colon ":" and then the term you are looking for.
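The field syntax described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper `field_query` is hypothetical, the field name `abstract` is a placeholder rather than a confirmed SureChEMBL field name, and the sketch writes the term directly after the colon, as in standard Lucene syntax.

```python
def field_query(field, term):
    """Build a field-restricted query element of the form field:term."""
    # Assumption: a term containing whitespace is a phrase, so it is
    # wrapped in double quotes to keep it together as one unit.
    if any(ch.isspace() for ch in term):
        term = f'"{term}"'
    return f"{field}:{term}"


print(field_query("IPCR", "A63B006936"))           # IPCR:A63B006936
print(field_query("abstract", "kinase inhibitor"))  # abstract:"kinase inhibitor"
```

Check the field names actually accepted by the SureChEMBL Query field before using a string built this way.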
Boolean operators
Boolean operators are used to combine separate queries into a single complex query. SureChEMBL uses three Boolean operators: AND, OR, and NOT.
Operator | Definition |
---|---|
AND | Intersection: For an individual document to be included in the results set, it must contain both of the individual query elements. |
OR | Union: For an individual document to be included in the results set, it only has to contain one of the individual query elements. |
NOT | Difference: For an individual document to be included in the results set, it must contain the first listed individual query element and must not contain the second. |
Examples
Assignee(s)/Applicant(s): Sunovian AND IPCR: A63B006936
Only those patents containing the Assignee or Applicant value of Sunovian and also containing the IPCR of A63B 69/36 appear in the results.
Sunovian OR IPCR: A63B006936
All patents containing the Assignee or Applicant value of Sunovian appear in the results, regardless of the IPCR. Likewise, all patents containing the IPCR of A63B 69/36 appear in the results, regardless of the assignee or applicant.
Sunovian NOT IPCR: A63B006936
Only those patents that contain the Assignee or Applicant value of Sunovian and do not contain the IPCR of A63B 69/36 appear in the results.
Note: The Boolean operators must ALWAYS be written in uppercase.
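The intersection, union, and difference described in the table map directly onto set operations. The Python sketch below models each query element as the set of documents that match it. The document IDs are invented for illustration and do not correspond to real patents, and this is not how SureChEMBL or Lucene evaluates queries internally.

```python
# Hypothetical result sets, invented for illustration only.
assignee_sunovian = {"doc1", "doc2", "doc3"}  # documents matching the assignee
ipcr_a63b006936 = {"doc3", "doc4"}            # documents matching the IPCR

both = assignee_sunovian & ipcr_a63b006936        # AND: intersection
either = assignee_sunovian | ipcr_a63b006936      # OR: union
first_only = assignee_sunovian - ipcr_a63b006936  # NOT: difference

print(sorted(both))        # ['doc3']
print(sorted(either))      # ['doc1', 'doc2', 'doc3', 'doc4']
print(sorted(first_only))  # ['doc1', 'doc2']
```

Note that NOT is not symmetric: swapping the two query elements yields a different result set.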
Default Operator
The default Boolean operator for SureChEMBL is AND. If you enter two words into a query field without quotes or any operator, the application returns all documents containing both terms. The terms need not appear immediately adjacent to one another.
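One way to picture the default operator is to rewrite a query so that the implied AND becomes explicit. The Python sketch below does this for simple queries. The function `make_default_explicit` is hypothetical and the tokenizing is deliberately naive (bare words and double-quoted phrases only), so it should be read as an illustration of the rule rather than as the parser SureChEMBL uses.

```python
import re

# Operators are recognized only in uppercase, as the note above requires.
OPERATORS = {"AND", "OR", "NOT"}


def make_default_explicit(query):
    """Insert AND between adjacent terms that have no operator between them."""
    # A token is either a double-quoted phrase or a run of non-space characters.
    tokens = re.findall(r'"[^"]+"|\S+', query)
    out = []
    for tok in tokens:
        # Two terms in a row with no operator between them: apply the default.
        if out and out[-1] not in OPERATORS and tok not in OPERATORS:
            out.append("AND")
        out.append(tok)
    return " ".join(out)


print(make_default_explicit('kinase inhibitor'))            # kinase AND inhibitor
print(make_default_explicit('"kinase inhibitor" gleevec'))  # "kinase inhibitor" AND gleevec
print(make_default_explicit('kinase OR inhibitor'))         # kinase OR inhibitor
```

The quoted phrase stays intact as a single term, while the unquoted words are joined by AND and may therefore match anywhere in the document.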