Searches can use simple terms, that is, single words, and phrases, that is, several words enclosed in quotation marks, e.g. "Nicolaus Copernicus University". When you use quotation marks, only documents containing the entire phrase are returned.
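The difference between a simple term and a quoted phrase can be sketched in Python. This is a minimal illustration, not the library's search engine: the tokenizer (lowercasing and keeping only runs of letters and digits) is an assumption, and `tokenize`, `contains_term` and `contains_phrase` are hypothetical helper names.

```python
import re


def tokenize(text):
    # Assumed analyzer: lowercase, keep only runs of letters and digits
    return re.findall(r"[^\W_]+", text.lower())


def contains_term(text, term):
    # A simple term matches if it occurs anywhere as a separate word
    return term.lower() in tokenize(text)


def contains_phrase(text, phrase):
    # A quoted phrase matches only if all its words occur consecutively, in order
    words, target = tokenize(text), tokenize(phrase)
    if not target:
        return False
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))
```

For a document reading "Nicolaus Copernicus studied at the University of Krakow", each of the three words matches as a simple term, but the quoted phrase does not, because the words are not adjacent.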
Search terms can be combined with Boolean operators. You can also use so-called masking (wildcard) characters, which stand in for any letter or digit or a string of them; find terms similar to a given term; find terms that occur within a given distance of each other; and assign priorities to search terms.
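The masking characters are not spelled out above; in the Lucene query syntax this page is based on, `?` stands for exactly one character and `*` for any run of characters, including none. The sketch below mimics that matching for a single term by translating the pattern into a regular expression. `wildcard_matches` is a hypothetical helper, and the real engine matches patterns against indexed terms rather than raw strings.

```python
import re


def wildcard_matches(pattern, term):
    # Translate a Lucene-style wildcard pattern into a regex matched against the whole term
    parts = []
    for ch in pattern.lower():
        if ch == "?":
            parts.append(".")            # exactly one character
        elif ch == "*":
            parts.append(".*")           # zero or more characters
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.fullmatch("".join(parts), term.lower()) is not None
```

With this sketch, `kopernik*` matches kopernik, kopernika and kopernikowski, while `kopernik?` matches kopernika but not kopernik itself.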
Fuzzy search is used for simple terms that are spelled similarly, e.g. Copernicus, Copernikus, Kopernikus. To find documents containing any of these variants, add a tilde to the end of the term: copernicus~.
The required degree of similarity is set by a coefficient between 0 and 1: the closer it is to 1, the more similar a term must be to match. The default coefficient is 0.5. To change it, write the tilde followed by the coefficient, e.g. kopernik~0.4.
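How a similarity coefficient can be turned into a match decision is sketched below. It assumes the formula used by the FuzzyQuery of older Lucene versions, similarity = 1 - (edit distance / length of the shorter term), with a term matching when its similarity exceeds the coefficient. Treat this as an approximation of the library's behaviour; `levenshtein`, `similarity` and `fuzzy_match` are hypothetical names.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance (insertions, deletions, substitutions)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute if different
        prev = cur
    return prev[-1]


def similarity(query, term):
    # Assumed Lucene-style score: 1 - distance / length of the shorter word
    q, t = query.lower(), term.lower()
    shorter = min(len(q), len(t))
    if shorter == 0:
        return 1.0 if q == t else 0.0
    return 1.0 - levenshtein(q, t) / shorter


def fuzzy_match(query, term, min_similarity=0.5):
    # A term matches when its similarity exceeds the coefficient (default 0.5)
    return similarity(query, term) > min_similarity
```

Under these assumptions, copernicus and kopernikus differ by two letters out of ten, giving a similarity of 0.8, and kopernik still matches copernicus when the coefficient is lowered to 0.4.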
You can also require that two search terms appear within a given distance of each other (so-called proximity search). For example, if you remember that the words Choral-buch and Westpreussen appeared close to each other in a document, you can use the query "Choral-buch Westpreussen"~6, which finds documents in which these terms occur within 6 words of each other.
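A proximity check for two words can be sketched as follows, assuming Lucene's notion of slop: the number of position moves needed to turn the words found in the document into the quoted phrase, so adjacent words in query order need 0 and a swapped pair needs 2. The helper names and the sample sentence are hypothetical. The assumed tokenizer splits Choral-buch into choral and buch, so the sketch pairs buch with westpreussen.

```python
import re


def tokenize(text):
    # Assumed analyzer: lowercase, keep only runs of letters and digits
    return re.findall(r"[^\W_]+", text.lower())


def within_slop(text, first, second, slop):
    # True if some occurrence of `first` and `second` can be brought into
    # phrase order (second directly after first) with at most `slop` moves
    words = tokenize(text)
    pos_first = [i for i, w in enumerate(words) if w == first.lower()]
    pos_second = [i for i, w in enumerate(words) if w == second.lower()]
    # In the exact phrase, `second` sits one position after `first`
    return any(abs(b - a - 1) <= slop for a in pos_first for b in pos_second)


# Hypothetical sample text: "buch" and "westpreussen" have 4 words between them
sample = "an old Choral-buch printed for congregations in Westpreussen"
```

Here the pair needs 4 moves, so a slop of 6 finds the sample text while a slop of 3 does not.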
You can raise the priority of a search term by appending "^" followed by a number greater than 1. For example, the query stempowski^4 grydzewski returns documents in which both surnames appear, but those in which the higher-priority surname, stempowski, occurs more often are listed first. The default priority is 1.
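The effect of a priority on ranking can be illustrated with a deliberately simplified score: each term contributes its priority multiplied by how often it occurs in the document. This is a toy model for illustration only; Lucene's real relevance formula includes further factors, and `parse_query`, `toy_score` and `rank` are hypothetical names.

```python
import re
from collections import Counter


def parse_query(query):
    # Split "word^boost" parts; a term without "^" gets the default priority 1
    terms = []
    for part in query.split():
        word, _, boost = part.partition("^")
        terms.append((word.lower(), float(boost) if boost else 1.0))
    return terms


def toy_score(text, query):
    # Toy relevance: sum of priority x term frequency (NOT Lucene's formula)
    counts = Counter(re.findall(r"[^\W_]+", text.lower()))
    return sum(boost * counts[word] for word, boost in parse_query(query))


def rank(docs, query):
    # Return document ids, highest toy score first
    return sorted(docs, key=lambda d: toy_score(docs[d], query), reverse=True)


docs = {
    "a": "grydzewski grydzewski grydzewski stempowski",
    "b": "stempowski stempowski grydzewski",
}
```

Without a boost, document "a" ranks first (4 occurrences against 3); with stempowski^4, document "b" overtakes it (9 against 7).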
In complex queries, expressions can be grouped with parentheses. As in arithmetic, this gives a complex query its intended, unambiguous meaning.
Subexpressions inside parentheses are evaluated first, then the expression as a whole. A query such as "De revolutionibus orbium coelestium" AND (Copernicus OR Kopernik) finds documents that contain the title of Copernicus's work together with his name in at least one of its two forms.
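The grouped query can be pictured as set operations over a small inverted index that maps each word to the set of documents containing it: the parenthesized OR is computed first as a union, then intersected with the title condition (AND). This is a sketch with hypothetical names and sample documents; to keep it short, the quoted title is approximated by requiring all of its words, whereas a real phrase query also checks word positions.

```python
import re


def tokenize(text):
    # Assumed analyzer: lowercase, keep only runs of letters and digits
    return re.findall(r"[^\W_]+", text.lower())


def build_index(docs):
    # Inverted index: word -> set of ids of documents containing it
    index = {}
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(doc_id)
    return index


def postings(index, word):
    return index.get(word.lower(), set())


def all_words(index, phrase):
    # Simplification: documents containing every word of the phrase (positions ignored)
    sets = [postings(index, w) for w in tokenize(phrase)]
    return set.intersection(*sets) if sets else set()


def title_and_name(index):
    title = all_words(index, "De revolutionibus orbium coelestium")
    # Parentheses first: union for the OR group, then intersection for AND
    return title & (postings(index, "Copernicus") | postings(index, "Kopernik"))


docs = {  # hypothetical sample documents
    1: "De revolutionibus orbium coelestium by Copernicus",
    2: "Kopernik, De revolutionibus orbium coelestium",
    3: "Copernicus: a biography",
    4: "De revolutionibus orbium coelestium, facsimile edition",
}
```

Only documents 1 and 2 satisfy both parts of the query: document 3 lacks the title and document 4 lacks either form of the name.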
Characters used to build complex queries (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \) are interpreted as part of the query syntax, not as part of the search phrase. To search for one of these characters literally, place the escape character "\" in front of it. For example, to search for the phrase "(2+2)*2", enter \(2\+2\)\*2. Note, however, that only letters and digits are indexed, so other characters do not affect search results.
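Escaping can be automated with a small helper that puts a backslash before each special character from the list above; `&` and `|` are escaped individually, which also covers `&&` and `||`. The helper names are hypothetical. The second function, under the same assumed letters-and-digits tokenizer used for indexing, shows why the escaped characters do not change the results.

```python
import re

# Special characters of the query syntax, as listed above
SPECIAL = set('+-&|!(){}[]^"~*?:\\')


def escape_query(text):
    # Prefix every special character with the escape character "\"
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


def tokenize(text):
    # Assumed indexing: only runs of letters and digits are kept
    return re.findall(r"[^\W_]+", text.lower())
```

For example, `escape_query("(2+2)*2")` yields `\(2\+2\)\*2`, while `tokenize("(2+2)*2")` reduces the same text to the three tokens 2, 2, 2.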
For details, see Jakarta Lucene Query Parser Syntax.
The text was originally posted on the website of the Kujawsko-Pomorska Digital Library.
This work is licensed under the Creative Commons Attribution-Share Alike 2.5 Poland license.