The Document Library: search expressions
If you want to read more about the text search technology used here, you’ll need to know that our MySQL document database uses the InnoDB engine and the boolean full-text search method, not the natural language one. The most relevant page is probably mysql.com’s documentation on boolean full-text searches.
Remember that you are searching document titles, descriptions and categories that we’ve provided for each document – not the documents themselves.
A search expression contains words, or search terms. Each word may contain only alphanumeric characters. Words are separated by spaces.
Words are not case-sensitive, so “France” is the same as “france”.
Search expressions may be no longer than fifty characters.
Words in document descriptions and titles have been compiled into an index, except if they’re less than three characters long, or on a list of common words (stopwords). Consequently any word in a search expression either less than three characters long, or among the stopwords, will be ignored; you'll get a warning message with your results if that happens.
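The length and stopword rules above can be sketched in a few lines of Python. This is only an illustration, not the code the library runs: the function name `ignored_words` and the constants are hypothetical, and the stopword list is simply copied from the list given on this page.

```python
import re

# Stopwords copied from the list on this page (assumed to be current).
STOPWORDS = set(
    "a about an are as at be by com de en for from how i in is it la of on or "
    "that the this to was what when where who will with und the www".split()
)
MIN_LENGTH = 3               # words shorter than this are not indexed
MAX_EXPRESSION_LENGTH = 50   # limit on the whole search expression


def ignored_words(expression):
    """Return the words in `expression` that the search would ignore."""
    if len(expression) > MAX_EXPRESSION_LENGTH:
        raise ValueError("search expression is longer than fifty characters")
    # Words are runs of alphanumeric characters; operators are skipped.
    words = re.findall(r"[A-Za-z0-9]+", expression)
    return [w for w in words if len(w) < MIN_LENGTH or w.lower() in STOPWORDS]
```

For example, `ignored_words("brit* engl* uk")` reports `uk`, which is the kind of word that triggers the warning message.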
As well as words, a search expression may contain operators.
Any word may be immediately preceded by just one of the operators + or -.
A word, or the beginning of a word, may be immediately followed by the operator *.
There are some sample search expressions below.
the details:
+ preceding a word | Indicates that a word must be present in a document’s description or title. |
- preceding a word | Indicates that a word must not be present. It acts only to exclude documents whose descriptions, titles and/or categories are otherwise matched by other search terms. A search that contains only terms preceded by - returns no documents; it does not return “all document descriptions or titles except those containing any of the excluded terms”. |
(no operator) | By default (when neither + nor - is specified), the word is optional, but a document whose description or title contains it is rated higher. |
* following a word | Truncation (or wildcard) operator. Unlike the other operators, it is appended to the word to be affected. The wildcarded word is treated as a prefix that, to produce any results, must be present at the start of one or more words in the index. Note that * is only valid at the end of a word. |
" " enclosing words | The enclosed phrase matches only documents whose description or title contains the phrase literally, as it was typed. Non-word characters need not be matched exactly: phrase searching requires only that matches contain exactly the same words as the phrase and in the same order. For example, "test phrase" matches "test, phrase". If the phrase contains no words that are in the index, the result is empty. The words might not be in the index because they do not exist in the text, because they are stopwords, or because they are shorter than three characters. So, for instance, the phrase "in UK" will never be found: “in” is a stopword, and “UK” is too short. |
Other operators (lower relevance: <; increase relevance: >; negation: ~; grouping ( and ); and @distance), even though allowed by MySQL full-text search, aren’t supported or allowed here.
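As a rough illustration of the syntax rules above, the following Python sketch splits an expression into terms and rejects anything that uses an unsupported operator. It is not the library's own parser: `parse_expression` and the tuple layout are hypothetical, and the sketch assumes that a quoted phrase takes no + or - in front of it, which this page does not specify.

```python
import re

# One token: either a "quoted phrase", or a word with at most one leading
# + or - and at most one trailing *. A token must end at a space or at
# the end of the expression.
TOKEN = re.compile(r'\s*(?:"([^"]*)"|([+-]?)([A-Za-z0-9]+)(\*?))(?=\s|$)')


def parse_expression(expression):
    """Return a list of (kind, operator, text) tuples, or raise ValueError."""
    text = expression.strip()
    terms = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at position {pos}: {text[pos:]!r}")
        phrase, operator, word, star = m.groups()
        if phrase is not None:
            terms.append(("phrase", "", phrase.lower()))
        else:
            kind = "prefix" if star else "word"
            # Words are not case-sensitive, so normalise to lower case.
            terms.append((kind, operator, word.lower()))
        pos = m.end()
    return terms
```

Expressions such as `>sea`, `~sea`, `(sea coal)` or `se*a` raise `ValueError` in this sketch, mirroring the restriction that only +, -, * and double quotes are accepted.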
some more examples:
pattern: | returns: |
---|---|
Hensen | documents whose titles or descriptions include the name “Hensen” |
clima* -climate | documents whose titles or descriptions include a word beginning “clima” which is not the word “climate”, probably “climat” |
Fran* Fren* | documents whose titles or descriptions contain a word beginning “Fran” or “Fren”: things French, but also franking machines and “frenetic”; includes documents in the “French” category. |
brit* engl* uk | documents whose titles or descriptions refer to things British or English. You’ll get a warning message, because “uk” is shorter than three characters and is ignored. |
COP2* | documents whose titles or descriptions contain a word beginning “COP2”, such as COP22 or COP23. |
"sea level" | documents whose titles or descriptions include the phrase "sea level"; will not automatically include documents in the “Sea” category. |
+sea +coal | any documents in both the “Sea” and “Coal” categories, plus any others whose descriptions and/or titles contain both words. |
sea coal | all documents in either the “Sea” or “Coal” categories, plus any others whose descriptions and/or titles contain either word. |
"Education and Training" | all documents in the “Education and Training” category, plus any others whose descriptions and/or titles contain that exact phrase. Note that “and” isn’t a stopword, so it counts in the index. |
"Education and Training" -Sea | all documents in the “Education and Training” category, plus any others whose descriptions and/or titles contain that exact phrase; all excluding those in the “Sea” category. |
Education and Training | all documents in the “Education and Training” category, plus any others whose descriptions and/or titles include any of the words “Education”, “and” or “Training”. |
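The behaviour shown in the examples above can be approximated with a small in-memory matcher. This is a simplified sketch of the +, -, * and phrase semantics described on this page, not the MySQL implementation: the names `words_of`, `term_matches` and `document_matches` are hypothetical, `text` stands for a document's title, description and categories joined into one string, and stopword and minimum-length handling is deliberately left out.

```python
import re


def words_of(text):
    """Lower-case alphanumeric words of a text, in order."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_matches(term, words):
    """True if one term (word, prefix* or "quoted phrase") occurs in `words`."""
    if term.startswith('"'):
        phrase = words_of(term)
        n = len(phrase)
        # The phrase words must appear consecutively and in the same order.
        return n > 0 and any(words[i:i + n] == phrase
                             for i in range(len(words) - n + 1))
    if term.endswith("*"):
        prefix = term[:-1].lower()
        return any(w.startswith(prefix) for w in words)
    return term.lower() in words


def document_matches(expression, text):
    """Decide whether a document's text would be returned for an expression."""
    words = words_of(text)
    terms = re.findall(r'[+-]?"[^"]*"|\S+', expression)
    required = [t[1:] for t in terms if t.startswith("+")]
    excluded = [t[1:] for t in terms if t.startswith("-")]
    optional = [t for t in terms if t[0] not in "+-"]
    if not required and not optional:
        return False  # only exclusions: no documents are returned
    if any(term_matches(t, words) for t in excluded):
        return False
    if required:
        return all(term_matches(t, words) for t in required)
    return any(term_matches(t, words) for t in optional)
```

With this sketch, `clima* -climate` matches a text containing “climat” but not one containing “climate”, and an expression made up only of excluded terms matches nothing.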
Results are not ordered by any measure of how well the search expression matches document descriptions, titles or categories.
In search results, if a word you searched for matches a word in a category, that word will be highlighted in bold in the “in categories” list for each document to which it applies.
The thirty-six stopwords in the current database environment appear to be:
a about an are as at be by com de en for from how i in is it la of on or that the this to was what when where who will with und the www
If you’d like to recommend an addition to our library, email us.