The Document Library: search expressions

If you want to read more about the text search technology used here, you’ll need to know that our MySQL document database uses the InnoDB engine and the boolean, not the natural language, full-text search method. The most relevant page is probably mysql.com’s documentation on boolean full-text searches.

Remember that you are searching document titles, descriptions and categories that we’ve provided for each document – not the documents themselves.

A search expression contains words, or search terms. Each word may contain only alphanumeric characters. Words are separated by spaces.

Words are not case-sensitive, so “France” is the same as “france”.

Search expressions may be no longer than fifty characters.

Words in document descriptions and titles have been compiled into an index, except if they’re less than three characters long, or on a list of common words (stopwords). Consequently any word in a search expression either less than three characters long, or among the stopwords, will be ignored; you'll get a warning message with your results if that happens.
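The rules above can be sketched in a few lines of Python. This is only an illustration of which words would be ignored, not the code this site runs: the function name `ignored_words` and the constants are made up for the example, and the stopword set is copied from the list at the end of this page.

```python
import re

# Stopwords copied from the list at the end of this page (duplicate removed).
STOPWORDS = {
    "a", "about", "an", "are", "as", "at", "be", "by", "com", "de", "en",
    "for", "from", "how", "i", "in", "is", "it", "la", "of", "on", "or",
    "that", "the", "this", "to", "was", "what", "when", "where", "who",
    "will", "with", "und", "www",
}
MIN_WORD_LENGTH = 3          # shorter words are not in the index
MAX_EXPRESSION_LENGTH = 50   # limit on the whole search expression


def ignored_words(expression: str) -> list[str]:
    """Return the words in a search expression that would be ignored.

    Hypothetical helper: operators are simply skipped, and only the
    alphanumeric words are checked against the two rules.
    """
    if len(expression) > MAX_EXPRESSION_LENGTH:
        raise ValueError("search expression is longer than fifty characters")
    words = re.findall(r"[A-Za-z0-9]+", expression.lower())
    return [w for w in words if len(w) < MIN_WORD_LENGTH or w in STOPWORDS]
```

For example, `ignored_words("in UK")` reports both words, while `ignored_words("Education and Training")` reports none, because “and” is not on the stopword list.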

As well as words, a search expression may contain operators.
Any word may be immediately preceded by just one of the operators + or -.
A word, or the beginning of a word, may be immediately followed by the operator *.

There are some sample search expressions below.

the details:

+ preceding a word: Indicates that a word must be present in a document’s description or title.
- preceding a word: Indicates that a word must not be present.

It acts only to exclude documents whose descriptions, titles and/or categories are otherwise matched by other search terms.

A search that contains only terms preceded by - returns no documents; it does not return “all document descriptions or titles except those containing any of the excluded terms”.
(no operator): By default (when neither + nor - is specified), the word is optional, but a document whose description or title contains it is rated higher.
* following a word: Truncation (or wildcard) operator. Unlike the other operators, it is appended to the word to be affected. The wildcarded word is treated as a prefix that, to produce any results, must be present at the start of one or more words in the index.

Note * is only valid at the end of a word.
"  " enclosing words: The enclosed phrase matches only documents whose description or title contains the phrase literally, as it was typed. Non-word characters need not be matched exactly: phrase searching requires only that matches contain exactly the same words as the phrase and in the same order. For example, "test phrase" matches "test, phrase".

If the phrase contains no words that are in the index, the result is empty. The words might not be in the index because they do not exist in the text, because they are stopwords, or because they are shorter than three characters. So for instance the phrase
"in UK"
will never be found; “in” is a stopword, and “UK” is too short.
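As a rough illustration of phrase matching, the Python sketch below compares word sequences and ignores punctuation. It is a simplified model, not the database's own logic: the names `words`, `is_indexed` and `phrase_matches` are invented for the example, the stopword set is copied from the list at the end of this page, and how a phrase that mixes indexed words with stopwords behaves is not modelled.

```python
import re

# Stopwords copied from the list at the end of this page (duplicate removed).
STOPWORDS = {
    "a", "about", "an", "are", "as", "at", "be", "by", "com", "de", "en",
    "for", "from", "how", "i", "in", "is", "it", "la", "of", "on", "or",
    "that", "the", "this", "to", "was", "what", "when", "where", "who",
    "will", "with", "und", "www",
}


def words(text: str) -> list[str]:
    """Split text into lower-case alphanumeric words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_indexed(word: str) -> bool:
    """A word is indexed unless it is too short or a stopword."""
    return len(word) >= 3 and word not in STOPWORDS


def phrase_matches(phrase: str, text: str) -> bool:
    """True if the words of `phrase` appear in `text` in the same order.

    Assumption for this sketch: a phrase with no indexed words never matches.
    """
    wanted = words(phrase)
    if not any(is_indexed(w) for w in wanted):
        return False
    found = words(text)
    n = len(wanted)
    return any(found[i:i + n] == wanted for i in range(len(found) - n + 1))
```

Here `phrase_matches("test phrase", "A test, phrase here")` is true because the comma is not a word, while `phrase_matches("in UK", "Floods in UK towns")` is false because neither word is in the index.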

Other operators (lower relevance: <; increase relevance: >; negation: ~; grouping: ( and ); and @distance), even though allowed by MySQL full-text search, aren’t supported here.
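The operator rules can be summarised as a small syntax check. The Python sketch below is a hypothetical validator, not the one this site uses. The name `invalid_terms` is invented, and since this page does not say whether + or - may precede a quoted phrase, the sketch flags that case, which is an assumption.

```python
import re

# A token is either a quoted phrase or a run of non-space characters.
TOKEN = re.compile(r'"[^"]*"|\S+')
# One optional leading + or -, an alphanumeric word, one optional trailing *.
TERM = re.compile(r"[+-]?[A-Za-z0-9]+\*?")
# A quoted phrase of alphanumeric words separated by single spaces.
PHRASE = re.compile(r'"[A-Za-z0-9]+(?: [A-Za-z0-9]+)*"')


def invalid_terms(expression: str) -> list[str]:
    """Return the tokens that break the operator rules described above."""
    return [
        token
        for token in TOKEN.findall(expression)
        if not (TERM.fullmatch(token) or PHRASE.fullmatch(token))
    ]
```

With this check, `clima* -climate` and `"Education and Training" -Sea` pass, while tokens such as `+-sea`, `*sea`, `>coal` or `(gas)` are reported.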

some more examples:

pattern: returns all documents whose description or title contains:
Hensen: documents whose titles or descriptions include the name “Hensen”.
clima* -climate: documents whose titles or descriptions include a word beginning “clima” which is not the word “climate”, probably “climat”.
Fran* Fren*: documents whose titles or descriptions refer to things French, but also to franking machines and anything frenetic; includes documents in the “French” category.
brit* engl* uk: documents whose titles or descriptions refer to things British or English. You’ll get a warning message, because “uk” is too short to be in the index.
COP2*: documents whose titles or descriptions refer to COP22, COP23, and so on.
"sea level": documents whose titles or descriptions include the phrase "sea level"; will not automatically include documents in the “Sea” category.
+sea +coal: any documents in both the “Sea” and “Coal” categories, plus any others whose descriptions and/or titles contain both words.
sea coal: all documents in either the “Sea” or “Coal” categories, plus any others whose descriptions and/or titles contain either word.
"Education and Training": all documents in the “Education and Training” category, plus any others whose descriptions and/or titles contain that exact phrase. Note “and” isn’t a stopword, so it counts in the index.
"Education and Training" -Sea: all documents in the “Education and Training” category, plus any others whose descriptions and/or titles contain that exact phrase; all excluding those in the “Sea” category.
Education and Training: all documents in the “Education and Training” category, plus any others whose descriptions and/or titles include either the word “Education” or “Training”.
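To see why these patterns behave as they do, here is a minimal Python model of how +, - and * combine. It is a sketch under stated assumptions, not the database's algorithm: the function names are invented, `text` stands for a document's title, description and categories joined together, and quoted phrases, stopwords and short words are left out for brevity.

```python
import re


def term_matches(term: str, words: set[str]) -> bool:
    """Match a plain word exactly, or a word ending in * as a prefix."""
    if term.endswith("*"):
        prefix = term[:-1]
        return any(word.startswith(prefix) for word in words)
    return term in words


def matches(expression: str, text: str) -> bool:
    """Simplified model of whether a document is returned for an expression."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    required, excluded, optional = [], [], []
    for term in expression.lower().split():
        if term.startswith("+"):
            required.append(term[1:])
        elif term.startswith("-"):
            excluded.append(term[1:])
        else:
            optional.append(term)
    # An excluded word always removes the document.
    if any(term_matches(t, words) for t in excluded):
        return False
    # With required words present, all of them must match.
    if required:
        return all(term_matches(t, words) for t in required)
    # Otherwise at least one optional word must match, so an expression
    # made only of excluded words returns nothing.
    return any(term_matches(t, words) for t in optional)
```

For instance, `matches("clima* -climate", "Le climat de la France")` is true, `matches("+sea +coal", "Sea level rise")` is false, and `matches("-sea", "Coal mining")` is false because the expression contains only an excluded word.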

Results are not ordered by any measure of how well the search expression matches document descriptions, titles or categories.

In search results, if a word you searched for matches a word in a category, that word will be highlighted in bold in the “in categories” list for each document to which it applies.

The thirty-six stopwords in the current database environment appear to be:

a  about  an  are  as  at  be  by  com  de  en  for  from  how  i  in  is  it  la  of  on  or  that  the  this  to  was  what  when  where  who  will  with  und  the  www


If you’d like to recommend an addition to our library, email us.