

A typical similarity query filters with the % operator and sorts by similarity():

SELECT t, similarity(t, 'word') AS sml FROM test_trgm WHERE t % 'word' ORDER BY sml DESC, t;

This returns all values in the t column that are sufficiently similar to word, sorted from best match to worst. The index is used to make this a fast operation even over very large data sets.

A variant orders by distance instead and keeps only the closest matches:

SELECT t, t <-> 'word' AS dist FROM test_trgm ORDER BY dist LIMIT 10;

This can be implemented quite efficiently by GiST indexes, but not by GIN indexes. It will usually beat the first formulation when only a small number of the closest matches is wanted.

You can also use an index on the t column for word similarity or strict word similarity, for example:

SELECT t, word_similarity('word', t) AS sml FROM test_trgm WHERE 'word' <% t ORDER BY sml DESC, t;

As with the distance variant above, ordering by the word-similarity distance ('word' <<-> t) with a LIMIT can be implemented quite efficiently by GiST indexes, but not by GIN indexes.

Beginning in PostgreSQL 9.1, these index types also support index searches for LIKE and ILIKE, for example:

SELECT * FROM test_trgm WHERE t LIKE '%foo%bar';

The index search works by extracting trigrams from the search string and then looking these up in the index. The more trigrams in the search string, the more effective the index search is. Unlike B-tree based searches, the search string need not be left-anchored.

Beginning in PostgreSQL 9.3, these index types also support index searches for regular-expression matches (the ~ and ~* operators), for example:

SELECT * FROM test_trgm WHERE t ~ '(foo|bar)';

The index search works by extracting trigrams from the regular expression and then looking these up in the index. The more trigrams that can be extracted from the regular expression, the more effective the index search is. As with LIKE, the pattern need not be left-anchored.

For both LIKE and regular-expression searches, keep in mind that a pattern with no extractable trigrams will degenerate to a full-index scan.
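The LIKE index search works in three steps: extract trigrams from the pattern, use them to find candidate rows, then recheck each candidate against the real pattern. Those steps can be approximated in Python. This is a simplified model, not pg_trgm's actual code, and it makes several assumptions:

- Indexed values use lowercased alphanumeric words, each padded with two leading blanks and one trailing blank.
- LIKE escape characters are ignored.
- A word in the pattern is padded only on sides that are not next to a % or _ wildcard.
- The index is replaced by a linear scan over a hypothetical list of rows.

The sketch also shows why a pattern like '%ab%' yields no trigrams and therefore matches every entry, which is the full-index scan case.

```python
import re

WILDCARDS = "%_"
WORD = re.compile(r"[^\W_]+")  # assumption: a word is a run of letters/digits


def _trigrams_of(padded: str) -> set[str]:
    """All 3-character windows of an already padded string."""
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def value_trigrams(value: str) -> set[str]:
    """Trigrams stored for an indexed value: every word padded on both sides."""
    result = set()
    for word in WORD.findall(value.lower()):
        result |= _trigrams_of("  " + word + " ")
    return result


def like_required_trigrams(pattern: str) -> set[str]:
    """Trigrams that every value matching the LIKE pattern must contain."""
    text = pattern.lower()
    required = set()
    for m in WORD.finditer(text):
        before = text[m.start() - 1] if m.start() > 0 else ""
        after = text[m.end()] if m.end() < len(text) else ""
        # A side next to a wildcard may continue the word, so it gets no padding;
        # the pattern edge or a non-word character is a real word boundary.
        lead = "" if before and before in WILDCARDS else "  "
        trail = "" if after and after in WILDCARDS else " "
        required |= _trigrams_of(lead + m.group() + trail)
    return required


def like_to_regex(pattern: str) -> re.Pattern:
    """Exact LIKE semantics for the recheck step (no escape handling)."""
    parts = [".*" if c == "%" else "." if c == "_" else re.escape(c) for c in pattern]
    return re.compile("".join(parts), re.DOTALL)


def like_search(pattern: str, rows: list[str]) -> tuple[list[str], list[str]]:
    """Return (index candidates, rows that really match after the recheck)."""
    required = like_required_trigrams(pattern)
    # An empty requirement set is a subset of everything: a full scan.
    candidates = [r for r in rows if required <= value_trigrams(r)]
    regex = like_to_regex(pattern)
    return candidates, [r for r in candidates if regex.fullmatch(r)]


if __name__ == "__main__":
    rows = ["foo and bar", "foobar", "bar foo", "food bar", "banana"]
    print(sorted(like_required_trigrams("%foo%bar")))
    print(like_search("%foo%bar", rows))
    print(like_required_trigrams("%ab%"))
```

For '%foo%bar' the sketch requires the trigrams "foo", "bar" and "ar ". The row 'bar foo' contains all three and so becomes a candidate, but the recheck rejects it because it does not end in bar. The pattern '%ab%' produces no trigrams at all, so every row is a candidate.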
The strict_word_similarity function is useful for finding the similarity to whole words, while word_similarity is more suitable for finding the similarity to parts of words.

The % operator (text % text → boolean) returns true if its arguments have a similarity that is greater than the current similarity threshold set by pg_trgm.similarity_threshold. The distance operators each return one minus the corresponding function value:

- text <-> text returns the "distance" between the arguments, that is one minus the similarity() value.
- text <<-> text returns one minus the word_similarity() value.
- text <<<-> text returns one minus the strict_word_similarity() value.

The pg_trgm module provides GiST and GIN index operator classes that allow you to create an index over a text column for the purpose of very fast similarity searches. These index types support the similarity operators described above, and additionally support trigram-based index searches for LIKE, ILIKE, ~, ~* and = queries. Note that these indexes may not be as efficient as regular B-tree indexes for the equality operator. For example, given a table created with CREATE TABLE test_trgm (t text), either of the following creates a trigram index:

CREATE INDEX trgm_idx ON test_trgm USING GIST (t gist_trgm_ops);

CREATE INDEX trgm_idx ON test_trgm USING GIN (t gin_trgm_ops);

The gist_trgm_ops GiST operator class approximates a set of trigrams as a bitmap signature. Its optional integer parameter siglen determines the signature length in bytes, and only a limited range of lengths is accepted. Longer signatures lead to a more precise search (scanning a smaller fraction of the index and fewer heap pages), at the cost of a larger index. Example of creating such an index with a signature length of 32 bytes:

CREATE INDEX trgm_idx ON test_trgm USING GIST (t gist_trgm_ops(siglen=32));

At this point, you will have an index on the t column that you can use for similarity searching.
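A GiST signature approximates a trigram set as a bitmap, and a longer siglen makes it more precise. Both ideas can be sketched in Python. This is an illustration under stated assumptions, not pg_trgm's on-disk format:

- zlib.crc32 stands in for whatever hash the extension actually uses.
- Trigram extraction lowercases alphanumeric words and pads them with two leading blanks and one trailing blank.
- The helper names are hypothetical.

Each trigram sets one of siglen * 8 bits. An entry can be skipped only when some bit required by the query is missing. Distinct trigrams that collide on the same bit therefore cause false positives (extra index and heap visits), but never false negatives.

```python
import re
import zlib


def trigrams(text: str) -> set[str]:
    """Assumed pg_trgm-style trigrams: padded, lowercased alphanumeric words."""
    result = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def signature(text: str, siglen: int = 32) -> int:
    """Fold a trigram set into a bitmap of siglen * 8 bits (lossy)."""
    nbits = siglen * 8
    sig = 0
    for trgm in trigrams(text):
        # Stand-in hash (assumption); collisions make different sets share bits.
        sig |= 1 << (zlib.crc32(trgm.encode("utf-8")) % nbits)
    return sig


def may_contain(entry_sig: int, query_sig: int) -> bool:
    """False only if the entry is definitely missing some query trigram."""
    return (entry_sig & query_sig) == query_sig


if __name__ == "__main__":
    row = signature("two words")
    print(may_contain(row, signature("words")))   # every query trigram is present
    print(may_contain(row, signature("banana")))  # a True here would be a false positive
```

Shrinking siglen packs the same trigrams into fewer bits. More unrelated rows then pass may_contain and must be rechecked, which is the precision versus index size trade-off.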

show_trgm ( text ) → text[]
Returns an array of all the trigrams in the given string. (In practice this is seldom useful except for debugging.)

word_similarity ( text, text ) → real
Returns a number that indicates the greatest similarity between the set of trigrams in the first string and any continuous extent of an ordered set of trigrams in the second string.

strict_word_similarity ( text, text ) → real
Same as word_similarity, but forces extent boundaries to match word boundaries. Since there are no cross-word trigrams, this function actually returns the greatest similarity between the first string and any continuous extent of words of the second string.

show_limit () → real
Returns the current similarity threshold used by the % operator. This sets the minimum similarity between two words for them to be considered similar enough to be, for example, misspellings of each other. (Deprecated; use SHOW pg_trgm.similarity_threshold instead.)

set_limit ( real ) → real
Sets the current similarity threshold that is used by the % operator. The threshold must be between 0 and 1 (default is 0.3). (Deprecated; use SET pg_trgm.similarity_threshold instead.)

Consider the following example:

# SELECT word_similarity('word', 'two words');

In the first string, the set of trigrams is {"  w"," wo","wor","ord","rd "}. The most similar extent of the ordered trigrams in the second string is {"  w"," wo","wor","ord"}, which gives a similarity of 0.8. By contrast, strict_word_similarity must select an extent of whole words, here the single word 'words'. It therefore returns the same value as comparing 'word' with 'words' directly, about 0.571 in both cases:

# SELECT strict_word_similarity('word', 'two words'), similarity('word', 'words');
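The two word-similarity definitions can be checked with a brute-force Python sketch that simply tries every extent. It makes three assumptions:

- Trigram extraction matches pg_trgm's: lowercased alphanumeric words, with two leading blanks and one trailing blank per word.
- An extent's score is the number of shared trigrams over the distinct trigrams of the first string and the extent taken together.
- Extent boundaries are unconstrained for word_similarity, but must fall on word starts and ends for strict_word_similarity.

PostgreSQL's own implementation is optimized and may differ in edge cases. The helper names are hypothetical.

```python
import re


def _word_trigrams(text: str) -> list[list[str]]:
    """Trigrams of each word, in order (assumed pg_trgm-style padding)."""
    result = []
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        result.append([padded[i:i + 3] for i in range(len(padded) - 2)])
    return result


def _score(first: set[str], extent: list[str]) -> float:
    """Shared trigrams over the distinct trigrams of both sides (assumption)."""
    ext = set(extent)
    shared = len(first & ext)
    return shared / (len(first) + len(ext) - shared)


def best_extent(a: str, b: str, strict: bool = False) -> tuple[float, list[str]]:
    """Best-scoring contiguous extent of b's ordered trigram sequence."""
    first = {t for word in _word_trigrams(a) for t in word}
    words = _word_trigrams(b)
    seq = [t for word in words for t in word]
    if strict:
        # Extents may only start at a word start and stop at a word end.
        ends, pos = [], 0
        for word in words:
            pos += len(word)
            ends.append(pos)
        starts = [0] + ends[:-1]
        spans = [(s, e) for s in starts for e in ends if e > s]
    else:
        spans = [(s, e) for s in range(len(seq)) for e in range(s + 1, len(seq) + 1)]
    best = (0.0, [])
    for s, e in spans:
        score = _score(first, seq[s:e])
        if score > best[0]:
            best = (score, seq[s:e])
    return best


def word_similarity(a: str, b: str) -> float:
    return best_extent(a, b)[0]


def strict_word_similarity(a: str, b: str) -> float:
    return best_extent(a, b, strict=True)[0]


if __name__ == "__main__":
    print(best_extent("word", "two words"))
    print(strict_word_similarity("word", "two words"))
```

With these assumptions, word_similarity('word', 'two words') evaluates to 0.8. The extent "  w", " wo", "wor", "ord" shares four of the five trigrams of 'word' and contains nothing else. The strict variant must take all six trigrams of 'words', which gives 4/7.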

similarity ( text, text ) → real
Returns a number that indicates how similar the two arguments are. The range of the result is zero (indicating that the two strings are completely dissimilar) to one (indicating that the two strings are identical).
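The definition of similarity can be made concrete with a minimal Python model. It assumes that pg_trgm lowercases the input, splits it into alphanumeric words, and pads each word with two leading spaces and one trailing space before taking trigrams. The score is the number of shared trigrams divided by the number of distinct trigrams in either string. The function names mirror the SQL ones but are hypothetical Python stand-ins. The nearest() helper only imitates an ORDER BY t <-> 'word' LIMIT k query by sorting every row; it is not how an index evaluates that query.

```python
import re


def show_trgm(text: str) -> list[str]:
    """Sorted distinct trigrams of text (assumed pg_trgm-style extraction)."""
    trigrams = set()
    # Assumption: words are runs of letters/digits, compared case-insensitively.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "  # two leading blanks, one trailing blank
        trigrams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return sorted(trigrams)


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = set(show_trgm(a)), set(show_trgm(b))
    if not ta or not tb:
        return 0.0  # assumption: nothing to compare counts as dissimilar
    shared = len(ta & tb)
    return shared / (len(ta) + len(tb) - shared)


def distance(a: str, b: str) -> float:
    """Model of the <-> operator: one minus similarity()."""
    return 1.0 - similarity(a, b)


def nearest(query: str, rows: list[str], k: int) -> list[str]:
    """Imitates ORDER BY t <-> query LIMIT k by sorting every row (no index)."""
    return sorted(rows, key=lambda t: (distance(t, query), t))[:k]


if __name__ == "__main__":
    print(show_trgm("word"))
    print(round(similarity("word", "words"), 6))
    print(nearest("word", ["banana", "sword", "words"], 2))
```

Under these assumptions, 'word' and 'words' share four of their seven distinct trigrams, so similarity('word', 'words') is 4/7, or about 0.571429.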
