Supported Algorithms
Several matching algorithms are supported, either by the OpenCR Service on its own or by OpenSearch/ElasticSearch with the analysis-phonetic plugin.
Algorithm | OpenCR Service | OpenSearch/ElasticSearch |
---|---|---|
Exact | Yes | Yes |
Metaphone | Yes | Yes |
Double-metaphone | Yes | Yes |
Levenshtein | Yes | Yes |
Damerau-Levenshtein | Yes | Yes |
Jaro-Winkler | Yes | No |
Soundex | Yes | Yes |
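To make the edit-distance entries in the table concrete, here is a minimal Python sketch of the textbook Levenshtein distance. It is illustrative only and is not the code used by the OpenCR Service or by OpenSearch/ElasticSearch; the function name `levenshtein` is an assumption chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions, and
    substitutions needed to turn a into b (textbook definition)."""
    # prev[j] is the distance between the prefix of a processed so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep a matching character
            ))
        prev = curr
    return prev[-1]


print(levenshtein("Smith", "Smyth"))    # 1: one substitution
print(levenshtein("Marhta", "Martha"))  # 2: a swapped pair costs two substitutions
```

The second call shows why Damerau-Levenshtein is listed separately: it treats a transposition of two adjacent characters as a single edit, whereas plain Levenshtein counts it as two.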
For more advanced string similarity matching, the similarity-scoring plugin for ElasticSearch provides additional algorithms. It is based on the open-source https://github.com/tdebatty/java-string-similarity library.
For more information, see the similarity-scoring repository for OpenSearch or the similarity-scoring repository for ElasticSearch. The plugin accepts the following matcher parameters:
Matcher Parameter for Query | Algorithm | Type | Normalized? |
---|---|---|---|
cosine-similarity | Cosine | similarity | yes |
dice-similarity | Sorensen-Dice | similarity | yes |
jaccard-similarity | Jaccard | similarity | yes |
jaro-winkler-similarity | Jaro-Winkler | similarity | yes |
normalized-lcs-similarity | Normalized Longest Common Subsequence | similarity | yes |
normalized-levenshtein-similarity | Normalized Levenshtein | similarity | yes |
cosine-distance | Cosine | distance | yes |
damerau-levenshtein | Damerau-Levenshtein | distance | no |
dice-distance | Sorensen-Dice | distance | yes |
jaccard-distance | Jaccard | distance | yes |
jaro-winkler-distance | Jaro-Winkler | distance | yes |
levenshtein | Levenshtein | distance | no |
longest-common-subsequence | Longest Common Subsequence | distance | no |
metric-lcs | Metric Longest Common Subsequence | distance | yes |
ngram | N-Gram | distance | yes |
normalized-lcs-distance | Normalized Longest Common Subsequence | distance | yes |
normalized-levenshtein-distance | Normalized Levenshtein | distance | yes |
optimal-string-alignment | Optimal String Alignment | distance | no |
qgram | Q-Gram | distance | no |
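The Type and Normalized? columns can be illustrated with a small Python sketch of Jaccard similarity and distance over character n-grams. This is a sketch under stated assumptions, not the plugin's implementation: the helper names (`shingles`, `jaccard_similarity`, `jaccard_distance`) and the default n-gram size `k=3` are chosen for this example, and the plugin's own tokenization may differ.

```python
def shingles(s: str, k: int = 3) -> set:
    """Return the set of overlapping character k-grams in s (hypothetical helper)."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 3) -> float:
    """Normalized similarity in [0, 1]: shared k-grams divided by all distinct k-grams."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # assumption: two inputs with no k-grams are treated as identical
    return len(sa & sb) / len(sa | sb)


def jaccard_distance(a: str, b: str, k: int = 3) -> float:
    """Normalized distance in [0, 1], defined as 1 - similarity."""
    return 1.0 - jaccard_similarity(a, b, k)


print(jaccard_similarity("abcd", "bcde", k=2))  # 0.5: 2 shared bigrams out of 4 distinct
print(jaccard_distance("abcd", "bcde", k=2))    # 0.5
```

A normalized matcher always returns a value between 0 and 1, so one threshold works for strings of any length. A non-normalized matcher such as `levenshtein` returns a raw edit count, so a fixed threshold is stricter for long strings than for short ones.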