Supported Algorithms
OpenCR supports a number of matching algorithms. Some are available in the OpenCR Service on its own; others are available through OpenSearch/ElasticSearch with the analysis-phonetic plugin installed.
| Algorithm | OpenCR Service | OpenSearch/ElasticSearch |
|---|---|---|
| Exact | Yes | Yes |
| Metaphone | Yes | Yes |
| Double-metaphone | Yes | Yes |
| Levenshtein | Yes | Yes |
| Damerau-Levenshtein | Yes | Yes |
| Jaro-Winkler | Yes | No |
| Soundex | Yes | Yes |
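To illustrate how the phonetic algorithms in the table differ from exact matching, here is a minimal Python sketch of classic American Soundex. It is an illustration only, not the code used by the OpenCR Service or the analysis-phonetic plugin, whose implementations may differ in details. The function name `soundex` and the `CODES` table are assumptions made for this example.

```python
# Illustrative sketch of American Soundex; not the OpenCR or plugin implementation.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return a 4-character Soundex code, or "" if the input has no letters."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]                 # the first letter is kept as-is
    prev = CODES.get(letters[0], "")    # its code still suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate equal codes
        code = CODES.get(ch, "")        # vowels (and Y) have no code
        if code and code != prev:
            result += code
        prev = code                     # a vowel resets prev, allowing repeats
        if len(result) == 4:
            break
    return result.ljust(4, "0")         # pad short codes with zeros


print(soundex("Robert"), soundex("Rupert"))
```

Names that sound alike, such as "Robert" and "Rupert", share a code, so a phonetic matcher treats them as candidates even though an exact matcher would not.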
For more advanced string similarity matching, the similarity-scoring plugin for ElasticSearch provides additional algorithms. It is based on the open-source https://github.com/tdebatty/java-string-similarity library.
For more information, see the similarity-scoring repository for OpenSearch or the similarity-scoring repository for ElasticSearch. The available matcher parameters are listed below:
| Matcher Parameter for Query | Algorithm | Type | Normalized? |
|---|---|---|---|
| cosine-similarity | Cosine | similarity | yes |
| dice-similarity | Sorensen-Dice | similarity | yes |
| jaccard-similarity | Jaccard | similarity | yes |
| jaro-winkler-similarity | Jaro-Winkler | similarity | yes |
| normalized-lcs-similarity | Normalized Longest Common Subsequence | similarity | yes |
| normalized-levenshtein-similarity | Normalized Levenshtein | similarity | yes |
| cosine-distance | Cosine | distance | yes |
| damerau-levenshtein | Damerau-Levenshtein | distance | no |
| dice-distance | Sorensen-Dice | distance | yes |
| jaccard-distance | Jaccard | distance | yes |
| jaro-winkler-distance | Jaro-Winkler | distance | yes |
| levenshtein | Levenshtein | distance | no |
| longest-common-subsequence | Longest Common Subsequence | distance | no |
| metric-lcs | Metric Longest Common Subsequence | distance | yes |
| ngram | N-Gram | distance | yes |
| normalized-lcs-distance | Normalized Longest Common Subsequence | distance | yes |
| normalized-levenshtein-distance | Normalized Levenshtein | distance | yes |
| optimal-string-alignment | Optimal String Alignment | distance | no |
| qgram | Q-Gram | distance | no |
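The "Type" and "Normalized?" columns can be illustrated with Levenshtein. The Python sketch below assumes the convention described for the java-string-similarity library: the normalized distance is the edit distance divided by the length of the longer string, and the similarity is one minus that value. The function names are hypothetical, and the plugin's actual scoring should be checked against its repository.

```python
# Illustrative sketch; function names are hypothetical, not plugin APIs.
def levenshtein(a: str, b: str) -> int:
    """Unnormalized edit distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))      # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        curr = [i]                      # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_levenshtein_distance(a: str, b: str) -> float:
    """Distance scaled to [0, 1] (assumed convention: divide by longer length)."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


def normalized_levenshtein_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    return 1.0 - normalized_levenshtein_distance(a, b)


print(levenshtein("kitten", "sitting"))
print(normalized_levenshtein_distance("kitten", "sitting"))
print(normalized_levenshtein_similarity("kitten", "sitting"))
```

An unnormalized distance grows with string length, so a threshold tuned on short names may not transfer to long ones. A normalized score stays in the range 0 to 1, which makes thresholds easier to compare across fields.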