How to measure text similarity (Jaro-Winkler) in Teradata?


In Oracle we can measure text similarity with Jaro-Winkler like the following:

SELECT UTL_MATCH.JARO_WINKLER_SIMILARITY('STACKEXCHANGE', 'STAMPEXCHANGE') MYSTRING
FROM DUAL;
--98

And it turns out that Teradata has Jaro-Winkler too, as explained here. Unfortunately I just don’t understand the doc and example there.

So far, the only thing I have managed in Teradata is EDITDISTANCE:

SELECT EDITDISTANCE('STACKEXCHANGE', 'STAMPEXCHANGE') MYSTRING;
--2

So how can I measure text similarity with Jaro-Winkler in Teradata? Could anyone please give me a simple example?

How to solve:


Method 1

Regarding versions: Teradata 16.20.24.01 is Feature Update 1 (FU1); Feature Update 2 (FU2) is 16.20.40.01 and later.

StringSimilarity is not a scalar function; it uses table operator syntax for set processing. That takes some getting used to, but these operators are very powerful.

SELECT * 
FROM StringSimilarity
 ( ON
     (
       SELECT 1 as id, 'STACKEXCHANGE' as a, 'STAMPEXCHANGE' as b
       -- FROM ...
     )
   PARTITION BY ANY
   USING
     ComparisonColumnPairs ('jaro_winkler(a,b) AS jw_dist')
     Accumulate ('id')
) AS dt
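
The same pattern should carry over to a real table: replace the literal SELECT in the ON clause with a query against your data. The following is only a sketch. The table name_pairs and its columns pair_id, name_a and name_b are hypothetical, and the multiplication by 100 assumes the operator returns a fraction between 0 and 1 rather than Oracle's 0 to 100 integer, so check the output on your release first.

```sql
-- Hypothetical table: name_pairs(pair_id, name_a, name_b)
SELECT dt.pair_id,
       dt.jw_sim,
       -- Assumes jw_sim is a fraction in 0..1; CAST truncates, it does not round
       CAST(dt.jw_sim * 100 AS INTEGER) AS jw_pct
FROM StringSimilarity
 ( ON
     (
       SELECT pair_id, name_a, name_b
       FROM name_pairs
     )
   PARTITION BY ANY
   USING
     ComparisonColumnPairs ('jaro_winkler(name_a,name_b) AS jw_sim')
     Accumulate ('pair_id')
 ) AS dt
ORDER BY dt.jw_sim DESC;
```

Accumulate copies pair_id through to the output, which is what lets you join the scores back to the source rows. Do not expect the numbers to match Oracle's UTL_MATCH exactly, since implementations of Jaro-Winkler can differ in detail.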


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
