SSIS Component – Search Parameter

The core functions within the search are the search parameters and the algorithms used. Use multiple search parameters to find all possible duplicates. Within a search parameter, you specify a column and the algorithm to be used. Use an algorithm that matches the respective data in the column. For surnames, the Double Metaphone Algorithms or the Cologne Phonetic are suitable. For postal codes, for example, an Edit distance like the Levenshtein distance is better suited.

A total value is calculated from each individual result of a search parameter, which may not be smaller then the defined value. You enter this total value for each individual search parameter. For the calculation method you can select “Sum” or “Product”.

Search Parameter

Threshold

The percentage value that the algorithm calculates when two strings to be compared result in a duplicate.

Weight

The weighting at which the threshold is to be included in the total value of the calculation.

Prerequisite

Defines whether the values for a comparison must occur. A comparison in which one of the values does not exist cannot be a duplicate.

Column Cross

You can use the Column Cross parameter to define that columns are to be checked interchanged. This setting can be particularly useful if, for example, you want to check whether the first and last name have been mixed up.

Partial Compare

Using the Partail Compare within the search parameter, you can define whether you allow individual parts during the check without affecting the threshold neagtively. For example, you can have the name Cathy-Victoria checked against the name Cathy with the partial separator “-“. Only Cathy is checked against Cathy, so that a 100% threshold is reached. During the complete check of the name, depending on the algorithm, you will receive a value well below 80%. Define the Partial Separator in the Search Parameter tab for each individual search parameter.