Archive for Names & Nicknames
NEW SERVICE: Latino Append
Do you need to know who on your lists is of Latino or Hispanic origin? That is where our new Latino Append service comes in.
This service rates each of your records as to how likely it is that the subject is Latino, Latina or Hispanic. Primary matching algorithms are based on the first and last name(s); secondary matching algorithms utilize the address and U.S. Census data.
Records are flagged with a multi-point scale providing the percentage chance of Latino or Hispanic origin.
pdLatino
For those waiting for the release of our pdLatino database product, which is utilized in our Latino Append service, development is in the final stages, and we are testing its precision and comprehensiveness. Expected release is in the first quarter of 2010.
pdLatino has taken longer to develop than expected. In fact, the original schedule called for a release date of July 1, 2011. But we have really gone to school on the product, and we expect the results to be well worth the wait.
Using the pdNickname RELFLAG Field
pdNickname is a unique database of nearly 50,000 records designed to facilitate comparing sets of first name data based on nicknames, diminutives, pet names, variations and given names. One of the most important fields in the database product is RELFLAG, which stands for “Relationship Flag”.
The RELFLAG field contains one of two possible values:
- 1: Close relationship between the name and variation (common variants). Includes closely associated nicknames, diminutives and pet names as well as first name variations that are considered closely related.
- 2: More distant relationship between the name and variation (less common variants). Includes alternate forms of the names, often deriving from another culture, as well as nicknames, diminutives and pet names that are relatively uncommon.
PDNICKNAME VARIATIONS FOR THE GIVEN NAME “SAMUAL”

The RELFLAG field indicates if the name and variation have a (1) close or (2) more distant relationship.
The RELFLAG field is useful for controlling what is to be considered an acceptable match. As more distant relationships are included in matches, the error rate naturally rises. The error rate increase is usually not substantial, but it is measurable in hundredths and tenths of a percent.
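As a rough illustration of using RELFLAG to control what counts as an acceptable match, the Python sketch below accepts a name pair only if its RELFLAG does not exceed a chosen limit. The sample rows, the function name `is_acceptable_match` and the `max_relflag` parameter are assumptions made for this example; they are not taken from the pdNickname product itself.

```python
# Illustrative rows only: (name, variation, relflag).
# These are NOT actual pdNickname records.
SAMPLE_ROWS = [
    ("ALEXANDER", "ALEX", "1"),   # close relationship
    ("ALEXANDER", "SASHA", "2"),  # more distant relationship
    ("WILLIAM", "BILL", "1"),
]


def is_acceptable_match(name_a, name_b, max_relflag=2, rows=SAMPLE_ROWS):
    """Return True if name_b is listed as a variation of name_a with a
    RELFLAG no greater than max_relflag (1 = close relationships only)."""
    name_a = name_a.strip().upper()
    name_b = name_b.strip().upper()
    for name, variation, relflag in rows:
        if name == name_a and variation == name_b and int(relflag) <= max_relflag:
            return True
    return False
```

Passing `max_relflag=1` restricts matching to close relationships, while the default of 2 also admits the more distant ones.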
RECOMMENDATIONS
RESIDENTIAL LISTS: Additional accuracy can be achieved if only close relationships are considered. With residential lists, however, the increase in error rate is almost always very small even when the more distant relationships are included, rarely more than 0.02% in our testing. Therefore, under best practices, it is fully acceptable to use all RELFLAG relationships when matching residential lists. With the exception of the George Foreman family, most errors that might occur result from different given names that share the same nickname or other variation.
BUSINESS & ORGANIZATION LISTS: With business and organization lists, the increase in error rate when the more distant relationships are included is typically higher than with residential lists. Our testing normally shows an increase that is still less than 0.1%, but we have seen it as high as 0.3% with some large lists. Under best practices, it is recommended that only close relationships be considered when processing business and organization lists.
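One way to encode these recommendations is a small lookup from list type to the highest RELFLAG value to accept. This Python sketch is only an assumption about how such a policy might be expressed; the list-type labels and the function name `max_relflag_for` are hypothetical.

```python
# Hypothetical list-type labels mapped to the highest RELFLAG to accept.
MAX_RELFLAG_BY_LIST_TYPE = {
    "residential": 2,   # close and more distant relationships
    "business": 1,      # close relationships only
    "organization": 1,  # close relationships only
}


def max_relflag_for(list_type):
    """Return the highest RELFLAG value to accept for a list type.

    Unknown list types fall back to the stricter setting (1). That
    fallback is an assumption of this sketch, not a documented rule.
    """
    return MAX_RELFLAG_BY_LIST_TYPE.get(list_type.strip().lower(), 1)
```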
Restructuring the pdNickname Database
An alternative structure for pdNickname is to have one record per name with the variations in fields next to it. This tutorial explains how to do it.
Matching and merging names can be tricky. How do you relate William Smith with Bill Smith? The pdNickname database can be utilized to match names that are dissimilar because one has a given first name while another has a nickname or other variation, or vice versa.
Out of the box, pdNickname is structured for immediate compatibility with the greatest number of database systems and to be easy to become familiar with.
The nickname database is set up with two names per record. The first name field contains the names you are looking up, and the second contains a variation for each name: nickname, diminutive, given name, variant, etc. The same name can be listed several times in the first field, each time with a different variation. (See Figure 1.)
FIGURE 1: PDNICKNAME OUT OF THE BOX

If the names compared are Alexander Jones and Alex Jones, all names matching Alexander (NAME-A) are scanned until a variation is found that matches Alex (NAME-B). This works well, but there are other ways of organizing pdNickname that could work even better for you. In fact, we have restructured the table for utilization in our own services.
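The out-of-the-box lookup described above can be sketched in Python as follows. The rows are illustrative and not actual pdNickname data, and the names `build_index` and `find_relflag` are hypothetical. The sketch groups rows by NAME-A, then scans that group until a variation equals NAME-B.

```python
from collections import defaultdict

# Illustrative rows only: (name, variation, relflag).
SAMPLE_ROWS = [
    ("ALEXANDER", "AL", "1"),
    ("ALEXANDER", "ALEX", "1"),
    ("ALEXANDER", "SASHA", "2"),
    ("WILLIAM", "BILL", "1"),
]


def build_index(rows):
    """Group the two-names-per-record rows by the first name field."""
    index = defaultdict(list)
    for name, variation, relflag in rows:
        index[name].append((variation, relflag))
    return index


def find_relflag(index, name_a, name_b):
    """Scan every row for name_a until a variation equals name_b.

    Returns the RELFLAG of the matching row, or None if no row matches.
    """
    target = name_b.strip().upper()
    for variation, relflag in index.get(name_a.strip().upper(), []):
        if variation == target:
            return relflag
    return None


INDEX = build_index(SAMPLE_ROWS)
```

With this sample data, `find_relflag(INDEX, "Alexander", "Alex")` reads the rows for ALEXANDER in turn and stops at the second one.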
An alternative structure is to have one record per name and the variations in fields next to it. It is not practical to have separate fields for each variation, which can range from one to over two hundred. So what we do is have two Memo fields (also known as Long Text), one for close variations (relflag = "1") and the other for more distant variations (relflag = "2"), with the string of variations separated by delimiters for easier matching. (See Figure 2.)
FIGURE 2: PDNICKNAME RESTRUCTURED

Note: when browsing a table, normally you cannot see the content of a Memo or Long Text field because the database keeps it in a separate file. For this screenshot we have made the content visible.
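The restructuring itself might look something like the Python sketch below, which collapses the two-names-per-record rows into one record per name. The sample rows are illustrative and the function name `restructure` is hypothetical. The field names `variations` and `var2` and the "/" delimiter are borrowed from the Visual FoxPro sample in this tutorial; wrapping each string in a leading and trailing delimiter is an assumption inferred from how that sample searches.

```python
def restructure(rows, delimiter="/"):
    """Collapse (name, variation, relflag) rows into one record per name.

    Each record holds two delimited strings: "variations" for close
    relationships (relflag "1") and "var2" for more distant ones.
    """
    grouped = {}
    for name, variation, relflag in rows:
        close, distant = grouped.setdefault(name, ([], []))
        (close if relflag == "1" else distant).append(variation)

    def join(values):
        # Wrap the string in delimiters so whole names can be searched for.
        return delimiter + delimiter.join(values) + delimiter if values else ""

    return {
        name: {"variations": join(close), "var2": join(distant)}
        for name, (close, distant) in grouped.items()
    }
```

For example, three rows for ALEXANDER (ALEX and AL with relflag "1", SASHA with relflag "2") become a single record with `variations` set to "/ALEX/AL/" and `var2` set to "/SASHA/".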
Structured this way, when your program finds a match for NAME-A, it then determines if NAME-B can be found in variation field one or variation field two. This can be faster because you only access one record in each search request. The code sample below is an example in Visual FoxPro that illustrates this. Of course other programs use different commands and syntax to achieve the same outcome.
* CODE SAMPLE
*- this Visual FoxPro function receives as parameters
*- the two first names being compared - it returns a
*- variable indicating what matches are found - this
*- function is based on the restructuring of the
*- pdNickname database described in this tutorial
FUNCTION pdNickname
LPARAMETERS cNameA, cNameB
LOCAL nMatch
IF NOT USED("nicknames")
   USE nicknames ALIAS nicknames IN 0
ENDIF
cNameA = PADR(UPPER(ALLTRIM(cNameA)),25," ")
cNameB = "/"+UPPER(ALLTRIM(cNameB))+"/"
nMatch = 0
IF SEEK(cNameA, "nicknames", "name")
   DO CASE
   CASE OCCURS(cNameB, nicknames.variations) > 0
      nMatch = 1
   CASE OCCURS(cNameB, nicknames.var2) > 0
      nMatch = 2
   ENDCASE
ENDIF
RETURN nMatch
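For readers who do not use Visual FoxPro, here is a rough Python counterpart of the same lookup. It is a sketch under assumptions: the restructured table is stood in for by a plain dictionary with made-up records, and the name `pd_nickname` is hypothetical. The return values (0, 1, 2) mirror the FoxPro function above.

```python
# Hypothetical restructured records keyed by name. The field names follow
# the FoxPro sample; the data itself is made up for illustration.
RECORDS = {
    "ALEXANDER": {"variations": "/ALEX/AL/", "var2": "/SASHA/"},
    "WILLIAM": {"variations": "/BILL/WILL/", "var2": ""},
}


def pd_nickname(name_a, name_b, records=RECORDS):
    """Return 1 for a close match, 2 for a more distant match, 0 for none."""
    record = records.get(name_a.strip().upper())
    if record is None:
        return 0
    # Surround the name with delimiters so only whole names match.
    needle = "/" + name_b.strip().upper() + "/"
    if needle in record["variations"]:
        return 1
    if needle in record["var2"]:
        return 2
    return 0
```

Wrapping NAME-B in delimiters is what keeps a partial name from matching: "/BI/" does not occur in "/BILL/WILL/", even though "BI" on its own would. The fixed-width padding in the FoxPro version is not needed here because the dictionary key is compared exactly.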
pdNickname, like all our Database Products, is structured to satisfy most users from the start. But there are many ways to integrate the databases into your system. It is up to you to determine what works best for you. Do not be afraid to experiment.