Peacock Data

At our original website you will be able to read about and purchase our quality data management services ranging from list scrubbing to custom programming and beyond. You will not only find the customary array, but also one-of-a-kind services only available from Peacock Data. Our specialty is our trademarked merge/purge services.

Peacock Data 2

At our new sister website we offer a full line of database products crafted with the same quality found in our well-known data management service. There you will find unique packages relating to ZIP Codes, United States Census demographics, GeoCoding, names and nicknames, gender coding and more, and the list is growing.

Archive for Names & Nicknames

Oct
04

NEW SERVICE: Latino Append

Posted by: Peacock Data

Do you need to know who on your lists is of Latino or Hispanic origin? This is where our new Latino Append service comes in.

This service rates each of your records as to how likely it is that the subject is Latino, Latina or Hispanic. Primary matching algorithms are based on the first and last name(s); secondary matching algorithms utilize the address and U.S. Census data.

Records are flagged with a multi-point scale providing the percentage chance of Latino or Hispanic origin.

pdLatino

For those waiting for the release of our pdLatino database product, which is utilized in our Latino Append service, development is in the final stages, and we are testing its precision and comprehensiveness. Expected release is in the first quarter of 2010.

pdLatino has taken longer to develop than expected. In fact, the original schedule called for a release date of July 1, 2009. But we have really gone to school on the product, and we expect the results to be well worth the wait.


Apr
19

Using the pdNickname RELFLAG Field

Posted by: Peacock Data

pdNickname is a unique database of nearly 50,000 records designed to facilitate comparing sets of first name data based on nicknames, diminutives, pet names, variations and given names. One of the most important fields in the database product is RELFLAG, which stands for “Relationship Flag”.

The RELFLAG field contains one of two possible values:

  1. Close relationship between the name and variation (common variants):

    Includes closely associated nicknames, diminutives and pet names as well as first name variations that are considered closely related.

  2. More distant relationship between the name and variation (less common variants):

    Includes alternate forms of the names, often deriving from another culture, as well as nicknames, diminutives and pet names that are relatively uncommon.

PDNICKNAME VARIATIONS FOR THE GIVEN NAME “SAMUAL”
The RELFLAG field indicates if the name and variation have a (1) close or (2) more distant relationship.

The RELFLAG field is useful for controlling what is to be considered an acceptable match. As more distant relationships are included in matches, the error rate naturally rises. The error rate increase is usually not substantial, but it is measurable in hundredths and tenths of a percent.

RECOMMENDATIONS

RESIDENTIAL LISTS: Additional accuracy can be achieved if only close relationships are considered, but with residential lists the increase in error rate is almost always very small even when the more distant relationships are included (rarely more than 0.02% in our testing). Therefore, under best practices, it is fully acceptable to use all RELFLAG relationships when matching residential lists. With the exception of the George Foreman family, most errors that might occur result from different given names that share the same nickname or other variation.

BUSINESS & ORGANIZATION LISTS: With business and organization lists, on the other hand, the increase in error rate when the more distant relationships are included is typically higher than with residential lists. Our testing normally shows an increase that is still less than 0.1%, but we have seen it as high as 0.3% with some large lists. Under best practices, it is recommended that only close relationships be considered when processing business and organization lists.
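
The recommendations above can be expressed as a small acceptance rule. The following is a minimal Python sketch, not Peacock Data code: the list-type labels and the names `MAX_RELFLAG` and `is_acceptable` are assumptions made for illustration, while the RELFLAG values 1 (close) and 2 (more distant) come from the description above.

```python
# Hypothetical helper: decide whether a nickname match is acceptable,
# given its RELFLAG value and the kind of list being processed.

# Highest RELFLAG value accepted per list type (labels are illustrative).
MAX_RELFLAG = {
    "residential": 2,  # all relationships are acceptable
    "business": 1,     # close relationships only
}


def is_acceptable(relflag: int, list_type: str) -> bool:
    """Return True if a match with this RELFLAG should be accepted."""
    if relflag not in (1, 2):
        return False  # no relationship found, or an unexpected value
    return relflag <= MAX_RELFLAG[list_type]
```

With this rule, a more distant match (RELFLAG 2) is accepted on a residential list but rejected on a business or organization list.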

Categories : Names & Nicknames, Tips

An alternative structure for pdNickname is to have one record per name with the variations in fields next to it. This tutorial explains how to do it.

Matching and merging names can be tricky. How do you relate William Smith with Bill Smith? The pdNickname database can be utilized to match names that are dissimilar because one has a given first name while another has a nickname or other variation, or vice versa.

Out of the box, pdNickname is structured for immediate compatibility with the greatest number of database systems and to be easy to become familiar with.

The nickname database is set up with two names per record. The first name field contains the names you are looking up, and the second contains a variation for each name (nickname, diminutive, given name, variant, etc.). The same name can be listed several times in the first field, each time with a different variation. (See Figure 1.)

FIGURE 1: PDNICKNAME OUT OF THE BOX

If the names compared are Alexander Jones and Alex Jones, all names matching Alexander (NAME-A) are scanned until a variation is found that matches Alex (NAME-B). This works well, but there are other ways of organizing pdNickname that could work even better for you. In fact, we have restructured the table for utilization in our own services.
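
The out-of-the-box lookup described above can be sketched as follows. This is a minimal Python illustration rather than Peacock Data code: the tuple layout, the function name `find_relflag`, and the sample rows are assumptions, and the relationship value attached to each sample row is invented for the example.

```python
# Illustrative rows in the two-names-per-record layout:
# (name, variation, relflag). Sample data only, not taken from pdNickname.
ROWS = [
    ("ALEXANDER", "ALEX", 1),
    ("ALEXANDER", "SANDY", 2),
    ("WILLIAM", "BILL", 1),
]


def find_relflag(name_a, name_b, rows=ROWS):
    """Scan the records for name_a until a variation equals name_b.

    Returns the RELFLAG of the matching record, or 0 if none is found.
    """
    a = name_a.strip().upper()
    b = name_b.strip().upper()
    for name, variation, relflag in rows:
        if name == a and variation == b:
            return relflag
    return 0
```

In a real database the scan would normally be limited to the records for NAME-A through an index on the name field; the linear loop here only keeps the sketch short.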

An alternative structure is to have one record per name and the variations in fields next to it. It is not practical to have separate fields for each variation, which can range from one to over two hundred. So what we do is have two Memo fields (also known as Long Text), one for close variations (relflag = "1") and the other for more distant variations (relflag = "2"), with the string of variations separated by delimiters for easier matching. (See Figure 2.)

FIGURE 2: PDNICKNAME RESTRUCTURED

Note: when browsing a table, normally you cannot see the content of a Memo or Long Text field because the database keeps it in a separate file. For this screenshot we have made the content visible.
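
One way to build the restructured table from the out-of-the-box records is sketched below in Python. It is an illustration under stated assumptions, not the procedure Peacock Data uses: the function name `restructure`, the tuple layout, and the use of integer RELFLAG values are hypothetical, and the slash delimiter is chosen to match the one used in the Visual FoxPro sample in this tutorial.

```python
from collections import defaultdict


def restructure(rows, delimiter="/"):
    """Collapse (name, variation, relflag) rows into one record per name.

    Returns {name: {1: "/A/B/", 2: "/C/"}}, where key 1 holds the close
    variations and key 2 the more distant ones. A name with no variations
    of a given kind gets an empty string for that key.
    """
    grouped = defaultdict(lambda: {1: [], 2: []})
    for name, variation, relflag in rows:
        grouped[name][relflag].append(variation)

    return {
        name: {
            flag: (delimiter + delimiter.join(values) + delimiter) if values else ""
            for flag, values in by_flag.items()
        }
        for name, by_flag in grouped.items()
    }
```

Wrapping every variation in delimiters on both sides means a later search for "/ALEX/" cannot accidentally match inside a longer name such as "/ALEXIS/".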

Structured this way, when your program finds a match for NAME-A, it then determines if NAME-B can be found in variation field one or variation field two. This can be faster because you only access one record in each search request. The code sample below is an example in Visual FoxPro that illustrates this. Of course other programs use different commands and syntax to achieve the same outcome.

* CODE SAMPLE

*- this Visual FoxPro function receives as parameters
*- the two first names being compared - it returns a
*- variable indicating what matches are found - this
*- function is based on the restructuring of the
*- pdNickname database described in this tutorial

FUNCTION pdNickname
LPARAMETERS cNameA, cNameB
LOCAL nMatch

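*- open the restructured table if it is not already open
IF NOT USED("nicknames")
    USE nicknames ALIAS nicknames IN 0
ENDIF

*- normalize NAME-A and pad it to 25 characters for the index
*- lookup - wrap NAME-B in delimiters so only whole names match
cNameA = PADR(UPPER(ALLTRIM(cNameA)),25," ")
cNameB = "/"+UPPER(ALLTRIM(cNameB))+"/"

*- 0 = no match, 1 = close match, 2 = more distant match
nMatch = 0
IF SEEK(cNameA, "nicknames", "name")
    DO CASE
    CASE OCCURS(cNameB, nicknames.variations) > 0
        nMatch = 1
    CASE OCCURS(cNameB, nicknames.var2) > 0
        nMatch = 2
    ENDCASE
ENDIF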

RETURN nMatch
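
For readers who do not use Visual FoxPro, the same lookup might look like this in Python. This is a rough equivalent under assumptions, not an official port: the dictionary layout, the sample table, and the function name `nickname_match` are hypothetical, with key 1 holding the close variations and key 2 the more distant ones.

```python
# Illustrative restructured table: one record per name, with the close (1)
# and more distant (2) variations stored as slash-delimited strings.
# Sample data only, not taken from pdNickname.
TABLE = {
    "ALEXANDER": {1: "/ALEX/AL/", 2: "/SASHA/"},
}


def nickname_match(name_a, name_b, table=TABLE):
    """Return 1 for a close match, 2 for a more distant match, 0 for none."""
    record = table.get(name_a.strip().upper())
    if record is None:
        return 0  # NAME-A is not in the table

    # Wrap NAME-B in delimiters so that only whole names match.
    needle = "/" + name_b.strip().upper() + "/"
    for flag in (1, 2):
        if needle in record[flag]:
            return flag
    return 0
```

As in the Visual FoxPro function, only one record is read per request, and the close variations are checked before the more distant ones.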

pdNickname, like all our database products, is structured to satisfy most users from the start. But there are many ways to integrate the databases into your system. It is up to you to determine what works best for you. Do not be afraid to experiment.
