Archive for April 2015

Key features of pdSurname

Peacock Data released its pdSurname last name software package early last month, and since then it has proven to be the company’s fastest-starting new product launch. Barbara Adair, spokesperson for the West Coast-based company, sat down for an interview about the key features of the new software.

The product literature says pdSurname is designed to help find last names that are not identical but are variations of one another or phonetically similar.

Barbara Adair explained, “Often the same person winds up in multiple places on a list because transcribers enter the name slightly differently each time, such as ‘Johnson’ versus ‘Johnsson’, ‘Jonson’, and ‘Joneson’. This can lead to significantly increased costs, notably reduced quality in customer service, and even lawsuits.”

“This is particularly critical in the medical industry where patient records frequently can be in multiple places with slightly different versions of a name. This has led to terrible consequences in some instances, along with unnecessary medical procedures and generally reduced medical care in many other instances,” the spokesperson exclaimed.

According to Barbara Adair, “The base product includes both true name tree variations gathered from onomastic research, such as ‘Hutchens’ versus ‘Hutchins’ and ‘Parker’ versus ‘Parkers’, and names that are not true variations, but rather sound or are spelled similarly, such as ‘Gonzalez’ versus ‘Gunsalus’ and ‘Davis’ versus ‘Davisse’.”

“There are also special features built in for last names with prefixes like ‘Mc’, ‘Mac’, ‘O’, ‘De’, ‘La’, ‘Van’, ‘Al’ and ‘St’, which allow us to match names like ‘Garcia’ with ‘De Garcia’ and ‘Van Der Zant’ with ‘Van Zant’ and ‘Zant’, which are possibilities missed by other products,” she added.

Because a pdSurname query can return many possibilities, the software includes a match quality score on a 1 to 99 scale to order the results from most likely to least likely, a feature that is also usually missing in other products.
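
As a rough illustration of the prefix matching and score ordering described above, here is a minimal Python sketch. It is not Peacock Data's implementation: the function names, the particle set (including "DER", which is added only so the 'Van Der Zant' example works), and the sample scores are all assumptions made for illustration, and only space-separated prefixes are handled.

```python
from itertools import combinations

# Hypothetical particle set based on the prefixes named in the article.
# "DER" is an assumption, added so 'Van Der Zant' can be reduced.
PARTICLES = {"MC", "MAC", "O", "DE", "LA", "VAN", "DER", "AL", "ST"}

def prefix_variants(surname):
    """Return the surname with every combination of leading particles kept or dropped."""
    words = surname.upper().split()
    if not words:
        return set()
    core = words[-1]
    leading = words[:-1]
    # If any leading word is not a known particle, leave the name untouched.
    if any(w not in PARTICLES for w in leading):
        return {" ".join(words)}
    variants = set()
    for k in range(len(leading) + 1):
        for kept in combinations(leading, k):
            variants.add(" ".join(kept + (core,)))
    return variants

def prefix_match(a, b):
    """Two surnames match if they share at least one prefix variant."""
    return bool(prefix_variants(a) & prefix_variants(b))

def rank(matches):
    """Order (name, score) pairs from most to least likely on the 1 to 99 scale."""
    return sorted(matches, key=lambda m: m[1], reverse=True)

if __name__ == "__main__":
    print(prefix_match("Garcia", "De Garcia"))        # True
    print(sorted(prefix_variants("Van Der Zant")))
    # The scores below are invented purely to show the ordering step.
    print(rank([("Jonson", 80), ("Johnsson", 95), ("Joneson", 70)]))
```

Under these assumptions, 'Van Der Zant' yields 'VAN ZANT' and 'ZANT' among its variants, so it can be paired with either spelling.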

As a bonus, in addition to showing related names, the product is enhanced with demographics for each of the more than 335,000 surname formations provided.

“All names give the racial percentage of use for whites, blacks, Hispanics and Latinos, Asians and Pacific Islanders, Native Americans, and multiracial use,” the spokesperson explained. “And most names also include the language or languages of origin and usage, which covers more than 600 languages, dialects, and ethnic groups.”

Barbara Adair noted, “An enhanced Pro edition offers additional matches based on common misspellings and other typographical errors that frequently appear in lists of names. These are derived from a set of new algorithms we call ‘Fuzzy Logic Generation 2.0’ which can pick up such typos as ‘Berryman’ misspelled as ‘Beryman’, missing the double ‘r’, and ‘Polheimer’ phonetically transcribed as ‘Poolhimer’.”

The product literature indicates fuzzy logic generation 2.0 has five layers:

  • Phonetic misspellings
  • Reversed digraphs
  • Double-letter misspellings
  • Missed keystrokes
  • String manipulations

“The new surname software is similar to our pdNickname package, which has been in the field for more than two decades successfully doing for first names, nicknames, and diminutives what pdSurname now does for last names,” Barbara Adair noted.

She added, “In addition to its use for businesses and organizations working with lists of names, all our names software is recommended for genealogical and scholarly research.”

“The new software package is a one-of-a-kind proprietary product that allows users to complete projects with significantly more success and in ways that were not possible before,” concluded Barbara Adair. “It is a very exciting time for Peacock Data and our many customers.”


The software comes in Standard and Pro editions. Both include the same names and features, except that the Pro edition adds fuzzy logic, which allows matching when lists contain misspellings or other typographical errors.

Total name records: Pro: 109,932,801; Standard: 81,079,801
Zipped size: Pro: 1.2 GB; Standard: 962 MB
Extracted size: Pro: 22.9 GB; Standard: 16.8 GB
Introductory price: Pro: $371.25; Standard: $224.25

The package also features precision documentation and a perpetual site license allowing installation on all computers in the same building within a single company or organization.


pdSurname is also part of the company’s pdSuite Names (now on sale, $645) and pdSuite Master Collection (now on sale, $795) software bundles.

Fuzzy logic generation 2.0

Peacock Data introduced the next generation of their fuzzy logic technology with last month’s release of the California-based firm’s pdSurname Pro last name matching software.

According to company spokesperson Barbara Adair, “pdSurname facilitates identifying last names that are true variations or phonetically similar, while the fuzzy logic technology in the enhanced Pro edition allows finding names even when there are misspellings or other typographical errors.”

“We introduced fuzzy logic with our pdNickname Pro and pdGender Pro software in late 2013, but the new fuzzy logic generation 2.0 is a great enhancement,” Barbara Adair exclaimed.

According to the company, most of the enhancements were achieved after they developed a giant library of more than 80,000 language rules based on hundreds of dialects from around the world. Barbara Adair said, “Many misspellings occur as transcribers enter the sounds they hear. The character sequences and the sounds they produce are different for each language and situation, such as before, after, or between certain vowels and consonants, so our substitutions are language-rule based.”

The company added that its algorithms go further by considering both how a name may sound to someone who speaks English and how it may sound to someone who speaks Spanish, which often differ. Barbara Adair explained, “Take the letter-pair ‘SC’ as an example. Before the vowels ‘E’ or ‘I’ it is most likely to be misspelled by an English speaker as ‘SHE’ or ‘SHI’ while a Spanish speaker may hear ‘CHE’ or ‘CHI’ and sometimes ‘YE’ or ‘YI’.”
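
The ‘SC’ example can be written as a small language-keyed substitution rule. The Python sketch below is only an illustration of the idea, not the company's rule library: the rule table layout, the function name, and the sample surname ‘Sciarra’ are assumptions chosen for the demonstration.

```python
import re

# Hypothetical rule: "SC" followed by "E" or "I", with the replacements
# a listener of each language is said to be likely to produce.
SC_BEFORE_E_OR_I = re.compile(r"SC(?=[EI])")
SC_REPLACEMENTS = {
    "english": ["SH"],
    "spanish": ["CH", "Y"],
}

def sc_misspellings(name, language):
    """Return likely misspellings of `name` for a listener of `language`."""
    name = name.upper()
    if not SC_BEFORE_E_OR_I.search(name):
        return []  # the rule applies only before E or I
    return [SC_BEFORE_E_OR_I.sub(rep, name) for rep in SC_REPLACEMENTS[language]]

if __name__ == "__main__":
    print(sc_misspellings("Sciarra", "english"))  # ['SHIARRA']
    print(sc_misspellings("Sciarra", "spanish"))  # ['CHIARRA', 'YIARRA']
    print(sc_misspellings("Scott", "english"))    # [] since 'SC' precedes 'O'
```

A real rule set would presumably hold many such context-sensitive patterns per language; the lookahead here simply keeps the vowel in place while the letter pair is replaced.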

Company literature indicates the new fuzzy logic generation 2.0 technology has five layers:

1. Phonetic misspellings: such as GUALTIERREZ misspelled as GUALTIEREZ, AAGARD misspelled as OUGHGARD, and YOUNGMAN misspelled as YONGMAN.

2. Reversed letters: such as DIELEMAN misspelled as DEILEMAN and RODREGUEZ misspelled as RODREUGEZ. These algorithms look for errors due to reversed digraphs (two-letter sequences that form one phoneme or distinct sound), a common typographical issue, such as “IE” substituted for “EI”.

3. Double-letter misspellings: such as HUMBER misspelled as HUMBEER and ZWOLLE misspelled as ZWOLE. The most common typographical issues occur with the characters, in order of frequency, “SS”, “EE”, “TT”, “FF”, “LL”, “MM”, and “OO”.

4. Missed keystrokes: such as HUNTER misspelled as UNTER, missing the initial “H”, and TAMERON misspelled as TAMRON, missing the “E” in the middle.

5. Other typographical errors: which cover a variety of additional misspelling issues.
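
Three of the layers above (reversed letters, double-letter misspellings, and missed keystrokes) lend themselves to a simple candidate generator. The following Python sketch is a simplified stand-in under stated assumptions, not the product's algorithms: the function names are hypothetical, and the reversal step swaps any adjacent pair of letters rather than only true digraphs.

```python
def reversed_pairs(name):
    """Swap each pair of adjacent, different letters (a simplification of reversed digraphs)."""
    return {
        name[:i] + name[i + 1] + name[i] + name[i + 2:]
        for i in range(len(name) - 1)
        if name[i] != name[i + 1]
    }

def double_letter_errors(name):
    """Collapse an existing double letter, or double a letter that stands alone."""
    out = set()
    for i, ch in enumerate(name):
        prev_same = i > 0 and name[i - 1] == ch
        next_same = i + 1 < len(name) and name[i + 1] == ch
        if next_same:
            out.add(name[:i] + name[i + 1:])    # ZWOLLE -> ZWOLE
        elif not prev_same:
            out.add(name[:i] + ch + name[i:])   # HUMBER -> HUMBEER
    return out

def missed_keystrokes(name):
    """Drop one letter at every position."""
    return {name[:i] + name[i + 1:] for i in range(len(name))}

def typo_candidates(name):
    """Combine the three layers into one set of candidate misspellings."""
    name = name.upper()
    found = reversed_pairs(name) | double_letter_errors(name) | missed_keystrokes(name)
    return found - {name}

if __name__ == "__main__":
    # Examples taken from the list above.
    print("DEILEMAN" in typo_candidates("Dieleman"))   # True
    print("ZWOLE" in typo_candidates("Zwolle"))        # True
    print("TAMRON" in typo_candidates("Tameron"))      # True
```

A generator like this produces many implausible candidates, which is one reason a ranking step, such as a match quality score, would be needed before results are shown.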

The pdSurname Pro software with the new fuzzy logic generation 2.0 technology is available for immediate download and can currently be purchased at a 25 percent introductory discount (sale, $371.25; regular, $495) or as part of bundles also on sale, pdSuite Names (sale, $645; regular, $795) and pdSuite Master Collection (sale, $795; regular, $995).

For users of other Peacock Data name software, Barbara Adair noted, “pdNickname Pro and pdGender Pro will be updated with fuzzy logic generation 2.0 capabilities this fall, and the upgrades will be free for anyone owning the older version.”