Archive for pdSurname

The language of names

According to Barbara Adair, Peacock Data’s chief development coordinator and spokesperson, the language features in the company’s long-standing flagship software packages, pdNickname and pdGender, and their new pdSurname product, “have never been available before on this scale and required a sizable portion of nearly six years of research and development.”

With this landmark software, among other powerful features, users can identify the languages associated with tens of thousands of first names, nicknames, and last names, providing them with critical ethnic and heritage demographics about their clients.

pdNickname is an advanced name and nickname file. It identifies first names that are the same even when they are not an exact match, but rather equivalent, such as a variation or nickname.

pdGender is a gender-coding database built on the same set of names as pdNickname. Users can match the data against the first names on their lists to determine male and female identification.

pdSurname is a new last name file. It identifies last names that are the same even when they are not an exact match, but rather equivalent, such as a variation or a similar sounding or spelled name.

All three innovative software packages share a host of similar and compatible features, including languages of origin and use, as well as fuzzy logic, so that names can be recognized even when they contain misspellings or other typographical errors.

Combined, the three products cover more than 600 languages, dialects, ethnic groups, and races, such as English, Spanish, Portuguese, French, Italian, German, Polish, Russian, Chinese, Japanese, Vietnamese, Korean, Hindustani, Arabic, Persian, and Yiddish, as well as Native American names and ancient Greek, Latin, and Hebrew names.

Plans for the software releases were first written up in January 2009, and development began in earnest in mid-summer of that year. The products were built in parallel development cycles and began reaching the public with the release of pdNickname 2.0 and pdGender 2.0 on December 30, 2013. pdSurname, which has the largest database, was launched on March 2, 2015.

According to Barbara Adair, “Creation of the master name file these new products result from is the biggest venture our company has ever undertaken. There are thousands of sources for names in scores of languages, and our task was to compare and contrast all this data and create the ultimate name resources.”

“From the start, it was essential to identify the languages and dialects associated with names in considerable detail,” she added. “This gives users previously unavailable ethnic demographics linked to the names already on their lists.”

Barbara Adair showed some of the documents used in building the software, including a manuscript from 731 AD, written by the monk Bede, listing the earliest English names dating from the Anglo-Saxon era of the early medieval period. The still common personal name ‘Hilda’ is an example from the manuscript.

“Because sources often give diverse information and use different spelling conventions, it was crucial not only to gather all the data possible but also to differentiate between the quality of sources,” the spokesperson explained. “Better information became easier to identify after working with the sources over the course of the first year.”

Barbara Adair concluded, “These database packages are one-of-a-kind proprietary resources that let our users complete projects with significantly more success and in ways that were not possible before. They are very innovative pieces of software and we are encouraged and grateful for the response from our clients. A lot of time and hard work has been put into these efforts and it is a very exciting time for our customers and everyone here at Peacock Data.”

All three ground-breaking software packages are available for immediate download from the company’s website. They come with precision documentation, complete with examples, and perpetual multi-seat site licenses allowing installation on all computers in the same building within a single company or organization.

They can also be licensed as part of the company’s pdSuite Names and pdSuite Master Collection software bundles.

About Peacock Data

California-based Peacock Data are the makers of database software products used by businesses, organizations, churches, schools, researchers, and government. They are an industry leader because of their superior solutions and renowned loyalty to customers.

For more than 20 years, Peacock Data’s specialized software has been used in applications you use every day.

Affiliates program

DO YOU WANT TO SELL PEACOCK DATA PRODUCTS?

The firm’s affiliates program offers a unique way for your website or app to link to the Peacock Data product line. You will be provided with all of the tools necessary to convert your existing traffic into sales along with full support from dedicated affiliate managers. Apply now to join the program and earn substantial rewards!

Key features of pdSurname

Peacock Data released their pdSurname last name software package early last month, and since then it has proven to be their fastest-starting new product launch. Barbara Adair, spokesperson for the West Coast-based company, sat down for an interview about the key features of the new software.

The product literature says pdSurname is designed to find last names that are not exact matches but are variations or phonetically similar.

Barbara Adair explained, “Often the same person winds up in multiple places on a list because transcribers enter the name slightly differently each time, such as ‘Johnson’ versus ‘Johnsson’, ‘Jonson’, and ‘Joneson’. This can lead to significantly increased costs, notably reduced quality in customer service, and even lawsuits.”
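The duplicate problem described above can be illustrated with a short Python sketch. It assumes a hypothetical variant table that maps each spelling to a group key; the four spellings come from the quote, but the table layout, the function names, and the sample records are invented for illustration and do not reflect pdSurname’s actual file format.

```python
from collections import defaultdict

# Hypothetical variant table: each spelling maps to a shared group key.
# The spellings come from the example in the text; the layout is assumed.
VARIANT_GROUP = {
    "JOHNSON": "JOHNSON",
    "JOHNSSON": "JOHNSON",
    "JONSON": "JOHNSON",
    "JONESON": "JOHNSON",
}

def group_by_surname(records):
    """Group (record_id, surname) pairs whose surnames share a variant group."""
    groups = defaultdict(list)
    for record_id, surname in records:
        key = surname.strip().upper()
        # Names missing from the table fall back to their own spelling.
        groups[VARIANT_GROUP.get(key, key)].append(record_id)
    return dict(groups)

def possible_duplicates(records):
    """Return only the groups that contain more than one record."""
    return {k: ids for k, ids in group_by_surname(records).items() if len(ids) > 1}

if __name__ == "__main__":
    patients = [(1, "Johnson"), (2, "Jonson"), (3, "Smith"), (4, "Joneson")]
    print(possible_duplicates(patients))  # {'JOHNSON': [1, 2, 4]}
```

A real deduplication pass would normally compare first names, addresses, or dates of birth as well, since sharing a surname group only flags records for review.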

“This is particularly critical in the medical industry, where patient records can frequently be in multiple places with slightly different versions of a name. This has led to terrible consequences in some instances, along with unnecessary medical procedures and generally reduced medical care in many other instances,” the spokesperson exclaimed.

According to Barbara Adair, “The base product includes both true name tree variations gathered from onomastic research, such as ‘Hutchens’ versus ‘Hutchins’ and ‘Parker’ versus ‘Parkers’, along with names that are not true variations but rather sound or are spelled similarly, such as ‘Gonzalez’ versus ‘Gunsalus’ and ‘Davis’ versus ‘Davisse’.”

“There are also special features built in for last names with prefixes like ‘Mc’, ‘Mac’, ‘O’, ‘De’, ‘La’, ‘Van’, ‘Al’, and ‘St’, which allow us to match names like ‘Garcia’ with ‘De Garcia’ and ‘Van Der Zant’ with both ‘Van Zant’ and ‘Zant’, possibilities missed by other products,” she added.
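One way to get the prefix behavior described above is to reduce each surname to a core by dropping leading prefix words. The following Python sketch assumes prefixes appear as separate words, or are separated by an apostrophe, hyphen, or period (as in ‘O’Brien’ or ‘St. John’). ‘DER’ is added to the article’s prefix list so that ‘Van Der Zant’ reduces fully. The function names are hypothetical, and this is not a description of how pdSurname implements the feature.

```python
import re

# Prefixes named in the article, plus "DER" (an addition for this sketch)
# so that "Van Der Zant" reduces to "ZANT".
PREFIXES = {"MC", "MAC", "O", "DE", "DER", "LA", "VAN", "AL", "ST"}

def core_surname(name):
    """Drop leading prefix words, always keeping at least the last word."""
    # Treat apostrophes, hyphens, and periods as word breaks: O'Brien -> O BRIEN.
    words = re.sub(r"[’'.\-]", " ", name.upper()).split()
    i = 0
    while i < len(words) - 1 and words[i] in PREFIXES:
        i += 1
    return " ".join(words[i:])

def prefix_match(a, b):
    """True when two surnames agree once leading prefix words are ignored."""
    return core_surname(a) == core_surname(b)

if __name__ == "__main__":
    print(core_surname("Van Der Zant"))          # ZANT
    print(prefix_match("Garcia", "De Garcia"))   # True
    print(prefix_match("Van Zant", "Zant"))      # True
```

Attached forms such as ‘McDonald’ or ‘MacDonald’ are not handled here, because blindly stripping ‘MAC’ from a name like ‘Mack’ would leave nonsense. Prefix-only matches are also looser than exact ones, so a practical system would likely rank them lower.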

Because many possibilities can be returned from a pdSurname query, the software includes a match quality score on a 1 to 99 scale to order the results from most likely to least likely, which is also a feature usually missing in other products.
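Ordering results by a 1 to 99 match quality score amounts to a simple sort. In this Python sketch the candidate names come from the article’s ‘Johnson’ example, but the scores, the `Candidate` type, and the `min_score` cutoff are hypothetical; the article does not say how pdSurname computes its scores.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    surname: str
    score: int  # match quality, 1 (least likely) to 99 (most likely)

    def __post_init__(self):
        # Reject scores outside the 1 to 99 scale.
        if not 1 <= self.score <= 99:
            raise ValueError(f"score out of range: {self.score}")

def rank(candidates, min_score=1):
    """Order candidates from most to least likely, dropping weak matches."""
    kept = [c for c in candidates if c.score >= min_score]
    # Sort by score descending, then alphabetically as a stable tie-break.
    return sorted(kept, key=lambda c: (-c.score, c.surname))

if __name__ == "__main__":
    results = [
        Candidate("JONESON", 62),
        Candidate("JOHNSSON", 88),
        Candidate("JONSON", 88),
        Candidate("JANSEN", 15),
    ]
    for c in rank(results, min_score=50):
        print(c.score, c.surname)  # 88 JOHNSSON, 88 JONSON, 62 JONESON
```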

As a bonus, in addition to showing related names, the product is enhanced with demographics for each of the more than 335,000 surname formations provided.

“All names give the racial percentage of use for whites, blacks, Hispanics and Latinos, Asians and Pacific Islanders, Native Americans, and multiracial use,” the spokesperson explained. “And most names also include the language or languages of origin and usage, which covers more than 600 languages, dialects, and ethnic groups.”

Barbara Adair noted, “An enhanced Pro edition offers additional matches based on common misspellings and other typographical errors that frequently appear in lists of names. These are derived from a set of new algorithms we call ‘Fuzzy Logic Generation 2.0’ which can pick up such typos as ‘Berryman’ misspelled as ‘Beryman’, missing the double ‘r’, and ‘Polheimer’ phonetically transcribed as ‘Poolhimer’.”

The product literature indicates fuzzy logic generation 2.0 has five layers:

  • Phonetic misspellings
  • Reversed digraphs
  • Double-letter misspellings
  • Missed keystrokes
  • String manipulations

“The new surname software is similar to our pdNickname package, which has been in the field for more than two decades successfully doing for first names, nicknames, and diminutives what pdSurname now does for last names,” Barbara Adair noted.

She added, “In addition to its use for businesses and organizations working with lists of names, all our names software is recommended for genealogical and scholarly research.”

“The new software package is a one-of-a-kind proprietary product that allows users to complete projects with significantly more success and in ways that were not possible before,” concluded Barbara Adair. “It is a very exciting time for Peacock Data and our many customers.”

PDSURNAME SPECIFICATIONS

The software comes in Standard and Pro editions. Both include the same names and features, except that the Pro edition adds fuzzy logic, which allows matching when lists contain misspellings or other typographical errors.

Total name records: Pro: 109,932,801; Standard: 81,079,801
Zipped size: Pro: 1.2 GB; Standard: 962 MB
Extracted size: Pro: 22.9 GB; Standard: 16.8 GB
Introductory Price: Pro: $371.25; Standard: $224.25

The package also features precision documentation and a perpetual site license allowing installation on all computers in the same building within a single company or organization.

pdSurname is also part of the company’s pdSuite Names (now on sale, $645) and pdSuite Master Collection (now on sale, $795) software bundles.

Fuzzy logic generation 2.0

Peacock Data introduced the next generation of their fuzzy logic technology with last month’s release of the California-based firm’s pdSurname Pro last name matching software.

According to company spokesperson Barbara Adair, “pdSurname facilitates identifying last names that are true variations or phonetically similar, while the fuzzy logic technology in the enhanced Pro edition allows finding names even when there are misspellings or other typographical errors.”

“We introduced fuzzy logic with our pdNickname Pro and pdGender Pro software in late 2013, but the new fuzzy logic generation 2.0 is a great enhancement,” Barbara Adair exclaimed.

According to the company, most of the enhancements came after it developed a giant library of more than 80,000 language rules based on hundreds of dialects from around the world. Barbara Adair said, “Many misspellings occur as transcribers enter the sounds they hear. The character sequences and the sounds they produce are different for each language and situation, such as before, after, or between certain vowels and consonants, so our substitutions are language-rule based.”

The company added that its algorithms go even further by considering both how a name may sound to someone who speaks English and how it may sound to someone who speaks Spanish, which is often different. Barbara Adair explained, “Take the letter-pair ‘SC’ as an example. Before the vowels ‘E’ or ‘I’ it is most likely to be misspelled by an English speaker as ‘SHE’ or ‘SHI’ while a Spanish speaker may hear ‘CHE’ or ‘CHI’ and sometimes ‘YE’ or ‘YI’.”
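The ‘SC’ example can be expressed as context-sensitive substitution rules keyed by the listener’s language. This Python sketch assumes a hypothetical rule table containing only the two rules quoted above; the real library reportedly holds more than 80,000 rules, and its format is not described. The lookahead `(?=[EI])` checks for the following vowel without consuming it, so ‘SCI’ becomes ‘SHI’ rather than ‘SH’.

```python
import re

# Hypothetical rule table built only from the quoted SC example.
# Each rule: (listener language, regex with lookahead context, replacements).
RULES = [
    ("English", r"SC(?=[EI])", ["SH"]),
    ("Spanish", r"SC(?=[EI])", ["CH", "Y"]),
]

def phonetic_variants(name, language):
    """Apply each matching rule once per match position and collect spellings."""
    name = name.upper()
    variants = set()
    for lang, pattern, replacements in RULES:
        if lang != language:
            continue
        for m in re.finditer(pattern, name):
            for rep in replacements:
                # Replace only the matched letters; the vowel stays in place.
                variants.add(name[:m.start()] + rep + name[m.end():])
    return sorted(variants)

if __name__ == "__main__":
    print(phonetic_variants("Sciarra", "English"))  # ['SHIARRA']
    print(phonetic_variants("Sciarra", "Spanish"))  # ['CHIARRA', 'YIARRA']
    print(phonetic_variants("Scott", "English"))    # [] (no E or I after SC)
```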

Company literature indicates the new fuzzy logic generation 2.0 technology has five layers:

1. Phonetic misspellings: such as GUALTIERREZ misspelled as GUALTIEREZ, AAGARD misspelled as OUGHGARD, and YOUNGMAN misspelled as YONGMAN.

2. Reversed letters: such as DIELEMAN misspelled as DEILEMAN and RODREGUEZ misspelled as RODREUGEZ. These algorithms look for errors due to reversed digraphs (two-letter sequences that form one phoneme or distinct sound), which are a common typographical issue, such as “IE” substituted for “EI”.

3. Double letter misspellings: such as HUMBER misspelled as HUMBEER and ZWOLLE misspelled as ZWOLE. The most common typographical issues occur with the characters, in order of frequency, “SS”, “EE”, “TT”, “FF”, “LL”, “MM”, and “OO”.

4. Missed keystrokes: such as HUNTER misspelled as UNTER, missing the initial “H”, and TAMERON misspelled as TAMRON, missing the “E” in the middle.

5. Other typographical errors: which cover a variety of additional misspelling issues.
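Layers 2 through 4 correspond closely to standard single-edit operations: swapping adjacent letters, adding or removing one letter of a doubled pair, and deleting one character. The Python sketch below generates such candidates from a correctly spelled name and checks them against the article’s examples. It is a generic edit-based approach, not Peacock Data’s algorithm; the function names are hypothetical, the swap step does not check whether a pair is a true digraph, and the phonetic layer and the unspecified “other typographical errors” layer are left out.

```python
# Letters the article lists as most often mis-doubled (SS, EE, TT, FF, LL, MM, OO).
DOUBLE_PRONE = "SETFLMO"

def reversed_letters(name):
    """Swap each pair of adjacent, different letters (DIELEMAN -> DEILEMAN)."""
    out = set()
    for i in range(len(name) - 1):
        if name[i] != name[i + 1]:
            out.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    return out

def double_letter_errors(name):
    """Collapse a doubled letter (ZWOLLE -> ZWOLE) or double a prone one (HUMBER -> HUMBEER)."""
    out = set()
    for i, ch in enumerate(name):
        if i + 1 < len(name) and name[i + 1] == ch:
            out.add(name[:i] + name[i + 1:])
        elif ch in DOUBLE_PRONE and (i == 0 or name[i - 1] != ch):
            out.add(name[:i] + ch + name[i:])
    return out

def missed_keystrokes(name):
    """Drop one character at each position (HUNTER -> UNTER, TAMERON -> TAMRON)."""
    return {name[:i] + name[i + 1:] for i in range(len(name))}

def fuzzy_candidates(name):
    """Union of the three layers, excluding the original spelling."""
    name = name.upper()
    found = reversed_letters(name) | double_letter_errors(name) | missed_keystrokes(name)
    found.discard(name)  # defensive; none of the edits above reproduce the input
    return found

if __name__ == "__main__":
    for correct, typo in [("DIELEMAN", "DEILEMAN"), ("HUMBER", "HUMBEER"), ("TAMERON", "TAMRON")]:
        print(correct, typo, typo in fuzzy_candidates(correct))  # each line ends in True
```

Because the candidates are generated from the correct spelling, a table built this way would map each candidate back to its source name, so an incoming misspelling such as TAMRON could be resolved with a single dictionary lookup.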

The pdSurname Pro software with the new fuzzy logic generation 2.0 technology is available for immediate download and can currently be purchased at a 25 percent introductory discount (sale, $371.25; regular, $495) or as part of bundles also on sale, pdSuite Names (sale, $645; regular, $795) and pdSuite Master Collection (sale, $795; regular, $995).

For users of other Peacock Data name software, Barbara Adair noted, “pdNickname Pro and pdGender Pro will be updated with fuzzy logic generation 2.0 capabilities this fall, and the upgrades will be free for anyone owning the older version.”
