The first part of this column, “Anatomy of a database, part 1,” discussed the first four years of research and development for Peacock Data’s new name database products:
pdNickname 2.0 is an advanced name and nickname file used by businesses and organizations to merge database records.
pdGender 2.0 is a gender coding database built on the same set of names. Users can match the data against the first names on their lists to establish male and female identification.
Both upgrades share a host of compatible features, including languages of origin and use for each name, as well as fuzzy logic that allows information to be recognized even when lists contain typographical errors or uncommon spellings. They were built during the same development cycle because both are extracted from the same master file.
To recap, the main product research and development began in early 2009 and was completed by late 2012. Then beta versions of the new products entered field testing in January 2013.
According to the company’s chief development coordinator Barbara Adair, “By 2013 early planning for version 3.0 of the products was already underway and included new fuzzy logic technology designed to work with typographical errors and uncommon spellings. Then development proceeded so well that in April 2013 the new technology was moved up to the version 2.0 cycle.”
Barbara pointed out, “The most complex fuzzy logic involves predicting likely misspellings or alterations. We look at numerous factors that may occur in the spelling of a name. Common examples are frequently reversed digraphs (a pair of letters used to make one phoneme or distinct sound), phonetic transcriptions, double letters typed as single letters, non-common characters, the number of letters in a name, where elements occur in a name, and hundreds of other possible factors.”
“A lot of research and field trials have gone into creating the fuzzy logic algorithms and their inclusion in our new products will substantially increase their power for users,” she added.
“The difference between a real name and a fuzzy version can be very slight and even difficult to notice at first glance,” Barbara said. “But they are different and can make a big difference in the success rate for businesses and organizations working with lists of names.”
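To make the idea concrete, here is a minimal sketch in Python of how two of the factors Barbara mentions (reversed letter pairs and double letters typed as single letters) could be used to predict likely misspellings of a name. It is an illustration only, not Peacock Data's algorithm; the function names `fuzzy_variants` and `build_lookup` are hypothetical, and the real products weigh many more factors.

```python
def fuzzy_variants(name: str) -> set[str]:
    """Return illustrative misspellings of a name (hypothetical helper)."""
    lower = name.lower()
    variants = set()

    # Factor 1: adjacent letters typed in reverse order,
    # which covers reversed digraphs such as "ph" -> "hp".
    for i in range(len(lower) - 1):
        if lower[i] != lower[i + 1]:
            variants.add(lower[:i] + lower[i + 1] + lower[i] + lower[i + 2:])

    # Factor 2: a double letter typed as a single letter.
    for i in range(len(lower) - 1):
        if lower[i] == lower[i + 1]:
            variants.add(lower[:i] + lower[i + 1:])

    # The correctly spelled name is not a fuzzy variant of itself.
    variants.discard(lower)
    return {v.capitalize() for v in variants}


def build_lookup(names: list[str]) -> dict[str, str]:
    """Map each predicted misspelling back to the real name it came from."""
    lookup = {}
    for real_name in names:
        for variant in fuzzy_variants(real_name):
            # Keep the first real name seen if two names share a variant.
            lookup.setdefault(variant, real_name)
    return lookup


lookup = build_lookup(["Matthew", "Stephen"])
print(lookup.get("Mathew"))   # a dropped double letter
print(lookup.get("Stehpen"))  # a reversed "ph" digraph
```

A precomputed table of this kind is one plausible way a misspelled list entry could be matched to a real name with a plain lookup rather than a similarity search at match time.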
Barbara notes, “A sizable majority of the Pro edition of both new products is built with fuzzy logic, but users not ready to dive into the new technology can purchase a Standard edition without fuzzy logic and easily add it later when they are ready by contacting the company for an upgrade.”
As for the easiest part of development, Barbara quickly cited the precision gender coding information in pdGender, which is filtered for languages, rare usage of unisex names by one gender, and other criteria.
“By the time we had established the language information in the master file and flagged name types and rare unisex usages, it was actually quite easy to draw out the gender coding fields,” she said. “This is a testament to the quality of the information and how straightforward it is to work with.”
Barbara said, “The new products do have a learning curve but are ultimately very easy to exploit. It may take a few uses, but those working with the data will appreciate more and more how the information is organized and presented. A lot of thought and field testing has gone into this.”
One result of the decision to build pdNickname and pdGender from the same master file is the strong compatibility between the two offerings.
“While pdNickname and pdGender can easily be used separately, when used jointly they make excellent partners,” Barbara said. “They are comprised of the same set of names and can be linked together with little effort.”
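Because both products share one set of names, linking them amounts to a join on the name itself. The sketch below shows the idea in Python under stated assumptions: the field names (`name`, `nickname`, `gender`), the sample rows, and the function `link_by_name` are all hypothetical and do not reflect the actual pdNickname or pdGender file layouts.

```python
# Hypothetical sample rows; the real products' field names and codes may differ.
nickname_rows = [
    {"name": "William", "nickname": "Bill"},
    {"name": "William", "nickname": "Will"},
    {"name": "Elizabeth", "nickname": "Liz"},
]
gender_rows = [
    {"name": "William", "gender": "M"},
    {"name": "Elizabeth", "gender": "F"},
]


def link_by_name(nickname_rows, gender_rows):
    """Attach a gender code to each nickname row by matching on the name."""
    # Index the gender rows by lowercased name for a case-insensitive match.
    gender_by_name = {row["name"].lower(): row["gender"] for row in gender_rows}

    linked = []
    for row in nickname_rows:
        # A name missing from the gender rows gets None instead of an error.
        gender = gender_by_name.get(row["name"].lower())
        linked.append({**row, "gender": gender})
    return linked


linked = link_by_name(nickname_rows, gender_rows)
for row in linked:
    print(row["name"], row["nickname"], row["gender"])
```

Indexing one file by name before scanning the other keeps the join to a single pass over each list, which matters when one side runs to millions of records.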
On November 1, 2013, Peacock Data demonstrated the products to participants gathered in its Chatsworth, California, offices. By this time the new releases were almost ready to go, and the development team working under Barbara began tweaking the final layouts and authoring the product documentation.
pdNickname 2.0 Pro and pdGender 2.0 Pro were released on Monday, December 30, 2013, and the Standard editions (without fuzzy logic) made their debut two weeks later.
pdNickname 2.0 Pro has 3.9 million records, including 2.61 million with fuzzy logic, and is 2.9 GB counting all formats and files. pdNickname 2.0 Standard has 1.28 million records, does not have fuzzy logic, and is 964 MB.
pdGender 2.0 Pro has 140,000 records, including 80,000 with fuzzy logic, and is 80.6 MB. pdGender 2.0 Standard has 60,000 records, does not have fuzzy logic, and is 25.5 MB.