pdNickname 3.0 released today

Version 2.0 users can get a free upgrade—see below

This new offering is loaded with almost 400,000 given names and nicknames. This represents virtually every first name found in the United States, and a quarter of the database is international names only found outside the United States.

SEE PDNICKNAME 3.0 NOW >>

The package also includes information about the languages associated with the names and their origins, and even a popularity ranking for all first names appearing in the United States since 1915. It identifies more than 500 languages, dialects, and ethnic groups, from English and Spanish to Arabic and Swahili.

Key Features

  • Nicknames: the heart of the database is a huge collection of more than 200,000 nicknames and the names they are associated with, including short forms, abbreviations, diminutives, and even hypocoristics.
  • Name Variations: different ways of spelling given names and nicknames are identified based on linguistic and onomastic name tree research; some names can have more than a hundred variations.
  • Phonetic Matches: first names that are not true variations but sound similar or have close spellings are identified and rated on a 1 to 99 scale.
  • Languages: we have appended extensive demographic information about languages of usage.
  • Popularity: name popularity is ranked based on U.S. Census and U.S. Social Security records.
  • Special Origins: unique characteristics about the name origins are also provided, including their connections to religion, mythology, historical events, and literature.

A Pro edition even adds fuzzy logic which allows matching when names are entered with typographical errors.

Finally, students, teachers, scholars, and those researching family histories benefit as well because the software is highly recommended for study in genealogy, onomatology, anthroponymy, ethnology, linguistics, and related disciplines.

pdNickname 3.0 is available for immediate download from our website. It can be purchased as a standalone product (Pro, $495; Standard, $299), or as part of bundles now on sale, including pdSuite Names ($595) and pdSuite Master Collection ($695).

SEE PDNICKNAME 3.0 NOW >>

SEE OUR SUITES NOW >>

SPECIAL UPGRADE OFFER FOR 2.0 USERS

If you have pdNickname 2.0, for a limited time you can get a free upgrade to the new version. This applies to standalone versions and suites that contain the product. This offer expires June 30, 2016.

UPGRADE NOW >>

The language of names

According to Barbara Adair, Peacock Data’s chief development coordinator and spokesperson, the language features in the company’s long-standing flagship software packages, pdNickname and pdGender, and their new pdSurname product, “have never been available before on this scale and required a sizable portion of nearly six years of research and development.”

With this landmark software, among other powerful features, users can identify the languages associated with tens of thousands of first names, nicknames, and last names, providing them with critical ethnic and heritage demographics about their clients.

pdNickname is an advanced name and nickname file. It identifies first names that are the same even when they are not an exact match, but rather equivalent, such as a variation or nickname.
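As a rough illustration of this kind of equivalence matching, the sketch below treats two first names as the same when they share a formal name. The tiny `NICKNAMES` table and the function names are hypothetical and are not the pdNickname file layout, which this article does not describe.

```python
# Hypothetical miniature nickname table, for illustration only.
NICKNAMES = {
    "bill": {"william"},
    "will": {"william"},
    "bob": {"robert"},
    "rob": {"robert"},
    "liz": {"elizabeth"},
    "beth": {"elizabeth"},
}


def formal_names(name):
    """Return the formal names a given name or nickname may stand for."""
    key = name.strip().lower()
    # A name always stands for itself, plus any formal names listed for it.
    return NICKNAMES.get(key, set()) | {key}


def is_equivalent(a, b):
    """True when two first names share at least one formal name."""
    return bool(formal_names(a) & formal_names(b))
```

With this table, `is_equivalent("Bill", "William")` and `is_equivalent("Bill", "Will")` are both true, while `is_equivalent("Bob", "Bill")` is false.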

pdGender is a gender coding database built on the same set of names as pdNickname. Users can match the data against the first names on their lists to determine male and female identification.

pdSurname is a new last name file. It identifies last names that are the same even when they are not an exact match, but rather equivalent, such as a variation or a similar sounding or spelled name.

All three innovative software packages embrace a host of similar and compatible features including languages of origin and use as well as fuzzy logic so information can be recognized even when names have misspellings or other typographical errors.

Combined, the three products cover more than 600 languages, dialects, ethnic groups, and races, such as English, Spanish, Portuguese, French, Italian, German, Polish, Russian, Chinese, Japanese, Vietnamese, Korean, Hindustani, Arabic, Persian, and Yiddish, as well as Native American names and ancient Greek, Latin, and Hebrew names.

Plans for the software releases were initially written up in January 2009 and development began in earnest mid-summer of that same year. They were built during parallel development cycles and began to be made available to the public with the release of pdNickname 2.0 and pdGender 2.0 on December 30, 2013. pdSurname, which has the largest database, was launched March 2, 2015.

According to Barbara Adair, “Creation of the master name file these new products result from is the biggest venture our company has ever undertaken. There are thousands of sources for names in scores of languages, and our task was to compare and contrast all this data and create the ultimate name resources.”

“From the start, it was essential to identify the languages and dialects associated with names in considerable detail,” she added. “This gives users previously unavailable ethnic demographics linked to the names already on their lists.”

Barbara Adair showed some of the documents employed in construction of the software offerings, including a manuscript from 731 AD, written by a monk named Bede, listing the earliest English names dating from the Anglo-Saxon era of the early medieval period. The still common personal name ‘Hilda’ is an example from the manuscript.

“Because sources often give diverse information and use different spelling conventions, it was crucial not only to gather all the data possible but also to differentiate between the quality of sources,” the spokesperson explained. “Better information became easier to identify after working with the sources over the course of the first year.”

Barbara Adair concluded, “These database packages are one-of-a-kind proprietary resources that let our users complete projects with significantly more success and in ways that were not possible before. They are very innovative pieces of software and we are encouraged and grateful for the response from our clients. A lot of time and hard work has been put into these efforts and it is a very exciting time for our customers and everyone here at Peacock Data.”

All three ground-breaking software packages are available for immediate download from the company’s website. They come with precision documentation, complete with examples, and perpetual multi-seat site licenses allowing installation on all computers in the same building within a single company or organization.

MORE ABOUT PDNICKNAME >>

MORE ABOUT PDGENDER >>

MORE ABOUT PDSURNAME >>

Optionally, they can also be licensed as part of the company’s pdSuite Names and pdSuite Master Collection software bundles.

About Peacock Data

California-based Peacock Data are the makers of database software products used by businesses, organizations, churches, schools, researchers, and government. They are an industry leader because of their superior solutions and renowned loyalty to customers.

For more than 20 years Peacock Data’s specialized software has been utilized in applications you use every day.

MORE ABOUT PEACOCK DATA >>

Affiliates program

DO YOU WANT TO SELL PEACOCK DATA PRODUCTS?

The firm’s affiliates program offers a unique way for your website or app to link to the Peacock Data product line. You will be provided with all of the tools necessary to convert your existing traffic into sales along with full support from dedicated affiliate managers. Apply now to join the program and earn substantial rewards!

Key features of pdSurname

Peacock Data released their pdSurname last name software package early last month, and since then it has proven to be their fastest starting new-product launch. Barbara Adair, spokesperson for the west coast-based company, sat down for an interview about the key features of the new software.

The product literature says pdSurname is designed to facilitate finding last names that are not exactly the same but variations or phonetically similar.

Barbara Adair explained, “Often the same person winds up in multiple places on a list because transcribers enter the name slightly differently each time, such as ‘Johnson’ versus ‘Johnsson’, ‘Jonson’, and ‘Joneson’. This can lead to significantly increased costs, notably reduced quality in customer service, and even lawsuits.”

“This is particularly critical in the medical industry where patient records frequently can be in multiple places with slightly different versions of a name. This has led to terrible consequences in some instances, along with unnecessary medical procedures and generally reduced medical care in many other instances,” the spokesperson exclaimed.

According to Barbara Adair, “The base product includes both true name tree variations gathered from onomastic research, such as ‘Hutchens’ versus ‘Hutchins’ and ‘Parker’ versus ‘Parkers’, along with names that are not true variations, but rather sound or are spelled similarly, such as ‘Gonzalez’ versus ‘Gunsalus’ and ‘Davis’ versus ‘Davisse’.”

“There are also special features built in for last names with prefixes like ‘Mc’, ‘Mac’, ‘O’, ‘De’, ‘La’, ‘Van’, ‘Al’ and ‘St’, which allow us to match names like ‘Garcia’ with ‘De Garcia’ and ‘Van Der Zant’ with ‘Van Zant’ and ‘Zant’, which are possibilities missed by other products,” she added.

Because many possibilities can be returned from a pdSurname query, the software includes a match quality score based on a 1 to 99 scale to order the results from most likely to least likely, which is also a feature usually missing in other products.
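The prefix handling and scored ordering described above can be pictured with a small sketch. It is not the pdSurname algorithm: the particle list comes from the quote (with ‘Der’ added as an assumption so the ‘Van Der Zant’ example works), the function names are hypothetical, and the scoring rule of 15 points off per dropped particle is invented purely to show results ordered on a 1 to 99 scale. It only handles particles separated by spaces, not attached forms such as ‘McDonald’.

```python
from itertools import combinations

# Particles named in the article, plus "der" (an assumption for this sketch).
PARTICLES = {"mc", "mac", "o", "de", "la", "van", "al", "st", "der"}


def prefix_variants(surname):
    """Return (form, score) pairs made by dropping leading particles."""
    tokens = surname.split()
    # Count leading particles, always keeping the last token as the base name.
    n = 0
    while n < len(tokens) - 1 and tokens[n].lower() in PARTICLES:
        n += 1
    particles, base = tokens[:n], tokens[n:]
    results = {}
    for keep in range(n, -1, -1):
        for combo in combinations(particles, keep):
            form = " ".join(list(combo) + base)
            # Invented scoring: 99 for the full form, 15 less per dropped particle.
            score = max(1, 99 - 15 * (n - keep))
            results.setdefault(form, score)
    # Order from most likely to least likely.
    return sorted(results.items(), key=lambda item: -item[1])


def may_match(a, b):
    """True when two surnames share at least one prefix-stripped form."""
    forms_a = {form.lower() for form, _ in prefix_variants(a)}
    forms_b = {form.lower() for form, _ in prefix_variants(b)}
    return bool(forms_a & forms_b)
```

Here `prefix_variants("Van Der Zant")` lists the full name first, then ‘Van Zant’ and ‘Der Zant’, then ‘Zant’, and `may_match("Garcia", "De Garcia")` is true.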

As a bonus, in addition to showing related names, the product is enhanced with demographics for each of the more than 335,000 surname formations provided.

“All names give the racial percentage of use for whites, blacks, Hispanics and Latinos, Asians and Pacific Islanders, Native Americans, and multiracial use,” the spokesperson explained. “And most names also include the language or languages of origin and usage, which covers more than 600 languages, dialects, and ethnic groups.”

Barbara Adair noted, “An enhanced Pro edition offers additional matches based on common misspellings and other typographical errors that frequently appear in lists of names. These are derived from a set of new algorithms we call ‘Fuzzy Logic Generation 2.0’ which can pick up such typos as ‘Berryman’ misspelled as ‘Beryman’, missing the double ‘r’, and ‘Polheimer’ phonetically transcribed as ‘Poolhimer’.”

The product literature indicates fuzzy logic generation 2.0 has five layers:

  • Phonetic misspellings
  • Reversed digraphs
  • Double-letter misspellings
  • Missed keystrokes
  • String manipulations

“The new surname software is similar to our pdNickname package, which has been in the field for more than two decades successfully doing for first names, nicknames, and diminutives what pdSurname now does for last names,” Barbara Adair noted.

She added, “In addition to its use for businesses and organizations working with lists of names, all our names software is recommended for genealogical and scholarly research.”

“The new software package is a one-of-a-kind proprietary product that allows users to complete projects with significantly more success and in ways that were not possible before,” concluded Barbara Adair. “It is a very exciting time for Peacock Data and our many customers.”

PDSURNAME SPECIFICATIONS

The software comes in Standard and Pro editions. Both include the same names and features, except that the Pro edition adds fuzzy logic, which allows matching when lists have misspellings or other typographical errors.

Total name records: Pro: 109,932,801; Standard: 81,079,801
Zipped size: Pro: 1.2 GB; Standard: 962 MB
Extracted size: Pro: 22.9 GB; Standard: 16.8 GB
Introductory Price: Pro: $371.25; Standard: $224.25

The package also features precision documentation and a perpetual site license allowing installation on all computers in the same building within a single company or organization.

FOR MORE INFORMATION: VISIT THE PDSURNAME WEBPAGE >>

pdSurname is also part of the company’s pdSuite Names (now on sale, $645) and pdSuite Master Collection (now on sale, $795) software bundles.

Fuzzy logic generation 2.0

Peacock Data introduced the next generation of their fuzzy logic technology with last month’s release of the California-based firm’s pdSurname Pro last name matching software.

According to company spokesperson Barbara Adair, “pdSurname facilitates identifying last names that are true variations or phonetically similar, while the fuzzy logic technology in the enhanced Pro edition allows finding names even when there are misspellings or other typographical errors.”

“We introduced fuzzy logic with our pdNickname Pro and pdGender Pro software in late 2013, but the new fuzzy logic generation 2.0 is a great enhancement,” Barbara Adair exclaimed.

According to the company, most of the enhancements were achieved after they developed a giant library of more than 80,000 language rules based on hundreds of dialects from around the world. Barbara Adair said, “Many misspellings occur as transcribers enter the sounds they hear. The character sequences and the sounds they produce are different for each language and situation, such as before, after, or between certain vowels and consonants, so our substitutions are language-rule based.”

The company explained additionally that their algorithms go even further by considering both how a name may sound to someone who speaks English and how it may sound to someone who speaks Spanish, which is often different. Barbara Adair explained, “Take the letter-pair ‘SC’ as an example. Before the vowels ‘E’ or ‘I’ it is most likely to be misspelled by an English speaker as ‘SHE’ or ‘SHI’ while a Spanish speaker may hear ‘CHE’ or ‘CHI’ and sometimes ‘YE’ or ‘YI’.”
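A language-rule substitution of this kind can be sketched in a few lines. The example below encodes only the ‘SC’ rule from the quote. The `SC_RULES` table and the function name are hypothetical, and the real library of more than 80,000 rules is not public in this article.

```python
import re

# Replacements for 'SC' before 'E' or 'I', keyed by the listener's language.
# Taken from the 'SC' example in the article; the table layout is hypothetical.
SC_RULES = {
    "english": ["SH"],
    "spanish": ["CH", "Y"],
}


def sc_misspellings(name, language):
    """Return variants where 'SC' before 'E' or 'I' is respelled as heard."""
    name = name.upper()
    variants = []
    for replacement in SC_RULES[language]:
        # The lookahead restricts the rule to 'SC' directly followed by E or I.
        variant = re.sub(r"SC(?=[EI])", replacement, name)
        if variant != name:
            variants.append(variant)
    return variants
```

For the surname ‘Pesci’, the English rule yields ‘PESHI’ and the Spanish rules yield ‘PECHI’ and ‘PEYI’. A name such as ‘Scott’, where ‘SC’ is not followed by ‘E’ or ‘I’, produces no variants.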

Company literature indicates the new fuzzy logic generation 2.0 technology has five layers:

1. Phonetic misspellings: such as GUALTIERREZ misspelled as GUALTIEREZ, AAGARD misspelled as OUGHGARD, and YOUNGMAN misspelled as YONGMAN.

2. Reversed letters: such as DIELEMAN misspelled as DEILEMAN and RODREGUEZ misspelled as RODREUGEZ. These algorithms look for errors due to reversed digraphs (two letter sequences that form one phoneme or distinct sound) which are a common typographical issue, such as “IE” substituted for “EI”.

3. Double letter misspellings: such as HUMBER misspelled as HUMBEER and ZWOLLE misspelled as ZWOLE. The most common typographical issues occur with the characters, in order of frequency, “SS”, “EE”, “TT”, “FF”, “LL”, “MM”, and “OO”.

4. Missed keystrokes: such as HUNTER misspelled as UNTER, missing the initial “H”, and TAMERON misspelled as TAMRON, missing the “E” in the middle.

5. Other typographical errors: which cover a variety of additional misspelling issues.
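Layers 2 through 4 lend themselves to a short sketch. The generators below produce candidate misspellings by swapping adjacent letters, doubling or collapsing letters, and dropping single keystrokes. The function names are hypothetical, and this is a deliberately naive version: it tries every position, whereas the article describes rules focused on common digraphs and the most frequent double letters.

```python
def reversed_pairs(name):
    """Layer 2: swap each adjacent pair of different letters (IE -> EI)."""
    return {
        name[:i] + name[i + 1] + name[i] + name[i + 2:]
        for i in range(len(name) - 1)
        if name[i] != name[i + 1]
    }


def double_letter_errors(name):
    """Layer 3: double a single letter or collapse an existing double."""
    out = set()
    for i, ch in enumerate(name):
        next_same = i + 1 < len(name) and name[i + 1] == ch
        prev_same = i > 0 and name[i - 1] == ch
        if next_same:
            out.add(name[:i] + name[i + 1:])   # collapse: LL -> L
        elif not prev_same:
            out.add(name[:i] + ch + name[i:])  # double: E -> EE
    return out


def missed_keystrokes(name):
    """Layer 4: drop one letter at each position."""
    return {name[:i] + name[i + 1:] for i in range(len(name))}


def typo_variants(name):
    """Union of the three layers above, excluding the name itself."""
    name = name.upper()
    variants = (
        reversed_pairs(name) | double_letter_errors(name) | missed_keystrokes(name)
    )
    variants.discard(name)
    return variants
```

Each example from the list above appears in the output: ‘DEILEMAN’ for ‘DIELEMAN’, ‘HUMBEER’ for ‘HUMBER’, ‘ZWOLE’ for ‘ZWOLLE’, and ‘UNTER’ for ‘HUNTER’.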

The pdSurname Pro software with the new fuzzy logic generation 2.0 technology is available for immediate download and can currently be purchased at a 25 percent introductory discount (sale, $371.25; regular, $495) or as part of bundles also on sale, pdSuite Names (sale, $645; regular, $795) and pdSuite Master Collection (sale, $795; regular, $995).

For users of other Peacock Data name software, Barbara Adair noted, “pdNickname Pro and pdGender Pro will be updated with fuzzy logic generation 2.0 capabilities this fall, and the upgrades will be free for anyone owning the older version.”

MORE ABOUT FUZZY LOGIC >>

MORE ABOUT PDSURNAME PRO >>

Meet the ISO

Our country reference and demographics database product includes information derived from ISO – International Organization for Standardization – standards and publications. We thought you would like to know more about the organization, so meet the ISO…

Founded February 23, 1947, the ISO is a network of the national standards institutes of 159 countries, with one member per country. Its Central Secretariat, which coordinates the system, is headquartered in Geneva, Switzerland, and English and French serve as the official languages.

ISO is not an acronym in either of the organization’s official languages and the letters are not delimited with periods like initials. The short-form name is based on the Greek word ἴσος (isos in English), meaning equal. Fitting, since their mission is to equalize and standardize across nations and cultures.

The ISO is the world’s largest developer and publisher of international standards. While not a government organization, it has strong links to governments, and the ISO’s stature gives it the ability to set standards that often become law, either through treaties or national standards.

The organization’s work helps improve international collaboration and communication and promotes steady and equitable growth of international trade. ISO standards encompass most technical and nontechnical fields and affect aerospace, alphabetization and transliteration, chemistry, design, engineering, fuels, image technology, machines, manufacturing, measurements, medical equipment, methods of testing, production, specifications for parts, shipbuilding and tools, among other areas. Most of the standards are reviewed and optimized every five years.

ISO standards are identified in the format:

ISO nnnnn:yyyy: Title

where nnnnn is the number of the standard, yyyy is the year published and Title describes the subject.

EN ISO nnnnn is the European version of the international standard, and BS EN ISO nnnnn is the British variation. Some standards are issued in collaboration with other organizations and are identified accordingly.
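The identifier format above is regular enough to parse mechanically. The sketch below is one way to do it, under some assumptions: the function and field names are hypothetical, the optional part number (as in ISO 3166-1) goes beyond the plain nnnnn pattern shown, and joint identifiers such as those issued with other organizations are not handled.

```python
import re

# Optional "BS " and "EN " adoption prefixes, the standard number with an
# optional part suffix, then an optional year and an optional title.
ISO_ID = re.compile(
    r"^(?P<prefix>(?:BS )?(?:EN )?)ISO (?P<number>\d+(?:-\d+)?)"
    r"(?::(?P<year>\d{4}))?(?:: (?P<title>.+))?$"
)


def parse_iso_id(text):
    """Split an identifier such as 'ISO nnnnn:yyyy: Title' into its parts."""
    m = ISO_ID.match(text.strip())
    if m is None:
        return None
    return {
        "adoption": m.group("prefix").strip() or None,  # e.g. "BS EN"
        "number": m.group("number"),
        "year": int(m.group("year")) if m.group("year") else None,
        "title": m.group("title"),
    }
```

Given the made-up identifier `"ISO 12345:2001: Example title"`, the function returns the number `"12345"`, the year `2001`, and the title `"Example title"`. Text that does not follow the pattern returns `None`.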

Standards not covered by the ISO include electrical and electronic engineering, which is the responsibility of the International Electrotechnical Commission (IEC).

In addition to standards, the ISO also publishes technical reports, specifications, corrigenda and guides. Their documents are copyrighted and they usually charge for copies; however, the organization does not charge for most draft copies delivered in electronic format. Note that drafts are often substantially modified before they are finalized as standards.

ISO standards supplied with our pdCountry database product include:

  • ISO 3166-1 alpha-2 – two-letter country abbreviation
  • ISO 3166-1 alpha-3 – three-letter country abbreviation
  • ISO 3166-1 numeric – three-digit country code
  • ISO 4217 alpha-3 – three-letter national currency abbreviation
  • ISO 639-1 alpha-2 – two-letter language abbreviation

The product also provides data drawn from United Nations (UN), International Olympic Committee (IOC), International Telecommunication Union (ITU), top-level domain (TLD) data, and other information sources.

Peacock Data covers the globe

This month west coast database makers Peacock Data mark the release of a new version of their international reference and demographics database software, pdCountry. It is available in Pro and Standard editions from their website.

According to a company spokesperson, “the new product was delayed about three weeks to await the results of the Scottish independence referendum. We did not want the software to become instantly out-of-date.”

Product information

The world is becoming a smaller place and a handy collection of key country data is invaluable. The new pdCountry version 2.0 fits the bill, covering the entire globe. This easy-to-use, comprehensive, and up-to-date reference package provides core country information, GeoCoding data, and a host of useful demographic variables.

The Pro version covers demographic information from 1970 through 2012 while the Standard edition covers 2003 through 2012. Both cover 29 United Nations-defined regions (including the World as a region itself) and 211 countries, plus some former countries.

CORE COUNTRY INFORMATION

  • ISO Numeric Country (or Area) Code
  • Regions
  • Country (or Area) Name
  • ISO, FIPS, and IOC Country Abbreviations
  • National Capital
  • Language
  • Citizenry (Noun and Adjective)
  • National Currency
  • ITU Country Calling Code
  • Internet Portals

GEOCODING DATA

  • Latitude and Longitude Coordinates
  • Land and Water Area

DEMOGRAPHIC VARIABLES

  • Population
  • GDP and its breakdown
  • Value Added By Economic Activity
  • Implicit Price Deflators
  • GNI
  • Exchange Rates

A total of 156 fields of information are available including 117 devoted to the demographic variables. Statistics are calculated in multiple ways, including in the national currency, US dollars, current prices, constant 2005 prices, rates, and/or shares. They are drawn from United Nations aggregate statistical data and are the latest information available.

Uses for this database are innumerable, and no company or organization that does international business should be without it. Financial companies, travel agents, webmasters, news agencies, research institutions, schools, students, and government will find it of particular value.

View more product information…
View the product user guide…
Download a sample…

Making a data dictionary

A data dictionary is a document that catalogs the organization, contents and conventions of a database or collection of databases. It lists in written form all the databases, tables, views, fields and data definitions and often information about the table layouts, the relationships between tables and other details about the database schema.

It does not contain the actual data from the database system, only information necessary to manage and utilize it. It is also not an instruction manual, though a data dictionary is often included as part of an instruction manual.

There is no universal standard as to the level of detail in a data dictionary. What is included is dependent on the audience and the complexity of the database infrastructure. System administrators and programmers will usually have a highly detailed document, sometimes complete with visual depictions, while end users may only have the basics.

Below is an example of a data dictionary for a bookkeeping database with three tables. It shows the kinds of information typically included in a data dictionary; however, it is not meant to be all-inclusive. Other columns that might be provided could show if a field takes null values and the precise points where each field begins and ends. If scientific or technical information is involved, a column indicating normative ranges may be useful. The possibilities are myriad.

A data dictionary is an important part of database system documentation. Devoting the resources needed for a quality document will help ensure fewer problems and significantly aid in productivity.

EXAMPLE DATA DICTIONARY FOR A BOOKKEEPING DATABASE

Number of Tables: 3

Table: name of the table. Field: name of the field. Rel: Table relationship key (if any); PK = primary key, FK = foreign key; see Foreign Key Relationships. Type: field data type. Width: field width. Dec: number of decimal points (if any). Description: data definition of the field contents.

Foreign Key Relationships: (1) points to Customers table Id field. (2) points to Sales table Invoice field.

Table      Field       Rel     Type       Width  Dec  Description
CUSTOMERS  ID          PK      Character  10          Customer ID number
           NAME                Character  25          Customer name
           CUST_TYPE           Character  1           Customer type (key): A = Active, I = Inactive, P = Prospect
           TERMS               Character  1           Payment terms (key): N = Net Due, P = Prepaid
SALES      INVOICE     PK      Character  4           Invoice number
           CUST_ID     FK (1)  Character  10          Customer ID number
           SAL_DATE            Date       8           Date of sale
           SAL_AMOUNT          Numeric    10     2    Amount of sale
RECEIPTS   ID          PK      Character  10          Unique ID number
           INV_NUM     FK (2)  Character  4           Invoice number
           REC_DATE            Date       8           Date of receipt
           REC_AMOUNT          Numeric    10     2    Amount of receipt
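To show how a data dictionary like this one maps onto an actual schema, here is a minimal sketch using Python's built-in sqlite3 module. The table and field names follow the example above. The choice of SQLite, the `build_database` name, and the CHECK constraints for the keyed fields are assumptions made for illustration. Note that SQLite treats declared widths such as CHAR(10) as advisory and does not enforce them.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (
    id        CHAR(10) PRIMARY KEY,                            -- Customer ID number
    name      CHAR(25),                                        -- Customer name
    cust_type CHAR(1) CHECK (cust_type IN ('A', 'I', 'P')),    -- Active, Inactive, Prospect
    terms     CHAR(1) CHECK (terms IN ('N', 'P'))              -- Net Due, Prepaid
);
CREATE TABLE sales (
    invoice    CHAR(4) PRIMARY KEY,                            -- Invoice number
    cust_id    CHAR(10) REFERENCES customers (id),             -- FK (1)
    sal_date   DATE,                                           -- Date of sale
    sal_amount NUMERIC(10, 2)                                  -- Amount of sale
);
CREATE TABLE receipts (
    id         CHAR(10) PRIMARY KEY,                           -- Unique ID number
    inv_num    CHAR(4) REFERENCES sales (invoice),             -- FK (2)
    rec_date   DATE,                                           -- Date of receipt
    rec_amount NUMERIC(10, 2)                                  -- Amount of receipt
);
"""


def build_database(path=":memory:"):
    """Create the three bookkeeping tables and return the open connection."""
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

With foreign keys enabled, inserting a sale whose CUST_ID has no matching customer raises `sqlite3.IntegrityError`, which is the relationship the Rel column documents.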

Happy birthday Dr. Codd

Relational database theory took shape in the 1960s and 1970s, and most of the thinking and enthusiasm behind it came from Dr. Edgar Frank “Ted” Codd, while working at IBM’s Almaden Research Labs in a then nascent Silicon Valley.

Dr. Codd, born August 19, 1923 on the Isle of Portland in England, studied mathematics and chemistry at Exeter College, Oxford, before serving as a pilot in the Royal Air Force during World War II. He moved to New York in 1948 to work for IBM as a mathematical programmer, but five years later migrated to Ottawa, Canada as a response to the rhetoric of Senator Joseph McCarthy. Not long after he returned to the United States, and in 1965 received a PhD in computer science from the University of Michigan, Ann Arbor. Two years later he again began working for IBM, this time at their research laboratory in San Jose, California, where he soon revolutionized database software by advocating a new relational model for database management.

Based on set theory (a branch of mathematical logic), Dr. Codd’s relational database model was developed at a time when most database platforms employed a hierarchical or network system (the latter commonly known as the CODASYL approach), in which the structure of data had to be defined within each application program. Dr. Codd’s relational database used a new query language (eventually becoming SEQUEL and later SQL) to access any combination of data stored in cross-referenced tables.

After an internal IBM paper a year earlier, Dr. Codd outlined his concept in A Relational Model of Data for Large Shared Data Banks published in 1970. However, due to vested interests in IBM’s then current hierarchical database approaches, such as IMS/DB, Dr. Codd’s ideas were not adopted until commercial rivals began implementing them. Eventually in 1981 IBM released its first commercial relational database management system in the form of SQL/DS (Structured Query Language/Data System), and in 1983 released DB2, also SQL based, for the MVS operating system. The two products have coexisted since then, but SQL/DS was rebranded as DB2 for VM and VSE in the late 1990s.

The relational database model has since made its way into countless successful products, including Microsoft SQL Server, Microsoft Access, Microsoft FoxPro and Visual FoxPro, dataBased Intelligence dBase, Alaska Software XBase++, Apollo Database Engine, Apycom Software DBFView, Astersoft DBF Manager, Digital Equipment Rdb (now Oracle Rdb), DS-Datasoft Visual DBU, Elsoft DBF Commander, GrafX Software Clipper and Vulcan.NET, Informix (now IBM Informix), Multisoft FlagShip, Oracle Database, Recital Software Recital, Relational Technology Ingres, Software Perspectives Cule.Net, Sybase, and xHarbour.com xHarbour, to name a few.

Dr. Codd did not become wealthy from his ideas, but he received many accolades, and will long be remembered. In 2004 SIGMOD, the Association for Computing Machinery’s special interest group on large-scale data management problems and databases, renamed its highest prize the SIGMOD Edgar F. Codd Innovations Award in his honor.

The inventor of the relational database system died on April 18, 2003 of heart failure at his home in Williams Island, Florida, at the age of 79.

A new twist on ZIP Codes

pdZIP receives a major overhaul this month and includes new 5-digit ZIP Code and ZIP+4 databases, along with an alternate places reference file and some new twists on the concept of ZIP Code databases. It is available in Pro and Standard editions from our website.

There are more than 41,000 United States Postal Service (USPS) 5-digit ZIP Codes, and more than 46 million USPS ZIP+4 records, in the 50 U.S. states, the District of Columbia, military posts, and island areas. pdZIP provides core USPS information about them, along with time zones, area codes, GeoCoding data, and a host of useful demographic variables.

These easy-to-use, comprehensive, and up-to-date packages are designed for those who want to create custom databases or applications, stylize the address information on their mailings, or go beyond what is available from USPS address cleaning services.

Pro and Standard versions

Both the Pro and Standard editions include a 41,000 record 5-digit ZIP Code database along with an alternate places reference file. The Pro version adds 46 million ZIP+4 records. The Standard edition has everything except the ZIP+4 information.

Features
  • (Pro, Standard) More than 41,000 5-digit ZIP Code records
  • (Pro only) More than 46 million ZIP+4 records
  • (Pro, Standard) Alternate places reference file listing preferred place names and acceptable and unacceptable alternate place names for USPS ZIP Codes
  • (Pro, Standard) Comprehensive United States national databases:
    • The 50 U.S. states
    • District of Columbia (federal district)
    • Overseas military areas
      • U.S. Armed Forces Americas (except Canada)
      • U.S. Armed Forces Europe (which serves Europe, Canada, Africa, and the Middle East)
      • U.S. Armed Forces Pacific (which serves Asia and the Pacific)
    • Insular areas:
      • American Samoa
      • Commonwealth of the Northern Mariana Islands
      • Commonwealth of Puerto Rico
      • Guam
      • Midway Islands (also known as Midway Atoll; now inhabited only by caretakers)
      • U.S. Virgin Islands
      • Wake Island (also known as Wake Atoll; now inhabited only by civilian contractors)
    • Associated island areas:
      • Republic of the Marshall Islands
      • Federated States of Micronesia
      • Republic of Palau
  • (Pro, Standard) Core USPS ZIP5 information:
    • USPS 5-digit ZIP Code
    • State postal abbreviation
    • USPS preferred city name
    • Formatted city name
    • Abbreviated city name
    • ZIP Classification Code
    • City-Delivery Carrier Routes Indicator
    • Bulk Mail Sort/Merge Indicator
    • Finance Number
  • (Pro only) Core USPS ZIP4 information:
    • USPS Plus4 Add-on Code
    • Carrier Route
    • Delivery Type Indicator
    • Street alias type and date
    • Alternate Record Indicator
    • Locatable Address Conversion System (LACS) Indicator
    • Move Indicator
    • Company or organization information
    • Puerto Rican urbanization
  • (Pro, Standard) Time Zones, UTC Offsets and Daylight Saving Time
  • (Pro, Standard) Area Codes
  • (Pro, Standard) ZIP5 GeoCoding is at both the ZIP Code Tabulation Area (ZCTA) level and county level
  • (Pro only) ZIP+4 GeoCoding is at the census block group level or smaller
  • (Pro, Standard) Latitude and Longitude coordinates in 3 formats:
    • Degrees
    • Converted to radians (15 numeric places), for trigonometry functions
    • Degrees/Minutes/Seconds, for printing out coordinates in documents and on websites
  • (Pro, Standard) Land and water area
  • (Pro, Standard) Urban and rural indicator
  • (Pro, Standard) Geographic areas identified:
    • Region
    • Division
    • State (FIPS Code and Name)
    • County (FIPS Code and Name)
  • (Pro only) Geographic areas identified:
    • Census Tract
    • Census Block Group
    • Place (FIPS Code)
    • Congressional District (FIPS Code)
    • Census ZIP Code Tabulation Area (ZCTA)
  • (Pro, Standard) Demographics (tabulations and estimates):
    • Population
      • Males
      • Females
      • Median age: Both genders
        • Median age: Males
        • Median age: Females
      • White
      • Black or African American
      • American Indian or Alaska Native
      • Asian
      • Native Hawaiian or other Pacific Islander
      • Other race
      • Two or more races
      • Hispanic or Latino
      • Speaks Spanish at home (age 5 and over)
      • Enrolled in PK-12 (age 3 and over)
      • Enrolled in college
      • Veterans (age 18 and over)
      • Military quarters population
      • College/University student housing population
      • Nursing/Skilled-nursing facility population
      • Adult correctional facilities population
      • Juvenile facilities population
      • Per capita income
      • Unemployed civilian population (age 16 and over)
    • Households
      • Average household size
      • Median household income
      • Households with income below the poverty level (in the past 12 months)
      • Family households
        • Average family size
        • Median family income
      • Non-family households
        • Median non-family income
    • Housing units
      • Median number of rooms
      • Median year built
      • Occupied housing units
        • Householder who is White
        • Householder who is Black or African American
        • Householder who is American Indian or Alaska Native
        • Householder who is Asian
        • Householder who is Native Hawaiian or other Pacific Islander
        • Householder who is another race
        • Householder who is two or more races
        • Hispanic or Latino householder
        • Average number of vehicles
        • Owner-occupied housing units
          • Median home value
        • Renter-occupied housing units
          • Median gross rent as a percentage of income
          • Median gross rent
            • Median contract rent
      • Vacant housing units
  • (Pro, Standard) Comes in multiple file formats:
    • Comma Delimited (CSV)
    • Fixed Length
    • DBF
  • (Pro, Standard) Full documentation
  • (Pro, Standard) Perpetual Site License—allowing installation on all computers in the same building within a single company or organization
  • (Pro, Standard) Available for immediate download
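
The three coordinate formats in the feature list above are related by simple conversions. The following Python sketch is illustrative only and is not part of pdZIP: the function names are hypothetical, "15 numeric places" is assumed to mean 15 decimal places, and the printed Degrees/Minutes/Seconds layout is one common convention rather than the format the product necessarily uses. It also shows why a radians column is convenient: trigonometry functions such as those in a great-circle distance calculation take radians directly.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def to_radians(degrees: float) -> float:
    """Decimal degrees to radians (assumed: rounded to 15 decimal places)."""
    return round(math.radians(degrees), 15)


def to_dms(degrees: float, is_latitude: bool) -> str:
    """Decimal degrees to a Degrees/Minutes/Seconds string (hypothetical layout)."""
    if is_latitude:
        hemisphere = "N" if degrees >= 0 else "S"
    else:
        hemisphere = "E" if degrees >= 0 else "W"
    # Work in whole seconds so that rounding never produces 60 seconds.
    total_seconds = round(abs(degrees) * 3600)
    d, remainder = divmod(total_seconds, 3600)
    m, s = divmod(remainder, 60)
    return f"{d}°{m:02d}'{s:02d}\"{hemisphere}"


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers; all four inputs are in radians."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

For instance, to_dms(34.5, True) yields 34°30'00"N, and two points stored in radians can be passed to haversine_km without any further conversion.
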
Compatibility

pdZIP utilizes only the ANSI character set (ASCII values 0 to 127 and extended values 128 to 255) and United States Postal Service (USPS) and U.S. Census Bureau coding conventions. These databases are fully compatible with raw USPS data and other databases and applications that make use of their coding conventions.

Product information

What is fuzzy logic?

Both pdNickname 2.x and pdGender 2.x are fully compatible with fuzzy logic. In these products, fuzzy logic records capture slight variations in first names and nicknames that arise from common typographical errors and stylized spelling methods. The Pro edition of these packages comes equipped with fuzzy logic out of the box. Fuzzy logic add-ons can be appended to both the Pro and Standard versions.

The following illustrates the fuzzy logic technology employed in pdNickname 2.x and pdGender 2.x. Further information specific to these packages can be found in the product user documentation on our support page.

Typographical errors

A large majority of fuzzy logic records involve common typographical errors. These algorithms look at frequently reversed digraphs (a pair of letters used to make one phoneme or distinct sound), phonetically transcribed digraphs, double letters typed as single letters, single letters that are doubled, and other common data entry issues. The most likely typographical errors are determined based on the number of letters, the characters involved, where they are located in the name, and other factors.

The following are examples of fuzzy logic based on common typographical errors:

Example 1 | Real: AL | Fuzzy: ALL | the “L” is repeated
Example 2 | Real: ROCCO | Fuzzy: ROCO | the second “C” is left out
Example 3 | Real: CHRISTOPHER | Fuzzy: CHRISTOFER | the “PH” digraph is phonetically transcribed as “F”
Example 4 | Real: SOPHIA | Fuzzy: SOHPIA | the “PH” digraph is reversed
Example 5 | Real: MARGARET | Fuzzy: MARGRAET | the second “AR” digraph is reversed
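
The kinds of typographical variation shown in Examples 1 to 5 can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm used in pdNickname or pdGender: the function name typo_variants is hypothetical, only the "PH" digraph is transcribed phonetically, and every candidate is generated without the likelihood ranking (letter count, characters involved, position in the name) that the products apply.

```python
def typo_variants(name: str) -> set[str]:
    """Generate simple typographical variants of an upper-case name."""
    variants = set()
    for i in range(len(name)):
        # Single letter typed twice: AL -> ALL
        variants.add(name[:i + 1] + name[i] + name[i + 1:])
        if i + 1 < len(name):
            if name[i] == name[i + 1]:
                # Double letter typed once: ROCCO -> ROCO
                variants.add(name[:i] + name[i + 1:])
            else:
                # Adjacent pair of letters reversed: SOPHIA -> SOHPIA
                variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    # Digraph transcribed phonetically: CHRISTOPHER -> CHRISTOFER
    if "PH" in name:
        variants.add(name.replace("PH", "F"))
    # The real spelling is not a variant of itself.
    variants.discard(name)
    return variants
```

Each fuzzy spelling in the examples above appears among the generated candidates; for instance, "MARGRAET" is in typo_variants("MARGARET"). A production system would keep only the most probable candidates rather than all of them.
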

Stylized spellings

Other fuzzy logic records involve stylized spelling methods. These algorithms look at non-regular characters such as extended ANSI characters (ASCII values 128 to 255) as well as hyphens, apostrophes, and spaces.

A few of the possible extended characters are “Á” (A-acute), “Ö” (O-umlaut), and “Ñ” (N-tilde). In these cases, “Á” becomes “A” (A-regular), “Ö” becomes “O” (O-regular), “Ñ” becomes “N” (N-regular), and other extended characters are treated similarly.

The following are examples of fuzzy logic based on stylized spellings:

Example 6 | Real: BJÖRK | Fuzzy: BJORK | spelled with O-regular instead of O-umlaut
Example 7 | Real: NICOLÁS | Fuzzy: NICOLAS | spelled with A-regular instead of A-acute
Example 8 | Real: ‘ASHTORET | Fuzzy: ASHTORET | spelled without an apostrophe prefix
Example 9 | Real: ABD-AL-HAMID | Fuzzy: ABDALHAMID | spelled without hyphens delimiting the name parts
Example 10 | Real: JUAN MARÍA | Fuzzy: JUANMARIA | spelled without the space between the two parts and with I-regular instead of I-acute
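
A rough Python sketch of the same idea follows. It is an assumption-laden illustration rather than the products' own method: the function name plain_spelling is hypothetical, it relies on Unicode decomposition from the standard unicodedata module instead of an ANSI lookup table, and it applies every simplification at once, whereas the fuzzy logic records may treat each stylization separately.

```python
import unicodedata


def plain_spelling(name: str) -> str:
    """Reduce a stylized name to plain letters only."""
    # Split accented letters into a base letter plus combining marks,
    # for example O-umlaut becomes "O" followed by a combining diaeresis.
    decomposed = unicodedata.normalize("NFD", name)
    # Drop the combining marks, leaving the regular base letters.
    no_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Drop hyphens, apostrophes, spaces, and any other non-letter.
    # Letters with no decomposition (such as "Ø") pass through unchanged.
    return "".join(ch for ch in no_accents if ch.isalpha())
```

Applied to the real spellings in Examples 6 to 10, this returns the fuzzy spellings shown; for instance, plain_spelling("JUAN MARÍA") gives "JUANMARIA".
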

Fuzzy logic add-on packs and upgrades

Peacock Data releases additional fuzzy logic records nearly every month for pdNickname 2.x and pdGender 2.x in the form of add-on packs. These packs can be appended easily and economically to the main databases, extending coverage of typographical errors and stylized spelling methods.

The fuzzy logic technology built into the main Pro product downloads is designed to pick up the statistically most likely mistakes and stylizations. Fuzzy Logic Add-on Packs are designed to pick up less common mistakes and stylizations.

Add-on packs include new algorithms and randomizers and are fully compatible with both the Pro and Standard editions of these packages.

Those licensing the Standard edition of either product can also purchase a Standard to Pro Upgrade Pack which includes all the fuzzy logic records from the Pro edition. Once a Standard version is upgraded, it will be the same as the Pro edition.

Review the documentation provided with the fuzzy logic add-on packs and upgrades for further instructions.