Archive for Tips and Tutorials

What are jam values?

In the world of statistical databases, jam values are common. They are hard-coded values displayed in place of derived measures, and they cover two kinds of special situations. In the first, the value explains the absence of data and appears as a symbol in the data display, such as a dot (“.”). In the second, the value stands for an open-ended distribution, such as 1,000,000 or greater represented by 1000001. Even an empty value or a zero (“0”) will often have a special meaning.

Jam values can also be used to explain why information cannot be disclosed, such as for privacy reasons or because the data does not meet certain filtering criteria.

Data definitions are provided with jam values, and the values are normally not difficult to use. Depending on the parameters of the project, some users may even choose to ignore them altogether. However, it is important to understand that they exist and that they must not be treated like ordinary numbers in tabulations and analysis.

Special consideration must be taken when importing data with jam values because they are often in alpha form. Non-numeric values will typically be converted to zeros (0) if appended into fields that accept only numeric information, and the meaning of the jam value is then lost.

Some users convert jam values to special numbers during the import process so numeric fields can be used. Numeric fields are easier to work with because they do not have to be converted when counted or used in calculations.
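As a rough illustration of that import approach, the Python sketch below maps alpha jam values to special numbers and then leaves those numbers out of a total. The symbols, the special numbers and the function names are all hypothetical. Real jam values and their meanings come from the data definitions supplied with each data set.

```python
# Hypothetical jam values and the special numbers chosen to stand in for them.
# Real symbols and meanings come from the data definitions of each data set.
JAM_TO_NUMBER = {
    ".": -999999991,  # assumed meaning: data not available
    "N": -999999992,  # assumed meaning: withheld, e.g. for privacy reasons
}
SPECIAL_NUMBERS = set(JAM_TO_NUMBER.values())


def import_value(raw):
    """Convert one raw text value to an integer suitable for a numeric field."""
    text = raw.strip()
    if text in JAM_TO_NUMBER:
        return JAM_TO_NUMBER[text]
    # Unknown text raises ValueError here instead of silently becoming zero.
    return int(text.replace(",", ""))


def tabulate(values):
    """Sum the real numbers only, skipping the special jam numbers."""
    return sum(v for v in values if v not in SPECIAL_NUMBERS)


column = [import_value(v) for v in ["1,250", ".", "300", "N"]]
```

Here `tabulate(column)` adds only 1250 and 300. The special numbers are large negative values so they are unlikely to collide with real counts, but any range that cannot occur in the data would serve.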

You will find jam values frequently employed in American Community Survey (ACS) estimates and margins of error as well as in our own ACS demographics product, pdACS2013.

Using the pdNickname RELFLAG field

pdNickname is a unique database of nearly 50,000 records designed to facilitate comparing sets of first name data based on nicknames, diminutives, pet names, variations and given names. One of the most important fields in the database product is RELFLAG, which stands for “Relationship Flag”.

The RELFLAG field contains one of two possible values:

1 = Close relationship between the name and variation (common variants): Includes closely associated nicknames, diminutives and pet names as well as first name variations that are considered closely related.

2 = More distant relationship between the name and variation (less common variants): Includes alternate forms of the names, often deriving from another culture, as well as nicknames, diminutives and pet names that are relatively uncommon.

PDNICKNAME VARIATIONS FOR THE GIVEN NAME “SAMUAL”
The RELFLAG field indicates whether the name and variation have a (1) close or (2) more distant relationship.

The RELFLAG field is useful for controlling what is to be considered an acceptable match. As more distant relationships are included in matches, the error rate naturally rises. The error rate increase is usually not substantial, but it is measurable in hundredths and tenths of a percent.

RECOMMENDATIONS

RESIDENTIAL: While additional accuracy can be achieved if only close relationships are considered, with residential lists the increase in the error rate is almost always very small even when the more distant relationships are included, rarely more than 0.02% in our testing. Therefore, under best practices, it is fully acceptable to use all RELFLAG relationships when matching residential lists. With the exception of the George Foreman family, most errors that might occur result from different given names that share the same nickname or other variation.

BUSINESS AND ORGANIZATION LISTS: With business and organization lists, on the other hand, the increase in the error rate when the more distant relationships are included is typically higher than with residential lists. Our testing normally shows an increase that is still less than 0.1%, although we have seen it as high as 0.3% with some large lists. Under best practices, it is recommended that only close relationships be considered when processing business and organization lists.
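A minimal Python sketch of how RELFLAG might gate a match is shown below. The sample rows, their flags and the function name are invented for illustration and are not taken from the product. The sketch only assumes that each record holds a name, a variation and a RELFLAG of "1" or "2".

```python
# Hypothetical sample rows: (name, variation, relflag).
# The pairs and flags are invented for illustration, not product data.
SAMPLE_ROWS = [
    ("WILLIAM", "BILL", "1"),
    ("WILLIAM", "WILL", "1"),
    ("WILLIAM", "GUILLERMO", "2"),
]


def is_variation(rows, name, candidate, close_only=True):
    """Return True if candidate is listed as an acceptable variation of name.

    close_only=True accepts RELFLAG "1" only; False accepts "1" and "2".
    """
    name, candidate = name.upper(), candidate.upper()
    allowed = {"1"} if close_only else {"1", "2"}
    return any(
        row_name == name and variation == candidate and flag in allowed
        for row_name, variation, flag in rows
    )
```

Following the recommendations above, a residential list could be processed with `close_only=False`, while a business or organization list would keep the default `close_only=True`.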

How to zip and unzip files

The following instructions show how to compress and uncompress a file under the Windows, Mac, and Linux operating environments. Note that under most systems you can select multiple files as well as folders/directories to zip into the same archive.

Zipping a file creates a compressed version of the file that is usually considerably smaller than the original file. The zipped version of the file has a .zip file extension.

Unzipping a file reverses the zip process and extracts the file from the compressed archive.

IMPORTANT: Please be cautious about opening .zip files from unknown e-mail senders because they can contain viruses. Confirm with known senders before opening a .zip file.

WINDOWS ME, XP, VISTA, 7, 8 and SERVER 2003 (or higher)

Compress files (zip files)

  1. Locate the file you want to compress.
  2. Right-click the file; then point to Send to and click Compressed (zipped) folder.

    A new compressed zip file is created in the same location.

Uncompress files (unzip files)

  1. Locate the compressed zip file you want to extract.
  2. Do one of the following:
    • To extract a single file or folder, double-click the zip file to open it; then drag the file or folder from the archive to a new location.
    • To extract the entire contents of the zip file, right-click the zip file; then click Extract All; then follow the instructions on the screen.

Windows 95, 98, 98SE and 2000 do not have built-in zip file support, so third-party software is needed to create and extract zip files.

MAC OS X (10–10.4)

Compress files (zip files)

  1. Use Finder to locate the file you want to compress.
  2. Control-click or right-click the file icon; then click Create Archive of [. . .]

    A new compressed zip file is created in the same location.

Uncompress files (unzip files)

  1. Use Finder to locate the compressed zip file you want to extract.
  2. Double-click the file icon.

    The files contained in the archive will be extracted to the same location.

MAC OS X (10.5–LION)

Compress files (zip files)

  1. Use Finder to locate the file you want to compress.
  2. Control-click or right-click the file icon; then click Compress [. . .]

    A new compressed zip file is created in the same location.

Uncompress files (unzip files)

  1. Use Finder to locate the compressed zip file you want to extract.
  2. Double-click the file icon.

    The files contained in the archive will be extracted to the same location.

LINUX

Compress files (zip files)

  1. Open a shell prompt.
  2. Enter the following: zip -r filename.zip filedir

    A new compressed zip file is created in the selected location.

Uncompress files (unzip files)

  1. Open a shell prompt.
  2. Enter the following: unzip filename.zip

    The files contained in the archive will be extracted to the same location as the zip file.
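When the same steps need to be scripted on any of these systems, Python's standard zipfile module is one option. The sketch below is only an illustration. The function names are hypothetical, and it assumes the archive is written outside the directory being compressed.

```python
import zipfile
from pathlib import Path


def zip_directory(source_dir, archive_path):
    """Compress every file under source_dir into archive_path.

    Assumes archive_path is not located inside source_dir.
    """
    source = Path(source_dir)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Store paths relative to the parent folder so the archive
                # keeps the top-level directory name, as "zip -r" does.
                archive.write(path, path.relative_to(source.parent))


def unzip_archive(archive_path, target_dir):
    """Extract the full contents of archive_path into target_dir."""
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
```

For example, `zip_directory("filedir", "filename.zip")` followed by `unzip_archive("filename.zip", "restored")` would recreate `filedir` inside a `restored` folder. The caution about archives from unknown senders applies to scripted extraction as well.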

Regularly review database systems

Maintaining a successful database infrastructure requires regular review to establish what is going right and where problems may be lurking. This should consist of close consideration of all aspects of the database system, including: hardware & software, administration & input, documentation, staff & training and disaster recovery.

The frequency of database system reviews is dependent on the size of a company or organization and the complexity of the system. At a minimum, we recommend quarterly reviews, but many situations warrant more frequent action.

Prior to conducting any database system review it is important to establish a set of benchmarks and checklists to compare against the findings.

HARDWARE & SOFTWARE: A database system review should begin with an assessment of the applications, computers, workstations, network servers and other devices that underlie and run the system. Decide if they are meeting the expectations and requirements of end users and if they are doing so efficiently. Needs change and technologies grow rapidly, and keeping on top of the machinery is essential in a competitive environment.

ADMINISTRATION & INPUT: Take a long look at how the database system is administered and the input into the tables. Determine if there has been unauthorized augmentation of a database, particularly rogue changes, or if unapproved or non-standard practices and tools are utilized. This will help maintain the system’s physical and logical models as well as prevent costly downtime and gaps in performance.

DOCUMENTATION: The fun often stops for tech people when they have to hang up their programming tools and put their work down on paper. Consequently, documentation is often lacking for database systems. Make sure this is not the case, because poor documentation strongly affects how well end users can work with the system. Documentation should include a data dictionary, reflect the current physical and logical state of the infrastructure, and be understandable to less tech-savvy readers.

STAFF & TRAINING: The people part of the review is very important because a database system is only as good as those who run and use it. It is important to align duties properly, and the staff needs to have the necessary expertise and training to adequately leverage the technology and be equipped to handle new complexities in the infrastructure. Investment in this area can reap large rewards.

DISASTER RECOVERY: Last but definitely not least, assess the database system in terms of its ability to recover from a disaster. Backups need to be performed regularly and properly stored, and it is vital that this includes offsite backup. Additionally, make sure there is an adequate plan for unforeseen complications and worst-case scenarios and that the system’s immunization against viruses, worms and other web-based attacks is at full strength. This is particularly important when there are substantial changes to the database infrastructure.

Database management can become overwhelming as requirements escalate and the volume of data mushrooms. Regular review of a database system is essential to preserve the return on investment, meet objectives and ensure long-term success.

Restructuring the pdNickname database

An alternative structure for pdNickname is to have one record per name with the variations in fields next to it. This tutorial explains how to do it.

Matching and merging names can be tricky. How do you relate William Smith with Bill Smith? The pdNickname database can be utilized to match names that are dissimilar because one has a given first name while another has a nickname or other variation.

Out of the box, pdNickname is structured for immediate compatibility with the greatest number of database systems and to be easy to become familiar with.

The nickname database is set up with two names per record. The first name field contains the names you are looking up, and the second contains a variation for each name (nickname, diminutive, given name, variant, etc.). The same name can be listed several times in the first field, each time with a different variation. (See Figure 1.)

FIGURE 1: PDNICKNAME OUT OF THE BOX

If the names compared are Alexander Jones and Alex Jones, all names matching Alexander (NAME-A) are scanned until a variation is found that matches Alex (NAME-B). This works well, but there are other ways of organizing pdNickname that could work even better for you. In fact, we have restructured the table for utilization in our own services.

An alternative structure is to have one record per name and the variations in fields next to it. It is not practical to have a separate field for each variation because the number of variations can range from one to over two hundred. Instead, we use two Memo fields (also known as Long Text), one for close variations (relflag = "1") and the other for more distant variations (relflag = "2"), with the string of variations separated by delimiters for easier matching. (See Figure 2.)

FIGURE 2: PDNICKNAME RESTRUCTURED

Note: when browsing a table, normally you cannot see the content of a Memo or Long Text field because the database keeps it in a separate file. For this screenshot we have made the content visible.

Structured this way, when your program finds a match for NAME-A, it then determines if NAME-B can be found in variation field one or variation field two. This can be faster because you only access one record in each search request.
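The restructuring and the single-record lookup can be sketched in Python as follows. This is an illustration under stated assumptions, not the layout used in our own services. The sample rows, the "|" delimiter and the function names are hypothetical, and a dictionary stands in for the restructured table.

```python
from collections import defaultdict

DELIM = "|"  # assumed delimiter; any character that never occurs in a name works

# Hypothetical out-of-the-box rows: (name, variation, relflag).
ROWS = [
    ("ALEXANDER", "ALEX", "1"),
    ("ALEXANDER", "SANDY", "1"),
    ("ALEXANDER", "ALASTAIR", "2"),
]


def restructure(rows):
    """Collapse two-names-per-record rows into one record per name."""
    grouped = defaultdict(lambda: {"1": [], "2": []})
    for name, variation, relflag in rows:
        grouped[name][relflag].append(variation)
    # Wrap each string in delimiters so "|ALEX|" cannot match inside "|ALEXA|".
    return {
        name: {flag: DELIM + DELIM.join(names) + DELIM for flag, names in flags.items()}
        for name, flags in grouped.items()
    }


def find_relflag(table, name_a, name_b):
    """Return "1" or "2" if name_b is a listed variation of name_a, else None."""
    record = table.get(name_a.upper())
    if record is None or not name_b:
        return None
    needle = DELIM + name_b.upper() + DELIM
    for flag in ("1", "2"):
        if needle in record[flag]:
            return flag
    return None


table = restructure(ROWS)
```

With this layout, `find_relflag(table, "Alexander", "Alex")` touches a single record and reports which variation field produced the match, so the caller can still apply a close-only rule when needed.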

pdNickname, like all our Database Products, is structured to satisfy most users from the start. But there are many ways to integrate the databases into your system. It is up to you to determine what works best for you. Do not be afraid to experiment.