Archive for February 2010

Be kind—don’t pollute telephone fields

One of the most exasperating things about processing telephone data is all the junk that often gets added next to the numbers. Little notes like “cell”, “parent’s phone”, “call before 8 p.m.” and “disconnected” can wreak havoc when the information is processed with telephone update services or sent through merge/purge, as well as when utilized in-house.

Having the extra information included is particularly destructive when trying to verify telephone numbers or running a reverse append. Often these numbers are flagged as invalid and not properly processed.

Merge/purge can also be harmfully affected if telephone information is used in the match criteria and matches are missed due to the “database pollution”.

The little notes can be a problem in-house as well. They show up when printing telephone lists and when supplying data to your call center. And they can also cause issues when performing queries on the tables.

If notes about telephone numbers are necessary, a separate field should be provided and utilized by the end user. And if the problem persists, database managers can limit the size of the telephone fields so there is not enough room for the notes.

Most importantly, lessons in database etiquette, including a list of do’s and don’ts, should be included in training for anyone who accesses the database tables.

A company or organization’s data resources are among its most important assets. Following a few simple rules when accessing them can greatly help maintain their value and extend their effectiveness.

Be kind—don’t pollute!

Restructuring the pdNickname database

An alternative structure for is to have one record per name with the variations in fields next to it. This tutorial explains how to do it.

Matching and merging names can be tricky. How do you relate William Smith with Bill Smith? The pdNickname database can be utilized to match names that are dissimilar because one has a given first name while another has a nickname or other variation.

Out of the box pdNickname is structured to allow immediate compatibility with the greatest number of database systems as well as to make it easy to become familiar with.

The nickname database is setup with two names per record. The first name field contains the names you are looking up, and in the second is a variation for each name—nickname, diminutive, given name, variant, etc. The same name can be listed several times in the first field, each time with a different variation. (See Figure 1.)

FIGURE 1: PDNICKNAME OUT OF THE BOX

If the names compared are Alexander Jones and Alex Jones, all names matching Alexander (NAME-A) are scanned until a variation is found that matches Alex (NAME-B). This works well, but there are other ways of organizing pdNickname that could work even better for you. In fact, we have restructured the table for utilization in our own services.

An alternative structure is to have one record per name and the variations in fields next to it. It is not practical to have separate fields for each variation, which can range from one to over two hundred. So what we do is have two Memo fields (also known as Long Text), one for close variations (relflag = "1") and the other for more distant variations (relflag = "2"), with the string of variations separated by delimiters for easier matching. (See Figure 2.)

FIGURE 2: PDNICKNAME RESTRUCTURED

Note: when browsing a table, normally you cannot see the content of a Memo or Long Text field because the database keeps it in a separate file. For this screenshot we have made the content visible.

Structured this way, when your program finds a match for NAME-A, it then determines if NAME-B can be found in variation field one or variation field two. This can be faster because you only access one record in each search request.

pdNickname, like all our Database Products, are structured to satisfy most users from the start. But there are many ways to integrate the databases into your system. It is up to you to determine what works best for you. Do not be afraid to experiment.

Start at the finish line

When planning a new data management projects, results will be better, costs lower and headaches lessened if you consider what your objectives are before you begin. Always start by thinking about what you want to do with your data both now and down the road.

The worst kind of database system is one put together piecemeal as new demands arise. At some point it becomes more of a hindrance that a help. Many become monsters that seem to have a life of their own.

Before you design your databases, tables and user interfaces and decide on purchases, consider all the kinds of data you want to track and the best and most resourceful way of doing so. But to do this you need to gather some information first.

Talk to the end users who will utilize your data. Find out what they need and how they will be using it. Just as important, determine what they would like to do in the future and what has frustrated them most about data in the past.

Also talk to those who will be entering information and the techs who will be working directly with the database system. Find out what will make them more efficient and what has previously held them back.

Finally talk to the vendors who will be processing your data and supplying equipment and third-party lists. Ask them what you can do to help them achieve the best results for you and reduce costs without sacrificing quality.

All of this will affect what tables you design, what fields they will contain, what relationships there will be between them and how end-users will access the information. It will also affect what equipment and lists you acquire, when you buy them and who you hire to make it all work.

Your new data management project should not be planned until after gathering the insight needed to establish what the end results should be. Once you have seen the view from the finish line, you will be much better equipped to create a database system that will get you where you want to be days, months and years from now.