Database Quality and Accuracy
INTRODUCTION: Newspaper Database Quality Control
[Note: this excerpt is from News Media Libraries: A Management Handbook, published in 1993 by Greenwood Press and edited by Barbara Semonche.]
Establishing standards for full-text database quality control is a special challenge. Anne Mintz, director of information services at Forbes Inc., does not mince words about errors and quality control in online databases. A section of her 1990 article in Online addresses "full-text follies." Mintz is referring to the problems online database searchers have with typos and the lack of effective indexing in newspaper databases. Highlighting these gaps and traps for database providers and online searchers is a step toward improving the overall products and services of online full-text newspaper databases. An earlier article by Reva Basch (Database, vol. 12, 1989, pp. 15-23) gives eloquent, amusing and insightful testimony on the sins of omission and commission in full-text searching. Basch identifies the following "seven deadly sins of full-text searching": duplicity, verbosity, wimpiness, irrelevance, sloppiness, hyperbole, and obfuscation. Her article is worthy of careful study by news researchers, particularly database managers. In fairness, providers of full-text databases have improved their products and services since Basch's indictment. Full-text news databases are not the only ones with errors; the same troublesome problem can be found in bibliographic and numeric databases as well. The fault can lie with the database provider, the software, the vendor or any combination of the three. News library online managers must remain alert to full-text problems and prepare to find solutions. They must also avoid compounding errors by refusing to make unwarranted assumptions about, or to misinterpret, the data at hand.
With respect to the fundamental differences between full-text news databases and bibliographic journal databases - the size and frequency of publication - not much can be done. Newspapers publishing 36 to 96 pages (and more) daily differ undeniably from scientific or trade journals publishing approximately the same number of pages just four to six times a year. The sheer volume and rapidity of publication by daily newspapers impede the kind of quality control over typographical errors routinely maintained by most scholarly journals. Another critical difference is that newspapers, unlike journals, frequently have several editions. Matching articles with various editions online is a challenge for the database providers as well as the online searchers. Typically, newspapers cannot afford the time, disk storage and money required to duplicate all editions online. Some newspapers, carried by more than one database vendor, suffer variances in search protocols and strategies as well as content. Further, licensing requirements, royalty considerations, copyright restrictions and/or editorial policy affect decisions about which news articles are included in the databases. Decisions by a few database vendors to restrict or discontinue marketing access to their full-text newspaper files present other problems to clients and database providers alike. Finally, clients are beginning to use full-text databases in ways for which they were never intended. Content analysis across several newspapers online is one example of a new application. A "one-size-fits-all" approach to managing these databases does not please every user. Tailoring full-text databases to a myriad of unique applications increases development costs and end-user fees. Given the complaints about current full-text search and retrieval costs, it is unlikely that the marketplace will pay for a major software overhaul anytime soon.
The one new trend already having significant impact on online newspapers is the World Wide Web. Over 800 U.S. newspapers have web sites. While these sites are attractive, graphical and colorful in ways that text databases could never be, the overpowering problem of digitally archiving web pages remains. The impact of these digital products on customer access and fees is just beginning to be explored. Just how these changes and trends will affect commercial online vendors such as Nexis, DataTimes, Dow Jones Information Service, and Knight-Ridder/Dialog Service is unknown but is being carefully watched. Progress has been made in identifying these problems of accuracy and currency in newspaper databases. Nevertheless, much remains to be done by news library managers of full-text databases. It has been forcefully recommended that the News Division of the Special Libraries Association undertake a serious, long-range study of quality control in full-text online news databases. While not all problems could be solved, news researchers responsible for their databases could at least generate quality-management guidelines. An effort to include complete, current article selection policies (with examples of variances) in each paper's database documentation should have priority. Something in the nature of a comprehensive searcher's guide to newspaper full-text databases needs to be expanded, updated, published, and offered online. There may even be value in separate guides to online newspaper archives (print and web versions as well) for in-house news staff, library searchers, and the general public. No one wants an inaccurate database. Everyone makes mistakes. These two indisputable statements add up to the necessity for quality control implemented as an integral part of the entire information flow from reporter to copy editor to library database editor to in-house network administrator to digital archive to searchers. . . and back again! It is a never-ending loop.
Errors, omissions, and distortions can enter the digital archive at any time, and it is VERY hard, if not impossible, to purge them, let alone correct them. Database quality control must be a fundamental part of the news organization's policies and procedures. Several elements in each story are critical and lead to confusion when incorrect. For example, the date, the section and the page number are important to persons who might be using the online system as an index to microfilm. A story with errors in the date and page fields would be much more difficult, if not impossible, to locate. Headlines, credits and bylines are important, since persons searching for information often remember these points and use them to find it. A daily quality control list of stories added to the database the previous day, faithfully checked for typographical errors and incorrect page or section information, is a start for an initial quality control procedure. If possible, the quality control assistant should not be one of the persons who enhanced that edition of the paper (that is, added "key search terms" to its news articles), because it is easier to spot errors in a first-time reading. In top newspaper libraries with online archives, quality control staffers must be cleared to make changes in the databases. Not all enhancers are so authorized. Quality control is sometimes treated as an expendable task if the library staff is overburdened with the work of moving the database to the vendor. Still, it is important, even vital, for the integrity and credibility of the database to take every precaution. If time is critical, that is, if there is little or no time for this level of quality control, then heavy reliance is placed on the database software program to offer a smooth, seamless interface, with as much support as possible to reduce rekeying of data. The only way to create such a database software program is to involve the editors, library database staffers and the network administrators.
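The daily quality control list described above could be partly automated. The following is only a minimal sketch in Python, not a description of any real archive system: the record layout (`Story`), the list of valid sections (`KNOWN_SECTIONS`) and the function names are all assumptions made for illustration. It flags records whose date, section, page, headline or byline fields look wrong, so that a human checker can concentrate on reading the text itself for typographical errors.

```python
from dataclasses import dataclass
from datetime import date, datetime


# Hypothetical record layout; a real archive system defines its own fields.
@dataclass
class Story:
    story_id: str
    pub_date: str  # expected in YYYY-MM-DD form
    section: str
    page: str
    headline: str
    byline: str


# Assumed set of valid section codes, for this illustration only.
KNOWN_SECTIONS = {"A", "B", "C", "D"}


def check_story(story: Story, edition_date: date) -> list[str]:
    """Return human-readable problems found in one record."""
    problems = []
    try:
        parsed = datetime.strptime(story.pub_date, "%Y-%m-%d").date()
        if parsed != edition_date:
            problems.append(f"date {story.pub_date} does not match edition {edition_date}")
    except ValueError:
        problems.append(f"unreadable date: {story.pub_date!r}")
    if story.section not in KNOWN_SECTIONS:
        problems.append(f"unknown section: {story.section!r}")
    # isdecimal() guarantees that int() will accept the string.
    if not (story.page.isdecimal() and int(story.page) > 0):
        problems.append(f"bad page number: {story.page!r}")
    if not story.headline.strip():
        problems.append("missing headline")
    if not story.byline.strip():
        problems.append("missing byline")
    return problems


def daily_qc_list(stories, edition_date):
    """Map story_id -> problems, listing only records that need attention."""
    report = {}
    for story in stories:
        problems = check_story(story, edition_date)
        if problems:
            report[story.story_id] = problems
    return report


# Example run with two made-up records for a made-up edition date.
stories = [
    Story("s1", "1998-03-02", "A", "1", "Council votes on budget", "Jane Doe"),
    Story("s2", "1998-03-32", "Z", "", "", ""),
]
report = daily_qc_list(stories, date(1998, 3, 2))
for story_id, problems in report.items():
    print(story_id, problems)
```

A check of this kind catches only mechanical field errors. It does not replace the first-time reading by a second person, and it cannot tell whether a well-formed date or page number is actually the right one.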
Only by understanding exactly how the information moves through the system, and what decisions are made at each critical juncture, can a reliable, efficient digital archive be created and maintained.
THE PROBLEMS:
Basically, errors come into print or broadcast in two ways: pre-publication and post-publication. The former results from inaccurate information delivered to or collected by journalists, or subsequently misconstrued in any of a large number and variety of ways. If the original data is contaminated and uncorrected, the final product, print or electronic, is unreliable. The latter comes from those instances when material is transferred from one format into another, such as from a print format into an electronic one, either a full-text database or a web archive. These conversions offer special challenges to journalists, editors, cyberscouts, and news researchers in maintaining the quality and accuracy of original sources. Errors are exacerbated. Added to these failings is the frustration of trying to correct the errors anywhere along the information pipeline. One of the most recently published articles addressing this issue appeared in Columbia Journalism Review, March-April 1998 v36 n6 p13(1). The title is "How accurate are your archives?" and the author is Bruce William Oakley, Arkansas Online editor. Oakley states that newspaper articles archived in electronic databases may not match the information printed in the hard copy. Corrections made at the final proof stage of article preparation may not necessarily be transferred to the electronic database. Other errors may occur in the conversion process. Newspapers need to set up a system that verifies the accuracy of their electronic archives. Other examples exist.
Item: an E&P column by Robert Brown revealed researchers' estimate that approximately 50% of all news stories contain some kind of error. The source of these inaccuracies? A combination of reporting and editing errors.
Not mentioned in this article are published errors that result from fact-checking failures. (Note: see CJR, May-June 1994 for an article, "Inside The New Yorker's Fact-Checking Machine.")
BEWARE THE DATA TWISTERS:
Item: Errors in the 1990 Census yielded serious undercounts of the population.
Item: Historical Fact & Fiction: Readers of the latest U.S. history textbooks discover a storehouse of misinformation. For example, did Harry Truman become president in 1944 after Roosevelt's death? WRONG. Roosevelt died in 1945.
Item: WRONG: A constitutional amendment needs to be ratified by two-thirds of the state legislatures. It needs three quarters of the state legislatures.
Item: WRONG: 53,000 Americans were killed during WWI. The actual figure was 126,000.
LIBRARY HAZARDS:
Item: OCLC had a software program correcting 30,000 errors a day, according to the Chronicle of Higher Education on Feb. 12, 1992.
Item: In 1968, a dissertation at Rutgers revealed that librarians in a sample answered questions with a 54.2% accuracy rate. In 1986, the Maryland State Library conducted a study to assess the effectiveness of its library training. It revealed that reference accuracy had improved to 77%.
Item: An article in AJR's July/Aug. 1994 issue revealed that mistakes won't die. The article, titled "The Nexis Nightmare," recounts the experience of Elizabeth Haworth, then a research librarian with Newsday. Ten minutes before deadline, an editor asked Haworth, "How many members of the French Foreign Legion fought in Operation Desert Storm?" She did the best she could: she phoned the French Foreign Legion, the French Embassy and even the Pentagon. No luck. Everyone was gone for the day. She knew that she could probably pull a number -- any number -- from a news database such as Nexis, but how reliable would it be? She logged on and found a single citation that said 2,000 legion troops had been sent to Saudi Arabia. She gave the editor the figure with a strong word of caution. "I told him I didn't think that we should use it." The editor decided to go with the number. Haworth speculated that maybe the figure was right or possibly it wasn't. What she did know was that if anyone had the answer, it was not readily available.
Thomas Jefferson, frequently irked by the press' "abandoned spirit of falsehood," once suggested that the four sections of newspapers be labeled "Truths, Probabilities, Possibilities and Lies." By the time the Civil War was underway, accounts of actions were sometimes subheaded "Important if True," a companion plug/disclaimer on its way to evolving into the World War II use by the Boston Globe of "Unconfirmed" over its unverified reports. Richard Lamm, former governor of Colorado, when misquoted about elderly and terminally ill Americans, was quoted as saying that "the corrections move by bicycle while the stories move at the speed of light." Librarians say the rise of databases has increased the importance of their jobs, largely because reporters too often suspend their customary skepticism when dealing with databases. News researchers should be considered, and occasionally are, critical partners in ensuring the accuracy and quality of the information distributed to readers and viewers. Barbara Maxwell, director for USA Today's News Research Department, issues this maxim for her staff: Our newspaper may not be used as a source for verifying information.
Homer E. Martin, Jr., former chief librarian for The Record in Hackensack, N.J., described in the 1983 publication "Guidelines for Newspaper Libraries" the proper procedure for correcting newspaper files. "The basic rule in making corrections is to mark or label the original item containing the error so that anyone subsequently using the item will be clearly informed of the error. This rule applies to every file and every type of material." Sherry Adams reported on the NewsLib electronic mailing list how she (and a number of others) were fooled on sources for quotations. She wisely observed that just because someone finds something on the net, it isn't necessarily true. She then provided the full text of the corrected quotes. A pair of mantras that Richard Mullins of NICAR (National Institute for Computer-Assisted Reporting) teaches his students at the Missouri School of Journalism is: "All databases are bad," and "All data is dirty." He was referring to large, raw-data files collected from public agencies, but the principle applies to a wider range of databases as well. Editor John Ullmann's path to quality and accuracy in newspapers is simple, but tortuous, involving "line-by-line" accuracy checks to spot logical inconsistencies, information gaffes or gaps in news articles. Cynthia Crossen, in her 1994 book Tainted Truth: The Manipulation of Fact in America, gives eloquent testimony to the crisis confronting those of us whose mission is the accuracy of newspaper records and databases.
"It is unlikely that government will save us from poor information. Neither can the public take comfort in thinking that the media will protect us from the flood of dubious information that we face everyday. . . ."
Selected articles dealing with errors and corrections.
Copyright 2003 - The Park Library - School of Journalism and Mass Communication - University of North Carolina at Chapel Hill