By Matthew Bisanz
I was recently looking up an article on Google News about the contract negotiations between the United Auto Workers union (UAW) and General Motors (GM). Google claims to compile news from 4,500 sources, a number that must raise the spirits of journalism majors everywhere. There is a small problem with that number, though: most of those sources are nothing but re-brandings of wire services. For instance, Google claimed to have 1,597 articles relating to the setting of a strike deadline at GM, yet almost all of them were re-branded Associated Press (AP) and Agence France-Presse (AFP) wire stories. Each re-branding service has its own set of forums, its own page layout, and in some cases its own registration system. The news industry, however, isn't the only place with too much content and no good way to filter it.
As of August, the University library's Journal Finder lists 32,553 journals in its databases. (Note: that would be a great number to add to the admissions tour.) The problem is that unless you're looking for a specific article, there is no way to judge the value of one journal over another. In effect, you have a huge pile of paper with no quality-control index.
Of course, this is not just a problem at the University; it affects every aspect of our lives. Every day we're bombarded with more and more information, with no method for deciding whether it's any good. Even Wikipedia, that bastion of poorly sourced research, has a system to grade the quality of its output. What's more, with these thousands of news services, journal compilation systems, and far-flung databases, it's possible to research a topic thoroughly and still miss critical parts of it because they aren't cross-linked in the database. Professors often say to look at an article's footnotes and the author's sources. But many articles today, especially those in business, don't list any sources. In other cases, you may have a PDF of an article whose source index is on another page that isn't included.
In short, the way we store and distribute information is slowly breaking down. If anyone here has ever used the JSTOR database, one of the more popular ones, you'll know that it's almost impossible to translate its data into another format such as Microsoft Word. One group that has seen this problem and is beginning to address it is Project Gutenberg: all of its data is saved in simple text files that can be read on any computer. Imagine if all the tens of thousands of articles in the University library databases were in simple text format. All that would be required to work with them would be a simple piece of software that could cross-reference different authors across different publishers. Even Google could benefit from having all news stories published in a simple, machine-readable format. Google News could then easily compare versions without all the unique HTML formatting and advertising breaks, and eliminate duplicate articles.
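To give a sense of how little software that comparison would actually require once the formatting is stripped away, here is a minimal sketch in Python. It assumes a hypothetical folder of wire stories saved as plain-text files and flags copies of the same story by hashing their normalized text; this is only an illustration of the idea, not how Google News or any library database actually works.

```python
import hashlib
from pathlib import Path

def normalized(text: str) -> str:
    """Collapse case and whitespace so trivially re-branded copies compare equal."""
    return " ".join(text.lower().split())

def find_duplicates(folder: str) -> dict[str, list[str]]:
    """Group plain-text articles in `folder` by the hash of their normalized body."""
    groups: dict[str, list[str]] = {}
    for path in Path(folder).glob("*.txt"):
        body = path.read_text(encoding="utf-8")
        digest = hashlib.sha256(normalized(body).encode()).hexdigest()
        groups.setdefault(digest, []).append(path.name)
    # Keep only hashes shared by more than one file: those are re-branded copies.
    return {h: names for h, names in groups.items() if len(names) > 1}

if __name__ == "__main__":
    # "articles" is a hypothetical directory of wire stories saved as plain text.
    for digest, names in find_duplicates("articles").items():
        print(f"{len(names)} copies of the same story: {', '.join(names)}")
```

With proprietary layouts, advertising breaks, and registration walls in the way, even this trivial comparison becomes impractical; with plain text, it is a few lines of code.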
Part of the problem today is that people are looking to more advanced technology to solve simple problems. For instance, airports have invested in multi-million-dollar baggage scanners to detect bombs. These scanners require great amounts of electricity, must be tended constantly by personnel, and if they break down, the whole system breaks down. For probably the same cost, or perhaps less, bomb-sniffing dogs could be trained. A dog walking up and down a row of luggage can work much faster than a machine pumping out puffs of air and moving bags along a conveyor belt. Again, a simple problem made more complex than it needed to be.
This isn’t to say that advances in technology are bad or should be avoided, but rather that one doesn’t need to reinvent the wheel with proprietary publishing formats, company-secret machine technologies, and million-line programs when existing technologies do the same job, if not a better one.
Matthew Bisanz is a graduate student. You may e-mail him at [email protected].