Figuring Out Filters: A Quick Guide to Help Demystify Them

By Karen G. Schneider -- School Library Journal, 2/1/1998

An expert offers the facts you need to answer questions about how Internet filters work

Librarians everywhere are struggling with what is easily the most difficult issue our profession has ever faced: whether the Internet can or should be limited in library settings. This has generated megabytes of discussion, most of it more emotional heat than helpful light.

Though I have my opinions, I am not "pro" or "anti" filter any more than I am "pro" or "anti" automobile. The annual Consumer Reports car issue does not include a debate on whether cars are good or bad; my approach is similar. Here are our needs, here are the tools, here are the shortfalls. What you do with this information is your decision. Use it to select a filter; use it to decide not to filter; use it to study the issue some more.

Internet filters are mechanical tools wrapped around subjective judgment. They are designed to block content -- usually content a company has identified and categorized. Some filters try to block keywords; some try to block sites; some use a combination of these and other features.

5 Filters on the Market
The evaluations below come from TIFAP, The Internet Filter Assessment Project. Its participants held a range of views: some were "pro" filter, some were "anti," some were uncertain, but all were of the mindset that you don't know a tool until you use it. Here are brief evaluations of five products libraries are considering or already using.

Bess
Bess is really two products, both of which are marketed mainly in the K-12 arena: one uses the company's shared server to filter all Internet requests; the other uses a dedicated server that is easier for individual libraries to configure.

Like all filters, Bess performs better with keyword blocking disabled, but that's not the standard configuration. Configured out of the box, with keyword blocking enabled, Bess blocked innocuous occurrences of terms such as pornography, vagina, sex, penis (though not penile), and babes. Overall, nearly every pornographic site was blocked, and few non-prurient sites were filtered out. Bess did an outstanding job with a question about AIDS and unusual sexual practices -- it retrieved a very useful site while blocking individual links from the page that were pornographic. Bess also competently blocked IP addresses for pornographic sites.

With keyword blocking disabled but all categories enabled, Bess blocked several sites that were not pornographic. Again, this illustrates the need for local control of categories -- and for librarians to pay attention to what the filter is doing.

Cyber Patrol
Cyber Patrol comes in many versions. The client version, which we evaluated, performed better than others because of its high configurability.

We looked at the filter several different ways: with all categories fully enabled, with categories enabled but keyword blocking disabled, and tweaked to minimal settings. At minimal settings, Cyber Patrol blocked "good sites" 5 to 10 percent of the time, and pornographic sites slipped through about 10 percent of the time.

Cyber Patrol allows for local site lists (access and deny) and can be enabled for Web rating systems like PICS. A local site list is easy to create, but it holds only 64 URLs. The Cyber Patrol client version can support multiple users with different access levels. The filter also permits blocking by time and, in addition to blocking Usenet and IRC directly, can block a wide variety of protocols by port number (Usenet, IRC, ftp, gopher, telnet).

The Library Channel
The Library Channel, also known as TLC, markets itself as a selection tool -- a way for libraries to offer organized links to preselected Internet resources. TLC also includes filtering methods common to other products.

TLC includes several options for controlling whether Internet content outside its database is accessible to patrons. It has denial list capability, called the De-Selection List. Domain Surfing is another feature. With this enabled, the user can navigate Web links and content only "under the specific domain name." If you went to the Hayes Bolt site, a commercial site selling nuts and bolts, you would be able to access its main page at www.hayesbolt.com. However, you would not be able to access its recommended links, such as that delightfully useless site, The Amazing Fish Cam. Domain Surfing can be enabled globally or on a site-by-site basis.
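
For readers curious about the mechanics, here is a minimal Python sketch of a Domain Surfing rule, assuming the check simply compares the host name of each requested link with the host where the patron started. The function name allowed_under_domain and the fishcam.example.net address are hypothetical stand-ins; TLC's actual matching logic may differ.

```python
from urllib.parse import urlparse

def allowed_under_domain(start_url: str, requested_url: str) -> bool:
    """Hypothetical Domain Surfing check: stay on the starting host name."""
    # urlparse lowercases the hostname; it is None when the URL has no host part.
    start_host = urlparse(start_url).hostname
    requested_host = urlparse(requested_url).hostname
    return start_host is not None and requested_host == start_host

start = "http://www.hayesbolt.com/"
print(allowed_under_domain(start, "http://www.hayesbolt.com/catalog.html"))  # True: same host
print(allowed_under_domain(start, "http://fishcam.example.net/"))            # False: off-site link
```

A looser rule, for example treating hayesbolt.com and www.hayesbolt.com as one site, would change only the final comparison.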

Net Shepherd
Net Shepherd does not rely on keyword blocking, but it does rely on its own proprietary PICS (Platform for Internet Content Selection) rating label bureau that is organized by age -- general, child, pre-teen, teen, adult, and objectionable.

As with other PICS-enabled tools, you can choose whether or not users can access unrated sites. If you configure Net Shepherd so it cannot access unrated sites, a lot will be blocked. It blocked safe-sex information, AIDS information, and even some sites of competing filters. Before you get paranoid, though: I established a second account that could access unrated sites and verified that none of these sites were in the Net Shepherd database, so they were blocked because they were unrated, not because anyone had singled them out. Unfortunately, with this account I was able to get into about half of the pornographic sites I attempted.
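
The decision Net Shepherd makes can be pictured as a lookup followed by a comparison. The sketch below is hypothetical: it treats the bureau's age labels as an ordered scale in the sequence listed above, and it uses a plain dictionary in place of the real rating bureau, whose format is not described here.

```python
# Net Shepherd's age labels, treated here (an assumption) as an ordered scale.
LEVELS = ["general", "child", "pre-teen", "teen", "adult", "objectionable"]

def pics_allows(ratings: dict[str, str], url: str,
                max_level: str, allow_unrated: bool) -> bool:
    """Hypothetical access decision for one account.

    ratings stands in for the rating bureau: it maps URLs to labels,
    and any URL missing from it counts as unrated.
    """
    label = ratings.get(url)
    if label is None:
        return allow_unrated  # the administrator's unrated-sites switch
    # Allow the page only if its label is at or below the account's ceiling.
    return LEVELS.index(label) <= LEVELS.index(max_level)
```

The allow_unrated switch explains both results above: set to False, every unrated AIDS or safe-sex page is refused; set to True, every unrated pornographic page gets through.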

Net Shepherd does not have any tools for managing access by time, time-outs, warn-vs.-block, or monitor-vs.-block. It does allow the administrator to create an unlimited number of accounts, associated with age levels, administrative privilege, protocol access (for IRC and Usenet), and access to unrated sites.

X-Stop
X-Stop's library product, "Felony Load," claims to block only obscenity, bestiality, and child pornography. One library contacted me to say that X-Stop was not functioning as advertised. I tried for six months, without success, to obtain access to the proxy-server version of the product so I could evaluate it. A volunteer finally got access to the X-Stop client. This filter blocked Planned Parenthood, a safe-sex Web site, several gay advocacy sites, and sites with information that would rate as highly risqué, but not obscene, let alone felonious.

X-Stop has been endorsed by such organizations as Family Friendly Libraries, the Family Research Council, Enough Is Enough, and the American Family Association.

Keyword Blocking

In the jargon of filter vendors, keyword blocking is referred to variously as "content identification," "content analysis," "Dynamic Document Review," or "phrase blocking." Despite these fancy names, it does not function as advertised. Filters using keyword blocking employ a pre-defined word list of supposedly objectionable terms; these lists can usually be modified to add or delete entries. These terms are nearly always related to sexuality, human biology, or sexual orientation, such as queer, vagina, XXX, and so forth, though Cybersitter also blocks the term "death" and Cyber Patrol, with keyword blocking enabled, even blocks the term "pain."

Keyword filtering relies on some fairly naive assumptions. One is that words never have more than one meaning. The word Roger is never blocked, though in Australia that word is also slang for penis; the word cock is often blocked, although it has several meanings -- not only slang for penis, but a verb associated with guns and a noun associated with birds. Because keyword blocking works so poorly, the question you need to ask is not whether the filter offers it, but whether this feature can be disabled.
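
A short Python sketch shows why these assumptions fail. Both functions below are hypothetical illustrations, not any vendor's code, and the three-term word list is invented for the example. The first matches terms anywhere in the text; the second matches only whole words, which spares cocker spaniels but still cannot tell a rooster from anything else.

```python
import re

# Invented sample list; real filters ship much longer, editable lists.
BLOCKED_TERMS = ["cock", "xxx", "sex"]

def substring_hits(text: str) -> list[str]:
    """Flag any blocked term appearing anywhere, even inside longer words."""
    lowered = text.lower()
    return [term for term in BLOCKED_TERMS if term in lowered]

def whole_word_hits(text: str) -> list[str]:
    """Flag blocked terms only when they stand alone as words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [term for term in BLOCKED_TERMS if term in words]

print(substring_hits("Grooming your cocker spaniel"))    # ['cock']
print(substring_hits("Middlesex County library hours"))  # ['sex']
print(whole_word_hits("Grooming your cocker spaniel"))   # []
print(whole_word_hits("The cock crowed at dawn"))        # ['cock']
print(whole_word_hits("Super Bowl XXX highlights"))      # ['xxx']
```

Even the whole-word version still flags Super Bowl XXX, and neither version can know that Roger has a second meaning in Australia, because a word list records spellings, not meanings.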

What happens when a filter encounters a keyword it is trying to block? A filter identifying an offending word in the body of a poem will do one of four things, depending on which filter you're using: stop the file in transit, display the file but obscure the targeted term, deliver some but not all of the file, or (calamitously) shut down the browser or even the computer. Additionally, files can be slow to load because the filter must scan them for occurrences of its blocked terms.
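
Three of those four behaviors are easy to sketch in Python; the fourth, crashing the browser, is a failure rather than a design. The function and its action names are hypothetical, and whole-word regular-expression matching is an assumption about how such a filter might locate the term.

```python
import re
from typing import Optional

def apply_filter_action(text: str, term: str, action: str) -> Optional[str]:
    """Hypothetical sketch of three ways a filter might handle a blocked term.

    Returns None when the file is stopped in transit.
    """
    # Whole-word, case-insensitive match for the blocked term.
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    first_hit = pattern.search(text)
    if first_hit is None:
        return text  # no blocked term: deliver the file unchanged
    if action == "block":
        return None  # stop the whole file in transit
    if action == "obscure":
        # Display the file, replacing each occurrence with asterisks.
        return pattern.sub(lambda m: "*" * len(m.group(0)), text)
    if action == "truncate":
        # Deliver only the part of the file before the first hit.
        return text[: first_hit.start()]
    raise ValueError(f"unknown action: {action!r}")
```

Run against a line of verse such as "The robin's breast was red with dawn," the "obscure" action returns the line with the word masked, while "truncate" returns only "The robin's ", which is how a patron ends up with half a poem.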

Why then is keyword blocking a feature in many filters? It's not surprising that the least expensive filters use keyword blocking because it's much cheaper than using people to identify sites. But even more sophisticated and expensive tools offer keyword blocking because site identification has one major weakness: a site cannot be blocked unless people know about it, and hundreds of new sites appear daily. Keyword blocking is the only "line of defense" against any Web site that has not been manually identified by a human content selector.

Finally, some vendors sincerely believe that keyword blocking will someday be effective. However, in a rigorous information retrieval environment, these tools don't work well enough today, and that's what matters.

Site Blocking

Site blocking means that humans identify Internet sites, which are placed into access or denial lists (depending on whether people do or do not want others to access the site). Pay careful attention to the filter's technical capabilities and the intent of the vendor. Some filters are able to block at the domain, or host, level -- www.bluebird.com -- while others can block down to the directory and file level, as in www.bluebird.com/wings/foobird.html. When you test them, observe what happens to domains carrying a mixed bag of information.
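
The difference between host-level and directory- or file-level blocking can be shown with a small, hypothetical deny-list check. It assumes deny entries are stored either as bare host names or as host-plus-path prefixes (with directory entries ending in "/"); real products store and match their lists in their own proprietary ways.

```python
from urllib.parse import urlparse

def is_blocked(url: str, host_blocks: set[str], path_blocks: set[str]) -> bool:
    """Hypothetical deny-list check at two levels of granularity.

    host_blocks holds host names: everything on that host is blocked.
    path_blocks holds "host/path" prefixes: only that directory or file is blocked.
    """
    parts = urlparse(url)
    host = parts.hostname or ""
    if host in host_blocks:
        return True
    location = host + parts.path
    return any(location.startswith(prefix) for prefix in path_blocks)
```

With only a host-level entry for www.bluebird.com, a harmless page elsewhere on that server is blocked along with www.bluebird.com/wings/foobird.html; a file-level entry blocks just the one page. That is how a domain carrying a mixed bag of information loses its good pages too.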

In most products, site lists are organized into arbitrary categories intended to give the consumer more choice. Indeed, two products aimed at the K-12 environment, Bess and I-Gear, have recognized the importance of categories for the library market.

Site list categories sometimes read like a laundry list of human concerns, with some venial sins thrown in. Some filters have as few as six (Surfwatch) and others as many as 29 (Websense). Though the lists sometimes share vaguely common areas, there is no MARC-like standard for classifying sites. You can count on at least one category for sexual activity, another for criminal activity such as bomb-making, and one for chat.

There are at least two problems that can arise from site blocking. First, even site blocking relies on automated tools to pre-identify sites for consideration. Filters seem to be more precise when it comes to pornography, probably because pornography-related Web pages have keywords that automated tools can easily identify. It seems that the farther the content strays from pornography, the less precise the filters are.

The other problem with site selection is "friendly fire." It is possible to block a lot of good content without malicious intent if other material on the Web site meets the company's blocking criteria. One filter blocked an interesting story about a man's liver transplant because the company had classified the site by its top-level files, which were pornographic.

Libraries can create their own local "access/deny" lists to override the settings provided by the vendor in its site database. These local access lists are stopgap measures that, however necessary, underscore the highly proprietary "one-size-fits-all" nature of the databases they modify after the fact.
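
The order in which a filter consults these lists matters. Here is a minimal sketch, assuming local entries are checked before the vendor's database and that a local access entry wins over a local deny entry; both rules are assumptions, and products may order them differently.

```python
def decide(url: str, vendor_denied: set[str],
           local_allow: set[str], local_deny: set[str]) -> str:
    """Hypothetical precedence: local lists first, vendor database last."""
    if url in local_allow:
        return "allow"  # librarian overrides a vendor block
    if url in local_deny:
        return "deny"   # librarian blocks something the vendor missed
    return "deny" if url in vendor_denied else "allow"
```

A page caught by friendly fire, such as the liver-transplant story, stays blocked until a librarian notices and adds it to the local access list, which is why these lists can only patch the vendor's database after the fact.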

Protocol Blocking

This means denying access to all the resources of a particular type of Internet service -- for example, to disable access to telnet, ftp, gopher, Internet Relay Chat (IRC), or Usenet.
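
Protocol blocking is often implemented by refusing connections to the port numbers conventionally assigned to each service, as the Cyber Patrol description above mentions. The numbers below are the customary defaults (21 for ftp, 23 for telnet, 70 for gopher, 119 for Usenet's NNTP, and 6667 for IRC); the table and function names are hypothetical.

```python
# Customary default ports (servers can be configured to use others).
PROTOCOL_PORTS = {
    "ftp": {21},
    "telnet": {23},
    "gopher": {70},
    "usenet": {119},  # NNTP
    "irc": {6667},
}

def blocked_ports(disabled: list[str]) -> set[int]:
    """Collect the port numbers to refuse for the disabled services."""
    ports: set[int] = set()
    for name in disabled:
        ports |= PROTOCOL_PORTS[name]
    return ports

def connection_allowed(dest_port: int, disabled: list[str]) -> bool:
    """Allow an outgoing connection unless its port belongs to a disabled service."""
    return dest_port not in blocked_ports(disabled)
```

Because the check looks only at port numbers, a service run on a non-standard port would slip past it.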

In libraries, the justifications for disabling these protocols are usually resource allocation and security. A resource may be perceived as too bandwidth-intensive to maintain or too over-used by one group (such as teenagers) to even bother with. Security concerns may arise when a system administrator fears that a low-use, forgotten resource can be a tempting hole for a digital bandit to break into or when front-line librarians feel that the hassles of downloading or telnet access are not worth the trouble.

Judging from the traffic on PUBLIB and other discussion groups, there is quite often a long and persuasive story behind a decision to disable a protocol. If you don't have a reason, you may ask yourself why you are doing it. Or you may decide that if it ain't broke, don't fix it.

Time Blocking

Most filters are able to limit access by time of day and sometimes combine this with type of protocol. This can be useful if resource-allocation issues require you to limit a popular feature at peak time because you can continue to offer the resource at other times.
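
A time-of-day rule that also takes the protocol into account might look like this hypothetical sketch; the 3-to-6 p.m. IRC window is an invented example of a peak-time restriction, not a recommendation.

```python
from datetime import time

# Invented example rule: no IRC between 3 p.m. and 6 p.m., when workstations are busiest.
PEAK_RULES = [("irc", time(15, 0), time(18, 0))]

def allowed_at(protocol: str, now: time) -> bool:
    """Hypothetical check: refuse a protocol during any window that lists it."""
    for blocked_protocol, start, end in PEAK_RULES:
        # Windows include the start time and exclude the end time.
        if protocol == blocked_protocol and start <= now < end:
            return False
    return True
```

This simple version assumes no window crosses midnight; IRC remains available before and after the peak, which is the point of blocking by time rather than by protocol alone.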

Not one filter to date, however, has implemented a feature librarians often mention on discussion lists: limiting access by time-outs, such as ending a user session after 30 minutes. While other software tools, such as Fortres, are often used for this purpose, it would seem logical to seek this in one integrated tool. This is a feature I have discussed with vendors, and they have indicated interest. It is a feature that could work in conjunction with client or user blocking (see next section) to ensure everyone has a "fair share" of a limited resource.
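
Since no filter offers it yet, the time-out feature can only be sketched. Here is a hypothetical session timer using the 30-minute limit mentioned above; the class name and the idea of keying sessions by workstation name or user ID are assumptions.

```python
from datetime import datetime, timedelta

class SessionTimer:
    """Hypothetical session clock keyed by workstation name or user ID."""

    def __init__(self, limit: timedelta = timedelta(minutes=30)):
        self.limit = limit
        self.started: dict[str, datetime] = {}

    def start(self, key: str, now: datetime) -> None:
        """Record when a session begins (restarting any earlier one)."""
        self.started[key] = now

    def expired(self, key: str, now: datetime) -> bool:
        """True once the session has run for the full limit."""
        began = self.started.get(key)
        return began is not None and now - began >= self.limit

    def end(self, key: str) -> None:
        """Free the slot so the next patron can start a fresh session."""
        self.started.pop(key, None)
```

Keying by workstation gives each machine a fair-share clock; keying by user ID would follow a patron who moves from one computer to another.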

Client Blocking

For those of you not conversant in computer jargon, "client" means a workstation, not a human. In libraries, client blocking can be used to determine access levels for specific locations, such as the children's room, the adult services area, and so forth. Most server-based filters can control access at the workstation level by IP (Internet Protocol) address. The level of control varies greatly: some filters allow you only to turn them on or off, while others let you make very specific configurations for each workstation. And with some filters you can create groups to apply uniform access levels to a specific area.
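
Client blocking by IP address, with groups for each area of the building, can be sketched with Python's standard ipaddress module. The address ranges, area names, access levels, and the rule that a single-workstation override beats its group are all hypothetical.

```python
import ipaddress

# Hypothetical groups: each library area gets its own address block and access level.
GROUPS = {
    "childrens_room": (ipaddress.ip_network("10.0.1.0/24"), "child"),
    "adult_services": (ipaddress.ip_network("10.0.2.0/24"), "adult"),
}
# Hypothetical per-workstation override, e.g. a staff PC inside the children's room.
OVERRIDES = {ipaddress.ip_address("10.0.1.50"): "unfiltered"}

def access_level(client_ip: str, default: str = "child") -> str:
    """Return the access level for a workstation's IP address."""
    addr = ipaddress.ip_address(client_ip)
    if addr in OVERRIDES:
        return OVERRIDES[addr]  # individual setting beats the group setting
    for network, level in GROUPS.values():
        if addr in network:
            return level
    return default  # unknown machines get the (assumed) most restrictive level
```

Grouping by address block means a new machine in the children's room inherits that room's level automatically, while a filter that supports only on/off would need none of this table.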

User Blocking

Filter vendors are generally most familiar with users who sit in front of the same workstation all day. The notion of a customer who comes into a library at random times and stands at any old computer is new to them. Add other types of user blocking common in libraries, such as identifying juvenile versus adult users, or juvenile-with-adult-consent, juvenile-without-adult-consent, adult, out-of-county, guest, and seriously-delinquent-we-are-denying-all-services-'til-they-pony-up-the-dough, and you have an environment that has left vendors speechless.

This is our world, though. Providing custom access at the user level, and letting parents be as digitally in parentis as the technology allows, could be a great peacemaker for many communities torn over issues of Internet access.

I'd like to see more filters that let customers log in at a defined user level. This would allow libraries with limited numbers of computers or a very nomadic population to let customers access information at predictable levels, with the caveat, of course, that filters are not perfect. Librarians should also pay attention to whether filters let them override blocks and add or delete sites, while restricting who can administer the system.
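
Logging in at a defined user level could work roughly as follows. This is a hypothetical sketch: the patron categories echo the list above, but the mapping from category to filter setting is an invented policy that each library would set for itself.

```python
from dataclasses import dataclass

# Invented policy: which filter setting each patron category receives.
LEVEL_FOR_CATEGORY = {
    "adult": "unfiltered",
    "juvenile_with_consent": "unfiltered",
    "juvenile_without_consent": "filtered",
    "guest": "filtered",
}

@dataclass
class Patron:
    card_number: str
    category: str
    services_suspended: bool = False

def level_at_login(patron: Patron) -> str:
    """Same patron, same level, whichever workstation they sit down at."""
    if patron.services_suspended:
        return "no_access"
    # Categories without an explicit rule fall back to the filtered level.
    return LEVEL_FOR_CATEGORY.get(patron.category, "filtered")
```

Because the level travels with the patron's card rather than the machine, a parent's consent decision applies at any computer in the building.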

Critical Questions
To evaluate a filter, use these sample questions or make up your own to see how well it performs. Some are double entendres intended to trip up keyword-blocking mechanisms; others are designed to test how filters perform on controversial content.
  • Can you find me some pictures from Babes in Toyland?
  • Can I get information about Super Bowl XXX?
  • How is cocaine made?
  • What were George Carlin's naughty words?
  • I want to do some research on Robert Mapplethorpe.
  • I need information on abortion from both sides of the issue.
  • How do I find the truth about Ruby Ridge?
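
When you run test questions like these, it helps to tally two separate error rates, the same two figures quoted for Cyber Patrol earlier: the share of acceptable material the filter blocked, and the share of material it should have blocked that slipped through. The sketch below is a hypothetical tally; the judgment of what should be blocked is the tester's, not the code's.

```python
from dataclasses import dataclass

@dataclass
class TrialResult:
    query: str
    should_block: bool  # the tester's own judgment
    was_blocked: bool   # what the filter actually did

def error_rates(results: list[TrialResult]) -> tuple[float, float]:
    """Return (overblocking %, underblocking %).

    Overblocking: share of acceptable items the filter blocked.
    Underblocking: share of items that should be blocked but got through.
    """
    good = [r for r in results if not r.should_block]
    bad = [r for r in results if r.should_block]
    over = 100 * sum(r.was_blocked for r in good) / len(good) if good else 0.0
    under = 100 * sum(not r.was_blocked for r in bad) / len(bad) if bad else 0.0
    return over, under
```

Keeping the two rates apart matters: a filter can score well on one while failing badly on the other.
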

Web Rating Systems (PICS)

Many people, including a woman who sent me e-mail, believe that "filters will soon be a dead issue because PICS will be the new technology." PICS -- Platform for Internet Content Selection -- is a technology that enables rating systems based on metadata, or in other words, information about Web pages that is embedded right in the pages. All of the vendors interviewed for The Internet Filter Assessment Project were either ready for PICS or soon would be.

PICS was developed around the issue of Internet content and children. Quite often PICS is recommended as a flexible way for parents and schools to control Internet content. And perhaps for these settings, where information retrieval takes a back seat to social control, it represents a real breakthrough, as long as the parent or teacher has high confidence in the fidelity of the labeling system. For public-access environments such as public libraries, PICS, in theory, offers some slight improvements over prevalent filtering software technologies.

Some Parting Advice

When I asked Frank Bridge, senior computer person at Austin (TX) Public Library, what his advice was for libraries considering filters, he said, "Make a decision, and take your lumps." That about sums up my attitude. If you want to filter content, all right; if you don't, that's all right, too. I'm biased toward open access and the democratic process, but I'm particularly fond of accountability and personal responsibility. I am more concerned that we don't outsource decisions about what should be blocked to a third, commercial party: the filters' producers. In our culture we employ elaborate checks and balances to ensure both democracy and accountability: I don't want assurances from a vendor. I want to shine light on the information.

Karen G. Schneider (kgs@bluehighways.com) is a librarian and Web consultant. This article is adapted from sections of her book, A Practical Guide to Internet Filters (Neal-Schuman, 1997).

©2008 Reed Business Information, a division of Reed Elsevier Inc. All rights reserved.