Internet firewalls are a hot commodity, and an increasing number of products are coming to market. In the past there were only about five firewall vendors; as of this writing there are about thirty-five, and more appear every day. The response from the firewall customer community has been, predictably, confusion. Customers who want to purchase a firewall but are not familiar with the ins and outs of the technology are justifiably worried that they may be buying something that is not right for them - so they are asking for a way to make their decision easier. This paper contains the author's opinions, as a long-time firewall designer, on the topic of firewall certification and testing.
The firewall market right now is highly competitive. Vendors and their sales teams are constantly asked to differentiate their products from those of the competition, and the pressure is intense. The end result is that products which are very similar are touted as very different, and some types of technology are cast as inadequate while others are positioned as superior. It's difficult to sort the hype from the substance.
Can any single authority sort out the hype and define what is a "good" firewall? That's what certification implies, or, more precisely, that's what firewall buyers will assume certification implies. The real problem is that a "good" firewall for any given purpose depends a lot on the customer's needs, security requirements, and budget. For an authority to specify evaluation criteria for good firewalls, we'd all have to be using them for the same purpose, and they'd all have to have roughly the same properties.
In the arena of government secure computing, the Orange Book is a guideline for the desirable properties of computer systems for handling classified or secret material. The Orange Book model is based on a 1960s-1970s mind-set wherein many users share a mainframe, attached to it via hard-wired terminals (some of our younger readers may have never worked on such things, but trust me, they did exist once). The security goal of these systems was to keep one user from reading another user's data on the same machine.
The Orange Book provides a very rigid requirements document, specifying the mandatory features systems had to have at a given "level" of security. What's interesting about the Orange Book approach is that it implicitly contains a notion of what the "best possible" security is (A1) and it completely ignores what a user might want to do with the system. It is for this reason that the Orange Book both succeeds and fails. The Orange Book evaluation approach can work because it has a limited set of objectives (a very comprehensive and restrictive laundry list) as well as a captive audience (DOD users and contractors). The question of whether or not the systems are usable, useful, and deployable has been loudly answered by the market and the non-captive user community.
Not all firewalls are the same; many have very different design goals and objectives. Some, such as router-based firewalls, are designed for cost, speed, and flexibility. Properly configured commercial routers, for some applications, provide excellent security. They might not be adequate for other customer applications, such as ones requiring advanced audit trails, traffic screening, or a more restrictive access policy. Indeed, for some applications, a correctly configured Windows-based SMTP gateway (e.g., a Cc:Mail SMTP server) might provide all the firewall functionality a site needs, as well as adequate security.
The problem, then, is that if an Orange Book for firewalls is developed, it will either be so general as to be nearly useless, or so specific as to be a marketing club that vendors seek to manipulate and hold over each other's heads. The prospect of vendors lobbying a hypothetical certification authority to require functionality that gives them a market edge is not only disturbing, it's also highly likely. The current standards arena ("arena" is the right word to use here!) is a highly charged battlefield in which vendors actively lobby standards makers to manipulate present and future standards to their advantage. It would be a shame to see such a thing happen with firewalls, but standardization of what is "good" in firewalls will make this inevitable.
It is my belief that if there is an attempt to establish a certification authority for firewalls, we will wind up with several. The vendors who are out-lobbied on the first certification authority's laundry list will band together and create their own. Ad nauseam. So we'll either see one certification authority whose laundry list is so vague as to be almost useless, or we'll see competing certification authorities, none of which have any integrity except to their constituent vendors.
Implicit in the issue of "certification" is the matter of testing, and that's a really tough nut to crack. Before you can certify a firewall, you need to be able to measure it against some kind of yardstick and determine if it is adequate. Even the concept of adequacy is slippery to come to grips with. A firewall may be adequate from a security perspective but unable to do the job because of some special requirement, cost, or whatever.
More importantly, a firewall needs to be correct for its proposed use, and that needs to be taken into account when it is "certified." In the past I've given the example of a highly secure, high-assurance firewall for Email only, which can easily be implemented using a screening router and a UNIX machine. In some people's eyes that might not even be a "firewall" - a rigid certification code likely would not accept such a firewall as OK.
I believe there are two approaches, which are not necessarily mutually exclusive:
"Checklist" testing would amount to running SATAN++ against the firewall and failing it if SATAN++ found a hole. Do not pass go, do not collect $200. The problem with this approach is that it is very limited: a bug that we don't test for in SATAN++ could slice right through the firewall tomorrow, and we'd have to invalidate the whole certification and recertify. The advantage of the "checklist" approach is that it's cheap, quick, and easy; it lets a vendor put a certification "seal of approval" on their product, and everyone gets a quick set of warm fuzzies and can tell their boss they have exercised due diligence.
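The mechanics of checklist testing can be sketched in a few lines. Everything here (the port numbers, the check names, the "SATAN++" framing) is hypothetical, but the shape is the point: the scanner can only report on what is already on its list.

```python
import socket

def checklist_scan(host, checklist, timeout=2.0):
    """Report each checklist service that accepts a TCP connection.
    A hole on a port or protocol that is NOT on the list is invisible
    to this kind of test, no matter how serious it is."""
    findings = []
    for port, name in checklist:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                findings.append(name)
        except OSError:
            pass  # refused, filtered, or timed out: nothing to report
    return findings

# A hypothetical "SATAN++"-style checklist of (port, check name) pairs.
CHECKLIST = [(25, "SMTP reachable"), (79, "finger reachable"),
             (111, "portmapper reachable")]
```

An empty result earns the "seal of approval" - and says nothing at all about the services the checklist has never heard of.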
Design-oriented testing is when you walk into the room where the engineers who wrote the firewall sit and start with the question: "Why do you think this firewall protects networks and itself effectively?" and go from there. Depending on the answers they give you, you then formulate a set of tests that propose to verify the properties they claim the firewall has. So, if I tell you my firewall works by testing the psychic intent in each packet, a test would be derived whereby we would send malicious packets at the firewall and see if they were blocked. Then we'd send the same packets without thinking nasty thoughts while we did it, and see if they went through. In other words, the test is a custom-tailored approach that matches the design of the system as we understand it. The problem with design-oriented testing is that it's hard. It takes skills that are not presently common - I know only five people whom I would trust to do a good job of it. It's expensive, slow, and hard to explain. To even explain or understand a serious red team design review requires a pretty high level of expertise.
I've heard scary stories of people doing "firewall testing" of UNIX-based firewalls who do not understand UNIX. So, for example, they will tell you the firewall is insecure if the sendmail executable has not been deleted. So their checklist is maybe a little bit off. :) I've heard other scary stories about people engaging an auditor for a firewall and having a CNE appear. It's a networking problem, so who is better qualified than a Certified Network Engineer, right? If someone hired me to do a design-oriented test of a VMS firewall, that'd be pretty ridiculous, too - I'm a UNIX guru and am completely unqualified to find a hole in a VMS product.
The market is ripe right now for someone to come along and start certifying firewalls. NSA will probably do it for their customer base, which is government only. As such they will slant their "what is good" requirements to meet their political/technological agenda: NSA-approved crypto only, and Fortezza. The question is: if someone starts certifying firewalls, will the certification have any intellectual integrity? Will the certificate just be a sticker that the vendor paid for, or will it mean something?
I recently, rather derisively, dismissed an RFI from a large consulting company that wants to hire "firewall test consultants" and which asked for a detailed writeup of the methodology used. (My response was a description of design-oriented testing.) From the layout of the RFI it was pretty clear that they were building a laundry list and canvassing other consultants to help fill it out. Being certified as "secure" on those terms should not make anyone sleep better at night. Big laundry lists are better than small laundry lists, but consider the set of facts that SATAN 1.0 tested for: at least four new holes have been discovered since its release. If SATAN 1.0 were your sole firewall test "methodology," you could be wide open to attack, right now.
The testers and certification authorities should be the top experts in the field for the particular type of firewall you are talking about. That means that if it's a VMS based firewall, it'd better be a VMS guru, not smb, ches, or mjr. If it's a router, then it should be someone who really knows routers. The degree to which you can trust a certification depends a lot on the credentials of the certifiers, how much they have at stake in the process, and how well their test methodology has survived peer review.
Don't expect to see vendors or testers assuming liability for security problems with firewalls. Firewalls are, by their nature, easy to configure and reconfigure. They are also easy to bypass. For someone to assume liability for a firewall's installation, they'd have to have some kind of guarantee that the customer would not somehow alter, weaken, or bypass it. Another problem with assigning liability is that the provenance of the attack would have to be clearly identified. In many break-ins, it's almost impossible to tell how the attacker initially got into the network - no firewall vendor or testing authority is going to be very happy about having to defend their firewall against hordes of lawyers when a customer's network gets broken into via a modem pool. But if there's no liability, what's the point of a certificate? The (intended to be humorous) "Marcus J. Ranum Certified firewall" logo at the top of this document is worth no more than the paper it's printed on; and it's not even printed on paper. Print it out and tape it to your firewall, then send me a check for $150: maybe you'll sleep better at night with it there. If you don't want to send $150, send $10, but it'll only be 1/15th as secure.
First, let's pretend we're going to attack the firewall host. So we grab our copy of SATAN++, give it the IP address of the firewall, and tell it to attack. In some cases, the results we get back will indicate something valuable. But, let's pretend that the firewall we're testing is a firewall such as a SunScreen or a bridging firewall, and it has no IP address! Suddenly, one of our most important testing tools and methods has become meaningless. Does that mean the firewall is completely secure against attack? In some cases, it may, but in other cases it may be that the firewall is accessible via some other control means that our copy of SATAN++ does not know about.
Another scary scenario in the SATAN++ testing approach is a firewall that is actually not a firewall at all, but which simply passes SATAN++'s tests by luck or design. For example, one thing SATAN checks for is the version of Sendmail a system is running, if it is running Sendmail. Let's suppose that our fictional firewall is running Sendmail V3.0 (an ancient version) with no bug fixes applied. However, when the firewall's mailer was configured, it was set not to display the version in the SMTP welcome banner. So SATAN++ decides it's a new version of the mailer and does not complain. I can imagine developing a "firewall" that is a complete leaking sieve, but which passes SATAN with no warning. Presently, however, one of the questions most often asked of firewall vendors is: "Does it pass SATAN?" There's a lot of ignorance out there for us to overcome.
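The banner-matching failure is easy to demonstrate. The signature list and banners below are made up, but they mirror how version-grepping checks actually fail: hide the version string, and an ancient, unpatched mailer sails through.

```python
# Hypothetical signature list of known-vulnerable Sendmail versions.
KNOWN_BAD = ("Sendmail 3.", "Sendmail 4.", "Sendmail 5.")

def flag_smtp_banner(banner):
    """Return the known-bad signatures matched by an SMTP greeting banner."""
    return [sig for sig in KNOWN_BAD if sig in banner]

honest   = "220 fw.example.com ESMTP Sendmail 3.0 ready"
stripped = "220 fw.example.com ESMTP ready"   # same old binary, version hidden

flag_smtp_banner(honest)    # flagged as vulnerable
flag_smtp_banner(stripped)  # no match: the leaking sieve "passes"
```

The checker never talked to the mailer's bugs, only to its greeting; changing the greeting changes the verdict without changing the risk.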
Let's pretend that the next phase of our checklist testing procedure is that we run SATAN++ against all the systems behind the firewall. The rationale for this is that if the firewall is a screening router-type firewall, we'll get an idea about what it screens or permits, and if the firewall is a dual-homed-block-everything-type firewall, we'll verify that it does, indeed, block everything. The problems with this thinking are serious. If the firewall is a screening router-type firewall, we may simply wind up measuring what version of Sendmail is running on the test-bed network, and not the customer network. Also, we may not probe a system that happens to be down right now, which is otherwise wide open to attack.
With a screening router-type firewall, the deployed configuration is critical to its security. Suppose I submit a firewall for testing that consists of a screening router with a default set of rules that block all traffic except outgoing WWW? It's going to register on the tests as impregnable, but just about any customer who installs it is going to have to weaken its default rules considerably in order to get any use out of it. Is the test I performed still valid? What about a firewall that ships with several "standard operating policies," of which the default (how the firewall is tested) is very restrictive and conservative, and the remainder of which are weak?
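To make the point concrete, here is a toy model of such a default policy. The rule format is invented, but the behavior is what matters: as shipped, everything inbound registers as blocked, and the first thing a real deployment does is append permits.

```python
# Invented rule format: first matching rule wins; None acts as a wildcard.
DEFAULT_RULES = [
    {"action": "permit", "direction": "out", "proto": "tcp", "dport": 80},   # outgoing WWW
    {"action": "deny",   "direction": None,  "proto": None,  "dport": None}, # everything else
]

def filter_packet(rules, direction, proto, dport):
    """Evaluate a packet against an ordered ruleset, first match wins."""
    for r in rules:
        if r["direction"] in (direction, None) and \
           r["proto"] in (proto, None) and \
           r["dport"] in (dport, None):
            return r["action"]
    return "deny"  # implicit deny if nothing matched
```

Tested as shipped, this "firewall" looks impregnable; tested as deployed, with a dozen customer-added permits ahead of the final deny, it may be anything at all.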
In the firewall product summaries effort, I have tried to encourage the community to recognize that firewalls embody two protective relationships: the degree to which the firewall protects itself from attack, and the degree to which it protects the networks behind it.
Testing a firewall needs to take this into consideration, as well! To meaningfully test the firewall, we need to consider the strengths and weaknesses of the systems behind it. Doing this in a lab is impossible, since very few real world sites will be exactly like the test lab.
Last but not least, many firewalls embody their own protocols, which the firewall may or may not rely on for its security. An example would be the authentication server (authsrv) from the firewall toolkit. Suppose there is a hole in the protocol or its implementation? (There are none that I know of; this is just an example.) An automated testing suite will not uncover that fact. One firewall vendor that shall remain nameless used to sell a firewall that they remotely managed for customers who needed direct support. When remote management was necessary, the vendor telnetted into the firewall, over the Internet, using a clear-text root password. What is the point of testing a firewall for elaborate holes when there is a gaping weakness in its management procedures? From a testing standpoint, such weaknesses will not be visible. They only become visible if a detailed and very intensive penetration test is attempted.
The preceding examples paint a deliberately bleak picture. But it's important to look at the ways in which a test may accidentally become meaningless, in order to decide whether it's worth doing and how much faith to place in it. Automated testing in a lab is not going to provide adequate coverage, and firewalls that are highly adaptable and user-customizable will be practically impossible to test, except in a very general way.
The top-down, design-oriented testing approach is simple: you start looking at the firewall from a very high level, see if it makes sense at that level, then look at it at increasingly lower levels until you've gotten to a point where you're pretty much trusting your components or you've pushed things as far as it's sensible to go. Determining how far is sensible is a difficult judgment, and it depends a lot on the application to which the firewall is being put. This is where I agree 100% with the Orange Book guys: if the firewall is protecting the launch console for an H-bomb, then it is entirely appropriate to review not only the high-level design but all the source code - indeed, the source code for the system library routines used, the compiler they were compiled with, the kernel itself, the processor it runs on, etc. For "normal" folks there has to be some happy medium, and it's very dependent on what you have at stake. To determine the happy medium, keep things in perspective.
Many of the sites I have visited have firewalls where the firewall is the only thing on their network that is remotely close to secured. They have an Internet link on the other side of the firewall, and 5 T1 lines to business partners, with little or no protection on those links, and the business partners in turn have links to their competitors, the Internet, and whatever else your worst nightmares can imagine. Often a large, unspecified percentage of the users have modems on their desktops and run PPP stacks to whomever they feel like. In such an environment, the firewall is the last thing that is likely to be broken into. It's still worth checking that the firewall works, is nailed down tightly, and appears to have security properties that meet the requirements and are implemented correctly; but it's probably not worth worrying about a Ken Thompson trojan in the compiler the firewall was built with.
First off, you need to know what you're testing, how it works, and why it works the way it does. To find this out, you need to interview the designers, or have access to very high-quality design documentation about the product. After all, if the high-level design makes no sense, you may not need to test any further. Understanding the high-level design will permit you to guess where there may be weaknesses in the implementation, and what parts of the implementation are particularly critical to the functioning of the firewall. If the designers and design documentation do not appear to recognize these potential areas of weakness and to have countered them, that may be a fruitful area for penetration testing.
If the high-level design is comprehensive and comprehensible, it should be easy enough to determine what the basic assumptions of the firewall are. From these basic assumptions, derive some simple tests. Some of the tests may be the kind that are incorporated in tools like SATAN: used as directed testing engines, such tools are very effective. For example, if I am dealing with a firewall that blocks all traffic between two networks, I should be able to run a complete, maximum-level SATAN scan against the network behind the firewall, and it had better come up 100% dry.
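A design assumption like "blocks all traffic" translates directly into a pass/fail check. The sketch below assumes the sweep itself has already been run (by SATAN or anything else) and merely evaluates the block-everything claim against its results; the data shapes and addresses are made up.

```python
def verify_block_everything(scan_results):
    """scan_results: mapping of host -> list of ports found open from outside.
    The block-everything design claim holds only if every host came up
    completely dry; any single open port anywhere refutes it."""
    leaks = {host: ports for host, ports in scan_results.items() if ports}
    return (not leaks, leaks)

# A dry scan confirms the claim; one open port refutes it.
verify_block_everything({"10.1.1.2": [], "10.1.1.3": []})    # (True, {})
verify_block_everything({"10.1.1.2": [], "10.1.1.3": [25]})  # (False, {'10.1.1.3': [25]})
```

The point is that the pass condition comes from the stated design, not from a generic checklist: a screening router that intentionally permits SMTP would need a different predicate.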
Continue to postulate problems, based on what the designers of the firewall identify its properties as being. Another example would be a firewall that has no IP address. If the firewall has no IP address, then it should not answer if I ARP it, and it should not respond if I ping the network broadcast address - in other words, it should act like it has no IP address.
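Probing "does it ARP?" means putting a raw who-has frame on the wire, which requires a packet socket and root privileges. The sketch below only builds the frame (field layout per RFC 826), using made-up MAC and IP addresses, to show what such a probe consists of; actually transmitting it and listening for a reply is platform-specific and omitted.

```python
import socket
import struct

def arp_request(src_mac, src_ip, target_ip):
    """Build a broadcast Ethernet ARP who-has frame asking who owns target_ip.
    A genuinely IP-less firewall should never answer a frame like this."""
    eth_hdr = b"\xff" * 6 + src_mac + struct.pack("!H", 0x0806)  # dst=broadcast, ethertype=ARP
    arp_hdr = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)          # ethernet/IPv4, opcode 1 = request
    sender  = src_mac + socket.inet_aton(src_ip)
    target  = b"\x00" * 6 + socket.inet_aton(target_ip)          # target MAC unknown
    return eth_hdr + arp_hdr + sender + target

frame = arp_request(b"\x02\x00\x00\x00\x00\x01", "10.0.0.99", "10.0.0.1")
# 42 bytes: 14-byte Ethernet header + 28-byte ARP payload
```

Silence in response to this probe, and to a broadcast ping, is the behavior the designers claimed; any answer is a test failure worth chasing.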
Lastly, determine the protective relationships that the firewall implements: if the firewall passes some of the responsibility for security to hosts behind it, determine what those responsibilities are and how they are documented. For example, a screening router-type firewall may permit telnet through from the outside to specific hosts on the inside. If that is the case, those hosts may be vulnerable to holes in the vendor-supplied telnet daemon. The documentation should reflect this.
Top-down design-oriented firewall testing is, perforce, a rambling process. Essentially, you must play the role of a detective, re-creating the design path the engineers followed, and turning up any clues they missed.
The Orange Book approach was a commercial failure because of time-to-market and expense. Rigorous firewall testing suffers from the same kinds of problems if carried too far. It is too expertise-intensive as well: there are not enough real firewall experts in the industry to bring together in one place to test firewalls, and most of them are either competitors or so insanely busy that getting a slice of their time is nearly impossible.
Somewhere between the effort, expense, and time of Orange Book-style evaluation and the "apparently OK firewall" sticker there is a happy medium. To be useful, tests have to take into account the design principles of what is being tested; they cannot be mindlessly automated attack scripts. To be timely, repeatable, and cost-effective, tests cannot require expensive talent, elaborate design reviews, and the development of customized attack tools. I believe that what is needed is a process of peer review for testing methodologies, and perhaps some give-and-take within the vendor community. Vendors need to stop selling firewalls based on smoke and mirrors. Every time a sales rep tells a potential customer, "Don't buy an XYZ, I hear they got broken into" (yes, they do that), firewall technology is called into question as a whole; it is penny-wise and pound-foolish to call the basis of an entire technology into question in order to make a sale, but that is what happens. Firewall customers need to ask not only whether a firewall provides the functionality they want, but how the vendor tested it, and if there was some other procedure involved in the testing, what it was. Customers need to grow a bit more cynical about testing and test strategies. The time is ripe, right now. It is ripe for hucksters who want to cash in by "testing firewalls" they don't understand, and it is ripe for good-faith efforts to establish peer review processes for firewall design and marketing. Let's be on the alert for both.