Monday, July 16, 2007

Automating web application security testing

Cross-site scripting (aka XSS) is the term used to describe a class of security vulnerabilities in web applications. An attacker can inject malicious scripts to perform unauthorized actions in the context of the victim's web session. Any web application that serves documents that include data from untrusted sources could be vulnerable to XSS if the untrusted data is not appropriately sanitized. A web application that is vulnerable to XSS can be exploited in two major ways:

    Stored XSS - Commonly exploited in a web application where one user enters information that's viewed by another user. An attacker can inject malicious scripts that are executed in the context of the victim's session. The exploit is triggered when a victim visits the website at some point in the future, such as through improperly sanitized blog comments and guestbook entries, which facilitates stored XSS.

    Reflected XSS - An application that echoes improperly sanitized user input received as query parameters is vulnerable to reflected XSS. With a vulnerable application, an attacker can craft a malicious URL and send it to the victim via email or any other mode of communication. When the victim visits the tampered link, the page is loaded along with the injected script that is executed in the context of the victim's session.

The general principle behind preventing XSS is the proper sanitization (via, for instance, escaping or filtering) of all untrusted data that is output by a web application. If untrusted data is output within an HTML document, the appropriate sanitization depends on the specific context in which the data is inserted into the HTML document. The context could be in the regular HTML body, tag attributes, URL attributes, URL query string attributes, style attributes, inside JavaScript, HTTP response headers, etc.

The following are some (by no means complete) examples of XSS vulnerabilities. Let's assume there is a web application that accepts user input as the 'q' parameter. Untrusted data coming from the attacker is marked in red.

  • Injection in regular HTML body - angled brackets not filtered or escaped

    <b>Your query '<script>evil_script()</script>' returned xxx results</b>

  • Injection inside tag attributes - double quote not filtered or escaped

    <form ...
      <input name="q" value="blah"><script>evil_script()</script>">

  • Injection inside URL attributes - non-http(s) URL

    <img src="javascript:evil_script()">...</img>

  • In JavaScript context - single quote not filtered or escaped

      var msg = 'blah'; evil_script(); //';
      // do something with msg variable

In the cases where XSS arises from meta characters being inserted from untrusted sources into an HTML document, the issue can be avoided either by filtering/disallowing the meta characters, or by escaping them appropriately for the given HTML context. For example, the HTML meta characters <, >, &, " and ' must be replaced with their corresponding HTML entity references &lt;, &gt;, &amp;, &quot; and &#39 respectively. In a JavaScript-literal context, inserting a backslash in front of \, ', " and converting the carriage returns, line-feeds and tabs into \r, \n and \t respectively should avoid untrusted meta characters being interpreted as code.

How about an automated tool for finding XSS problems in web applications? Our security team has been developing a black box fuzzing tool called Lemon (deriving from the commonly-recognized name for a defective product). Fuzz testing (also referred to as fault-injection testing) is an automated testing approach based on supplying inputs that are designed to trigger and expose flaws in the application. Our vulnerability testing tool enumerates a web application's URLs and corresponding input parameters. It then iteratively supplies fault strings designed to expose XSS and other vulnerabilities to each input, and analyzes the resulting responses for evidence of such vulnerabilities. Although it started out as an experimental tool, it has proved to be quite effective in finding XSS problems. Besides XSS, it finds other security problems such as response splitting attacks, cookie poisoning problems, stacktrace leaks, encoding issues and charset bugs. Since the tool is homegrown it is easy to integrate into our automated test environment and to extend based on specific needs. We are constantly in the process of adding new attack vectors to improve the tool against known security problems.

I wanted to respond to a few questions that seem to be common among readers. I've listed them below. Thanks for the feedback. Please keep the questions and comments coming.

Q. Does Google plan to market it at some point?
A. Lemon is highly customized for Google apps and we have no plans of releasing it in near future.

Q. Did Google's security team check out any commercially available fuzzers? Is the ability to keep improving the fuzzer the main draw of a homegrown tool?
A. We did evaluate commercially available fuzzers but felt that our specialized needs could be served best by developing our own tools.

Monday, July 9, 2007

The reason behind the "We're sorry..." message

Some of you might have seen this message while searching on Google, and wondered what the reason behind it might be. Instead of search results, Google displays the "We're sorry" message when we detect anomalous queries from your network. As a regular user, it is possible to answer a CAPTCHA - a reverse Turing test meant to establish that we are talking to a human user - and to continue searching. However, automated processes such as worms would have a much harder time solving the CAPTCHA. Several things can trigger the sorry message. Often it's due to infected computers or DSL routers that proxy search traffic through your network - this may be at home or even at a workplace where one or more computers might be infected. Overly aggressive SEO ranking tools may trigger this message, too. In other cases, we have seen self-propagating worms that use Google search to identify vulnerable web servers on the Internet and then exploit them. The exploited systems in turn then search Google for more vulnerable web servers and so on.  This can lead to a noticeable increase in search queries and sorry is one of our mechanisms to deal with this.

At ACM WORM 2006, we published a paper on Search Worms [PDF] that takes a much closer look at this phenomenon. Santy, one of the search worms we analyzed, looks for remote-execution vulnerabilities in the popular phpBB2 web application. In addition to exhibiting worm like propagation patterns, Santy also installs a botnet client as a payload that connects the compromised web server to an IRC channel. Adversaries can then remotely control the compromised web servers and use them for DDoS attacks, spam or phishing. Over time, the adversaries have realized that even though a botnet consisting of web servers provides a lot of aggregate bandwidth, they can increase leverage by changing the content on the compromised web servers to infect visitors and in turn join the computers of compromised visitors into much larger botnets. This fundamental change from remote attack to client based download of malware formed the basis of the research presented in our first post. In retrospect, it is interesting to see how two seemingly unrelated problems are tightly connected.