Building on our earlier posts on defenses against web application flaws ["Automating Web Application Security Testing", "Meet ratproxy, our passive web security assessment tool"], we introduce Automatic Context-Aware Escaping (Auto-Escape for short), a functionality we added to two Google-developed general purpose template systems to better protect against Cross-Site Scripting (XSS).
We developed Auto-Escape specifically for general purpose template systems; that is, template systems that are for the most part unaware of the structure and programming language of the content on which they operate. These template systems typically provide minimal support for web applications, possibly limited to basic escaping functions that a developer can invoke to help escape unsafe content being returned in web responses. Our observation has been that web applications of substantial size and complexity using these template systems have an increased risk of introducing XSS flaws. To see why this is the case, consider the simplified template below in which double curly brackets {{
and }}
enclose placeholders (variables) that are replaced with run-time content, presumed unsafe.
<body>
<span style="color:{{USER_COLOR}};">
Hello {{USERNAME}}, view your <a href="{{USER_ACCOUNT_URL}}">Account</a>.
</span>
<script>
var id = {{USER_ID}}; // some code using id, say:
// alert("Your user ID is: " + id);
</script>
</body>
In this template, four variables are used (not in this order):
- USER_NAME is inserted into regular HTML text and hence can be escaped safely by HTML-escape.
- USER_ACCOUNT_URL is inserted into an HTML attribute that expects a URL and therefore in addition to HTML-escape, also requires validation that the URL scheme is safe. By allowing only a safe white-list of schemes, we can prevent (say)
javascript:
pseudo-URLs, which HTML-escape alone does not prevent. - USER_COLOR is inserted into a Cascading Style Sheets (CSS) context and therefore requires an escaping that also prevents scripting and other dangerous constructs in CSS such as those possible in
expression()
orurl()
. For more information on concerns with harmful content in CSS, refer to the CSS section of the Browser Security Handbook. - USER_ID is inserted into a Javascript variable that expects a number as it is not enclosed in quotes. As such, it requires an escaping that coerces it to a number (which a typical Javascript-escape function does not do), otherwise it can lead to arbitrary javascript execution. More variants may be developed to coerce content to other data types, including arrays and objects.
Each of these variable insertions requires a different escaping method or risks introducing XSS. To keep the example small, we excluded several contexts of interest, particularly style tags, HTML attributes that expect Javascript (such as onmouseover
), and considerations of whether attribute values are enclosed within quotes or not (which also affects escaping).
Auto-Escape
The example above demonstrates the importance of understanding the precise context in which variables are being inserted and the need for escaping functions that are both safe and correct for each. For larger and complex web applications, we notice two related vectors for XSS:
- A developer forgetting to apply escaping to a given variable.
- A developer applying the wrong escaping for that variable for the context in which it is being inserted.
Considering the sheer number of templates in large web applications and the number of untrusted content they may operate on, the process of proper escaping becomes complicated and error prone. It is also difficult to efficiently audit from a security testing perspective. We developed Auto-Escape to take that complexity away from the developer and into the template system and therefore reduce the risks of XSS that would have ensued.
A Look at Implementation
Auto-Escape is a functionality designed to make the Template System web application context-aware and therefore able to apply automatically and properly the escaping required. This is achieved in three parts:
- We determined all the different contexts in which untrusted content may be returned and provided proper escaping functions for each. This is part science and part practical. For example, we did not find the need to support variable insertion inside an HTML tag name itself (as opposed to HTML attributes) so we did not build support for it. Other factors come into play, including availability of existing escaping functions and backwards compatibility. As a result, part of that work is template system dependent.
- We developed our own parser to parse HTML and Javascript templates. It provides methods which can be queried at a point of interest to obtain the context information necessary for proper escaping. The parser is designed with performance in mind, and it runs in a stream mode without look-ahead. It aims for simplicity while understanding that browsers may be more lenient than specifications, particularly in certain corner cases.
- We added an extra step into the parsing that the template system already performs to locate variables, among other needs. This extra step activates our HTML/Javascript parser, queries it for the context of each variable then applies its escaping rules to compute the proper escaping functions to use for each variable. Depending on the template system, this step may be performed only the first time a template is used or for each web response in which case some limitations may be lifted.
A simple mechanism is provided for the developer to indicate that some variables are safe and should not be escaped. This is used for variables that are either escaped through other means in source code or contain trusted markup that should be emitted intact.
Current Status
Auto-Escape has been released with the C++ Google Ctemplate for a while now and it continues to develop there. You can read more about it in the Guide to using Auto-Escape. We also implemented Auto-Escape for the ClearSilver template system and expect it to be released in the near future. Lastly, we are in the process of integrating it into other template systems developed at Google for Java and Python and are interested in working with a few other open source template systems that may benefit from this logic. Our HTML/Javascript parser is already available with the Google Ctemplate distribution and is expected to be released as a stand-alone open source project very soon.
Co-developers: Filipe Almeida and Mugdha Bendre