Tuesday, November 24, 2009
More Information about Malware Details
A month ago we announced the release of a new Webmaster Tools feature that helps webmasters identify malicious content that has been surreptitiously added to their sites. We've been working on improving the quality of the feature since it launched, and yesterday we released some changes that should make the information even more useful. Most of the changes have occurred behind the scenes, but the end result is that we can provide more data, with higher accuracy, and do so more quickly. If your site is receiving a malware warning for Google search results, please visit Webmaster Tools for more details about the problematic code that our automated systems have discovered.
We will continue to improve the feature over time and welcome feedback via comments on this blog post. If you are a webmaster of a compromised site and use the feature to help clean your site, please include feedback in the comment field of the appeal request.
Thursday, October 29, 2009
Do machines dream of electric malware?
We've explored Google's anti-malware processes several times recently, as well as our efforts to work with webmasters to help protect their users. However, there's been some confusion about the objectivity of our scanning and flagging procedures.
Google uses fully automated systems to scan the Internet for potentially dangerous sites. These systems help detect sites infected with malware and then add a warning that appears in Google search results and in many web browsers. We flag sites in this way to help protect users who might visit them. The warning is a cautionary page, and we never prevent users from viewing the affected site if they choose. It's important to note that sites are often compromised without the webmaster's knowledge, so we provide affected webmasters with further information on the issues we've identified — including showing snippets of the malicious code we find. We also offer free resources in Google Webmaster Tools to help site owners clean their sites and request a re-scan.
Site owners sometimes say that we've made a mistake and that their site does not contain malware. For example, the recent appearance of a malware warning on people.com.cn sparked discussion about how Google flags websites. Our scanners — which are automated and indifferent to a site's subject matter — first found a malicious ad on the book.people.com.cn domain at approximately 3:47 a.m. PT on October 17, 2009. Over several days, the scanners detected thousands of URLs with suspicious content in other people.com.cn domains.
Malicious content can be very difficult to detect. A previous post on this blog offered tips for finding hidden malware and cleaning up websites. There are also good tips on Google's Webmaster Central Blog. If a webmaster has indeed removed the malicious content and filed a malware review request in Webmaster Tools, the warning label will be removed shortly. If it persists, however, it's very likely that dangerous content remains. Our scanners are highly accurate, and false positives are extremely rare.
When Google's automated systems detect dangerous content on a site, an email is sent to several administrative email addresses at the site, as well as to the corresponding Webmaster Tools account if one exists. We sent a notification to people.com.cn at 11:01 a.m. PT on October 17, just as we would for any compromised site. The email includes an explanation of how the site may have become compromised and unknowingly been distributing malware. It also describes the process of removing malware from the site and getting the Google warning removed. A copy of the message sent to the addresses associated with infected sites is below:
We recently discovered that some of your pages can cause users to be infected with malicious software. We have begun showing a warning page to users who visit these pages by clicking a search result on Google.com.
...
We strongly encourage you to investigate this immediately to protect your visitors. Although some sites intentionally distribute malicious software, in many cases the webmaster is unaware because:
1) the site was compromised
2) the site doesn't monitor for malicious user-contributed content
3) the site displays content from an ad network that has a malicious advertiser
If your site was compromised, it's important to not only remove the malicious (and usually hidden) content from your pages, but to also identify and fix the vulnerability. We suggest contacting your hosting provider if you are unsure of how to proceed. StopBadware also has a resource page for securing compromised sites: http://www.stopbadware.org/home/security Once you've secured your site, you can request that the warning be removed by visiting http://www.google.com/support/webmasters/bin/answer.py?answer=45432 and requesting a review. If your site is no longer harmful to users, we will remove the warning.
As the email says, the fastest way for a site to be removed from the malware list is for the webmaster to file a review request via Google Webmaster Tools. Google's automated scanners will periodically re-examine the site even if no such request is received, but the process will take longer. People.com.cn did not file a review request, but our scanners reviewed the site on October 23 and removed the malware warning after finding that the malicious ad was gone.
Malicious display ads are an increasingly common way for sites to unknowingly distribute malware. We recently wrote about the steps that Google takes to help protect our advertising networks. Also, other publishers have recently written about their experiences with deceptive display ads.
Thursday, October 22, 2009
Best Practices for Verifying and Cleaning up a Compromised Site
As part of Cyber Security Awareness Month, Google's Anti-Malware Team is publishing a series of educational blog posts inspired by questions we've received from users. October is a great time to brush up on cyber security tips and ensure you're taking the necessary steps to protect your computer, website, and personal information. For general cyber security tips, check out our online security educational series or visit http://www.staysafeonline.org/. To learn more about malware detection and site cleanup, visit the Webmaster Tools Help Center and Forum.
In our last post in this series, we explained Google's malware scanning process and how malware warning reviews work. It's not always clear to webmasters how to go about cleaning up their sites once they've been compromised, so this time we thought we'd share some best practices.
1) Verify Your Site with Google Webmaster Tools
If you have added and verified your site's ownership with Google Webmaster Tools, you can view a partial list of URLs where our system has detected suspicious content on your site, as well as samples of the malicious code. Once you've thoroughly cleaned up your site and addressed the vulnerability that allowed it to be compromised, it's easy to request a review through Webmaster Tools. We recognize that some site owners may want to use these tools even if they haven't already signed up with Webmaster Tools. For that reason, we enable you to verify ownership of your sites at any time, even if our systems have listed them as potentially dangerous.
2) If Your Site Has Been Compromised, Perform a Comprehensive Cleanup
If any part of your site has been compromised, thoroughly check all pages on the site for harmful code or content — not just the example pages listed in Webmaster Tools. Be sure to identify and address the underlying vulnerability that led to the compromise, or else reinfection is likely to occur.
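As a rough illustration of checking every page rather than only the listed examples, the sketch below walks a local copy of a site and flags files that match a few simple heuristics. The patterns, function names, and file suffixes are assumptions chosen for illustration; real injections vary widely, so a clean result here does not mean the site is clean.

```python
import re
from pathlib import Path

# Illustrative heuristics only (assumed for this sketch); attackers use many other forms.
SUSPICIOUS_PATTERNS = {
    "hidden iframe": re.compile(
        r"<iframe[^>]*(?:width|height)\s*=\s*[\"']?0", re.IGNORECASE),
    "eval of unescaped string": re.compile(
        r"eval\s*\(\s*unescape\s*\(", re.IGNORECASE),
    "document.write of script tag": re.compile(
        r"document\.write\s*\(\s*[\"']<script", re.IGNORECASE),
}


def scan_text(text):
    """Return the sorted names of every heuristic that matches the text."""
    return sorted(
        name for name, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(text)
    )


def scan_site(root, suffixes=(".html", ".htm", ".php", ".js")):
    """Walk a local copy of the site and map each flagged file to its findings."""
    findings = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            hits = scan_text(path.read_text(errors="replace"))
            if hits:
                findings[str(path)] = hits
    return findings
```

Running `scan_site` against a fresh download of the whole document root, not just the pages listed in Webmaster Tools, gives a starting list of files to inspect by hand.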
Remember to Check Your Web Server Configuration
In addition to checking the contents of your site's pages and web server source code, remember to check that your web server configuration has not been modified by any intruders. If your web server has been compromised, your site's error pages can be modified to include custom HTML that actually redirects visitors to malicious sites.
Deleted & Error Pages: Dark Corners of Your Website Where Malware May Be Lurking
When a page is deleted from a site, the web server returns an error code (usually 404: Not Found) when requests to the "deleted" URLs are made. In addition to the error code in the HTTP header, the web server may send a custom error page or "Not Found" page, usually intended to help users find what they are looking for. If your site is infected, its error page can contain arbitrary HTML that exposes your visitors to malware. You can search our Webmaster Forum for information about how others are dealing with similar problems. The recently-launched malware samples feature in Google Webmaster Tools could also come in handy.
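One way to inspect an error page is to request a URL that should not exist and list every external host its markup loads scripts or frames from. The sketch below covers only the parsing step; the class and function names are hypothetical, and the set of watched tags is an assumption rather than a complete list of what an attacker might inject.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalSourceCollector(HTMLParser):
    """Collect hosts referenced by the src attribute of script, iframe, and embed tags."""

    WATCHED_TAGS = {"script", "iframe", "embed"}

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host.lower()
        self.external_hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag not in self.WATCHED_TAGS:
            return
        src = dict(attrs).get("src")
        if not src:
            return
        # Relative URLs have no hostname and therefore count as local.
        host = (urlparse(src).hostname or "").lower()
        if host and host != self.own_host:
            self.external_hosts.add(host)


def external_hosts_in_error_page(html_text, own_host):
    """Return the sorted external hosts that the given error-page HTML pulls content from."""
    collector = ExternalSourceCollector(own_host)
    collector.feed(html_text)
    collector.close()
    return sorted(collector.external_hosts)
```

To get the HTML, fetch a deliberately nonexistent path on your own site. With Python's `urllib.request.urlopen`, a 404 response raises `urllib.error.HTTPError`, and calling `read()` on that exception returns the body of the error page. Any host in the result that you do not recognize deserves a closer look.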
3) If You Switch Hosting Providers, Disable Access to the Old Version of Your Site
When a site is moved to a different hosting provider, the DNS records are updated such that the domain name points to a new IP address. In some cases, DNS caching can cause your domain name to continue resolving to the old IP address for some visitors even after the site has moved. For this reason, we recommend instructing your former hosting provider to stop serving any content for your site. This may cause some visitors to experience server errors for a few hours, but can protect them from visiting a potentially dangerous web server.
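A quick way to see whether a particular resolver still hands out the old address is to compare what it returns with the addresses your former provider used. This is a minimal sketch with hypothetical function names. It reflects only the resolver on the machine where it runs, so other visitors may still see cached records even when this check passes.

```python
import socket


def resolved_addresses(hostname):
    """Return the set of IP addresses the local resolver currently gives for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return {info[4][0] for info in infos}


def still_points_to_old_host(current_addresses, old_addresses):
    """True if any currently resolved address belongs to the former hosting provider."""
    return bool(set(current_addresses) & set(old_addresses))
```

For example, `still_points_to_old_host(resolved_addresses("www.example.com"), {"192.0.2.10"})` would report whether the local resolver still returns the (placeholder) old address `192.0.2.10`.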
As always, our Webmaster Forum and StopBadware's BadwareBusters can be good sources of help and information when cleaning up a compromised site.
Friday, October 16, 2009
Protecting Users and Ads from Malware
As part of Cyber Security Awareness Month, we're highlighting cyber security tips and features to help ensure you're taking the necessary steps to protect your computer, website, and personal information. For general cyber security tips, check out our online security educational series or visit http://www.staysafeonline.org/.
At Google, we always aim to provide users with useful, relevant information. Readers of this blog know that we also work hard to detect malicious content on the web and protect users from harm. But did you know that we strive for the same level of relevance, and work equally hard to protect users, in our online advertising business?
The mainstream media has recently picked up on the topic of malvertising (malware-infected advertising). Google's Anti-Malvertising Team works hard in this area and would like to take this time to share some important safety tips. We work closely with the Anti-Malware Team to identify trends and improve automated detection systems. We also educate users, develop policies and act as a liaison between the online security and online advertising communities.
Whether you're a web publisher who accepts ads on your website, or a home user who enjoys browsing the wide variety of advertising-supported content available on the web, we expect the resources below will help protect you from malvertising.
What is "Malvertising?"
"Malvertising" = malware + advertising. Haven't heard of it? The terminology may be new, but we can all understand the concept. Although malware distributors have attempted to spread malware through online ads for years, ever-improving prevention and detection methods have made it unlikely for most Internet users to have encountered a "bad ad" firsthand. However, it's important to make sure that you (and your computer) are properly prepared in case you encounter any source of malware on the web — whether it is an infected ad, a hacked site, a dangerous link, or someone who is pretending to be someone they're not.
Anti-Malvertising.com
We created Anti-Malvertising.com earlier this year as a resource for all members of the online ecosystem. Anti-Malvertising.com contains tips designed for publishers, ad operations teams, and Internet users to help protect their websites, networks, and computers.
Tips for Web Publishers: Know Who You're Working With, Perform Comprehensive QA, & Have a Plan in Place
Anti-Malvertising.com includes a custom search engine to help individual ad networks, publishers, and ad operations teams conduct quick background checks on prospective advertisers. It indexes a variety of independent, third party sites that track possible attempts to distribute malware through advertising. It is intended to be used as one of the steps in a publisher's background check process.
In some recent cases, infected ads that had already been caught and publicized by security researchers have remained active within some advertising systems. Anti-Malvertising.com's malvertising research engine makes it easier for the online advertising and security communities to share information and collaborate to help protect users from emerging threats.
For more detailed guidance on the following tips, visit http://www.anti-malvertising.com/tips-for-publishers
- Pay close attention to all agencies and advertisers with whom you work.
- Perform due diligence by thoroughly checking prospective partners' references and credentials.
- Perform comprehensive QA on all ad creatives.
- Protect your own computer and website from infection.
- Be aware that various ad networks and exchanges may have significantly different standards for the prevention and detection of malware. No automatic detection system, however robust, can substitute for your own vigilance. However, we strongly advise against exposing your site to harm by using networks or exchanges without strong anti-malware security measures in place.
- Ensure your Ad Operations team has an incident response plan in place (for guidance, visit http://www.anti-malvertising.com/tips-for-ad-operations).
Tips for Internet Users
- Make sure your browser, operating system, software and plugins are all updated regularly (enable auto-updates when possible).
- Be aware that malware can be disguised as antivirus/antispyware software in order to trick people into buying or downloading it. Fake (and harmful) software of this kind is known in the web security community as "rogue security software." How to avoid getting tricked? Always research a company's reputation before downloading its software or visiting its website, and be wary of unexpected warnings from products you haven't installed yourself. You can view a list of some legitimate free security scans at http://www.staysafeonline.org/content/free-security-check-ups.
- Exercise caution whenever you're prompted to download an email attachment, follow an instant message link, install a plug-in, or download an unfamiliar piece of software.
In addition to providing visibility to advertisers, revenue to publishers, and information to users, the online advertising business model also enables anyone with an Internet connection to access an entire world of content for free. By increasing our vigilance as a community, we can help to keep online ads safe and preserve the wide access to information that advertising enables.
Monday, October 12, 2009
Show Me the Malware!
As part of Cyber Security Awareness Month, we're highlighting cyber security tips and features to help ensure you're taking the necessary steps to protect your computer, website, and personal information. For general cyber security tips, check out our online security educational series or visit http://www.staysafeonline.org/. To learn more about malware detection and site cleanup, visit the Webmaster Tools Help Center and Forum.
To help protect users against malware threats, Google has built automated scanners that detect malware on websites we've indexed. Pages that are identified as dangerous by these scanners are accompanied by warnings in Google search results, and browsers such as Google Chrome, Firefox, and Safari also use our data to show similar warnings to people attempting to visit suspicious sites.
While it is important to protect users, we also know that most of these sites are not intentionally distributing malware. We understand the frustration of webmasters whose sites have been compromised without their knowledge and who discover that their site has been flagged. We proactively offer help to these webmasters: we send email to site administrators when we encounter suspicious content, we provide a list of infected pages in Webmaster Tools, and we maintain a service that allows webmasters to notify us when they have cleaned their sites. Read more about this process in the previous post on this blog.
We're happy to announce that we've launched a feature that enables Google to provide even more detailed help to webmasters. Webmaster Tools now provides webmasters with samples of the malicious code that Google's automated scanners detected on their sites. These samples — which typically take the form of injected HTML tags, JavaScript, or embedded Flash files — are available in the "Malware details" Labs feature in Webmaster Tools. (UPDATE: The 'Malware details' feature graduated from Labs and is now part of the default Webmaster Tools interface. You can access it in the regular menu under 'Diagnostics'). Registered webmasters (registration is free) of infected sites do not need to specially enable the feature — they will find links to it on the Webmaster Tools dashboard. Webmasters will see a list of their pages that we found to be involved in malware distribution and samples of the malicious content that Google's scanners encountered on each infected page. In certain situations we can identify the underlying cause of the malicious code, and we'll provide these details when possible. We hope that the additional information will assist webmasters and help prevent their visitors from being exposed to malware.

Malware details for your site

Malware details for a particular page
While we're excited to offer this feature, we caution webmasters to use the tool only as a starting point in their site clean-up process. Google's scanners may not be able to provide malware samples in all cases, and the malware samples may not be a complete list of all the malware on the page. More importantly, we advise against simply removing the examples that are displayed in Webmaster Tools. If the underlying vulnerability is not identified and patched, it is likely that the site will be compromised again.
In addition to helping the webmasters of sites with malware warnings, this new detail is also designed to promote the general health of the web. In some cases, our automatic scanners find questionable content on a site but do not have enough data to add it to the malware list. The new "Malware details" feature will highlight these instances to webmasters early on to help them identify and address security vulnerabilities more quickly.
We hope you never have cause to use this feature, but if you do, it should help you quickly purge malware from your site and help protect its visitors. We plan to improve our algorithms in the upcoming months to provide even greater coverage, more accurate vulnerability identification, and faster delivery to webmasters.
Friday, October 9, 2009
The Malware Warning Review Process
As part of Cyber Security Awareness Month, Google's Anti-Malware Team is publishing a series of educational blog posts inspired by questions we've received from users. October is a great time to brush up on cyber security tips and ensure you're taking the necessary steps to protect your computer, website, and personal information. For general cyber security tips, check out our online security educational series or visit http://www.staysafeonline.org/. To learn more about malware detection and site cleanup, visit the Webmaster Tools Help Center and Forum.
Google's anti-malware efforts are designed to be helpful to both webmasters and website visitors. Google continuously scans our web index for pages that could be dangerous to site visitors. When we find such pages, we flag them as harmful in our search results, and also provide this data to several browsers so that users of these browsers will receive warnings directly. We undertake this process as part of our security philosophy: we believe that if we all work together to identify threats and stamp them out, we can make the web a safer place for everyone. While we believe these processes are important steps in helping to protect our users, we also understand the frustration felt by the webmasters of flagged sites. This is why we notify webmasters as soon as we discover that their sites have been compromised. Additionally, we provide webmasters with a tool to file a review once they have cleaned their site. The review process works as follows.
Part 1: The webmaster's job: The first step is site cleanup. The webmaster should remove all harmful content from the site. We realize that it can be tricky to find all the infections on a website, and webmasters should look thoroughly if the warning label persists. Keep in mind that if your site contains elements from another website that may have been compromised, it will remain flagged. This is because your site could still introduce harm to visitors. To prevent reinfection, the webmaster should also identify and fix the underlying software vulnerability that led to site compromise in the first place. For a guide on how to do this, visit stopbadware.org/home/security.
Once a webmaster has cleaned up the site, a Malware Review can be filed with Google's Webmaster Tools (please note that a Malware Review request is not the same as an Index Reinclusion request). The process for Malware Review is as follows:
- Log in to Webmaster Tools.
- From the Webmaster Tools home page, click on the link to the site that is being flagged. This will bring you to the site's Dashboard.
- There should be a large red banner across the top of the dashboard that says "This site may be distributing malware." Clicking on the link that says "More Details" expands the dashboard to reveal a list of pages on the site that were found to be malicious.
- Below this list is a link that says "Request a review." A webmaster can fill out this form and click the "Request a review" button to initiate the review process.
Part 2: Our job: Upon receiving a Malware Review request, an automated set of algorithms verifies that the site has been cleaned. These algorithms revisit a subset of both the malicious and non-malicious pages that were scanned when the site was originally flagged. Additionally, these algorithms test some pages that were not originally scanned. If none of the tested pages are found to be malicious, the site is deemed to be safe, and warnings are removed from search results. A typical appeal takes only several hours to complete, although in some cases the process may take up to one day.
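The sampling logic described above can be pictured with a short sketch. This is purely an illustration of the idea, not Google's implementation: the function names, the fixed per-group sample size, and the `is_malicious` callback are all assumptions.

```python
import random


def select_pages_for_review(flagged, clean, unscanned, sample_size, rng=None):
    """Pick a subset of previously flagged, previously clean, and never-scanned pages."""
    if rng is None:
        rng = random.Random()

    def sample(pages):
        pages = list(pages)
        return rng.sample(pages, min(sample_size, len(pages)))

    return sample(flagged) + sample(clean) + sample(unscanned)


def site_passes_review(pages, is_malicious):
    """The site is deemed safe only if none of the tested pages is found malicious."""
    return not any(is_malicious(page) for page in pages)
```

Because pages that were never scanned are part of the sample, cleaning only the URLs that were originally reported is not enough to pass a check of this shape.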
In addition to processing appeal requests from webmasters, we also rescan compromised sites periodically.
We encourage webmasters of infected sites to quickly clean their web pages and proactively request reviews through Webmaster Tools. After the site has been thoroughly cleaned and reviewed, it will no longer show a warning on Google's search results pages or through the browsers making use of our data.
Tuesday, August 25, 2009
Malware Statistics Update
Every now and then people ask us for an update on the malware statistics we published in the All Your iFrames Point To Us blog post. We're glad to share this sort of data because we believe that collaboration and information sharing are crucial in driving anti-malware efforts forward. Here is a small update containing some interesting trends we've observed over the last 12 months.
Number of Entries on the Google Safe Browsing Malware List

As we mentioned in our Top-10 Malware Sites blog post, we have seen a large increase in the number of compromised sites since April. The number of entries on our malware list has more than doubled in one year, and we have seen periods in which 40,000 web sites were compromised per week. However, compared to infections associated with Gumblar and Martuz, two relatively large and well-known pieces of malicious code, many compromised web sites now point to hundreds of different domains. As these malware trends evolve, we're constantly improving our systems to better detect compromised web sites. The increase in compromised sites we observed may have also been influenced by our improved detection capabilities.
Search Results Containing a URL Labeled as Harmful

The above graph shows the percentage of daily queries that contain at least one search result that we labeled as harmful. In January 2008, more than 1.2% of all Google search queries contained at least one such result (you can review a graph of this data in the aforementioned All Your iFrames Point To Us post). Since then, there has been a downward trend to well below 1%. We noticed an increase around May 2009, and that growth may be due to the appearance of a larger number of compromised web sites. That said, it's encouraging that compared to last year, fewer search queries contain results to potentially harmful sites.
Users of Google search, Google Chrome, Mozilla Firefox and Apple Safari receive warnings when visiting sites we identify as potentially harmful. These warnings are produced by our Safe Browsing API, a technology that is freely available for webmasters to implement.
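For readers who want to check URLs programmatically, the sketch below builds a request body in the shape used by the later Safe Browsing Lookup API (v4), which accepts a JSON POST to `threatMatches:find` with an API key. Using v4 is an assumption of this example: it postdates this post, and the interface available in 2009 was different. The `client_id` and `client_version` defaults are placeholders.

```python
import json

# Endpoint of the v4 Lookup API; the caller appends ?key=YOUR_API_KEY when sending.
LOOKUP_ENDPOINT = "https://safebrowsing.googleapis.com/v4/threatMatches:find"


def build_lookup_request(urls, client_id="example-client", client_version="0.1"):
    """Build the JSON body asking whether any of the given URLs is on a threat list."""
    return {
        "client": {"clientId": client_id, "clientVersion": client_version},
        "threatInfo": {
            "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING"],
            "platformTypes": ["ANY_PLATFORM"],
            "threatEntryTypes": ["URL"],
            "threatEntries": [{"url": url} for url in urls],
        },
    }


def serialize_lookup_request(urls):
    """Return the request body as UTF-8 bytes, ready to send with an HTTP POST."""
    return json.dumps(build_lookup_request(urls)).encode("utf-8")
```

An empty JSON object in the response means none of the submitted URLs matched; otherwise the response lists the matching entries. Check the current API documentation before relying on these field names.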
Friday, August 14, 2009
Ask the Google Anti-Malware Team
The Google Anti-Malware engineering team knows you have many questions related to our scanning and flagging of infected sites, some with short and simple answers and some with more complex answers. The short-answer questions are already -- we hope -- adequately handled on the Webmaster Forums; now we want to do a better job at answering the more complex questions.
To this end, we have created a Google Moderator page for you to submit your questions, and to vote on other webmasters' questions. In two weeks (on Friday the 28th of August), we will close the page and select a few of the top-rated questions. Over the course of the next several weeks, we will do our best to answer each of these in a write-up, to be published here and to the Webmaster Malware Forum.
We hope to repeat this exercise (with a fresh Moderator page) in the fall to give you the opportunity to ask more questions.
Thank you, and see you on the Moderator page!
Wednesday, July 22, 2009
Improving web browser security
Malware is the source of a large number of reported security incidents on the Internet. Since Internet users can become infected in many different ways, the proliferation of malware is a very hard problem to solve. One part of the solution is to improve the robustness of web browsers such that security compromises due to browser bugs are minimized. We work hard to scrutinize our own code for potential vulnerabilities. We also contribute to research in this area with projects like the Browser Security Handbook and open source releases of the fuzzers involved in our software testing.
Some of you may have noticed that while working on Google Chrome, we have also discovered and responsibly reported a number of security issues in other browsers. Various scenarios lead us to report these bugs:
- Some browsers share code bases with Google Chrome, and we collaborate with those browser vendors.
- We develop generic fuzzers that are applicable to most browsers and that we want to share with others.
- We spend time analyzing behavior in different browsers, and we sometimes discover bugs in the process.
- It benefits our users and the Internet as a whole if we work collaboratively on better web browser security.
The collaboration works both ways. We'd like to thank the following browser vendors:
Microsoft for helping with SSL interactions with HTTP proxies, Mozilla for sharing fuzzers, and Apple for sharing and coordinating WebKit-based bugs.
Together as a security community, our combined efforts to find vulnerabilities in browsers, practice responsible disclosure, and get problems fixed before criminals exploit them help make the Internet an overall safer place for everyone. We'd also like to thank all those who have helped us by contributing to Google Chrome.
Wednesday, July 15, 2009
Password strength and account recovery options
There's been some discussion today about the security of online accounts, so we wanted to share our perspective. These are topics that we take very seriously because we know how important they are to our users. We run our own business on Google Apps, and we're highly invested in providing a high level of security in our products. While we can't discuss individual user or customer cases, we thought we'd try to clear up any confusion by taking some time to explain how account recovery works with various types of Google accounts and by revisiting some tips on how users can help keep their account data secure.
One of the more common requests for assistance that we receive from regular Gmail users is to help them regain access to their accounts after they have misplaced or forgotten their password. We know that it can be frustrating when you can't access your account, and we've worked hard to come up with a system designed to help our users regain access to their accounts as smoothly as possible while taking appropriate precautions to protect their account security. When you select a password as you create an account, we recommend that you also choose a security question and provide a secondary email address. Recently, we also added a field where you can input a mobile phone number to assist with later account recovery. We regularly provide tips about how you can choose good passwords and security questions, and we also share our best ideas for what to do when you can't access your account. It's important to keep your password, security question, and secondary email address up to date. It's not enough to just tell us your email address to try to change your password. The security question helps us identify you, but if you want to initiate a password reset, we'll only send that information to the secondary address or the mobile phone number you provide.
We handle password recovery differently for our Google Apps customers. There is no password recovery process for individual Google Apps users. Instead, users must communicate directly with their domain administrator to initiate password changes on their individual accounts. Earlier this year we added new password security tools for Google Apps that allow administrators to set password length requirements and view password strength indicators to identify sufficiently long passwords that may still not be strong enough. For businesses that desire additional authentication security, since 2006 we have supported SAML Single Sign-On, a protocol that allows organizations to use two-factor authentication solutions such as certificates, smartcards, biometrics, one-time password devices, and other stronger tokens.
If you're a regular Gmail user and you haven't updated your account information in a while, we recommend you do so by visiting your Google Account settings page now.
Tuesday, June 16, 2009
HTTPS security for web applications
A group of privacy and security experts sent a letter today urging Google to strengthen its leadership role in web application security, and we wanted to offer some of our thoughts on the subject.
Let's take a closer look at how this works in the case of Gmail. We know that tens of millions of Gmail users rely on it to manage their lives every day, and we have offered HTTPS access as an option in Gmail from the day we launched. If you choose to use HTTPS in Gmail, our systems are designed to maintain it throughout the email session — not just at login — so everything you do can be passed through a more secure connection. Last summer we made it even easier by letting Gmail users opt in to always use HTTPS every time they log in (no need to type or bookmark the "https").
Update @ 1:00pm: We've had some more time to go through the report. There's a factual inaccuracy we wanted to point out: a cookie from Docs or Calendar doesn't give access to a Gmail session. The master authentication cookie is always sent over HTTPS — whether or not the user specified HTTPS-only for their Gmail account. But we can all agree on the benefits of HTTPS, and we're glad that the report recognizes our leadership role in this area. As the report itself points out, "Users of Microsoft Hotmail, Yahoo Mail, Facebook and MySpace are also vulnerable to [data theft and account hijacking]. Worst of all — these firms do not offer their customers any form of protection. Google at least offers its tech savvy customers a strong degree of protection from snooping attacks." We take security very seriously, and we're proud of our record of providing security for free web apps.
Update on June 26th: We've sent a response to the signatories of the letter. You can read it here.
Wednesday, June 3, 2009
Top 10 Malware Sites
A recent surge in compromised web servers has generated many interesting discussions in online forums and blogs.  We thought we would join the conversation by sharing what we found to be the most popular malware sites in the last two months.
As we've discussed previously, we constantly scan our index for potentially dangerous sites. Our automated systems found more than 4,000 different sites that appeared to be set up for distributing malware by massively compromising popular web sites. Of these domains, more than 1,400 were hosted in the .cn TLD. Several contained plays on the name of Google, such as goooogleadsence.biz.
The graph shows the top-10 malware sites as counted by the number of compromised web sites that referenced each one. All domains on the top-10 list are suspected to have compromised more than 10,000 web sites on the Internet. The graph also contains arrows indicating when these domains were first listed via the Safe Browsing API and flagged in our search results as potentially dangerous.
Other malware researchers reported widespread compromises pointing to the domains gumblar.cn and martuz.cn, both of which made it onto our top-10 list. For Gumblar, we saw about 60,000 compromised sites; Martuz peaked at slightly over 35,000 sites. Beladen.net was also reported to be part of a mass compromise, but made it only to position 124 on the list with about 3,500 compromised sites.
To help make the Internet a safer place, our Safe Browsing API is freely available and is being used by browsers such as Firefox and Chrome to protect users on the web.
Tuesday, March 31, 2009
Reducing XSS by way of Automatic Context-Aware Escaping in Template Systems
Building on our earlier posts on defenses against web application flaws ["Automating Web Application Security Testing", "Meet ratproxy, our passive web security assessment tool"], we introduce Automatic Context-Aware Escaping (Auto-Escape for short), a functionality we added to two Google-developed general purpose template systems to better protect against Cross-Site Scripting (XSS).
We developed Auto-Escape specifically for general purpose template systems; that is, template systems that are for the most part unaware of the structure and programming language of the content on which they operate. These template systems typically provide minimal support for web applications, possibly limited to basic escaping functions that a developer can invoke to help escape unsafe content being returned in web responses. Our observation has been that web applications of substantial size and complexity using these template systems have an increased risk of introducing XSS flaws. To see why this is the case, consider the simplified template below in which double curly brackets {{ and }} enclose placeholders (variables) that are replaced with run-time content, presumed unsafe.
<body>
<span style="color:{{USER_COLOR}};">
Hello {{USERNAME}}, view your <a href="{{USER_ACCOUNT_URL}}">Account</a>.
</span>
<script>
var id = {{USER_ID}}; // some code using id, say:
// alert("Your user ID is: " + id);
</script>
</body>
In this template, four variables are used (not in this order):
- USERNAME is inserted into regular HTML text and hence can be escaped safely by HTML-escape.
- USER_ACCOUNT_URL is inserted into an HTML attribute that expects a URL and therefore, in addition to HTML-escape, also requires validation that the URL scheme is safe. By allowing only a safe white-list of schemes, we can prevent (say) javascript: pseudo-URLs, which HTML-escape alone does not prevent.
- USER_COLOR is inserted into a Cascading Style Sheets (CSS) context and therefore requires an escaping that also prevents scripting and other dangerous constructs in CSS, such as those possible in expression() or url(). For more information on concerns with harmful content in CSS, refer to the CSS section of the Browser Security Handbook.
- USER_ID is inserted into a Javascript variable that expects a number, as it is not enclosed in quotes. As such, it requires an escaping that coerces it to a number (which a typical Javascript-escape function does not do); otherwise it can lead to arbitrary Javascript execution. More variants may be developed to coerce content to other data types, including arrays and objects.
Each of these variable insertions requires a different escaping method or risks introducing XSS. To keep the example small, we excluded several contexts of interest, particularly style tags, HTML attributes that expect Javascript (such as onmouseover), and considerations of whether attribute values are enclosed within quotes or not (which also affects escaping).
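To make the four cases concrete, here is a minimal Python sketch with one escaper per context. It is not the Auto-Escape implementation: the function names, the scheme white-list, the allowed CSS characters, and the fallback values ("#", the empty string, "null") are all assumptions chosen for illustration.

```python
import html
import re

# Assumed white-list of URL schemes; a real deployment would choose its own.
SAFE_SCHEMES = {"http", "https", "mailto"}


def escape_html(value: str) -> str:
    """HTML text context (USERNAME): plain HTML-escape is enough."""
    return html.escape(value, quote=True)


def escape_url_attr(value: str) -> str:
    """URL attribute context (USER_ACCOUNT_URL): check the scheme, then HTML-escape."""
    # Drop whitespace and control characters, which could otherwise hide a scheme.
    cleaned = re.sub(r"[\x00-\x20]+", "", value)
    # Anything before the first "/", "?" or "#" that contains ":" is treated as a scheme.
    head = re.split(r"[/?#]", cleaned, maxsplit=1)[0]
    if ":" in head:
        scheme = head.split(":", 1)[0].lower()
        if scheme not in SAFE_SCHEMES:
            return "#"  # assumed harmless replacement
    return html.escape(cleaned, quote=True)


def escape_css_value(value: str) -> str:
    """CSS value context (USER_COLOR): allow only a conservative character set.

    Parentheses, backslashes and quotes are rejected, which rules out
    expression() and url() constructs.
    """
    if re.fullmatch(r"[A-Za-z0-9#%.,\- ]*", value):
        return value
    return ""


def escape_js_number(value: str) -> str:
    """Unquoted Javascript context (USER_ID): coerce to a numeric literal."""
    candidate = value.strip()
    if re.fullmatch(r"[+-]?\d+(\.\d+)?", candidate):
        return candidate
    return "null"  # assumed fallback for non-numeric input
```

Note that only the first function is a plain HTML-escape. The other three reject or neutralize input that HTML-escape would pass through unchanged.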
Auto-Escape
The example above demonstrates the importance of understanding the precise context in which variables are being inserted and the need for escaping functions that are both safe and correct for each. For large, complex web applications, we see two related vectors for XSS:
- A developer forgetting to apply escaping to a given variable.
- A developer applying the wrong escaping for that variable for the context in which it is being inserted.
Considering the sheer number of templates in large web applications and the amount of untrusted content they may operate on, the process of proper escaping becomes complicated and error prone. It is also difficult to audit efficiently from a security testing perspective. We developed Auto-Escape to move that complexity from the developer into the template system and thereby reduce the risk of XSS.
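The second vector is easy to underestimate, so here is a small Python illustration of it. The template fragment, the payloads, and the render helper are hypothetical; the point is only that HTML-escape, applied everywhere, leaves both payloads untouched because neither contains a character that HTML-escape rewrites.

```python
import html
import re

# Hypothetical fragment reusing two placeholders from the earlier template.
TEMPLATE = (
    '<a href="{{USER_ACCOUNT_URL}}">Account</a>'
    "<script>var id = {{USER_ID}};</script>"
)

# Hypothetical attacker-controlled values.
values = {
    "USER_ACCOUNT_URL": "javascript:alert(1)",
    "USER_ID": "1; alert(document.cookie)",
}


def render_with_html_escape(template: str, data: dict) -> str:
    """Naive renderer that HTML-escapes every variable regardless of context."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda match: html.escape(data[match.group(1)], quote=True),
        template,
    )


rendered = render_with_html_escape(TEMPLATE, values)
print(rendered)
```

The printed markup still contains the javascript: URL and the injected alert call, even though every variable was escaped.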
A Look at Implementation
Auto-Escape is a functionality designed to make the template system aware of the web application context and therefore able to automatically apply the proper escaping. This is achieved in three parts:
- We determined all the different contexts in which untrusted content may be returned and provided proper escaping functions for each. This is part science and part pragmatism. For example, we did not find the need to support variable insertion inside an HTML tag name itself (as opposed to HTML attributes), so we did not build support for it. Other factors come into play, including availability of existing escaping functions and backwards compatibility. As a result, part of that work is template system dependent.
- We developed our own parser to parse HTML and Javascript templates. It provides methods which can be queried at a point of interest to obtain the context information necessary for proper escaping. The parser is designed with performance in mind, and it runs in a stream mode without look-ahead. It aims for simplicity while understanding that browsers may be more lenient than specifications, particularly in certain corner cases.
- We added an extra step into the parsing that the template system already performs to locate variables, among other needs. This extra step activates our HTML/Javascript parser, queries it for the context of each variable, then applies its escaping rules to compute the proper escaping function to use for each variable. Depending on the template system, this step may be performed only the first time a template is used, or for each web response, in which case some limitations may be lifted.
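The parts above can be sketched in a few dozen lines of Python. This is a toy, not our parser: the class, the state names, and the context labels are invented for illustration, and it ignores comments, Javascript string escapes, unquoted attribute values, and many browser quirks. It does keep the two properties described above: it consumes the template as a stream without look-ahead, and it can be queried for the current context at each variable.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")
URL_ATTRS = {"href", "src", "action"}  # assumed set of URL-valued attributes

TEMPLATE = """<body>
<span style="color:{{USER_COLOR}};">
Hello {{USERNAME}}, view your <a href="{{USER_ACCOUNT_URL}}">Account</a>.
</span>
<script>
var id = {{USER_ID}}; // some code using id
</script>
</body>
"""


class ContextTracker:
    """Toy streaming HTML/Javascript context tracker (hypothetical design)."""

    def __init__(self):
        self.state = "text"  # text | tag_name | in_tag | attr_value | script
        self.tag = self.attr = self.buf = self.quote = self.tail = ""

    def feed(self, text):
        for ch in text:
            if self.state == "text":
                if ch == "<":
                    self.state, self.tag = "tag_name", ""
            elif self.state == "tag_name":
                if ch == ">":
                    self._end_tag()
                elif ch.isspace():
                    self.state, self.buf = "in_tag", ""
                else:
                    self.tag += ch.lower()
            elif self.state == "in_tag":
                if ch == ">":
                    self._end_tag()
                elif ch == "=":
                    self.attr, self.buf = self.buf.lower(), ""
                elif ch in "\"'" and self.attr:
                    self.state, self.quote = "attr_value", ch
                elif ch.isspace():
                    self.buf = ""
                else:
                    self.buf += ch
            elif self.state == "attr_value":
                if ch == self.quote:
                    self.state, self.attr = "in_tag", ""
            elif self.state == "script":
                # Remember the last nine characters to spot "</script>".
                self.tail = (self.tail + ch.lower())[-9:]
                if self.tail == "</script>":
                    self.state = "text"
                elif self.quote:
                    if ch == self.quote:
                        self.quote = ""
                elif ch in "\"'":
                    self.quote = ch

    def _end_tag(self):
        self.state = "script" if self.tag == "script" else "text"
        self.attr = self.quote = self.tail = ""

    def context(self):
        if self.state == "text":
            return "html"
        if self.state == "script":
            return "js_string" if self.quote else "js_number"
        if self.state == "attr_value" and not self.attr.startswith("on"):
            if self.attr in URL_ATTRS:
                return "url"
            return "css" if self.attr == "style" else "html_attr"
        raise ValueError("variable in an unsupported position")


def variable_contexts(template):
    """Feed literal text to the tracker and query it at each placeholder."""
    tracker, result, pos = ContextTracker(), {}, 0
    for match in PLACEHOLDER.finditer(template):
        tracker.feed(template[pos:match.start()])
        result[match.group(1)] = tracker.context()
        pos = match.end()
    return result


print(variable_contexts(TEMPLATE))
```

A template system could then map each context label to an escaping function. A variable placed in a tag name or attribute name raises an error here, in the same spirit as the decision above not to support insertion inside tag names.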
A simple mechanism is provided for the developer to indicate that some variables are safe and should not be escaped. This is used for variables that are either escaped through other means in source code or contain trusted markup that should be emitted intact.
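One way such an opt-out could look is sketched below in Python. The TrustedMarkup wrapper type and the render function are hypothetical and are not the mechanism used in our template systems; for brevity the sketch only handles the HTML text context.

```python
import html
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


class TrustedMarkup(str):
    """Hypothetical marker: the developer vouches that this value is safe as-is."""


def render(template: str, values: dict) -> str:
    """Escape every variable unless it was explicitly marked as trusted."""

    def substitute(match):
        value = values[match.group(1)]
        if isinstance(value, TrustedMarkup):
            return str(value)  # emitted intact
        return html.escape(value, quote=True)

    return PLACEHOLDER.sub(substitute, template)


print(render(
    "<p>{{GREETING}} {{NAME}}</p>",
    {"GREETING": TrustedMarkup("<b>Hi</b>"), "NAME": "<script>"},
))
```

Making the opt-out explicit also helps auditing: a reviewer can search for the marker rather than inspecting every variable.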
Current Status
Auto-Escape has been released with the C++ Google Ctemplate for a while now and it continues to develop there. You can read more about it in the Guide to using Auto-Escape. We also implemented Auto-Escape for the ClearSilver template system and expect it to be released in the near future. Lastly, we are in the process of integrating it into other template systems developed at Google for Java and Python and are interested in working with a few other open source template systems that may benefit from this logic. Our HTML/Javascript parser is already available with the Google Ctemplate distribution and is expected to be released as a stand-alone open source project very soon.
Co-developers: Filipe Almeida and Mugdha Bendre
 
