Tuesday, March 30, 2010

The chilling effects of malware

In January, we discussed a set of highly sophisticated cyber attacks that originated in China and targeted many corporations around the world. We believe that malware is a general threat to the Internet, but it is especially harmful when it is used to suppress opinions of dissent. In that case, the attacks involved surveillance of email accounts belonging to Chinese human rights activists. Perhaps unsurprisingly, these are not the only examples of malicious software being used for political ends. We have gathered information about a separate cyber threat that was less sophisticated but that nonetheless was employed against another community.

This particular malware broadly targeted Vietnamese computer users around the world. The malware infected the computers of potentially tens of thousands of users who downloaded Vietnamese keyboard language software and possibly other legitimate software that was altered to infect users. While the malware itself was not especially sophisticated, it has nonetheless been used for damaging purposes. These infected machines have been used both to spy on their owners as well as participate in distributed denial of service (DDoS) attacks against blogs containing messages of political dissent. Specifically, these attacks have tried to squelch opposition to bauxite mining efforts in Vietnam, an important and emotionally charged issue in the country.

Since some anti-virus vendors have already introduced signatures to help detect this specific malware, we recommend the following actions, particularly if you believe that you may have been exposed to the malware: run regular anti-virus as well as anti-spyware scans from trusted vendors, and be sure to install all web browser and operating system updates to ensure you’re using only the latest versions. New technology like our suspicious account activity alerts in Gmail should also help detect surveillance efforts. At a larger scale, we feel the international community needs to take cybersecurity seriously to help keep free opinion flowing.

Phishing phree

To help protect you from a wide array of Internet scams you may encounter while searching, we analyze millions of webpages daily for phishing behavior. Each year, we find hundreds of thousands of phishing pages and add them to our list that we use to directly warn users of Firefox, Safari, and Chrome via our SafeBrowsing API. How do we find all these phishing pages? We certainly don’t do it by hand!

Instead, we have taught computers to look for certain telltale signs, as we describe in a paper that we recently presented at the 17th Annual Network and Distributed System Security Symposium. In a nutshell, our system looks at many of the same elements of a webpage that you would check to help evaluate whether the page is designed for phishing. If our system determines that a page is being used for phishing, it will automatically produce a warning for all of our users who try to visit that page. This scalable design enables us to review all these potentially “phishy” pages in about a minute each.

What we look for

Our system analyzes a number of webpage features to help make a verdict about whether a site is a phishing site. Starting with a page’s URL, we look to see if there is anything unusual about the host, such as whether the hostname is unusually long or whether the URL uses an IP address to specify the host. We also look to see if the URL contains any phrases like “banking” or “login” that might indicate that the page is trying to steal information.

We don’t just look at the URL, though. After all, a perfectly legitimate site could certainly use words like “banking” or “login.” We collect a snapshot of the page’s content to examine it closely for phishing behavior. For example, we check to see if the page has a password field or whether most of the links point to a common phishing target, as both of these characteristics can be a sign of phishing. Additionally, we pick out some of the most characteristic terms that show up on a page (as defined by their TF-IDF scores), and look for terms like “password” or “PIN number,” which also may indicate that the page is intended for phishing.

We also check the page’s hosting information to find out which networks host the page and where the page’s servers are located geographically. If a site purporting to be an American bank runs its servers in a different country and is hosted on a local residential ISP’s network, we have a strong signal that the site is bad.

Finally, we check the page’s PageRank to see if the page is popular or not, and we check the spam reputation of the page’s domain. We discovered in our research findings that almost all phishing pages are found on domains that almost exclusively send spam. You can observe this trend in the CCDF graph of the spam reputation scores for phishing pages as compared to the graph of other, non-phishing pages.

How we learn to recognize phishing pages

We use a sample of the data that our system generates to train the classifier that lies at the core of our automatic system using a machine learning algorithm. Coming up with good labels (phishing/not phishing) for this data is tricky because we can’t label each of the millions of pages ourselves. Instead, we use our published phishing page list, largely generated by our classifier, to assign labels for the training data.

You might be wondering if this system is going to lead to situations where the classifier makes a mistake, puts that mistake on our list, and then uses the list to learn to make more mistakes. Fortunately, the chain doesn’t make it that far. Our classifier only makes a relatively small number of mistakes, which we can correct manually when you report them to us. Our learning algorithms can handle a few temporary errors in the training labels, and the overall learning process remains stable.

How well does this work?

Of the millions of webpages that our scanners analyze for phishing, we successfully identify 9 out of 10 phishing pages. Our classification system only incorrectly flags a non-phishing site as a phishing site about 1 in 10,000 times, which is significantly better than similar systems. In our experience, these “false positive” sites are usually built to distribute spam or may be involved with other suspicious activity. While phishers are constantly changing their strategies, we find that they do not change them enough to reliably escape our system. Our experiments showed that our classification system remained effective for over a month without retraining.

If you are a webmaster and would like more information about how to keep your site from looking like a phishing site, please check out our post on the Webmaster Central Blog. If you find that your site has been added to our phishing page list ("Reported Web Forgery!") by mistake, please report the error to us.

Wednesday, March 24, 2010

Detecting suspicious account activity

(Cross-posted from the Gmail Blog)

A few weeks ago, I got an email presumably from a friend stuck in London asking for some money to help him out. It turned out that the email was sent by a scammer who had hijacked my friend's account. By reading his email, the scammer had figured out my friend's whereabouts and was emailing all of his contacts. Here at Google, we work hard to protect Gmail accounts against this kind of abuse. Today we're introducing a new feature to notify you when we detect suspicious login activity on your account.

You may remember that a while back we launched remote sign out and information about recent account activity to help you understand and manage your account usage. This information is still at the bottom of your inbox. Now, if it looks like something unusual is going on with your account, we’ll also alert you by posting a warning message saying, "Warning: We believe your account was last accessed from…" along with the geographic region that we can best associate with the access.

To determine when to display this message, our automated system matches the relevant IP address, logged per the Gmail privacy policy, to a broad geographical location. While we don't have the capability to determine the specific location from which an account is accessed, a login appearing to come from one country and occurring a few hours after a login from another country may trigger an alert.

By clicking on the "Details" link next to the message, you'll see the last account activity window that you're used to, along with the most recent access points.

If you think your account has been compromised, you can change your password from the same window. Or, if you know it was legitimate access (e.g. you were traveling, your husband/wife who accesses the account was also traveling, etc.), you can click "Dismiss" to remove the message.

Keep in mind that these notifications are meant to alert you of suspicious activity but are not a replacement for account security best practices. If you'd like more information on account security, read these tips on keeping your information secure or visit the Google Online Security Blog.

Finally, we know that security is also a top priority for businesses and schools, and we look forward to offering this feature to Google Apps customers once we have gathered and incorporated their feedback.

Friday, March 19, 2010

Meet skipfish, our automated web security scanner

The safety of the Internet is of paramount importance to Google, and helping web developers build secure, reliable web applications is an important part of the equation. To advance this goal, we have released projects such as ratproxy, a passive security assessment tool; and Browser Security Handbook, a comprehensive guide for web developers. We also worked with the community to improve the security of third-party browsers.

Today, we are happy to announce the availability of skipfish - our free, open source, fully automated, active web application security reconnaissance tool. We think this project is interesting for a few reasons:
  • High speed: written in pure C, with highly optimized HTTP handling and a minimal CPU footprint, the tool easily achieves 2000 requests per second with responsive targets.

  • Ease of use: the tool features heuristics to support a variety of quirky web frameworks and mixed-technology sites, with automatic learning capabilities, on-the-fly wordlist creation, and form autocompletion.

  • Cutting-edge security logic: we incorporated high quality, low false positive, differential security checks capable of spotting a range of subtle flaws, including blind injection vectors.
As with ratproxy, we feel that skipfish will be a valuable contribution to the information security community, making security assessments significantly more accessible and easier to execute.

To download the scanner, please visit this page; detailed project documentation is available here.

Wednesday, March 3, 2010

Federal Support for Federated Login

Last November, we discussed the progress that account login systems operating via standards-based identity technologies like OpenID have achieved across the web. As more websites seek to interact with one another to provide a richer experience for users, we're seeing even more interest in finding a secure way to enable that kind of information sharing while avoiding the hassle for users of creating new accounts and passwords.

Excitement for technology like OpenID is not limited to the private sector. President Obama's open government memorandum last year spurred the creation of a pilot initiative in September to enable U.S. citizens to more easily sign in to government-run websites. Google joined a number of other companies to explore ways to answer that call.

Now, several months later, some interesting things are taking shape. The Open Identity Exchange (OIX), a new organization and certification body focused on online identity management, today named Google among the first identity providers to be approved by the U.S. Government as meeting federal standards for identity assurance. This means that Google's identity, security, and privacy specifications have been certified so that a user can register and log in at U.S. government websites using their Google account login credentials. The National Institute of Health (NIH) is the first government website ready to accept such credentials, and we look forward to seeing other websites open up to certified identity providers so that users will have an easier and more secure time interacting with these resources.

Our hope is that the work of the OIX and other groups will continue to grow and help facilitate more open government participation, as well as improve security on the Internet by reducing password use across websites.