
Ars tests Internet surveillance—by spying on an NPR reporter

Posted: Tue Jun 10, 2014 11:56 pm
by rhoenix
I will warn you in advance that this is a long article. However, it is very revealing about the state of Internet security today.
arstechnica.com wrote:

On a bright April morning in Menlo Park, California, I became an Internet spy.

This was easier than it sounds because I had a willing target. I had partnered with National Public Radio (NPR) tech correspondent Steve Henn for an experiment in Internet surveillance. For one week, while Henn researched a story, he allowed himself to be watched—acting as a stand-in, in effect, for everyone who uses Internet-connected devices. How much of our lives do we really reveal simply by going online?

Henn let me into his Silicon Valley home and ushered me into his office with a cup of coffee. Waiting for me there was the key tool of my new trade: a metal-and-plastic box that resembled nothing more threatening than an unlabeled Wi-Fi router. This was the PwnPlug R2, a piece of professional penetration testing gear designed by Pwnie Express CTO Dave Porcello and his team and on loan to us for this project.

The box would soon sink its teeth into the Internet traffic from Henn's home computer and smartphone, silently gobbling up every morsel of data and spitting it surreptitiously out of Henn's home network for our later analysis. With its help, we would create a pint-sized version of the Internet surveillance infrastructure used by the National Security Agency. Henn would serve as a proxy for Internet users, Porcello would become our one-man equivalent of the NSA’s Special Source Operations department, and I would become Henn's personal NSA analyst.

As Henn cleared a spot on his desk for the PwnPlug, he joked that it might not provide anything useful for us to analyze. In the year since Edward Snowden pulled back the curtain of secrecy around the NSA’s dragnet surveillance programs, many of the major Internet service providers targeted by the spy agency have publicly announced plans to better protect customers, often through the expanded use of encryption.

Our experiment would answer the question: could a passive observer of Internet traffic still learn much about a target in this post-Snowden world?

Henn dialed up Porcello and put him on speakerphone as we finalized the location and setup of the PwnPlug. As I snapped in an Ethernet cable, Henn turned on his iPhone and connected to the PwnPlug’s Wi-Fi network. Porcello watched remotely as data from Henn's network suddenly poured into a specially configured Pwnie Express server.

“Whoa,” Porcello said. “Yep, there’s Yahoo, NPR... there’s an HTTP request to Google... the phone is checking for an update. Wow, there’s a lot of stuff going on here. It's just thousands and thousands of pages of stuff... Are you sure you’re not opening any apps?”

“I didn’t do anything!” Henn replied. “My phone is just sitting here on my desk.”

He checked his phone and found that Mail, Notes, Safari, Maps, Calendar, Messages, Twitter, and Facebook were running in the background—and making connections to the Internet. The Safari Web browser proved the most revealing. Like most people who use the iPhone, Henn had left open dozens of websites; when his phone had connected to the PwnPlug’s network, the browser had refreshed them, revealing movies he was checking out for his kids, a weather report, and research he was doing for work.

In the first two minutes of our test, we had already captured a snapshot of Henn’s recent online life—and the real surveillance hadn't even begun.

While the NSA runs hundreds of surveillance programs, its broad, passive surveillance of the Internet has just two key components: Turbulence, a network monitoring system that skims traffic from the Internet’s fiber-optic backbone, and XKeyscore, an analytics database that processes the captured traffic, using rules that look for specific strings of text or patterns in data (e-mail addresses, phone numbers, file attachments). According to leaked NSA documents and whistleblower testimony, pieces of both Turbulence and XKeyscore are scattered about the world near Internet chokepoints such as the infamous “secret room” at AT&T’s San Francisco offices that has been described by former AT&T employee Mark Klein.

To recreate this setup in miniature, the PwnPlug in Henn’s office was configured as a Wi-Fi access point; it acted as our equivalent of the NSA’s Turbulence. While the PwnPlug is generally used for network penetration testing, Porcello configured the device used in our test only to intercept traffic outbound to or inbound from the Internet, not traffic that began and ended on Henn's home network. The device captured every packet matching these criteria and sent it over a secure SSH connection back to a server at Pwnie Express headquarters in Berlin, Vermont.
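The capture rule described above (record traffic crossing between Henn's home network and the Internet, skip traffic that stays inside the house) amounts to a simple predicate on each packet's endpoints. The sketch below is an illustration, not Pwnie Express's actual configuration, and the home subnet 192.168.1.0/24 is a hypothetical assumption. On a home network, where nearly every packet has a local endpoint, a tcpdump-style capture filter such as `not (src net 192.168.1.0/24 and dst net 192.168.1.0/24)` would express roughly the same idea.

```python
import ipaddress

# Hypothetical home subnet; the article does not describe Henn's network layout.
HOME_NET = ipaddress.ip_network("192.168.1.0/24")


def should_capture(src: str, dst: str, home_net=HOME_NET) -> bool:
    """Return True if a packet crosses the boundary between the home network and the Internet.

    Packets that start and end inside the home subnet are skipped, mirroring
    the capture rule described for the PwnPlug.
    """
    src_local = ipaddress.ip_address(src) in home_net
    dst_local = ipaddress.ip_address(dst) in home_net
    # Exactly one local endpoint: outbound to, or inbound from, the Internet.
    return src_local != dst_local


if __name__ == "__main__":
    samples = [
        ("192.168.1.20", "203.0.113.10"),  # phone -> Internet host
        ("203.0.113.10", "192.168.1.20"),  # Internet host -> phone
        ("192.168.1.20", "192.168.1.30"),  # phone -> printer on the home LAN
    ]
    for src, dst in samples:
        print(f"{src} -> {dst}: capture={should_capture(src, dst)}")
```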

The remote machine at Pwnie acted as our diminutive version of XKeyscore. To emulate the NSA's processing of captured traffic, Porcello ran a number of open source analytics tools against Henn's traffic, including the ngrep packet search tool, the tshark and Wireshark traffic analysis tools, the tcpflow data stream capture tool, the dsniff suite’s passive monitoring tools, and tcpxtract for capturing files within Internet traffic.
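tcpxtract, the last tool listed, recovers files from traffic by searching for known file signatures. A minimal sketch of that header/footer carving idea for JPEG images follows. It is not tcpxtract's code; it assumes the TCP stream has already been reassembled into one byte string (the job tcpflow does), and it is deliberately naive: a JPEG with an embedded thumbnail contains a second end-of-image marker, so the first footer found can cut a photo short.

```python
# Signatures for JPEG files: start-of-image and end-of-image markers.
JPEG_HEADER = b"\xff\xd8\xff"
JPEG_FOOTER = b"\xff\xd9"


def carve_jpegs(stream: bytes, max_size: int = 10_000_000) -> list:
    """Return every byte run that starts with a JPEG header and ends at the next JPEG footer.

    Naive by design: an embedded thumbnail has its own footer, so a carved
    photo may be cut short, and a header with no footer is dropped.
    """
    carved = []
    start = stream.find(JPEG_HEADER)
    while start != -1:
        end = stream.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break  # header without a footer: the file was truncated in this capture
        end += len(JPEG_FOOTER)
        if end - start <= max_size:  # ignore implausibly large runs
            carved.append(stream[start:end])
        start = stream.find(JPEG_HEADER, end)
    return carved


if __name__ == "__main__":
    fake_stream = b"HTTP/1.1 200 OK\r\n\r\n" + b"\xff\xd8\xff\xe0...image bytes...\xff\xd9" + b"more traffic"
    for blob in carve_jpegs(fake_stream):
        print(len(blob), "bytes carved")
```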

For more than a month before the experiment began, Ars Technica and NPR made technical and legal preparations to ensure that any data captured from Henn would be handled with confidentiality and care. The focus would be solely on Henn’s personal online activities; we explicitly did not attempt to penetrate NPR’s corporate network, to hack Henn’s computer or phone, or to grab traffic from Henn's other family members. We would simply watch the traffic passing between our test Wi-Fi network and the Internet in the same way that the NSA collects data from millions of Internet users around the world each day.

Our full access to Henn's activities lasted for several days while he reported a single story. To make Henn as accurate a proxy as possible for the average unsuspecting Internet user, one condition stipulated for the test was that when the PwnPlug was active, Henn wouldn’t take extra measures to avoid surveillance (though he followed his normal operational security protocols). Henn could also pull the plug on our test at any time.

The experiment unfolded in two phases. In the first, we simply observed Henn’s normal Internet traffic. In the second, Henn, Porcello, and I stopped the broad surveillance of Henn and turned our tools on specific traffic created by leading Web applications and services. Here's what we found.

Watching Henn’s traffic let us track much of his activity on the open Internet, but it didn’t give us everything. As with many people who work from home, Henn's corporate e-mails, Voice over IP (VoIP) phone calls, and other official communications were concealed by encryption, either application-specific encryption or NPR’s virtual private network. Encryption, when applied consistently, at least helps to thwart casual passive surveillance.

However, we quickly discovered that the encryption used by most popular Internet services doesn’t completely protect users from eavesdropping. Inconsistent implementations of encryption, plus data leaked by connections to unprotected sites, still provided us with enough data to paint a fairly complete picture of what Henn was doing.

On one of the days we watched him, Henn was reporting on environmentally friendly data centers, though I didn't know this at the time.

I got my first hint of what Henn was researching by reconstructing his Google searches. Google encrypts searches by default now, but data leaks from Google’s search engine can easily give up a person’s searches once they’ve been de-anonymized—in part by using Google’s own “cookies” against a target.

To provide its services, Google uses several cookies, small bits of unique text that are stored by users' Web browsers. One of these, the PREF cookie, tracks user identity separately from a Google login, in part to track what users search for and then to serve up context-appropriate advertisements.

This unique identification capability means that cookies are also valuable to anyone else listening in. According to documents published by The Washington Post, the NSA has used Google’s PREF cookie ID value as a “strong identifier” to associate a specific Web browser with a specific stream of Web traffic.

Even within Google’s encrypted sites, Google doesn’t encrypt PREF cookie data sent from the browser to various services. For example, Google’s secure search page makes calls out to Google Maps using the PREF cookie “in the clear,” along with unencrypted requests for maps embedded within search results. Thus, map data presented within the otherwise “secure” search results can offer hints about what the user was actually searching for—or even the street address.

In our test, I was able to isolate PREF cookie data quickly and use it as a key to search through all of Henn’s captured traffic. In the first block of traffic I searched, I got hits on Henn’s ID for calls to ads.google.com from a discount shopping site. Henn said he had no recollection of him or his wife ever visiting the site, so how this request was generated by his Web browser remains a mystery.
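Using a cookie as a search key, as described above, boils down to pulling one value out of each request's Cookie header and grouping traffic by it. In the sketch below, the PREF layout (colon-separated fields beginning with `ID=`) is an assumption about how the cookie looked at the time, and the hosts and ID values are invented for illustration.

```python
from collections import defaultdict


def cookie_value(cookie_header: str, name: str):
    """Return the value of one named cookie from a raw Cookie header, or None."""
    for part in cookie_header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep and key == name:
            return value
    return None


def pref_id(cookie_header: str):
    """Return the ID field of a PREF cookie, assuming a layout like 'ID=...:FF=0:TM=...'."""
    pref = cookie_value(cookie_header, "PREF")
    if pref is None:
        return None
    for field in pref.split(":"):
        key, sep, value = field.partition("=")
        if sep and key == "ID":
            return value
    return None


def group_by_pref_id(requests):
    """Group captured (host, cookie_header) pairs by the PREF ID they carry."""
    groups = defaultdict(list)
    for host, cookie_header in requests:
        ident = pref_id(cookie_header)
        if ident is not None:
            groups[ident].append(host)
    return dict(groups)


if __name__ == "__main__":
    # Invented sample traffic; hosts and IDs are for illustration only.
    captured = [
        ("maps.google.com", "PREF=ID=1a2b3c4d:FF=0:TM=1397000000; NID=abc"),
        ("ads.google.com", "NID=abc; PREF=ID=1a2b3c4d:U=5e6f"),
        ("www.example.org", "session=xyz"),
    ]
    print(group_by_pref_id(captured))
```

Every request carrying the same ID lands in one bucket, which is what makes such a cookie useful for tying scattered requests back to a single browser.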

But there were also requests to maps.google.com from within Google Search. The maps showed Grundy County, Iowa, and Forest City, North Carolina. What did both locations have in common? One is the site of a wind farm for a new Facebook data center; the other holds an existing high-efficiency Facebook data center.

In addition, the search queries for these locations were embedded in the Web calls to maps.google.com. When I discussed these locations with him later, Henn confirmed he was looking at the locations to see if he could get local public radio reporters or freelancers to make site visits.

I conducted the same location test on myself and found the same thing—a search using a store's name generated an unencrypted request to Google Maps, which contained my search term and data about my IP address. Security researcher Ashkan Soltani confirmed the leak in an e-mail exchange, writing, "Basically the short answer is: it's significant but depends on the client." (Internet Explorer appears unaffected.)

We reached out to Google for comment. It turns out that we had found a bug in Google search—one that Google has since corrected.

But there are even simpler ways to track someone's searches: if you can't see the search queries themselves, look at the results that get clicked. It’s possible to reverse-engineer searches by scanning captured traffic for the HTTP “Referer” header (the specification's spelling of "referrer"), which tells websites where incoming traffic came from.

As part of our analysis of Henn’s traffic, I searched for "Google" in these Referer headers, allowing me to identify the pages Henn clicked on from Google search result pages. Google stopped including search terms in the Referer header when it started encrypting search result pages, so I couldn’t always determine the exact search query Henn had used. But thanks to the search engine optimization efforts of the websites he visited, I was able to capture URL keywords that provided strong hints, keywords that Henn would later tell me matched his searches almost exactly:
  • who-coined-cloud-computing
  • data-centers-waste-vast-amounts-of-energy-belying-industry-image
  • global-warming-and-energy
  • searching-the-planet-to-find-power-for-the-cloud
  • recent-updates-to-the-oed
  • clickclean-interactive-us
  • ca-vantage-data-centers-id
  • new-iowa-wind-farm-will-feed-facebook-data-center
  • global-warming-and-energy
  • the-facebook-data-center-faq
With the map information and a partial Google search history reconstructed, I could easily guess what sort of story Henn was researching.
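The Referer-based reconstruction described above can be approximated in a few lines: keep requests whose Referer points at a Google host, then take the last segment of the requested URL's path as the keyword slug. This is a rough sketch with invented URLs; real sites often append numeric IDs or query strings that a more careful version would need to strip.

```python
from urllib.parse import urlsplit


def is_google_referer(referer: str) -> bool:
    """True if the Referer header points at a Google host (google.com, www.google.co.uk, ...)."""
    host = urlsplit(referer).hostname or ""
    return "google" in host.split(".")


def slug_from_url(url: str):
    """Return the last non-empty path segment of a URL, minus a short file extension."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if not segments:
        return None
    last = segments[-1]
    stem, dot, ext = last.rpartition(".")
    # Treat short suffixes such as .html or .php as file extensions.
    return stem if dot and stem and len(ext) <= 4 else last


def clicked_from_google(requests):
    """Given captured (url, referer) pairs, list the slugs of pages reached from Google."""
    slugs = []
    for url, referer in requests:
        if referer and is_google_referer(referer):
            slug = slug_from_url(url)
            if slug:
                slugs.append(slug)
    return slugs
```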

What crypto doesn’t conceal

Once you’ve left the (relative) safety of the major search, mail, and social media providers, the vast majority of what you do online is an open book. Most websites are unencrypted, as are the identifying cookies that Web browsers pass to them—cookies that can help unmask the people using those browsers. And while most of the major webmail services and other e-mail providers have provided encryption to protect e-mail content between users and their mail servers, a significant portion of e-mail traffic between mail servers remains unencrypted—leaving the content open to perusal by governments or anyone else who can capture it.

Extracting meaningful information from all that content doesn’t require that someone read everything in it. The NSA’s XKeyscore and a variety of traffic analysis tools can pull a trove of information from unencrypted Web and mail traffic. They can scan for keywords or look for patterns in data that identify “entities”—known data structures such as a name, an e-mail address, or a phone number. They can also count the repetition of words within a document to provide analysts with a sense of what the text is about—“bomb instructions,” “divorce lawyers,” or “casual encounters.”
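Entity extraction and word counting of the kind described above can be sketched with regular expressions and a word counter. XKeyscore's actual rules are not public in this form; the patterns below are simplified, US-centric assumptions (for example, the phone pattern only recognizes North American formats), and the sample text is invented.

```python
import re
from collections import Counter

# Simplified patterns (assumptions): common e-mail shapes and North American phone formats.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"(?:\b1[-. ])?(?:\(\d{3}\)\s?|\b\d{3}[-. ])\d{3}[-. ]\d{4}\b")
WORD_RE = re.compile(r"[a-z']+")
STOPWORDS = {"the", "and", "for", "that", "this", "with", "are", "was", "you", "from"}


def extract_entities(text: str) -> dict:
    """Pull e-mail addresses and phone numbers out of captured text."""
    return {"emails": EMAIL_RE.findall(text), "phones": PHONE_RE.findall(text)}


def top_terms(text: str, n: int = 5):
    """Rough topic signal: the most frequent non-trivial words in a document."""
    words = [w for w in WORD_RE.findall(text.lower()) if len(w) > 2 and w not in STOPWORDS]
    return Counter(words).most_common(n)


if __name__ == "__main__":
    sample = "Call 1-800-555-0199 or (415) 555-0123, or e-mail jane.doe@example.org about data centers."
    print(extract_entities(sample))
    print(top_terms("Data centers use energy. Wind power feeds data centers.", 2))
```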

Based on analysis of Henn’s traffic, I already knew that he was looking into cloud computing. I also knew the organizations he was researching based on the Web URLs. But who was he speaking with? The tools that Porcello set up identified a handful of phone numbers and e-mail addresses seen in Henn's Web traffic.

Some of these were banal, such as 1-800 numbers for customer service from some of the sites he visited. Another phone number was from Uberconference, a free SIP phone application that some reporters had been testing. But others, Henn later confirmed, were for people he had called as part of his research.

While we did not probe NPR's network, part of Henn's workflow included downloading an audio file from an NPR server. Our analysis tools identified and plucked the audio file from Henn's traffic stream and revealed a security hole that would have allowed me (had I the desire and the legal clearance) to obtain similar raw audio for some other NPR reports. (NPR quickly addressed the issue.)
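Plucking a file out of a captured download, as the analysis tools did with the audio file, can be illustrated by splitting a reassembled HTTP response into headers and body, a job Wireshark also exposes through its HTTP object export. The sketch assumes a plain HTTP/1.1 response whose body length is given by Content-Length; chunked or compressed responses would need extra handling, and the sample response is made up.

```python
def extract_http_body(raw_response: bytes):
    """Split one reassembled HTTP/1.1 response into (headers, body).

    Assumes the body length is given by Content-Length; chunked or
    compressed transfers would need extra handling.
    """
    head, sep, rest = raw_response.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("end of headers not found in captured data")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:  # lines[0] is the status line, e.g. "HTTP/1.1 200 OK"
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get("content-length", len(rest)))
    return headers, rest[:length]


def is_audio(headers: dict) -> bool:
    """True if the response declares an audio media type such as audio/mpeg."""
    return headers.get("content-type", "").lower().startswith("audio/")
```

Anything after the declared length belongs to the next response on the same connection, which is why the body is sliced rather than taken whole.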

With a fairly complete picture of Henn's reporting, it was time to end our broad monitoring of his network. Henn pulled the cord on the PwnPlug, and we moved on to phase two: a targeted examination of some major online tools and services.

Getting to know you

Taken together, the information we collected in the first part of our experiment was fairly revealing. But it only scratched the surface of what we could learn from an individual's Internet traffic. We began systematically testing some of the biggest and most popular services on the Internet with both Web and mobile device interfaces.

Many sites that can leak personal data don’t use encryption by default—or at all. In fact, many e-commerce websites allow users to perform searches and to access other information of a personal nature before logging in, only requiring a secure connection when it comes time to pay.

For example, if you search for something on Amazon or look at your wish list, your traffic is unencrypted by default. This traffic can include your name, birth date, and location, as well as searches for potentially embarrassing items. The following are search terms Dave Porcello was able to capture from his own Web traffic during the second phase of our testing:
User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_1 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D201 Safari/9537.53
Cookie: session-id=190-9015664-2689569; session-id-time=2082787201l;

http://www.amazon.com/gp/aw/s/ref=is_bo ... +capacitor
http://www.amazon.com/gp/aw/s/ref=is_bo ... +Dispenser
http://www.amazon.com/gp/aw/s/ref=is_box_?k=Wolf+Urine
http://www.amazon.com/gp/aw/s/ref=is_box_?k=Live+bees
http://www.amazon.com/gp/aw/s/ref=is_bo ... ockroaches
http://www.amazon.com/gp/aw/s/ref=is_box_?k=Uranium+Ore
http://www.amazon.com/gp/aw/s/ref=is_box_?k=bone+saw
http://www.amazon.com/gp/aw/s/ref=is_bo ... ll+of+tarp
http://www.amazon.com/gp/aw/s/ref=is_box_?k=shovel
http://www.amazon.com/gp/aw/s/ref=is_bo ... a+mask+set
http://www.amazon.com/gp/aw/s/ref=is_bo ... avel+guide
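The captured URLs above carry each search term in the `k` query parameter, so recovering the searches is a matter of URL parsing. The helper below is a hypothetical sketch; `parse_qs` from Python's standard library decodes the `+` signs back into spaces. The sample URLs reuse the untruncated ones from the capture.

```python
from urllib.parse import urlsplit, parse_qs


def amazon_search_terms(urls):
    """Recover search terms from captured Amazon search URLs via their 'k' parameter."""
    terms = []
    for url in urls:
        parts = urlsplit(url)
        host = parts.hostname or ""
        if host == "amazon.com" or host.endswith(".amazon.com"):
            # parse_qs turns '+' back into spaces and decodes %XX escapes.
            terms.extend(parse_qs(parts.query).get("k", []))
    return terms


if __name__ == "__main__":
    captured = [
        "http://www.amazon.com/gp/aw/s/ref=is_box_?k=Wolf+Urine",
        "http://www.amazon.com/gp/aw/s/ref=is_box_?k=Live+bees",
    ]
    print(amazon_search_terms(captured))
```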
Even applications that require a login and then encrypt parts of their traffic can leak personal data. Searching for packets with the keyword “Skype,” I came across what looked like normal Web traffic—a series of GET requests sent to api.skype.com. Skype calls themselves weren’t targeted by our surveillance, because Skype-to-Skype video and audio calls are encrypted, but it turns out the Skype client uses an unencrypted Web interface to retrieve the photo “avatars” for people in a user’s contact list. Part of that request contains the username of the contact, potentially revealing one's Skype contact list.

We contacted Microsoft to check on this particular leak and were told that it had been fixed in a recent update of the Skype Windows app. However, the version we tested when we captured the data in April was only a month old, so it’s likely that many other Skype users are also leaking data from their contact lists. (And we know that the NSA collects such contact list data on a massive scale.)

Leaky phones

Your phone also leaks a substantial amount of data. We tested a number of mobile apps on multiple devices and found a whole pile of potentially privacy-exposing data, including:

- Weak crypto support on older devices. Facebook’s mobile security was fine on most current-generation devices, but a Facebook app on an older Android device sent profile images and other photos unencrypted. Google searches from an Android 4.1.1 (“Jelly Bean”) device were also unencrypted.

- Geolocation data. The iOS Weather application, which uses Yahoo’s Weather API, passed location in clear text. We also found that images taken with the iOS Camera app included, by default, location data, full data about the phone itself, whether the front or rear-facing camera was used, and the compass direction the phone was facing when the camera fired. If phone images are posted via a nonsecure app or e-mail account, this EXIF metadata can be easily detected in the packet stream.

- The Web history that never dies. As mentioned, a good chunk of Henn’s earlier mobile Web activity showed up on our first day of collection thanks to unclosed mobile Safari “tabs.” Safari stays live even when it’s been closed on the screen; behind the scenes, it can reload pages that were previously open.

- AT&T "brain" updates. Dave Porcello intercepted a file download from AT&T to an iPhone that included default settings for a variety of services. One of those settings, Porcello said, was a switch that tells the iPhone to automatically connect to Wi-Fi access points with the SSID “attwifi”. Attackers who want to put themselves in the middle between a phone and the broader Internet need only have their attacking device advertise the SSID named in the file. That feature can be disabled on iPhone devices, but according to Pwnie Express’ Oliver Weis, that isn’t the case with AT&T Android devices.

We contacted both AT&T and Apple for comment; Apple pointed us to AT&T, but AT&T didn't respond.

- Personal mobile app data. Some mobile apps offer little or no encryption of their content, which can contain location information and other personal data. Pinterest, for example, sends and receives all its data except for “settings” information in the clear. WhatsApp leaks the user's phone number. Snapchat encrypts everything, but it leaks the registration data for its under-13 version, SnapKidz.

- Unencrypted VoIP calls from an app. While Uberconference only provided us one of Henn's phone numbers in the clear, Dave Porcello tested another VoIP app called RingCentral and found that it left everything unencrypted, including the call itself. Porcello was able to extract the full audio of a call from an iPhone’s Internet traffic—and says he won't be using that particular app anymore.

- App downloads. Monitoring the traffic to modern smartphones and tablets can also reveal which apps are being bought and downloaded. Porcello found that both iOS apps and system updates appeared to be delivered to devices as unencrypted .zip files. Google Play Store content and apps and Android OS updates are also delivered unencrypted.
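The geolocation item above notes that photos from the iOS Camera app carry EXIF metadata that can be spotted in the packet stream. A minimal sketch of that detection, assuming a complete JPEG has already been carved out of the traffic: walk the JPEG marker segments until an APP1 segment beginning with `Exif\0\0` appears. Decoding the GPS tags inside it needs a full TIFF/EXIF parser, which this sketch does not attempt; the second helper only shows the final step, converting EXIF-style degrees, minutes, and seconds into decimal coordinates. The sample bytes and coordinates are invented.

```python
import struct


def find_exif_segment(jpeg: bytes):
    """Return the payload of the first APP1 'Exif' segment in a JPEG, or None."""
    if not jpeg.startswith(b"\xff\xd8"):
        return None  # no start-of-image marker: not a JPEG
    i = 2
    while i + 4 <= len(jpeg):
        if jpeg[i] != 0xFF:
            return None  # lost track of the marker structure
        marker = jpeg[i + 1]
        if marker == 0xDA:
            return None  # start of scan: compressed image data follows, metadata is over
        (length,) = struct.unpack(">H", jpeg[i + 2:i + 4])  # length includes its own 2 bytes
        payload = jpeg[i + 4:i + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return payload
        i += 2 + length
    return None


def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert EXIF-style degrees/minutes/seconds and an N/S/E/W reference to decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref in ("S", "W") else value
```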

Such encryption gaps don’t just provide a way to spy on what’s on someone’s phone; they also offer an opportunity for hackers (at the NSA and elsewhere) to attack. Attackers could conceivably build a malicious version of an iOS or Android update or spoof the Google Play store and deliver an “evil” version of an app to a targeted phone—especially if the attackers can also fool the phone into connecting to their own malicious Wi-Fi access point.
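The malicious access point scenario, like the "attwifi" setting mentioned earlier, works because auto-joining an open network is decided by the advertised network name. The deliberately simplified model below is not the actual iOS or Android logic, which the article does not describe; it only shows that a name-based rule cannot tell an operator's hotspot from an attacker's device broadcasting the same SSID. The class and function names and the MAC addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPoint:
    ssid: str     # network name, broadcast by anyone who chooses it
    bssid: str    # radio MAC address, not checked by this rule
    secured: bool


def pick_network(auto_join_ssids: set, scan_results: list):
    """Simplified auto-join: take the first open network whose SSID is on the trusted list."""
    for ap in scan_results:
        if not ap.secured and ap.ssid in auto_join_ssids:
            return ap
    return None


if __name__ == "__main__":
    rogue = AccessPoint("attwifi", "02:00:00:00:00:01", secured=False)
    home = AccessPoint("HomeNet", "02:00:00:00:00:02", secured=True)
    print(pick_network({"attwifi"}, [home, rogue]))
```

Nothing in the rule involves the BSSID or any credential, so whoever broadcasts the trusted name first gets the connection.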

We’re all insecure

Even without resorting to more aggressive, active attacks, the amount of information that can be obtained with simple network tools is staggering. This is exactly why the NSA has invested so much time and money in its passive Internet surveillance capabilities—and why even “drive-by” surveillance by anyone who can capture pieces of your daily life on the Internet is a potential hazard to your privacy.

After our brief one-week surveillance of Henn’s online activities, I joked that I could have written his story about data centers for him. And while that wasn’t quite true, we had uncovered a vast trove of information—the exact types of information the NSA could use as a digital fingerprint to identify and track any of us online:
  • Most of the apps on Henn’s iPhone, based on application data while he was connected to the Wi-Fi
  • The operating systems he used on personal computers, and the applications they ran—such as Microsoft Office, Outlook, Internet Explorer 7, Skype, and an app for syncing workout data from his wearable device
  • Henn's mobile phone number, unique device identifiers (UDIDs), model numbers, operating system versions, and cellular provider
  • The addresses of e-mail and VPN servers and personal e-mail services
  • Every website he visited and how often
  • Cookies used to read paid websites
  • Places he might be planning to travel
  • The general content of Web search queries and which sites he visited as a result
  • E-mail addresses and phone numbers he looked up online
  • His patterns of activity—when he was working, using his computer for non-work purposes, or was active on a smartphone
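The last item in that list, patterns of activity, needs nothing more than timestamps. A minimal sketch follows, assuming each captured request has already been labeled with a timestamp and a device (for instance from its User-Agent header, an assumption the article does not spell out); it tallies requests per device and hour of day. The sample events are invented.

```python
from collections import Counter, defaultdict
from datetime import datetime


def activity_profile(events):
    """Tally captured requests per device and hour of day.

    events: iterable of (ISO 8601 timestamp, device label) pairs; how each
    request gets its device label is assumed, not shown.
    """
    profile = defaultdict(Counter)
    for stamp, device in events:
        profile[device][datetime.fromisoformat(stamp).hour] += 1
    return profile


if __name__ == "__main__":
    sample = [
        ("2014-04-15T09:05:00", "laptop"),
        ("2014-04-15T09:40:00", "laptop"),
        ("2014-04-15T21:15:00", "iphone"),
    ]
    for device, hours in activity_profile(sample).items():
        print(device, dict(hours))
```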
Voluntarily opening up your online life to this kind of monitoring is not for the fainthearted, but the exercise was revealing.

“If you have even the foggiest idea of how technology works and you think about what you are actually doing online,” Henn said afterward, “you have probably realized some of this could happen to you. But going through it myself, it was still kind of shocking in the detail.” He also realized with surprise that anyone tracking his Internet usage "could actually know more about my own past than I did."

Porcello, a security veteran, was himself chastened by data leaks from applications he frequently used—and he pointed out just how hard security is, especially for smaller companies. "We just look for apps that work and trust them," he said, because they help get work done—and the average small business doesn't have the time or resources to run penetration tests against every piece of software it uses.

Our experiment also highlighted my own lapses in daily operational security; playing NSA for a few days has made me want to dive deeper into my own Internet traffic to see where my network might leak personal data. That’s not because I’m concerned about being a government surveillance target, but because I am concerned about what I, my children, and even my parents expose about ourselves online, even when we aren’t doing anything obviously wrong. Even if I make sure every application on every device in my house is up to date and do everything I can to lock things down, all I’m doing is minimizing my potential exposure, not removing it altogether.

Surveillance technology has become a commodity these days. While the NSA has invested untold billions to build its Internet collection capability, most users face more imminent threats of being surveilled while eating lunch in a mall food court by someone with a few hundred dollars' worth of mobile hardware and some open-source tools. And businesses are at risk of widespread breaches by anyone with a thousand bucks and physical access to the corporate network.

Is the Internet a safer place than it was before we knew about PRISM? In some ways. But for the vast majority of people online, a little paranoia remains a very healthy thing.

The basic summary of the article is "be aware of what you do online, and who does what with your information," which appears to be common sense. However, it is disquietingly revealing just how often that information can be leaked anyway.