Does Google know what you're doing online? Eighty-eight percent of the time, the answer might be yes. Earlier this month a group of graduate students at UC Berkeley released a detailed report on the extent of online tracking and the disconnect between reality and user expectations. The data in the report came from several sources, including examinations of popular and randomly selected websites, surveys, and Freedom of Information Act (FOIA) requests. The website examinations looked for "web bugs," the term the researchers apply to the various methods third-party tracking services use to monitor activity online.
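To make the idea of scanning a page for web bugs more concrete, here is a minimal Python sketch. It is not the researchers' method, which this post does not describe. It assumes a web bug is either an external script or an image declared as 0x0 or 1x1 pixels that is loaded from a different host than the page itself. The names `WebBugFinder` and `find_web_bugs`, the sample page, and the `.example` hosts are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class WebBugFinder(HTMLParser):
    """Collect third-party hosts referenced by tiny images or external scripts."""

    def __init__(self, page_host):
        super().__init__()
        self.page_host = page_host.lower()
        self.third_party_hosts = set()

    def _third_party_host(self, url):
        """Return the host of `url` if it looks third-party, else None."""
        host = (urlparse(url).hostname or "").lower()
        # Relative URLs carry no host, so they are first-party.
        if not host:
            return None
        # Crude assumption: the page host and its subdomains are first-party.
        if host == self.page_host or host.endswith("." + self.page_host):
            return None
        return host

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        src = attrs.get("src")
        if not src:
            return
        host = self._third_party_host(src)
        if host is None:
            return
        if tag == "script":
            self.third_party_hosts.add(host)
        elif tag == "img":
            # Assumption: images declared as 0x0 or 1x1 are tracking pixels.
            tiny = ("0", "1")
            if attrs.get("width") in tiny and attrs.get("height") in tiny:
                self.third_party_hosts.add(host)


def find_web_bugs(html, page_host):
    """Return a sorted list of third-party hosts that look like trackers."""
    finder = WebBugFinder(page_host)
    finder.feed(html)
    return sorted(finder.third_party_hosts)


# Hypothetical page served from news.example.
sample = """
<html><body>
<img src="/logo.png" width="200" height="50">
<img src="http://tracker.example/pixel.gif" width="1" height="1">
<script src="https://analytics.example/t.js"></script>
<script src="http://static.news.example/app.js"></script>
</body></html>
"""

print(find_web_bugs(sample, "news.example"))
# → ['analytics.example', 'tracker.example']
```

A real survey would need much more than this: images sized through CSS, scripts injected at runtime, and cookies set by third-party responses are all invisible to a static scan of the HTML.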
The researchers found that every one of the top 100 websites contained at least one web bug. They also found that a handful of third-party tracking services appeared across a wide selection of sites. For example, in an examination of nearly 400,000 sites, they found that 88% served Google tracking cookies. Thus, quite a lot of data is being aggregated by a small group of companies. The researchers also examined and categorized the privacy policies of the top 50 websites and found them to be largely contradictory or confusing. For example, all of the websites say that they won't share the information collected about users with third parties, but they allow exceptions for affiliates and contractors. When the researchers investigated the affiliate relationships of these companies, they found that the websites in question had an average of 297 affiliates each. Further, only 36 of the top 50 sites mentioned third-party trackers in their privacy policies, and in each case the policy stated that the practices of third parties were outside its scope.
The report highlights a significant problem with privacy online: a lack of transparency about what information is collected about users and what is done with it. Users have access only to a vaguely worded privacy policy that may mislead them even if they do read it. Third-party tracking is particularly problematic because, on top of these usual barriers, the user is unlikely to know it is happening at all: it is normally accomplished using tiny, invisible images or JavaScript code. Additionally, it is seldom clear what will become of the data in the long term. Is it deleted once it is no longer useful, or retained for as-yet-unknown uses? The report's major recommendation is for regulation requiring companies to provide access to collected data and explicit notice of the purposes and destinations of all data.
The recommendations are not new, and critics have generally argued that such requirements would either frustrate online business or be of little benefit to users. I believe the requirements may indeed frustrate certain kinds of online businesses, but in the long run they will benefit the majority. Greater transparency and accountability will increase user trust in online businesses. In the short term this may reduce revenue from targeted advertising and data collection, but businesses can find new sources of income or adapt the old ones to respect users' privacy. The second argument is usually that users don't really want the kind of privacy the report advocates and are content as long as no actual harm comes from the collection of information. The surveys and FTC complaints summarized in the Berkeley report contradict this argument: users are concerned about their privacy and want greater control. The history of FTC complaints shows that when users are made aware of invasions of their privacy, they act to hold the company accountable.