Mapping human behavior on the internet by collecting data that is, in fact, extremely private has become the basis for one of the world’s biggest economies. Here you can read about the business model that involves sucking data from the majority of the internet’s everyday users. Here we write about how the actual data collection is done. And in this article, we take a look at what the data is used for. But now let’s take a moment to identify which tech companies run the marketplace of behavioral data that the internet has been transformed into, and to look at the absurdly large amounts of data they collect.
Let’s start with the internet service providers. It’s pretty obvious that they keep track of what you do online (unless you’re using a VPN, of course). Nor is it particularly strange that they do this, because in most countries they’re required by law to log your traffic. That doesn’t mean all internet service providers make an extra buck by selling the data on, but in a country like the USA it’s extremely common. An investigation by Vice discovered that it was even possible to buy people’s geographical location in real time. And according to a report by the Federal Trade Commission (FTC) in the USA, at least six of the largest internet service providers map their customers’ internet behavior, and the privacy choices they offer those customers are an illusion.
So what else is there? Payment services, for one: PayPal has been reported to have terms and conditions that are longer than Shakespeare’s Hamlet, which is a good indication that its data collection is somewhat excessive. Then there are the apps on your phone. Washington Post journalist Geoffrey A. Fowler added up the words in all the privacy policies on his phone and arrived at around 1 million, or twice as long as Tolstoy’s War and Peace, if we’re going to continue the comparison with classic literature. And yes, user agreements this long mean extensive data collection. When it comes to apps, location data is one of the most attractive items. And in this particular category, there’s no limit to the sensitivity of the data that’s sold to the highest bidder: visits to medical clinics and religious institutions are among the basic products in a marketplace where people’s physical movement patterns bring in 12 billion dollars a year. And don’t think you’re immune because you’ve switched off location services.
For the sake of simplicity, let’s use Meta as an example. Its business model includes paying its way out of court cases. This is no problem for it financially, but every settlement tells us a bit more about its methods. In a single settlement in 2022, for example, it paid 37 million dollars after tracking 70 million users despite their having switched off location services. Even more expensive was the settlement with those affected by the Cambridge Analytica leaks, where Meta agreed to pay 725 million dollars after leaking data that included private conversations. Meta deserves a closer look on its own. You’ll probably agree once you’ve read the next few paragraphs.
Meta: doesn’t even know how much data it collects, where it goes or how it could be deleted.
Both Google and Meta offer you, as a user, the opportunity to review and control the data the company has collected about you. But this gives a false impression and is far from the whole truth. Meta doesn’t even want to reveal in court how much data it has. In a hearing linked to the Cambridge Analytica scandal, the company agreed to share the data that can be found under ‘Download Your Information’ but argued that data from ‘non-consumer parts of Facebook’ should be kept out of the courtroom. When the court disagreed and demanded an answer from two of Meta’s heads of development, they replied that not even Meta knows exactly how much data it has on people: “I don’t believe there’s a single person that exists who could answer that question.”
In spring 2022, leaked documents painted the same picture, with employees at Meta admitting that “We do not have an adequate level of control and explainability over how our systems use data”. Vice magazine published parts of the leak in which Meta employees compared the company’s systems to ink poured into water.
“We’ve built systems with open borders. Imagine you hold a bottle of ink in your hand. This bottle of ink is a mixture of all kinds of user data: third-party data, first party data, sensitive data. You pour that ink into a lake of water (our open data systems; our open culture). And it flows everywhere. How do you put that ink back in the bottle? How do you organize it again, such that it only flows to the allowed places in the lake?”
What emerges is the picture of a Meta without control over its (your) data. All that remains is to try to work out how much data it actually has. Over the years, there have been indications that the quantity is quite simply absurd. When ProPublica mapped Facebook’s data collection, it turned out that as early as 2016, Meta had a dizzying 52,000 unique attributes that it used to categorize people with the help of machine learning. Meta certainly wants to give the impression that its data collection primarily comes from users’ activity on its own platforms. But you only have to read about scandal after scandal after scandal where Meta and data leaks have gone hand in hand to get a completely different picture. The leaks are often linked to the technology once called Facebook Pixel: the ad system that millions of sites use and that lets Meta reach far beyond its own apps when it feeds its AI and machine learning systems with data.
Meta collects information about customers who've bought pregnancy tests and sought consultations for erectile dysfunction. This applies to people all over the world, regardless of whether or not they have a Facebook account.
To put it simply, Meta’s Pixel system means websites give Meta access to how their visitors behave (what they buy, what they avoid, what texts they read, what videos they watch and so on). In return, the sites get to use Meta’s total data collection to tailor and target their ads, both on Meta’s platforms and in its ad system. An investigation by The Markup found that one in three of the world’s 100,000 most popular websites were linked to Meta Pixel. It’s this infrastructure that lets Meta keep track of internet users all over the world, regardless of whether or not they have a Facebook account.
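To make the mechanism concrete, here is a minimal Python sketch of what any third-party tracker can do once its script sits on many sites. It is not Meta’s code. The `PixelEvent` fields, the `build_profiles` function and the example domains are all hypothetical, and the shared browser identifier stands in for whatever cookie or ID a real tracker sets. The point is only that events from unrelated websites arrive keyed to the same identifier and can be merged into one profile.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelEvent:
    """One request a tracking script sends to the third party (hypothetical fields)."""
    browser_id: str  # identifier the tracker stored in the visitor's browser
    site: str        # the website that embedded the script
    action: str      # e.g. "PageView" or "Purchase"
    detail: str      # page path or product name


def build_profiles(events):
    """Group events from many unrelated sites by the same browser identifier."""
    profiles = defaultdict(list)
    for event in events:
        profiles[event.browser_id].append((event.site, event.action, event.detail))
    return dict(profiles)


# Hypothetical events arriving at the tracker from three different websites.
events = [
    PixelEvent("c-123", "pharmacy.example", "Purchase", "pregnancy test"),
    PixelEvent("c-123", "news.example", "PageView", "/health/fertility"),
    PixelEvent("c-456", "shoes.example", "PageView", "/running"),
    PixelEvent("c-123", "clinic.example", "PageView", "/book-appointment"),
]

profiles = build_profiles(events)
for site, action, detail in profiles["c-123"]:
    print(site, action, detail)
```

The pharmacy, the news site and the clinic never share data with each other here, yet the collector ends up with all three visits in a single list.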
When a leak via Meta Pixel is revealed, the newspaper headlines are often about how sensitive purchases or online behavior could be linked to real people via email addresses or phone numbers. For example, it has emerged that the Pixel technology registers data about pharmacy customers who bought HIV tests and pregnancy tests and who sought consultations for erectile dysfunction. But there’s actually no difference between a ‘scandalous leak’, where personal data such as email addresses has been leaked together with online behavior, and the constant flow of data that tech companies suck in every day, which can be linked to people by other methods: IP addresses, cookies and other techniques. No matter how often the tech giants excuse their actions by saying the data in their profiles is anonymized, once there is enough data about someone, it becomes impossible to keep them anonymous. It takes no time at all to put together the jigsaw that reveals who’s hiding behind the data, and then it’s de-anonymized. This is especially easy when your entire business model is based on huge AI and machine learning systems whose only purpose is to categorize everything an individual does and build a profile of them.
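The jigsaw can be illustrated with a small Python sketch of a so-called linkage attack, which joins an ‘anonymized’ dataset to a named one on a few ordinary attributes. All records, names and field names below are invented for illustration. Real re-identification draws on far more signals, but the principle is the same: the more attributes you combine, the fewer people match, until only one is left.

```python
# "Anonymized" profiles: no names or emails, only behavior plus a few ordinary attributes.
# All data here is invented for illustration.
anonymous_profiles = [
    {"profile": "p1", "zip": "10115", "birth_year": 1988, "gender": "F", "visited": "fertility clinic"},
    {"profile": "p2", "zip": "10115", "birth_year": 1990, "gender": "M", "visited": "HIV test page"},
    {"profile": "p3", "zip": "20095", "birth_year": 1988, "gender": "F", "visited": "mosque"},
]

# A separate, hypothetical dataset that does contain names (e.g. a purchased customer list).
named_records = [
    {"name": "Alice", "zip": "10115", "birth_year": 1988, "gender": "F"},
    {"name": "Bea", "zip": "10115", "birth_year": 1975, "gender": "F"},
    {"name": "Carl", "zip": "10115", "birth_year": 1990, "gender": "M"},
    {"name": "Dana", "zip": "20095", "birth_year": 1988, "gender": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def reidentify(profiles, records, keys=QUASI_IDENTIFIERS):
    """Return profile -> name wherever exactly one named record matches on all keys."""
    matches = {}
    for profile in profiles:
        candidates = [r["name"] for r in records if all(r[k] == profile[k] for k in keys)]
        if len(candidates) == 1:  # a unique match means the profile is no longer anonymous
            matches[profile["profile"]] = candidates[0]
    return matches


print(reidentify(anonymous_profiles, named_records, keys=("zip",)))  # {'p3': 'Dana'}
print(reidentify(anonymous_profiles, named_records))  # {'p1': 'Alice', 'p2': 'Carl', 'p3': 'Dana'}
```

With only a postcode, just one profile stands out. Add birth year and gender, and every ‘anonymous’ profile in this toy dataset gets a name.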
Even though Meta has access to data about its 2 billion users and also tracks people on one in three of the world’s most popular websites, the company isn’t satisfied with that. As well as collecting its own data, it also buys extra data from what are known as data brokers. The total amount of data collected gives Meta the ability, which it described in leaked documents, to target ads at people based on how they will behave, what they will buy and what they will think.
The scandals, the leaks and the absurd figures about how much data Meta actually collects give us a good picture of the company. But what perhaps says the most about the company’s values and ambitions are the methods it uses. It’s in the technical details that it becomes clear that surveillance is the true core of Meta’s business model.
Meta collects the movements you make with your mouse, the messages you've written on social media but never posted and how you move when you carry your mobile phone, even when you've clicked to refuse sharing location data.
Meta isn’t exactly known for being transparent about how it collects data and what it does with it. But you can use a back door into its thinking by reading its patents. One of them, called Offline Trajectories, is about techniques for predicting when you’re about to lose signal and go offline. Several of the company’s patents relate to this; in other words, they are about finding ways to locate you even if you resist. One patent is called Location Prediction Using Wireless Signals on Online Social Networks, and just as it sounds, it’s all about using the strength of your Wi-Fi connection or reading your Bluetooth signals to locate you. In the same way, Meta has used other people’s phones near you to identify your position even when you have location data switched off. Meta has been sued for allegedly circumventing Apple’s App Tracking Transparency and has itself admitted that it can track people even when location services are switched off.
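How signal strength can give away a position can be sketched with a standard textbook technique. This is not necessarily the method in Meta’s patent. The log-distance path-loss model converts a received signal strength (RSSI, in dBm) into a rough distance. A weighted centroid of nearby access points with known locations then gives a position estimate. The reference strength at 1 metre (-40 dBm) and the path-loss exponent (2.7) are assumed values, and real indoor conditions vary widely.

```python
import math


def rssi_to_distance(rssi_dbm, rssi_at_1m=-40.0, path_loss_exponent=2.7):
    """Log-distance path-loss model: d = 10 ** ((P_1m - RSSI) / (10 * n)), in metres.

    The default reference power and exponent are illustrative assumptions.
    """
    return 10 ** ((rssi_at_1m - rssi_dbm) / (10 * path_loss_exponent))


def weighted_centroid(observations):
    """Estimate (x, y) from (x, y, rssi) tuples of access points with known positions.

    Stronger (closer) access points get more weight: weight = 1 / estimated distance.
    """
    total_w = x_sum = y_sum = 0.0
    for x, y, rssi in observations:
        w = 1.0 / rssi_to_distance(rssi)
        total_w += w
        x_sum += w * x
        y_sum += w * y
    return x_sum / total_w, y_sum / total_w


access_points = [  # (x metres, y metres, measured RSSI in dBm), made-up values
    (0.0, 0.0, -50.0),
    (20.0, 0.0, -70.0),
    (0.0, 20.0, -72.0),
]
print(rssi_to_distance(-40.0))           # 1.0
print(weighted_centroid(access_points))  # estimate pulled towards the strongest AP at (0, 0)
```

No GPS is involved: a list of nearby networks, their signal strengths and a database of where those networks are is enough for a usable estimate.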
But nothing has revealed the extent of Meta’s data collection as clearly as the aftermath of the Cambridge Analytica scandal, in which 87 million users’ metadata and personal messages went straight to an analytics company that used the information to influence the 2016 US presidential election. Among other things, it emerged that Meta reads and registers the movements of your computer mouse and the public Wi-Fi networks near tracked mobile phones. It uses cell towers and GPS to work out where you are. And it logs your battery percentage, available storage space, installed plug-ins and connection speed to identify you. The company also admitted that it uses metadata from photos you take with your phone (data that isn’t visible to the naked eye but is embedded in the image files) to identify and track you. Meta spokespeople also confirmed that it registers IP addresses and buys data from data brokers to build clearer personal profiles.
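A minimal Python sketch of device fingerprinting shows why logging details like storage space, plug-ins and connection speed helps identify someone. The attribute names and values are hypothetical. Real fingerprinting scripts collect many more signals and treat volatile ones, such as battery level, differently. The core idea is simply to hash a canonical list of attributes into an identifier that comes out the same on every site running the same script.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash device attributes into a short, repeatable identifier.

    sort_keys=True makes the result independent of the order in which the
    attributes were collected, so the same device yields the same ID everywhere.
    """
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical attributes a script might read from one device.
device = {
    "plugins": ["pdf-viewer", "password-manager"],
    "free_storage_gb": 37,
    "connection_mbps": 48.5,
    "screen": "1170x2532",
    "timezone": "Europe/Stockholm",
}

seen_on_site_a = fingerprint(device)
seen_on_site_b = fingerprint(dict(reversed(list(device.items()))))  # same values, other order
other_device = fingerprint({**device, "free_storage_gb": 12})        # one value differs

print(seen_on_site_a == seen_on_site_b)  # True
print(seen_on_site_a == other_device)
```

No cookie is stored, so clearing cookies does not change the identifier. Only changing the device’s characteristics does.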
Meta's patents reveal the core of its business model and its ultimate ambitions. One of the patents even aims to predict when you're going to die.
Meta has also been exposed for using the accelerometer to track people. This is the hardware in your phone that measures its movement and direction and that, for example, lets it switch between portrait and landscape mode. By mapping movement patterns and linking them to other apps on your phone, Meta has been able to identify how you move and when you visit different types of places. The technology has even been used to match your phone with other phones close to you, and suddenly it becomes extremely clear that the tech companies have access to technologies far beyond the obvious in their hunt for personal data. In another invasive move, Meta has monitored what people have written but not posted in various online forms. Meta calls these unposted thoughts ‘self-censorship’. We’ll say that again: text you wrote but, for whatever reason, chose not to post has been saved and logged by Meta. But none of this truly comes as a shock anymore. Meta also has patents for technology that can predict when people go through ‘life-changing events’ by analyzing everyday routines and how your sleep changes (with your phone on your nightstand, anything’s possible). One of these patents even aims to predict when you’re going to die. Welcome to a brave new world.
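The Python sketch below gives a rough idea of how raw accelerometer readings turn into behavioral data. It guesses whether a phone is still or moving from how much the overall acceleration varies within a short window. The thresholds and the synthetic samples are assumptions chosen for illustration. Production activity recognition typically relies on trained models and many more features.

```python
import math
import statistics


def magnitudes(samples):
    """Overall acceleration (m/s^2) for each (x, y, z) reading, independent of phone orientation."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]


def classify_window(samples, still_threshold=0.3, walk_threshold=4.0):
    """Very rough activity guess for one window of readings; thresholds are assumptions."""
    spread = statistics.pstdev(magnitudes(samples))  # how much the acceleration fluctuates
    if spread < still_threshold:
        return "still"
    if spread < walk_threshold:
        return "walking"
    return "running"


# Synthetic windows of 20 readings each (gravity is about 9.81 m/s^2).
on_nightstand = [(0.0, 0.0, 9.81)] * 20
walking = [(0.5, 1.0, 9.81 + 2.0 * math.sin(i)) for i in range(20)]

print(classify_window(on_nightstand))  # still
print(classify_window(walking))        # walking
```

Even this crude signal shows when a phone lies untouched, for example overnight, and when its owner starts moving again.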
Google – with a monopoly in terms of both search engine and web browser, it knows everything about everyone.
Of course, even if Meta appears to be extremely good at data collection, it faces stiff competition from Google. While Meta Pixel is present on one in three sites, its Google equivalent, Google Analytics, is found on 74%. It works in roughly the same way. When a website has Google Analytics installed to measure and analyze its traffic and link it to Google’s ad system for more accurate marketing, Google also gets access to how visitors behave. But that isn’t the only tool in Google’s belt.
The company also provides free fonts for websites. This is an offer that 60 million sites have found difficult to refuse. And just like the company’s analytics tool, these come with the same demand for something in return: that Google can collect information about site visitors. On websites using Google Fonts, it can monitor visitors and how they behave by registering their IP address and then cross-referencing it with all the other information it has that’s connected to that particular IP address. The same sort of collection takes place wherever there’s a Google search box embedded in a website (this also applies wherever there’s a ‘share’ button from Facebook, Twitter or Instagram). Overall, this gives Google an enormous flow of data. But we all know this is only the start.
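What a third-party resource server can learn from ordinary requests can be sketched in Python. The log format, IP addresses and sites below are hypothetical: they use reserved documentation address ranges and .example domains. This is not a description of Google’s actual logging. It only shows that a server embedded on many sites receives the visitor’s IP address with every request. Under browsers’ default referrer policy it typically also receives the origin of the embedding page, which is enough to reconstruct a browsing trail per IP address.

```python
from collections import defaultdict

# Hypothetical request log on a third-party server: client IP | Referer | User-Agent.
# The IP arrives with every request; the Referer typically carries at least the
# origin of the page that embedded the resource.
REQUEST_LOG = """\
203.0.113.7|https://pharmacy.example/|Mozilla/5.0 (iPhone)
198.51.100.23|https://news.example/|Mozilla/5.0 (Windows NT 10.0)
203.0.113.7|https://jobs.example/|Mozilla/5.0 (iPhone)
203.0.113.7|https://dating.example/|Mozilla/5.0 (iPhone)
"""


def sites_per_ip(log_text):
    """Reconstruct which sites each IP address has visited, in request order."""
    trail = defaultdict(list)
    for line in log_text.splitlines():
        ip, referer, _user_agent = line.split("|")
        trail[ip].append(referer)
    return dict(trail)


for ip, sites in sites_per_ip(REQUEST_LOG).items():
    print(ip, sites)
```

Keying on the IP address alone is crude, since households and mobile carriers often share addresses. That is why such data is most useful when cross-referenced with other information tied to the same address.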
In 2022, Google paid 400 million dollars in a single settlement – then carried on with its core business: collecting personal information.
Nine out of 10 people who use a search engine do so by googling. This means Google has an insight into the innermost thoughts and life of virtually every internet user in the world. Here you can read how Google doesn’t even need you to be logged in to know it’s you doing the search. And it doesn’t end there. Seven out of 10 browsers used today are Google’s Chrome. Here you can see a comic strip explaining how Google uses its browser to google you rather than you using it to look things up. Add YouTube and Gmail, and what Google knows about the world and its inhabitants is almost limitless.
Just like Meta, Google has a huge budget for legal settlements (in a single settlement in 2022, it laid out a cool 400 million dollars before continuing to collect data as before). But even if it can cope with this financially, trends indicate that Google will have to start adapting. Google Analytics has effectively been outlawed in several countries. In addition, third-party cookies are under enormous legal pressure, and Google itself has said it will phase out this type of tracking by 2024 at the latest. But as fast as the laws catch up with the tech giants (or faster), the companies shift their focus to new ways of collecting data. Because, don’t forget, that’s their core business. As Larry Page, one of Google’s founders, said in an interview way back in 2001: “Personal information is Google’s business.”
In recent years, Google has felt forced to take a number of measures to appear as if it cares about privacy, even though its entire business model is built on exactly the opposite. For instance, it has announced that it deletes data after 18 months. Even ignoring the fact that this means your digital footprint is saved for 18 months at a time, the obvious question is: ‘Does it really matter what Google says it’s doing?’ When Washington Post journalist Geoffrey A. Fowler contacted Google and asked why it was keeping 167 GB of data about him (or 83,500 Stephen King novels, if you prefer), the company’s only answer was: “We’ve long focused on minimizing the data we use to make our products helpful.” When abortion laws changed in the USA, Google said it would proactively delete ‘particularly personal’ data about the places people visited, such as abortion clinics and hospitals. A year after this statement, nothing had changed. It’s worth repeating: personal information is Google’s business. Google can’t entirely ignore the world around it, but that business model means it will probably keep responding to new legal requirements and public pressure by finding new ways of collecting data. At least until it changes its business model.
There are more tech companies that deserve a mention. TikTok has been accused of collecting large quantities of data and sharing it with the Chinese state. It also states openly on its own site that it collects things like keystroke patterns and typing rhythm. Amazon has been exposed collecting absurdly large amounts of data both in its digital ecosystem and in physical stores. And you really don’t want to know where your credit card transaction data goes. As we’ve already said, the vast majority of the internet has been transformed into an infrastructure where the collection of personal data is used to increase both revenue and power. And it’s going to take strong resistance to overturn that trend.