Data scientist here! In addition to the data points others have mentioned, there is actually a lot more data available than you would think in the form of metadata. We call the process feature engineering - essentially building a set of inputs that help determine an output, or prediction. How long you spend in the app, how long you stay on a screen before changing, how long you view a TikTok before swiping, which of the default settings you change and what you change them to: all of this is used in machine learning models to help build a more accurate advertiser profile for you. Even if you don't volunteer data about yourself, your behavior in a way informs on you without you realizing it. Through inference, a machine learning model could deduce your age fairly accurately based on your behavior, for example.
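To make "feature engineering" a bit more concrete, here is a minimal sketch in Python. The event log, the event names (`video_view`, `screen_dwell`, `setting_changed`) and the feature names are all made up for illustration; real pipelines are far larger, but the idea of turning raw behavior into model inputs is the same.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical event log: (user_id, event_type, seconds) tuples.
events = [
    ("u1", "video_view", 2.5),
    ("u1", "video_view", 31.0),
    ("u1", "screen_dwell", 12.0),
    ("u2", "video_view", 1.0),
    ("u2", "video_view", 1.5),
    ("u2", "setting_changed", 0.0),
]


def build_features(events):
    """Aggregate raw behavioral events into one feature row per user."""
    per_user = defaultdict(lambda: defaultdict(list))
    for user_id, event_type, seconds in events:
        per_user[user_id][event_type].append(seconds)

    features = {}
    for user_id, by_type in per_user.items():
        views = by_type.get("video_view", [])
        features[user_id] = {
            # How long the user watches a video before swiping, on average.
            "avg_view_seconds": mean(views) if views else 0.0,
            "views": len(views),
            # Total time spent sitting on screens.
            "total_dwell_seconds": sum(by_type.get("screen_dwell", [])),
            # How many default settings the user touched.
            "settings_changed": len(by_type.get("setting_changed", [])),
        }
    return features


features = build_features(events)
```

Each row in `features` is the kind of input a model could be trained on to predict something you never told the app, such as an age bracket.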
And if this sounds dystopian to you.
I anecdotally got into a CEO data conference, where leaders were discussing strategy and tactics. Biggest topic of the day was, why can't I track how many times someone sees my physical store/billboard/sign and makes a decision. Geofencing + your cellphone GPS isn't accurate enough for these guys, they want to know how long you stared at the store, what made you move in, what demographics you belong to, and how can they maximize your likelihood to purchase more stuff.
Why does this matter? People are more likely to buy more stuff in a store wandering around than on a market place where they just swap tabs to get the same thing from somewhere else.
If I can make my store front like temu to get you in and keep you there, then it's likely you'll be interested in buying more stuff you didn't know you wanted.
Yep, I've been at data science conferences where I heard people talking about tracking position in a store using things like Apple AirTags for the same reason.
I can’t wait for that to start (/s) so I can completely mess up the metrics by standing around for 10 minutes in aisles I have no reason to go down, thinking “what am I forgetting?” Or just completely blanking out for a bit due to choice overload (which happens about 10 times every grocery trip, and I don’t “browse” other stores, I go looking for specific things or I don’t go at all).
I hate that some stores have started to waste consumer time by making stuff an absolute pain to find, specifically to get you wandering around the store looking for it. Best Buy is really bad about this near me. Good fucking luck finding an SD card or hard drive without asking for help. They are in 6 different places, each, based on what they think you might need for whatever application (importantly, no duplicate products, so if they have a 5tb drive in one place, that specific drive won’t be in another place). I avoid going there now whenever possible, as I don’t support consumer-hostile practices. Even the employees think it’s stupid because it makes them work more, and everyone complains because it’s obnoxious.
Art Vandelay
I thought you were an importer
from industries import vandelay
:p
So, the goal typically is to gather as much information about a user as possible in order to define a profile that advertisers will use to serve ads that are more relevant to the end user? Is there any other end goal, such as to build a better app or inform decisions that will ultimately lead to a better user experience?
Aside from what everyone else has said, also:
- Location data (your phone might be telling companies where you drive/go, to the point where they know your routine and such).
- What commercials you watch; even commercials you watch on devices other than your smartphone.
- Search history.
- Call history.
- Audio/video recordings taken without your knowledge or consent.
Don't worry, they gave everyone a stale donut and an apology email as part of their class action lawsuit punishment.
Would they not have had to give access to location services for this to happen though? Google is very good at giving me a "only while using this app" option for this kind of stuff now.
They surely agreed to it; the mixup is that people in general don't realize how much data Tim Hortons wants to collect, and how often.
Tim Hortons should probably just know which Tim Hortons you're closest to when you go to place an order, and that's about it. There's no reason they should even be allowed to ask to track you all day every day, even if you agree.
That's sort of the gist of it.
I installed the McDonalds app years ago and it asked for location permissions. I turned them off. But it didn't want to let me place a pickup order or something unless I was within a certain range of the restaurant. So sure, I'll turn it on. This was a version of Android before "Only this time" options existed.
Of course, I forgot to turn it back off. A few days later, I got a notification from the app that I was near a McDonalds -- how about ordering some fries?
I uninstalled the app and never looked back. Actually, since then, I've been kind of avoiding McDonalds. They price their stuff knowing that people will get discounts through the app...but no way am I getting that again.
That's pretty gross, but the fries are tasty!
A surprising number of restaurants ask for the "all the time" permission and hide it with "so we know when you're almost here".
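For anyone wondering how the "you're near a restaurant" trigger works, it is usually some form of geofence check. Below is a minimal sketch, assuming the app has your coordinates and a store's coordinates; the function names and the 0.5 km radius are my own invention, not anything from a real app.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def inside_geofence(device, store, radius_km=0.5):
    """True if the device (lat, lon) is within radius_km of the store (lat, lon)."""
    return haversine_km(*device, *store) <= radius_km
```

With "all the time" location permission, a check like `inside_geofence` can run whenever the app receives a location update, not just when you are placing an order.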
Phone number, email, anything else you put in, plus device and connection data. Also, depending on the app, it could steal passwords, cookies, banking info, etc.
TikTok in particular grabs a list of other installed apps and your entire contacts list as well, IIRC.
Many social media/messenger apps take your contact list, and if you deny the request they will disable features.
Apps are also interested in how long you stay on a particular page, whether you tap on any ads, and how often you visit particular parts of the application.
The theft is not generally that they're collecting the data, the theft comes from them not paying you for it, and also usually not telling you they are collecting it. Taking something of value from someone without compensation and permission.
In terms of what they do with it, it isn't really important since the theft has already happened. But usually the data is sold to advertising agencies or other application developers, sometimes it is used for research, and it can often make its way to illegal black markets as well, depending on the source of the data.
they can also collect much more depending on which permissions the app has. IIRC some sensors, like movement, are not behind permissions.
some collect your behaviour online to extrapolate your personality and habits, and to predict and manipulate you too. that's scarier imo.
The biggest problem I have with my data being collected, analyzed and used is that it will almost certainly be used to teach an ML model how to better manipulate people like me - the people who are privacy conscious and are trying as much as possible to reduce their fingerprint.
That data is invaluable, and if a way to target even people like that exists, which it probably does since we're only human after all, the ML model will eventually figure it out. And they have literally billions of people to experiment and learn on.
Now, we already know from a few leaked studies made by Facebook that they can already manipulate people pretty well into mostly whatever they choose. Take a hypothetical situation where you get a crazy out-of-touch billionaire, who decides to buy a large social network company, and then decides "Hey, I really want this candidate to win. Tune up the algorithms!".
And the ML models will get a clear goal, using methods that have already been proven to work pretty well at influencing user behavior. And any data you give them helps the model fine-tune itself to influence people like you. This would also be really hard to prove, because ML models are effectively black boxes that are really hard to reverse engineer, and proving that one was trained to do this is AFAIK almost impossible.
I don't want no part in that. Thankfully, all the large social networks have CEOs that are reasonable and would never try something like that, right?
And one more thing - you may not think that data about your behavior is of interest to anyone right now. But look at China and their Social Credit. And imagine how, e.g., the Holocaust would have turned out if the government had had access to all the data, opinions and profiles of people that are being collected now.
Oh, you mentioned you sympathize with the Jews three years ago in a private message? Well, let's hope the country you live in never ends up in a situation where that could be a huge problem for you or your family.
So, every time a site offers a "personalized, curated list" for you (e.g. Google search results or YouTube recommended videos), assume you are potentially being manipulated, and avoid the site altogether, because there's no other way to prevent it. The ML model knows that you know, and is already trying to figure out how to manipulate people who are taking care not to be. And if there is a way, it will figure it out with some success.
The potential future authoritarian government has been my primary concern when it comes to data collection and profiling by corporations like Google and Meta for years. The governments don't even have to build their information gathering networks, although they still will, but so much of the surveillance has been done for them, goes back years (literally an entire lifetime for many people now), and is just a request away. I can't judge how the climate will be in two years, let alone a decade or two from now, but that information isn't going anywhere.
I find the motion sensing and GPS tracking to be the creepiest. Using motion sensing they can know when you put your phone down and pick it up, whether it was screen down or face up, and when you are walking, running, driving, etc. Combined with GPS, it can be used to pretty accurately judge when you wake up, where you go, and how you get there. Lots of apps also don't "close" when you swipe them away; they continue running in the background, so even if you have the setting "only collect data when using the app", they will still collect data until you close them in the background or force stop them.
It is even worse than that. Given the list of data you have provided it is actually possible to discern general activity. You can determine if you are playing video games, working out, watching TV, out on a date, hanging with friends. As long as your phone is in your possession, the patterns for every behavior have a distinct fingerprint for each person. With enough collection, they can be filtered and categorized.
Source: I am an analytical statistician.
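As a toy illustration of how motion data gives activity away, here is a minimal sketch that labels a window of accelerometer samples by how much the acceleration magnitude varies. The thresholds and labels are invented for the example; real systems learn per-person patterns from far richer data, but the principle is the same.

```python
from math import sqrt
from statistics import pstdev


def magnitude(sample):
    """Length of an (x, y, z) acceleration vector in m/s^2."""
    x, y, z = sample
    return sqrt(x * x + y * y + z * z)


def classify_window(samples, still_max=0.05, walking_max=3.0):
    """Label a window of (x, y, z) samples as 'still', 'walking' or 'running'.

    still_max and walking_max are made-up thresholds on the standard
    deviation of the acceleration magnitude, chosen only for illustration.
    """
    spread = pstdev([magnitude(s) for s in samples])
    if spread < still_max:
        return "still"      # phone lying on a table: only gravity, no variation
    if spread < walking_max:
        return "walking"    # moderate, rhythmic variation
    return "running"        # large swings in acceleration


phone_on_table = [(0.0, 0.0, 9.81)] * 10
in_a_pocket_walking = [(0.0, 0.0, 9.0), (0.0, 0.0, 11.0)] * 5
```

Run over a whole day of windows and combined with timestamps, even a crude classifier like this starts to outline when someone sleeps, commutes and exercises.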
I also read about how they can correlate data between users and devices, too; maybe you don't have location on, but your app can correlate accelerometer data from your device with matching data from the same time from another device on the bus that does have location on. Boom, now they know you ride that bus. Or: everyone connecting from a particular IP address visits a particular restaurant's menu site from a QR code. Pretty good chance, then, that that IP address is the restaurant's wifi. Now they can correlate all that data and find out who your friend group is. Even something as simple as knowing that you were near your friends for an extended period of time while they were in an Uber to a venue before a show can help them build a profile about you and your cohort's interests and behaviors.
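The bus example boils down to checking whether two motion traces recorded at the same time move together. Here is a minimal sketch using Pearson correlation; the device names, sample values and the 0.9 threshold are all hypothetical, and I'm assuming the traces are already time-aligned, which is the hard part in practice.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat trace carries no information to correlate
    return cov / sqrt(var_x * var_y)


def co_travelling(target, candidates, threshold=0.9):
    """Return names of candidate devices whose trace tracks the target's."""
    return [name for name, trace in candidates.items()
            if pearson(target, trace) >= threshold]


# Hypothetical accelerometer magnitudes sampled at the same timestamps.
you_no_location = [0.2, 1.0, 0.25, 1.1, 0.3, 0.9]
devices_with_location = {
    "rider_on_same_bus": [0.1, 0.9, 0.3, 1.2, 0.2, 0.8],
    "someone_at_home": [0.5, 0.4, 0.6, 0.5, 0.45, 0.55],
}

matches = co_travelling(you_no_location, devices_with_location)
```

Once a match is found, the matched device's location can stand in for yours, even though your own location permission was never granted.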
Yup. I love that I got my math degree, but it does give me an understanding of things like this that is usually miles ahead of my cohort's. It makes my skin crawl to see the kinds of things that these companies harvest. You mention restaurant QR codes. I'm sure not all of them do this, but it is so easy to build harvesting APIs into the websites that host those menus. I do the analytics work for my company, and it's striking how much even just the basic analytics tag harvests, let alone what you get from setting up specialized eventing or more invasive APIs.
That's true, although I believe you still have to give permission to an app to use this (at least on Android). Not to say that people won't accept things way too fast.
The enhanced permission api was a huge step forward but plenty of apps still just demand permissions up front and lock you out until you grant them
To your last point, yes. The average user doesn't even glance at the permissions before blindly accepting them. It is also true that an alarmingly high number of users/consumers /don't care/ about basic privacy concerns that affect things like targeted ads, PII, and information that could be used to affect things like credit score.
Along with what others said, things you are interested in, demographic data, etc. The content you choose to watch on tiktok or products you click on on temu reveals a lot of valuable information about what ads might be most effective on you so they can target ads to you.
Depending on permissions, just about everything.
The more worrisome of these would be all your contacts, your location (even with Location permissions denied it can still be extrapolated up to a point if the app is allowed access to information on "WiFi networks nearby") - which can be used to derive workplace, living place, hobbies and, when crossed with other people's data, even who you regularly meet with - call history, files on your phone (such as personal photos and stuff you downloaded), sites visited and, even more seriously, the ability to record what's being said around your phone, capture images, and track something as intimate as how and when your phone (and hence you, if it's in a pocket) moves.
All of this is beyond the whole tracking of app usage (what do you do, see and for how long in it) which at least makes some sense to track for quality improvement.
That said, what makes it a problem is not that the app can get that information from the phone's systems but that it can, without your authorization, send it all to a central server - if it couldn't do the latter, all that data capture for processing inside your phone would be absolutely fine.
simple, valid personal information can be valuable in aggregate. it is accumulated and sold to ad companies.
these apps are often given permission to look through your phone and report back other data, more than 'simple'. browsing/shopping history at best, account creds at worst. it's mostly for the same reason: advertising.
Everything. Basically, if it's not nailed down, they want to take it.
The short list of most common data taken would be app usage stats, not necessarily just for the app in question (eg, tiktok may pull data on how many hours of screen time other apps get, like YouTube or Instagram or literally anything else), GPS info, data about how often you handle your phone (from accelerometer readings), wifi networks including the bssid (mac address) of your router, which cannot be easily changed or masked, sometimes even data from your mic when you're not using the phone at all.
They know when you're sleeping, they know when you're awake, they know when you've been bad or good.... Oh wait, that last bit is Santa... Isn't it?
Anyways, I wouldn't be surprised if a few are bold enough to upload your pictures regardless of if you are posting the images, your browser history, security, device make/model, storage of your device, the list of files in storage, text messages...
Basically, anything that might help them identify you, what you do, where you work, when you work, how you travel, whether you're in a relationship, how happy you are in that relationship and how long it has been going on... Anything that might lead them to provide more targeted ads. Been in a relationship for a while and you seem happy? Check out these engagement rings. Already married? Here's some ads about parent stuff. Even something as simple as, hey, you're single and it's February, why not try Tinder or Grindr, or (insert app for your preference here).
They want to know everything there is to know so they can get you to buy more crap you probably don't need, for more than it's worth, and keep that economic gravy train rolling.
Also to add to bssid, it is possible (in the majority of cases) to get the exact (and i do mean exact) geolocation of the router whose bssid you have. See geomac by drygdryg on Github.
Fun story: I purchased several wireless access points from an eBay seller, years back, and when I brought them online, our geolocation services on all our phones thought we were several hundred miles away from where we lived for many months. I assume the bssid data was feeding the incongruency.
After a few months, however, whatever database was feeding our devices with bad geolocation data, was updated, and we were once again "located" in the correct spot.
The accuracy of these systems is incredible. They will use not only your own bssid but also those of complete strangers to figure out where you are without turning on GPS. If your personal bssid is weak but your neighbor's bssid is stronger, the system adjusts your position based on the relative signal strength of each bssid it detects, in the same way triangulation works with most radio signals.
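A rough way to picture the signal-strength weighting is a weighted centroid: every access point the phone can hear pulls the estimate toward its known location, and stronger signals pull harder. This is only a minimal sketch under my own assumptions; the `KNOWN_APS` table, the bssids and the coordinates are made up, and real location services use much more elaborate models.

```python
# Hypothetical database mapping bssid -> (latitude, longitude) of the router.
KNOWN_APS = {
    "aa:bb:cc:00:00:01": (40.0000, -75.0000),
    "aa:bb:cc:00:00:02": (40.0010, -75.0000),
}


def estimate_position(scan, known_aps=KNOWN_APS):
    """Estimate (lat, lon) from a WiFi scan of {bssid: rssi_dbm}.

    Each known access point is weighted by its received power converted
    from dBm to a linear scale, so stronger signals count for more.
    Returns None if no scanned bssid is in the database.
    """
    total_weight = 0.0
    lat_sum = 0.0
    lon_sum = 0.0
    for bssid, rssi_dbm in scan.items():
        if bssid not in known_aps:
            continue  # unknown router: contributes nothing
        weight = 10 ** (rssi_dbm / 10)  # dBm -> milliwatts
        lat, lon = known_aps[bssid]
        lat_sum += weight * lat
        lon_sum += weight * lon
        total_weight += weight
    if total_weight == 0.0:
        return None
    return lat_sum / total_weight, lon_sum / total_weight
```

For example, `estimate_position({"aa:bb:cc:00:00:01": -50, "aa:bb:cc:00:00:02": -70})` lands much closer to the first router than the second, with no GPS involved.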
I've seen such systems estimate, with a fair amount of accuracy, client location data on a floorplan where there are a few dozen access points in the space.... So it works both ways. In that case I was part of a team at a job where the client had a couple thousand square feet of floor space, and about 12-15 access points to blanket the space in coverage. We could, with some degree of accuracy, follow the location of someone as they moved through the space; knowing where they spent most of their time, and what services in the space were utilized by the guest.
.... It was a mid-sized airport.
The basic idea is that you build a dossier on everyone. You discover what kind of food they eat, where they live, the size and makeup of their family, their sexual preferences, pregnancies, what kind of porn they watch, where they shop for groceries, where they shop for electronics. You tie together purchases on their credit card with purchases in other apps or even brick-and-mortar stores. You figure out where they owe money and what their education looks like. You might look at all of these things together at some point in your life and go, "why the hell do I care?"
Then 20 years down the road, when Chinese companies start pushing out American banks, all of a sudden you can't get a loan for a house or a car. Or maybe you're going for a job at some point and this data is leaked back out; now it's part of your indelible history.
Perhaps somebody takes it all and throws it into a large language model, and all of a sudden they've got clarity into your post history on all social media, even stuff you thought was private, because they know your phone serial number or your home IP address.
Corporations and governments don't have any business knowing about your private life. They shouldn't get to make decisions based on your private choices and preferences.
they mostly do it either because they can sell the data, or can display "more relevant" ads to you, so they can charge advertisers more.
Both, really. They do both.
apart from all the stuff already mentioned, some apps arise from a really insidious industry: blackmail through personal loan apps.
these apps fetch your contact details and the scumbags behind them exploit this info at your cost.
here's a bbc expose on this with more info: https://www.bbc.co.uk/news/world-asia-india-66964510
and then there's sms. especially when bank balance alerts are sent, that's gold for marketers.