For the threads with the older one on the left: https://lemmy.world/post/14859950
(Thank you @Nelots@lemm.ee )
Lemmy is not immune to this!! We need to develop FOSS to mitigate/detect that
oh it’s simple, don’t capitalize and it’s immediately harder to do.
I do find it funny that you didn’t capitalize any words in this comment.
i mean listen we’ve got priorities here. We’re capitalizing, not capitalizing.
My understanding of how this works is that that left one is real accounts making real comments, at least in the majority.
Then when the link gets reposted, either by a bot or naturally, potentially depending on the title, the bots scrape the old comments and post them.
It’s content farming. And Reddit is probably okay with this.
The right one is the “real” accounts. Notice how the left one is newer and all the accounts have names ending with four digits, except where they aren’t copies from the right.
No, the left one is older and most of the names on the right contain four numbers.
What’s going on here?
Maybe op updated the picture?
I did, because other people complained in another comment that it was confusing to not have the older thread on the left.
Anyway, it’s pretty obvious which one is which one
Thanks I almost thought I’m delusional
I also thought you were, lmao.
The list of names at the left creeps me the fuck out.
I saw this exact same style of bot account years ago on Tumblr. They always follow the same naming scheme: one word or two words combined and then a string of 4 digits. I bet if you go to any of their profiles, you’ll find like 4 comments that are all copied from old threads and a bunch of upvotes on completely random subs, possibly even all of them being on other bot accounts’ posts and comments.
The real question is whether they’re being used to fake activity on Reddit, sway public opinion by posting this sort of political slant, or will they later be used to advertise scams and this is just to make them seem legitimate.
Why not all of the above? If you have a service, you want to sell it to as many customers as possible.
Reddit is going to poison LLMs sooner than I thought.
LMAO while AIs reading training data sets get stuck in infinite loops.
Reddit probably omits bot accounts when it sells its data to AI companies
Doubt it, they are interwoven into almost any conversation with more than 70 comments.
If you have access to the entire Reddit comment corpus it’s trivial to see which users are only reposting carbon copies of content that appears elsewhere on the site
The low level bots in OP’s screenshot, sure, because it’s identical. Not the rest.
I used to hunt bots on reddit for a hobby and give the results to Bot Defense.
Some of them use rewrites of comments with key words or phrases changed to other words or phrases from a thesaurus to avoid detection. Some of them combine elements from 2 comments to avoid detection. Some of them post generic comments like 💯. Doubtless there are some using AI rewrites of comments now.
My thought process is if generic bots have been allowed to go so rampant they fill entire threads that’s an indication of how bad the more sophisticated bot problem has become.
And I think @phdepressed is right, no one at reddit is going to hunt these sophisticated bots because they inflate numbers. Part of killing the API use was to kill bot detection after all.
Reddit has way more data than you would have been exposed to via the API though - they can look at things like user ASN (is it coming from a datacenter), whether they were using a VPN, they track things like scroll position, cursor movements, read time before posting a comment, how long it takes to type that comment, etc.
no one at reddit is going to hunt these sophisticated bots because they inflate numbers
You are conflating “don’t care about bots” with “don’t care about showing bot generated content to users”. If the latter increases activity and engagement there is no reason to put a stop to it, however, when it comes to building predictive models, A/B testing, and other internal decisions they have a vested financial interest in making sure they are focusing on organic users - how humans interact with humans and/or bots is meaningful data, how bots interact with other bots is not
It’s probably not as easy as you imagine for reddit to identify and cleanse all bot content.
Of course it’s not. Nor do they want to.
I think the person you’re talking to thinks all bots are like the easy ones in this screenshot.
Look at the picture above - this is trivially easy. We are talking about identifying repost bots, not seeing if users pass/fail the Turing test
If 99% of a user’s posts can be found elsewhere, word for word, with the same parent comment, you are looking at a repost bot
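A rough sketch of the check described above, in Python. The tuple layout, the function name `repost_ratio`, and the idea of keying on the lowercased (parent text, comment text) pair are all assumptions made for illustration; nothing here reflects how Reddit actually stores or scans comments.

```python
from collections import defaultdict


def repost_ratio(comments):
    """Return, per user, the fraction of comments that copy an earlier one.

    `comments` is an iterable of (user, parent_text, body, timestamp) tuples
    (a hypothetical layout). A comment counts as a copy when the same
    (parent_text, body) pair was already posted earlier by a different user.
    """
    first_author = {}            # (parent_text, body) -> user who posted it first
    copied = defaultdict(int)    # user -> number of copied comments
    total = defaultdict(int)     # user -> number of comments overall

    # Walk the comments oldest first so the original author is recorded first.
    for user, parent_text, body, _ts in sorted(comments, key=lambda c: c[3]):
        key = (parent_text.strip().lower(), body.strip().lower())
        total[user] += 1
        if key in first_author and first_author[key] != user:
            copied[user] += 1
        else:
            first_author.setdefault(key, user)

    return {user: copied[user] / total[user] for user in total}
```

An account whose ratio sits near 1.0 would match the "99% of posts found elsewhere" description. This only catches word-for-word copies, so it would miss any bot that rewrites what it steals.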
I doubt Reddit is in charge of many of the existing bots on their site.
Reddit has access to its own data - they absolutely know which users are posting unique content and which users’ content is a 100% copy of data that exists elsewhere on their own platform
I know they could be, I’m just not sure they’re that competent. These bots often aren’t single user or just copy paste either, there’s usually some effort to mix it up or change wording slightly. Reddit’s internal search function is infamously shit but they “know” which users are unlabeled bots with some effort put behind them?
I know everyone here likes to circle jerk over “le Reddit so incompetent” but at the end of the day they are a (multi) billion dollar company and it’s willfully ignorant to infer that there isn’t a single engineer at the company who knows how to measure string similarity between two comment trees (hint: import difflib in python)
- To compare every comment on reddit to every other comment in reddit’s entire history would require an index, and if you want to find similar comments instead of exact matches, it becomes a lot harder to do that efficiently. ElasticSearch might be able to do it, but then you need to duplicate all of that data in a separate database and keep it in sync with your main database without affecting performance too much when people are leaving new comments, and that would probably be expensive.
- Comparing combinations of comments is probably impossible. Reddit has a massive number of comments to begin with, and the number of possible subtrees of those comments would just be absurd. If you only care about comparing entire threads and not subtrees, then this doesn’t apply, but I don’t know how useful that will be.
- Programmers just do what they’re told. If the managers don’t care about something, the programmers won’t work on it.
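For what the difflib hint looks like in practice, here is a minimal sketch using the standard library's `difflib.SequenceMatcher`. The helper names `similarity` and `thread_similarity` are made up for this example, and the scoring rule (average of each comment's best match) is just one possible choice. Note that it compares one thread against one known earlier thread, which is the cheap case; it does nothing to solve the indexing problem raised in the list above, since comparing everything against everything this way would be quadratic.

```python
import difflib


def similarity(a, b):
    """Ratio in [0, 1] describing how alike two comment bodies are."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def thread_similarity(old_thread, new_thread):
    """Average best-match similarity of each new comment against the old thread.

    Both arguments are lists of comment bodies (plain strings).
    """
    if not old_thread or not new_thread:
        return 0.0
    best = [max(similarity(new, old) for old in old_thread) for new in new_thread]
    return sum(best) / len(best)
```

A score close to 1.0 for a repost and its original would suggest the comments were lifted wholesale, and a thesaurus-style rewrite should still score high because most characters stay in place. Where to put the cutoff is a judgment call and would need real data to tune.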
It’s account farming. They make fake accounts look legitimate so they can use them to influence opinions on the site.
The left predates the right by 10 months
https://en.wikipedia.org/wiki/Dead_Internet_theory
I didn’t believe this when I first heard about it but it’s looking more true every day
Yeah, even if we’re not quite “there” yet, it feels like we’re at least moving in that direction
Definitely depends on where you’re going. Certain Hexbear posts are such obvious bot networks, while some niche communities can remember what they wrote more than two comments ago.
This gets posted all the time, and it’s frustrating that it lacks any nuance.
It’s just a spooky bedtime story… “imagine if everyone you talk to online is just a bot”
Yes a lot of online content is generated.
Yes it’s getting worse.
Yes there’s lots of bots.
However… you can choose where you spend your time online, and spend it with friends or likeminded people.
What I mean to say is, some communities on reddit are “mostly dead”, but you don’t have to go there.
This is extremely dangerous to our democracy.
What I find absolutely wild is how their stock didn’t just flop. The site has been on a downward spiral since the first redesign, and with the cut to API they’ve basically entered a freefall. I could see people backing Reddit like 14 years ago, but now? Why?
I suppose if there’s any optimism to have in OP’s post it’s that the bots are at least propagating messaging that’s better for the greater good than the typical shit that’s trying to get us into a full dystopia.
This is extremely dangerous to our democracy.
This is extremely dangerous to our democracy.
This is extremely dangerous to our democracy.
This is extremely dangerous to our democracy.
This is extremely dangerous to our democracy.
Just paid a visit. It’s really gotten bad. Horrible titles that make little sense. People falling over each other to make tired quips instead of conversation, and the rest to point out how someone is wrong or one-up the commenter.
That’s what it has been like for years now.
IMO it’s gotten markedly worse since the 3rd party app debacle. Perhaps combined with the advent of AI added to bots has made it obvious. Yeah, it’s been on a decline for quite a bit with the repost bots repeating everything from posts to replies, but people would call them out. Now it’s like it’s bots all the way down or the remaining participants have resigned themselves to the decline.
Small subs still seem mostly safe, but anything with decent participation is pretty bad.
Yeah the only real reason for Reddit for me anymore is sports discourse. E.g. the Baltimore Orioles are my MLB team. /r/Orioles on reddit has almost 80k members. Currently on the page there’s 62 people actively in the sub and that’s at 10am on a Wednesday, not during a game. The two Orioles communities on lemmy are Orioles@fanaticus.social and Baltimore Orioles@lemmy.world and they have 133 and 131 subscribers, respectively. There’s a bot posting game day threads and 0 comments in all of them. The only post not by a game day bot was 21 days ago.
Yeah I feel you, at least the Orioles team is super stacked rn though (speaking as a Yankees fan 🫠). !yankees@fanaticus.social is equally dead.
My current thought process is that if we can get a decently active generalized baseball community going, it could provide a stepping stone to increasing the activity in the team-specific communities. I’m trying to be active on !mlb@lemmy.ml and !baseball@fanaticus.social as much as possible.
There is already a latent population of sports fans on Lemmy, but it’s sort of a self-fulfilling prophecy that the communities aren’t active so people assume there must be no other fans.
My other thought on this topic is that although I do miss the active fan discussion and game threads, the subreddits for essentially all of my teams were indisputably toxic cesspools. The whining, armchair GMing, scapegoating, and just completely idiotic takes were out of this world. So it’d be nice to have activity, but too much activity can also degrade the quality of discussion to the level of Twitter and just create a very toxic environment where fans are constantly arguing and complaining.
Username checks out. Which client are you using for Lemmy?
I switch between Mlem and Voyager (iOS). I like them both, but I tend to use Voyager more. Mlem tends to give me more variety of communities, I like Voyager’s layout.
Reddit went to shit when the zoomers flooded in, arguably the late 90s kids as well
Been happening a lot longer than you imagine. I stopped using Reddit when the third party apps got shut down. At least the last year of my time there was calling out repost bot accounts. Threads like that on smaller subs with weak moderation were really common.
Even on some better moderated subs, they got through.
Reddit died for me a long time ago.
I’ve noticed that many Reddit users with the username format Word_Word_Number (for example Absolute_Bot_1230) are almost guaranteed to either be a bot or extremely inflammatory – it’s like everything they post is meant to generate controversies.
It’s Reddit’s automatic username generation, so either yeah, bots, or someone logging in through Google/Facebook and having a username assigned to them.
Yeah reddit has a name generator that you can choose from when you create an account and that’s the format it uses. Those names are almost exclusively bots and throwaway/anon accounts
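If anyone wants to flag that username format automatically, a regex gets most of the way there. This is only a sketch: the pattern is guessed from the Word_Word_Number example in this thread, the 1 to 4 digit range is an assumption, and `looks_generated` is a made-up helper name. As pointed out above, plenty of real people end up with these names too, so a match is a weak hint and not proof of a bot.

```python
import re

# Two capitalized words and 1-4 digits, joined by underscores,
# e.g. Absolute_Bot_1230. The exact shape is assumed from the thread.
GENERATED_NAME = re.compile(r"[A-Z][a-z]+_[A-Z][a-z]+_\d{1,4}")


def looks_generated(username):
    """True if the whole username fits the Word_Word_Number shape."""
    return GENERATED_NAME.fullmatch(username) is not None
```

Combining this with something like account age or how much of the comment history is copied would probably be more useful than the name alone.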
I’m glad i end with word*_word_word for my screen name, lol.
Well yeah they even have bot in their username.
I don’t get it. They already created a good bot network, but the username part is where they get lazy.
Thank you. That is the day when I’ll finally stop using Reddit. I never have thought that bots write that realistically, so thank you for proving it.
Well they actually don’t write that realistically, these are copy and paste bots that are just trying to farm karma so they can later sell the account (which I’ve heard is a thing apparently?). You can see the left is all original accounts by the uniqueness of their usernames and the copied posts on the right are all reddit generated names.
The left image is the original post, 10 months old, where (at least most) of the users are real people. Right is full of bots copying the post 1:1, comments included.
Or worse still, AI (really LLM) farming.
Just said on a Reddit r/worldnews thread that the subreddit has been astroturfed for years, as a response to someone wondering how people in the comments could be wishing for more innocent Palestinians to be killed, and surprise surprise, I got instabanned. The site is becoming a façade of a fake reality in far more ways than one.
I was permabanned from r/worldnews for saying we should give free meals to kids at schools here instead of wasting money blowing up other country’s kids.
r/worldnews is just a propaganda sub disguised as a hub for world news.
I remember when the narwhal used to bacon only at midnight.
Now the narwhal is forced to bacon continuously.
This kills the narwhal.
🤮
This.
Thank god this isn’t a problem here
P░ U░S░S░ Y░I░ N░B░I░O ░
hey…
There’s no pussy in your bio…
😡
Just keep clicking. You’ll get to the malware eventually.
Never trust a default username
[adjective] [noun] [3-4 digits] is always a sign of bad news, on social media and Xbox Live
And here I thought making a default username looking one was a good idea…
I’ve switched to name generation to stay extra anonymous…
Yeah, that’s what I was going for, I wanted a very typical name.
I don’t know about that. I now stick to default names after HR told my department to help them identify some leakers on reddit.
We use manual approval for programming.dev accounts where there is a very simple instruction you must follow to be approved. The amount of spam that fails that test makes me concerned about the amount of bots from instances without any barriers for account creation.
What happens on reddit (in regards to spam) will inevitably find its way to ActivityPub link aggregators like lemmy.
I am sad that the current generation of federated social media/networks still doesn’t have much, if any, implementation of web of trust functionality. I believe that’s the only solution to bots/AI/etc content in the future. Show me content from people/accounts/profiles I trust, and accounts they trust, etc. When I see spam or scams or other misbehavior, show me the trust chain connecting me to it so I can sever it at the appropriate level instead of having to block individual accounts. (e.g. “sorry mom, you’ve trusted too many political frauds, I’m going to stop trusting people you trust”)
I think this would be a great feature request: https://github.com/LemmyNet/lemmy/issues
I would definitely use it if it was implemented. Make it work like it is in GPG, where you can rank users based on your trust, and that is then propagated to others.
This concept reminds me of a certain browser extension that marks trans allies and transphobic accounts/websites using a user aggregate with thresholds that mark transphobes as red and trans allies as green.
I guess the question is how specifically you implement such a system, in this case for software like Lemmy. Should instances have a trust level with each other? Should you set a trust when you subscribe to a community? I’m not sure how you can make a solution that will be simple for users to use (and it needs to be simple for users, we can’t only have tech people on Lemmy).
For the simplest users, my initial idea is just a binary “do you trust them?” for each person (aka “friends”) and non-person (aka “follow”), and maybe one global binary of “do you trust who they trust?” that defaults to yes. anything more complex than that can be optional.
But how does this work when you follow communities? Do you need to trust every single poster in a community?
You’d see posts in a community/group/etc based on your trust of the community, unless you’ve explicitly de-trusted the poster or you trust someone who de-trusts them (and you haven’t broken that chain).
Right, so if I have no connection to someone else, it’d be “neutral” and I’d see the post. If I trust them transitively, then it would be a trusted post and if I distrust them transitively, it would be a distrusted post.
I think implementing such a thing would not only be complicated but also quite computationally demanding - I mean you’d need to calculate all of this for every single user?
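To make the trusted / neutral / distrusted idea concrete, here is a small Python sketch. Everything in it is hypothetical: the `classify` function, the two dictionaries of explicit trust and de-trust, and the `max_depth` cap are assumptions for illustration and not anything Lemmy or ActivityPub provides. It does a breadth-first walk outward from one user, so opinions from accounts closer to you win over opinions from further away, and the depth cap is one way to keep the per-user cost bounded.

```python
from collections import deque


def classify(me, trusts, distrusts, max_depth=3):
    """Label accounts as 'trusted' or 'distrusted' from `me`'s point of view.

    `trusts` and `distrusts` map an account to the set of accounts it has
    explicitly trusted or de-trusted. Accounts missing from the result are
    neutral. Nearer opinions win; ties at the same distance are resolved by
    traversal order.
    """
    labels = {}
    seen = {me}
    queue = deque([(me, 0)])
    while queue:
        account, depth = queue.popleft()
        # De-trust from anyone in the chain counts, but is not followed further.
        for other in distrusts.get(account, ()):
            if other != me:
                labels.setdefault(other, "distrusted")
        if depth == max_depth:
            continue
        for other in trusts.get(account, ()):
            if other in seen or labels.get(other) == "distrusted":
                continue
            seen.add(other)
            labels[other] = "trusted"
            queue.append((other, depth + 1))
    return labels
```

Severing a chain is then just removing one entry from your own trust set and recomputing, which drops everyone who was only reachable through that account back to neutral. Whether this scales to a whole instance is an open question; caching the result per user and recomputing only when a trust edge changes would be the obvious thing to try.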
Honestly I already believe that this has happened.
My reason for thinking this is because of this:
The spike that happened in October 2023 after the initial spike that happened due to the Reddit protests seems unnatural to me.
Someone gave the explanation of the release of the mobile clients but even then I wouldn’t think it would lead to a spike equivalent to the initial one since it would mostly just be people using an account they already had instead of creating a new one.
Like honestly if someone knows what event happened then that made so many new users join I’d appreciate it.
Newer user here… the api stuff got me to delete my reddit account but still surf it, it was the day of the IPO that i created my lemmy account…
That happened in March 2024 I think. And Reddit filed for the IPO in December 2023.
https://www.reuters.com/markets/deals/reddit-seeks-launch-ipo-march-sources-2024-01-18/
The day it IPOd on the market was my final day not the day of filing… I was holding out hope it wouldn’t happen lol…
Fair enough
Is that just accounts in total or active accounts?
I didn’t comment much in the beginning.
Now I try to comment at least once a day.
accounts in total.
Wait, then how would it go down? Are people deleting their accounts that much?
Apparently. But it seems like it only happened around the beginning; after the second spike it stabilized for some reason.
Okay then I will admit, it does seem fishy.
Reposts have always been a major issue on reddit, there is an infamous moderator who would delete posts with traction and repost them himself for karma.
Using bots to duplicate comments on reposts is a new low though.
Esteemed, world-renowned actress Margot Robbie?!
Is it new? I got the impression that’s also been going on a while.
It’s definitely not a new issue, but it’s only gotten worse since reddit has gone more and more mainstream.
If you follow me on Lemmy since last year, you should know that I’ve always been extremely against having bots posting here.
I’ve always been extremely against having bots posting here.
As are all who live to see such times.
Except certain transparent bots that serve a clear, particular purpose. Like, we could have a bot that adds a new honorific to your description every time someone says, “oh hey, I saw a Margot Robbie on TV! Is that you?”
MargotRobbieHonorificBot: That’s Her Esteemed Greatness The GOAT Academy Award Deserver And Future Empress Of The High Seas Margot Robbie!
This bit is way funnier when it’s a real person saying it instead of a bot.
What if it’s an actress, does that count? Or is it like, yeah, that’s just her character saying that.
moderator who would delete posts with traction and repost it himself for karma
I’ve had this happen to me, it felt so fucking wrong lol. My thread got deleted by the mod and he reposted it as a sticky under his own name without so much as mentioning me.
IMO the only way to not be infected by bot content is to not be popular, or small enough to be irrelevant.
Popularity is overrated. Irrelevance is freedom.
I wonder what the fediverse’s answer will be to this problem once it gets popular. Will instances that have a lot of bot content be defederated? Some kind of fedipact against (unlabeled) bot content?