this post was submitted on 27 Sep 2024
27 points (100.0% liked)

Programming


I'm a newbie to ActivityPub so please be patient with me.

All intros to ActivityPub describe how a user on server A subscribes to a specific community on server B, after which server A is informed about changes in that community.
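In ActivityPub terms, that subscription is a `Follow` activity that server A POSTs to the community's inbox on server B; once B answers with an `Accept`, it starts delivering new activities to A. Below is a minimal sketch of building such a payload, assuming hypothetical actor URLs (`a.example`, `b.example`). It deliberately omits the HTTP signatures that Lemmy servers check on incoming deliveries, so it shows the shape of the message, not a working client.

```python
import json
import uuid


def build_follow(actor_url: str, community_url: str) -> dict:
    """Build an ActivityStreams Follow activity (a sketch, not a full client).

    actor_url: the following user's actor ID on server A (hypothetical).
    community_url: the community's actor ID on server B (hypothetical).
    """
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        # Every activity needs a unique ID; here one is minted under the actor's URL.
        "id": f"{actor_url}/activities/follow/{uuid.uuid4()}",
        "type": "Follow",
        "actor": actor_url,
        "object": community_url,
    }


if __name__ == "__main__":
    follow = build_follow(
        "https://a.example/u/alice",
        "https://b.example/c/programming",
    )
    # A real server would sign this request and POST it to the community's inbox.
    print(json.dumps(follow, indent=2))
```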

But on Lemmy it's possible to browse the posts of all communities. For a single concrete community this would be relatively easy: server A gets a request to serve the top posts of a community on server B, so A simply asks B for those posts.

But there is also the "posts from all communities" tab on the Lemmy front page. This raises some questions:

Does each Lemmy instance have a full copy of all posts of all communities? If so, how are new instances discovered? Does each instance distribute all updates to all other instances?

If each Lemmy instance has only a partial dataset (this theory is backed by [1]: "Only if at least one user on your instance subscribes to the remote community, will the community send updates to your instance."), then how is the "all posts" view composed? Is it in reality not "all" but only "all posts that at least one user of this instance is subscribed to"?

If this is the case: what happens if a bad actor subscribes to all communities of all servers? Is there a maximum number of subscriptions per user?

The reason I'm asking is that I'm looking for a way to subscribe to all events of all Lemmy instances, so that I can build statistics about upvotes, new posts, comments, etc. There seems to be a similar API endpoint for Mastodon [2], but nothing for Lemmy?!

ericjmorey@programming.dev 9 points 1 day ago (last edited 1 day ago)

If each lemmy instance has only a partial dataset

You can stop saying "if". It is nearly certain that any instance holds only a partial dataset, in the same way that a search engine indexes only part of the web.

If this is the case: what happens if a bad actor subscribes to all communities of all servers?

There are bots that were built to do exactly that. I wouldn't call them bad actors unless the instance owner prohibited such actions.

7EP6vuI@feddit.org 3 points 1 day ago

So instances only store the metadata/title of federated posts, and when a user wants to see the comments or content, the other instances are queried for more details?

What are the bots good for?

andrew_s@piefed.social 5 points 1 day ago

is it in reality not “all” but only “all posts that at least one user of this instance is subscribed to”?

Exactly this, yes. Not literally 'all' (a brand-new instance would have nothing in its All feed). This is what was meant by 'partial dataset': everything for a subscribed community (from the moment it was subscribed to), but nothing for a community that no one is subscribed to.
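To make the "partial dataset" idea concrete, here is a toy model of one instance. It is a sketch, not anything resembling Lemmy's actual storage: posts only arrive for communities that have at least one local follower, nothing from before the first subscription is backfilled, and the All feed is simply everything stored locally. The class and community names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Toy model of one Lemmy-like instance's local view (a sketch)."""

    # community ID -> set of local users following it
    subscribers: dict[str, set[str]] = field(default_factory=dict)
    # community ID -> post titles received since the first local subscription
    posts: dict[str, list[str]] = field(default_factory=dict)

    def subscribe(self, user: str, community: str) -> None:
        self.subscribers.setdefault(community, set()).add(user)
        self.posts.setdefault(community, [])

    def receive(self, community: str, title: str) -> None:
        # Models delivery: a remote community only sends to instances with at
        # least one follower, so posts for unfollowed communities are dropped.
        if self.subscribers.get(community):
            self.posts[community].append(title)

    def all_feed(self) -> list[str]:
        # "All" is every post stored locally, i.e. only from followed communities.
        return [title for titles in self.posts.values() for title in titles]


if __name__ == "__main__":
    inst = Instance()
    inst.receive("rust@b.example", "Early post")  # no local follower yet
    inst.subscribe("alice", "rust@b.example")
    inst.receive("rust@b.example", "Later post")
    inst.receive("go@c.example", "Unfollowed post")
    print(inst.all_feed())  # ['Later post']
```

A brand-new `Instance()` returns an empty list from `all_feed()`, which matches the point above.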

Some instances run bots to populate their All feed beyond what would happen naturally (the idea being that the bot unsubscribes once a human subscribes).
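The subscribe-then-step-back behaviour can be written as a small decision rule. This is a hypothetical sketch: the function name and inputs are made up for illustration and are not taken from any particular bot.

```python
def bot_action(human_subscribers: int, bot_subscribed: bool) -> str:
    """Decide what a hypothetical feed-filling bot should do for one community.

    Returns "subscribe", "unsubscribe", or "nothing".
    """
    if human_subscribers == 0 and not bot_subscribed:
        # Nobody local follows it: subscribe so its posts reach the All feed.
        return "subscribe"
    if human_subscribers > 0 and bot_subscribed:
        # A human subscription now keeps federation going; the bot can leave.
        return "unsubscribe"
    return "nothing"


if __name__ == "__main__":
    print(bot_action(0, False))  # subscribe
    print(bot_action(2, True))   # unsubscribe
```

Keeping the bot's subscription only while no human is subscribed means the bot never adds load for communities that would federate anyway.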

7EP6vuI@feddit.org 3 points 1 day ago

Interesting, thanks.

So this would mean that if I wanted to receive an event for every upvote, comment, and post in the Lemmy fediverse, I would have to run my own ActivityPub instance and subscribe to all communities. (There is no single wildcard call, is there? So I would have to subscribe to all ~30k communities one by one, and also watch for new ones.) Then I could use the ActivityPub protocol to have the instances feed me their events?

There are currently about 600 instances and 30k communities, but only ~2k communities have more than 600 subscribers (according to [0]). Does this mean that those bots only subscribe to communities above a certain threshold?
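Assuming the plan in this comment (your own instance following every community), most of the remaining work is reading the inbox. Here is a minimal sketch of tallying what arrives, under several assumptions labelled in the code: a community relays activities to its followers wrapped in an `Announce`, votes arrive as `Like`/`Dislike`, and posts and comments arrive as `Create` activities whose objects are `Page` and `Note` respectively. Signature verification, `Update`/`Delete` handling and deduplication are left out.

```python
from collections import Counter


def unwrap(activity: dict) -> dict:
    """Return the inner activity if this one is an Announce wrapper."""
    # Assumption: a community relays each activity to its followers inside an
    # Announce whose "object" is the original activity, embedded as a dict.
    inner = activity.get("object")
    if activity.get("type") == "Announce" and isinstance(inner, dict):
        return inner
    return activity


def classify(activity: dict) -> str:
    """Map one incoming activity to a coarse event label."""
    inner = unwrap(activity)
    kind = inner.get("type", "Unknown")
    if kind == "Create":
        obj = inner.get("object")
        obj_type = obj.get("type") if isinstance(obj, dict) else None
        # Assumption: posts are "Page" objects and comments are "Note" objects.
        return {"Page": "post", "Note": "comment"}.get(obj_type, "other-create")
    # Assumption: upvotes arrive as Like and downvotes as Dislike.
    return {"Like": "upvote", "Dislike": "downvote"}.get(kind, kind)


def tally(activities: list[dict]) -> Counter:
    """Count incoming activities by event label."""
    return Counter(classify(a) for a in activities)


if __name__ == "__main__":
    inbox = [
        {"type": "Announce", "object": {"type": "Create", "object": {"type": "Page"}}},
        {"type": "Announce", "object": {"type": "Like"}},
        {"type": "Announce", "object": {"type": "Like"}},
        {"type": "Announce", "object": {"type": "Create", "object": {"type": "Note"}}},
    ]
    print(tally(inbox))
```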

andrew_s@piefed.social 4 points 1 day ago

Yeah. There's no wildcard call. One way to script it would be to pull the JSON files from https://data.lemmyverse.net: use one for the initial subscriptions, then subsequent ones to track new communities. You'd definitely want to filter them; as you've noticed, the vast majority of those 30k communities are dead, spam, or unwanted for one reason or another (e.g. communities from instances you've defederated from).
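The filter-and-diff workflow described above might look like the following sketch. It works on snapshots already loaded into Python lists, and the keys `url`, `instance` and `subscribers` are hypothetical placeholders: check the actual format of the lemmyverse export before relying on them. The 600-subscriber threshold in the usage is only an example borrowed from the numbers quoted earlier in the thread.

```python
def select_communities(
    snapshot: list[dict],
    min_subscribers: int,
    blocked_instances: set[str],
) -> set[str]:
    """Pick community URLs worth following from one snapshot.

    Assumes each entry has hypothetical keys "url", "instance" and "subscribers".
    """
    return {
        c["url"]
        for c in snapshot
        if c.get("subscribers", 0) >= min_subscribers
        and c.get("instance") not in blocked_instances
    }


def new_since(previous: set[str], current: set[str]) -> set[str]:
    """Communities selected from the latest snapshot but not the previous one."""
    return current - previous


if __name__ == "__main__":
    blocked = {"spam.example"}
    snap1 = [
        {"url": "https://b.example/c/rust", "instance": "b.example", "subscribers": 900},
        {"url": "https://spam.example/c/ads", "instance": "spam.example", "subscribers": 5000},
        {"url": "https://c.example/c/dead", "instance": "c.example", "subscribers": 2},
    ]
    snap2 = snap1 + [
        {"url": "https://c.example/c/go", "instance": "c.example", "subscribers": 700},
    ]
    first = select_communities(snap1, 600, blocked)
    second = select_communities(snap2, 600, blocked)
    print(sorted(new_since(first, second)))  # ['https://c.example/c/go']
```

Because `new_since` compares filtered sets, a community that was below the threshold in one snapshot and crosses it in the next also shows up as new.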

As for what bots do, it depends on how they were programmed, I suppose. There's a bonkers one on https://leaf.dance that just seems to crawl comments and subscribe to any !community links it finds, but there are others (I can't remember their names) where it's more of a manual job (the mods of a community submit its details to the bot).
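For the crawling approach, the core step is spotting community mentions in comment text. This is a guess at how such a crawler might do it, not the leaf.dance bot's actual code; the regex assumes mentions look like `!name@host`, with a name made of letters, digits and underscores and a dotted host name.

```python
import re

# Assumption: mentions look like "!name@host"; the lookbehind skips "!" that
# sits inside a word (e.g. "foo!bar@x.y"), and hosts need at least one dot.
COMMUNITY_LINK = re.compile(
    r"(?<![\w!])!([A-Za-z0-9_]+)@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"
)


def find_community_links(text: str) -> set[tuple[str, str]]:
    """Return lowercase (name, host) pairs for every !name@host mention."""
    return {
        (m.group(1).lower(), m.group(2).lower())
        for m in COMMUNITY_LINK.finditer(text)
    }


if __name__ == "__main__":
    comment = "Try !webdev@programming.dev and !rust@lemmy.ml! Not one: bob@example.com"
    print(sorted(find_community_links(comment)))
```

Each pair found could then be turned into a follow request for that community; a real crawler would also want rate limiting and a blocklist.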