cross-posted from: https://lemmy.ml/post/34374544
Dropsitenews published a list of websites that Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.
Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.
Full article here.
Link to the full leaked list download: Meta leaked list (PDF)
It's not that hard to block them. I run what is basically a single-user Lemmy instance, and it was constantly getting hammered by Meta and Anthropic until I blocked their user agents. They just get endless redirects now.
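The commenter didn't post their setup, so here is a minimal sketch of the idea under stated assumptions: match the `User-Agent` header against a list of crawler substrings and answer matches with a `302` pointing at a fresh random URL, which redirects again on the next request, forever. The substrings (`meta-externalagent`, `claudebot`, `anthropic-ai`), the `/trap/` path, and the helper names are all hypothetical choices for illustration; check your own access logs for the agents actually hitting your instance.

```python
import secrets
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Hypothetical crawler User-Agent substrings (lowercase). These are assumptions,
# not the commenter's list; confirm against your own access logs.
BLOCKED_UA_SUBSTRINGS = ("meta-externalagent", "claudebot", "anthropic-ai")


def is_blocked_agent(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent contains any blocked substring (case-insensitive)."""
    if not user_agent:
        return False
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_UA_SUBSTRINGS)


def redirect_target() -> str:
    """Build a fresh random path so every hop looks like a new URL to the crawler."""
    return "/trap/" + secrets.token_hex(8)


class TrapHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if is_blocked_agent(self.headers.get("User-Agent")):
            # Blocked crawlers get sent to another trap URL, which redirects again.
            self.send_response(302)
            self.send_header("Location", redirect_target())
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        # Everyone else gets a normal response (a real setup would proxy to Lemmy here).
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), TrapHandler).serve_forever()
```

In practice this check would more likely live in the reverse proxy in front of Lemmy rather than in a separate Python process; the standalone version just keeps the matching and redirect logic easy to read and test.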
Beautiful. The thought of all those robots.txt-ignoring theft bots running in circles made me smile. Thank you.
Sys-admin skills required.
Well, yes, one would need sys-admin skills to set up and maintain a Lemmy instance in the first place.
I’m happy to assist other admins if needed. Maybe I’ll write up a post about it later.