Dropsitenews published a list of websites that Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also, Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

  • usernamesAreTricky@lemmy.ml
    15 days ago

    The article linked in the post body suggests that likely wouldn’t have made a difference anyway:

    The scrapers ignored common web protocols that site owners use to block automated scraping, including “robots.txt” which is a text file placed on websites aimed at preventing the indexing of context

    • mesa@piefed.social
      15 days ago

      Yeah, I've seen the argument in blog posts that since they are not search engines they don't need to respect robots.txt. It's really stupid.
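
For anyone wondering what "respecting robots.txt" actually involves, here is a minimal sketch of the check a well-behaved crawler runs before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt content, the bot name `ExampleAIBot`, and the `example.social` URL are all made up for illustration; they are not taken from the leaked list or from any real instance.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish:
# block one specific crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.social/post/1"  # hypothetical URL
    print(is_allowed(ROBOTS_TXT, "ExampleAIBot", page))
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

Note that nothing enforces the result: robots.txt is advisory, so the check only matters if the crawler chooses to run it and honor the answer. That is why a scraper that ignores it, as the quoted article describes, is not stopped by a `Disallow` line.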