So the research is out and these LLMs will always be vulnerable to poisoned data. That means it will always be worth our time and effort to poison these models, and they will never be reliable.

  • supersquirrel@sopuli.xyz · 46 upvotes · 3 months ago (edited)

    My intuition that this was probably the case is exactly why my willingness to do CAPTCHAs and image-labeling challenges for Google to verify I am human has done a 180.

    I love “helping” when I can now!

    When they ask me to label a bicycle or stairs, I get real creative… well, mostly not, but enough of the time I do… oh well, silly me. What’s important is I still pass the test!

    • DoGeeseSeeGod@lemmy.blahaj.zone · 13 upvotes · 3 months ago

      Idk, but I wonder whether getting them all wrong all the time makes it easier to identify your work as bad data that should be scrubbed from the training data. Would a better strategy be to get most right and some wrong, so you appear to be a normal user?

    • ragas@lemmy.ml · 7 upvotes · 3 months ago

      Most people seem to just half-brain the challenges anyway. So on images where it’s easy to confuse something, the tests will often refuse you unless you put in the wrong answer, just like everybody else.

    • hexagonwin@lemmy.sdf.org · 4 upvotes · 3 months ago

      nah, they’re probably past that stage already. they would’ve gathered enough image training data in the first few months of the reCAPTCHA service, given how many users they have.