“just tell your LLM not to do that”
You ever ask an LLM to modify a picture and “don’t change anything else”? It’s going to change other things.
Case in point: https://youtu.be/XnWOVQ7Gtzw
That’s why you always add “and no mistakes”
Also “don’t hallucinate”
And “don’t become self-aware”
You are mixing up two kinds of AI: LLMs and diffusion models.
It’s much harder for a diffusion model not to change the rest of the image: the first step of a diffusion model is a lossy compression that transforms the picture into a soup of numbers the model can understand. An LLM, by contrast, converts a prompt into a bunch of tokens the model can understand.
Tokenization is lossless; you can convert the tokens back into the original text.
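To illustrate the lossless point, here is a toy sketch using raw UTF-8 bytes as stand-in tokens (real vocabularies like BPE are more complex, but the encode/decode round-trip is likewise exact):

```python
# Toy "tokenizer": every string maps to a sequence of byte tokens (IDs 0-255)
# and back without any loss - unlike a diffusion model's lossy latent encoding.

def tokenize(text: str) -> list[int]:
    """Encode text as a sequence of UTF-8 byte tokens."""
    return list(text.encode("utf-8"))

def detokenize(tokens: list[int]) -> str:
    """Recover the exact original text from the token sequence."""
    return bytes(tokens).decode("utf-8")

original = "How many 'r's in 'strawberry'?"
tokens = tokenize(original)
assert detokenize(tokens) == original  # lossless round-trip
```

A diffusion model’s encoder offers no such guarantee: decoding its latent gives back an approximation of the image, which is why untouched regions drift.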
This isn’t about saying “return the original text”; this is about assuming LLMs understand language, and they don’t. Telling an LLM “don’t do these things” will be about as effective as telling it “don’t hallucinate” or asking it “how many ‘r’s in ‘strawberry’?”.
To affirm or deny that, we’d first need to define “understanding”.
The example you gave can be explained in ways other than “it doesn’t understand”. Take “how many ‘r’s in strawberry”: LLMs see tokens, and their training data doesn’t contain much information about which letters are present in a given token.
What is wrong in the techbrodude head that makes them only think of ruining things? It seems to me that they literally spend their days looking at things that are good and asking “what can I do to fuck this up for a profit?”
Should being a techie go into the DSM-5 as a subheading under narcissistic personality disorder?
Gotta disrupt to peak bro. Just one more app bro
These fuckin AI “enthusiasts” are just making the rest of the world hate AI more.
Losers who can’t achieve anything without AI are just going to keep doing this shit.
Fr, if they just let it go instead of forcing it on everyone, people might even be enthusiastic.
Finally! Now all of the “Scientology” histories will be safe! /s
Congrats on inventing what high school students figured out a year ago to skirt AI homework detectors.
I really despise how Claude’s creators and users are turning the definition of “skill” from “the ability to use [learned] knowledge to enhance execution” into “a blurb of text that [usefully] constrains a next-token-predictor”.
I guess, if you squint, it’s akin to how biologists will talk about species “evolving to fit a niche” amongst themselves or how physicists will talk about nature “abhorring a vacuum”. At least they aren’t talking about a fucking product that benefits from hype to get sold.
I can’t help but get secondhand embarrassment whenever I see someone unironically call themselves a “prompt engineer”. 🤮
Sloperator
Hey, they had to learn thermodynamics and spend 3 semesters in calculus to write those prompts
I’m a terrible procrastinator engineer.
Isn’t this a thing that authoritarians do? They co-opt language. It’s the same thing conservatives do. The Venn diagram of tech bros and the far right is too close to being a circle.
You can put pretty much any word from the dictionary into a search engine and the first results are some tech company that either took the word as their company name or redefined it into some buzzword.
Skills were functions/frameworks built for Alexa, so they just appropriated the term from there.
You do understand this is more akin to white hat testing, right?
Those who want to exploit this will do it anyway, except they won’t publish the result. By making the exploit public, the risk will be known if not mitigated.
I’m admittedly not knowledgeable about white-hat hacking, but are you supposed to publicize the vulnerability, release a shortcut to exploit it telling people to “enjoy”, or even call the vulnerability “handy”?
Responsible disclosure is what a white hat does. You report the bug to whichever party is responsible for patching it and give them time to fix it.
That sort of depends on the situation. Responsible disclosure is for when there is a relevant security hole that poses an actual risk to businesses and people, while this here is just “haha, look, LLMs can now better pretend to write good text if you tell them to”. That’s not really responsible-disclosure material. It’s not even specific to one singular product.
Considering the “vulnerability” here is on the level of “don’t use ‘password’ as your password” - yeah, releasing it all is exactly the right step.
I am so goddamned tired of AI being shoved into every collective orifice of our society.
So they are using AI to make it so AI can’t detect that they are using AI?
What kind of technological ouroboros of nonsense is this?
It gets better: using LLMs to check whether the output of an LLM is hallucinated or not! They call it a judge, and it’s funny as hell tbh
Magic
Wikipedia is one of the last genuine places on the Internet, and these rat bastards are trying to contaminate that, too
Wikipedia just sold the rights to use Wikipedia for AI training to Microsoft and OpenAI…
How exactly does that work? Wikipedia does not “own” the content on the website, it’s all CC-BY licensed.
The BY term is not respected by LLMs
So? Still doesn’t make sense to me that wikipedia can sell anything meaningful here, but I’m also not a lawyer. Do they promise not to sue them or sell them some guarantee that contributors also can’t sue them? Is it just some symbolic PR washing?
Yeah, they’re selling the work of others. That’s how the site always worked. This venture into “AI” is nothing new.
Why? Wikipedia has like a decade of operating expenses on hand, so they don’t need the money
This number inflates every time I read it. First it was ten years of hosting cost. Then it’s operating costs. Soon it will be ten years of the entire US GDP.
I’d believe they have ten years of hosting costs on hand.
My quick googling says they have 170m in assets and about 180m in annual operating costs. Give or take.
I just love how people just shit “facts” out of their ass while citing zero sources and people will just believe them and upvote because it confirms their bias.
Greed? It’s probably greed.
It’s a non-profit foundation where the majority of contributors are volunteers. If greed were the motive, one would then have to ask why they don’t just go ahead and inject ads.
OK, then why sell data rights to M$?
Well, as mentioned, Wikipedia seems to be in the red and not bringing in enough donations to cover its expenses. So maybe the foundation is thinking this would help with the deficit.
Also chances are Microsoft will instruct Co-Pilot to prioritize Wikipedia whenever it scours the internet for information.
Think of it like that eye-rolling deal where Google pays Firefox to be the default search engine.
Well as mentioned Wikipedia seems to be in the red
They keep saying that… at least when they’re asking for more money.
Is Wikipedia in the red? Unclear. I mean, they ask for donations, but someone in this thread claims they’re set for a decade, and I’ve seen people post something about how they’re fine and even donate a bunch themselves. I don’t know, and I guess it doesn’t matter.
Not sure where you are going with your second comment, and I’m uninterested in engaging with your comparison, as I don’t think it’s very good.
I am referring to the reply comment from surewhynotlem. They say that costs are 180 million while Wikipedia has 170 million on hand. That is a 10 million deficit.
While probably not enough to shut down the site it is still operating in the red.
Where I was going was explaining how it’s possibly not greed, just the foundation looking for another revenue source that theoretically would not ruin the site.
The alternative being a deal that gets Wikipedia more traffic.
If Microsoft is “buying access to training data”, that makes what OpenAI is doing look illegal. I would encourage every data broker to sell “AI training data rights”, because it undermines the only real advantage AI has and helps pave the way to forcing AI companies to comply with open-source licenses.
Essentially, selling AI data rights is a Trojan horse for the AI companies. Obviously it would be better to pass laws, but until that happens this is, imo, a better strategy than doing nothing.
I mean, what OpenAI is doing and did should be illegal, if it isn’t, in my opinion.
I mean it’s free money, why not?
From the repo:
Have opinions. Don’t just report facts - react to them. “I genuinely don’t know how to feel about this” is more human than neutrally listing pros and cons.
That will at least be easy to spot in a Wikipedia entry.
lol brilliant
If these “signs of AI writing” are merely linguistic, good for them. This is as accurate as a lie detector (i.e., not accurate) and nobody should use this for any real world decision-making.
The real signs of AI writing are not as easy to fix as just instructing an LLM to “read” an article to avoid them.
As a teacher, all of my grading is now based on in-person performance, no tech allowed. Good luck faking that with an LLM. I do not mind if students use an LLM to better prepare for class and exams, but my impression so far is that any other medium (e.g., books, YouTube explainer videos) leads to better results.
I sucked in oral exams and therefore hated them. Then again, if they had been mixed into regular school, it might not have sucked so much.
Doesn’t need to be oral, I remember occasionally having exams that were essay questions that needed to be answered in class.
I do both of these as well as smaller but more frequent tests, group work, project work over several sessions etc… The only things I stopped doing are reports to write at home, paper summaries etc. Doesn’t make sense anymore.
Download an offline copy while you still can.
what are the best ways to do it while still having practical access?
Here’s a link to the Kiwix library download for all of Wikipedia. It’s 111GB though, so you’ll need a lot of space and also a lot of time to wait for it to download.
Note: you’ll also need Kiwix in some form to read the .zim file once it’s downloaded.
Kiwix library - All of Wikipedia - direct download link
But this’ll let you have a local copy you can reference should actual Wikipedia ever get ruined by GenAI, or worse, get taken down by hostile governments.
sounding like a great app for my server!
It’s an arms race, AI identification vs AI adaptation. I wonder which side the companies that own these LLMs want to win…
They don’t want anyone to win. The arms race makes money.
Fuck you, Siqi Chen.