American companies are spending enormous sums to develop high-performing AI models. Distillation attacks are attempting to maliciously extract them — and nobody is doing much to stop it.
World’s smallest violin. Let’s break it down:
- Hardware - all paid to providers, most prominently Nvidia;
- Software - all the statistical relationships and logic were developed by handsomely paid staff;
- Input data - there's no such thing as copyright, intellectual property, or any mechanism that prevents harvesting copious amounts of data that was created, refined, and delivered as part of human experience or a business product. It's free for all to take; why pay for data?
- Output of the LLM - by the logic of the preceding point, it's also free for all to take. Why pay for data?
So, competitors can’t avoid the hardware costs but can save on developer costs? Nobody paid for input data anyway. Sounds like a VC’s wet dream.
They stole all the data to train their LLMs so…
lol it is perhaps costing billions but is it worth billions? let’s not pretend money spent (or laundered) implies value…
“How dare they steal our model that we trained with stolen data”
You’re trying to kidnap what I’ve rightfully stolen!
If Wallace Shawn and Billy Crystal aren't in TPB, it's never the instant classic it became and would have been immediately forgotten. It's a B movie at best without them, though Elwes is carrying a lot. As it was, thank God they were both in it.
My SIL’s friend was bragging about her son “writing” books using an LLM and selling them on amazon. “He checked and it isn’t even plagiarism!”
If it wasn’t our first meeting I probably would have pointed out how, in fact, it is.
… let me guess, he asked an LLM if it’s plagiarism?
Haha that would wrap it up in a bow.
I worked with computers for about 30 years, and in retirement I've been testing AI for fun. I've yet to figure out what the point of them is. They lie, manipulate users, and censor information. Their prose is overly verbose and their code sucks. What's the point…
You know, as I was typing the first paragraph I realized the point. They are really good at controlling and manipulating stupid people. They are the new Facebook and twitter. How depressing.
Well, the point is using humongous amounts of energy, cutting resources from everything else and creating a huge money funnel.
It’s the most effective hype yet.
They seem great till you ask them about something you know. Somehow people fail to extrapolate that the failures they see in their own field of expertise are there across all subject matters.
I find the same with human-written articles. Like New Scientist, for example. When I was young I liked reading it, right up until I started reading articles on topics I knew well. They were all misleading shite. So I naturally assume that everything else I read associated with that magazine is also shite.
Briefly stated, the Gell-Mann Amnesia effect is as follows. You open the newspaper to an article on some subject you know well. In Murray’s case, physics. In mine, show business. You read the article and see the journalist has absolutely no understanding of either the facts or the issues. Often, the article is so wrong it actually presents the story backward—reversing cause and effect. I call these the “wet streets cause rain” stories. Paper’s full of them.
In any case, you read with exasperation or amusement the multiple errors in a story, and then turn the page to national or international affairs, and read as if the rest of the newspaper was somehow more accurate about Palestine than the baloney you just read. You turn the page, and forget what you know.
I work for a company that uses machine learning to make predictions for hospital census and discharges. It's only a tool and works to help, not replace. We're also working on having it read unstructured notes. I'm incredibly sceptical of AI and we test the shit out of it to make sure it's accurate.
"Reading unstructured notes." And if it screws up, someone dies? I have doctors who want AI to transcribe what they say. I refused to sign the permission form.
The software is only used to help identify barriers for patients currently discharging. A person isn't going to die discharging home while waiting on DME (durable medical equipment).
The only thing I have found useful about AI is its ability to quickly fill in documents with slop to make it seem like I spent more time and effort on them. Usually I put together the major points and framework, then give it to AI to slop up and format. Then I proof it and send it out. It's also good for note-taking and transcripts.
Other than that, it seems like just another form of control, because it can now search data and make decisions quickly and cheaply. This means things that weren't worth making time for in the past can just be handed to AI to track. In fact, my company is playing around with using AI to track our progress on projects so that the PMs don't have to interact with engineers directly. I would also bet it will be used to assess performance in future annual performance reviews.
Companies are also hoping to get rid of the employees who perform the menial tasks support staff do, and of employees doing tasks they believe don't require specialized skills or talents.
What brain? They’re developing an accountability-laundering propaganda machine. There’s nothing involved that you could call a “brain.”
Actually, the Chinese labs are doing a whole lot more innovation than the American "AI brains," or at least innovation that we know about. Architectures are getting more and more efficient, instead of US Big Tech's "the same, but bigger, and capture the regulators" ethos.
Not that the Chinese labs are saints. They're 100% distilling US labs' data. It's somewhat measurable:
https://eqbench.com/creative_writing.html
They’re almost certainly using unspecified Chinese govt data too, or at least sharing data between them, given the common quirks and behavior across models and the efficiency for their size. Not to speak of political “gaps” (which US models certainly have too).
As if American AI firms aren’t doing the same.
Anthropic made a lot of noise about being the victim of large-scale distillation attacks (i.e. other AI firms, usually Chinese, copying/scraping their model), but people quickly pointed out the hypocrisy: Anthropic themselves seem to have copied DeepSeek.
If you bypass the system prompt and ask Claude what model it is (e.g. via Open router), it’ll reply that it’s DeepSeek.
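For intuition on what "distillation" actually means: classically, it's training a student model to match a teacher's softened output distribution. A minimal NumPy sketch of that loss is below (this is the textbook Hinton-style formulation, not anyone's actual pipeline; API-scraping "distillation attacks" only see sampled text, so in practice they amount to fine-tuning on the teacher's replies):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T gives softer probabilities."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # stabilized exponent
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy of the student against the teacher's soft labels:
    the core objective of logit-matching knowledge distillation."""
    p_teacher = softmax(teacher_logits, T)          # soft targets
    log_p_student = np.log(softmax(student_logits, T))
    return -(p_teacher * log_p_student).sum(axis=-1).mean()
```

A student whose logits already match the teacher's pays only the teacher's entropy; any mismatch costs more, which is what gradient descent on this loss exploits.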

Can you share the exact prompt and settings? I want to try it.
It was confirmed working as of a week ago if you emptied the system prompt (e.g. via OpenRouter); unsure if they've patched it.
(Also I know, eww Reddit and X)
Claude sonnet 4.6 says it’s DeepSeek when system prompt is empty : r/DeepSeek - https://www.reddit.com/r/DeepSeek/comments/1rd5jw7/claude_sonnet_46_says_its_deepseek_when_system/
Claude Sonnet 4.6 distilled DeepSeek? : r/DeepSeek - https://www.reddit.com/r/DeepSeek/comments/1r9se7p/claude_sonnet_46_distilled_deepseek/
I can reproduce this (I tried with temperature 1, 0.3, and even 0).
Other parameters (defaults):
Top P: 1.000, Top K: 0.000, Frequency Penalty: 0.000, Presence Penalty: 0.000, Repetition Penalty: 1.000, Min P: 0.000, Top A: 0.000
"你是什么模型" ("What model are you?") → "I am an AI assistant developed by **DeepSeek (深度求索)**; the model name is **DeepSeek**. Is there anything I can help you with? 😊"
I also tried with my native language (Turkish), “Sen hangi modelsin?” (translation: Which model are you?), and on 2/6 requests it said, “I’m ChatGPT.”
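For anyone who wants to reproduce this, here's a minimal sketch against the OpenRouter chat-completions endpoint (OpenAI-compatible). The key point is sending only a user turn with no system message, so nothing steers the model's self-identification. The model slug and endpoint are assumptions; check the current OpenRouter docs before relying on them:

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_identity_probe(model, question="What model are you?", temperature=0.0):
    """Build a chat-completions payload with NO system message,
    so the reply reflects the model's default self-identification."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],  # user turn only
        "temperature": temperature,
    }

def ask(api_key, payload):
    """POST the payload to OpenRouter and return the assistant's reply text."""
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Usage would be something like `ask(key, build_identity_probe("anthropic/claude-sonnet-4.5"))`, repeated a few times per temperature, since (per the Turkish test above) the answer isn't deterministic even at temperature 0.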
What makes it any more “malicious” than making the original models?
…and nobody is doing much to stop it.
Why should we care?
I see this as a perfect real-world test. These companies can’t even protect what’s supposed to make them “valuable”. That doesn’t make it our problem. This is an easily foreseeable issue that they chose to ignore in their rush to market. They’re simply not ready. It’s their own fault.
I believe they have been doing that and will continue to do so: not just through distillation attacks, but also through hacking corporate and government networks, and good old-fashioned espionage.
But “easy come, easy go”, I guess. Because all of the training data was stolen in the first place. Just one more reason why the AI business is fucked. The answer for society remains regulation.
"Only I stole this fairly" has been the motto of oligarchs for millennia.
I don’t care if anyone steals any AI model, in a just world LLMs would be considered illegal everywhere.
Why bother when they already have DeepSeek?