AI and legal experts told the FT this “memorization” ability could have serious ramifications for AI groups’ battle against dozens of copyright lawsuits around the world, as it undermines their core defense that LLMs “learn” from copyrighted works but do not store copies.
Sam Altman would like to remind you that each Old Lady at a Library consumes 284 cubic feet of oxygen a day from the air.
Also, hey, at least they made sure to (probably) destroy the physical copy they ripped into their hopelessly fragmented CorpoNapster fever dream. The law is the law.



This, and the lossy compression framing is exactly right.
Alternatively, it’s a decomposition of a big matrix (think very large Excel sheet) wherein each cell is the probability of observing some word (really it’s tokens, of course, but for the sake of argument) given that you’ve observed other words. Like, you could literally make a transformer in Excel. It wouldn’t run, but that’s Excel’s fault, not the math’s.
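To make the “big spreadsheet of conditional probabilities” picture concrete, here’s a toy sketch using a bigram table, where each row is a word and each column is the probability of the next word. (This is a deliberate simplification: a real transformer conditions on a whole context window through learned weights, not on just the previous word, and the corpus here is made up.)

```python
from collections import Counter, defaultdict

# Toy corpus; whole words stand in for tokens.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count bigram transitions: counts[word][next_word]
counts = defaultdict(Counter)
for w, nxt in zip(corpus, corpus[1:]):
    counts[w][nxt] += 1

# Normalize each row into P(next | current) -- one row per word,
# one column per possible next word: the "very large Excel sheet".
probs = {
    w: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
    for w, nxts in counts.items()
}

print(probs["the"])  # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
```

Each row sums to 1, and “generation” is just repeatedly sampling a column from the current row; scaling that table up and factoring it into learned weight matrices is the decomposition the comment is gesturing at.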
Aside: I’m pretty sure distributing the output of a lossy compression-and-decompression algorithm is still distribution, and charging for it is too. Realistically, if this is allowed, anyone should be able to legally pirate anything for any reason, as long as it’s passed through lossy compression and decompression first.
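The point about lossy round trips can be illustrated with a toy codec (not a legal argument, just the mechanics): this hypothetical “compressor” throws away case and spacing before deflating, so the decompressed text is not byte-identical to the input, yet it is unmistakably the same work.

```python
import zlib

def lossy_compress(text: str) -> bytes:
    # Deliberately discard information (case, extra whitespace),
    # then apply ordinary lossless deflate on what's left.
    normalized = " ".join(text.lower().split())
    return zlib.compress(normalized.encode())

def decompress(blob: bytes) -> str:
    return zlib.decompress(blob).decode()

original = "It was the best of times,  it was the worst of times."
roundtrip = decompress(lossy_compress(original))

print(roundtrip)               # it was the best of times, it was the worst of times.
print(roundtrip == original)   # False: the bytes differ, the substance doesn't
```

That gap between “byte-identical copy” and “recognizably the same work” is exactly where the pirate-anything-via-lossy-compression loophole would live.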
Yeah, there isn’t much of a difference in how the data is transformed between your pirating case and the case of an AI reproducing copyrighted material. It’s really only because they treat it like an artificial person that they’re able to convince people it should be allowed.
The kick in the teeth is: if I charged people to recite a copyrighted novel that I’d memorized but didn’t have explicit permission to use, I’d be sued. There really is no way to argue this should be allowed that doesn’t immediately fall apart under even a little scrutiny.