In the Office’s view, training a generative AI foundation model on a large and diverse dataset will often be transformative. The process converts a massive collection of training examples into a statistical model that can generate a wide range of outputs across a diverse array of new situations. It is hard to compare individual works in the training data—for example, copies of The Big Sleep in various languages—with a resulting language model capable of translating emails, correcting grammar, or answering natural language questions about 20th-century literature, without perceiving a transformation.
You can read the whole doc. The part above is cherry-picked. I haven't read through the whole thing, but at a glance, the doc basically explains that it depends. If a model is trained specifically to output one piece of content, that wouldn't be acceptable.
The waters are muddy, but holy fuck does taking the copyright juggernauts' side sound bloody stupid.

It’s a good thing if you’re smart enough to understand that AI isn’t going away. Universal bought Udio; the “legal” variant of the dataset will still be used to train models, only those will be closed source, censored, and come with a ToS that hands all rights to the generated music over to the record companies from the get-go.
At least this gives open source a chance.