Weirder in that it gets better at “photorealism” (textures, etc) but subjects might be nonsensical. Only teaching it how to avoid automated detection will not teach it to understand what scenes mean.
I believe most image generating models are too small (like only 4GB RAM). Deepseek R1 is 1.5TB ram (or half or quarter that at reduced precision) to get some semblance of “general knowledge”. So to get the “semantics” of an image right, not just the “syntax” you’d need bigger models and probably more data describing images. Of course, do we really want that?
Not necessarily, but errors would be less obvious or weirder since it would spend more time in training
Weirder? Interesting, like how for example?
Weirder in that it gets better at “photorealism” (textures, etc) but subjects might be nonsensical. Only teaching it how to avoid automated detection will not teach it to understand what scenes mean.
I believe most image generating models are too small (like only 4GB RAM). Deepseek R1 is 1.5TB ram (or half or quarter that at reduced precision) to get some semblance of “general knowledge”. So to get the “semantics” of an image right, not just the “syntax” you’d need bigger models and probably more data describing images. Of course, do we really want that?