@[email protected] Can you share more details on installing it? Are you using SGLang or vLLM or something else? What kind of hardware do you have that can fit the 600B model? What is your inference tok/s?
That’s why I wanted to confirm what you are using lol. Some people on Reddit were claiming the full thing, when run locally, has very little censorship. It sounds somewhat plausible since the web version only censors content after they’re generated.
@[email protected] Wrong community for this kind of post.
@[email protected] Can you share more details on installing it? Are you using SGLang or vLLM or something else? What kind of hardware do you have that can fit the 600B model? What is your inference tok/s?
I’m using Ollama on a single GPU with 10 GB of VRAM.
You’re probably running one of the distillations then, not the full thing?
What’s the difference? Does the full thing not have censorship?
That’s why I wanted to confirm what you are using lol. Some people on Reddit were claiming the full thing, when run locally, has very little censorship. It sounds somewhat plausible, since the web version only censors content after it’s generated.