Given how quickly things evolve, it’s easy to get lost in the numerous offerings and hard to get the best deal. So, what do you use? I’m interested in both clients/harnesses and LLM providers or local setups.
Personally, I’ve been using opencode with GitHub Copilot for work. I’m currently looking for a cost-effective provider for personal work. Maybe OpenRouter with one of the cheap models?


I use opencode with a locally hosted llama.cpp, usually with qwen3.6-35b-a3b.
I tried opencode go for a couple of months, and it’s definitely nice to have an LLM runner with more VRAM and more GPUs, but I prefer to keep all my stuff local whenever possible. Also, I’d use up my token allotments fairly quickly on opencode go.
I also tried OpenRouter and it, too, was great, with many more models. But I exhausted my credits even quicker than with opencode go, and it’s also not local.
What hardware do you use? How fast is it?
AMD Ryzen 9 9950X CPU and an AMD Radeon Pro W7900 (48 GB VRAM). I get about 55 tps output pretty consistently. Prompt processing starts at around 1500 tps, but once the context reaches, say, 50K tokens, it drops to around 200 tps. I often have to wait a bit, but it’s a price I’m happy to pay for local-only AI.
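To give a rough feel for what "wait a bit" means, here is a back-of-the-envelope sketch using the speeds quoted above. It is only an estimate: the 500-token reply length is a made-up example value, the real ingest time should fall somewhere between the two bounds because the speed degrades gradually, and any prompt caching (which would shorten repeat turns) is not modeled.

```python
def seconds_to_process(tokens: int, tokens_per_second: float) -> float:
    """Time needed to process `tokens` at a constant rate."""
    return tokens / tokens_per_second

PROMPT_TOKENS = 50_000   # context size mentioned above
REPLY_TOKENS = 500       # hypothetical reply length, not from the thread

# Bounds for ingesting the whole prompt from scratch.
best_case = seconds_to_process(PROMPT_TOKENS, 1500)   # speed with an empty context
worst_case = seconds_to_process(PROMPT_TOKENS, 200)   # speed at ~50K context

# Time to generate the reply at the steady output rate.
reply_time = seconds_to_process(REPLY_TOKENS, 55)

print(f"prompt ingest: {best_case:.0f}s to {worst_case:.0f}s")
print(f"reply generation: {reply_time:.0f}s")
```

So a cold 50K-token prompt would take somewhere between about half a minute and roughly four minutes to ingest, while the reply itself is comparatively quick.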
Thank you!