Given how quickly things evolve, it’s easy to get lost in the numerous offerings and hard to get the best deal. So, what do you use? Both clients/harnesses and LLM providers or local setups would be interesting.

Personally, I’ve been using opencode with GitHub Copilot for work. I’m currently looking for a cost-effective provider for personal projects. Maybe OpenRouter with one of the cheap models?

  • Mike Wooskey@lemmy.thewooskeys.com
    14 hours ago

    I use opencode with a locally hosted llama.cpp, usually with qwen3.6-35b-a3b.

    I tried opencode go for a couple of months, and it's definitely nice to have an LLM runner with more RAM and more GPUs, but I prefer to keep all my stuff local whenever possible. Also, I'd use up my token allotments fairly quickly on opencode go.

    I also tried OpenRouter, and it, too, was great, with many more models. But I exhausted my credits even quicker than on opencode go, and it's also not local.

      • Mike Wooskey@lemmy.thewooskeys.com
        14 hours ago

        AMD Ryzen 9 9950X CPU and AMD Radeon Pro W7900 (48 GB VRAM). I get around 55 tps output pretty consistently. Context ingestion starts at around 1500 tps, but once the context reaches, say, 50K tokens, it drops to around 200 tps. I often have to wait a bit, but it's a price I'm happy to pay for local-only AI.
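        To put those figures in perspective, here is a rough sketch of the wait-time arithmetic. It assumes each phase (prompt ingestion, then generation) runs at a constant rate. The 1,000-token reply length is a hypothetical value, not from the comment. Because the real ingestion rate falls somewhere between the two quoted figures as the context grows, the result is a best/worst-case range rather than a prediction.

```python
# Rough wait-time arithmetic using the throughput figures quoted above.
# Assumption: prompt ingestion and token generation each run at a
# constant rate, one after the other.


def wait_seconds(prompt_tokens: int, output_tokens: int,
                 prefill_tps: float, decode_tps: float) -> float:
    """Seconds to ingest the prompt plus seconds to generate the reply."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


if __name__ == "__main__":
    context = 50_000  # tokens of context to ingest
    reply = 1_000     # hypothetical reply length in tokens
    decode = 55.0     # output rate quoted in the comment (tps)

    # Bracket the ingestion time between the early rate (1500 tps) and
    # the rate reported once the context reaches ~50K tokens (200 tps).
    best = wait_seconds(context, reply, prefill_tps=1500.0, decode_tps=decode)
    worst = wait_seconds(context, reply, prefill_tps=200.0, decode_tps=decode)
    print(f"best case:  {best:.0f} s")
    print(f"worst case: {worst:.0f} s")
```

        Under these assumptions, ingesting 50K tokens takes roughly 33 s at 1500 tps but 250 s at 200 tps, which dominates the roughly 18 s needed to generate the reply at 55 tps.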