Given how quickly things evolve, it’s easy to get lost in the numerous offerings and hard to get the best deal. So, what do you use? Both clients/harnesses and LLM providers or local setups would be interesting.

Personally, I’ve been using opencode with Github copilot for work. I’m currently looking for cost-effective provider for personal work. Maybe openrouter with one of the cheap models?

  • Mike Wooskey@lemmy.thewooskeys.com
    link
    fedilink
    English
    arrow-up
    4
    ·
    12 hours ago

    AMD Ryzen 9 9950X CPU and AMD radeon pro w7900 (48GB vram). I get 55tps output pretty consistently, but ingesting context starts around 1500tps and if context size reaches, say, 50K, tps drops to around 200tps. I often have to wait a bit, but it’s a price I’m happy to pay for local-only AI