• 1 Post
  • 67 Comments
Joined 5 months ago
Cake day: March 22nd, 2024

  • 8GB or 4GB?

    Yeah, you should get kobold.cpp’s ROCm fork working if you can manage it; otherwise use their Vulkan build.

    llama 8b is probably a good fit for your machine: at shorter context it can fit entirely on the 8GB GPU, or at least be partially offloaded if it’s a 4GB one.
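
To put rough numbers on the "fits at shorter context" claim, here is a back-of-the-envelope sketch. It assumes the Llama 3.1 8B shape (32 layers, 8 KV heads, head dim 128), an fp16 KV cache, a roughly 4.5 bits-per-weight quant, and about 1 GiB of overhead for compute buffers and the desktop. The bits-per-weight and overhead figures are guesses for illustration, and the function names are made up, so treat the results as ballpark only.

```python
GIB = 2**30

def kv_cache_bytes(n_tokens, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """KV cache size: one K and one V vector per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens

def weight_bytes(n_params=8.0e9, bits_per_weight=4.5):
    """Approximate size of the quantized weights (bits_per_weight is an assumption)."""
    return n_params * bits_per_weight / 8

def fits(vram_gib, n_tokens, overhead_gib=1.0):
    """True if weights + KV cache + assumed overhead fit in the given VRAM."""
    needed_gib = (weight_bytes() + kv_cache_bytes(n_tokens)) / GIB + overhead_gib
    return needed_gib <= vram_gib

if __name__ == "__main__":
    for ctx in (8192, 32768, 131072):
        kv_gib = kv_cache_bytes(ctx) / GIB
        print(f"ctx={ctx}: KV cache ~{kv_gib:.1f} GiB, "
              f"fits 8GB: {fits(8, ctx)}, fits 4GB: {fits(4, ctx)}")
```

Under these assumptions the weights alone are a bit over 4 GiB, which is why a 4GB card can only take some of the layers, and the KV cache is what eats the rest of an 8GB card as context grows.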

    I wouldn’t recommend deepseek for your machine. It’s a better fit for older CPUs: it’s not as smart as llama 8B and it’s bigger than llama 8B, but it runs super fast because it’s an MoE.


  • Oh I got you mixed up with the other commenter, apologies.

    I’m not sure when llama 8b starts to degrade at long context, but I wanna say it’s well before 128K, and that’s where other “long context” models start to look much more attractive, depending on the task. Right now I am testing Amazon’s mistral finetune, and it seems to be much better than Nemo or llama 3.1 at those lengths.