The closest thing I know of is distillation. You can search for it and find a few resources (e.g. https://huggingface.co/papers/2306.08543). I don't know if that's what you're looking for.
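In case it helps to see the idea, here is a rough sketch of the classic soft-label distillation loss: the student is trained to match the teacher's temperature-softened output distribution. This is the textbook formulation, not necessarily the method in the linked paper, and the function names and the default temperature of 2.0 are my own assumptions for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    # Divide by the temperature first; higher values flatten the distribution.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Hypothetical helper: KL(teacher || student) on softened distributions.
    p = softmax(teacher_logits, temperature)  # teacher (target)
    q = softmax(student_logits, temperature)  # student (prediction)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2

if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    student = [2.0, 1.5, 0.2]
    print(distillation_loss(student, teacher))
```

In a real setup this term would usually be mixed with the ordinary cross-entropy on the ground-truth labels and computed per token with a tensor library, but the core of it is just this KL term.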
this post was submitted on 17 Jun 2023
LocalLLaMA
I don't know about that, but you could try GGML (llama.cpp). It supports quantization down to 2 bits, so that might be small enough.
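To get a feel for whether that would be small enough, here is a back-of-the-envelope size estimate. It only multiplies parameter count by bits per weight. Real quantized files also store per-block scales and other metadata, so actual sizes come out somewhat larger. The 7-billion-parameter figure is just an illustrative assumption, and `estimate_size_gb` is a made-up helper, not part of llama.cpp.

```python
def estimate_size_gb(n_params, bits_per_weight):
    # Idealized size: ignores per-block scales, metadata, and any
    # tensors that are kept at higher precision.
    return n_params * bits_per_weight / 8 / 1e9

if __name__ == "__main__":
    n_params = 7e9  # assumed model size, for illustration only
    for bits in (16, 8, 4, 2):
        print(f"{bits:>2}-bit: ~{estimate_size_gb(n_params, bits):.2f} GB")
```

By this estimate a 7B model goes from about 14 GB at 16 bits to under 2 GB at 2 bits, though quality usually drops noticeably at the lowest bit widths.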