I’m not a super hacker or expert coder or tech genius, but I’m also not a total beginner. I’m good with getting my hands a bit dirty to figure things out.
I've done a few LLM-related things in my job (RAG, fine-tuning, vectorizing, chunking, and implementation).
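Chunking, to pick one of those, is less magic than it sounds. Here's a toy sketch in Python; the window sizes are arbitrary, and real pipelines usually split on tokens or sentences rather than raw characters:

```python
def chunk_text(text, size=200, overlap=50):
    # Split text into overlapping character windows, a simple
    # baseline chunking strategy for RAG. The overlap keeps
    # sentences that straddle a boundary visible in both chunks.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

For example, a 500-character document with `size=200, overlap=50` becomes three chunks starting at offsets 0, 150, and 300.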
And to say it plainly: you can't, and neither can I. No single individual can train an AI model from scratch; it takes far too much data and far too much compute. Organizations, countries, or companies have those resources, but not you or me.
What we can do is use existing models offline and locally, build RAG pipelines, or fine-tune them. But for this you need¹ a solid technical background and a good understanding of machine learning and deep learning, as well as of LLMs in general.
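To give a feel for what the retrieval half of a RAG pipeline does, here's a toy sketch in plain Python. The documents and the bag-of-words "embedding" are made up for illustration; a real setup would use a neural embedding model and a vector store:

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy "embedding": a bag-of-words term-count vector.
    # A real RAG pipeline uses a neural embedding model here.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, k=2):
    # Rank pre-chunked documents by similarity to the query and
    # return the top k; these become the context in the LLM prompt.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "You can run an LLM locally with Ollama on your own GPU.",
    "Bananas are rich in potassium.",
    "Fine-tuning adapts a pretrained model to your data.",
]
```

The retrieved chunks then get pasted into the prompt so the model can answer from your data instead of its training set.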
If you want to learn, you can, and I'm happy to send you some links to get started.
Now, what you normally do is install Ollama on a system that has a good GPU or NPU.
I would go with an RTX 4070 or higher. If you have a GPU like that, you can get started with local AI.
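Once Ollama is installed and running, you talk to it over a local HTTP API (port 11434 by default). A minimal sketch using only the Python standard library; the model name is just an example, use whatever you've pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model, prompt):
    # Payload for Ollama's /api/generate endpoint;
    # stream=False returns one complete JSON object instead of a token stream.
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask(model, prompt):
    # Send the prompt to the locally running Ollama server
    # and return the model's full text response.
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (with the Ollama server running and the model pulled,
# e.g. via `ollama pull llama3.2`):
#   print(ask("llama3.2", "Explain RAG in one sentence."))
```

Everything stays on your machine; no data leaves the box, which is the whole point of running locally.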
If you want, you can also build your own server. But please don't follow the guides on YouTube blindly; many people there run overkill and shitty setups.
Two Nvidia Tesla T4s (server GPUs) are a good start, and you can find a used T4 for around 800€.
1: I'm talking about fine-tuning or building RAGs here, not about just running models locally.
If you need to choose a model, it depends on what you want to do. I would go with DeepSeek-R1 (one of the smaller distills, e.g. 14B), Mistral NeMo, or Llama 3.2.