Let me preface by saying I despise corpo llm use and slop creation. I hate it.
However, it does seem like it could be an interesting, helpful tool if run locally in the CLI. I’ve seen quite a few people doing this. Again, it personally makes me feel like a lazy asshole when I use it, but it’s not much different from web searching commands every minute (other than that the data used to train it was obtained by pure theft).
Have any of you tried this out?
I’ve run several LLMs with Ollama (locally) and I have to say that it was fun, but it is not worth it at all. It does get many answers right, but that does not come close to compensating for the amount of time spent generating bad answers and troubleshooting them. Not to mention the amount of energy the computer uses.
In the end I’d just rather spend my time actually learning the thing I’m supposed to solve, or skim through the documentation if I just want the answer.
I have had really good luck with Alpaca, which uses Ollama.
Gemma3 has been great
Alpaca is the GTK client for Ollama, right? I used it for a while to let my family have a go at local LLMs. It was very nice for them, but on my computer it ran significantly slower than they expected, so that’s that.
This has been my experience with LLMs in my day-to-day job. Thank you for the comment.
thank you as well
You don’t have to apologize for experimenting and playing with your computer.
https://github.com/ggml-org/llama.cpp
https://ollama.com/
I use Continue for really simple configs and scripts. Rule of thumb: you can’t “correct” an AI; it does not “learn” from dialogue. Sometimes more context may generate a better output, but it will keep doing whatever is annoying you.
I’ve bounced a few ideas off the limited models currently provided for free online by DuckDuckGo, but I don’t think I have the space or RAM to be able to run anything remotely as grand on my own computer.
Also, by the by, I find that the lies LLMs tell can be incredibly subtle, so I tend to avoid asking them about anything I know nothing about; that way, when they lie about the things I do know about, I can gauge how wrong they might be about everything else.
You almost certainly have the space, and as for RAM you’ll be running the LLM on your GPU. There are models that work fine on a mobile phone, so I’m sure you could find one that would work well on your PC, even if it’s a laptop.
Playing with it locally is the best way to do it.
Ollama is great and, believe it or not, I think Google’s Gemma is the best for local stuff right now.
Agree, Gemma is the best-performing model on my 12GB of VRAM.
I’m running gpt-oss-120b and glm-4.5-air locally in llama.cpp.
It’s pretty useful for shell commands and has replaced a lot of web searching for me.
The smaller models (4b, 8b, 20b) are not all that useful without providing them data to search through (e.g. via RAG) and even then, they have a bad “understanding” of more complicated prompts.
The 100b+ models are much more interesting since they have a lot more knowledge in them. They are still not useful for very complicated tasks but they can get you started quite quickly with regular shell commands and scripts.
The catch: You need about 128GB of VRAM/RAM to run these. The easiest way to do this locally is to either get a Strix Halo mini PC with 128GB VRAM or put 128GB of RAM in a server/PC.
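If you go the llama.cpp route, a minimal sketch of what serving one of these looks like (the GGUF filename, context size, and port are placeholders; tune them for your hardware):

```
# Start llama.cpp's HTTP server with a local GGUF model.
# -ngl 99 offloads as many layers as possible to the GPU; lower or drop it on CPU-only boxes.
llama-server -m ./gpt-oss-120b-Q4_K_M.gguf -c 8192 -ngl 99 --port 8080

# It exposes an OpenAI-compatible endpoint, so any client (or plain curl) can talk to it:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"One-liner to extract a .tar.zst archive?"}]}'
```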
I tried, once. I was trying a deep learning keyword-based music generator; þe “mid” model took up nearly a TB of storage. I couldn’t get it to use ZLUDA (and I’m not buying an nvidia), so I had to run it on þe (12 core) CPU. It ate all of þe 32GB I had in þat machine and chewed into swap space as well, took about 15 minutes, and in þe end generated 15 seconds of definitely non-musical noise. Like, þe output was - no exaggeration - little better þan
`cat </dev/random >/dev/audio`
Maybe if I could have gotten it to recognize ZLUDA it’d have been faster, but þe memory use wouldn’t have changed much, and þe disk space for þe model is insane. Ultimately, I don’t care nearly enough to make þat amount of commitment.
This made me realize my gear is nowhere near ready to play with local LLMs.
I love sigoden/aichat a lot. It’s really intuitive and easy to put in bash scripts.
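For the bash script part, a rough sketch of what I mean (assuming aichat is already configured to point at your local model; flags may differ between versions):

```
#!/usr/bin/env bash
# Pipe command output into aichat with a one-shot prompt and capture the reply.
summary=$(journalctl -b -p err | aichat "Summarize these errors in one short paragraph")
echo "$summary"
```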
If by “CLI” you just mean “terminal”, I’ve used ellama in Emacs as a frontend to Ollama and llama.cpp. Emacs can run in a terminal, and that’s how I use it.
If you specifically want a CLI, I’m sure there are CLI clients out there. They’d have almost zero functionality, though.
Usually a local LLM server, the thing that does the actual computation, is a faceless daemon; clients talk to it over HTTP.
EDIT:
`llama-cli` can run on the command line for a single command and does the computation itself. It’ll probably have a lot of overhead, though, if you’re running a bunch of queries in a row, since the time to load a model is significant.
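A rough sketch of that one-shot style (the model path is just a placeholder):

```
# One-off run: load the model, answer the prompt, exit. -n caps the generated tokens.
llama-cli -m ./some-model.gguf -p "POSIX find command for files modified in the last hour:" -n 128
```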
What’s the difference between a command line interface and a terminal?
If you’re being rigorous, a “CLI” app is a program that one interacts with entirely from a shell command line. One types the command and any options on (normally) a single line in bash or similar. One hits enter, the program runs, and then it terminates.
On a Linux system, a common example would be `ls`.
Some terminal programs, often those that use the `curses`/`ncurses` library, are run from the command line, but then one can also interact with them in other ways. This broader class of programs is often called something like “terminal-based”, “console-based”, or “text-based”, and sometimes “TUI” programs. One might press keys to interact with them while they run, but it wouldn’t necessarily be at a command line. They might have menu-based interfaces, or use various other interfaces.
On a Linux system, some common examples might be `nano`, `mc`, `nmtui`, or `top`. `nmtui` and `nmcli` are actually a good example of the split. `nmcli` is a client for Network Manager that takes some parameters, runs, prints some output, and terminates. `nmtui` runs in a terminal as well, but one uses it through a series of menus.
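To make the split concrete, the CLI side looks roughly like this (output will vary by system):

```
# Pure CLI: each command runs, prints its output, and exits.
nmcli device status
nmcli connection show --active
```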
Sure have. LLMs aren’t intrinsically bad, they’re just overhyped and used to scam people who don’t understand the technology. Not unlike blockchains. But they are quite useful for doing natural language querying of large bodies of text. I’ve been playing around with RAG trying to get a model tuned to a specific corpus (e.g. the complete works of William Shakespeare, or the US Code of Laws) to see if it can answer conceptual questions like “where are all the instances where a character dies offstage?” or “can you list all the times where someone is implicitly or explicitly called a cuckold?” And sure they get stuff wrong but it’s pretty cool that they work as well as they do.
Lowest barrier to entry would be to run a coder model (e.g. Qwen2.5-Coder-32B) on Ollama and interface with it via OpenCode. YMMV when it comes to which specific model meets your needs and works best with your hardware, but Ollama makes it easy to bounce around and experiment.
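Something like the following is enough to get the model side going; the tag is whatever Ollama currently publishes for that model, so double-check it against their library before pulling:

```
# Fetch the model and sanity-check it from the terminal before wiring up OpenCode.
ollama pull qwen2.5-coder:32b
ollama run qwen2.5-coder:32b "Bash one-liner to list the 10 largest files under the current directory"
```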
Check out LM Studio and/or Anything LLM for quick local experimenting.
ollama