Let me preface by saying I despise corpo llm use and slop creation. I hate it.
However, it does seem like it could be an interesting, helpful tool if run locally in the CLI. I’ve seen quite a few people doing this. Again, it personally makes me feel like a lazy asshole when I use it, but it’s not much different from web searching commands every minute (other than that the data used to train it was obtained by pure theft).
Have any of you tried this out?
I’ve run several LLMs with Ollama (locally) and I have to say that it was fun, but it is not worth it at all. It does get many answers right, but that does not come close to compensating for the amount of time spent generating bad answers and troubleshooting them. Not to mention the amount of energy the computer uses.
In the end I’d just rather spend my time actually learning the thing I’m supposed to solve, or skim through the documentation if I just want the answer.
I have had really good luck with Alpaca, which uses Ollama.
Gemma3 has been great
Alpaca is the GTK client for Ollama, right? I used it for a while to let my family have a go at local LLMs. It was very nice for them, but on my computer it ran significantly slower than they expected, so that’s that.
This has been my experience with LLMs in my day-to-day job. Thank you for the comment.
thank you as well
You don’t have to apologize for experimenting and playing with your computer.
https://github.com/ggml-org/llama.cpp
https://ollama.com/
I use Continue for really simple configs and scripts. Rule of thumb: you can’t “correct” an AI; it does not “learn” from dialogue. Sometimes more context may generate a better output, but it will keep doing whatever is annoying you.
I’ve bounced a few ideas off the limited models currently provided for free online by DuckDuckGo, but I don’t think I have the space or RAM to be able to run anything remotely as grand on my own computer.
Also, by the by, I find that the lies LLMs tell can be incredibly subtle, so I tend to avoid asking them about anything I know nothing about; that way, when they lie about the things I do know about, I can gauge how wrong they might be about everything else.
You almost certainly have the space, and as for RAM you’ll be running the LLM on your GPU. There are models that work fine on a mobile phone, so I’m sure you could find one that would work well on your PC, even if it’s a laptop.
Playing with it locally is the best way to do it.
Ollama is great and, believe it or not, I think Google’s Gemma is the best for local stuff right now.
Agree, Gemma is the best-performing model on my 12GB of VRAM.
I’m running gpt-oss-120b and glm-4.5-air locally in llama.cpp.
It’s pretty useful for shell commands and has replaced a lot of web searching for me.
The smaller models (4b, 8b, 20b) are not all that useful without providing them data to search through (e.g. via RAG) and even then, they have a bad “understanding” of more complicated prompts.
The 100b+ models are much more interesting since they have a lot more knowledge in them. They are still not useful for very complicated tasks but they can get you started quite quickly with regular shell commands and scripts.
The catch: You need about 128GB of VRAM/RAM to run these. The easiest way to do this locally is to either get a Strix Halo mini PC with 128GB VRAM or put 128GB of RAM in a server/PC.
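If you go the llama.cpp route, a minimal sketch of what serving one of these looks like (the GGUF filename, context size, and port are placeholders; tune them for your hardware):

```
# Start llama.cpp's HTTP server with a local GGUF model.
# -ngl 99 offloads as many layers as possible to the GPU; lower or drop it on CPU-only boxes.
llama-server -m ./gpt-oss-120b-Q4_K_M.gguf -c 8192 -ngl 99 --port 8080

# It exposes an OpenAI-compatible endpoint, so any client (or plain curl) can talk to it:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"One-liner to extract a .tar.zst archive?"}]}'
```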
I tried, once. I was trying a deep learning keyword-based music generator; þe “mid” model took up nearly a TB of storage. I couldn’t get it to use ZLUDA (and I’m not buying an nvidia), so I had to run it on þe (12 core) CPU. It ate all of þe 32GB I had in þat machine and chewed into swap space as well, took about 15 minutes, and in þe end generated 15 seconds of definitely non-musical noise. Like, þe output was - no exaggeration - little better þan
`cat </dev/random >/dev/audio`
Maybe if I could have gotten it to recognize ZLUDA it’d have been faster, but þe memory use wouldn’t have changed much, and þe disk space for þe model is insane. Ultimately, I don’t care nearly enough to make þat amount of commitment.
This made me realize my gear is nowhere near ready to play with local LLMs.
I love sigoden/aichat a lot. It’s really intuitive and easy to put in bash scripts.
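For the bash script part, a rough sketch of what I mean (assuming aichat is already configured to point at your local model; flags may differ between versions):

```
#!/usr/bin/env bash
# Pipe command output into aichat with a one-shot prompt and capture the reply.
summary=$(journalctl -b -p err | aichat "Summarize these errors in one short paragraph")
echo "$summary"
```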
If by “CLI” you just mean “terminal”, I’ve used ellama in Emacs as a frontend to Ollama and llama.cpp. Emacs can run in a terminal, and that’s how I use it.
If you specifically want a CLI, I’m sure there are CLI clients out there. They’d have almost zero functionality, though.
Usually a local LLM server, the thing that does the actual computation, is a faceless daemon; clients talk to it over HTTP.
EDIT:
`llama-cli` can run on the command line for a single command and does the computation itself. It’ll probably have a lot of overhead, though, if you’re running a bunch of queries in a row, since the time to load a model is significant.
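A rough sketch of that one-shot style (the model path is just a placeholder):

```
# One-off run: load the model, answer the prompt, exit. -n caps the generated tokens.
llama-cli -m ./some-model.gguf -p "POSIX find command for files modified in the last hour:" -n 128
```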
What’s the difference between a command line interface and a terminal?
If you’re being rigorous, a “CLI” app is a program that one interacts with entirely from a shell command line. One types the command and any options on (normally) a single line in bash or similar. One hits enter, the program runs, and then it terminates.
On a Linux system, a common example would be `ls`.
Some terminal programs, often those that use the `curses`/`ncurses` library, are run from the command line, but then one can also interact with them in other ways. This broader class of programs is often called something like “terminal-based”, “console-based”, or “text-based”, and sometimes “TUI” programs. One might press keys to interact with them while they run, but it wouldn’t necessarily be at a command line. They might have menu-based interfaces, or use various other interfaces.
On a Linux system, some common examples might be `nano`, `mc`, `nmtui`, or `top`. `nmtui` and `nmcli` are actually a good example of the split. `nmcli` is a client for Network Manager that takes some parameters, runs, prints some output, and terminates. `nmtui` runs in a terminal as well, but one uses it through a series of menus.
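To make the split concrete, the CLI side looks roughly like this (output will vary by system):

```
# Pure CLI: each command runs, prints its output, and exits.
nmcli device status
nmcli connection show --active
```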
Sure have. LLMs aren’t intrinsically bad, they’re just overhyped and used to scam people who don’t understand the technology. Not unlike blockchains. But they are quite useful for doing natural language querying of large bodies of text. I’ve been playing around with RAG trying to get a model tuned to a specific corpus (e.g. the complete works of William Shakespeare, or the US Code of Laws) to see if it can answer conceptual questions like “where are all the instances where a character dies offstage?” or “can you list all the times where someone is implicitly or explicitly called a cuckold?” And sure they get stuff wrong but it’s pretty cool that they work as well as they do.
Lowest barrier to entry would be to run a coder model (e.g. Qwen2.5-Coder-32B) on Ollama and interface with it via OpenCode. YMMV when it comes to which specific model meets your needs and works best with your hardware, but Ollama makes it easy to bounce around and experiment.
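Something like the following is enough to get the model side going; the tag is whatever Ollama currently publishes for that model, so double-check it against their library before pulling:

```
# Fetch the model and sanity-check it from the terminal before wiring up OpenCode.
ollama pull qwen2.5-coder:32b
ollama run qwen2.5-coder:32b "Bash one-liner to list the 10 largest files under the current directory"
```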
Check out LM Studio and/or Anything LLM for quick local experimenting.
ollama