Hacker Newsnew | past | comments | ask | show | jobs | submit | redrove's commentslogin


Does this have a CLI only interface?

Yes. You could also look at the README.md.

There is virtually no reason to use Ollama over LM Studio or the myriad of other alternatives.

Ollama is slower and they started out as a shameless llama.cpp ripoff without giving credit and now they “ported” it to Go which means they’re just vibe code translating llama.cpp, bugs included.


I really like LM Studio when I can use it under Windows but for people like me with Intel Macs + AMD gpu ollama is the only option because it can leverage the gpu using MoltenVK aka Vulkan, unofficially. We're still testing it, hoping to get the Vulkan support in the main branch soon. It works perfectly for single GPUs but some edge cases when using multiple GPUs are unsupported until upstream support from MoltenVK comes through. But yeah, I agree, it wasn't cool to repackage Georgi's work like that.

>Ollama is slower

I've benchmarked this on an actual Mac Mini M4 with 24 GB of RAM, and averaged 24.4 t/s on Ollama and 19.45 t/s on LM Studio for the same ~10 GB model (gemma4:e4b), a difference which was repeated across three runs and with both models warmed up beforehand. Unless there is an error in my methodology, which is easy to repeat[1], it means Ollama is a full 25% faster. That's an enormous difference. Try it for yourself before making such claims.

[1] script at: https://pastebin.com/EwcRqLUm but it warms up both and keeps them in memory, so you'll want to close almost all other applications first. Install both ollama and LM Studio and download the models, change the path to where you installed the model. Interestingly I had to go through 3 different AI's to write this script: ChatGPT (on which I'm a Pro subscriber) thought about doing so then returned nothing (shenanigans since I was benchmarking a competitor?), I had run out of my weekly session limit on Pro Max 20x credits on Claude (wonder why I need a local coding agent!) and then Google rose to the challenge and wrote the benchmark for me. I didn't try writing a benchmark like this locally, I'll try that next and report back.


It depends on the hardware, backend and options. I've recently tried running some local AIs (Qwen3.5 9B for the numbers here) on an older AMD 8GB VRAM GPU (so vulkan) and found that:

llama.cpp is about 10% faster than LM studio with the same options.

LM studio is 3x faster than ollama with the same options (~13t/s vs ~38t/s), but messes up tool calls.

Ollama ended up slowest on the 9B, Queen3.5 35B and some random other 8B model.

Note that this isn't some rigorous study or performance benchmarking. I just found ollama unnaceptably slow and wanted to try out the other options.


LM Studio is closed source.

And didn't Ollama independently ship a vision pipeline for some multimodal models months before llama.cpp supported it?


Yes, they introduced that Golang rewrite precisely to support the visual pipeline and other things that weren't in llama.cpp at the time. But then llama.cpp usually catches up and Ollama is just left stranded with something that's not fully competitive. Right now it seems to have messed up mmap support which stops it from properly streaming model weights from storage when doing inference on CPU with limited RAM, even as faster PCIe 5.0 SSDs are finally making this more practical.

The project is just a bit underwhelming overall, it would be way better if they just focused on polishing good UX and fine-tuning, starting from a reasonably up-to-date version of what llama.cpp provides already.


> There is virtually no reason to use Ollama over LM Studio or the myriad of other alternatives.

Hmm, the fact that Ollama is open-source, can run in Docker, etc.?


Ollama is quasi-open source.

In some places in the source code they claim sole ownership of the code, when it is highly derivative of that in llama.cpp (having started its life as a llama.cpp frontend). They keep it the same license, however, MIT.

There is no reason to use Ollama as an alternative to llama.cpp, just use the real thing instead.


If it’s MIT code derived from MIT code, in what way is its openness ”quasi”? Issues of attribution and crediting diminish the karma of the derived project, but I don’t see how it diminishes the level of openness.

FOSS licensing can only exist in terms of Copyright. Without Copyright, you cannot license FOSS. If something has an incorrect Copyright attribution, then the license can be viewed as invalid until this deficiency has been corrected (obv. depending on local laws, etc).

On top of this, it would not be unreasonable for the numerous authors of llama.cpp to issue DMCA takedown requests if Ollama is unwilling to correct it.


Do y'all mean backend or the Ollama frontend or both? I find it trivially easy to sub in my local Ollama api thing in virtually all of the interesting frontend things. I'm quite curious about the "why not Ollama" here.

Does LM Studio have an equivalent to the ollama launch command? i.e. `ollama launch claude --model qwen3.5:35b-a3b-coding-nvfp4`

I don't think it does, but llama.cpp does, and can load models off HuggingFace directly (so, not limited to ollama's unofficial model mirror like ollama is).

There is no reason to ever use ollama.


> I don't think it does, but llama.cpp does

I just checked their docs and can't see anything like it.

Did you mistake the command to just download and load the model?


As a sibling comment answered you, it is `-hf`.

And yes, it downloads the model, caches it, and then serves future loads of that model out of the cache if the file hasn't changed in the hf repo.


So I'm summary: no, it does not have an equivalent command either.

-hf ModelName:Q4_K_M

Did you mistake the command to just download and load the model too?

Actually that shouldn't be a question, you clearly did.

Hint: it also opens Claude code configured to use that model


sure there's a reason...it works fine thats the reason

I feel like the READMEs for these 3 large popular packages already illustrate tradeoffs better than hacker news argument

lm studio is not opensource and you can't use it on the server and connect clients to it?

LM Studio can absolutely run as as server.

IIRC it does so as default too. I have loads of stuff pointing at LM Studio on localhost

Not necessarily; I would very much like to use those features on a Linux server. Currently the Anthropic implementation forces a desktop (or worse, a laptop) to be turned on instead of working headless as far as I understand it.

I’ll give clappie a go, love the theme for the landing page!


I disagree. I think a sharp drop in memory requirements of at least an order of magnitude will cause demand to adjust accordingly.

Department of Transportation always thinks adding more lanes will reduce traffic.

It doesn't, it induces demand. Why? Because there's always too many people with cars who will fill those lanes.


Citation needed. I've heard this quite often, but so far, I haven't seen proof of the stated causality.

PS: This doesn't mean that better public transportation could deliver more bang for the buck than the n-th additional car lane. But never ever have I heard from anybody that they chose to buy a car or use an existing car more often because an additional lane has been built.


Have you tried the "Reference" section on the Wikipedia article?

https://en.wikipedia.org/wiki/Induced_demand#cite_note-vande...


You've never heard anyone choose to take side streets instead of the highway because of traffic jams? No one ever goes out of their way to avoid heavily trafficed areas?

I don't understand what the point is you're trying to make. When people at t0 take detours because of traffic jams on the direct route, and then at t1, there are less traffic jam on the direct route due to additional lanes, so they decide to take the direct route, then total traffic is down, because they no longer take a detour. Even if they are still part of a newly induced traffic jam.

> Rent a VPS in another country and set up your own personal VPN server on it, and no one will be able to block you.

(machine translation)

How would this ever work with a whitelist? did you even read the post?


How did PYPI_PUBLISH lead to a full GH account takeover?

I'd imagine the attacker published a new compromised version of their package, which the author eventually downloaded, which pwned everything else.

Their Personal Access Token must’ve been pwned too, not sure through what mechanism though

They have written about it on github to my question:

Trivvy hacked (https://www.aquasec.com/blog/trivy-supply-chain-attack-what-...) -> all circleci credentials leaked -> included pypi publish token + github pat -> | WE DISCOVER ISSUE | -> pypi token deleted, github pat deleted + account removed from org access, trivvy pinned to last known safe version (v0.69.3)

What we're doing now:

    Block all releases, until we have completed our scans
    Working with Google's mandiant.security team to understand scope of impact
    Reviewing / rotating any leaked credentials
https://github.com/BerriAI/litellm/issues/24518#issuecomment...

69.3 isnt safe. The safe thing to do is remove all trivy access. or failing that version. 0.35 is the last and AFAIK only safe version.

https://socket.dev/blog/trivy-under-attack-again-github-acti...


I have sent your message to the developer on github and they have changed the version to 0.35.0 ,so thanks.

https://github.com/BerriAI/litellm/issues/24518#issuecomment...


Does that explain how circleci was publishing commits and closing issues?

Don't hold your breath for an answer.

>I am unable to understand how it compromised your account itself from the exploit at trivvy being used in CI/CD as well.

Token in CI could've been way too broad.


>1. Looks like this originated from the trivvy used in our ci/cd

Were you not aware of this in the short time frame that it happened in? How come credentials were not rotated to mitigate the trivy compromise?


The latest trivy attack was announced just yesterday. If you go out to dinner or take a night off its totally plausible to have not seen it.

afaik the trivy attack was first in the news on March 19th for the github actions and for docker images it was on March 23rd

[flagged]


Probably more "serious human" than "serious over-capitalist" or "seriously overworked". Good for them.

Bifrost is the only real alternative I'm aware of https://github.com/maximhq/bifrost

Virtual Keys is an Enterprise feature. I am not going to pay for something like this in order to provide my family access to all my models. I can do without cost control (although it would be nice) but I need for users to be able to generate a key and us this key to access all the models I provide.

I just deployed it to test it out and this is FALSE. I was able to create Virtual Keys on the free version with no issues.

Please do a double take on the facts, you might falsely deter people.


I don’t believe it is an enterprise feature. I did some testing on Bifrost just last month on a free open source instance and was able to set up virtual keys.

We have tried reaching out to their sales multiple times but never get a response.

First line of defense is the git host and artifact host scrape the malware clean (in this case GitHub and Pypi).

Domains might get added to a list for things like 1.1.1.2 but as you can imagine that has much smaller coverage, not everyone uses something like this in their DNS infra.


This threat actor is also using Internet Computer Protocol (ICP) "Canisters" to deliver payloads. I'm not too familiar with the project, but I'm not sure blocking domains in DNS would help there.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: