Tim Kellogg @timkellogg.me

mlx-knife: an ollama-like CLI for Apple Silicon

alright, this is the end of the road for me & ollama
github.com/mzau/mlx-knife

sep 1, 2025, 11:54 am • 25 1

Replies

VagabondVisions @vagabondvisions.bsky.social

This is just for working directly with the models, right? It doesn't serve locally so I can use other things to talk to the model?

sep 1, 2025, 5:50 pm • 0 0
Tim Kellogg @timkellogg.me

this is for local serving on Apple Silicon

sep 1, 2025, 5:53 pm • 1 0
VagabondVisions @vagabondvisions.bsky.social

Right, yes (and I am a novice on this, so please bear with me), but it will serve to my network? I'm playing around running various models on my Macbook, serving to some apps running on RPis on my network. If anything does that better or more efficiently, I'm intrigued.

sep 1, 2025, 5:56 pm • 0 0
Tim Kellogg @timkellogg.me

oh, i think so. you just need an HTTP server, which i think it provides

sep 1, 2025, 6:02 pm • 1 0
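A note for anyone trying this over a LAN: assuming mlx-knife's server speaks the OpenAI-compatible /v1/chat/completions protocol (an assumption worth verifying against the repo's README), any other machine on the network, Raspberry Pi included, only needs plain HTTP. A minimal Python sketch, with the host, port, and model name all placeholders:

```python
# Minimal sketch: query a local model server from another machine on the LAN.
# Assumes the server (e.g. mlx-knife on a Mac at 192.168.1.20:8000) exposes an
# OpenAI-compatible /v1/chat/completions endpoint -- check the project's README.
import json
import urllib.request

def chat(prompt: str, host: str = "192.168.1.20", port: int = 8000) -> str:
    payload = {
        "model": "default",  # placeholder; a real server may list models via GET /v1/models
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Say hello from the Raspberry Pi."))
```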
VagabondVisions @vagabondvisions.bsky.social

Excellent, thank you! I will give it a look! I'm working on something that will watch my photography ingest folder on my NAS and then query a vision model to get alt-text that is then written to the sidecar file on the photos.

sep 1, 2025, 6:05 pm • 1 0
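A rough sketch of that ingest pipeline, purely illustrative: the endpoint, paths, and model name are assumptions, the request body is the common OpenAI-style vision payload, and a real build would likely want XMP sidecars and a proper filesystem watcher instead of polling:

```python
# Sketch: poll a NAS ingest folder for new photos, ask a vision model
# (assumed OpenAI-compatible, reachable over HTTP) for alt-text, and write
# it to a sidecar file next to each image. All names are illustrative.
import base64
import json
import time
import urllib.request
from pathlib import Path

INGEST = Path("/mnt/nas/ingest")                          # placeholder path
ENDPOINT = "http://192.168.1.20:8000/v1/chat/completions"  # placeholder endpoint

def describe(image_path: Path) -> str:
    b64 = base64.b64encode(image_path.read_bytes()).decode("ascii")
    payload = {
        "model": "default",  # placeholder model name
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Write concise alt-text for this photo."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    }
    req = urllib.request.Request(ENDPOINT, data=json.dumps(payload).encode("utf-8"),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

while True:
    for img in INGEST.glob("*.jpg"):
        sidecar = img.with_suffix(".txt")  # simple .txt sidecar, not XMP
        if not sidecar.exists():
            sidecar.write_text(describe(img))
    time.sleep(30)  # crude polling; a watcher library would be nicer
```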
Keith Duke @kmduke.bsky.social

Nice, I’m working with mlx a lot, great to see more examples using the library. Thanks for sharing!

sep 1, 2025, 3:04 pm • 1 0
🦆 @pretzelkins.bsky.social

Someone should compile a list of these ollama-like adapters for everything

sep 1, 2025, 2:26 pm • 0 0
Tim Kellogg @timkellogg.me

i’ve been annoyed by ollama’s gguf dependency for a while, increasingly so with all the recent LLM architecture innovations. MLX is such a high quality library, whereas ollama/llama.cpp is such a… barely passable tool

sep 1, 2025, 11:54 am • 8 0
Craig Hughes @craig.rungie.com

LM Studio was my step up from Ollama: MLX and gguf support, a handy server, a great UI, and an MCP client. I wish the server exposed a reranking endpoint, but other than that it’s great.

sep 1, 2025, 5:32 pm • 0 0
Chortle Reborn @warrenchortle.bsky.social

Can you go into more detail about what makes ollama/llama.cpp so hard to deal with? I haven't used either much so I'm curious. I had thought they were pretty well liked by people.

sep 1, 2025, 1:37 pm • 1 0
Tim Kellogg @timkellogg.me

ollama wraps llama.cpp, which wraps the gguf file format. gguf represents the entire model as a single execution graph, which causes mismatches with some newer model architectures. llama.cpp is designed for CPU & personal GPU workloads, and sacrifices a lot of normal features, like a KV cache

sep 1, 2025, 1:45 pm • 2 0
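For readers unfamiliar with the term: a KV cache stores each past token's attention keys and values so that decoding a new token only projects that one token, rather than re-encoding the whole prefix every step. A toy single-head sketch in numpy (just the idea, not any library's actual code):

```python
# Rough illustration of what a KV cache buys you during decoding: each new
# token's query attends over keys/values computed once and appended, instead
# of re-running the whole prefix through the layer every step.
import numpy as np

d = 64
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.02 for _ in range(3))
K_cache, V_cache = [], []

def decode_step(x):
    """x: (d,) hidden state of the newest token."""
    q = x @ Wq
    K_cache.append(x @ Wk)  # one new key/value per step: O(1) extra projection...
    V_cache.append(x @ Wv)
    K, V = np.stack(K_cache), np.stack(V_cache)
    w = np.exp(q @ K.T / np.sqrt(d))
    w /= w.sum()
    return w @ V            # ...instead of re-projecting the entire prefix

for _ in range(8):
    out = decode_step(rng.normal(size=d))
```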
Tim Kellogg @timkellogg.me

basically ollama/llama.cpp/gguf is the mysql/php of AI: it’s extremely easy to get started with, but it sacrifices correctness and quality all over

sep 1, 2025, 1:45 pm • 3 1
𝙄𝙣𝙛𝙞𝙣𝙞𝙩𝙚 𝙅𝙚𝙨𝙩 Audiobook Narrator @jefferyharrell.bsky.social

What's the Postgres of AI? It's not Oracle, but it's the next step up from the thing everybody gets into first?

sep 1, 2025, 1:57 pm • 2 0
Tim Kellogg @timkellogg.me

vLLM? sglang?

sep 1, 2025, 2:07 pm • 3 0
𝙄𝙣𝙛𝙞𝙣𝙞𝙩𝙚 𝙅𝙚𝙨𝙩 Audiobook Narrator @jefferyharrell.bsky.social

Never heard of either one! Neat! 😁

sep 1, 2025, 2:08 pm • 1 0
Tim Kellogg @timkellogg.me

yeah, they’re what you’d use in production. you only do it on a dev box if you’re either masochistic or a fan of Arch Linux

sep 1, 2025, 2:14 pm • 3 0
𝙄𝙣𝙛𝙞𝙣𝙞𝙩𝙚 𝙅𝙚𝙨𝙩 Audiobook Narrator @jefferyharrell.bsky.social

But sir, you repeat yourself.

sep 1, 2025, 2:15 pm • 2 0
Chortle Reborn @warrenchortle.bsky.social

Thanks for elaborating. I didn't know they sacrificed the KV cache altogether!

sep 1, 2025, 2:09 pm • 1 0
Tim Kellogg @timkellogg.me

it’s actually not a bad trade-off for personal gear when you’re extremely memory bound. i just wish they offered *a little*, something to cache the attention sinks

sep 1, 2025, 2:20 pm • 1 0
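"Caching the attention sinks" refers to the StreamingLLM-style trick: attention tends to pile weight onto the first few tokens, so a bounded cache that always keeps those entries plus a recent window degrades much more gracefully than naive truncation. The eviction rule, sketched (real implementations also re-index positions after evicting the middle):

```python
# Sketch of sink-aware KV-cache eviction (StreamingLLM-style), not any
# particular library's API. kv_entries: per-token (key, value) pairs, oldest first.
def evict(kv_entries, n_sink=4, window=1024):
    if len(kv_entries) <= n_sink + window:
        return kv_entries  # still under budget, keep everything
    # keep the first n_sink "attention sink" tokens plus the most recent window
    return kv_entries[:n_sink] + kv_entries[-window:]
```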
Richard Whaling @r.whal.ing

Oooooooh

sep 1, 2025, 11:59 am • 1 0