@nietras I'm waiting for the Phi-4 ONNX version to be available (I tried optimum to convert it but failed to install it). I could go with LLamaSharp but dunno if it is worth it. Which runtime representation do you mostly work with these days?
=> More information about this toot | More toots from xoofx@mastodon.social
@xoofx have you tried the gguf files https://huggingface.co/microsoft/phi-4-gguf/tree/main with LlamaSharp?
Unfortunately, many models are not available as ONNX.
=> More information about this toot | More toots from nietras@mastodon.social
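For reference, a minimal sketch of loading a GGUF model with LLamaSharp, assuming the GGUF file from the repo above has been downloaded locally (the file name here is hypothetical) and a backend package such as LLamaSharp.Backend.Cuda12 is installed for GPU use; the exact API surface has shifted between LLamaSharp releases:

```csharp
using LLama;
using LLama.Common;

// Hypothetical local path to a quantized Phi-4 GGUF file.
var parameters = new ModelParams("phi-4-Q4_K_M.gguf")
{
    ContextSize = 4096,
    GpuLayerCount = 32 // layers offloaded to the GPU; only has an effect with a GPU backend package
};

using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);
var executor = new InteractiveExecutor(context);
var session = new ChatSession(executor);

// Stream the assistant's reply token by token.
await foreach (var token in session.ChatAsync(
    new ChatHistory.Message(AuthorRole.User, "Summarise GGUF in one sentence."),
    new InferenceParams { MaxTokens = 128 }))
{
    Console.Write(token);
}
```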
@nietras Nope. I'm just not sure LlamaSharp is good (e.g. whether everything correctly uses the GPU, CUDA if possible, etc.) and how it compares to ONNX. The Semantic Kernel package was nice to use, so I was trying to stick with one API... but yeah, maybe there are no other options today...
=> More information about this toot | More toots from xoofx@mastodon.social
@xoofx afaik LlamaSharp is based on llama.cpp, so I presume GPU support should be reasonable. ONNX Runtime is great if you have an ONNX model, but that's often not the case for LLMs, and the API for those is still a work in progress...
As mentioned in https://github.com/awaescher/OllamaSharp, MS has abstractions intended for chat/LLM use; I believe that's also what Semantic Kernel uses underneath.
It's hard to keep up and nothing is really mature imo.
=> More information about this toot | More toots from nietras@mastodon.social
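For context, a minimal sketch of the Microsoft.Extensions.AI abstraction mentioned above, using OllamaSharp as the backing client; this assumes a local Ollama server with a phi4 model pulled, and the method names have changed across the preview releases (CompleteAsync was later renamed to GetResponseAsync):

```csharp
using Microsoft.Extensions.AI;
using OllamaSharp;

// OllamaApiClient implements Microsoft.Extensions.AI.IChatClient,
// so code written against the abstraction is not tied to Ollama.
var ollama = new OllamaApiClient(new Uri("http://localhost:11434"))
{
    SelectedModel = "phi4"
};
IChatClient client = ollama;

// GetResponseAsync is the current name; earlier previews used CompleteAsync.
var response = await client.GetResponseAsync("Explain GGUF in one sentence.");
Console.WriteLine(response.Text);
```

The point of the abstraction is that the same IChatClient-based code can later be pointed at another provider (e.g. an ONNX-backed or Azure-backed client) without rewriting the chat logic.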