@nietras I'm waiting for the Phi-4 ONNX version to be available (I tried optimum to convert it but failed to install it). I could go with LLamaSharp but dunno if it is worth it. Which runtime representation do you mostly work with these days?
=> More information about this toot | More toots from xoofx@mastodon.social
@xoofx have you tried the gguf files https://huggingface.co/microsoft/phi-4-gguf/tree/main with LlamaSharp?
Unfortunately, many models are not available as ONNX.
=> More information about this toot | More toots from nietras@mastodon.social
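For reference, a minimal sketch of loading a GGUF model with LLamaSharp, assuming the GGUF file from the repo above has been downloaded locally (the file name here is hypothetical) and a backend package such as LLamaSharp.Backend.Cuda12 is installed for GPU use; the exact API surface has shifted between LLamaSharp releases:

```csharp
using LLama;
using LLama.Common;

// Hypothetical local path to a quantized Phi-4 GGUF file.
var parameters = new ModelParams("phi-4-Q4_K_M.gguf")
{
    ContextSize = 4096,
    GpuLayerCount = 32 // layers offloaded to the GPU; only has an effect with a GPU backend package
};

using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);
var executor = new InteractiveExecutor(context);
var session = new ChatSession(executor);

// Stream the assistant's reply token by token.
await foreach (var token in session.ChatAsync(
    new ChatHistory.Message(AuthorRole.User, "Summarise GGUF in one sentence."),
    new InferenceParams { MaxTokens = 128 }))
{
    Console.Write(token);
}
```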
@nietras Nope. I'm just not sure LlamaSharp is good (e.g. whether everything correctly uses the GPU, CUDA if possible, etc.) and how it compares to ONNX. The Semantic Kernel package was nice to use, so I was trying to stick with one API... but yeah, maybe there are no other options today...
=> More information about this toot | More toots from xoofx@mastodon.social
@xoofx afaik LlamaSharp is based on llama.cpp, so I presume GPU support should be reasonable. ONNX Runtime is great if you have an ONNX model, but that's often not the case for LLMs, and the API for those is still a work in progress...
As mentioned in https://github.com/awaescher/OllamaSharp, MS has abstractions intended for chat/LLM use; I believe that's also what Semantic Kernel uses underneath.
It's hard to keep up and nothing is really mature imo.
=> More information about this toot | More toots from nietras@mastodon.social
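For context, a minimal sketch of the Microsoft.Extensions.AI abstraction mentioned above, using OllamaSharp as the backing client; this assumes a local Ollama server with a phi4 model pulled, and the method names have changed across the preview releases (CompleteAsync was later renamed to GetResponseAsync):

```csharp
using Microsoft.Extensions.AI;
using OllamaSharp;

// OllamaApiClient implements Microsoft.Extensions.AI.IChatClient,
// so code written against the abstraction is not tied to Ollama.
var ollama = new OllamaApiClient(new Uri("http://localhost:11434"))
{
    SelectedModel = "phi4"
};
IChatClient client = ollama;

// GetResponseAsync is the current name; earlier previews used CompleteAsync.
var response = await client.GetResponseAsync("Explain GGUF in one sentence.");
Console.WriteLine(response.Text);
```

The point of the abstraction is that the same IChatClient-based code can later be pointed at another provider (e.g. an ONNX-backed or Azure-backed client) without rewriting the chat logic.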