Releases: shareup/shllm

v0.10.0

11 Dec 00:26
d7f4322

  • Parse Qwen3-VL reasoning tokens correctly.
  • Add Qwen3-VL-2B-Thinking-4bit and Qwen3-VL-4B-Instruct-4bit models.

v0.9.2

08 Dec 21:32
20ba887

  • End reasoning blocks when tool calls arrive
  • Add Qwen3-VL models

v0.9.1

06 Nov 20:58
57c10e4

  • Add 4-bit quantization of gpt-oss

Full Changelog: v0.9.0...v0.9.1

v0.9.0

22 Oct 01:09
3e10cf5

  • Support Python-formatted tool calls from LFM2-8B-A1B.
  • Expose Python.parseFunctionCall.
  • Add tools parameter to LLM.init() and convenience factory methods.
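As an illustration of the two bullets above, a hedged sketch: `Python.parseFunctionCall` is named in these notes, but its exact signature and the shape of its return value are assumptions here, not documented behavior.

```swift
// Hypothetical sketch only: the signature of Python.parseFunctionCall
// and the type of its result are assumptions, not part of these notes.

// LFM2-8B-A1B emits tool calls in Python-call syntax like this; the
// parser is assumed to turn the raw text into a structured tool call.
let raw = #"get_stock_price(symbol="AAPL")"#

if let call = try? Python.parseFunctionCall(raw) {
    print(call)
}
```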

v0.8.1

17 Oct 23:33
d24bcbc

  • Remove responseParser argument from model factories.

v0.8.0

17 Oct 22:29
99166ae

  • Add LFM2-8B-A1B-4bit
  • Remove SHLLM.memoryLimit
  • Update mlx-swift-examples

v0.7.0

16 Oct 22:01
56c99d4

  • Add support for GPT-OSS and improve tool calling across all models.
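The tool-calling examples below reference a `stockTool` that is not defined in these notes. A minimal sketch of what such a definition might look like, assuming the JSON-schema-style `[String: any Sendable]` tool-specification dictionary used by mlx-swift-examples (the exact layout is an assumption):

```swift
// A sketch of the `stockTool` used in the examples below. The
// JSON-schema-style dictionary layout is an assumption based on common
// tool-calling conventions, not something these notes specify.
let stockTool: [String: any Sendable] = [
    "type": "function",
    "function": [
        "name": "get_stock_price",
        "description": "Get the current price of a stock by ticker symbol.",
        "parameters": [
            "type": "object",
            "properties": [
                "symbol": [
                    "type": "string",
                    "description": "The ticker symbol, e.g. AAPL.",
                ],
            ],
            "required": ["symbol"],
        ],
    ],
]
```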

GPT-OSS model tool calling

let chat: [Chat.Message] = [
    .system(
        "You are a helpful assistant that can provide stock prices. When asked for a stock price, you must use the get_stock_price tool."
    ),
    .user("What is the price of AAPL?"),
]

var input = UserInput(chat: chat)

// First pass: give the model the chance to request the tool.
guard let llm1 = try gptOSS_20B(
    input,
    tools: [stockTool]
) else { return }

let (_, _, toolCalls) = try await llm1.result
guard let toolCall = toolCalls?.first else {
    return
}

// GPT-OSS uses the Harmony format, so the assistant's tool call and
// the tool's result are appended with the Harmony-specific helpers.
input.appendHarmonyAssistantToolCall(toolCall)
input.appendHarmonyToolResult(["price": 123.45])

// Second pass: the model folds the tool result into its final answer.
guard let llm2 = try gptOSS_20B(
    input,
    tools: [stockTool]
) else { return }

let result = try await llm2.text.result

Other model tool calling

let chat: [Chat.Message] = [
    .system(
        "You are a helpful assistant that can provide stock prices. When asked for a stock price, you must use the get_stock_price tool."
    ),
    .user("What is the price of AAPL?"),
]

var input = UserInput(chat: chat)

// First pass: give the model the chance to request the tool.
guard let llm1 = try qwen3MoE(
    input,
    tools: [stockTool]
) else { return }

let (_, _, toolCalls) = try await llm1.result
guard let toolCall = toolCalls?.first else {
    return
}

// Unlike GPT-OSS, other models only need the tool result appended
// with the generic helper before the follow-up pass.
input.appendToolResult(["price": 123.45])

// Second pass: the model folds the tool result into its final answer.
guard let llm2 = try qwen3MoE(
    input,
    tools: [stockTool]
) else { return }

let result = try await llm2.text.result

v0.6.0

11 Oct 10:02
c9b9127

  • Cache the most recently used model so that follow-up inference is fast. This behavior can be controlled via SHLLM.isModelCacheEnabled.
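For example, the cache can be turned off around a one-off inference (a sketch; only the `SHLLM.isModelCacheEnabled` flag itself is named in these notes):

```swift
// Disable caching before loading a model you will only use once...
SHLLM.isModelCacheEnabled = false
// ...run the one-off inference here...

// ...then re-enable it so repeated use of the same model stays fast.
SHLLM.isModelCacheEnabled = true
```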

v0.5.3

24 Sep 11:20
5b12246

  • Update mlx-swift-examples

v0.5.2

06 Sep 00:38
5d810be

  • Update to newest version of mlx-swift-examples
  • Add C++17 language standard specification to the package configuration