Add grammar-based tool calling support to RunAnywhere SDK #181
Conversation
- Introduced Tool and ToolCall models for defining and executing tools.
- Implemented grammar-based constrained generation in LlamaCppService.
- Added RunAnywhere.generateWithTools() for generating text with tool calls.
- Created GrammarBridge for JNI integration with grammar functions.
- Updated GenerationOptions to include tools and tool calling mode.
- Added example usage in documentation for clarity.
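As a rough sketch of the contract behind generateWithTools(): the model is steered (by grammar or by prompt) to emit a single JSON object naming a tool and its arguments, which the SDK then parses into a ToolCall. The envelope below is purely illustrative; the field names and helper class are assumptions, not the SDK's actual wire format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ToolCallEnvelope {
    // Build the kind of JSON envelope a tool-calling model is asked to emit,
    // e.g. {"tool":"get_weather","arguments":{"city":"Paris"}}.
    // Hypothetical shape for illustration; the real schema may differ.
    static String envelope(String tool, Map<String, String> args) {
        StringBuilder sb = new StringBuilder("{\"tool\":\"").append(tool).append("\",\"arguments\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : args.entrySet()) {
            if (!first) sb.append(',');
            sb.append('"').append(e.getKey()).append("\":\"").append(e.getValue()).append('"');
            first = false;
        }
        return sb.append("}}").toString();
    }

    public static void main(String[] args) {
        Map<String, String> a = new LinkedHashMap<>();
        a.put("city", "Paris");
        System.out.println(envelope("get_weather", a));
    }
}
```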
Walkthrough
This PR introduces tool-calling capabilities to the RunAnywhere SDK, transitioning from a deprecated grammar-based approach to a prompt-based implementation. It adds Tool model definitions, orchestration components, Android UI examples, native JNI bindings for grammar support, and comprehensive documentation including troubleshooting and setup guides.
Sequence Diagram(s)
sequenceDiagram
actor User
participant Android as Android App
participant PromptBased as PromptBasedToolCalling
participant PromptBuilder as PromptBuilder
participant LLM as LLM Service
participant ResponseParser as ResponseParser
participant Dedup as Deduplicator
User->>Android: Enter prompt & select tools
Android->>PromptBased: generateWithTools(userMessage, tools)
PromptBased->>PromptBuilder: buildSystemPrompt(tools)
PromptBuilder-->>PromptBased: system prompt with tool descriptions
PromptBased->>LLM: send(systemPrompt + userMessage)
LLM-->>PromptBased: LLM response (JSON or text)
PromptBased->>ResponseParser: parse(response, tools)
ResponseParser-->>PromptBased: ParseResult (ToolCall or RegularResponse)
alt ToolCall detected
PromptBased->>Dedup: isDuplicate(toolName, args)
Dedup-->>PromptBased: false (new call)
PromptBased->>Android: ToolCallResult with tool execution
else RegularResponse detected
PromptBased->>Android: ToolCallResult with text response
else Invalid or error
PromptBased->>Android: retry or error result
end
Android->>User: Display results & tool execution output
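The deduplication step in the flow above (isDuplicate on tool name plus arguments) can be sketched as follows. This is a minimal JVM illustration under the assumption that duplicate detection keys on the tool name and a key-order-normalized view of the arguments; the SDK's actual Deduplicator may differ.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class ToolCallDeduplicator {
    private final Set<String> seen = new HashSet<>();

    // Key a call by tool name plus its arguments with keys sorted,
    // so {"a":"1","b":"2"} and {"b":"2","a":"1"} dedupe to the same call.
    static String key(String toolName, Map<String, String> args) {
        return toolName + ":" + new TreeMap<>(args);
    }

    /** Returns true if this exact call was already seen. */
    public boolean isDuplicate(String toolName, Map<String, String> args) {
        return !seen.add(key(toolName, args));
    }

    public static void main(String[] args) {
        ToolCallDeduplicator d = new ToolCallDeduplicator();
        System.out.println(d.isDuplicate("get_weather", Map.of("city", "Paris"))); // first call: new
        System.out.println(d.isDuplicate("get_weather", Map.of("city", "Paris"))); // repeat: duplicate
    }
}
```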
sequenceDiagram
participant Android as Android App
participant LLamaAndroid as LLamaAndroid (Grammar Path)
participant GrammarBridge as GrammarBridge
participant NativeGrammar as Native llama.cpp
participant TextStream as Token Stream
Android->>GrammarBridge: Grammar.fromSchema(modelPtr, jsonSchema)
GrammarBridge->>NativeGrammar: jsonSchemaToGBNF(jsonSchema)
NativeGrammar-->>GrammarBridge: GBNF string
GrammarBridge->>NativeGrammar: createGrammar(modelPtr, gbnfRules)
NativeGrammar-->>GrammarBridge: grammar handle
GrammarBridge-->>Android: Grammar object (AutoCloseable)
Android->>LLamaAndroid: sendWithGrammar(message, grammarHandle)
LLamaAndroid->>NativeGrammar: new_sampler_with_grammar(temp, minP, topK, grammar)
NativeGrammar-->>LLamaAndroid: sampler chain (owns grammar)
LLamaAndroid->>NativeGrammar: generate tokens (constrained by grammar)
NativeGrammar-->>TextStream: constrained token stream
TextStream-->>Android: Flow<String> with grammar-constrained output
Note over NativeGrammar: Grammar lifecycle owned by sampler chain
Note over Android: Grammar-based approach deprecated
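The jsonSchemaToGBNF step in the diagram above converts a JSON schema into llama.cpp's GBNF grammar notation, which then constrains token sampling. As a hand-written illustration (not output of the actual converter), a grammar admitting only an object with a single string-valued "tool" field might look like:

```
root   ::= "{" ws "\"tool\"" ws ":" ws string ws "}"
string ::= "\"" [a-zA-Z0-9_]* "\""
ws     ::= [ \t\n]*
```

Every sampled token must keep the partial output derivable from `root`, which is what forces well-formed JSON out of the model.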
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~65 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (3 warnings)
This commit preserves our grammar-based tool calling investigation before switching to the prompt-based approach.

## What was implemented
- grammar_jni.cpp: JNI bindings for JSON schema → GBNF conversion
- GrammarBridge.kt: Kotlin wrapper for grammar operations
- Grammar-constrained sampler chains with proper ordering
- Tool calling API with schema generation
- Integration with LlamaCppService

## Issues encountered
1. SIGABRT crashes in llama_grammar_accept_impl after ~10 tokens
2. Root causes:
   - llama.cpp grammar stack bugs (unfixed as of Oct 2025)
   - Chat template interference with Qwen2 EOG tokens
   - Missing additionalProperties: false in schemas
   - Double-accept bug (fixed but crash persisted)

## Investigation summary
- 40+ hours of debugging and deep-diving into llama.cpp source
- Fixed double-free bug, empty response issue, double-accept issue
- Crash persisted due to upstream llama.cpp grammar bugs
- Success rate: ~70-85% with crashes (unacceptable for production)

## Decision
Switch to prompt-based tool calling with few-shot examples:
- No crashes - pure Kotlin/JSON parsing
- 85-95% success rate (proven in production)
- Works with small models (0.5B) via few-shot examples
- Universal compatibility, easier debugging

## Files added
- GRAMMAR_IMPLEMENTATION_NOTES.md: Full investigation documentation
- TOOL_CALLING.md: Grammar-based approach docs

## Next steps
Implement prompt-based tool calling per prompt_based_tool_calling_implementation.md
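The commit credits few-shot examples for making prompt-based tool calling reliable even with 0.5B models. A minimal sketch of how such a system prompt might be assembled, assuming one worked example per tool (tool names, descriptions, and the JSON reply format here are illustrative, not the SDK's actual PromptBuilder output):

```java
import java.util.List;

public class FewShotPrompt {
    // Compose a system prompt listing tools plus one worked example per tool.
    // Each entry: {name, description, exampleQuery, exampleJsonReply}.
    static String build(List<String[]> tools) {
        StringBuilder sb = new StringBuilder(
            "You can call these tools by replying with JSON only.\n\nTools:\n");
        for (String[] t : tools) {
            sb.append("- ").append(t[0]).append(": ").append(t[1]).append('\n');
        }
        sb.append("\nExamples:\n");
        for (String[] t : tools) {
            sb.append("User: ").append(t[2]).append('\n');
            sb.append("Assistant: ").append(t[3]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(List.of(new String[]{
            "get_time", "Return the current time", "What time is it?",
            "{\"tool\":\"get_time\",\"arguments\":{}}"
        })));
    }
}
```

The worked examples are what let small models imitate the JSON format without grammar constraints, which is the trade the commit describes.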
@coderabbitai please review this
@sanchitmonga22: I'll review the changes in this PR that adds grammar-based tool calling support to the RunAnywhere SDK.
✅ Actions performed: Review triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 19
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
QUICKSTART_ANDROID.md (1)
815-844: API Reference missing new tool-calling APIs.
This PR introduces tool-calling capabilities (`generateWithTools()`, Tool models, etc.), but the API Reference section doesn't document these new public APIs. Users won't know how to use the feature without this guidance. Add documentation for:
- `RunAnywhere.generateWithTools()` and its parameters
- Tool model definition and schema
- Tool calling modes (grammar-based vs. prompt-based)
- Response parsing and tool execution
sdk/runanywhere-kotlin/native/llama-jni/CMakeLists.txt (1)
63-68: Fix ABI-specific define leaking to non-arm64 builds.
GGML_USE_CPU_AARCH64 should only be defined for arm64-v8a. Apply:

```diff
 target_compile_definitions(${target_name} PRIVATE
     GGML_COMMIT=""
     GGML_VERSION=""
     GGML_USE_CPU
-    GGML_USE_CPU_AARCH64
+    $<$<STREQUAL:${ANDROID_ABI},arm64-v8a>:GGML_USE_CPU_AARCH64>
 )
```

sdk/runanywhere-kotlin/native/llama-jni/src/llama-android.cpp (4)
48-53: Fix format-string misuse in log callback (potential crash/UB).
fmt is a message, not a printf format. Print with "%s".

```diff
 static void log_callback(ggml_log_level level, const char * fmt, void * data) {
-    if (level == GGML_LOG_LEVEL_ERROR) __android_log_print(ANDROID_LOG_ERROR, TAG, fmt, data);
-    else if (level == GGML_LOG_LEVEL_INFO) __android_log_print(ANDROID_LOG_INFO, TAG, fmt, data);
-    else if (level == GGML_LOG_LEVEL_WARN) __android_log_print(ANDROID_LOG_WARN, TAG, fmt, data);
-    else __android_log_print(ANDROID_LOG_DEFAULT, TAG, fmt, data);
+    int prio = ANDROID_LOG_DEFAULT;
+    if (level == GGML_LOG_LEVEL_ERROR) prio = ANDROID_LOG_ERROR;
+    else if (level == GGML_LOG_LEVEL_INFO) prio = ANDROID_LOG_INFO;
+    else if (level == GGML_LOG_LEVEL_WARN) prio = ANDROID_LOG_WARN;
+    __android_log_print(prio, TAG, "%s", fmt);
 }
```
160-181: Replace manual batch allocation/free with llama.cpp API (heap corruption risk).
free_batch frees 512 seq entries regardless of actual n_tokens; this can double-free or leak. Use llama_batch_init/llama_batch_free to avoid mismatched alloc/free.

```diff
 JNIEXPORT jlong JNICALL
 Java_com_runanywhere_sdk_llm_llamacpp_LLamaAndroid_new_1batch(JNIEnv *, jobject, jint n_tokens, jint embd, jint n_seq_max) {
-    llama_batch *batch = new llama_batch {
-        0, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr,
-    };
-    if (embd) {
-        batch->embd = (float *) malloc(sizeof(float) * n_tokens * embd);
-    } else {
-        batch->token = (llama_token *) malloc(sizeof(llama_token) * n_tokens);
-    }
-    batch->pos = (llama_pos *) malloc(sizeof(llama_pos) * n_tokens);
-    batch->n_seq_id = (int32_t *) malloc(sizeof(int32_t) * n_tokens);
-    batch->seq_id = (llama_seq_id **) malloc(sizeof(llama_seq_id *) * n_tokens);
-    for (int i = 0; i < n_tokens; ++i) {
-        batch->seq_id[i] = (llama_seq_id *) malloc(sizeof(llama_seq_id) * n_seq_max);
-    }
-    batch->logits = (int8_t *) malloc(sizeof(int8_t) * n_tokens);
-    return reinterpret_cast<jlong>(batch);
+    auto batch = new llama_batch(llama_batch_init(n_tokens, embd, n_seq_max));
+    return reinterpret_cast<jlong>(batch);
 }
 ...
 JNIEXPORT void JNICALL
 Java_com_runanywhere_sdk_llm_llamacpp_LLamaAndroid_free_1batch(JNIEnv *, jobject, jlong batch_pointer) {
-    const auto batch = reinterpret_cast<llama_batch *>(batch_pointer);
-    free(batch->pos);
-    free(batch->n_seq_id);
-    for (int i = 0; i < 512; ++i) { // Match batch size
-        free(batch->seq_id[i]);
-    }
-    free(batch->seq_id);
-    free(batch->logits);
-    if (batch->token) free(batch->token);
-    if (batch->embd) free(batch->embd);
-    delete batch;
+    auto batch = reinterpret_cast<llama_batch *>(batch_pointer);
+    llama_batch_free(*batch);
+    delete batch;
 }
```

If your llama.cpp version exposes different helpers, adapt accordingly (the key point is: don't hardcode 512 and don't hand-roll alloc/free).
Also applies to: 187-199
11-15: Avoid caching local jclass globally; cache only method IDs.
la_int_var holds a local ref that becomes invalid. Resolve class per-call, cache method IDs, and release the local ref.

```diff
-// Global variables for JNI method caching
-jclass la_int_var;
-jmethodID la_int_var_value;
-jmethodID la_int_var_inc;
+// Global method IDs (valid across JNIEnv calls)
+static jmethodID la_int_var_value = nullptr;
+static jmethodID la_int_var_inc = nullptr;
 ...
-    if (!la_int_var) la_int_var = env->GetObjectClass(intvar_ncur);
-    if (!la_int_var_value) la_int_var_value = env->GetMethodID(la_int_var, "getValue", "()I");
-    if (!la_int_var_inc) la_int_var_inc = env->GetMethodID(la_int_var, "inc", "()V");
+    jclass cls = env->GetObjectClass(intvar_ncur);
+    if (!la_int_var_value) la_int_var_value = env->GetMethodID(cls, "getValue", "()I");
+    if (!la_int_var_inc) la_int_var_inc = env->GetMethodID(cls, "inc", "()V");
+    env->DeleteLocalRef(cls);
```

Also applies to: 316-319
1-6: Add missing header for std::transform.
Ensure `<algorithm>` is included to avoid build issues.

```diff
 #include <android/log.h>
 #include <jni.h>
 #include <string>
+#include <algorithm>
 #include "llama.h"
 #include "common.h"
```

Also applies to: 420-423
sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt (1)
151-164: Reduce sensitive logging (PII risk) and verbosity.
Logging full messages and full prompts can leak user data; trim lengths or guard behind a debug flag.

```diff
-logger.info("📨 Messages:")
-input.messages.forEach { msg ->
-    logger.info("  - ${msg.role}: ${msg.content.take(100)}")
-}
-logger.info("📝 Full prompt:\n$prompt")
+logger.debug("📨 ${input.messages.size} messages (content redacted; showing first 100 chars each in debug builds)")
+input.messages.forEach { msg ->
+    logger.debug("  - ${msg.role}: ${msg.content.take(100)}")
+}
+logger.debug("📝 Prompt preview (first 500 chars): ${prompt.take(500)}")
```

If SDKLogger supports levels, prefer debug for previews; otherwise gate with a configuration flag.
Also applies to: 165-175, 316-319
🧹 Nitpick comments (31)
QUICKSTART_ANDROID.md (4)
9-52: Critical Dependencies section is comprehensive but may need a versioning strategy.
The new dependencies checklist provides excellent clarity. However, consider documenting:
- Which dependencies are truly required vs. recommended
- Compatibility matrix for different Kotlin/Ktor versions
- Migration path if versions need updates
This will reduce future maintenance burden and support burden.
54-75: Hard-coded release URLs may create maintenance friction.
The AAR download links are pinned to android-v0.1.0-alpha. Consider:
- Adding a "Check for updates" note or link to releases page
- Documenting how/when to update URLs for new versions
- Adding checksums or signatures for AAR integrity verification (security best practice)
53-53: Fix static analysis linting issues.
Per markdownlint-cli2 output:
- Line 53 (MD040): Fenced code blocks should have a language specified (e.g., add `toml` after the opening fence)
- Line 205 (MD038): Remove extra spaces inside inline code spans
These are formatting nits, but fixing them will improve documentation consistency and pass linting checks.
Also applies to: 205-205
728-801: Troubleshooting section is strong on dependencies but lacks tool-calling guidance.
The expanded troubleshooting for Ktor/Kotlin compatibility is excellent. However, given that this PR introduces tool-calling capabilities to the SDK, consider adding a troubleshooting section for:
- Tool schema validation errors
- Tool execution failures
- Grammar-based vs. prompt-based tool calling mode selection/debugging
- Common tool-calling issues with specific error messages
This ensures users have adequate guidance for the new feature set introduced in this PR.
sdk/runanywhere-kotlin/native/llama-jni/CMakeLists.txt (2)
52-60: Scope compile options to this target; avoid PUBLIC propagation.
These flags shouldn't leak to dependents.

```diff
-    target_compile_options(${target_name} PUBLIC
+    target_compile_options(${target_name} PRIVATE
         -O3
         -fvisibility=hidden
         -fvisibility-inlines-hidden
         -ffunction-sections
         -fdata-sections
         ${cpu_flags}
     )
```

Also confirm NDK Clang accepts -ffile-prefix-map; if not, prefer -fdebug-prefix-map.
71-77: Multiple CPU-feature variants require runtime selection logic.
Building 7 differently-named .so files under the same ABI won't auto-select at load time. Ensure the app chooses the right library by name based on detected CPU features, or only package a baseline.
Please point to (or add) the code that:
- Detects CPU features at runtime.
- Loads the best‑fit lib via System.loadLibrary("<matching‑name>").
- Excludes unused variants from release artifacts if not selected.
Also applies to: 81-97
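The runtime selection the comment asks for could be sketched on the JVM side roughly as follows. Library names, feature strings, and the ordering convention are illustrative assumptions, not the SDK's actual artifacts; the real code would read CPU features from /proc/cpuinfo or getauxval before calling System.loadLibrary.

```java
import java.util.List;
import java.util.Set;

public class NativeLibSelector {
    // Pick the most specific library variant whose required CPU features
    // are all present. Variants are ordered most- to least-capable:
    // each entry is {libName, comma-separated required features}.
    static String pick(Set<String> cpuFeatures, List<String[]> variants) {
        for (String[] v : variants) {
            boolean ok = true;
            String[] required = v[1].isEmpty() ? new String[0] : v[1].split(",");
            for (String f : required) {
                if (!cpuFeatures.contains(f)) { ok = false; break; }
            }
            if (ok) return v[0]; // caller would then System.loadLibrary(v[0])
        }
        return "llama-jni"; // baseline fallback (assumed name)
    }

    public static void main(String[] args) {
        String lib = pick(Set.of("asimd", "dotprod"),
            List.of(new String[]{"llama-jni-v8_2-dotprod-i8mm", "asimd,dotprod,i8mm"},
                    new String[]{"llama-jni-v8_2-dotprod", "asimd,dotprod"},
                    new String[]{"llama-jni", ""}));
        System.out.println(lib);
    }
}
```

First variant requiring i8mm is skipped on a CPU without it, so the dotprod build is chosen; packaging only a baseline avoids this logic entirely at the cost of speed.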
sdk/runanywhere-kotlin/native/llama-jni/GRAMMAR_IMPLEMENTATION_NOTES.md (1)
3-6: Align docs and build: deprecate in code, too.
Since status is DEPRECATED, add a note pointing to the CMake flag (or disable building by default). Keeps artifacts but avoids shipping them unintentionally.
examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/presentation/navigation/AppNavigation.kt (1)
109-114: Minor: use string resources and explicit content descriptions.
For i18n and accessibility, move "Tools" to a string resource and ensure icon content descriptions are localized.
sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/PromptBuilder.kt (1)
54-75: Optional: allow non-string argument types.
If ToolParameter types include numbers/booleans/arrays, consider Map<String, JsonElement> (or a serializable data class) for examples to preserve types.
sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/models/GenerationOptions.kt (1)
56-61: Add validation and parity helpers for tool-calling fields.
- When toolCallingMode == GRAMMAR_BASED and tools is empty, fail fast with a clear message in validate().
- Consider convenience methods withTools(...) and withToolCallingMode(...) to mirror other withX helpers.
Apply this diff to tighten validation:
```diff
 fun validate() {
     require(maxTokens > 0) { "Max tokens must be positive" }
     require(temperature >= 0f && temperature <= 2f) { "Temperature must be between 0 and 2" }
     require(topP >= 0f && topP <= 1f) { "Top-p must be between 0 and 1" }
     topK?.let { require(it > 0) { "Top-K must be positive" } }
     repetitionPenalty?.let { require(it > 0f) { "Repetition penalty must be positive" } }
     frequencyPenalty?.let { require(it >= -2f && it <= 2f) { "Frequency penalty must be between -2 and 2" } }
     presencePenalty?.let { require(it >= -2f && it <= 2f) { "Presence penalty must be between -2 and 2" } }
     contextLength?.let { require(it > 0) { "Context length must be positive" } }
+    if (toolCallingMode == ToolCallingMode.GRAMMAR_BASED) {
+        require(tools.isNotEmpty()) { "Tools must be provided when using GRAMMAR_BASED tool calling" }
+    }
 }
```

sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/public/extensions/PromptBasedToolCallingExtensions.kt (2)
64-88: Early fallback when tools list is empty.
Avoid invoking the orchestrator with no tools; return regular generation to callers.

```diff
 if (tools.isEmpty()) {
-    logger.warn("No tools provided, falling back to regular generation")
+    logger.warn("No tools provided, falling back to regular generation")
+    val text = this.generate(
+        prompt = prompt,
+        options = options ?: RunAnywhereGenerationOptions(temperature = 0.3f)
+    )
+    return ToolCallResult(success = true, text = text, toolCalls = emptyList(), mode = ToolCallingMode.PROMPT_BASED)
 }
```

Note: adjust generate(...) call to your actual RunAnywhere API if named differently.
130-135: Prevent silent no-op: deprecate ToolBuilder.example.
Mark as @Deprecated with a ReplaceWith to steer devs to Tool.withExamples(...).

```diff
-fun ToolBuilder.example(userQuery: String, arguments: Map<String, String>) {
+@Deprecated(
+    message = "ToolBuilder.example() is not supported. Use Tool.withExamples() on the built Tool.",
+    replaceWith = ReplaceWith("/* build tool first */")
+)
+fun ToolBuilder.example(userQuery: String, arguments: Map<String, String>) {
     // Note: ToolBuilder doesn't currently support examples directly,
     // so users should use .withExamples() after building the tool.
     // This function is provided for future extensibility.
     logger.warn("ToolBuilder.example() is not yet supported. Use Tool.withExamples() instead.")
 }
```

examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/presentation/tools/ToolCallingScreen.kt (3)
176-181: Add content descriptions for icons (accessibility).
Provide non-null contentDescription for assistive tech; keep emojis in text but make icons screen-reader friendly.

```diff
-Icon(
-    imageVector = Icons.Default.Error,
-    contentDescription = null,
+Icon(
+    imageVector = Icons.Default.Error,
+    contentDescription = "Error",
     tint = MaterialTheme.colorScheme.error
 )
 ...
-Icon(
-    imageVector = if (result.success) Icons.Default.CheckCircle else Icons.Default.Error,
-    contentDescription = null,
+Icon(
+    imageVector = if (result.success) Icons.Default.CheckCircle else Icons.Default.Error,
+    contentDescription = if (result.success) "Success" else "Error",
     tint = if (result.success) {
         MaterialTheme.colorScheme.primary
     } else {
         MaterialTheme.colorScheme.error
     }
 )
```
44-46: Externalize hardcoded strings to resources.
Move UI strings to string resources for i18n and consistency.
Example (not exhaustive): "Tool Calling Demo", "Available Tools", tool descriptions, placeholders, "Processing...", "Call Tools", and the model warning.
Also applies to: 80-91, 98-107, 126-127, 133-137
153-160: Provide a stable key for LazyColumn items.
Avoid recomposition jitter by supplying keys.

```diff
-    items(uiState.results) { result ->
+    items(uiState.results, key = { it.id /* or index if no stable id */ }) { result ->
         ToolResultCard(result)
     }
```

If ToolExecutionResult has no id, use itemsIndexed and key = { idx, _ -> idx } as a fallback.
sdk/runanywhere-kotlin/native/llama-jni/src/llama-android.cpp (1)
112-119: Correct error message for context creation.
The call is llama_new_context_with_model(); update the message to match for easier debugging.

```diff
-    LOGe("llama_init_from_model() returned null");
-    env->ThrowNew(env->FindClass("java/lang/IllegalStateException"),
-                  "llama_init_from_model() returned null");
+    LOGe("llama_new_context_with_model() returned null");
+    env->ThrowNew(env->FindClass("java/lang/IllegalStateException"),
+                  "llama_new_context_with_model() returned null");
```
542-566: Replace manual JSON string building with kotlinx.serialization builders.
Manual escaping is brittle (e.g., control chars, unicode). Build a JsonElement and encode it.

```diff
-private fun buildJsonString(obj: Any?): String {
-    return when (obj) {
-        null -> "null"
-        is String -> "\"${obj.replace("\\", "\\\\").replace("\"", "\\\"")}\""
-        is Number -> obj.toString()
-        is Boolean -> obj.toString()
-        is Map<*, *> -> {
-            obj.entries.joinToString(separator = ",", prefix = "{", postfix = "}") { (key, value) ->
-                "\"$key\":${buildJsonString(value)}"
-            }
-        }
-        is List<*> -> {
-            obj.joinToString(separator = ",", prefix = "[", postfix = "]") { buildJsonString(it) }
-        }
-        else -> "\"$obj\""
-    }
-}
+private fun buildJsonString(obj: Any?): String {
+    fun toJsonElement(value: Any?): JsonElement = when (value) {
+        null -> JsonNull
+        is JsonElement -> value
+        is String -> JsonPrimitive(value)
+        is Number -> JsonPrimitive(value)
+        is Boolean -> JsonPrimitive(value)
+        is Map<*, *> -> JsonObject(value.entries.associate { (k, v) ->
+            k.toString() to toJsonElement(v)
+        })
+        is List<*> -> JsonArray(value.map { toJsonElement(it) })
+        else -> JsonPrimitive(value.toString())
+    }
+    return Json.encodeToString(toJsonElement(obj))
+}
```
467-499: System prompt rendering: handle empty parameter lists gracefully.
When a tool has no parameters, avoid rendering "Parameters: " with a trailing space; it confuses models.

```diff
 val toolDescriptions = tools.joinToString("\n") { tool ->
-    val params = tool.parameters.properties.entries.joinToString(", ") { (name, param) ->
+    val props = tool.parameters.properties
+    val params = if (props.isEmpty()) "(none)" else props.entries.joinToString(", ") { (name, param) ->
         val req = if (param.required) "(required)" else "(optional)"
         "$name: ${param.type.toJsonSchemaType()} $req - ${param.description}"
     }
     "- ${tool.name}: ${tool.description}\n  Parameters: $params"
 }
```
59-65: Don't swallow parse exceptions silently; log at debug to aid diagnostics.
Falling back is fine, but detekt is right: we should at least log the failure reason.

```diff
-    private fun tryParseJson(text: String): JsonObject? {
-        return try {
-            json.parseToJsonElement(text).jsonObject
-        } catch (e: Exception) {
-            null
-        }
-    }
+    private fun tryParseJson(text: String): JsonObject? =
+        try {
+            json.parseToJsonElement(text).jsonObject
+        } catch (e: Exception) {
+            logger.debug("Direct JSON parse failed: ${e.message}")
+            null
+        }
```

Based on learnings.
70-86: Precompile code-block regexes and support case-insensitive ```JSON fences; also scan all blocks.
Reduces allocations and improves accuracy when multiple fenced blocks exist.

Add near Line 18:

````kotlin
private val codeBlockRegexes = listOf(
    Regex("```json\\s*(.+?)\\s*```", setOf(RegexOption.DOT_MATCHES_ALL, RegexOption.IGNORE_CASE)),
    Regex("```\\s*(.+?)\\s*```", RegexOption.DOT_MATCHES_ALL)
)
````

Then refactor the function:

````diff
-    private fun extractFromCodeBlock(response: String): JsonObject? {
-        val patterns = listOf(
-            "```json\\s*(.+?)\\s*```",
-            "```\\s*(.+?)\\s*```"
-        )
-
-        for (pattern in patterns) {
-            val regex = Regex(pattern, RegexOption.DOT_MATCHES_ALL)
-            val match = regex.find(response)
-            if (match != null) {
-                val json = tryParseJson(match.groupValues[1].trim())
-                if (json != null) return json
-            }
-        }
-        return null
-    }
+    private fun extractFromCodeBlock(response: String): JsonObject? {
+        codeBlockRegexes.forEach { regex ->
+            regex.findAll(response).forEach { m ->
+                tryParseJson(m.groupValues[1].trim())?.let { return it }
+            }
+        }
+        return null
+    }
````
91-100: Naive first/last brace slice can over-capture; prefer balanced scanning.
The current approach breaks with prose containing multiple JSON fragments. Consider a simple depth scan:

```kotlin
private fun extractJsonFromText(response: String): JsonObject? {
    var depth = 0
    var start = -1
    for ((i, ch) in response.withIndex()) {
        if (ch == '{') {
            if (depth++ == 0) start = i
        } else if (ch == '}' && --depth == 0 && start >= 0) {
            tryParseJson(response.substring(start, i + 1))?.let { return it }
        }
    }
    return null
}
```
169-174: Potential NPEs if schema fields are nullable; guard or confirm non-null.
If Tool.parameters.required or .properties can be null from deserialization, this can NPE. If nullable, refactor:

```diff
-    val requiredParams = tool.parameters.required
-    val allParams = tool.parameters.properties.keys
+    val requiredParams = tool.parameters.required ?: emptyList()
+    val allParams = tool.parameters.properties?.keys ?: emptySet()
```

Please confirm types in sdk/.../models/Tool.kt. Based on learnings.
Also applies to: 171-172, 185-191
sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/GrammarBridge.kt (1)
110-125: Minor: Remove unused constructor parameter or use it.
modelPointer is stored but never used.
sdk/runanywhere-kotlin/native/llama-jni/src/grammar_jni.cpp (2)
181-248: Ownership contract: Kotlin must mark grammar handle as transferred before adding to chain.
To prevent double-free, ensure the Kotlin side calls Grammar.markOwnedByChain() before invoking this JNI with grammar_pointer.
I can follow through in LLamaAndroid.kt to wire this correctly once available in the API. Based on learnings.
245-246: Misleading sampler count log.
Message prints 1/0 based on grammar presence, ignoring other samplers. Either remove it or compute the chain size if the API supports it.
sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/public/extensions/RunAnywhereToolCalling.kt (1)
51-64: Improve deprecation ergonomics with ReplaceWith and remove unused logger/imports.
Make migration clearer and clean up unused items.

```diff
 @Deprecated(
-    message = "Grammar-based tool calling crashes with SIGABRT. See issue #182. Use prompt-based approach instead.",
-    level = DeprecationLevel.ERROR
+    message = "Grammar-based tool calling crashes with SIGABRT. See issue #182. Use prompt-based approach instead.",
+    replaceWith = ReplaceWith("generateWithToolsPromptBased(prompt, tools, options)"),
+    level = DeprecationLevel.ERROR
 )
```

And remove:
- private val logger (unused)
- import com.runanywhere.sdk.llm.llamacpp.LlamaCppService (unused)
examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/presentation/tools/ToolCallingViewModel.kt (2)
201-210: Optional: honor timezone using java.time.
For a better demo, actually format time for the requested ZoneId.

```kotlin
val zone = if (timezone.isNullOrBlank()) ZoneId.systemDefault()
           else runCatching { ZoneId.of(timezone) }.getOrDefault(ZoneId.systemDefault())
val zoned = ZonedDateTime.now(zone).format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"))
"Current time (${zone.id}): $zoned"
```
171-177: Minor UX: non-tool responses aren't errors.
Consider showing them as info/result instead of errorMessage.
sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/PromptBasedToolCalling.kt (1)
234-236: Consider using UUID for more robust unique IDs.
The current timestamp + random approach works but could have collisions if called rapidly. A UUID would provide stronger uniqueness guarantees. Apply this diff:

```diff
 private fun generateToolCallId(): String {
-    return "call_${System.currentTimeMillis()}_${(Math.random() * 1000).toInt()}"
+    return "call_${java.util.UUID.randomUUID()}"
 }
```
314-317: Hardcoded 2000-character limit should be configurable.
The magic number at Line 314 limits all grammar-based generation. Tool calls can vary in size, and this limit may be too restrictive for complex schemas or too permissive for simpler ones. Extract it as a configurable parameter:

```diff
 fun sendWithGrammarSchema(
     message: String,
     schemaJson: String,
     config: LlamaModelConfig? = null,
-    parseSpecialTokens: Boolean = true
+    parseSpecialTokens: Boolean = true,
+    maxOutputChars: Int = 2000
 ): Flow<String> = flow {
     ...
-    if (generatedText.length > 2000) {
-        logger.warn("⚠️ Generated text exceeds 2000 chars, stopping early to prevent runaway generation")
+    if (generatedText.length > maxOutputChars) {
+        logger.warn("⚠️ Generated text exceeds $maxOutputChars chars, stopping early to prevent runaway generation")
         break
     }
```
119-148: Arguments schema is generic in multi-tool schema.
The `generateSchema` method at Lines 137-140 defines `arguments` as a generic object without per-tool parameter validation. This means the schema won't enforce correct parameters for each tool name. Consider whether per-tool argument schemas should be included in the generated schema. If validation is handled elsewhere (e.g., in `ResponseParser`), this is acceptable; otherwise, the schema could be strengthened with a `oneOf` construct that maps tool names to their specific parameter schemas.
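A `oneOf`-based strengthening along those lines might look like the following hypothetical schema fragment (a single illustrative `get_weather` tool; names and fields are assumptions, not the SDK's generated output):

```json
{
  "type": "object",
  "oneOf": [
    {
      "properties": {
        "name": { "const": "get_weather" },
        "arguments": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"],
          "additionalProperties": false
        }
      },
      "required": ["name", "arguments"]
    }
  ]
}
```

Each tool gets its own branch, so a validator (or a GBNF grammar derived from the schema) rejects calls whose arguments don't match the named tool.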
📜 Review details
📒 Files selected for processing (21)
- .idea/vcs.xml (1 hunks)
- QUICKSTART_ANDROID.md (8 hunks)
- examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/presentation/navigation/AppNavigation.kt (5 hunks)
- examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/presentation/tools/ToolCallingScreen.kt (1 hunks)
- examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/presentation/tools/ToolCallingViewModel.kt (1 hunks)
- sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/GrammarBridge.kt (1 hunks)
- sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt (7 hunks)
- sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt (2 hunks)
- sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/public/extensions/PromptBasedToolCallingExtensions.kt (1 hunks)
- sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/public/extensions/RunAnywhereToolCalling.kt (1 hunks)
- sdk/runanywhere-kotlin/native/llama-jni/CMakeLists.txt (1 hunks)
- sdk/runanywhere-kotlin/native/llama-jni/GRAMMAR_IMPLEMENTATION_NOTES.md (1 hunks)
- sdk/runanywhere-kotlin/native/llama-jni/llama.cpp (1 hunks)
- sdk/runanywhere-kotlin/native/llama-jni/src/grammar_jni.cpp (1 hunks)
- sdk/runanywhere-kotlin/native/llama-jni/src/llama-android.cpp (7 hunks)
- sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/models/GenerationOptions.kt (2 hunks)
- sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/models/Tool.kt (1 hunks)
- sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/Deduplicator.kt (1 hunks)
- sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/PromptBasedToolCalling.kt (1 hunks)
- sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/PromptBuilder.kt (1 hunks)
- sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/ResponseParser.kt (1 hunks)
🧰 Additional context used
🪛 Clang (14.0.6)
sdk/runanywhere-kotlin/native/llama-jni/src/grammar_jni.cpp
[error] 38-38: 'android/log.h' file not found
(clang-diagnostic-error)
🪛 detekt (1.23.8)
sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/ResponseParser.kt
[warning] 62-62: The caught exception is swallowed. The original exception could be lost.
(detekt.exceptions.SwallowedException)
🪛 LanguageTool
sdk/runanywhere-kotlin/native/llama-jni/GRAMMAR_IMPLEMENTATION_NOTES.md
[style] ~144-~144: Consider using a shorter alternative to avoid wordiness.
Context: ...en_id); // ❌ WRONG! ``` Result: ❌ Made it worse! Still crashed, but this was the WRONG ...
(MADE_IT_JJR)
[style] ~418-~418: To elevate your writing, try using a synonym here.
Context: ...gability is critical** - C++ crashes hard to debug on Android - Kotlin string ...
(HARD_TO)
[style] ~450-~450: Consider using a shorter alternative to avoid wordiness.
Context: ... - Day 5: Add explicit accept call (made it worse) - Day 6: Deep-dive into llama.cpp ...
(MADE_IT_JJR)
🪛 markdownlint-cli2 (0.18.1)
QUICKSTART_ANDROID.md
53-53: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
205-205: Spaces inside code span elements
(MD038, no-space-in-code)
🔇 Additional comments (16)
.idea/vcs.xml (1)
5-19: IDE VCS configuration looks good. The added Git mappings enable proper version control tracking across multiple external dependencies and submodules, consistent with a complex multi-module SDK project. No concerns with this change.
QUICKSTART_ANDROID.md (1)
129-129: AndroidX Security Crypto dependency is still in alpha. Lines 129 and 254 specify `androidx.security:security-crypto:1.1.0-alpha06`. Using an alpha library in production SDK documentation may cause stability concerns. Verify whether:
- A stable version (1.1.0+) is available
- If alpha is intentional, document the rationale and any known issues
Also applies to: 254-254
sdk/runanywhere-kotlin/native/llama-jni/llama.cpp (1)
1-1: Clarify the purpose of the submodule update in the context of the deprecated grammar implementation. The investigation reveals a critical mismatch: grammar-based tool calling was implemented, investigated for ~40 hours, found to cause production crashes due to unfixed llama.cpp bugs, and replaced with a prompt-based approach. Yet:
- grammar_jni.cpp is marked deprecated with status: "This implementation is no longer used in production"
- CMakeLists.txt still compiles grammar_jni.cpp despite deprecation
- llama.cpp grammar bugs remain unfixed as of October 2025 (multiple GitHub issues referenced)
The review comment assumes grammar support is the submodule's purpose, but the codebase evidence shows the team moved to prompt-based tool calling. Before approving this submodule update, clarify:
- Is this update necessary for the prompt-based tool calling implementation, or is it maintenance-only?
- If grammar is deprecated and unused, should grammar_jni.cpp be removed from the build entirely?
- What specific functionality from the new llama.cpp commit is required by the PR's actual implementation?
examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/presentation/navigation/AppNavigation.kt (2)
65-67: Route wiring looks good. The TOOLS route is correctly added and points to ToolCallingScreen.
156-157: Constant addition LGTM. `NavigationRoute.TOOLS` is consistent with existing routes.
sdk/runanywhere-kotlin/native/llama-jni/GRAMMAR_IMPLEMENTATION_NOTES.md (1)
209-211: No issues found; the commit reference is accurate. The web search confirms that commit 6777c544 exists in the llama.cpp repository and directly addresses the grammar stability and additionalProperties handling issues as stated in the documentation. The reference is valid and requires no updates.
sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt (1)
362-463: No duplicate implementations found in LlamaCppService.kt. The search reveals only one `generateWithTools` definition, at line 362 in LlamaCppService.kt. The other occurrence found (line 38 in PromptBasedToolCalling.kt) is in a separate file and represents the interface/abstract definition, not a duplicate implementation. The code at lines 362-463 is the sole concrete implementation in this file. Likely an incorrect or invalid review comment.
sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/PromptBasedToolCalling.kt (2)
21-28: LGTM: Clean initialization pattern. The orchestrator components are well-organized with appropriate separation of concerns.
182-199: LGTM: Appropriate error handling for service access. The type cast at Line 194 is intentional and provides clear error messages when preconditions aren't met.
sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt (3)
60-62: LGTM: Parameterized token limit improves flexibility. Replacing the hardcoded value with a field makes the generation behavior more maintainable.
206-249: LGTM: Proper resource management in grammar-constrained generation. The try-finally block at Lines 222-242 ensures the grammar sampler is freed even if generation fails.
270-272: LGTM: Critical lifecycle documentation present. The comments at Lines 270-271 and 330-331 correctly warn about grammar ownership and double-free risks. This is essential given the complex native resource lifecycle.
Also applies to: 330-331
sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/models/Tool.kt (4)
8-25: LGTM: Comprehensive type mapping. The enum covers standard JSON Schema types with proper conversion to schema format strings.
89-111: LGTM: Clever use of JSON Schema "const" for validation. The schema at Lines 93-95 enforces exact tool-name matching using the `const` keyword, which prevents the LLM from generating invalid tool names.
221-260: LGTM: Well-designed builder DSL. The type-specific helper methods (Lines 234-257) provide excellent developer ergonomics for defining tool parameters.
262-264: LGTM: Clean DSL entry point. The function follows standard Kotlin DSL patterns for builder construction.
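The `const` trick the comment describes can be sketched outside the SDK. This is an illustration only; `toolNameSchema` below is a hypothetical helper, not the SDK's `Tool.generateSchema`:

```kotlin
// Illustration only (not the SDK's actual generator): pin the "name"
// property to a single tool via JSON Schema's "const" keyword. A
// conforming validator rejects any other value for "name".
fun toolNameSchema(toolName: String): String =
    """{"type":"object","properties":{"name":{"const":"$toolName"}},"required":["name"]}"""

fun main() {
    println(toolNameSchema("get_weather"))
}
```

Because `const` admits exactly one value, a grammar or validator built from this schema cannot accept a hallucinated tool name.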
```kotlin
"generate_random_number" -> {
    val min = toolCall.arguments["min"]?.toIntOrNull() ?: 0
    val max = toolCall.arguments["max"]?.toIntOrNull() ?: 100
    val random = Random.nextInt(min, max + 1)
    "Generated: $random (range: $min-$max)"
}
```
Validate min/max and avoid overflow when including max.
Random.nextInt(from, until) is exclusive upper bound; adding 1 can overflow at Int.MAX_VALUE and throws if min > max.
```diff
-            val min = toolCall.arguments["min"]?.toIntOrNull() ?: 0
-            val max = toolCall.arguments["max"]?.toIntOrNull() ?: 100
-            val random = Random.nextInt(min, max + 1)
+            val min = toolCall.arguments["min"]?.toIntOrNull() ?: 0
+            val max = toolCall.arguments["max"]?.toIntOrNull() ?: 100
+            if (min > max) return "Error: min must be <= max (got $min > $max)"
+            val upperExclusive = if (max == Int.MAX_VALUE) Int.MAX_VALUE else (max + 1)
+            val random = Random.nextInt(min, upperExclusive)
             "Generated: $random (range: $min-$max)"
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```kotlin
"generate_random_number" -> {
    val min = toolCall.arguments["min"]?.toIntOrNull() ?: 0
    val max = toolCall.arguments["max"]?.toIntOrNull() ?: 100
    if (min > max) return "Error: min must be <= max (got $min > $max)"
    val upperExclusive = if (max == Int.MAX_VALUE) Int.MAX_VALUE else (max + 1)
    val random = Random.nextInt(min, upperExclusive)
    "Generated: $random (range: $min-$max)"
}
```
🤖 Prompt for AI Agents
In
examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/presentation/tools/ToolCallingViewModel.kt
around lines 222 to 227, validate the min/max arguments before calling
Random.nextInt: ensure min <= max (either return an error string or swap/clamp
values), handle the case where min == max by returning that fixed value, and
avoid overflow when computing the exclusive upper bound by not doing max + 1
blindly — if max == Int.MAX_VALUE, call Random.nextInt with an appropriate
exclusive bound (or use a branch that picks a random between min and max without
adding 1) or clamp the computed upper bound to a safe value; finally produce the
same result string format.
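The bound handling described above can be exercised in isolation. `boundedRandom` is a hypothetical helper mirroring the committable suggestion's compromise: when `max == Int.MAX_VALUE`, the exclusive upper bound is clamped, so `Int.MAX_VALUE` itself is excluded rather than overflowing.

```kotlin
import kotlin.random.Random

// Hypothetical helper exercising the bound handling discussed above:
// rejects min > max, short-circuits min == max, and clamps the exclusive
// upper bound so max + 1 cannot overflow.
fun boundedRandom(min: Int, max: Int): Int {
    require(min <= max) { "min must be <= max (got $min > $max)" }
    if (min == max) return min
    val upperExclusive = if (max == Int.MAX_VALUE) Int.MAX_VALUE else max + 1
    return Random.nextInt(min, upperExclusive)
}

fun main() {
    println(boundedRandom(0, 100) in 0..100)  // true
    println(boundedRandom(7, 7))              // 7
}
```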
```kotlin
class Grammar(
    private val modelPointer: Long,
    val handle: Long
) : AutoCloseable {

    private val logger = SDKLogger("Grammar")
    private var freed = false

    override fun close() {
        if (!freed && handle != 0L) {
            try {
                GrammarBridge.freeGrammar(handle)
                logger.debug("Grammar freed: $handle")
                freed = true
            } catch (e: Exception) {
                logger.error("Error freeing grammar", e)
            }
        }
    }
```
Double-free risk: Grammar.close() frees even after the chain takes ownership.
The C++ notes say the chain owns the grammar once it has been added, so an unconditional free here can cause a SIGSEGV.
Add an ownership flag and a method that marks the transfer before the grammar is added to the chain:
```diff
 class Grammar(
-    private val modelPointer: Long,
     val handle: Long
 ) : AutoCloseable {
     private val logger = SDKLogger("Grammar")
     private var freed = false
+    @Volatile private var ownedByChain = false
+
+    /** Call this immediately before passing the handle into the sampler chain. */
+    fun markOwnedByChain() { ownedByChain = true }

     override fun close() {
-        if (!freed && handle != 0L) {
+        if (!freed && !ownedByChain && handle != 0L) {
             try {
                 GrammarBridge.freeGrammar(handle)
                 logger.debug("Grammar freed: $handle")
                 freed = true
             } catch (e: Exception) {
                 logger.error("Error freeing grammar", e)
             }
         }
     }
```

Also remove the unused modelPointer param. Ensure callers invoke markOwnedByChain() right before LLamaAndroid.new_sampler_with_grammar(..., grammar.handle). Based on learnings.
📝 Committable suggestion
```kotlin
class Grammar(
    val handle: Long
) : AutoCloseable {
    private val logger = SDKLogger("Grammar")
    private var freed = false
    @Volatile private var ownedByChain = false

    /** Call this immediately before passing the handle into the sampler chain. */
    fun markOwnedByChain() { ownedByChain = true }

    override fun close() {
        if (!freed && !ownedByChain && handle != 0L) {
            try {
                GrammarBridge.freeGrammar(handle)
                logger.debug("Grammar freed: $handle")
                freed = true
            } catch (e: Exception) {
                logger.error("Error freeing grammar", e)
            }
        }
    }
```
🤖 Prompt for AI Agents
In
sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/GrammarBridge.kt
around lines 82 to 101, the current close() unconditionally frees the grammar
which can cause a double-free when ownership is transferred to the chain; add a
private var ownedByChain = false, remove the unused modelPointer constructor
parameter, add a public markOwnedByChain() method that sets ownedByChain = true,
and change close() to only call GrammarBridge.freeGrammar(handle) if
!ownedByChain && !freed && handle != 0L (then set freed = true); update
documentation/comments to instruct callers to invoke markOwnedByChain()
immediately before calling LLamaAndroid.new_sampler_with_grammar(...,
grammar.handle).
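The ownership-transfer pattern generalizes beyond this file. A minimal sketch, with hypothetical names (`NativeHandle`, and an injected `free` lambda standing in for `GrammarBridge.freeGrammar`):

```kotlin
// Minimal sketch of the ownership-transfer pattern proposed above:
// once markOwnedByChain() is called, close() becomes a no-op, so the
// native chain's free cannot be doubled from the Kotlin side.
class NativeHandle(val handle: Long, private val free: (Long) -> Unit) : AutoCloseable {
    @Volatile private var ownedByChain = false
    private var freed = false

    fun markOwnedByChain() { ownedByChain = true }

    override fun close() {
        if (!freed && !ownedByChain && handle != 0L) {
            free(handle)
            freed = true
        }
    }
}

fun main() {
    var frees = 0
    NativeHandle(1L) { frees++ }.apply {
        markOwnedByChain()   // simulate passing the handle into the chain
        close()              // no-op: the chain now owns it
    }
    NativeHandle(2L) { frees++ }.apply {
        close()              // freed exactly once
        close()              // idempotent on repeat close
    }
    println(frees)  // 1
}
```

The `freed` flag makes repeated `close()` calls idempotent, while `ownedByChain` prevents the Kotlin-side free entirely after transfer.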
```diff
 private val logger = SDKLogger("LLamaAndroid")

-private val threadLocalState: ThreadLocal<State> = ThreadLocal.withInitial { State.Idle }
+internal val threadLocalState: ThreadLocal<State> = ThreadLocal.withInitial { State.Idle }
```
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Consider whether internal visibility is necessary.
Exposing threadLocalState and State as internal breaks encapsulation. Verify this is required for cross-module access and document the rationale if retained.
Run the following script to check usages outside this file:
Also applies to: 433-433
🏁 Script executed:
#!/bin/bash
# Search for references to threadLocalState or State outside LLamaAndroid.kt
rg -n --type=kt 'threadLocalState|State\.(Idle|Loaded)' -g '!**/LLamaAndroid.kt'
Length of output: 26
🏁 Script executed:
#!/bin/bash
# Search for references to threadLocalState or State outside LLamaAndroid.kt
# Using glob pattern for .kt files instead of --type flag
rg -n --glob='*.kt' 'threadLocalState|State\.(Idle|Loaded)' -g '!**/LLamaAndroid.kt'
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Also inspect the State sealed interface and threadLocalState declaration in LLamaAndroid.kt
cat sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt | head -100
Length of output: 3479
🏁 Script executed:
#!/bin/bash
# Search for any references to threadLocalState in the entire repository (any file type)
rg -n 'threadLocalState'
Length of output: 1937
🏁 Script executed:
#!/bin/bash
# View State sealed interface definition around line 433
sed -n '420,445p' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt
Length of output: 820
🏁 Script executed:
#!/bin/bash
# Search for any references to State (the sealed interface) outside LLamaAndroid.kt
rg -n 'State\s*=|sealed.*State|class State|interface State' --glob='*.kt' -g '!**/LLamaAndroid.kt'
Length of output: 14299
Revert threadLocalState and State sealed interface visibility to private.
Search results confirm no external references to threadLocalState or the State sealed interface exist anywhere in the codebase. Both are used exclusively within LLamaAndroid.kt, making internal visibility unnecessary and a violation of encapsulation. Revert lines 31 and 433 from internal to private.
🤖 Prompt for AI Agents
In
sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt
around line 31 and line 433, revert the visibility of threadLocalState and the
State sealed interface from internal to private; specifically change "internal
val threadLocalState: ThreadLocal<State> = ..." to "private val ..." and change
the sealed interface declaration at line 433 from "internal sealed interface
State" to "private sealed interface State", keeping all current implementations
and usages unchanged since both are only used within this file.
```kotlin
    kv_cache_clear(state.context)
} finally {
    // Reset sampler state before freeing to clear grammar state
    reset_sampler(samplerChain)
```
🧩 Analysis chain
Verify reset_sampler is required before free_sampler.
Calling reset_sampler before free_sampler at Line 325 is unusual. Confirm this order is necessary for proper grammar state cleanup in the native implementation.
Run the following script to examine the native reset_sampler implementation:
🏁 Script executed:
#!/bin/bash
# Check the native implementation of reset_sampler
fd -e cpp -e c grammar_jni | xargs rg -A10 -B2 'reset_sampler'
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Search for the actual Kotlin file and examine context around line 325
find . -name "LLamaAndroid.kt" -type f | head -1 | xargs -I {} sh -c 'echo "=== File: {} ===" && head -n 330 {} | tail -n 20'
Length of output: 1384
🏁 Script executed:
#!/bin/bash
# Search for free_sampler to understand the cleanup sequence
rg -n 'free_sampler' -A3 -B3 --type kotlin
Length of output: 5210
🏁 Script executed:
#!/bin/bash
# Search more broadly for reset_sampler in the codebase
rg -n 'reset_sampler' -C5 --type kotlin --type java --type cpp --type c
Length of output: 4045
🏁 Script executed:
#!/bin/bash
# Look for external library or JNI bindings for llama.cpp
find . -name "*llama*" -o -name "*sampler*" | grep -E '\.(kt|java|cpp|c|h|so|jar)$' | head -20
Length of output: 115
🏁 Script executed:
#!/bin/bash
# Search for reset_sampler and free_sampler implementations in native code
rg -n 'reset_sampler\|free_sampler' -A15 -B2 ./sdk/runanywhere-kotlin/native/llama-jni/
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check if reset is called inside free_sampler already (redundancy check)
rg -n 'llama_sampler_free\|sampler_free' -A20 -B2 ./sdk/runanywhere-kotlin/native/llama-jni/
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check the native source files directly
cat ./sdk/runanywhere-kotlin/native/llama-jni/src/llama-android.cpp | head -100
Length of output: 3199
🏁 Script executed:
#!/bin/bash
# Search for JNI method implementations
rg -n 'Java_.*reset_sampler\|Java_.*free_sampler' ./sdk/runanywhere-kotlin/native/llama-jni/
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Look for any sampler or grammar related code in native files
rg -in 'sampler\|grammar' -C3 ./sdk/runanywhere-kotlin/native/llama-jni/ | head -80
Length of output: 0
🏁 Script executed:
#!/bin/bash
# List all files in native directory
find ./sdk/runanywhere-kotlin/native -type f -name "*.cpp" -o -name "*.c" -o -name "*.h" | head -30
Length of output: 409
🏁 Script executed:
#!/bin/bash
# Check entire llama-android.cpp for full content
wc -l ./sdk/runanywhere-kotlin/native/llama-jni/src/llama-android.cpp
Length of output: 67
🏁 Script executed:
#!/bin/bash
# Search for any occurrence of reset or sampler in all cpp/c/h files
find ./sdk/runanywhere-kotlin/native -type f \( -name "*.cpp" -o -name "*.c" -o -name "*.h" \) -exec grep -l "reset_sampler\|free_sampler" {} \;
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check if llama.h defines these functions
grep -r "reset_sampler\|free_sampler" ./sdk/runanywhere-kotlin/native/ 2>/dev/null | head -20
Length of output: 132
🏁 Script executed:
#!/bin/bash
# Read the implementation notes
cat ./sdk/runanywhere-kotlin/native/llama-jni/GRAMMAR_IMPLEMENTATION_NOTES.md
Length of output: 14091
🏁 Script executed:
#!/bin/bash
# Search for reset_sampler and free_sampler in grammar_jni.cpp
rg -n 'reset_sampler\|free_sampler' -A10 -B5 ./sdk/runanywhere-kotlin/native/llama-jni/src/grammar_jni.cpp
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check if there are llama.cpp sampler functions being used
rg -n 'llama_sampler' ./sdk/runanywhere-kotlin/native/llama-jni/src/grammar_jni.cpp | head -30
Length of output: 1193
🏁 Script executed:
#!/bin/bash
# Search entire llama-android.cpp for reset_sampler implementation
grep -n "reset_sampler" ./sdk/runanywhere-kotlin/native/llama-jni/src/llama-android.cpp
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check the context around line 325 in LLamaAndroid.kt to see if grammar is still active
sed -n '200,350p' ./sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt | grep -n -A20 "reset_sampler\|grammar"
Length of output: 6465
🏁 Script executed:
#!/bin/bash
# Look for any active grammar usage or calls in the kotlin file
rg -n 'grammar\|Grammar' ./sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt
Length of output: 0
reset_sampler() JNI method is not implemented in native code, causing runtime failure.
The reset_sampler(samplerChain) call at line 325 references a JNI method that has no native implementation. Searching llama-android.cpp yields no Java_com_runanywhere_sdk_llm_llamacpp_LLamaAndroid_reset_1sampler binding. Calling this will crash with UnsatisfiedLinkError at runtime.
Additionally, the grammar-based implementation containing this code is marked as DEPRECATED in GRAMMAR_IMPLEMENTATION_NOTES.md and was abandoned in favor of a prompt-based approach due to "llama.cpp grammar bugs causing SIGABRT crashes."
Either implement the missing JNI binding for reset_sampler(), or remove the deprecated sendWithGrammarSchema() function and all grammar-related code paths entirely.
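Until one of those options lands, callers can at least fail soft. A sketch of a defensive wrapper (hypothetical, not in the SDK): a JNI method with no native binding throws `UnsatisfiedLinkError` at call time, and catching it lets cleanup degrade gracefully instead of crashing the app.

```kotlin
// Hypothetical guard: wrap an optional native call so a missing JNI
// binding surfaces as a logged failure instead of a process crash.
// Catching an Error subtype is unusual and done deliberately here.
fun tryNativeCall(name: String, call: () -> Unit): Boolean =
    try {
        call()
        true
    } catch (e: UnsatisfiedLinkError) {
        println("Native binding missing for $name: ${e.message}")
        false
    }

fun main() {
    val ok = tryNativeCall("reset_sampler") {
        // Simulate the unresolved native symbol described in the review.
        throw UnsatisfiedLinkError("reset_sampler not bound")
    }
    println(ok)  // false
}
```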
```kotlin
suspend fun generateWithTools(
    prompt: String,
    tools: List<Tool>? = null,
    options: RunAnywhereGenerationOptions? = null
): ToolCallResult = withContext(Dispatchers.IO) {
    if (!isInitialized) {
        throw IllegalStateException("LlamaCppService not initialized")
    }

    // Get tools from parameter or options (parameter takes precedence)
    val effectiveTools = tools ?: options?.tools ?: emptyList()

    if (effectiveTools.isEmpty()) {
        logger.warn("No tools provided for generateWithTools()")
        return@withContext ToolCallResult(
            success = false,
            text = "Error: No tools provided",
            toolCalls = emptyList(),
            mode = ToolCallingMode.GRAMMAR_BASED
        )
    }

    logger.info("🛠️ generateWithTools() called with ${effectiveTools.size} tools")
    logger.info("📊 LlamaCppService state - isInitialized: $isInitialized, modelPath: $modelPath")

    // Ensure service is initialized
    if (!isInitialized) {
        throw IllegalStateException("LlamaCppService not initialized. Model must be loaded before calling generateWithTools().")
    }

    try {
        // Generate JSON schema from tools
        val jsonSchema = Tool.generateSchema(effectiveTools)

        // Convert to JSON string manually (avoid serialization issues with Any type)
        val schemaJson = buildJsonString(jsonSchema)

        logger.info("📋 JSON Schema generated: ${schemaJson.take(200)}...")

        // Build messages with tool instruction
        val userMessage = Message(role = MessageRole.USER, content = prompt)
        val systemMessage = Message(
            role = MessageRole.SYSTEM,
            content = buildToolSystemPrompt(effectiveTools)
        )

        val formattedPrompt = buildPrompt(
            listOf(systemMessage, userMessage),
            systemPrompt = null
        )

        logger.info("📝 Tool calling prompt length: ${formattedPrompt.length}")

        // Generate with grammar constraints
        val result = StringBuilder()
        val effectiveOptions = options ?: RunAnywhereGenerationOptions(
            maxTokens = configuration.maxTokens,
            temperature = 0.7f
        )

        val config = LlamaModelConfig(
            temperature = effectiveOptions.temperature,
            topK = 40,
            minP = 0.05f
        )

        // Use sendWithGrammarSchema which handles grammar creation and generation
        // on the same thread, avoiding thread-local state issues
        llama.sendWithGrammarSchema(
            formattedPrompt,
            schemaJson,
            config,
            parseSpecialTokens = true
        ).collect { token ->
            result.append(token)
        }

        val generatedText = result.toString()
        logger.info("✅ Generation complete, response length: ${generatedText.length}")
        logger.info("📄 Response: ${generatedText.take(300)}...")

        // Parse tool calls from JSON response
        val toolCalls = parseToolCalls(generatedText)

        ToolCallResult(
            success = true,
            text = generatedText,
            toolCalls = toolCalls,
            mode = ToolCallingMode.GRAMMAR_BASED
        )

    } catch (e: Exception) {
        logger.error("❌ Tool calling failed", e)
        ToolCallResult(
            success = false,
            text = "Error: ${e.message}",
            toolCalls = emptyList(),
            mode = ToolCallingMode.GRAMMAR_BASED
        )
    }
}
```
🧩 Analysis chain
Enforce maxTokens in grammar-based generation.
The sendWithGrammarSchema collection path doesn’t stop at a token budget; add a counter, or pass a limit if the native API supports it. This prevents long-running loops when the grammar allows large outputs.
```diff
-    val result = StringBuilder()
+    val result = StringBuilder()
+    var tokenCount = 0
+    val maxTokens = effectiveOptions.maxTokens
     ...
-    ).collect { token ->
-        result.append(token)
-    }
+    ).collect { token ->
+        result.append(token)
+        if (++tokenCount >= maxTokens) {
+            // TODO: call llama.cancel if available; otherwise rely on producer completion
+            return@collect
+        }
+    }
```

If LLamaAndroid.sendWithGrammarSchema already supports a length limit, prefer passing it explicitly instead.
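The cap can also be prototyped with the stdlib. `Sequence.take` below behaves like the `take(n)` operator on kotlinx.coroutines `Flow`, which would bound the collection even without native support (names are illustrative):

```kotlin
// Sketch of capping an unbounded token stream at a budget.
// tokenStream stands in for the native token producer.
fun tokenStream(): Sequence<String> =
    generateSequence(0) { it + 1 }.map { "tok$it" }

fun main() {
    val maxTokens = 8
    val result = StringBuilder()
    tokenStream().take(maxTokens).forEach { result.append(it).append(' ') }
    println(result.trim().split(" ").size)  // 8
}
```

With Flow, `take(n)` additionally cancels the upstream producer once n items are emitted, which is the behavior the review is asking for.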
🏁 Script executed:
#!/bin/bash
rg -nC2 'sendWithGrammarSchema\('
Length of output: 2288
🏁 Script executed:
#!/bin/bash
# Get the full sendWithGrammarSchema function signature and implementation
rg -nA30 'fun sendWithGrammarSchema\(' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt
Length of output: 1761
🏁 Script executed:
#!/bin/bash
# Inspect LlamaModelConfig class
rg -nB2 -A10 'class LlamaModelConfig\|data class LlamaModelConfig' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check if there are other usages of sendWithGrammarSchema to understand token limit handling
rg -nB5 -A15 'sendWithGrammarSchema' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt | head -50
Length of output: 954
🏁 Script executed:
#!/bin/bash
# Search for token counting or max tokens handling in LLamaAndroid
rg -nP '(tokenCount|maxTokens|token.*limit)' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt
Length of output: 1399
🏁 Script executed:
#!/bin/bash
# Look for any existing token limit enforcement in the sendWithGrammarSchema implementation
rg -nA50 'fun sendWithGrammarSchema\(' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt | grep -E '(maxToken|tokenCount|token.*limit|take\(|take\s)'
Length of output: 869
🏁 Script executed:
#!/bin/bash
# Get the full implementation to see if token limiting is built-in
sed -n '261,320p' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt
Length of output: 3179
🏁 Script executed:
#!/bin/bash
# Check the native binding to see if it supports maxTokens parameter
rg -nC3 'external.*sendWithGrammarSchema|sendWithGrammarSchema.*=' sdk/runanywhere-kotlin/native/llama-jni/src/
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Find the actual native JNI binding or pure Kotlin implementation
fd -t f 'LLamaAndroid\.kt' -x grep -l 'sendWithGrammarSchema' {}
Length of output: 132
🏁 Script executed:
#!/bin/bash
# Look at complete sendWithGrammarSchema function including return type and full body
ast-grep --pattern 'fun sendWithGrammarSchema($_, $_) { $$$ }'
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Simpler approach: get lines 261 to 320 to see full function
wc -l sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt && sed -n '261,350p' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt
Length of output: 4605
🏁 Script executed:
#!/bin/bash
# Search for the close of sendWithGrammarSchema function to understand its full scope
rg -n 'fun sendWithGrammarSchema' -A100 sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt | head -120
Length of output: 5224
🏁 Script executed:
#!/bin/bash
# Check for token limit in Flow.take() or similar limiting operations
rg -nC5 'fun sendWithGrammarSchema' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt
Length of output: 488
🏁 Script executed:
#!/bin/bash
# List all the functions in LLamaAndroid to understand structure
rg -n 'fun ' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt | head -30
Length of output: 1691
🏁 Script executed:
#!/bin/bash
# Get complete sendWithGrammarSchema by looking for the function definition and understanding its scope
cat sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt | grep -n 'fun sendWithGrammarSchema' -A 80 | head -85
Length of output: 4453
🏁 Script executed:
#!/bin/bash
# Check if any Flow limiting (like take, takeWhile) is applied
rg 'sendWithGrammarSchema|\.take\(|\.takeWhile' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt -C3
Length of output: 2667
🏁 Script executed:
#!/bin/bash
# Look for generateWithTools implementation details around token limiting
sed -n '362,463p' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt | grep -nE '(maxToken|tokenCount|\.take\(|maxToken)'
Length of output: 211
🏁 Script executed:
#!/bin/bash
# Check the exact parameters passed to sendWithGrammarSchema
sed -n '430,436p' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt
Length of output: 237
🏁 Script executed:
#!/bin/bash
# Find all properties of LlamaModelConfig
rg -n 'class LlamaModelConfig|data class LlamaModelConfig' -A 15 sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/
Length of output: 2602
🏁 Script executed:
#!/bin/bash
# Search across entire project for LlamaModelConfig definition
fd -t f '\.kt$' sdk/runanywhere-kotlin/ -x rg -l 'class LlamaModelConfig|data class LlamaModelConfig' {}
Length of output: 130
🏁 Script executed:
#!/bin/bash
# Direct lookup of LlamaModelConfig
rg 'LlamaModelConfig' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt -B2 -A8 | head -40
Length of output: 820
🏁 Script executed:
#!/bin/bash
# Check what files exist in the directory
ls -la sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/
Length of output: 389
🏁 Script executed:
#!/bin/bash
# Get complete LLamaAndroid.kt file to understand structure
cat sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt | head -400
Length of output: 16833
🏁 Script executed:
#!/bin/bash
# Simpler: just extract lines around sendWithGrammarSchema definition
cat sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt | sed -n '250,350p'
Length of output: 4943
🏁 Script executed:
#!/bin/bash
# Look at the entire generateWithTools function to understand current structure
cat sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt | sed -n '362,463p' | grep -E '(maxToken|tokenCount|\.take|sendWithGrammarSchema)' -B2 -A2

Length of output: 955
🏁 Script executed:
#!/bin/bash
# Get full context of effectiveOptions creation and maxTokens usage
cat sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt | sed -n '415,440p'

Length of output: 993
🏁 Script executed:
#!/bin/bash
# Check the full LLamaAndroid.kt around sendWithGrammarSchema
head -n 300 sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt | tail -n 60

Length of output: 2826
🏁 Script executed:
#!/bin/bash
# Try to get the function signature and its parameters fully
rg 'fun sendWithGrammarSchema' -A5 sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt

Length of output: 197
🏁 Script executed:
#!/bin/bash
# Find where maxTokens is used in LlamaCppService
rg 'maxTokens' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt -n

Length of output: 695
🏁 Script executed:
#!/bin/bash
# Check if sendWithGrammarSchema supports any token limit parameter - look at actual signature
python3 << 'EOF'
import re
# Read the file
with open('sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt', 'r') as f:
content = f.read()
# Find the sendWithGrammarSchema function
match = re.search(r'(fun\s+sendWithGrammarSchema\s*\([^)]*\)[^{]*\{)', content, re.DOTALL)
if match:
print("Function signature found:")
print(match.group(1)[:300])
else:
print("Function not found")
EOF

Length of output: 219
🏁 Script executed:
#!/bin/bash
# Get complete sendWithGrammarSchema function definition
cat sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt | sed -n '261,310p'

Length of output: 2562
🏁 Script executed:
#!/bin/bash
# Check generateWithTools for token limiting around sendWithGrammarSchema call
cat sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt | sed -n '420,450p'

Length of output: 1110
🏁 Script executed:
#!/bin/bash
# Search for LlamaModelConfig definition
rg -n 'data class LlamaModelConfig|class LlamaModelConfig' -A12 sdk/runanywhere-kotlin/

Length of output: 2053
🏁 Script executed:
#!/bin/bash
# Check if .take() or token limiting is applied after sendWithGrammarSchema call
cat sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt | sed -n '425,442p'

Length of output: 665
Enforce maxTokens in tool calling flow; align with generate() pattern.
The generateWithTools() method creates effectiveOptions.maxTokens (line 418) but never enforces it. Unlike generate() and streamGenerate() which count tokens in their collectors, the generateWithTools() collector appends all tokens without checking the limit.
Although sendWithGrammarSchema() uses a hardcoded maxTokensToGenerate = 512 and enforces a 2000-character safety limit, generateWithTools() should still enforce the configured effectiveOptions.maxTokens at the caller level for consistency. Note that return@collect inside the collect lambda only skips the current emission; to stop generation early, cancel upstream collection with the Flow take operator (requires import kotlinx.coroutines.flow.take):
- ).collect { token ->
-     result.append(token)
- }
+ ).take(effectiveOptions.maxTokens).collect { token ->
+     result.append(token)
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| suspend fun generateWithTools( | |
| prompt: String, | |
| tools: List<Tool>? = null, | |
| options: RunAnywhereGenerationOptions? = null | |
| ): ToolCallResult = withContext(Dispatchers.IO) { | |
| if (!isInitialized) { | |
| throw IllegalStateException("LlamaCppService not initialized") | |
| } | |
| // Get tools from parameter or options (parameter takes precedence) | |
| val effectiveTools = tools ?: options?.tools ?: emptyList() | |
| if (effectiveTools.isEmpty()) { | |
| logger.warn("No tools provided for generateWithTools()") | |
| return@withContext ToolCallResult( | |
| success = false, | |
| text = "Error: No tools provided", | |
| toolCalls = emptyList(), | |
| mode = ToolCallingMode.GRAMMAR_BASED | |
| ) | |
| } | |
| logger.info("🛠️ generateWithTools() called with ${effectiveTools.size} tools") | |
| logger.info("📊 LlamaCppService state - isInitialized: $isInitialized, modelPath: $modelPath") | |
| // Ensure service is initialized | |
| if (!isInitialized) { | |
| throw IllegalStateException("LlamaCppService not initialized. Model must be loaded before calling generateWithTools().") | |
| } | |
| try { | |
| // Generate JSON schema from tools | |
| val jsonSchema = Tool.generateSchema(effectiveTools) | |
| // Convert to JSON string manually (avoid serialization issues with Any type) | |
| val schemaJson = buildJsonString(jsonSchema) | |
| logger.info("📋 JSON Schema generated: ${schemaJson.take(200)}...") | |
| // Build messages with tool instruction | |
| val userMessage = Message(role = MessageRole.USER, content = prompt) | |
| val systemMessage = Message( | |
| role = MessageRole.SYSTEM, | |
| content = buildToolSystemPrompt(effectiveTools) | |
| ) | |
| val formattedPrompt = buildPrompt( | |
| listOf(systemMessage, userMessage), | |
| systemPrompt = null | |
| ) | |
| logger.info("📝 Tool calling prompt length: ${formattedPrompt.length}") | |
| // Generate with grammar constraints | |
| val result = StringBuilder() | |
| val effectiveOptions = options ?: RunAnywhereGenerationOptions( | |
| maxTokens = configuration.maxTokens, | |
| temperature = 0.7f | |
| ) | |
| val config = LlamaModelConfig( | |
| temperature = effectiveOptions.temperature, | |
| topK = 40, | |
| minP = 0.05f | |
| ) | |
| // Use sendWithGrammarSchema which handles grammar creation and generation | |
| // on the same thread, avoiding thread-local state issues | |
| llama.sendWithGrammarSchema( | |
| formattedPrompt, | |
| schemaJson, | |
| config, | |
| parseSpecialTokens = true | |
| ).collect { token -> | |
| result.append(token) | |
| } | |
| val generatedText = result.toString() | |
| logger.info("✅ Generation complete, response length: ${generatedText.length}") | |
| logger.info("📄 Response: ${generatedText.take(300)}...") | |
| // Parse tool calls from JSON response | |
| val toolCalls = parseToolCalls(generatedText) | |
| ToolCallResult( | |
| success = true, | |
| text = generatedText, | |
| toolCalls = toolCalls, | |
| mode = ToolCallingMode.GRAMMAR_BASED | |
| ) | |
| } catch (e: Exception) { | |
| logger.error("❌ Tool calling failed", e) | |
| ToolCallResult( | |
| success = false, | |
| text = "Error: ${e.message}", | |
| toolCalls = emptyList(), | |
| mode = ToolCallingMode.GRAMMAR_BASED | |
| ) | |
| } | |
| } | |
| suspend fun generateWithTools( | |
| prompt: String, | |
| tools: List<Tool>? = null, | |
| options: RunAnywhereGenerationOptions? = null | |
| ): ToolCallResult = withContext(Dispatchers.IO) { | |
| if (!isInitialized) { | |
| throw IllegalStateException("LlamaCppService not initialized") | |
| } | |
| // Get tools from parameter or options (parameter takes precedence) | |
| val effectiveTools = tools ?: options?.tools ?: emptyList() | |
| if (effectiveTools.isEmpty()) { | |
| logger.warn("No tools provided for generateWithTools()") | |
| return@withContext ToolCallResult( | |
| success = false, | |
| text = "Error: No tools provided", | |
| toolCalls = emptyList(), | |
| mode = ToolCallingMode.GRAMMAR_BASED | |
| ) | |
| } | |
| logger.info("🛠️ generateWithTools() called with ${effectiveTools.size} tools") | |
| logger.info("📊 LlamaCppService state - isInitialized: $isInitialized, modelPath: $modelPath") | |
| // Ensure service is initialized | |
| if (!isInitialized) { | |
| throw IllegalStateException("LlamaCppService not initialized. Model must be loaded before calling generateWithTools().") | |
| } | |
| try { | |
| // Generate JSON schema from tools | |
| val jsonSchema = Tool.generateSchema(effectiveTools) | |
| // Convert to JSON string manually (avoid serialization issues with Any type) | |
| val schemaJson = buildJsonString(jsonSchema) | |
| logger.info("📋 JSON Schema generated: ${schemaJson.take(200)}...") | |
| // Build messages with tool instruction | |
| val userMessage = Message(role = MessageRole.USER, content = prompt) | |
| val systemMessage = Message( | |
| role = MessageRole.SYSTEM, | |
| content = buildToolSystemPrompt(effectiveTools) | |
| ) | |
| val formattedPrompt = buildPrompt( | |
| listOf(systemMessage, userMessage), | |
| systemPrompt = null | |
| ) | |
| logger.info("📝 Tool calling prompt length: ${formattedPrompt.length}") | |
| // Generate with grammar constraints | |
| val result = StringBuilder() | |
| val effectiveOptions = options ?: RunAnywhereGenerationOptions( | |
| maxTokens = configuration.maxTokens, | |
| temperature = 0.7f | |
| ) | |
| val config = LlamaModelConfig( | |
| temperature = effectiveOptions.temperature, | |
| topK = 40, | |
| minP = 0.05f | |
| ) | |
| // Use sendWithGrammarSchema which handles grammar creation and generation | |
| // on the same thread, avoiding thread-local state issues | |
| // Respect the caller's maxTokens (requires import kotlinx.coroutines.flow.take) | |
| llama.sendWithGrammarSchema( | |
| formattedPrompt, | |
| schemaJson, | |
| config, | |
| parseSpecialTokens = true | |
| ).take(effectiveOptions.maxTokens).collect { token -> | |
| result.append(token) | |
| } | |
| val generatedText = result.toString() | |
| logger.info("✅ Generation complete, response length: ${generatedText.length}") | |
| logger.info("📄 Response: ${generatedText.take(300)}...") | |
| // Parse tool calls from JSON response | |
| val toolCalls = parseToolCalls(generatedText) | |
| ToolCallResult( | |
| success = true, | |
| text = generatedText, | |
| toolCalls = toolCalls, | |
| mode = ToolCallingMode.GRAMMAR_BASED | |
| ) | |
| } catch (e: Exception) { | |
| logger.error("❌ Tool calling failed", e) | |
| ToolCallResult( | |
| success = false, | |
| text = "Error: ${e.message}", | |
| toolCalls = emptyList(), | |
| mode = ToolCallingMode.GRAMMAR_BASED | |
| ) | |
| } | |
| } |
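A side note on the collector semantics in this suggestion: in kotlinx.coroutines, return@collect only returns from the lambda for the current emission (the upstream flow keeps producing), whereas the take operator cancels collection once the limit is reached. A minimal, runnable sketch of the difference:

```kotlin
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.take
import kotlinx.coroutines.runBlocking

fun main() = runBlocking {
    val tokens = flow { repeat(10) { emit("tok$it") } }

    // return@collect only ends the lambda for the current emission;
    // all 10 tokens are still delivered.
    var collected = 0
    tokens.collect { collected++; return@collect }
    println(collected)  // 10

    // take(3) cancels the upstream flow after 3 emissions.
    var limited = 0
    tokens.take(3).collect { limited++ }
    println(limited)  // 3
}
```

Because take cancels the producer, it also avoids the wasted generation work that a collector-side counter would still incur.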
| private val logger = SDKLogger("PromptBasedToolCalling") | ||
| private val promptBuilder = PromptBuilder() | ||
| private val responseParser = ResponseParser() | ||
| private val deduplicator = Deduplicator() |
🧩 Analysis chain
Verify thread-safety of Deduplicator.
If multiple coroutines invoke generateWithTools concurrently, the shared deduplicator instance could have race conditions. Confirm that Deduplicator is thread-safe or consider synchronization.
Run the following script to examine the Deduplicator implementation:
Also applies to: 71-71, 243-243
🏁 Script executed:
#!/bin/bash
# Check Deduplicator implementation for thread-safety annotations or synchronization
fd -e kt Deduplicator | xargs cat

Length of output: 2622
Thread-safety issue confirmed: Deduplicator is not thread-safe.
The Deduplicator class uses a plain mutableSetOf<String>() with no synchronization mechanisms (no locks, @Volatile, or thread-safe collections). The isDuplicate() method performs unsafe read-check-write operations: two concurrent coroutines could both read seenToolCalls simultaneously, both see the hash as new, and both add it. Additionally, the excess removal logic (seenToolCalls.drop() followed by remove()) is vulnerable to ConcurrentModificationException.
Required fix: Synchronize access to seenToolCalls using either:
- Collections.synchronizedSet() to wrap the mutable set, or
- synchronized blocks around all read-write operations in isDuplicate(), clear(), and size()
Also applies to: 71-71, 243-243
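To make the required fix concrete, here is a minimal sketch of the synchronized variant. The field and method names mirror the review's description of Deduplicator rather than the actual source file, and note that JVM synchronized is not available in pure commonMain code (a kotlinx.coroutines Mutex would be the multiplatform alternative):

```kotlin
// Illustrative sketch only; names follow the review's description, not the SDK source.
class SafeDeduplicator(private val maxEntries: Int = 100) {
    private val seenToolCalls = mutableSetOf<String>()
    private val lock = Any()

    fun isDuplicate(name: String, arguments: Map<String, String>): Boolean =
        synchronized(lock) {
            val hash = "$name:${arguments.toSortedMap()}"
            // add() returns false if the hash was already present: an
            // atomic check-and-insert under the lock.
            if (!seenToolCalls.add(hash)) return true
            // Evict oldest entries under the same lock, avoiding the
            // ConcurrentModificationException from racing drop()/remove().
            while (seenToolCalls.size > maxEntries) {
                seenToolCalls.remove(seenToolCalls.first())
            }
            false
        }

    fun clear() = synchronized(lock) { seenToolCalls.clear() }
    fun size(): Int = synchronized(lock) { seenToolCalls.size }
}
```

mutableSetOf() yields an insertion-ordered LinkedHashSet on the JVM, so first() is the oldest entry during eviction.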
| if (deduplicator.isDuplicate(parseResult.name, parseResult.arguments)) { | ||
| logger.warn("Duplicate tool call detected, treating as regular response") | ||
| return ToolCallResult( | ||
| success = true, | ||
| text = "I've already attempted that action. Let me try a different approach: $response", | ||
| toolCalls = emptyList(), | ||
| mode = ToolCallingMode.PROMPT_BASED | ||
| ) | ||
| } |
Consider excluding raw response from user-visible text.
Including the raw response in the user-facing message at Line 75 could expose internal model outputs, malformed JSON, or debug information.
Apply this diff to sanitize the output:
+ text = "I've already attempted that action. Let me try a different approach.",
📝 Committable suggestion
+ text = "I've already attempted that action. Let me try a different approach.",📝 Committable suggestion
| if (deduplicator.isDuplicate(parseResult.name, parseResult.arguments)) { | |
| logger.warn("Duplicate tool call detected, treating as regular response") | |
| return ToolCallResult( | |
| success = true, | |
| text = "I've already attempted that action. Let me try a different approach: $response", | |
| toolCalls = emptyList(), | |
| mode = ToolCallingMode.PROMPT_BASED | |
| ) | |
| } | |
| if (deduplicator.isDuplicate(parseResult.name, parseResult.arguments)) { | |
| logger.warn("Duplicate tool call detected, treating as regular response") | |
| return ToolCallResult( | |
| success = true, | |
| text = "I've already attempted that action. Let me try a different approach.", | |
| toolCalls = emptyList(), | |
| mode = ToolCallingMode.PROMPT_BASED | |
| ) | |
| } |
🤖 Prompt for AI Agents
In
sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/PromptBasedToolCalling.kt
around lines 71 to 79, remove inclusion of the raw `response` in the user-facing
message and replace it with a sanitized or generic phrase (e.g., "I'll try a
different approach" or a short, sanitized summary of the previous attempt); keep
the detailed raw `response` only in an internal debug log (logger.debug or
logger.trace) so developers can inspect it without exposing it to users, and
ensure any summary is sanitized to strip or escape malformed JSON or sensitive
tokens before display.
| is ParseResult.InvalidFormat -> { | ||
| lastError = parseResult.reason | ||
| logger.warn("Invalid format (attempt ${retryCount + 1}/$maxRetries): ${parseResult.reason}") | ||
| if (retryCount < maxRetries) { | ||
| retryCount++ | ||
| // Add error feedback to help model self-correct | ||
| val errorFeedback = buildErrorFeedback(parseResult.reason, parseResult.rawResponse) | ||
| // We could append this to the conversation and retry, but for now just retry | ||
| continue | ||
| } |
Error feedback is built but never used.
The buildErrorFeedback result is computed but not passed to subsequent retry attempts. The retry simply reuses the same fullPrompt, so the model won't receive self-correction guidance.
Consider either:
- Appending errorFeedback to the prompt for the next iteration, or
- Removing the unused buildErrorFeedback call to avoid confusion.
if (retryCount < maxRetries) {
retryCount++
- // Add error feedback to help model self-correct
- val errorFeedback = buildErrorFeedback(parseResult.reason, parseResult.rawResponse)
- // We could append this to the conversation and retry, but for now just retry
continue
}
📝 Committable suggestion
| is ParseResult.InvalidFormat -> { | |
| lastError = parseResult.reason | |
| logger.warn("Invalid format (attempt ${retryCount + 1}/$maxRetries): ${parseResult.reason}") | |
| if (retryCount < maxRetries) { | |
| retryCount++ | |
| // Add error feedback to help model self-correct | |
| val errorFeedback = buildErrorFeedback(parseResult.reason, parseResult.rawResponse) | |
| // We could append this to the conversation and retry, but for now just retry | |
| continue | |
| } | |
| is ParseResult.InvalidFormat -> { | |
| lastError = parseResult.reason | |
| logger.warn("Invalid format (attempt ${retryCount + 1}/$maxRetries): ${parseResult.reason}") | |
| if (retryCount < maxRetries) { | |
| retryCount++ | |
| continue | |
| } |
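If the first option (feeding the error back) is preferred, the retry loop could thread the feedback into the prompt. A self-contained sketch; the ParseResult shape and the buildErrorFeedback wording are illustrative stand-ins for the SDK's actual types:

```kotlin
sealed class ParseResult {
    data class Success(val toolName: String) : ParseResult()
    data class InvalidFormat(val reason: String, val rawResponse: String) : ParseResult()
}

// Hypothetical feedback text; the real helper may phrase this differently.
fun buildErrorFeedback(reason: String, raw: String): String =
    "Your previous reply was not valid ($reason). Reply with ONLY a JSON tool call. Previous reply: $raw"

fun generateWithRetry(
    fullPrompt: String,
    maxRetries: Int,
    generate: (String) -> String,
    parse: (String) -> ParseResult,
): ParseResult {
    var workingPrompt = fullPrompt
    var retryCount = 0
    while (true) {
        val result = parse(generate(workingPrompt))
        if (result !is ParseResult.InvalidFormat || retryCount >= maxRetries) return result
        retryCount++
        // Thread the feedback into the next attempt so the model can self-correct.
        workingPrompt += "\n" + buildErrorFeedback(result.reason, result.rawResponse)
    }
}
```

This keeps the existing retry budget while giving each attempt strictly more context than the last.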
| import com.runanywhere.sdk.models.Tool | ||
| import com.runanywhere.sdk.models.ToolParameter | ||
| import kotlinx.serialization.json.Json |
Avoid manual JSON; ensure proper escaping.
String‑building will break on quotes/backslashes and forces all values to strings. Use kotlinx.serialization’s JsonObject builder.
Apply:
-import kotlinx.serialization.json.Json
+import kotlinx.serialization.json.buildJsonObject
+import kotlinx.serialization.json.put
@@
- private fun formatExampleCall(toolName: String, arguments: Map<String, String>): String {
- // Manual JSON formatting for simple cases
- val argsJson = arguments.entries.joinToString(", ") { (key, value) ->
- """"$key": "$value""""
- }
- return """{"name": "$toolName", "arguments": {$argsJson}}"""
- }
+ private fun formatExampleCall(toolName: String, arguments: Map<String, String>): String {
+ val argsObj = buildJsonObject {
+ arguments.forEach { (k, v) -> put(k, v) }
+ }
+ val obj = buildJsonObject {
+ put("name", toolName)
+ put("arguments", argsObj)
+ }
+ return obj.toString()
+ }
Also applies to: 90-96
🤖 Prompt for AI Agents
In
sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/PromptBuilder.kt
around lines 5 and 90-96, the code manually builds JSON strings which breaks on
quotes/backslashes and coerces everything to strings; replace the manual
string-building with kotlinx.serialization.json builders: add imports for
kotlinx.serialization.json.buildJsonObject, buildJsonArray, JsonPrimitive (and
Json if not already imported), construct the payloads using buildJsonObject {
put("key", JsonPrimitive(value)) / put("nested", buildJsonObject { ... }) /
put("arr", buildJsonArray { add(JsonPrimitive(...)) }) so types are preserved
and escaping is automatic, then serialize with Json.encodeToString(JsonObject)
(or Json.encodeToString on the built object) where needed; remove the string
concatenation code and ensure arrays/objects use the appropriate builders.
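To illustrate why the builder matters: a value containing a quote corrupts the hand-built string but round-trips cleanly through buildJsonObject. The argument values here are made up for the demo:

```kotlin
import kotlinx.serialization.json.buildJsonObject
import kotlinx.serialization.json.put

fun main() {
    val arguments = mapOf("expression" to "append \" to a string")

    // Manual concatenation: the embedded quote is not escaped, so the
    // printed string is not valid JSON.
    val broken = arguments.entries.joinToString(", ") { (k, v) -> """"$k": "$v"""" }
    println("""{"name": "calculate", "arguments": {$broken}}""")

    // buildJsonObject escapes automatically and preserves structure.
    val safe = buildJsonObject {
        put("name", "calculate")
        put("arguments", buildJsonObject {
            arguments.forEach { (k, v) -> put(k, v) }
        })
    }
    println(safe)
}
```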
| logger.debug("Parsing response: ${response.take(200)}...") | ||
| // Strategy 1: Try direct JSON parse |
Log hygiene: the response may contain PII; keep this log at debug level and cap its length.
You already cap to 200 chars; also ensure the logger configuration keeps this at debug only, and redact sensitive fields if needed.
🤖 Prompt for AI Agents
In
sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/ResponseParser.kt
around lines 31 to 33, the debug log of the response risks emitting PII even
though it's capped; update the code so the response string is only constructed
when debug logging is enabled (use logger.isDebugEnabled or equivalent) and
apply a redaction function that strips or masks known sensitive fields before
truncating to 200 chars, then log the redacted, capped string; ensure the logger
call stays at debug level and do not perform the substring/templating work when
debug is disabled.
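A sketch of the guarded, redacting pattern the prompt describes. The debug flag and the email-masking regex are illustrative assumptions, not the actual SDKLogger API:

```kotlin
// Guarded debug logging: no string work happens when debug is off.
fun logResponseSafely(response: String, debugEnabled: Boolean, log: (String) -> Unit) {
    if (!debugEnabled) return  // skip substring/redaction work entirely
    // Mask anything that looks like an email before truncating to 200 chars.
    val redacted = response
        .replace(Regex("""[\w.+-]+@[\w-]+\.[\w.]+"""), "<redacted-email>")
        .take(200)
    log("Parsing response: $redacted...")
}
```

The same guard-then-redact shape applies to any other debug log that echoes model output.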
This replaces the grammar-based approach which was crashing (issue #182).

**New Implementation:**
- PromptBuilder: Builds system prompts with tool definitions and examples
- ResponseParser: Extracts JSON with 3 fallback strategies (direct, code blocks, embedded)
- Deduplicator: Prevents infinite loops via hash-based tracking
- PromptBasedToolCalling: Main orchestrator with retry logic (up to 2 retries)

**API Changes:**
- Deprecated: generateWithTools() - grammar-based (crashes after ~10 tokens)
- New: generateWithToolsPromptBased() - production-ready prompt-based approach
- Added: Tool.withExamples() - DSL for adding few-shot examples
- Added: ToolExample data class for few-shot learning

**Example Usage:**
```kotlin
val calculator = tool("calculate", "Perform math") {
    stringParameter("expression", "Math expression", required = true)
}.withExamples(
    example("What is 15 * 7?", mapOf("expression" to "15 * 7"))
)

val result = RunAnywhere.generateWithToolsPromptBased(
    prompt = "Calculate 123 + 456",
    tools = listOf(calculator),
    options = RunAnywhereGenerationOptions(temperature = 0.3f)
)
```

**Success Rate:** 85-95% with Qwen 2.5 0.5B (vs 70-85% with crashes for grammar)

**Android Example Updated:**
- ToolCallingViewModel now uses prompt-based approach
- Added few-shot examples to all demo tools (time, calculator, random number)
- Updated UI descriptions

**Files Added:**
- PromptBuilder.kt - System prompt generation with few-shot examples
- ResponseParser.kt - JSON extraction with multiple fallback strategies
- Deduplicator.kt - Hash-based duplicate detection
- PromptBasedToolCalling.kt - Main orchestrator with retry logic
- PromptBasedToolCallingExtensions.kt - Public API extensions

**Documentation:**
- Removed old TOOL_CALLING.md (contained deprecated grammar approach)
- All grammar code preserved with issue #182 references

**Related:** Issue #182
Force-pushed from bc017e4 to 1f7774b (Compare)
Description
Brief description of the changes made.
Type of Change
Testing
Labels
Please add the appropriate label(s):
- iOS SDK - Changes to iOS/Swift SDK
- Android SDK - Changes to Android/Kotlin SDK
- iOS Sample - Changes to iOS example app
- Android Sample - Changes to Android example app

Checklist
Screenshots - Attach all relevant screenshots of UI changes for iOS/Android and macOS/tablet/large screen sizes
Summary by CodeRabbit
Release Notes
New Features
Documentation