
Conversation


@sanchitmonga22 sanchitmonga22 commented Oct 25, 2025

  • Introduced Tool and ToolCall models for defining and executing tools.
  • Implemented grammar-based constrained generation in LlamaCppService.
  • Added RunAnywhere.generateWithTools() for generating text with tool calls.
  • Created GrammarBridge for JNI integration with grammar functions.
  • Updated GenerationOptions to include tools and tool calling mode.
  • Added example usage in documentation for clarity.

Description

Brief description of the changes made.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Refactoring

Testing

  • Tests pass locally
  • Tested on MacBook (if Swift changes)
  • Tested on tablet/iPad (if Swift changes)
  • Added/updated tests for changes

Labels

Please add the appropriate label(s):

  • iOS SDK - Changes to iOS/Swift SDK
  • Android SDK - Changes to Android/Kotlin SDK
  • iOS Sample - Changes to iOS example app
  • Android Sample - Changes to Android example app

Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Documentation updated (if needed)

Screenshots - Attach screenshots of all relevant UI changes for iOS/Android and macOS/tablet/large screen sizes

Summary by CodeRabbit

Release Notes

  • New Features

    • Added tool calling functionality, enabling the LLM to invoke predefined tools and functions within conversations.
    • Added a new Tools tab to the Android demo application showcasing tool calling in action.
    • Introduced prompt-based tool calling as the primary approach for production use.
  • Documentation

    • Updated Android Quick Start guide with local AAR installation instructions and improved dependency management.


coderabbitai bot commented Oct 25, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR introduces tool-calling capabilities to the RunAnywhere SDK, transitioning from a deprecated grammar-based approach to a prompt-based implementation. It adds Tool model definitions, orchestration components, Android UI examples, native JNI bindings for grammar support, and comprehensive documentation including troubleshooting and setup guides.

Changes

Cohort / File(s) Summary
IDE & Project Configuration
.idea/vcs.xml
Expanded VCS directory mappings by adding multiple external and SDK Git paths
Android Documentation
QUICKSTART_ANDROID.md
Comprehensive rewrite shifting from JitPack to local AAR distribution; added Critical Dependencies Checklist, explicit Kotlin 2.1.0+ requirements, updated installation steps, dependency blocks, manifest improvements, and expanded troubleshooting sections for Kotlin/Ktor interoperability
Android Example App: Navigation & Screens
examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/presentation/navigation/AppNavigation.kt, examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/presentation/tools/ToolCallingScreen.kt
Added new Tools tab to navigation route; introduced ToolCallingScreen composable with user input, tool listing, results display, and error handling; added ToolResultCard for rendering execution results
Android Example App: View Model
examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/presentation/tools/ToolCallingViewModel.kt
New ViewModel managing tool-calling UI state; exposes tools, input, results, and loading states; implements tool execution with mock implementations for time, arithmetic, and random generation; includes model status tracking and error handling
Kotlin SDK: Tool Models
sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/models/Tool.kt, sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/models/GenerationOptions.kt
New data classes for Tool, ToolParameter, ToolExample, ToolCall, ToolCallResult with JSON schema generation; new ToolCallingMode enum; added tools and toolCallingMode properties to RunAnywhereGenerationOptions; introduced ToolBuilder DSL and createTool helper
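To make the Tool model concrete, here is a minimal self-contained sketch. Only the names mentioned in the summary (Tool, ToolParameter, JSON schema generation) come from the PR; every field and the exact schema shape are assumptions for illustration, not the SDK's actual definitions.

```kotlin
// Illustrative sketch only; the real SDK types carry more fields.
data class ToolParameter(
    val name: String,
    val type: String,          // e.g. "string", "number" (assumed convention)
    val description: String,
    val required: Boolean = true
)

data class Tool(
    val name: String,
    val description: String,
    val parameters: List<ToolParameter>
) {
    // JSON-schema-style rendering, since the summary says Tool supports schema generation
    fun toJsonSchema(): String {
        val props = parameters.joinToString(",") {
            """"${it.name}":{"type":"${it.type}","description":"${it.description}"}"""
        }
        val required = parameters.filter { it.required }.joinToString(",") { "\"${it.name}\"" }
        return """{"type":"object","properties":{$props},"required":[$required]}"""
    }
}

fun main() {
    val tool = Tool(
        name = "get_time",
        description = "Returns the current time",
        parameters = listOf(ToolParameter("timezone", "string", "IANA timezone name"))
    )
    println(tool.toJsonSchema())
}
```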
Kotlin SDK: Grammar Bridge (JNI)
sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/GrammarBridge.kt
New JNI bridge exposing native grammar functions (jsonSchemaToGBNF, createGrammar, freeGrammar); RAII-style Grammar wrapper class with AutoCloseable and factory methods for schema/GBNF creation
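The RAII-style AutoCloseable pattern described here can be sketched as below. This is not the actual GrammarBridge code: the stand-in handle and freeCount bookkeeping are assumptions used to show the idempotent-close guard that matters when a native resource must be freed exactly once across JNI.

```kotlin
// Sketch of an AutoCloseable wrapper over a native grammar handle.
// In the real bridge, close() would call a JNI freeGrammar(handle).
class Grammar private constructor(private var handle: Long) : AutoCloseable {
    var freeCount = 0
        private set

    override fun close() {
        if (handle != 0L) {   // guard against double-free, a hazard the review flags
            handle = 0L       // real code: freeGrammar(handle) before zeroing
            freeCount++
        }
    }

    companion object {
        // Stand-in factory; real code would convert GBNF rules via JNI.
        fun fromGbnf(rules: String): Grammar = Grammar(1L)
    }
}

fun main() {
    val g = Grammar.fromGbnf("root ::= \"{\" \"}\"")
    g.close()
    g.close()              // second close is a no-op
    println(g.freeCount)   // prints 1
}
```

Callers would typically pair this with Kotlin's `use { }` so the grammar is released even when generation throws.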
Kotlin SDK: Grammar-based Generation (LLamaAndroid)
sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt
Added sendWithGrammar and sendWithGrammarSchema public APIs for grammar-constrained text generation; updated threadLocalState and State visibility to internal; replaced hardcoded token limit (256) with maxTokensToGenerate (512); added new_sampler_with_grammar and reset_sampler native bindings
Kotlin SDK: Tool-calling Service (LlamaCppService)
sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt
New public suspend function generateWithTools performing grammar-based constrained tool-calling; includes system prompt construction, tool schema validation, JSON response parsing, and ToolCallResult assembly (note: implementation appears duplicated in diff)
Kotlin SDK: Prompt-based Tool Calling Extension
sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/public/extensions/PromptBasedToolCallingExtensions.kt
Production-ready extension RunAnywhere.generateWithToolsPromptBased validating initialization and tool lists; wraps PromptBasedToolCalling orchestrator with retry behavior; introduces Tool.withExamples DSL for attaching usage examples
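A rough sketch of how a withExamples-style helper could attach few-shot examples to a built tool follows; the simplified Tool/ToolExample shapes and the copy-based implementation are assumptions, not the SDK's actual signatures.

```kotlin
// Simplified stand-ins for the SDK's Tool and ToolExample types.
data class ToolExample(val userQuery: String, val arguments: Map<String, String>)
data class Tool(val name: String, val examples: List<ToolExample> = emptyList())

// Hypothetical extension mirroring the Tool.withExamples DSL described above.
fun Tool.withExamples(vararg examples: ToolExample): Tool =
    copy(examples = this.examples + examples)

fun main() {
    val tool = Tool("get_time")
        .withExamples(ToolExample("What time is it?", mapOf("timezone" to "UTC")))
    println(tool.examples.size)  // prints 1
}
```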
Kotlin SDK: Tool-calling Deprecation & DSL
sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/public/extensions/RunAnywhereToolCalling.kt
Deprecated extension RunAnywhere.generateWithTools (marked ERROR level) signaling transition from grammar-based to prompt-based; includes commented implementation reference and createTool DSL helper
Kotlin SDK: Tool-calling Orchestration
sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/PromptBasedToolCalling.kt, sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/PromptBuilder.kt, sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/ResponseParser.kt
New orchestration layer: PromptBasedToolCalling manages retry logic and tool execution; PromptBuilder constructs system prompts with tool descriptions; ResponseParser extracts ToolCall or RegularResponse from LLM output with multi-strategy JSON extraction and validation
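One extraction strategy of the kind ResponseParser is described as using can be sketched as a balanced-brace scan that tolerates surrounding prose. This is an illustrative stand-in, not the SDK's actual parser.

```kotlin
// Find the first balanced {...} block in LLM output, ignoring braces inside strings.
fun extractFirstJsonObject(text: String): String? {
    val start = text.indexOf('{')
    if (start < 0) return null
    var depth = 0
    var inString = false
    var escaped = false
    for (i in start until text.length) {
        val c = text[i]
        when {
            escaped -> escaped = false
            c == '\\' && inString -> escaped = true
            c == '"' -> inString = !inString
            !inString && c == '{' -> depth++
            !inString && c == '}' -> {
                depth--
                if (depth == 0) return text.substring(start, i + 1)
            }
        }
    }
    return null  // unbalanced: a multi-strategy parser would fall through to the next strategy
}

fun main() {
    val reply = """Sure! Here is the call: {"tool":"get_time","arguments":{"timezone":"UTC"}} Done."""
    println(extractFirstJsonObject(reply))  // prints {"tool":"get_time","arguments":{"timezone":"UTC"}}
}
```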
Kotlin SDK: Deduplication
sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/Deduplicator.kt
New Deduplicator class tracking and preventing duplicate tool calls using hash-based history; configurable max history size (default 50)
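The hash-based history described above can be sketched as follows; the key derivation and eviction policy here are assumptions, with only the bounded history (default 50) taken from the summary.

```kotlin
// Tracks recent (tool, arguments) calls by hash and rejects repeats.
class Deduplicator(private val maxHistory: Int = 50) {
    private val seen = ArrayDeque<Int>()

    fun isDuplicate(toolName: String, args: Map<String, String>): Boolean {
        // Sort args so key order doesn't change the hash.
        val key = (toolName to args.toSortedMap().toString()).hashCode()
        if (key in seen) return true
        seen.addLast(key)
        if (seen.size > maxHistory) seen.removeFirst()  // evict oldest entry
        return false
    }
}

fun main() {
    val dedup = Deduplicator()
    println(dedup.isDuplicate("get_time", mapOf("tz" to "UTC")))  // prints false (first call)
    println(dedup.isDuplicate("get_time", mapOf("tz" to "UTC")))  // prints true (repeat)
}
```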
Native: CMake Configuration
sdk/runanywhere-kotlin/native/llama-jni/CMakeLists.txt
Added src/grammar_jni.cpp to library variant sources for all generated llama-android builds
Native: Grammar JNI Implementation
sdk/runanywhere-kotlin/native/llama-jni/src/grammar_jni.cpp
New C++ JNI bindings: jsonSchemaToGBNF (JSON schema to GBNF conversion), createGrammar (grammar sampler creation), freeGrammar (resource cleanup), new_sampler_with_grammar (sampler chain construction with grammar); includes error handling, resource ownership semantics, and detailed lifecycle notes; marked as deprecated in production favor of prompt-based approach
Native: LLamaAndroid Updates
sdk/runanywhere-kotlin/native/llama-jni/src/llama-android.cpp
Updated llama.cpp API calls (llama_load_model_from_file, llama_new_context_with_model, llama_free_model); added llama_token_is_eog check replacing vocab-based EOG detection; added periodic logging every 50 tokens; replaced llama_memory_clear with llama_kv_cache_clear; introduced reset_sampler JNI method; refactored completion loop with improved safety checks
Native: Submodule & Documentation
sdk/runanywhere-kotlin/native/llama-jni/llama.cpp, sdk/runanywhere-kotlin/native/llama-jni/GRAMMAR_IMPLEMENTATION_NOTES.md
Updated llama.cpp submodule commit pointer; added comprehensive implementation notes documenting grammar-based tool-calling investigation, crash analysis, root causes (JSON schema issues, chat template interference), decision to transition to prompt-based approach, and recommended future steps

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Android as Android App
    participant PromptBased as PromptBasedToolCalling
    participant PromptBuilder as PromptBuilder
    participant LLM as LLM Service
    participant ResponseParser as ResponseParser
    participant Dedup as Deduplicator

    User->>Android: Enter prompt & select tools
    Android->>PromptBased: generateWithTools(userMessage, tools)
    PromptBased->>PromptBuilder: buildSystemPrompt(tools)
    PromptBuilder-->>PromptBased: system prompt with tool descriptions
    PromptBased->>LLM: send(systemPrompt + userMessage)
    LLM-->>PromptBased: LLM response (JSON or text)
    PromptBased->>ResponseParser: parse(response, tools)
    ResponseParser-->>PromptBased: ParseResult (ToolCall or RegularResponse)
    
    alt ToolCall detected
        PromptBased->>Dedup: isDuplicate(toolName, args)
        Dedup-->>PromptBased: false (new call)
        PromptBased->>Android: ToolCallResult with tool execution
    else RegularResponse detected
        PromptBased->>Android: ToolCallResult with text response
    else Invalid or error
        PromptBased->>Android: retry or error result
    end
    
    Android->>User: Display results & tool execution output
sequenceDiagram
    participant Android as Android App
    participant LLamaAndroid as LLamaAndroid (Grammar Path)
    participant GrammarBridge as GrammarBridge
    participant NativeGrammar as Native llama.cpp
    participant TextStream as Token Stream

    Android->>GrammarBridge: Grammar.fromSchema(modelPtr, jsonSchema)
    GrammarBridge->>NativeGrammar: jsonSchemaToGBNF(jsonSchema)
    NativeGrammar-->>GrammarBridge: GBNF string
    GrammarBridge->>NativeGrammar: createGrammar(modelPtr, gbnfRules)
    NativeGrammar-->>GrammarBridge: grammar handle
    GrammarBridge-->>Android: Grammar object (AutoCloseable)
    
    Android->>LLamaAndroid: sendWithGrammar(message, grammarHandle)
    LLamaAndroid->>NativeGrammar: new_sampler_with_grammar(temp, minP, topK, grammar)
    NativeGrammar-->>LLamaAndroid: sampler chain (owns grammar)
    LLamaAndroid->>NativeGrammar: generate tokens (constrained by grammar)
    NativeGrammar-->>TextStream: constrained token stream
    TextStream-->>Android: Flow<String> with grammar-constrained output
    
    Note over NativeGrammar: Grammar lifecycle owned by sampler chain
    Note over Android: Note: Grammar-based approach deprecated

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

  • Grammar JNI/Native code (grammar_jni.cpp, llama-android.cpp, LLamaAndroid.kt): Dense native interop with resource ownership semantics, sampler chain management, and detailed API migrations; requires careful validation of lifecycle and double-free prevention. Pay special attention to:

    • Grammar ownership semantics in sampler chains (noted as critical in implementation)
    • llama.cpp API migration correctness (model loading, context initialization, EOG detection)
    • JNI resource cleanup and error handling paths
  • Tool-calling orchestration (PromptBasedToolCalling.kt, PromptBuilder.kt, ResponseParser.kt): Multiple interrelated classes with complex logic for retry, validation, and multi-strategy JSON extraction. Requires verification of:

    • JSON parsing robustness and edge case handling
    • Tool validation and parameter schema matching
    • Retry logic and error feedback flow
  • Deprecation and API transitions (RunAnywhereToolCalling.kt): Grammar-based API marked @Deprecated(level=ERROR) while grammar infrastructure remains; verify deprecation strategy and migration path clarity

  • Duplication issue: LlamaCppService.kt summary indicates duplicated implementation blocks requiring deduplication review

  • Android example integration (AppNavigation.kt, ToolCallingScreen.kt, ToolCallingViewModel.kt): Integration touchpoints and ViewModel state management patterns should align with SDK changes

Possibly related issues

Possibly related PRs

Suggested labels

kotlin-sdk, kotlin-sample

Poem

🐰 Tools now call with grace divine,
Prompts and grammars intertwine,
Parse the JSON, find the way,
Deduplicate throughout the day,
Android hops with tabs anew,
RunAnywhere dreams come true! ✨

Pre-merge checks and finishing touches

❌ Failed checks (3 warnings)
  • Title Check (⚠️ Warning)
    Explanation: The PR title states "Add grammar-based tool calling support" and references a real component in the changeset, but the full scope is more nuanced. While grammar-based infrastructure (GrammarBridge.kt, grammar_jni.cpp) is included, GRAMMAR_IMPLEMENTATION_NOTES.md documents an investigation that led to switching from grammar-based to prompt-based tool calling due to crashes and instability. The public-facing API being promoted is generateWithToolsPromptBased() (prompt-based), while the grammar-based generateWithTools() is marked @Deprecated with DeprecationLevel.ERROR, so the title emphasizes the deprecated approach rather than the working solution actually delivered.
    Resolution: Update the PR title to reflect the primary feature, e.g. "Add tool calling support to RunAnywhere SDK (prompt-based)" or "Add prompt-based and grammar-based tool calling to RunAnywhere SDK". Alternatively, if the grammar-based approach is meant to be primary, clarify why it is marked deprecated with DeprecationLevel.ERROR.
  • Description Check (⚠️ Warning)
    Explanation: The PR description follows the required template structure (Description, Type of Change, Testing, Labels, Checklist), but the critical "Description" section contains only the placeholder text "Brief description of the changes made." The bullet-point summary above the template provides some context, but it is insufficient for a complete description. In addition, the Testing section is entirely empty, and all Checklist items remain unchecked.
    Resolution: Replace the placeholder with a substantive explanation of the goals, what changed, and why; indicate whether tests were added and run; and complete the Checklist or document any gaps. Consider expanding on the architecture decisions, especially the transition from grammar-based to prompt-based approaches documented in GRAMMAR_IMPLEMENTATION_NOTES.md.
  • Docstring Coverage (⚠️ Warning)
    Explanation: Docstring coverage is 56.47%, below the required threshold of 80.00%.
    Resolution: Run @coderabbitai generate docstrings to improve coverage.


Comment @coderabbitai help to get the list of available commands and usage tips.

This commit preserves our grammar-based tool calling investigation before
switching to the prompt-based approach.

## What was implemented:
- grammar_jni.cpp: JNI bindings for JSON schema → GBNF conversion
- GrammarBridge.kt: Kotlin wrapper for grammar operations
- Grammar-constrained sampler chains with proper ordering
- Tool calling API with schema generation
- Integration with LlamaCppService

## Issues encountered:
1. SIGABRT crashes in llama_grammar_accept_impl after ~10 tokens
2. Root causes:
   - llama.cpp grammar stack bugs (unfixed as of Oct 2025)
   - Chat template interference with Qwen2 EOG tokens
   - Missing additionalProperties: false in schemas
   - Double-accept bug (fixed but crash persisted)

## Investigation summary:
- 40+ hours of debugging and deep-diving into llama.cpp source
- Fixed double-free bug, empty response issue, double-accept issue
- Crash persisted due to upstream llama.cpp grammar bugs
- Success rate: ~70-85% with crashes (unacceptable for production)

## Decision:
Switch to prompt-based tool calling with few-shot examples:
- No crashes - pure Kotlin/JSON parsing
- 85-95% success rate (proven in production)
- Works with small models (0.5B) via few-shot examples
- Universal compatibility, easier debugging
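The few-shot prompt construction this decision relies on might look roughly like the sketch below. The exact wording and format the SDK uses are not shown in this PR text; the function name and prompt layout here are assumptions for illustration.

```kotlin
// Hypothetical few-shot system prompt builder for prompt-based tool calling.
fun buildSystemPrompt(
    toolName: String,
    toolDescription: String,
    exampleQuery: String,
    exampleJson: String
): String = buildString {
    appendLine("You can call tools by replying with a single JSON object.")
    appendLine("Available tool: $toolName - $toolDescription")
    appendLine("Example:")
    appendLine("User: $exampleQuery")
    appendLine("Assistant: $exampleJson")
    append("If no tool applies, answer in plain text.")
}

fun main() {
    val prompt = buildSystemPrompt(
        "get_time", "Returns the current time",
        "What time is it in UTC?",
        """{"tool":"get_time","arguments":{"timezone":"UTC"}}"""
    )
    println(prompt)
}
```

Few-shot examples like this are what lets small models (around 0.5B, per the notes above) produce parseable tool-call JSON without grammar constraints.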

## Files added:
- GRAMMAR_IMPLEMENTATION_NOTES.md: Full investigation documentation
- TOOL_CALLING.md: Grammar-based approach docs

## Next steps:
Implement prompt-based tool calling per prompt_based_tool_calling_implementation.md
@sanchitmonga22
Contributor Author

@coderabbitai please review this


coderabbitai bot commented Oct 27, 2025

@sanchitmonga22: I'll review the changes in this PR that adds grammar-based tool calling support to the RunAnywhere SDK.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 19

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
QUICKSTART_ANDROID.md (1)

815-844: API Reference missing new tool-calling APIs.

This PR introduces tool-calling capabilities (generateWithTools(), Tool models, etc.), but the API Reference section doesn't document these new public APIs. Users won't know how to use the feature without this guidance.

Add documentation for:

  • RunAnywhere.generateWithTools() and its parameters
  • Tool model definition and schema
  • Tool calling modes (grammar-based vs. prompt-based)
  • Response parsing and tool execution
sdk/runanywhere-kotlin/native/llama-jni/CMakeLists.txt (1)

63-68: Fix ABI‑specific define leaking to non‑arm64 builds.

GGML_USE_CPU_AARCH64 should only be defined for arm64‑v8a.

Apply:

 target_compile_definitions(${target_name} PRIVATE
     GGML_COMMIT=""
     GGML_VERSION=""
     GGML_USE_CPU
-    GGML_USE_CPU_AARCH64
+    $<$<STREQUAL:${ANDROID_ABI},arm64-v8a>:GGML_USE_CPU_AARCH64>
 )
sdk/runanywhere-kotlin/native/llama-jni/src/llama-android.cpp (4)

48-53: Fix format-string misuse in log callback (potential crash/UB).

fmt is a message, not a printf format. Print with "%s".

 static void log_callback(ggml_log_level level, const char * fmt, void * data) {
-    if (level == GGML_LOG_LEVEL_ERROR)     __android_log_print(ANDROID_LOG_ERROR, TAG, fmt, data);
-    else if (level == GGML_LOG_LEVEL_INFO) __android_log_print(ANDROID_LOG_INFO, TAG, fmt, data);
-    else if (level == GGML_LOG_LEVEL_WARN) __android_log_print(ANDROID_LOG_WARN, TAG, fmt, data);
-    else __android_log_print(ANDROID_LOG_DEFAULT, TAG, fmt, data);
+    int prio = ANDROID_LOG_DEFAULT;
+    if (level == GGML_LOG_LEVEL_ERROR)      prio = ANDROID_LOG_ERROR;
+    else if (level == GGML_LOG_LEVEL_INFO)  prio = ANDROID_LOG_INFO;
+    else if (level == GGML_LOG_LEVEL_WARN)  prio = ANDROID_LOG_WARN;
+    __android_log_print(prio, TAG, "%s", fmt);
 }

160-181: Replace manual batch allocation/free with llama.cpp API (heap corruption risk).

free_batch frees 512 seq entries regardless of actual n_tokens; this can double-free or leak. Use llama_batch_init/llama_batch_free to avoid mismatched alloc/free.

 JNIEXPORT jlong JNICALL
 Java_com_runanywhere_sdk_llm_llamacpp_LLamaAndroid_new_1batch(JNIEnv *, jobject, jint n_tokens, jint embd, jint n_seq_max) {
-    llama_batch *batch = new llama_batch {
-        0, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr,
-    };
-    if (embd) {
-        batch->embd = (float *) malloc(sizeof(float) * n_tokens * embd);
-    } else {
-        batch->token = (llama_token *) malloc(sizeof(llama_token) * n_tokens);
-    }
-    batch->pos      = (llama_pos *)     malloc(sizeof(llama_pos)      * n_tokens);
-    batch->n_seq_id = (int32_t *)       malloc(sizeof(int32_t)        * n_tokens);
-    batch->seq_id   = (llama_seq_id **) malloc(sizeof(llama_seq_id *) * n_tokens);
-    for (int i = 0; i < n_tokens; ++i) {
-        batch->seq_id[i] = (llama_seq_id *) malloc(sizeof(llama_seq_id) * n_seq_max);
-    }
-    batch->logits   = (int8_t *)        malloc(sizeof(int8_t)         * n_tokens);
-    return reinterpret_cast<jlong>(batch);
+    auto batch = new llama_batch(llama_batch_init(n_tokens, embd, n_seq_max));
+    return reinterpret_cast<jlong>(batch);
 }
 ...
 JNIEXPORT void JNICALL
 Java_com_runanywhere_sdk_llm_llamacpp_LLamaAndroid_free_1batch(JNIEnv *, jobject, jlong batch_pointer) {
-    const auto batch = reinterpret_cast<llama_batch *>(batch_pointer);
-    free(batch->pos);
-    free(batch->n_seq_id);
-    for (int i = 0; i < 512; ++i) {  // Match batch size
-        free(batch->seq_id[i]);
-    }
-    free(batch->seq_id);
-    free(batch->logits);
-    if (batch->token) free(batch->token);
-    if (batch->embd) free(batch->embd);
-    delete batch;
+    auto batch = reinterpret_cast<llama_batch *>(batch_pointer);
+    llama_batch_free(*batch);
+    delete batch;
 }

If your llama.cpp version exposes different helpers, adapt accordingly (the key point is: don’t hardcode 512 and don’t hand-roll alloc/free).

Also applies to: 187-199


11-15: Avoid caching local jclass globally; cache only method IDs.

la_int_var holds a local ref that becomes invalid. Resolve class per-call, cache method IDs, and release the local ref.

-// Global variables for JNI method caching
-jclass la_int_var;
-jmethodID la_int_var_value;
-jmethodID la_int_var_inc;
+// Global method IDs (valid across JNIEnv calls)
+static jmethodID la_int_var_value = nullptr;
+static jmethodID la_int_var_inc = nullptr;
 ...
-    if (!la_int_var) la_int_var = env->GetObjectClass(intvar_ncur);
-    if (!la_int_var_value) la_int_var_value = env->GetMethodID(la_int_var, "getValue", "()I");
-    if (!la_int_var_inc) la_int_var_inc = env->GetMethodID(la_int_var, "inc", "()V");
+    jclass cls = env->GetObjectClass(intvar_ncur);
+    if (!la_int_var_value) la_int_var_value = env->GetMethodID(cls, "getValue", "()I");
+    if (!la_int_var_inc)   la_int_var_inc   = env->GetMethodID(cls, "inc", "()V");
+    env->DeleteLocalRef(cls);

Also applies to: 316-319


1-6: Add missing header for std::transform.

Ensure <algorithm> is included to avoid build issues.

 #include <android/log.h>
 #include <jni.h>
 #include <string>
+#include <algorithm>
 #include "llama.h"
 #include "common.h"

Also applies to: 420-423

sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt (1)

151-164: Reduce sensitive logging (PII risk) and verbosity.

Logging full messages and full prompts can leak user data; trim lengths or guard behind a debug flag.

-logger.info("📨 Messages:")
-input.messages.forEach { msg ->
-    logger.info("  - ${msg.role}: ${msg.content.take(100)}")
-}
-logger.info("📝 Full prompt:\n$prompt")
+logger.debug("📨 ${input.messages.size} messages (content redacted; showing first 100 chars each in debug builds)")
+input.messages.forEach { msg ->
+    logger.debug("  - ${msg.role}: ${msg.content.take(100)}")
+}
+logger.debug("📝 Prompt preview (first 500 chars): ${prompt.take(500)}")

If SDKLogger supports levels, prefer debug for previews; otherwise gate with a configuration flag.

Also applies to: 165-175, 316-319

🧹 Nitpick comments (31)
QUICKSTART_ANDROID.md (4)

9-52: Critical Dependencies section is comprehensive but may need versioning strategy.

The new dependencies checklist provides excellent clarity. However, consider documenting:

  1. Which dependencies are truly required vs. recommended
  2. Compatibility matrix for different Kotlin/Ktor versions
  3. Migration path if versions need updates

This will reduce future maintenance burden and support burden.


54-75: Hard-coded release URLs may create maintenance friction.

The AAR download links are pinned to android-v0.1.0-alpha. Consider:

  • Adding a "Check for updates" note or link to releases page
  • Documenting how/when to update URLs for new versions
  • Adding checksums or signatures for AAR integrity verification (security best practice)

53-53: Fix static analysis linting issues.

Per markdownlint-cli2 output:

  • Line 53 (MD040): Fenced code blocks should have a language specified (e.g., ```toml instead of ```)
  • Line 205 (MD038): Remove extra spaces inside inline code spans

These are formatting nits but fixing them will improve documentation consistency and pass linting checks.

Also applies to: 205-205


728-801: Troubleshooting section is strong on dependencies but lacks tool-calling guidance.

The expanded troubleshooting for Ktor/Kotlin compatibility is excellent. However, given that this PR introduces tool-calling capabilities to the SDK, consider adding a troubleshooting section for:

  • Tool schema validation errors
  • Tool execution failures
  • Grammar-based vs. prompt-based tool calling mode selection/debugging
  • Common tool-calling issues with specific error messages

This ensures users have adequate guidance for the new feature set introduced in this PR.

sdk/runanywhere-kotlin/native/llama-jni/CMakeLists.txt (2)

52-60: Scope compile options to this target; avoid PUBLIC propagation.

These flags shouldn’t leak to dependents.

-    target_compile_options(${target_name} PUBLIC
+    target_compile_options(${target_name} PRIVATE
         -O3
         -fvisibility=hidden -fvisibility-inlines-hidden
         -ffunction-sections -fdata-sections
         ${cpu_flags}
     )

Also confirm NDK Clang accepts -ffile-prefix-map; if not, prefer -fdebug-prefix-map.


71-77: Multiple CPU‑feature variants require runtime selection logic.

Building 7 differently‑named .so files under the same ABI won’t auto‑select at load time. Ensure the app chooses the right library by name based on detected CPU features, or only package a baseline.

Please point to (or add) the code that:

  • Detects CPU features at runtime.
  • Loads the best‑fit lib via System.loadLibrary("<matching‑name>").
  • Excludes unused variants from release artifacts if not selected.

Also applies to: 81-97

sdk/runanywhere-kotlin/native/llama-jni/GRAMMAR_IMPLEMENTATION_NOTES.md (1)

3-6: Align docs and build: deprecate in code, too.

Since status is DEPRECATED, add a note pointing to the CMake flag (or disable building by default). Keeps artifacts but avoids shipping them unintentionally.

examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/presentation/navigation/AppNavigation.kt (1)

109-114: Minor: use string resources and explicit content descriptions.

For i18n and accessibility, move "Tools" to a string resource and ensure icon content descriptions are localized.

sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/PromptBuilder.kt (1)

54-75: Optional: allow non‑string argument types.

If ToolParameter types include numbers/booleans/arrays, consider Map<String, JsonElement> (or a serializable data class) for examples to preserve types.

sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/models/GenerationOptions.kt (1)

56-61: Add validation and parity helpers for tool-calling fields.

  • When toolCallingMode == GRAMMAR_BASED and tools is empty, fail fast with a clear message in validate().
  • Consider convenience methods withTools(...) and withToolCallingMode(...) to mirror other withX helpers.

Apply this diff to tighten validation:

 fun validate() {
   require(maxTokens > 0) { "Max tokens must be positive" }
   require(temperature >= 0f && temperature <= 2f) { "Temperature must be between 0 and 2" }
   require(topP >= 0f && topP <= 1f) { "Top-p must be between 0 and 1" }
   topK?.let { require(it > 0) { "Top-K must be positive" } }
   repetitionPenalty?.let { require(it > 0f) { "Repetition penalty must be positive" } }
   frequencyPenalty?.let { require(it >= -2f && it <= 2f) { "Frequency penalty must be between -2 and 2" } }
   presencePenalty?.let { require(it >= -2f && it <= 2f) { "Presence penalty must be between -2 and 2" } }
   contextLength?.let { require(it > 0) { "Context length must be positive" } }
+  if (toolCallingMode == ToolCallingMode.GRAMMAR_BASED) {
+    require(tools.isNotEmpty()) { "Tools must be provided when using GRAMMAR_BASED tool calling" }
+  }
 }
sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/public/extensions/PromptBasedToolCallingExtensions.kt (2)

64-88: Early fallback when tools list is empty.

Avoid invoking the orchestrator with no tools; return regular generation to callers.

     if (tools.isEmpty()) {
-        logger.warn("No tools provided, falling back to regular generation")
+        logger.warn("No tools provided, falling back to regular generation")
+        val text = this.generate(
+            prompt = prompt,
+            options = options ?: RunAnywhereGenerationOptions(temperature = 0.3f)
+        )
+        return ToolCallResult(success = true, text = text, toolCalls = emptyList(), mode = ToolCallingMode.PROMPT_BASED)
     }

Note: adjust generate(...) call to your actual RunAnywhere API if named differently.


130-135: Prevent silent no‑op: deprecate ToolBuilder.example.

Mark as @deprecated with a ReplaceWith to steer devs to Tool.withExamples(...).

-fun ToolBuilder.example(userQuery: String, arguments: Map<String, String>) {
+@Deprecated(
+    message = "ToolBuilder.example() is not supported. Use Tool.withExamples() on the built Tool.",
+    replaceWith = ReplaceWith("/* build tool first */")
+)
+fun ToolBuilder.example(userQuery: String, arguments: Map<String, String>) {
     // Note: ToolBuilder doesn't currently support examples directly,
     // so users should use .withExamples() after building the tool.
     // This function is provided for future extensibility.
     logger.warn("ToolBuilder.example() is not yet supported. Use Tool.withExamples() instead.")
 }
examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/presentation/tools/ToolCallingScreen.kt (3)

176-181: Add content descriptions for icons (accessibility).

Provide non-null contentDescription for assistive tech; keep emojis in text but make icons screen-reader friendly.

-Icon(
-    imageVector = Icons.Default.Error,
-    contentDescription = null,
+Icon(
+    imageVector = Icons.Default.Error,
+    contentDescription = "Error",
     tint = MaterialTheme.colorScheme.error
)
...
-Icon(
-    imageVector = if (result.success) Icons.Default.CheckCircle else Icons.Default.Error,
-    contentDescription = null,
+Icon(
+    imageVector = if (result.success) Icons.Default.CheckCircle else Icons.Default.Error,
+    contentDescription = if (result.success) "Success" else "Error",
     tint = if (result.success) {
         MaterialTheme.colorScheme.primary
     } else {
         MaterialTheme.colorScheme.error
     }
)

Also applies to: 216-223


44-46: Externalize hardcoded strings to resources.

Move UI strings to string resources for i18n and consistency.

Example (not exhaustive): "Tool Calling Demo", "Available Tools", tool descriptions, placeholders, "Processing...", "Call Tools", and the model warning.

Also applies to: 80-91, 98-107, 126-127, 133-137


153-160: Provide a stable key for LazyColumn items.

Avoid recomposition jitter by supplying keys.

- items(uiState.results) { result ->
+ items(uiState.results, key = { it.id /* or index if no stable id */ }) { result ->
     ToolResultCard(result)
 }

If ToolExecutionResult has no id, use itemsIndexed and key = { idx, _ -> idx } as a fallback.

sdk/runanywhere-kotlin/native/llama-jni/src/llama-android.cpp (1)

112-119: Correct error message for context creation.

The call is llama_new_context_with_model(); update the message to match for easier debugging.

-        LOGe("llama_init_from_model() returned null");
-        env->ThrowNew(env->FindClass("java/lang/IllegalStateException"),
-                      "llama_init_from_model() returned null");
+        LOGe("llama_new_context_with_model() returned null");
+        env->ThrowNew(env->FindClass("java/lang/IllegalStateException"),
+                      "llama_new_context_with_model() returned null");
sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt (2)

542-566: Replace manual JSON string building with kotlinx.serialization builders.

Manual escaping is brittle (e.g., control chars, unicode). Build a JsonElement and encode it.

-private fun buildJsonString(obj: Any?): String {
-    return when (obj) {
-        null -> "null"
-        is String -> "\"${obj.replace("\\", "\\\\").replace("\"", "\\\"")}\""
-        is Number -> obj.toString()
-        is Boolean -> obj.toString()
-        is Map<*, *> -> {
-            obj.entries.joinToString(separator = ",", prefix = "{", postfix = "}") { (key, value) ->
-                "\"$key\":${buildJsonString(value)}"
-            }
-        }
-        is List<*> -> {
-            obj.joinToString(separator = ",", prefix = "[", postfix = "]") { buildJsonString(it) }
-        }
-        else -> "\"$obj\""
-    }
-}
+private fun buildJsonString(obj: Any?): String {
+    fun toJsonElement(value: Any?): JsonElement = when (value) {
+        null -> JsonNull
+        is JsonElement -> value
+        is String -> JsonPrimitive(value)
+        is Number -> JsonPrimitive(value)
+        is Boolean -> JsonPrimitive(value)
+        is Map<*, *> -> JsonObject(value.entries.associate { (k, v) ->
+            k.toString() to toJsonElement(v)
+        })
+        is List<*> -> JsonArray(value.map { toJsonElement(it) })
+        else -> JsonPrimitive(value.toString())
+    }
+    return Json.encodeToString(toJsonElement(obj))
+}
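The brittleness is easy to demonstrate. Below is a language-agnostic sketch in Python (the `naive_escape` helper is hypothetical and mirrors the manual string-replace approach being replaced): escaping only backslashes and quotes lets raw control characters leak into the output, producing invalid JSON.

```python
import json

def naive_escape(s: str) -> str:
    # Mirrors the manual approach: only backslash and quote are escaped,
    # so control characters pass through raw.
    return '"' + s.replace('\\', '\\\\').replace('"', '\\"') + '"'

value = "line1\nline2\ttab"
manual = naive_escape(value)   # contains a raw newline -> invalid JSON
proper = json.dumps(value)     # escapes control characters as \n, \t

def parses(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(parses(manual))               # False: raw control chars are rejected
print(parses(proper))               # True
print(json.loads(proper) == value)  # True: round-trips correctly
```

A real serializer (json.dumps here, kotlinx.serialization's Json.encodeToString in the suggested diff) handles these edge cases for free.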

467-499: System prompt rendering: handle empty parameter lists gracefully.

When a tool has no parameters, avoid rendering “Parameters: ” with trailing space; it confuses models.

 val toolDescriptions = tools.joinToString("\n") { tool ->
-    val params = tool.parameters.properties.entries.joinToString(", ") { (name, param) ->
+    val props = tool.parameters.properties
+    val params = if (props.isEmpty()) "(none)" else props.entries.joinToString(", ") { (name, param) ->
         val req = if (param.required) "(required)" else "(optional)"
         "$name: ${param.type.toJsonSchemaType()} $req - ${param.description}"
     }
     "- ${tool.name}: ${tool.description}\n  Parameters: $params"
 }
sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/ResponseParser.kt (4)

59-65: Don't swallow parse exceptions silently; log at debug to aid diagnostics.

Falling back is fine, but detekt is right: we should at least log the failure reason.

-    private fun tryParseJson(text: String): JsonObject? {
-        return try {
-            json.parseToJsonElement(text).jsonObject
-        } catch (e: Exception) {
-            null
-        }
-    }
+    private fun tryParseJson(text: String): JsonObject? =
+        try {
+            json.parseToJsonElement(text).jsonObject
+        } catch (e: Exception) {
+            logger.debug("Direct JSON parse failed: ${e.message}")
+            null
+        }

Based on learnings.


70-86: Precompile code‑block regexes and support case‑insensitive ```JSON fences; also scan all blocks.

Reduces allocations and improves accuracy when multiple fenced blocks exist.

Add near Line 18:

private val codeBlockRegexes = listOf(
    Regex("```json\\s*(.+?)\\s*```", setOf(RegexOption.DOT_MATCHES_ALL, RegexOption.IGNORE_CASE)),
    Regex("```\\s*(.+?)\\s*```", RegexOption.DOT_MATCHES_ALL)
)

Then refactor function:

-    private fun extractFromCodeBlock(response: String): JsonObject? {
-        val patterns = listOf(
-            "```json\\s*(.+?)\\s*```",
-            "```\\s*(.+?)\\s*```"
-        )
-
-        for (pattern in patterns) {
-            val regex = Regex(pattern, RegexOption.DOT_MATCHES_ALL)
-            val match = regex.find(response)
-            if (match != null) {
-                val json = tryParseJson(match.groupValues[1].trim())
-                if (json != null) return json
-            }
-        }
-        return null
-    }
+    private fun extractFromCodeBlock(response: String): JsonObject? {
+        codeBlockRegexes.forEach { regex ->
+            regex.findAll(response).forEach { m ->
+                tryParseJson(m.groupValues[1].trim())?.let { return it }
+            }
+        }
+        return null
+    }
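The scan-all-blocks behavior matters when a model emits a reasoning block before its answer. A Python sketch of the same idea (regexes precompiled once, case-insensitive json fences, every block tried until one parses; the fence string is built programmatically only to keep this snippet self-contained):

```python
import json
import re

FENCE = "`" * 3  # built programmatically to avoid embedding literal fences

# Precompiled once; DOTALL lets blocks span lines, IGNORECASE accepts JSON fences.
CODE_BLOCK_REGEXES = [
    re.compile(FENCE + r"json\s*(.+?)\s*" + FENCE, re.DOTALL | re.IGNORECASE),
    re.compile(FENCE + r"\s*(.+?)\s*" + FENCE, re.DOTALL),
]

def extract_from_code_block(response: str):
    # Scan every fenced block, not just the first; return the first that parses.
    for regex in CODE_BLOCK_REGEXES:
        for match in regex.finditer(response):
            try:
                obj = json.loads(match.group(1).strip())
            except json.JSONDecodeError:
                continue
            if isinstance(obj, dict):
                return obj
    return None

reply = (
    "Thinking...\n" + FENCE + "JSON\nnot valid json\n" + FENCE + "\n"
    "Final answer:\n" + FENCE + "json\n{\"tool\": \"get_time\"}\n" + FENCE
)
print(extract_from_code_block(reply))   # {'tool': 'get_time'}
```

A first-match-only implementation would give up after the unparseable first block; scanning all matches recovers the valid one.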

91-100: Naive first/last brace slice can over‑capture; prefer balanced scanning.

Current approach breaks with prose containing multiple JSON fragments. Consider a simple depth scan.

private fun extractJsonFromText(response: String): JsonObject? {
    var depth = 0
    var start = -1
    for ((i, ch) in response.withIndex()) {
        if (ch == '{') { if (depth++ == 0) start = i }
        else if (ch == '}' && --depth == 0 && start >= 0) {
            tryParseJson(response.substring(start, i + 1))?.let { return it }
        }
    }
    return null
}
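The over-capture failure mode is concrete: slicing from the first `{` to the last `}` fuses unrelated fragments into one unparseable string. A runnable Python sketch of the same depth-scan idea, which also retries later candidates when an earlier balanced region is not JSON:

```python
import json

def extract_json_from_text(response: str):
    # Depth-based scan: test each balanced {...} region instead of slicing
    # from the first '{' to the last '}', which over-captures.
    depth, start = 0, -1
    for i, ch in enumerate(response):
        if ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0 and start >= 0:
                try:
                    return json.loads(response[start:i + 1])
                except json.JSONDecodeError:
                    pass  # not JSON; keep scanning later candidates
    return None

text = 'Notes {draft} then the call {"tool": "get_time", "args": {"tz": "UTC"}} done.'
print(extract_json_from_text(text))
# First/last-brace slicing would grab '{draft} ... "UTC"}}' and fail to parse;
# the depth scan skips '{draft}' and returns the valid nested object.
```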

169-174: Potential NPEs if schema fields are nullable; guard or confirm non‑null.

If Tool.parameters.required or .properties can be null from deserialization, this can NPE.

If nullable, refactor:

-        val requiredParams = tool.parameters.required
-        val allParams = tool.parameters.properties.keys
+        val requiredParams = tool.parameters.required ?: emptyList()
+        val allParams = tool.parameters.properties?.keys ?: emptySet()

Please confirm types in sdk/.../models/Tool.kt. Based on learnings.

Also applies to: 171-172, 185-191

sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/GrammarBridge.kt (1)

110-125: Minor: Remove unused constructor parameter or use it.

modelPointer is stored but never used.

sdk/runanywhere-kotlin/native/llama-jni/src/grammar_jni.cpp (2)

181-248: Ownership contract: Kotlin must mark grammar handle as transferred before adding to chain.

To prevent double‑free, ensure Kotlin side calls Grammar.markOwnedByChain() before invoking this JNI with grammar_pointer.

I can follow through in LLamaAndroid.kt to wire this correctly once available in the API. Based on learnings.


245-246: Misleading sampler count log.

The message prints 1 or 0 based solely on grammar presence, ignoring other samplers in the chain. Either remove it or compute the actual chain size if the API supports it.

sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/public/extensions/RunAnywhereToolCalling.kt (1)

51-64: Improve deprecation ergonomics with ReplaceWith and remove unused logger/imports.

Make migration clearer and clean up unused items.

 @Deprecated(
-    message = "Grammar-based tool calling crashes with SIGABRT. See issue #182. Use prompt-based approach instead.",
-    level = DeprecationLevel.ERROR
+    message = "Grammar-based tool calling crashes with SIGABRT. See issue #182. Use prompt-based approach instead.",
+    replaceWith = ReplaceWith("generateWithToolsPromptBased(prompt, tools, options)"),
+    level = DeprecationLevel.ERROR
 )

And remove:

  • private val logger (unused)
  • import com.runanywhere.sdk.llm.llamacpp.LlamaCppService (unused)
examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/presentation/tools/ToolCallingViewModel.kt (2)

201-210: Optional: honor timezone using java.time.

For a better demo, actually format time for the requested ZoneId.

val zone = if (timezone.isNullOrBlank()) ZoneId.systemDefault() else runCatching { ZoneId.of(timezone) }.getOrDefault(ZoneId.systemDefault())
val zoned = ZonedDateTime.now(zone).format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"))
"Current time (${zone.id}): $zoned"
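The same resolve-or-fall-back pattern is easy to verify outside the JVM. A Python sketch of the equivalent logic (function name and fallback-to-UTC choice are illustrative, mirroring the ZoneId.systemDefault() fallback above):

```python
from datetime import datetime, timezone

try:
    from zoneinfo import ZoneInfo, ZoneInfoNotFoundError  # Python 3.9+
except ImportError:
    ZoneInfo = None

def current_time(tz_name=None) -> str:
    # Resolve the requested zone; fall back to UTC on any lookup failure
    # instead of raising, so a bad tool argument never crashes the demo.
    zone = timezone.utc
    if tz_name and ZoneInfo is not None:
        try:
            zone = ZoneInfo(tz_name)
        except (ZoneInfoNotFoundError, ValueError):
            pass
    now = datetime.now(zone)
    return f"Current time ({zone}): {now:%Y-%m-%d %H:%M:%S}"

print(current_time(None))         # resolves to UTC
print(current_time("Not/AZone"))  # unknown zone falls back to UTC, no exception
```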

171-177: Minor UX: non‑tool responses aren’t errors.

Consider showing as info/result instead of errorMessage.

sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/PromptBasedToolCalling.kt (1)

234-236: Consider using UUID for more robust unique IDs.

The current timestamp + random approach works but could have collisions if called rapidly. A UUID would provide stronger uniqueness guarantees.

Apply this diff to use UUID:

     private fun generateToolCallId(): String {
-        return "call_${System.currentTimeMillis()}_${(Math.random() * 1000).toInt()}"
+        return "call_${java.util.UUID.randomUUID()}"
     }
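To see why the timestamp-plus-random scheme is weak under bursts, a quick Python comparison (id formats mirror the Kotlin snippet; the exact collision count depends on timing, so it is printed rather than asserted):

```python
import random
import time
import uuid

def ts_id() -> str:
    # Millisecond timestamp plus a 0-999 suffix: only 1000 distinct ids per ms.
    return f"call_{int(time.time() * 1000)}_{random.randrange(1000)}"

def uuid_id() -> str:
    return f"call_{uuid.uuid4()}"

burst_ts = [ts_id() for _ in range(5000)]
burst_uuid = [uuid_id() for _ in range(5000)]

# A tight loop generates many ids within the same millisecond, so the
# timestamp scheme tends to collide; UUIDv4 ids stay unique in practice.
print(len(set(burst_ts)), "unique of", len(burst_ts))
print(len(set(burst_uuid)), "unique of", len(burst_uuid))
```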
sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt (1)

314-317: Hardcoded 2000-character limit should be configurable.

The magic number at Line 314 limits all grammar-based generation. Tool calls can vary in size, and this limit may be too restrictive for complex schemas or too permissive for simpler ones.

Extract this as a configurable parameter:

     fun sendWithGrammarSchema(
         message: String,
         schemaJson: String,
         config: LlamaModelConfig? = null,
-        parseSpecialTokens: Boolean = true
+        parseSpecialTokens: Boolean = true,
+        maxOutputChars: Int = 2000
     ): Flow<String> = flow {
         ...
-                        if (generatedText.length > 2000) {
+                        if (generatedText.length > maxOutputChars) {
-                            logger.warn("⚠️ Generated text exceeds 2000 chars, stopping early to prevent runaway generation")
+                            logger.warn("⚠️ Generated text exceeds $maxOutputChars chars, stopping early to prevent runaway generation")
                             break
                         }
sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/models/Tool.kt (1)

119-148: Arguments schema is generic in multi-tool schema.

The generateSchema method at Lines 137-140 defines arguments as a generic object without per-tool parameter validation. This means the schema won't enforce correct parameters for each tool name.

Consider whether per-tool argument schemas should be included in the generated schema. If validation is handled elsewhere (e.g., in ResponseParser), this is acceptable; otherwise, the schema could be strengthened with a oneOf construct that maps tool names to their specific parameter schemas.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e2d1aa and bc017e4.

📒 Files selected for processing (21)
  • .idea/vcs.xml (1 hunks)
  • QUICKSTART_ANDROID.md (8 hunks)
  • examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/presentation/navigation/AppNavigation.kt (5 hunks)
  • examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/presentation/tools/ToolCallingScreen.kt (1 hunks)
  • examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/presentation/tools/ToolCallingViewModel.kt (1 hunks)
  • sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/GrammarBridge.kt (1 hunks)
  • sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt (7 hunks)
  • sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt (2 hunks)
  • sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/public/extensions/PromptBasedToolCallingExtensions.kt (1 hunks)
  • sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/public/extensions/RunAnywhereToolCalling.kt (1 hunks)
  • sdk/runanywhere-kotlin/native/llama-jni/CMakeLists.txt (1 hunks)
  • sdk/runanywhere-kotlin/native/llama-jni/GRAMMAR_IMPLEMENTATION_NOTES.md (1 hunks)
  • sdk/runanywhere-kotlin/native/llama-jni/llama.cpp (1 hunks)
  • sdk/runanywhere-kotlin/native/llama-jni/src/grammar_jni.cpp (1 hunks)
  • sdk/runanywhere-kotlin/native/llama-jni/src/llama-android.cpp (7 hunks)
  • sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/models/GenerationOptions.kt (2 hunks)
  • sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/models/Tool.kt (1 hunks)
  • sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/Deduplicator.kt (1 hunks)
  • sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/PromptBasedToolCalling.kt (1 hunks)
  • sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/PromptBuilder.kt (1 hunks)
  • sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/ResponseParser.kt (1 hunks)
🧰 Additional context used
🪛 Clang (14.0.6)
sdk/runanywhere-kotlin/native/llama-jni/src/grammar_jni.cpp

[error] 38-38: 'android/log.h' file not found

(clang-diagnostic-error)

🪛 detekt (1.23.8)
sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/ResponseParser.kt

[warning] 62-62: The caught exception is swallowed. The original exception could be lost.

(detekt.exceptions.SwallowedException)

🪛 LanguageTool
sdk/runanywhere-kotlin/native/llama-jni/GRAMMAR_IMPLEMENTATION_NOTES.md

[style] ~144-~144: Consider using a shorter alternative to avoid wordiness.
Context: ...en_id); // ❌ WRONG! ``` Result: ❌ Made it worse! Still crashed, but this was the WRONG ...

(MADE_IT_JJR)


[style] ~418-~418: To elevate your writing, try using a synonym here.
Context: ...gability is critical** - C++ crashes hard to debug on Android - Kotlin string ...

(HARD_TO)


[style] ~450-~450: Consider using a shorter alternative to avoid wordiness.
Context: ... - Day 5: Add explicit accept call (made it worse) - Day 6: Deep-dive into llama.cpp ...

(MADE_IT_JJR)

🪛 markdownlint-cli2 (0.18.1)
QUICKSTART_ANDROID.md

53-53: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


205-205: Spaces inside code span elements

(MD038, no-space-in-code)

🔇 Additional comments (16)
.idea/vcs.xml (1)

5-19: IDE VCS configuration looks good.

The added Git mappings enable proper version control tracking across multiple external dependencies and submodules, consistent with a complex multi-module SDK project. No concerns with this change.

QUICKSTART_ANDROID.md (1)

129-129: AndroidX Security Crypto dependency is still in alpha.

Lines 129 and 254 specify androidx.security:security-crypto:1.1.0-alpha06. Recommending an alpha library in production SDK documentation raises stability concerns. Verify whether:

  1. A stable version (1.1.0+) is available
  2. If alpha is intentional, document the rationale and any known issues

Also applies to: 254-254

sdk/runanywhere-kotlin/native/llama-jni/llama.cpp (1)

1-1: Clarify the purpose of the submodule update in context of deprecated grammar implementation.

The investigation reveals a critical mismatch: grammar-based tool calling was implemented, investigated for ~40 hours, found to cause production crashes due to unfixed llama.cpp bugs, and replaced with prompt-based approach. Yet:

  1. grammar_jni.cpp is marked deprecated with status: "This implementation is no longer used in production"
  2. CMakeLists.txt still compiles grammar_jni.cpp despite deprecation
  3. llama.cpp grammar bugs remain unfixed as of October 2025 (multiple GitHub issues referenced)

The review comment assumes grammar support is the submodule's purpose, but the codebase evidence shows the team moved to prompt-based tool calling. Before approving this submodule update, clarify:

  • Is this update necessary for the prompt-based tool calling implementation, or is it maintenance-only?
  • If grammar is deprecated and unused, should grammar_jni.cpp be removed from the build entirely?
  • What specific functionality from the new llama.cpp commit is required by the PR's actual implementation?
examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/presentation/navigation/AppNavigation.kt (2)

65-67: Route wiring looks good.

TOOLS route is correctly added and points to ToolCallingScreen.


156-157: Constant addition LGTM.

NavigationRoute.TOOLS looks consistent with existing routes.

sdk/runanywhere-kotlin/native/llama-jni/GRAMMAR_IMPLEMENTATION_NOTES.md (1)

209-211: No issues found—commit reference is accurate.

The web search confirms that commit 6777c544 exists in the llama.cpp repository and directly addresses the grammar stability and additionalProperties handling issues as stated in the documentation. The reference is valid and requires no updates.

sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt (1)

362-463: No duplicate implementations found in LlamaCppService.kt.

The search reveals only one generateWithTools definition at line 362 in LlamaCppService.kt. The other occurrence found (line 38 in PromptBasedToolCalling.kt) is in a separate file and represents the interface/abstract definition, not a duplicate implementation. The code at lines 362-463 is the sole concrete implementation in this file.

Likely an incorrect or invalid review comment.

sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/PromptBasedToolCalling.kt (2)

21-28: LGTM: Clean initialization pattern.

The orchestrator components are well-organized with appropriate separation of concerns.


182-199: LGTM: Appropriate error handling for service access.

The type cast at Line 194 is intentional and provides clear error messages when preconditions aren't met.

sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt (3)

60-62: LGTM: Parameterized token limit improves flexibility.

Replacing the hardcoded value with a field makes the generation behavior more maintainable.


206-249: LGTM: Proper resource management in grammar-constrained generation.

The try-finally block at Lines 222-242 ensures the grammar sampler is freed even if generation fails.


270-272: LGTM: Critical lifecycle documentation present.

The comments at Lines 270-271 and 330-331 correctly warn about grammar ownership and double-free risks. This is essential given the complex native resource lifecycle.

Also applies to: 330-331

sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/models/Tool.kt (4)

8-25: LGTM: Comprehensive type mapping.

The enum covers standard JSON schema types with proper conversion to schema format strings.


89-111: LGTM: Clever use of JSON Schema "const" for validation.

The schema at Lines 93-95 enforces exact tool name matching using the const keyword, which prevents the LLM from generating invalid tool names.


221-260: LGTM: Well-designed builder DSL.

The type-specific helper methods (Lines 234-257) provide excellent developer ergonomics for defining tool parameters.


262-264: LGTM: Clean DSL entry point.

The function follows standard Kotlin DSL patterns for builder construction.

Comment on lines +222 to +227
"generate_random_number" -> {
val min = toolCall.arguments["min"]?.toIntOrNull() ?: 0
val max = toolCall.arguments["max"]?.toIntOrNull() ?: 100
val random = Random.nextInt(min, max + 1)
"Generated: $random (range: $min-$max)"
}

⚠️ Potential issue | 🟠 Major

Validate min/max and avoid overflow when including max.

Random.nextInt(from, until) has an exclusive upper bound; adding 1 overflows at Int.MAX_VALUE, and the call throws if min > max.

-                val min = toolCall.arguments["min"]?.toIntOrNull() ?: 0
-                val max = toolCall.arguments["max"]?.toIntOrNull() ?: 100
-                val random = Random.nextInt(min, max + 1)
+                val min = toolCall.arguments["min"]?.toIntOrNull() ?: 0
+                val max = toolCall.arguments["max"]?.toIntOrNull() ?: 100
+                if (min > max) return "Error: min must be <= max (got $min > $max)"
+                val upperExclusive = if (max == Int.MAX_VALUE) Int.MAX_VALUE else (max + 1)
+                val random = Random.nextInt(min, upperExclusive)
                 "Generated: $random (range: $min-$max)"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"generate_random_number" -> {
val min = toolCall.arguments["min"]?.toIntOrNull() ?: 0
val max = toolCall.arguments["max"]?.toIntOrNull() ?: 100
val random = Random.nextInt(min, max + 1)
"Generated: $random (range: $min-$max)"
}
"generate_random_number" -> {
val min = toolCall.arguments["min"]?.toIntOrNull() ?: 0
val max = toolCall.arguments["max"]?.toIntOrNull() ?: 100
if (min > max) return "Error: min must be <= max (got $min > $max)"
val upperExclusive = if (max == Int.MAX_VALUE) Int.MAX_VALUE else (max + 1)
val random = Random.nextInt(min, upperExclusive)
"Generated: $random (range: $min-$max)"
}
🤖 Prompt for AI Agents
In
examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/presentation/tools/ToolCallingViewModel.kt
around lines 222 to 227, validate the min/max arguments before calling
Random.nextInt: ensure min <= max (either return an error string or swap/clamp
values), handle the case where min == max by returning that fixed value, and
avoid overflow when computing the exclusive upper bound by not doing max + 1
blindly — if max == Int.MAX_VALUE, call Random.nextInt with an appropriate
exclusive bound (or use a branch that picks a random between min and max without
adding 1) or clamp the computed upper bound to a safe value; finally produce the
same result string format.

Comment on lines +82 to +101
class Grammar(
private val modelPointer: Long,
val handle: Long
) : AutoCloseable {

private val logger = SDKLogger("Grammar")
private var freed = false

override fun close() {
if (!freed && handle != 0L) {
try {
GrammarBridge.freeGrammar(handle)
logger.debug("Grammar freed: $handle")
freed = true
} catch (e: Exception) {
logger.error("Error freeing grammar", e)
}
}
}


⚠️ Potential issue | 🔴 Critical

Double‑free risk: Grammar.close() frees even after chain takes ownership.

The C++ notes state that the chain owns the grammar once it is added, so an unconditional free here can cause a SIGSEGV.

Add an ownership flag and a method to mark transfer before adding to the chain:

 class Grammar(
-    private val modelPointer: Long,
     val handle: Long
 ) : AutoCloseable {
 
     private val logger = SDKLogger("Grammar")
     private var freed = false
+    @Volatile private var ownedByChain = false
+
+    /** Call this immediately before passing the handle into the sampler chain. */
+    fun markOwnedByChain() { ownedByChain = true }
 
     override fun close() {
-        if (!freed && handle != 0L) {
+        if (!freed && !ownedByChain && handle != 0L) {
             try {
                 GrammarBridge.freeGrammar(handle)
                 logger.debug("Grammar freed: $handle")
                 freed = true
             } catch (e: Exception) {
                 logger.error("Error freeing grammar", e)
             }
         }
     }

Also remove unused modelPointer param. Ensure callers invoke markOwnedByChain() right before LLamaAndroid.new_sampler_with_grammar(..., grammar.handle). Based on learnings.

📝 Committable suggestion


Suggested change
class Grammar(
private val modelPointer: Long,
val handle: Long
) : AutoCloseable {
private val logger = SDKLogger("Grammar")
private var freed = false
override fun close() {
if (!freed && handle != 0L) {
try {
GrammarBridge.freeGrammar(handle)
logger.debug("Grammar freed: $handle")
freed = true
} catch (e: Exception) {
logger.error("Error freeing grammar", e)
}
}
}
class Grammar(
val handle: Long
) : AutoCloseable {
private val logger = SDKLogger("Grammar")
private var freed = false
@Volatile private var ownedByChain = false
/** Call this immediately before passing the handle into the sampler chain. */
fun markOwnedByChain() { ownedByChain = true }
override fun close() {
if (!freed && !ownedByChain && handle != 0L) {
try {
GrammarBridge.freeGrammar(handle)
logger.debug("Grammar freed: $handle")
freed = true
} catch (e: Exception) {
logger.error("Error freeing grammar", e)
}
}
}
🤖 Prompt for AI Agents
In
sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/GrammarBridge.kt
around lines 82 to 101, the current close() unconditionally frees the grammar
which can cause a double-free when ownership is transferred to the chain; add a
private var ownedByChain = false, remove the unused modelPointer constructor
parameter, add a public markOwnedByChain() method that sets ownedByChain = true,
and change close() to only call GrammarBridge.freeGrammar(handle) if
!ownedByChain && !freed && handle != 0L (then set freed = true); update
documentation/comments to instruct callers to invoke markOwnedByChain()
immediately before calling LLamaAndroid.new_sampler_with_grammar(...,
grammar.handle).

private val logger = SDKLogger("LLamaAndroid")

-private val threadLocalState: ThreadLocal<State> = ThreadLocal.withInitial { State.Idle }
+internal val threadLocalState: ThreadLocal<State> = ThreadLocal.withInitial { State.Idle }

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Consider whether internal visibility is necessary.

Exposing threadLocalState and State as internal breaks encapsulation. Verify this is required for cross-module access and document the rationale if retained.

Run the following script to check usages outside this file:

Also applies to: 433-433


🏁 Script executed:

#!/bin/bash
# Search for references to threadLocalState or State outside LLamaAndroid.kt
rg -n --type=kt 'threadLocalState|State\.(Idle|Loaded)' -g '!**/LLamaAndroid.kt'

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Search for references to threadLocalState or State outside LLamaAndroid.kt
# Using glob pattern for .kt files instead of --type flag
rg -n --glob='*.kt' 'threadLocalState|State\.(Idle|Loaded)' -g '!**/LLamaAndroid.kt'

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Also inspect the State sealed interface and threadLocalState declaration in LLamaAndroid.kt
cat sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt | head -100

Length of output: 3479


🏁 Script executed:

#!/bin/bash
# Search for any references to threadLocalState in the entire repository (any file type)
rg -n 'threadLocalState' 

Length of output: 1937


🏁 Script executed:

#!/bin/bash
# View State sealed interface definition around line 433
sed -n '420,445p' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt

Length of output: 820


🏁 Script executed:

#!/bin/bash
# Search for any references to State (the sealed interface) outside LLamaAndroid.kt
rg -n 'State\s*=|sealed.*State|class State|interface State' --glob='*.kt' -g '!**/LLamaAndroid.kt'

Length of output: 14299


Revert threadLocalState and State sealed interface visibility to private.

Search results confirm no external references to threadLocalState or the State sealed interface exist anywhere in the codebase. Both are used exclusively within LLamaAndroid.kt, making internal visibility unnecessary and a violation of encapsulation. Revert lines 31 and 433 from internal to private.

🤖 Prompt for AI Agents
In
sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt
around line 31 and line 433, revert the visibility of threadLocalState and the
State sealed interface from internal to private; specifically change "internal
val threadLocalState: ThreadLocal<State> = ..." to "private val ..." and change
the sealed interface declaration at line 433 from "internal sealed interface
State" to "private sealed interface State", keeping all current implementations
and usages unchanged since both are only used within this file.

kv_cache_clear(state.context)
} finally {
// Reset sampler state before freeing to clear grammar state
reset_sampler(samplerChain)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify reset_sampler is required before free_sampler.

Calling reset_sampler before free_sampler at Line 325 is unusual. Confirm this order is necessary for proper grammar state cleanup in the native implementation.

Run the following script to examine the native reset_sampler implementation:


🏁 Script executed:

#!/bin/bash
# Check the native implementation of reset_sampler
fd -e cpp -e c grammar_jni | xargs rg -A10 -B2 'reset_sampler'

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Search for the actual Kotlin file and examine context around line 325
find . -name "LLamaAndroid.kt" -type f | head -1 | xargs -I {} sh -c 'echo "=== File: {} ===" && head -n 330 {} | tail -n 20'

Length of output: 1384


🏁 Script executed:

#!/bin/bash
# Search for free_sampler to understand the cleanup sequence
rg -n 'free_sampler' -A3 -B3 --type kotlin

Length of output: 5210


🏁 Script executed:

#!/bin/bash
# Search more broadly for reset_sampler in the codebase
rg -n 'reset_sampler' -C5 --type kotlin --type java --type cpp --type c

Length of output: 4045


🏁 Script executed:

#!/bin/bash
# Look for external library or JNI bindings for llama.cpp
find . -name "*llama*" -o -name "*sampler*" | grep -E '\.(kt|java|cpp|c|h|so|jar)$' | head -20

Length of output: 115


🏁 Script executed:

#!/bin/bash
# Search for reset_sampler and free_sampler implementations in native code
rg -n 'reset_sampler\|free_sampler' -A15 -B2 ./sdk/runanywhere-kotlin/native/llama-jni/

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check if reset is called inside free_sampler already (redundancy check)
rg -n 'llama_sampler_free\|sampler_free' -A20 -B2 ./sdk/runanywhere-kotlin/native/llama-jni/

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check the native source files directly
cat ./sdk/runanywhere-kotlin/native/llama-jni/src/llama-android.cpp | head -100

Length of output: 3199


🏁 Script executed:

#!/bin/bash
# Search for JNI method implementations
rg -n 'Java_.*reset_sampler\|Java_.*free_sampler' ./sdk/runanywhere-kotlin/native/llama-jni/

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look for any sampler or grammar related code in native files
rg -in 'sampler\|grammar' -C3 ./sdk/runanywhere-kotlin/native/llama-jni/ | head -80

Length of output: 0


🏁 Script executed:

#!/bin/bash
# List all files in native directory
find ./sdk/runanywhere-kotlin/native -type f -name "*.cpp" -o -name "*.c" -o -name "*.h" | head -30

Length of output: 409


🏁 Script executed:

#!/bin/bash
# Check entire llama-android.cpp for full content
wc -l ./sdk/runanywhere-kotlin/native/llama-jni/src/llama-android.cpp

Length of output: 67


🏁 Script executed:

#!/bin/bash
# Search for any occurrence of reset or sampler in all cpp/c/h files
find ./sdk/runanywhere-kotlin/native -type f \( -name "*.cpp" -o -name "*.c" -o -name "*.h" \) -exec grep -l "reset_sampler\|free_sampler" {} \;

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check if llama.h defines these functions
grep -r "reset_sampler\|free_sampler" ./sdk/runanywhere-kotlin/native/ 2>/dev/null | head -20

Length of output: 132


🏁 Script executed:

#!/bin/bash
# Read the implementation notes
cat ./sdk/runanywhere-kotlin/native/llama-jni/GRAMMAR_IMPLEMENTATION_NOTES.md

Length of output: 14091


🏁 Script executed:

#!/bin/bash
# Search for reset_sampler and free_sampler in grammar_jni.cpp
rg -n 'reset_sampler\|free_sampler' -A10 -B5 ./sdk/runanywhere-kotlin/native/llama-jni/src/grammar_jni.cpp

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check if there are llama.cpp sampler functions being used
rg -n 'llama_sampler' ./sdk/runanywhere-kotlin/native/llama-jni/src/grammar_jni.cpp | head -30

Length of output: 1193


🏁 Script executed:

#!/bin/bash
# Search entire llama-android.cpp for reset_sampler implementation
grep -n "reset_sampler" ./sdk/runanywhere-kotlin/native/llama-jni/src/llama-android.cpp

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check the context around line 325 in LLamaAndroid.kt to see if grammar is still active
sed -n '200,350p' ./sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt | grep -n -A20 "reset_sampler\|grammar"

Length of output: 6465


🏁 Script executed:

#!/bin/bash
# Look for any active grammar usage or calls in the kotlin file
rg -n 'grammar\|Grammar' ./sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt

Length of output: 0


reset_sampler() JNI method is not implemented in native code, causing runtime failure.

The reset_sampler(samplerChain) call at line 325 references a JNI method that has no native implementation. Searching llama-android.cpp yields no Java_com_runanywhere_sdk_llm_llamacpp_LLamaAndroid_reset_1sampler binding. Calling this will crash with UnsatisfiedLinkError at runtime.

Additionally, the grammar-based implementation containing this code is marked as DEPRECATED in GRAMMAR_IMPLEMENTATION_NOTES.md and was abandoned in favor of a prompt-based approach due to "llama.cpp grammar bugs causing SIGABRT crashes."

Either implement the missing JNI binding for reset_sampler(), or remove the deprecated sendWithGrammarSchema() function and all grammar-related code paths entirely.
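If the Kotlin call site must survive while the native side is fixed, a defensive guard can turn the hard crash into a skipped cleanup. This is a hypothetical sketch — `SamplerCleanup` and `safeReset` are names invented here, not part of the SDK:

```kotlin
// Hypothetical guard (names are ours, not the SDK's): probe the native
// binding once, and skip cleanup instead of crashing the generation path
// with UnsatisfiedLinkError when the JNI symbol is not registered.
object SamplerCleanup {
    @Volatile
    private var resetAvailable = true

    fun safeReset(reset: () -> Unit) {
        if (!resetAvailable) return
        try {
            reset()
        } catch (e: UnsatisfiedLinkError) {
            // Symbol missing on the native side; remember and skip future calls
            resetAvailable = false
        }
    }
}
```

In the `finally` block this would wrap `reset_sampler(samplerChain)`; it does not implement the missing binding, it only keeps the crash from masking the subsequent `free_sampler` call.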

Comment on lines +362 to +463
suspend fun generateWithTools(
prompt: String,
tools: List<Tool>? = null,
options: RunAnywhereGenerationOptions? = null
): ToolCallResult = withContext(Dispatchers.IO) {
if (!isInitialized) {
throw IllegalStateException("LlamaCppService not initialized")
}

// Get tools from parameter or options (parameter takes precedence)
val effectiveTools = tools ?: options?.tools ?: emptyList()

if (effectiveTools.isEmpty()) {
logger.warn("No tools provided for generateWithTools()")
return@withContext ToolCallResult(
success = false,
text = "Error: No tools provided",
toolCalls = emptyList(),
mode = ToolCallingMode.GRAMMAR_BASED
)
}

logger.info("🛠️ generateWithTools() called with ${effectiveTools.size} tools")
logger.info("📊 LlamaCppService state - isInitialized: $isInitialized, modelPath: $modelPath")

// Ensure service is initialized
if (!isInitialized) {
throw IllegalStateException("LlamaCppService not initialized. Model must be loaded before calling generateWithTools().")
}

try {
// Generate JSON schema from tools
val jsonSchema = Tool.generateSchema(effectiveTools)

// Convert to JSON string manually (avoid serialization issues with Any type)
val schemaJson = buildJsonString(jsonSchema)

logger.info("📋 JSON Schema generated: ${schemaJson.take(200)}...")

// Build messages with tool instruction
val userMessage = Message(role = MessageRole.USER, content = prompt)
val systemMessage = Message(
role = MessageRole.SYSTEM,
content = buildToolSystemPrompt(effectiveTools)
)

val formattedPrompt = buildPrompt(
listOf(systemMessage, userMessage),
systemPrompt = null
)

logger.info("📝 Tool calling prompt length: ${formattedPrompt.length}")

// Generate with grammar constraints
val result = StringBuilder()
val effectiveOptions = options ?: RunAnywhereGenerationOptions(
maxTokens = configuration.maxTokens,
temperature = 0.7f
)

val config = LlamaModelConfig(
temperature = effectiveOptions.temperature,
topK = 40,
minP = 0.05f
)

// Use sendWithGrammarSchema which handles grammar creation and generation
// on the same thread, avoiding thread-local state issues
llama.sendWithGrammarSchema(
formattedPrompt,
schemaJson,
config,
parseSpecialTokens = true
).collect { token ->
result.append(token)
}

val generatedText = result.toString()
logger.info("✅ Generation complete, response length: ${generatedText.length}")
logger.info("📄 Response: ${generatedText.take(300)}...")

// Parse tool calls from JSON response
val toolCalls = parseToolCalls(generatedText)

ToolCallResult(
success = true,
text = generatedText,
toolCalls = toolCalls,
mode = ToolCallingMode.GRAMMAR_BASED
)

} catch (e: Exception) {
logger.error("❌ Tool calling failed", e)
ToolCallResult(
success = false,
text = "Error: ${e.message}",
toolCalls = emptyList(),
mode = ToolCallingMode.GRAMMAR_BASED
)
}
}


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Enforce maxTokens in grammar-based generation.

sendWithGrammarSchema collection path doesn’t stop at a token budget; add a counter or pass a limit if the native API supports it. Prevents long-running loops when grammar allows large outputs.

-            val result = StringBuilder()
+            val result = StringBuilder()
+            var tokenCount = 0
+            val maxTokens = effectiveOptions.maxTokens
 ...
-            ).collect { token ->
-                result.append(token)
-            }
+            ).collect { token ->
+                result.append(token)
+                if (++tokenCount >= maxTokens) {
+                    // TODO: call llama.cancel if available; otherwise rely on producer completion
+                    return@collect
+                }
+            }

If LLamaAndroid.sendWithGrammarSchema already supports a length limit, prefer passing it explicitly instead.


🏁 Script executed:

#!/bin/bash
rg -nC2 'sendWithGrammarSchema\('

Length of output: 2288



🏁 Script executed:

#!/bin/bash
# Get the full sendWithGrammarSchema function signature and implementation
rg -nA30 'fun sendWithGrammarSchema\(' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt

Length of output: 1761


🏁 Script executed:

#!/bin/bash
# Inspect LlamaModelConfig class
rg -nB2 -A10 'class LlamaModelConfig\|data class LlamaModelConfig' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check if there are other usages of sendWithGrammarSchema to understand token limit handling
rg -nB5 -A15 'sendWithGrammarSchema' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt | head -50

Length of output: 954


🏁 Script executed:

#!/bin/bash
# Search for token counting or max tokens handling in LLamaAndroid
rg -nP '(tokenCount|maxTokens|token.*limit)' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt

Length of output: 1399


🏁 Script executed:

#!/bin/bash
# Look for any existing token limit enforcement in the sendWithGrammarSchema implementation
rg -nA50 'fun sendWithGrammarSchema\(' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt | grep -E '(maxToken|tokenCount|token.*limit|take\(|take\s)'

Length of output: 869


🏁 Script executed:

#!/bin/bash
# Get the full implementation to see if token limiting is built-in
sed -n '261,320p' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt

Length of output: 3179


🏁 Script executed:

#!/bin/bash
# Check the native binding to see if it supports maxTokens parameter
rg -nC3 'external.*sendWithGrammarSchema|sendWithGrammarSchema.*=' sdk/runanywhere-kotlin/native/llama-jni/src/

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Find the actual native JNI binding or pure Kotlin implementation
fd -t f 'LLamaAndroid\.kt' -x grep -l 'sendWithGrammarSchema' {}

Length of output: 132


🏁 Script executed:

#!/bin/bash
# Look at complete sendWithGrammarSchema function including return type and full body
ast-grep --pattern 'fun sendWithGrammarSchema($_, $_) { $$$ }'

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Simpler approach: get lines 261 to 320 to see full function
wc -l sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt && sed -n '261,350p' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt

Length of output: 4605


🏁 Script executed:

#!/bin/bash
# Search for the close of sendWithGrammarSchema function to understand its full scope
rg -n 'fun sendWithGrammarSchema' -A100 sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt | head -120

Length of output: 5224


🏁 Script executed:

#!/bin/bash
# Check for token limit in Flow.take() or similar limiting operations
rg -nC5 'fun sendWithGrammarSchema' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt

Length of output: 488


🏁 Script executed:

#!/bin/bash
# List all the functions in LLamaAndroid to understand structure
rg -n 'fun ' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt | head -30

Length of output: 1691


🏁 Script executed:

#!/bin/bash
# Get complete sendWithGrammarSchema by looking for the function definition and understanding its scope
cat sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt | grep -n 'fun sendWithGrammarSchema' -A 80 | head -85

Length of output: 4453


🏁 Script executed:

#!/bin/bash
# Check if any Flow limiting (like take, takeWhile) is applied
rg 'sendWithGrammarSchema|\.take\(|\.takeWhile' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt -C3

Length of output: 2667


🏁 Script executed:

#!/bin/bash
# Look for generateWithTools implementation details around token limiting
sed -n '362,463p' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt | grep -nE '(maxToken|tokenCount|\.take\(|maxToken)'

Length of output: 211


🏁 Script executed:

#!/bin/bash
# Check the exact parameters passed to sendWithGrammarSchema
sed -n '430,436p' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt

Length of output: 237


🏁 Script executed:

#!/bin/bash
# Find all properties of LlamaModelConfig
rg -n 'class LlamaModelConfig|data class LlamaModelConfig' -A 15 sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/

Length of output: 2602


🏁 Script executed:

#!/bin/bash
# Search across entire project for LlamaModelConfig definition
fd -t f '\.kt$' sdk/runanywhere-kotlin/ -x rg -l 'class LlamaModelConfig|data class LlamaModelConfig' {}

Length of output: 130


🏁 Script executed:

#!/bin/bash
# Direct lookup of LlamaModelConfig
rg 'LlamaModelConfig' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt -B2 -A8 | head -40

Length of output: 820


🏁 Script executed:

#!/bin/bash
# Check what files exist in the directory
ls -la sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/

Length of output: 389


🏁 Script executed:

#!/bin/bash
# Get complete LLamaAndroid.kt file to understand structure
cat sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt | head -400

Length of output: 16833


🏁 Script executed:

#!/bin/bash
# Simpler: just extract lines around sendWithGrammarSchema definition
cat sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt | sed -n '250,350p'

Length of output: 4943


🏁 Script executed:

#!/bin/bash
# Look at the entire generateWithTools function to understand current structure
cat sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt | sed -n '362,463p' | grep -E '(maxToken|tokenCount|\.take|sendWithGrammarSchema)' -B2 -A2

Length of output: 955


🏁 Script executed:

#!/bin/bash
# Get full context of effectiveOptions creation and maxTokens usage
cat sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt | sed -n '415,440p'

Length of output: 993


🏁 Script executed:

#!/bin/bash
# Check the full LLamaAndroid.kt around sendWithGrammarSchema
head -n 300 sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt | tail -n 60

Length of output: 2826


🏁 Script executed:

#!/bin/bash
# Try to get the function signature and its parameters fully
rg 'fun sendWithGrammarSchema' -A5 sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt

Length of output: 197


🏁 Script executed:

#!/bin/bash
# Find where maxTokens is used in LlamaCppService
rg 'maxTokens' sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt -n

Length of output: 695


🏁 Script executed:

#!/bin/bash
# Check if sendWithGrammarSchema supports any token limit parameter - look at actual signature
python3 << 'EOF'
import re

# Read the file
with open('sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt', 'r') as f:
    content = f.read()

# Find the sendWithGrammarSchema function
match = re.search(r'(fun\s+sendWithGrammarSchema\s*\([^)]*\)[^{]*\{)', content, re.DOTALL)
if match:
    print("Function signature found:")
    print(match.group(1)[:300])
else:
    print("Function not found")
EOF

Length of output: 219


🏁 Script executed:

#!/bin/bash
# Get complete sendWithGrammarSchema function definition
cat sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LLamaAndroid.kt | sed -n '261,310p'

Length of output: 2562


🏁 Script executed:

#!/bin/bash
# Check generateWithTools for token limiting around sendWithGrammarSchema call
cat sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt | sed -n '420,450p'

Length of output: 1110


🏁 Script executed:

#!/bin/bash
# Search for LlamaModelConfig definition
rg -n 'data class LlamaModelConfig|class LlamaModelConfig' -A12 sdk/runanywhere-kotlin/

Length of output: 2053


🏁 Script executed:

#!/bin/bash
# Check if .take() or token limiting is applied after sendWithGrammarSchema call
cat sdk/runanywhere-kotlin/modules/runanywhere-llm-llamacpp/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/llm/llamacpp/LlamaCppService.kt | sed -n '425,442p'

Length of output: 665


Enforce maxTokens in tool calling flow; align with generate() pattern.

The generateWithTools() method creates effectiveOptions.maxTokens (line 418) but never enforces it. Unlike generate() and streamGenerate() which count tokens in their collectors, the generateWithTools() collector appends all tokens without checking the limit.

Although sendWithGrammarSchema() uses a hardcoded maxTokensToGenerate = 512 and enforces a 2000-character safety limit, generateWithTools() should also implement caller-level token counting for consistency and to respect the configured effectiveOptions.maxTokens:

-            val result = StringBuilder()
+            val result = StringBuilder()
+            val maxTokens = effectiveOptions.maxTokens
+            var tokenCount = 0
 ...
-            ).collect { token ->
-                result.append(token)
-            }
+            ).collect { token ->
+                result.append(token)
+                if (++tokenCount >= maxTokens) {
+                    // Stop once the caller's maxTokens budget is reached
+                    return@collect
+                }
+            }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
suspend fun generateWithTools(
prompt: String,
tools: List<Tool>? = null,
options: RunAnywhereGenerationOptions? = null
): ToolCallResult = withContext(Dispatchers.IO) {
if (!isInitialized) {
throw IllegalStateException("LlamaCppService not initialized")
}
// Get tools from parameter or options (parameter takes precedence)
val effectiveTools = tools ?: options?.tools ?: emptyList()
if (effectiveTools.isEmpty()) {
logger.warn("No tools provided for generateWithTools()")
return@withContext ToolCallResult(
success = false,
text = "Error: No tools provided",
toolCalls = emptyList(),
mode = ToolCallingMode.GRAMMAR_BASED
)
}
logger.info("🛠️ generateWithTools() called with ${effectiveTools.size} tools")
logger.info("📊 LlamaCppService state - isInitialized: $isInitialized, modelPath: $modelPath")
// Ensure service is initialized
if (!isInitialized) {
throw IllegalStateException("LlamaCppService not initialized. Model must be loaded before calling generateWithTools().")
}
try {
// Generate JSON schema from tools
val jsonSchema = Tool.generateSchema(effectiveTools)
// Convert to JSON string manually (avoid serialization issues with Any type)
val schemaJson = buildJsonString(jsonSchema)
logger.info("📋 JSON Schema generated: ${schemaJson.take(200)}...")
// Build messages with tool instruction
val userMessage = Message(role = MessageRole.USER, content = prompt)
val systemMessage = Message(
role = MessageRole.SYSTEM,
content = buildToolSystemPrompt(effectiveTools)
)
val formattedPrompt = buildPrompt(
listOf(systemMessage, userMessage),
systemPrompt = null
)
logger.info("📝 Tool calling prompt length: ${formattedPrompt.length}")
// Generate with grammar constraints
val result = StringBuilder()
val effectiveOptions = options ?: RunAnywhereGenerationOptions(
maxTokens = configuration.maxTokens,
temperature = 0.7f
)
val config = LlamaModelConfig(
temperature = effectiveOptions.temperature,
topK = 40,
minP = 0.05f
)
// Use sendWithGrammarSchema which handles grammar creation and generation
// on the same thread, avoiding thread-local state issues
val maxTokens = effectiveOptions.maxTokens
var tokenCount = 0
llama.sendWithGrammarSchema(
formattedPrompt,
schemaJson,
config,
parseSpecialTokens = true
).collect { token ->
result.append(token)
if (++tokenCount >= maxTokens) {
// Stop once the caller's maxTokens budget is reached
return@collect
}
}
val generatedText = result.toString()
logger.info("✅ Generation complete, response length: ${generatedText.length}")
logger.info("📄 Response: ${generatedText.take(300)}...")
// Parse tool calls from JSON response
val toolCalls = parseToolCalls(generatedText)
ToolCallResult(
success = true,
text = generatedText,
toolCalls = toolCalls,
mode = ToolCallingMode.GRAMMAR_BASED
)
} catch (e: Exception) {
logger.error("❌ Tool calling failed", e)
ToolCallResult(
success = false,
text = "Error: ${e.message}",
toolCalls = emptyList(),
mode = ToolCallingMode.GRAMMAR_BASED
)
}
}
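One caveat about the counter approach: `return@collect` only returns from the collector lambda for that one emission; it does not cancel the upstream producer, so generation may keep running after the budget is hit. If `sendWithGrammarSchema` returns a `Flow<String>`, the `take` operator from kotlinx.coroutines cancels the producer once the limit is reached. A minimal self-contained sketch, with an unbounded flow standing in for the native token stream:

```kotlin
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.runBlocking

fun main() = runBlocking {
    // Unbounded producer standing in for the native token stream
    val tokens = flow {
        var i = 0
        while (true) emit("tok${i++}")
    }
    val result = StringBuilder()
    // take(n) cancels the upstream flow after n emissions
    tokens.take(5).collect { result.append(it).append(' ') }
    println(result.toString().trim()) // prints "tok0 tok1 tok2 tok3 tok4"
}
```

Applied to the tool-calling path, this would be `llama.sendWithGrammarSchema(...).take(effectiveOptions.maxTokens).collect { result.append(it) }`, assuming one emission per token.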

private val logger = SDKLogger("PromptBasedToolCalling")
private val promptBuilder = PromptBuilder()
private val responseParser = ResponseParser()
private val deduplicator = Deduplicator()

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify thread-safety of Deduplicator.

If multiple coroutines invoke generateWithTools concurrently, the shared deduplicator instance could have race conditions. Confirm that Deduplicator is thread-safe or consider synchronization.

Run the following script to examine the Deduplicator implementation:

Also applies to: 71-71, 243-243


🏁 Script executed:

#!/bin/bash
# Check Deduplicator implementation for thread-safety annotations or synchronization
fd -e kt Deduplicator | xargs cat

Length of output: 2622


Thread-safety issue confirmed: Deduplicator is not thread-safe.

The Deduplicator class uses a plain mutableSetOf<String>() with no synchronization mechanisms (no locks, @Volatile, or thread-safe collections). The isDuplicate() method performs unsafe read-check-write operations: two concurrent coroutines could both read seenToolCalls simultaneously, both see the hash as new, and both add it. Additionally, the excess removal logic (seenToolCalls.drop() followed by remove()) is vulnerable to ConcurrentModificationException.

Required fix: Synchronize access to seenToolCalls using either:

  • Collections.synchronizedSet() to wrap the mutable set, or
  • synchronized blocks around all read-write operations in isDuplicate(), clear(), and size()

Also applies to: 71-71, 243-243
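A minimal thread-safe sketch of the second option, synchronizing every read-write sequence on the set itself. The field and method names mirror the review's description of `Deduplicator` but are assumptions, not the SDK source:

```kotlin
// Minimal thread-safe sketch; names follow the review's description of
// Deduplicator, not the actual SDK source.
class SafeDeduplicator(private val maxEntries: Int = 100) {
    private val seenToolCalls = LinkedHashSet<String>()

    fun isDuplicate(name: String, arguments: String): Boolean = synchronized(seenToolCalls) {
        val key = "$name|$arguments"
        if (!seenToolCalls.add(key)) return true // add() is the atomic check-and-insert
        // Evict oldest entries under the same lock, avoiding the
        // drop()+remove() pattern that can race with concurrent writers
        while (seenToolCalls.size > maxEntries) {
            val iter = seenToolCalls.iterator()
            iter.next()
            iter.remove()
        }
        false
    }

    fun clear() = synchronized(seenToolCalls) { seenToolCalls.clear() }
    fun size(): Int = synchronized(seenToolCalls) { seenToolCalls.size }
}
```

`LinkedHashSet` preserves insertion order, so eviction under the lock removes the oldest entry deterministically; `Collections.synchronizedSet` alone would not make the check-then-add sequence atomic.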

Comment on lines +71 to +79
if (deduplicator.isDuplicate(parseResult.name, parseResult.arguments)) {
logger.warn("Duplicate tool call detected, treating as regular response")
return ToolCallResult(
success = true,
text = "I've already attempted that action. Let me try a different approach: $response",
toolCalls = emptyList(),
mode = ToolCallingMode.PROMPT_BASED
)
}

⚠️ Potential issue | 🟡 Minor

Consider excluding raw response from user-visible text.

Including the raw response in the user-facing message at Line 75 could expose internal model outputs, malformed JSON, or debug information.

Apply this diff to sanitize the output:

-                            text = "I've already attempted that action. Let me try a different approach: $response",
+                            text = "I've already attempted that action. Let me try a different approach.",
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (deduplicator.isDuplicate(parseResult.name, parseResult.arguments)) {
logger.warn("Duplicate tool call detected, treating as regular response")
return ToolCallResult(
success = true,
text = "I've already attempted that action. Let me try a different approach.",
toolCalls = emptyList(),
mode = ToolCallingMode.PROMPT_BASED
)
}
🤖 Prompt for AI Agents
In
sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/PromptBasedToolCalling.kt
around lines 71 to 79, remove inclusion of the raw `response` in the user-facing
message and replace it with a sanitized or generic phrase (e.g., "I'll try a
different approach" or a short, sanitized summary of the previous attempt); keep
the detailed raw `response` only in an internal debug log (logger.debug or
logger.trace) so developers can inspect it without exposing it to users, and
ensure any summary is sanitized to strip or escape malformed JSON or sensitive
tokens before display.

Comment on lines +108 to +118
is ParseResult.InvalidFormat -> {
lastError = parseResult.reason
logger.warn("Invalid format (attempt ${retryCount + 1}/$maxRetries): ${parseResult.reason}")

if (retryCount < maxRetries) {
retryCount++
// Add error feedback to help model self-correct
val errorFeedback = buildErrorFeedback(parseResult.reason, parseResult.rawResponse)
// We could append this to the conversation and retry, but for now just retry
continue
}

⚠️ Potential issue | 🟡 Minor

Error feedback is built but never used.

The buildErrorFeedback result is computed but not passed to subsequent retry attempts. The retry simply reuses the same fullPrompt, so the model won't receive self-correction guidance.

Consider either:

  1. Appending errorFeedback to the prompt for the next iteration, or
  2. Removing the unused buildErrorFeedback call to avoid confusion.
                    if (retryCount < maxRetries) {
                        retryCount++
-                       // Add error feedback to help model self-correct
-                       val errorFeedback = buildErrorFeedback(parseResult.reason, parseResult.rawResponse)
-                       // We could append this to the conversation and retry, but for now just retry
                        continue
                    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
is ParseResult.InvalidFormat -> {
lastError = parseResult.reason
logger.warn("Invalid format (attempt ${retryCount + 1}/$maxRetries): ${parseResult.reason}")
if (retryCount < maxRetries) {
retryCount++
continue
}
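If option 1 is preferred, the retry loop needs to carry the feedback into the next attempt's prompt. A sketch of that shape, with `generate`, `parse`, and `buildErrorFeedback` passed in as lambdas because the real implementations live elsewhere in the SDK:

```kotlin
// Sketch of option 1: feed the parse error back into the next attempt.
// The lambdas stand in for the SDK's generation, parsing, and
// buildErrorFeedback logic, which are not reproduced here.
fun retryWithFeedback(
    basePrompt: String,
    maxRetries: Int,
    generate: (String) -> String,
    parse: (String) -> String?,            // null signals InvalidFormat
    buildErrorFeedback: (String) -> String
): String? {
    var prompt = basePrompt
    repeat(maxRetries + 1) {
        val response = generate(prompt)
        parse(response)?.let { return it }
        // Append corrective guidance so the model actually sees the error
        // on the retry, instead of replaying the identical prompt
        prompt = basePrompt + "\n" + buildErrorFeedback(response)
    }
    return null
}
```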


import com.runanywhere.sdk.models.Tool
import com.runanywhere.sdk.models.ToolParameter
import kotlinx.serialization.json.Json

⚠️ Potential issue | 🟠 Major

Avoid manual JSON; ensure proper escaping.

String‑building will break on quotes/backslashes and forces all values to strings. Use kotlinx.serialization’s JsonObject builder.

Apply:

-import kotlinx.serialization.json.Json
+import kotlinx.serialization.json.buildJsonObject
+import kotlinx.serialization.json.put
@@
-    private fun formatExampleCall(toolName: String, arguments: Map<String, String>): String {
-        // Manual JSON formatting for simple cases
-        val argsJson = arguments.entries.joinToString(", ") { (key, value) ->
-            """"$key": "$value""""
-        }
-        return """{"name": "$toolName", "arguments": {$argsJson}}"""
-    }
+    private fun formatExampleCall(toolName: String, arguments: Map<String, String>): String {
+        val argsObj = buildJsonObject {
+            arguments.forEach { (k, v) -> put(k, v) }
+        }
+        val obj = buildJsonObject {
+            put("name", toolName)
+            put("arguments", argsObj)
+        }
+        return obj.toString()
+    }

Also applies to: 90-96

🤖 Prompt for AI Agents
In
sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/PromptBuilder.kt
around lines 5 and 90-96, the code manually builds JSON strings which breaks on
quotes/backslashes and coerces everything to strings; replace the manual
string-building with kotlinx.serialization.json builders: add imports for
kotlinx.serialization.json.buildJsonObject, buildJsonArray, JsonPrimitive (and
Json if not already imported), construct the payloads using buildJsonObject {
put("key", JsonPrimitive(value)) / put("nested", buildJsonObject { ... }) /
put("arr", buildJsonArray { add(JsonPrimitive(...)) }) so types are preserved
and escaping is automatic, then serialize with Json.encodeToString(JsonObject)
(or Json.encodeToString on the built object) where needed; remove the string
concatenation code and ensure arrays/objects use the appropriate builders.
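
The failure mode is easy to reproduce without any serialization library: a single embedded quote yields invalid JSON. A minimal dependency-free illustration — `escapeJson` here is a deliberately tiny stand-in covering only a few characters, whereas the kotlinx builders in the diff above handle escaping completely:

```kotlin
// Why manual string-building breaks: an unescaped quote in a value yields
// invalid JSON. escapeJson is a minimal stand-in for what
// kotlinx.serialization's builders do automatically (and more completely).
fun naiveJson(key: String, value: String) = """{"$key": "$value"}"""

fun escapeJson(s: String) = buildString {
    for (c in s) when (c) {
        '"' -> append("\\\"")
        '\\' -> append("\\\\")
        '\n' -> append("\\n")
        else -> append(c)
    }
}

fun safeJson(key: String, value: String) =
    """{"${escapeJson(key)}": "${escapeJson(value)}"}"""

fun main() {
    println(naiveJson("expression", "say \"hi\""))  // invalid: unescaped quotes
    println(safeJson("expression", "say \"hi\""))   // valid JSON
}
```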

Comment on lines +31 to +33
logger.debug("Parsing response: ${response.take(200)}...")

// Strategy 1: Try direct JSON parse

⚠️ Potential issue | 🟡 Minor

Log hygiene: response may contain PII; keep debug and cap length.

You already cap to 200 chars; ensure logger config keeps this at debug only and redact if needed.

🤖 Prompt for AI Agents
In
sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/toolcalling/ResponseParser.kt
around lines 31 to 33, the debug log of the response risks emitting PII even
though it's capped; update the code so the response string is only constructed
when debug logging is enabled (use logger.isDebugEnabled or equivalent) and
apply a redaction function that strips or masks known sensitive fields before
truncating to 200 chars, then log the redacted, capped string; ensure the logger
call stays at debug level and do not perform the substring/templating work when
debug is disabled.
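
One way the suggestion could look in plain Kotlin — the boolean flag and the email regex are illustrative (the SDK's actual logger API and the appropriate redaction rules may differ):

```kotlin
// Sketch of the suggested hygiene: do the redaction/truncation work only
// when debug logging is enabled, via a lazy message lambda. The email regex
// is just one example of a redaction rule.
val EMAIL = Regex("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}")

fun redact(s: String): String = EMAIL.replace(s, "<redacted>")

inline fun debugLog(debugEnabled: Boolean, message: () -> String) {
    if (debugEnabled) println("DEBUG: ${message()}")  // lambda runs only here
}

fun logResponse(response: String, debugEnabled: Boolean) {
    debugLog(debugEnabled) {
        val capped = redact(response).take(200)
        if (response.length > 200) "$capped..." else capped
    }
}
```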

This replaces the grammar-based approach which was crashing (issue #182).

**New Implementation:**
- PromptBuilder: Builds system prompts with tool definitions and examples
- ResponseParser: Extracts JSON with 3 fallback strategies (direct, code blocks, embedded)
- Deduplicator: Prevents infinite loops via hash-based tracking
- PromptBasedToolCalling: Main orchestrator with retry logic (up to 2 retries)
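
The three fallback strategies can be sketched roughly as follows. This is a simplified stand-in for ResponseParser — the real class returns a `ParseResult` rather than a nullable string:

```kotlin
// Simplified sketch of the three extraction strategies: direct parse,
// fenced code block, then first balanced {...} embedded in prose.
val FENCED = Regex("`{3}(?:json)?\\s*(\\{[\\s\\S]*?\\})\\s*`{3}")

fun extractJson(response: String): String? {
    val trimmed = response.trim()
    // Strategy 1: the whole response is already a JSON object.
    if (trimmed.startsWith("{") && trimmed.endsWith("}")) return trimmed
    // Strategy 2: a JSON object inside a fenced code block.
    FENCED.find(response)?.let { return it.groupValues[1] }
    // Strategy 3: first balanced object embedded in surrounding text.
    val start = response.indexOf('{')
    if (start < 0) return null
    var depth = 0
    for (i in start until response.length) {
        when (response[i]) {
            '{' -> depth++
            '}' -> if (--depth == 0) return response.substring(start, i + 1)
        }
    }
    return null
}
```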

**API Changes:**
- Deprecated: generateWithTools() - grammar-based (crashes after ~10 tokens)
- New: generateWithToolsPromptBased() - production-ready prompt-based approach
- Added: Tool.withExamples() - DSL for adding few-shot examples
- Added: ToolExample data class for few-shot learning

**Example Usage:**
```kotlin
val calculator = tool("calculate", "Perform math") {
    stringParameter("expression", "Math expression", required = true)
}.withExamples(
    example("What is 15 * 7?", mapOf("expression" to "15 * 7"))
)

val result = RunAnywhere.generateWithToolsPromptBased(
    prompt = "Calculate 123 + 456",
    tools = listOf(calculator),
    options = RunAnywhereGenerationOptions(temperature = 0.3f)
)
```

**Success Rate:** 85-95% with Qwen 2.5 0.5B (vs 70-85% with crashes for grammar)

**Android Example Updated:**
- ToolCallingViewModel now uses prompt-based approach
- Added few-shot examples to all demo tools (time, calculator, random number)
- Updated UI descriptions

**Files Added:**
- PromptBuilder.kt - System prompt generation with few-shot examples
- ResponseParser.kt - JSON extraction with multiple fallback strategies
- Deduplicator.kt - Hash-based duplicate detection
- PromptBasedToolCalling.kt - Main orchestrator with retry logic
- PromptBasedToolCallingExtensions.kt - Public API extensions
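
The hash-based tracking in Deduplicator.kt amounts to something like the following. This is an illustrative stand-in, since the real class's API is not shown in this PR excerpt:

```kotlin
// Illustrative stand-in for Deduplicator: hash each executed
// (tool, arguments) pair and refuse exact repeats, which stops the model
// from looping on the same call forever.
class Deduplicator {
    private val seen = mutableSetOf<Int>()

    /** True the first time this exact call is seen, false on repeats. */
    fun shouldExecute(toolName: String, arguments: Map<String, String>): Boolean {
        val key = (toolName to arguments.toSortedMap()).hashCode()
        return seen.add(key)
    }
}
```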

**Documentation:**
- Removed old TOOL_CALLING.md (contained deprecated grammar approach)
- All grammar code preserved with issue #182 references

**Related:** Issue #182
@sanchitmonga22 sanchitmonga22 force-pushed the smonga/tool_calling_kotlin branch from bc017e4 to 1f7774b Compare October 27, 2025 03:29