From ee34f74de1cec99f34b1d5cf03bcf33322bff81e Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 21 Jan 2026 00:48:08 +0000 Subject: [PATCH 01/13] Add prompt_runtime setting to rules for Claude Code headless execution Adds a new `prompt_runtime` frontmatter setting for DeepWork rules: - `send_to_stopping_agent` (default): Returns prompt to the triggering agent - `claude`: Invokes Claude Code in headless mode to process the rule This enables rules to be processed by a dedicated Claude instance instead of being returned to the agent that triggered the hook, useful for: - Cross-platform rule processing (e.g., when Gemini triggers a rule) - Autonomous rule handling without blocking the main agent - Consistent rule processing regardless of triggering agent Changes: - Add prompt_runtime field to rules schema - Add PromptRuntime enum and parsing logic to rules_parser.py - Implement Claude headless invocation in rules_check.py - Update all existing rules with prompt_runtime: send_to_stopping_agent - Update documentation with examples and field reference --- .../architecture-documentation-accuracy.md | 1 + .deepwork/rules/manual-test-command-action.md | 1 + .deepwork/rules/manual-test-created-mode.md | 1 + .../manual-test-infinite-block-command.md | 1 + .../manual-test-infinite-block-prompt.md | 1 + .deepwork/rules/manual-test-multi-safety.md | 1 + .deepwork/rules/manual-test-pair-mode.md | 1 + .deepwork/rules/manual-test-set-mode.md | 1 + .deepwork/rules/manual-test-trigger-safety.md | 1 + .deepwork/rules/readme-accuracy.md | 1 + .../rules/skill-template-best-practices.md | 1 + .../rules/standard-jobs-source-of-truth.md | 1 + .deepwork/rules/uv-lock-sync.md | 1 + .../rules/version-and-changelog-update.md | 1 + doc/rules_syntax.md | 83 ++++++++ src/deepwork/core/rules_parser.py | 24 +++ src/deepwork/hooks/rules_check.py | 180 +++++++++++++++++- src/deepwork/schemas/rules_schema.py | 6 + 18 files changed, 304 insertions(+), 3 deletions(-) diff --git a/.deepwork/rules/architecture-documentation-accuracy.md b/.deepwork/rules/architecture-documentation-accuracy.md index 91798109..5e77acd1 100644 --- a/.deepwork/rules/architecture-documentation-accuracy.md +++ b/.deepwork/rules/architecture-documentation-accuracy.md @@ -3,6 +3,7 @@ name: Architecture Documentation Accuracy trigger: src/**/* safety: doc/architecture.md compare_to: base +prompt_runtime: send_to_stopping_agent --- Source code in src/ has been modified. Please review doc/architecture.md for accuracy: 1. 
Verify the documented architecture matches the current implementation diff --git a/.deepwork/rules/manual-test-command-action.md b/.deepwork/rules/manual-test-command-action.md index 966ab2de..31f1992a 100644 --- a/.deepwork/rules/manual-test-command-action.md +++ b/.deepwork/rules/manual-test-command-action.md @@ -5,6 +5,7 @@ action: command: echo "$(date '+%Y-%m-%d %H:%M:%S') - Command triggered by edit to {file}" >> manual_tests/test_command_action/test_command_action_log.txt run_for: each_match compare_to: prompt +prompt_runtime: send_to_stopping_agent --- # Manual Test: Command Action diff --git a/.deepwork/rules/manual-test-created-mode.md b/.deepwork/rules/manual-test-created-mode.md index abb6108d..8c9fb33d 100644 --- a/.deepwork/rules/manual-test-created-mode.md +++ b/.deepwork/rules/manual-test-created-mode.md @@ -2,6 +2,7 @@ name: "Manual Test: Created Mode" created: manual_tests/test_created_mode/*.yml compare_to: prompt +prompt_runtime: send_to_stopping_agent --- # Manual Test: Created Mode (File Creation Trigger) diff --git a/.deepwork/rules/manual-test-infinite-block-command.md b/.deepwork/rules/manual-test-infinite-block-command.md index 8f8b24b4..85438e96 100644 --- a/.deepwork/rules/manual-test-infinite-block-command.md +++ b/.deepwork/rules/manual-test-infinite-block-command.md @@ -5,6 +5,7 @@ action: command: "false" run_for: each_match compare_to: prompt +prompt_runtime: send_to_stopping_agent --- # Manual Test: Infinite Block Command (Promise Required) diff --git a/.deepwork/rules/manual-test-infinite-block-prompt.md b/.deepwork/rules/manual-test-infinite-block-prompt.md index 67c97414..7f9d629a 100644 --- a/.deepwork/rules/manual-test-infinite-block-prompt.md +++ b/.deepwork/rules/manual-test-infinite-block-prompt.md @@ -2,6 +2,7 @@ name: "Manual Test: Infinite Block Prompt" trigger: manual_tests/test_infinite_block_prompt/test_infinite_block_prompt.py compare_to: prompt +prompt_runtime: send_to_stopping_agent --- # Manual Test: Infinite Block Prompt (Promise Required) diff --git a/.deepwork/rules/manual-test-multi-safety.md b/.deepwork/rules/manual-test-multi-safety.md index 4ce978cb..3e19a710 100644 --- a/.deepwork/rules/manual-test-multi-safety.md +++ b/.deepwork/rules/manual-test-multi-safety.md @@ -5,6 +5,7 @@ safety: - manual_tests/test_multi_safety/test_multi_safety_changelog.md - manual_tests/test_multi_safety/test_multi_safety_version.txt compare_to: prompt +prompt_runtime: send_to_stopping_agent --- # Manual Test: Multiple Safety Patterns diff --git a/.deepwork/rules/manual-test-pair-mode.md b/.deepwork/rules/manual-test-pair-mode.md index 9c2379bf..d0ed65ef 100644 --- a/.deepwork/rules/manual-test-pair-mode.md +++ b/.deepwork/rules/manual-test-pair-mode.md @@ -4,6 +4,7 @@ pair: trigger: manual_tests/test_pair_mode/test_pair_mode_trigger.py expects: manual_tests/test_pair_mode/test_pair_mode_expected.md compare_to: prompt +prompt_runtime: send_to_stopping_agent --- # Manual Test: Pair Mode (Directional Correspondence) diff --git a/.deepwork/rules/manual-test-set-mode.md b/.deepwork/rules/manual-test-set-mode.md index abe504ec..41e38b63 100644 --- a/.deepwork/rules/manual-test-set-mode.md +++ b/.deepwork/rules/manual-test-set-mode.md @@ -4,6 +4,7 @@ set: - manual_tests/test_set_mode/test_set_mode_source.py - manual_tests/test_set_mode/test_set_mode_test.py compare_to: prompt +prompt_runtime: send_to_stopping_agent --- # Manual Test: Set Mode (Bidirectional Correspondence) diff --git a/.deepwork/rules/manual-test-trigger-safety.md 
b/.deepwork/rules/manual-test-trigger-safety.md index b144a2a0..be391dd6 100644 --- a/.deepwork/rules/manual-test-trigger-safety.md +++ b/.deepwork/rules/manual-test-trigger-safety.md @@ -3,6 +3,7 @@ name: "Manual Test: Trigger Safety" trigger: manual_tests/test_trigger_safety_mode/test_trigger_safety_mode.py safety: manual_tests/test_trigger_safety_mode/test_trigger_safety_mode_doc.md compare_to: prompt +prompt_runtime: send_to_stopping_agent --- # Manual Test: Trigger/Safety Mode diff --git a/.deepwork/rules/readme-accuracy.md b/.deepwork/rules/readme-accuracy.md index 9e75c596..ccc1218f 100644 --- a/.deepwork/rules/readme-accuracy.md +++ b/.deepwork/rules/readme-accuracy.md @@ -3,6 +3,7 @@ name: README Accuracy trigger: src/**/* safety: README.md compare_to: base +prompt_runtime: send_to_stopping_agent --- Source code in src/ has been modified. Please review README.md for accuracy: 1. Verify project overview still reflects current functionality diff --git a/.deepwork/rules/skill-template-best-practices.md b/.deepwork/rules/skill-template-best-practices.md index ff33ecfd..941cff57 100644 --- a/.deepwork/rules/skill-template-best-practices.md +++ b/.deepwork/rules/skill-template-best-practices.md @@ -2,6 +2,7 @@ name: Skill Template Best Practices trigger: src/deepwork/templates/**/skill-job*.jinja compare_to: prompt +prompt_runtime: send_to_stopping_agent --- Skill template files are being modified. Ensure the generated skills follow these best practices: diff --git a/.deepwork/rules/standard-jobs-source-of-truth.md b/.deepwork/rules/standard-jobs-source-of-truth.md index 2d0092c9..086b5707 100644 --- a/.deepwork/rules/standard-jobs-source-of-truth.md +++ b/.deepwork/rules/standard-jobs-source-of-truth.md @@ -7,6 +7,7 @@ safety: - src/deepwork/standard_jobs/deepwork_jobs/**/* - src/deepwork/standard_jobs/deepwork_rules/**/* compare_to: base +prompt_runtime: send_to_stopping_agent --- You modified files in `.deepwork/jobs/deepwork_jobs/` or `.deepwork/jobs/deepwork_rules/`. diff --git a/.deepwork/rules/uv-lock-sync.md b/.deepwork/rules/uv-lock-sync.md index 75cca269..1d5279eb 100644 --- a/.deepwork/rules/uv-lock-sync.md +++ b/.deepwork/rules/uv-lock-sync.md @@ -4,6 +4,7 @@ trigger: pyproject.toml action: command: uv sync compare_to: prompt +prompt_runtime: send_to_stopping_agent --- # UV Lock Sync diff --git a/.deepwork/rules/version-and-changelog-update.md b/.deepwork/rules/version-and-changelog-update.md index ac617f8e..9d0497af 100644 --- a/.deepwork/rules/version-and-changelog-update.md +++ b/.deepwork/rules/version-and-changelog-update.md @@ -5,6 +5,7 @@ safety: - pyproject.toml - CHANGELOG.md compare_to: base +prompt_runtime: send_to_stopping_agent --- Source code in src/ has been modified. **You MUST evaluate whether version and changelog updates are needed.** diff --git a/doc/rules_syntax.md b/doc/rules_syntax.md index 2ab86be1..eba0dbfc 100644 --- a/doc/rules_syntax.md +++ b/doc/rules_syntax.md @@ -269,6 +269,19 @@ If an existing file `src/api/users.py` is modified: The markdown body after frontmatter serves as instructions shown to the agent. This is the default when no `action` field is specified. 
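+For example, a minimal prompt-action rule (illustrative paths) looks like:
+
+```markdown
+---
+name: Keep API Docs Fresh
+trigger: src/api/**/*.py
+safety: docs/api.md
+compare_to: base
+---
+API code changed without a matching docs update. Review docs/api.md and
+refresh any stale sections.
+```
+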
+**Prompt Runtime:** + +Prompt actions can be executed in two ways, controlled by the `prompt_runtime` field: + +| Runtime | Description | +|---------|-------------| +| `send_to_stopping_agent` | Return prompt to the triggering agent (default) | +| `claude` | Invoke Claude Code in headless mode | + +The default (`send_to_stopping_agent`) returns the rule's markdown instructions to whatever agent triggered the hook. The agent sees the instructions and responds accordingly. + +With `claude` runtime, the system invokes Claude Code in headless mode to process the rule autonomously. Claude receives the instructions, performs the requested task, and returns a structured result indicating success or failure. + **Template Variables in Instructions:** | Variable | Description | @@ -483,6 +496,50 @@ compare_to: base --- ``` +### prompt_runtime (optional) + +Determines how prompt actions are executed. Only applies to rules with prompt actions (no `action` field). + +| Value | Description | +|-------|-------------| +| `send_to_stopping_agent` | Return the prompt to the agent that triggered the rule (default) | +| `claude` | Invoke Claude Code in headless mode to process the prompt | + +```yaml +--- +prompt_runtime: send_to_stopping_agent +--- +``` + +**Default behavior (`send_to_stopping_agent`):** + +The rule's markdown body is returned to the agent that triggered the hook. The agent sees the instructions and can respond accordingly, using promise tags to acknowledge the rule. + +**Claude runtime (`claude`):** + +Instead of returning instructions to the triggering agent, Claude Code is invoked in headless mode with the rule's instructions. Claude processes the prompt autonomously and returns a structured response indicating whether the rule was satisfied. + +This is useful when: +- You want rules to be handled by a dedicated Claude instance +- The triggering agent is not Claude (e.g., Gemini) +- You want consistent rule processing regardless of which agent triggered it + +Example with Claude runtime: +```yaml +--- +name: Auto Code Review +trigger: src/**/*.py +compare_to: prompt +prompt_runtime: claude +--- +Review the following Python code changes for: +1. Type safety issues +2. Missing error handling +3. Code style violations + +If issues found, fix them directly. +``` + ## Complete Examples ### Example 1: Test Coverage Rule @@ -623,6 +680,32 @@ action: Automatically lints newly created React components. ``` +### Example 8: Claude-Powered Code Review + +`.deepwork/rules/security-review.md`: +```markdown +--- +name: Security Review +trigger: + - src/auth/**/* + - src/api/**/* +compare_to: prompt +prompt_runtime: claude +--- +Security-sensitive code has been modified. Review for: + +1. **Input validation**: All user inputs are validated and sanitized +2. **Authentication**: Auth checks are properly implemented +3. **Authorization**: Access controls are correctly applied +4. **Secrets**: No hardcoded credentials or API keys +5. **SQL/Injection**: Parameterized queries used, no string concatenation + +If you find any issues, fix them directly in the code. +If the code passes review, confirm it meets security standards. +``` + +This rule invokes Claude Code in headless mode to perform an autonomous security review when auth or API code changes. Claude will analyze the changes and either fix issues directly or confirm the code is secure. + ## Promise Tags When a rule fires but should be dismissed, use promise tags in the conversation. 
The tag content should be human-readable, using the rule's `name` field: diff --git a/src/deepwork/core/rules_parser.py b/src/deepwork/core/rules_parser.py index 04b1e3d2..7f516592 100644 --- a/src/deepwork/core/rules_parser.py +++ b/src/deepwork/core/rules_parser.py @@ -39,9 +39,19 @@ class ActionType(Enum): COMMAND = "command" # Run an idempotent command +class PromptRuntime(Enum): + """Runtime for executing prompt actions.""" + + SEND_TO_STOPPING_AGENT = "send_to_stopping_agent" # Return prompt to agent (default) + CLAUDE = "claude" # Invoke Claude Code in headless mode + + # Valid compare_to values COMPARE_TO_VALUES = frozenset({"base", "default_tip", "prompt"}) +# Valid prompt_runtime values +PROMPT_RUNTIME_VALUES = frozenset({"send_to_stopping_agent", "claude"}) + @dataclass class CommandAction: @@ -85,6 +95,9 @@ class Rule: instructions: str = "" # For PROMPT action (markdown body) command_action: CommandAction | None = None # For COMMAND action + # Prompt runtime (only relevant for PROMPT action type) + prompt_runtime: PromptRuntime = PromptRuntime.SEND_TO_STOPPING_AGENT + @classmethod def from_frontmatter( cls, @@ -179,6 +192,16 @@ def from_frontmatter( # Get compare_to (required field) compare_to = frontmatter["compare_to"] + # Get prompt_runtime (optional, defaults to send_to_stopping_agent) + prompt_runtime_str = frontmatter.get("prompt_runtime", "send_to_stopping_agent") + try: + prompt_runtime = PromptRuntime(prompt_runtime_str) + except ValueError: + raise RulesParseError( + f"Rule '{name}' has invalid prompt_runtime '{prompt_runtime_str}'. " + f"Valid values: {', '.join(PROMPT_RUNTIME_VALUES)}" + ) from None + return cls( name=name, filename=filename, @@ -192,6 +215,7 @@ def from_frontmatter( instructions=markdown_body.strip(), command_action=command_action, compare_to=compare_to, + prompt_runtime=prompt_runtime, ) diff --git a/src/deepwork/hooks/rules_check.py b/src/deepwork/hooks/rules_check.py index 38a37606..2bd98257 100644 --- a/src/deepwork/hooks/rules_check.py +++ b/src/deepwork/hooks/rules_check.py @@ -31,6 +31,7 @@ from deepwork.core.rules_parser import ( ActionType, DetectionMode, + PromptRuntime, Rule, RuleEvaluationResult, RulesParseError, @@ -531,6 +532,121 @@ def format_rules_message(results: list[RuleEvaluationResult]) -> str: return "\n".join(lines) +def format_claude_prompt(result: RuleEvaluationResult) -> str: + """ + Format a rule evaluation result as a prompt for Claude Code headless mode. + + The prompt includes the rule instructions and expects Claude to return + a structured response indicating whether to block or allow. 
+ """ + rule = result.rule + lines = [ + "# DeepWork Rule Evaluation", + "", + f"Rule: {rule.name}", + "", + ] + + # Add trigger file context + if result.trigger_files: + lines.append("Trigger files:") + for f in result.trigger_files: + lines.append(f" - {f}") + lines.append("") + + # For set/pair modes, show missing files + if result.missing_files: + lines.append("Expected files (not changed):") + for f in result.missing_files: + lines.append(f" - {f}") + lines.append("") + + # Add the rule instructions + lines.append("## Instructions") + lines.append("") + if rule.instructions: + lines.append(rule.instructions.strip()) + lines.append("") + + # Add response format instructions + lines.extend( + [ + "## Response Format", + "", + "After completing the task above, you MUST end your response with a structured block:", + "", + "```", + "---RULE_RESULT---", + 'decision: <"block" or "allow">', + "reason: ", + "---END_RULE_RESULT---", + "```", + "", + "Use 'block' if the rule violation was not resolved, 'allow' if it was resolved.", + ] + ) + + return "\n".join(lines) + + +def parse_claude_response(output: str) -> tuple[str, str]: + """ + Parse the structured response from Claude Code headless mode. + + Returns (decision, reason) tuple. Defaults to ("block", "No response") if parsing fails. + """ + # Look for the structured result block + pattern = r"---RULE_RESULT---\s*\n\s*decision:\s*[\"']?(\w+)[\"']?\s*\n\s*reason:\s*(.+?)\s*\n\s*---END_RULE_RESULT---" + match = re.search(pattern, output, re.IGNORECASE | re.DOTALL) + + if match: + decision = match.group(1).lower().strip() + reason = match.group(2).strip() + # Normalize decision + if decision not in ("block", "allow"): + decision = "block" + return decision, reason + + # If no structured block found, default to block + return "block", "Claude did not return a structured response" + + +def invoke_claude_headless(prompt: str, rule_name: str) -> tuple[str, str]: + """ + Invoke Claude Code in headless mode with the given prompt. + + Args: + prompt: The prompt to send to Claude + rule_name: Name of the rule being evaluated (for error messages) + + Returns: + Tuple of (decision, reason) where decision is "block" or "allow" + """ + try: + # Run claude in headless mode with --print flag to get output + result = subprocess.run( + ["claude", "--print", "--dangerously-skip-permissions", "-p", prompt], + capture_output=True, + text=True, + timeout=300, # 5 minute timeout + cwd=Path.cwd(), + ) + + if result.returncode != 0: + error_msg = result.stderr.strip() or "Unknown error" + return "block", f"Claude execution failed: {error_msg}" + + output = result.stdout.strip() + return parse_claude_response(output) + + except subprocess.TimeoutExpired: + return "block", f"Claude timed out while processing rule '{rule_name}'" + except FileNotFoundError: + return "block", "Claude CLI not found. Please ensure 'claude' is installed and in PATH" + except Exception as e: + return "block", f"Error invoking Claude: {str(e)}" + + def rules_check_hook(hook_input: HookInput) -> HookOutput: """ Main hook logic for rules evaluation (v2). 
@@ -662,6 +778,57 @@ def rules_check_hook(hook_input: HookInput) -> HookOutput: # Collect for prompt output prompt_results.append(result) + # Separate prompt results by runtime + agent_prompt_results: list[RuleEvaluationResult] = [] + claude_prompt_results: list[RuleEvaluationResult] = [] + + for result in prompt_results: + if result.rule.prompt_runtime == PromptRuntime.CLAUDE: + claude_prompt_results.append(result) + else: + agent_prompt_results.append(result) + + # Process Claude runtime rules + claude_errors: list[str] = [] + for result in claude_prompt_results: + rule = result.rule + + # Compute trigger hash for queue + baseline_ref = get_baseline_ref(rule.compare_to) + trigger_hash = compute_trigger_hash( + rule.name, + result.trigger_files, + baseline_ref, + ) + + # Invoke Claude in headless mode + prompt = format_claude_prompt(result) + decision, reason = invoke_claude_headless(prompt, rule.name) + + if decision == "allow": + # Claude resolved the issue + queue.update_status( + trigger_hash, + QueueEntryStatus.PASSED, + ActionResult( + type="claude", + output=reason, + exit_code=0, + ), + ) + else: + # Claude could not resolve or blocked + claude_errors.append(f"## {rule.name}\n{reason}\n") + queue.update_status( + trigger_hash, + QueueEntryStatus.FAILED, + ActionResult( + type="claude", + output=reason, + exit_code=1, + ), + ) + # Build response messages: list[str] = [] @@ -672,9 +839,16 @@ def rules_check_hook(hook_input: HookInput) -> HookOutput: messages.extend(command_errors) messages.append("") - # Add prompt rules if any - if prompt_results: - messages.append(format_rules_message(prompt_results)) + # Add Claude errors if any + if claude_errors: + messages.append("## Claude Rule Errors\n") + messages.append("The following rules were processed by Claude but require attention.\n") + messages.extend(claude_errors) + messages.append("") + + # Add prompt rules if any (send_to_stopping_agent runtime) + if agent_prompt_results: + messages.append(format_rules_message(agent_prompt_results)) if messages: return HookOutput(decision="block", reason="\n".join(messages)) diff --git a/src/deepwork/schemas/rules_schema.py b/src/deepwork/schemas/rules_schema.py index bf091ab9..64e8501c 100644 --- a/src/deepwork/schemas/rules_schema.py +++ b/src/deepwork/schemas/rules_schema.py @@ -87,6 +87,12 @@ "enum": ["base", "default_tip", "prompt"], "description": "Baseline for detecting file changes", }, + "prompt_runtime": { + "type": "string", + "enum": ["send_to_stopping_agent", "claude"], + "default": "send_to_stopping_agent", + "description": "Runtime for prompt action: 'send_to_stopping_agent' returns prompt to the agent that triggered the rule, 'claude' invokes Claude Code in headless mode", + }, }, "additionalProperties": False, # Detection mode must be exactly one of: trigger, set, pair, or created From 496605078f1a2f0fb18d7d6ca1ddc90936a7e1e2 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 21 Jan 2026 00:55:14 +0000 Subject: [PATCH 02/13] Add automated tests for prompt_runtime and include transcript path in Claude prompt Tests: - Add TestPromptRuntime class for PromptRuntime enum behavior - Add TestLoadPromptRuntimeFromFile class for parsing from rule files - Add TestFormatClaudePrompt class for Claude prompt formatting - Add TestParseClaudeResponse class for parsing Claude's structured output - Add tests for transcript_path parameter in format_claude_prompt Implementation: - Update format_claude_prompt to accept optional transcript_path parameter - Include conversation context section in 
Claude prompt when transcript provided - Pass hook_input.transcript_path to format_claude_prompt in hook --- src/deepwork/hooks/rules_check.py | 19 +- tests/unit/test_rules_check.py | 311 +++++++++++++++++++++++++++++- tests/unit/test_rules_parser.py | 245 +++++++++++++++++++++++ 3 files changed, 572 insertions(+), 3 deletions(-) diff --git a/src/deepwork/hooks/rules_check.py b/src/deepwork/hooks/rules_check.py index 2bd98257..dd13ab5b 100644 --- a/src/deepwork/hooks/rules_check.py +++ b/src/deepwork/hooks/rules_check.py @@ -532,12 +532,19 @@ def format_rules_message(results: list[RuleEvaluationResult]) -> str: return "\n".join(lines) -def format_claude_prompt(result: RuleEvaluationResult) -> str: +def format_claude_prompt(result: RuleEvaluationResult, transcript_path: str | None = None) -> str: """ Format a rule evaluation result as a prompt for Claude Code headless mode. The prompt includes the rule instructions and expects Claude to return a structured response indicating whether to block or allow. + + Args: + result: The rule evaluation result + transcript_path: Optional path to the conversation transcript file + + Returns: + Formatted prompt string for Claude """ rule = result.rule lines = [ @@ -547,6 +554,14 @@ def format_claude_prompt(result: RuleEvaluationResult) -> str: "", ] + # Add transcript location for conversation context + if transcript_path: + lines.append("## Conversation Context") + lines.append("") + lines.append(f"The conversation transcript is located at: {transcript_path}") + lines.append("You can read this file to understand the context of the changes being made.") + lines.append("") + # Add trigger file context if result.trigger_files: lines.append("Trigger files:") @@ -802,7 +817,7 @@ def rules_check_hook(hook_input: HookInput) -> HookOutput: ) # Invoke Claude in headless mode - prompt = format_claude_prompt(result) + prompt = format_claude_prompt(result, hook_input.transcript_path) decision, reason = invoke_claude_headless(prompt, rule.name) if decision == "allow": diff --git a/tests/unit/test_rules_check.py b/tests/unit/test_rules_check.py index e672fd94..34d2852f 100644 --- a/tests/unit/test_rules_check.py +++ b/tests/unit/test_rules_check.py @@ -1,6 +1,17 @@ """Tests for rules_check hook module.""" -from deepwork.hooks.rules_check import extract_promise_tags +from deepwork.core.rules_parser import ( + DetectionMode, + PairConfig, + PromptRuntime, + Rule, + RuleEvaluationResult, +) +from deepwork.hooks.rules_check import ( + extract_promise_tags, + format_claude_prompt, + parse_claude_response, +) class TestExtractPromiseTags: @@ -103,3 +114,301 @@ def test_promise_embedded_in_markdown(self) -> None: """ result = extract_promise_tags(text) assert result == {"Architecture Documentation Accuracy", "README Accuracy"} + + +class TestFormatClaudePrompt: + """Tests for format_claude_prompt function.""" + + def test_formats_basic_trigger_safety_rule(self) -> None: + """Test formatting a basic trigger/safety rule for Claude.""" + rule = Rule( + name="Security Review", + filename="security-review", + detection_mode=DetectionMode.TRIGGER_SAFETY, + triggers=["src/auth/**/*"], + safety=[], + instructions="Review the code for security issues.", + compare_to="prompt", + prompt_runtime=PromptRuntime.CLAUDE, + ) + result = RuleEvaluationResult( + rule=rule, + should_fire=True, + trigger_files=["src/auth/login.py"], + ) + + prompt = format_claude_prompt(result) + + assert "Security Review" in prompt + assert "src/auth/login.py" in prompt + assert "Review the code for security 
issues" in prompt + assert "---RULE_RESULT---" in prompt + assert 'decision: <"block" or "allow">' in prompt + + def test_formats_set_mode_rule_with_missing_files(self) -> None: + """Test formatting a set mode rule showing missing files.""" + rule = Rule( + name="Source/Test Pairing", + filename="source-test-pairing", + detection_mode=DetectionMode.SET, + set_patterns=["src/{path}.py", "tests/{path}_test.py"], + instructions="Update the corresponding test file.", + compare_to="base", + prompt_runtime=PromptRuntime.CLAUDE, + ) + result = RuleEvaluationResult( + rule=rule, + should_fire=True, + trigger_files=["src/auth/login.py"], + missing_files=["tests/auth/login_test.py"], + ) + + prompt = format_claude_prompt(result) + + assert "Source/Test Pairing" in prompt + assert "src/auth/login.py" in prompt + assert "tests/auth/login_test.py" in prompt + assert "Expected files (not changed)" in prompt + assert "Update the corresponding test file" in prompt + + def test_formats_pair_mode_rule(self) -> None: + """Test formatting a pair mode rule.""" + rule = Rule( + name="API Documentation", + filename="api-documentation", + detection_mode=DetectionMode.PAIR, + pair_config=PairConfig( + trigger="api/{path}.py", + expects=["docs/api/{path}.md"], + ), + instructions="Update the API documentation.", + compare_to="base", + prompt_runtime=PromptRuntime.CLAUDE, + ) + result = RuleEvaluationResult( + rule=rule, + should_fire=True, + trigger_files=["api/users.py"], + missing_files=["docs/api/users.md"], + ) + + prompt = format_claude_prompt(result) + + assert "API Documentation" in prompt + assert "api/users.py" in prompt + assert "docs/api/users.md" in prompt + assert "Update the API documentation" in prompt + + def test_includes_response_format_instructions(self) -> None: + """Test that prompt includes response format instructions.""" + rule = Rule( + name="Test Rule", + filename="test-rule", + detection_mode=DetectionMode.TRIGGER_SAFETY, + triggers=["src/**/*"], + safety=[], + instructions="Check the code.", + compare_to="base", + prompt_runtime=PromptRuntime.CLAUDE, + ) + result = RuleEvaluationResult( + rule=rule, + should_fire=True, + trigger_files=["src/main.py"], + ) + + prompt = format_claude_prompt(result) + + assert "Response Format" in prompt + assert "---RULE_RESULT---" in prompt + assert "---END_RULE_RESULT---" in prompt + assert "block" in prompt + assert "allow" in prompt + + def test_includes_transcript_path_when_provided(self) -> None: + """Test that prompt includes transcript path when provided.""" + rule = Rule( + name="Test Rule", + filename="test-rule", + detection_mode=DetectionMode.TRIGGER_SAFETY, + triggers=["src/**/*"], + safety=[], + instructions="Check the code.", + compare_to="base", + prompt_runtime=PromptRuntime.CLAUDE, + ) + result = RuleEvaluationResult( + rule=rule, + should_fire=True, + trigger_files=["src/main.py"], + ) + + prompt = format_claude_prompt(result, transcript_path="/tmp/conversation.jsonl") + + assert "Conversation Context" in prompt + assert "/tmp/conversation.jsonl" in prompt + assert "transcript" in prompt.lower() + + def test_omits_transcript_section_when_not_provided(self) -> None: + """Test that prompt omits transcript section when path is None.""" + rule = Rule( + name="Test Rule", + filename="test-rule", + detection_mode=DetectionMode.TRIGGER_SAFETY, + triggers=["src/**/*"], + safety=[], + instructions="Check the code.", + compare_to="base", + prompt_runtime=PromptRuntime.CLAUDE, + ) + result = RuleEvaluationResult( + rule=rule, + should_fire=True, + 
trigger_files=["src/main.py"], + ) + + prompt = format_claude_prompt(result, transcript_path=None) + + assert "Conversation Context" not in prompt + # But instructions and other parts should still be present + assert "Check the code" in prompt + assert "---RULE_RESULT---" in prompt + + +class TestParseClaudeResponse: + """Tests for parse_claude_response function.""" + + def test_parses_allow_decision(self) -> None: + """Test parsing an allow decision.""" + output = """ +I've reviewed the code and it looks good. + +---RULE_RESULT--- +decision: allow +reason: Code follows security best practices +---END_RULE_RESULT--- +""" + decision, reason = parse_claude_response(output) + + assert decision == "allow" + assert reason == "Code follows security best practices" + + def test_parses_block_decision(self) -> None: + """Test parsing a block decision.""" + output = """ +There are security issues in the code. + +---RULE_RESULT--- +decision: block +reason: Found hardcoded credentials on line 42 +---END_RULE_RESULT--- +""" + decision, reason = parse_claude_response(output) + + assert decision == "block" + assert reason == "Found hardcoded credentials on line 42" + + def test_parses_quoted_decision(self) -> None: + """Test parsing decision with quotes.""" + output = """ +---RULE_RESULT--- +decision: "allow" +reason: All tests pass +---END_RULE_RESULT--- +""" + decision, reason = parse_claude_response(output) + + assert decision == "allow" + assert reason == "All tests pass" + + def test_parses_single_quoted_decision(self) -> None: + """Test parsing decision with single quotes.""" + output = """ +---RULE_RESULT--- +decision: 'block' +reason: Missing test coverage +---END_RULE_RESULT--- +""" + decision, reason = parse_claude_response(output) + + assert decision == "block" + assert reason == "Missing test coverage" + + def test_defaults_to_block_when_no_result_block(self) -> None: + """Test defaults to block when no result block found.""" + output = "I reviewed the code but forgot to include the result block." 
+ + decision, reason = parse_claude_response(output) + + assert decision == "block" + assert "did not return a structured response" in reason + + def test_defaults_to_block_for_empty_output(self) -> None: + """Test defaults to block for empty output.""" + decision, reason = parse_claude_response("") + + assert decision == "block" + assert "did not return a structured response" in reason + + def test_handles_invalid_decision_value(self) -> None: + """Test handles invalid decision value by defaulting to block.""" + output = """ +---RULE_RESULT--- +decision: maybe +reason: Not sure about this +---END_RULE_RESULT--- +""" + decision, reason = parse_claude_response(output) + + # Invalid decision should default to block + assert decision == "block" + + def test_case_insensitive_decision(self) -> None: + """Test that decision parsing is case-insensitive.""" + output = """ +---RULE_RESULT--- +decision: ALLOW +reason: Everything looks good +---END_RULE_RESULT--- +""" + decision, reason = parse_claude_response(output) + + assert decision == "allow" + assert reason == "Everything looks good" + + def test_handles_multiline_reason(self) -> None: + """Test handling of reason that spans context before end marker.""" + output = """ +---RULE_RESULT--- +decision: block +reason: Multiple issues found including security vulnerabilities +---END_RULE_RESULT--- +""" + decision, reason = parse_claude_response(output) + + assert decision == "block" + assert "Multiple issues found" in reason + + def test_parses_result_embedded_in_longer_output(self) -> None: + """Test parsing result block embedded in longer output.""" + output = """ +I've completed the security review of the authentication code. + +Here are my findings: +1. The password hashing uses bcrypt which is good +2. Input validation is properly implemented +3. No SQL injection vulnerabilities found + +Overall, the code follows security best practices. + +---RULE_RESULT--- +decision: allow +reason: Code passes security review - no vulnerabilities found +---END_RULE_RESULT--- + +Let me know if you need any clarification. 
+""" + decision, reason = parse_claude_response(output) + + assert decision == "allow" + assert "passes security review" in reason diff --git a/tests/unit/test_rules_parser.py b/tests/unit/test_rules_parser.py index ee8a2375..5e31c462 100644 --- a/tests/unit/test_rules_parser.py +++ b/tests/unit/test_rules_parser.py @@ -2,11 +2,16 @@ from pathlib import Path +import pytest + from deepwork.core.pattern_matcher import matches_any_pattern as matches_pattern from deepwork.core.rules_parser import ( + ActionType, DetectionMode, PairConfig, + PromptRuntime, Rule, + RulesParseError, evaluate_rule, evaluate_rules, load_rules_from_directory, @@ -993,3 +998,243 @@ def test_loads_created_rule_with_command_action(self, temp_dir: Path) -> None: assert rules[0].action_type == ActionType.COMMAND assert rules[0].command_action is not None assert rules[0].command_action.command == "ruff check {file}" + + +class TestPromptRuntime: + """Tests for prompt_runtime field parsing and behavior.""" + + def test_default_prompt_runtime_is_send_to_stopping_agent(self) -> None: + """Test that default prompt_runtime is send_to_stopping_agent.""" + rule = Rule( + name="Test Rule", + filename="test-rule", + detection_mode=DetectionMode.TRIGGER_SAFETY, + triggers=["src/**/*"], + safety=[], + instructions="Check it", + compare_to="base", + ) + assert rule.prompt_runtime == PromptRuntime.SEND_TO_STOPPING_AGENT + + def test_explicit_send_to_stopping_agent_runtime(self) -> None: + """Test explicit send_to_stopping_agent runtime.""" + rule = Rule( + name="Test Rule", + filename="test-rule", + detection_mode=DetectionMode.TRIGGER_SAFETY, + triggers=["src/**/*"], + safety=[], + instructions="Check it", + compare_to="base", + prompt_runtime=PromptRuntime.SEND_TO_STOPPING_AGENT, + ) + assert rule.prompt_runtime == PromptRuntime.SEND_TO_STOPPING_AGENT + + def test_claude_runtime(self) -> None: + """Test claude runtime.""" + rule = Rule( + name="Test Rule", + filename="test-rule", + detection_mode=DetectionMode.TRIGGER_SAFETY, + triggers=["src/**/*"], + safety=[], + instructions="Check it", + compare_to="base", + prompt_runtime=PromptRuntime.CLAUDE, + ) + assert rule.prompt_runtime == PromptRuntime.CLAUDE + + +class TestLoadPromptRuntimeFromFile: + """Tests for loading rules with prompt_runtime from files.""" + + def test_loads_rule_without_prompt_runtime_defaults(self, temp_dir: Path) -> None: + """Test loading a rule without prompt_runtime defaults to send_to_stopping_agent.""" + rules_dir = temp_dir / "rules" + rules_dir.mkdir() + + rule_file = rules_dir / "test-rule.md" + rule_file.write_text( + """--- +name: Test Rule +trigger: "src/**/*" +compare_to: base +--- +Please check the source files. +""" + ) + + rules = load_rules_from_directory(rules_dir) + + assert len(rules) == 1 + assert rules[0].prompt_runtime == PromptRuntime.SEND_TO_STOPPING_AGENT + + def test_loads_rule_with_send_to_stopping_agent_runtime(self, temp_dir: Path) -> None: + """Test loading a rule with explicit send_to_stopping_agent runtime.""" + rules_dir = temp_dir / "rules" + rules_dir.mkdir() + + rule_file = rules_dir / "test-rule.md" + rule_file.write_text( + """--- +name: Test Rule +trigger: "src/**/*" +compare_to: base +prompt_runtime: send_to_stopping_agent +--- +Please check the source files. 
+""" + ) + + rules = load_rules_from_directory(rules_dir) + + assert len(rules) == 1 + assert rules[0].prompt_runtime == PromptRuntime.SEND_TO_STOPPING_AGENT + + def test_loads_rule_with_claude_runtime(self, temp_dir: Path) -> None: + """Test loading a rule with claude runtime.""" + rules_dir = temp_dir / "rules" + rules_dir.mkdir() + + rule_file = rules_dir / "test-rule.md" + rule_file.write_text( + """--- +name: Security Review +trigger: "src/auth/**/*" +compare_to: prompt +prompt_runtime: claude +--- +Review the security-sensitive code for vulnerabilities. +""" + ) + + rules = load_rules_from_directory(rules_dir) + + assert len(rules) == 1 + assert rules[0].name == "Security Review" + assert rules[0].prompt_runtime == PromptRuntime.CLAUDE + assert rules[0].action_type == ActionType.PROMPT + + def test_loads_command_action_rule_with_prompt_runtime(self, temp_dir: Path) -> None: + """Test loading a command action rule with prompt_runtime (ignored for command actions).""" + rules_dir = temp_dir / "rules" + rules_dir.mkdir() + + rule_file = rules_dir / "format-python.md" + rule_file.write_text( + """--- +name: Format Python +trigger: "**/*.py" +action: + command: "ruff format {file}" + run_for: each_match +compare_to: prompt +prompt_runtime: send_to_stopping_agent +--- +""" + ) + + rules = load_rules_from_directory(rules_dir) + + assert len(rules) == 1 + assert rules[0].action_type == ActionType.COMMAND + # prompt_runtime is still parsed even for command actions + assert rules[0].prompt_runtime == PromptRuntime.SEND_TO_STOPPING_AGENT + + def test_invalid_prompt_runtime_raises_error(self, temp_dir: Path) -> None: + """Test that invalid prompt_runtime value raises an error.""" + rules_dir = temp_dir / "rules" + rules_dir.mkdir() + + rule_file = rules_dir / "test-rule.md" + rule_file.write_text( + """--- +name: Test Rule +trigger: "src/**/*" +compare_to: base +prompt_runtime: invalid_value +--- +Please check the source files. +""" + ) + + with pytest.raises(RulesParseError) as exc_info: + load_rules_from_directory(rules_dir) + + # Schema validation catches invalid enum values + error_message = str(exc_info.value) + assert "invalid_value" in error_message + assert "send_to_stopping_agent" in error_message or "prompt_runtime" in error_message + + def test_loads_set_mode_rule_with_claude_runtime(self, temp_dir: Path) -> None: + """Test loading a set mode rule with claude runtime.""" + rules_dir = temp_dir / "rules" + rules_dir.mkdir() + + rule_file = rules_dir / "source-test-pairing.md" + rule_file.write_text( + """--- +name: Source/Test Pairing +set: + - src/{path}.py + - tests/{path}_test.py +compare_to: base +prompt_runtime: claude +--- +Source and test files should change together. +""" + ) + + rules = load_rules_from_directory(rules_dir) + + assert len(rules) == 1 + assert rules[0].detection_mode == DetectionMode.SET + assert rules[0].prompt_runtime == PromptRuntime.CLAUDE + + def test_loads_pair_mode_rule_with_claude_runtime(self, temp_dir: Path) -> None: + """Test loading a pair mode rule with claude runtime.""" + rules_dir = temp_dir / "rules" + rules_dir.mkdir() + + rule_file = rules_dir / "api-docs.md" + rule_file.write_text( + """--- +name: API Documentation +pair: + trigger: src/api/{name}.py + expects: docs/api/{name}.md +compare_to: base +prompt_runtime: claude +--- +API code requires documentation. 
+""" + ) + + rules = load_rules_from_directory(rules_dir) + + assert len(rules) == 1 + assert rules[0].detection_mode == DetectionMode.PAIR + assert rules[0].prompt_runtime == PromptRuntime.CLAUDE + + def test_loads_created_mode_rule_with_claude_runtime(self, temp_dir: Path) -> None: + """Test loading a created mode rule with claude runtime.""" + rules_dir = temp_dir / "rules" + rules_dir.mkdir() + + rule_file = rules_dir / "new-module-review.md" + rule_file.write_text( + """--- +name: New Module Review +created: src/**/*.py +compare_to: prompt +prompt_runtime: claude +--- +Review the new module for best practices. +""" + ) + + rules = load_rules_from_directory(rules_dir) + + assert len(rules) == 1 + assert rules[0].detection_mode == DetectionMode.CREATED + assert rules[0].prompt_runtime == PromptRuntime.CLAUDE From 6118a892f9f831eddd0f53e59d7becb02b100b2e Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 21 Jan 2026 01:02:36 +0000 Subject: [PATCH 03/13] Update readme and architecture accuracy rules to use claude runtime These rules now invoke Claude Code in headless mode to autonomously check and update documentation when source code changes, rather than returning prompts to the triggering agent. --- .deepwork/rules/architecture-documentation-accuracy.md | 4 +++- .deepwork/rules/readme-accuracy.md | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/.deepwork/rules/architecture-documentation-accuracy.md b/.deepwork/rules/architecture-documentation-accuracy.md index 5e77acd1..eafa3c47 100644 --- a/.deepwork/rules/architecture-documentation-accuracy.md +++ b/.deepwork/rules/architecture-documentation-accuracy.md @@ -3,10 +3,12 @@ name: Architecture Documentation Accuracy trigger: src/**/* safety: doc/architecture.md compare_to: base -prompt_runtime: send_to_stopping_agent +prompt_runtime: claude --- Source code in src/ has been modified. Please review doc/architecture.md for accuracy: 1. Verify the documented architecture matches the current implementation 2. Check that file paths and directory structures are still correct 3. Ensure component descriptions reflect actual behavior 4. Update any diagrams or flows that may have changed + +If the architecture documentation needs updates, make the changes directly. If the documentation is accurate, confirm it matches the current implementation. diff --git a/.deepwork/rules/readme-accuracy.md b/.deepwork/rules/readme-accuracy.md index ccc1218f..e04672a3 100644 --- a/.deepwork/rules/readme-accuracy.md +++ b/.deepwork/rules/readme-accuracy.md @@ -3,10 +3,12 @@ name: README Accuracy trigger: src/**/* safety: README.md compare_to: base -prompt_runtime: send_to_stopping_agent +prompt_runtime: claude --- Source code in src/ has been modified. Please review README.md for accuracy: 1. Verify project overview still reflects current functionality 2. Check that usage examples are still correct 3. Ensure installation/setup instructions remain valid 4. Update any sections that reference changed code + +If the README needs updates, make the changes directly. If the README is accurate, confirm it matches the current implementation. From ab89730acb369e49ff8513e877f2ad052f73063d Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 21 Jan 2026 01:03:50 +0000 Subject: [PATCH 04/13] Add manual test for claude runtime feature Creates test_claude_runtime/ directory with a simple Python file that triggers the new prompt_runtime: claude feature. When edited, the rule invokes Claude Code in headless mode to review the code changes. 
Updates manual tests README with test documentation. --- .deepwork/rules/manual-test-claude-runtime.md | 27 +++++++++++++++++++ manual_tests/README.md | 4 +++ .../test_claude_runtime_code.py | 27 +++++++++++++++++++ 3 files changed, 58 insertions(+) create mode 100644 .deepwork/rules/manual-test-claude-runtime.md create mode 100644 manual_tests/test_claude_runtime/test_claude_runtime_code.py diff --git a/.deepwork/rules/manual-test-claude-runtime.md b/.deepwork/rules/manual-test-claude-runtime.md new file mode 100644 index 00000000..c58c4b14 --- /dev/null +++ b/.deepwork/rules/manual-test-claude-runtime.md @@ -0,0 +1,27 @@ +--- +name: "Manual Test: Claude Runtime" +trigger: manual_tests/test_claude_runtime/test_claude_runtime_code.py +compare_to: prompt +prompt_runtime: claude +--- + +# Manual Test: Claude Runtime + +You are evaluating code changes as part of an automated rule check. + +**Review the code in the trigger file for:** +1. Basic code quality (clear variable names, proper structure) +2. Presence of docstrings or comments +3. No obvious bugs or issues + +**This is a test rule.** For testing purposes: +- If the code looks reasonable, respond with `allow` +- If there are obvious issues (syntax errors, missing functions, etc.), respond with `block` + +Since this is a manual test, the code is intentionally simple and should pass review. + +## This tests: + +The `prompt_runtime: claude` feature where instead of returning the prompt to +the triggering agent, Claude Code is invoked in headless mode to process +the rule autonomously. diff --git a/manual_tests/README.md b/manual_tests/README.md index 42569421..213eebb2 100644 --- a/manual_tests/README.md +++ b/manual_tests/README.md @@ -37,6 +37,7 @@ Each test has two cases: one where the rule SHOULD fire, and one where it should | **Infinite Block Prompt** | Edit `.py` (always blocks) | Provide `` tag | Manual Test: Infinite Block Prompt | | **Infinite Block Command** | Edit `.py` (command fails) | Provide `` tag | Manual Test: Infinite Block Command | | **Created Mode** | Create NEW `.yml` file | Modify EXISTING `.yml` file | Manual Test: Created Mode | +| **Claude Runtime** | Edit `.py` → Claude invoked | Claude returns `allow` | Manual Test: Claude Runtime | ## Test Results Tracking @@ -51,6 +52,7 @@ Each test has two cases: one where the rule SHOULD fire, and one where it should | Infinite Block Prompt | ☐ | ☐ | | Infinite Block Command | ☐ | ☐ | | Created Mode | ☐ | ☐ | +| Claude Runtime | ☐ | ☐ | ## Test Folders @@ -64,6 +66,7 @@ Each test has two cases: one where the rule SHOULD fire, and one where it should | `test_infinite_block_prompt/` | Infinite Block (Prompt) | Always blocks with prompt; only promise can bypass | | `test_infinite_block_command/` | Infinite Block (Command) | Command always fails; tests if promise skips command | | `test_created_mode/` | Created (New Files Only) | Fires ONLY when NEW files are created, not when existing modified | +| `test_claude_runtime/` | Claude Runtime | Invokes Claude Code in headless mode instead of returning prompt | ## Corresponding Rules @@ -76,3 +79,4 @@ Rules are defined in `.deepwork/rules/`: - `manual-test-infinite-block-prompt.md` - `manual-test-infinite-block-command.md` - `manual-test-created-mode.md` +- `manual-test-claude-runtime.md` diff --git a/manual_tests/test_claude_runtime/test_claude_runtime_code.py b/manual_tests/test_claude_runtime/test_claude_runtime_code.py new file mode 100644 index 00000000..ef0cd682 --- /dev/null +++ 
b/manual_tests/test_claude_runtime/test_claude_runtime_code.py @@ -0,0 +1,27 @@ +# Manual Test: Claude Runtime +# This file triggers a rule that uses the 'claude' prompt_runtime. +# +# When this file is edited, the rule should: +# 1. Invoke Claude Code in headless mode +# 2. Claude reviews the code and responds with a structured result +# 3. The hook parses Claude's response (block/allow) +# +# To test: +# 1. Edit this file (e.g., add a comment or change the function) +# 2. Run the rules_check hook +# 3. Verify Claude is invoked and returns a structured response + + +def calculate_sum(numbers: list[int]) -> int: + """Calculate the sum of a list of numbers.""" + total = 0 + for num in numbers: + total += num + return total + + +def calculate_average(numbers: list[int]) -> float: + """Calculate the average of a list of numbers.""" + if not numbers: + return 0.0 + return calculate_sum(numbers) / len(numbers) From a9e774f897c4624aeb686d0137caacd66defb33d Mon Sep 17 00:00:00 2001 From: Noah Horton Date: Tue, 20 Jan 2026 18:30:43 -0700 Subject: [PATCH 05/13] Add documentation and version bump for prompt_runtime feature - Bump version to 0.5.0 - Add CHANGELOG entry for prompt_runtime setting - Document prompt_runtime in README and architecture docs - Add pytest and gitpython as dev dependencies Co-Authored-By: Claude Opus 4.5 --- CHANGELOG.md | 11 +++++++ README.md | 15 ++++++++++ doc/architecture.md | 70 +++++++++++++++++++++++++++++++++++++++------ pyproject.toml | 8 +++++- uv.lock | 14 ++++++++- 5 files changed, 107 insertions(+), 11 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index fabeddbc..2280119d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,6 +5,17 @@ All notable changes to DeepWork will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [0.5.0] - 2026-01-20 + +### Added +- `prompt_runtime` setting for rules to control how prompt-type actions are executed + - `send_to_stopping_agent` (default): Returns prompt to the agent that triggered the rule + - `claude`: Invokes Claude Code in headless mode to handle the rule independently +- Claude headless mode execution for automated rule remediation + - Rules with `prompt_runtime: claude` spawn a separate Claude process + - Claude performs required actions and returns structured `block`/`allow` decision + - Useful for automated tasks like documentation updates without blocking the main agent + ## [0.4.0] - 2026-01-20 ### Added diff --git a/README.md b/README.md index 96d7e740..3501cbf6 100644 --- a/README.md +++ b/README.md @@ -282,6 +282,21 @@ compare_to: prompt --- ``` +**Example Rule with Claude Runtime** (`.deepwork/rules/readme-accuracy.md`): +```markdown +--- +name: README Accuracy +trigger: "src/**/*.py" +compare_to: prompt +prompt_runtime: claude +--- +Source code has been modified. Review README.md for accuracy and update if needed. +``` + +The `prompt_runtime` setting controls how prompt-based rules are executed: +- `send_to_stopping_agent` (default): Returns the rule prompt to the agent that triggered it +- `claude`: Invokes Claude Code in headless mode to evaluate the rule independently + ### Multi-Platform Support Generate native commands and skills tailored for your AI coding assistant. - **Native Integration**: Works directly with the skill/command formats of supported agents. 
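Returning to the `prompt_runtime` setting documented above: a simplified sketch of how the hook routes a fired prompt rule (not the verbatim hook code; a later patch in this series extends `invoke_claude_headless` with a fallback value):

```python
from deepwork.core.rules_parser import PromptRuntime, RuleEvaluationResult
from deepwork.hooks.rules_check import format_claude_prompt, invoke_claude_headless


def route_prompt_rule(result: RuleEvaluationResult, transcript_path: str | None) -> str | None:
    """Return a message for the stopping agent, or None if the rule was resolved."""
    if result.rule.prompt_runtime == PromptRuntime.CLAUDE:
        prompt = format_claude_prompt(result, transcript_path)
        decision, reason = invoke_claude_headless(prompt, result.rule.name)
        # "allow" means the headless Claude run satisfied the rule.
        return None if decision == "allow" else f"## {result.rule.name}\n{reason}"
    # Default runtime: hand the rule's markdown body back to the agent.
    return result.rule.instructions
```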
diff --git a/doc/architecture.md b/doc/architecture.md index f11d84db..f709eb60 100644 --- a/doc/architecture.md +++ b/doc/architecture.md @@ -1043,7 +1043,7 @@ Please create or update tests for the modified source files. ### Detection Modes -Rules support three detection modes: +Rules support four detection modes: **1. Trigger/Safety (default)** - Fire when trigger matches but safety doesn't: ```yaml @@ -1078,6 +1078,16 @@ compare_to: base --- ``` +**4. Created** - Fire when newly created files match patterns: +```yaml +--- +name: New Component Checklist +created: "src/components/**/*.tsx" +compare_to: base +--- +``` +This mode triggers only for files that are newly created (not modified), useful for enforcing standards on new files. + ### Action Types **1. Prompt (default)** - Show instructions to the agent: @@ -1102,6 +1112,42 @@ compare_to: prompt --- ``` +### Prompt Runtime + +For prompt-type actions, you can specify how the prompt is delivered using the `prompt_runtime` setting: + +**1. send_to_stopping_agent (default)** - Return the prompt to the agent that triggered the rule: +```yaml +--- +name: Security Review +trigger: "src/auth/**/*" +compare_to: base +prompt_runtime: send_to_stopping_agent +--- +Please check for hardcoded credentials and validate input. +``` + +**2. claude** - Invoke Claude Code in headless mode to handle the rule: +```yaml +--- +name: Architecture Documentation Accuracy +trigger: "src/deepwork/core/**/*.py" +safety: "doc/architecture.md" +compare_to: base +prompt_runtime: claude +--- +Review doc/architecture.md for accuracy against the current implementation. +``` + +When `prompt_runtime: claude` is set, the rule evaluation: +1. Spawns a separate Claude Code process in headless mode +2. Passes the rule instructions as a prompt +3. Claude performs the required actions (e.g., updating documentation) +4. Returns a structured `block` or `allow` decision +5. If `allow`, the rule is marked as passed without blocking the original agent + +This is useful for automated remediation tasks that don't require user interaction. + ### Rule Evaluation Flow 1. **Session Start**: When a Claude Code session begins, the baseline git state is captured @@ -1289,15 +1335,21 @@ See `doc/doc-specs.md` for complete documentation. ### Rule Schema -Rules are validated against a JSON Schema: +Rules are validated against a JSON Schema. 
The frontmatter supports these fields: -```yaml -- name: string # Required: Friendly name for the rule - trigger: string|array # Required: Glob pattern(s) for triggering files - safety: string|array # Optional: Glob pattern(s) for safety files - instructions: string # Required (unless instructions_file): What to do - instructions_file: string # Alternative: Path to instructions file -``` +| Field | Required | Description | +|-------|----------|-------------| +| `name` | Yes | Human-friendly name for the rule (displayed in promise tags) | +| `compare_to` | Yes | Baseline for detecting file changes: `base`, `default_tip`, or `prompt` | +| `trigger` | One mode required | Glob pattern(s) for triggering files (trigger/safety mode) | +| `safety` | No | Glob pattern(s) that suppress the rule if changed | +| `set` | One mode required | Array of patterns for bidirectional correspondence | +| `pair` | One mode required | Object with `trigger` and `expects` for directional correspondence | +| `created` | One mode required | Glob pattern(s) for newly created files | +| `action` | No | Object with `command` and optional `run_for` for command actions | +| `prompt_runtime` | No | `send_to_stopping_agent` (default) or `claude` for headless execution | + +The markdown body after the frontmatter contains the instructions for prompt-type rules. ### Defining Rules diff --git a/pyproject.toml b/pyproject.toml index d84e3edb..2e3f5cd4 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,6 +1,6 @@ [project] name = "deepwork" -version = "0.4.0" +version = "0.5.0" description = "Framework for enabling AI agents to perform complex, multi-step work tasks" readme = "README.md" requires-python = ">=3.11" @@ -108,3 +108,9 @@ warn_redundant_casts = true warn_unused_ignores = true warn_no_return = true strict_equality = true + +[dependency-groups] +dev = [ + "gitpython>=3.1.46", + "pytest>=9.0.2", +] diff --git a/uv.lock b/uv.lock index cd4110a3..06af268e 100644 --- a/uv.lock +++ b/uv.lock @@ -126,7 +126,7 @@ toml = [ [[package]] name = "deepwork" -version = "0.4.0" +version = "0.5.0" source = { editable = "." } dependencies = [ { name = "click" }, @@ -147,6 +147,12 @@ dev = [ { name = "types-pyyaml" }, ] +[package.dev-dependencies] +dev = [ + { name = "gitpython" }, + { name = "pytest" }, +] + [package.metadata] requires-dist = [ { name = "click", specifier = ">=8.1.0" }, @@ -164,6 +170,12 @@ requires-dist = [ ] provides-extras = ["dev"] +[package.metadata.requires-dev] +dev = [ + { name = "gitpython", specifier = ">=3.1.46" }, + { name = "pytest", specifier = ">=9.0.2" }, +] + [[package]] name = "gitdb" version = "4.0.12" From aa84c2cd8dc82adee6cbbe54873aeb9301a20575 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 21 Jan 2026 01:36:01 +0000 Subject: [PATCH 06/13] Remove unnecessary prompt_runtime from command action rules The prompt_runtime setting only applies to prompt actions, not command actions. Removed it from: - manual-test-command-action.md - manual-test-infinite-block-command.md - uv-lock-sync.md The parser still accepts prompt_runtime on command rules (for backwards compatibility) but ignores it since command rules don't use prompts. 
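A sketch of the parser behavior this commit relies on, mirroring the unit test added in patch 02 — a command rule may still declare `prompt_runtime`, which parses but has no effect:

```python
import tempfile
from pathlib import Path

from deepwork.core.rules_parser import (
    ActionType,
    PromptRuntime,
    load_rules_from_directory,
)

rule_md = """---
name: Format Python
trigger: "**/*.py"
action:
  command: "ruff format {file}"
  run_for: each_match
compare_to: prompt
prompt_runtime: send_to_stopping_agent
---
"""

with tempfile.TemporaryDirectory() as d:
    rules_dir = Path(d)
    (rules_dir / "format-python.md").write_text(rule_md)
    [rule] = load_rules_from_directory(rules_dir)

assert rule.action_type == ActionType.COMMAND
# Accepted for backwards compatibility, but ignored for command actions:
assert rule.prompt_runtime == PromptRuntime.SEND_TO_STOPPING_AGENT
```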
---
 .deepwork/rules/manual-test-command-action.md         | 1 -
 .deepwork/rules/manual-test-infinite-block-command.md | 1 -
 .deepwork/rules/uv-lock-sync.md                       | 1 -
 3 files changed, 3 deletions(-)

diff --git a/.deepwork/rules/manual-test-command-action.md b/.deepwork/rules/manual-test-command-action.md
index 31f1992a..966ab2de 100644
--- a/.deepwork/rules/manual-test-command-action.md
+++ b/.deepwork/rules/manual-test-command-action.md
@@ -5,7 +5,6 @@ action:
   command: echo "$(date '+%Y-%m-%d %H:%M:%S') - Command triggered by edit to {file}" >> manual_tests/test_command_action/test_command_action_log.txt
   run_for: each_match
 compare_to: prompt
-prompt_runtime: send_to_stopping_agent
 ---

 # Manual Test: Command Action
diff --git a/.deepwork/rules/manual-test-infinite-block-command.md b/.deepwork/rules/manual-test-infinite-block-command.md
index 85438e96..8f8b24b4 100644
--- a/.deepwork/rules/manual-test-infinite-block-command.md
+++ b/.deepwork/rules/manual-test-infinite-block-command.md
@@ -5,7 +5,6 @@ action:
   command: "false"
   run_for: each_match
 compare_to: prompt
-prompt_runtime: send_to_stopping_agent
 ---

 # Manual Test: Infinite Block Command (Promise Required)
diff --git a/.deepwork/rules/uv-lock-sync.md b/.deepwork/rules/uv-lock-sync.md
index 1d5279eb..75cca269 100644
--- a/.deepwork/rules/uv-lock-sync.md
+++ b/.deepwork/rules/uv-lock-sync.md
@@ -4,7 +4,6 @@ trigger: pyproject.toml
 action:
   command: uv sync
 compare_to: prompt
-prompt_runtime: send_to_stopping_agent
 ---

 # UV Lock Sync

From 24df91539a4533b5d99cada4d31f5dfa0e93c531 Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 21 Jan 2026 02:24:41 +0000
Subject: [PATCH 07/13] Add fallback for claude runtime in Claude Code Web environment

When CLAUDE_CODE_REMOTE=true, the claude command cannot be executed.
Instead of hanging, the system now returns the prompt to the agent with
instructions to evaluate it manually via a sub-agent.

Changes:
- Add is_claude_code_remote() function to detect remote environment
- Update invoke_claude_headless() to return fallback prompt when remote
- Add "Rules Requiring Sub-Agent Evaluation" section in hook output
- Add tests for the new fallback behavior
---
 src/deepwork/hooks/rules_check.py | 51 +++++++++++++++++++----
 tests/unit/test_rules_check.py    | 68 +++++++++++++++++++++++++++++++
 2 files changed, 110 insertions(+), 9 deletions(-)

diff --git a/src/deepwork/hooks/rules_check.py b/src/deepwork/hooks/rules_check.py
index dd13ab5b..ec8ddbd5 100644
--- a/src/deepwork/hooks/rules_check.py
+++ b/src/deepwork/hooks/rules_check.py
@@ -626,7 +626,12 @@ def parse_claude_response(output: str) -> tuple[str, str]:
     return "block", "Claude did not return a structured response"


-def invoke_claude_headless(prompt: str, rule_name: str) -> tuple[str, str]:
+def is_claude_code_remote() -> bool:
+    """Check if running in Claude Code Web/Remote environment."""
+    return os.environ.get("CLAUDE_CODE_REMOTE", "").lower() == "true"
+
+
+def invoke_claude_headless(prompt: str, rule_name: str) -> tuple[str, str, str | None]:
     """
     Invoke Claude Code in headless mode with the given prompt.
@@ -635,8 +640,20 @@ def invoke_claude_headless(prompt: str, rule_name: str) -> tuple[str, str]:
         rule_name: Name of the rule being evaluated (for error messages)

     Returns:
-        Tuple of (decision, reason) where decision is "block" or "allow"
+        Tuple of (decision, reason, fallback_prompt) where:
+        - decision is "block" or "allow"
+        - reason is the explanation
+        - fallback_prompt is the prompt to show to the agent if Claude can't run (or None)
     """
+    # Check if we're in Claude Code Web/Remote environment
+    if is_claude_code_remote():
+        fallback_msg = (
+            "**Cannot run `claude` command in Claude Code Web environment.**\n\n"
+            "Please evaluate the following rule in a sub-agent:\n\n"
+            f"---\n{prompt}\n---"
+        )
+        return "block", f"Rule '{rule_name}' requires manual evaluation", fallback_msg
+
     try:
         # Run claude in headless mode with --print flag to get output
         result = subprocess.run(
@@ -649,17 +666,22 @@ def invoke_claude_headless(prompt: str, rule_name: str) -> tuple[str, str]:

         if result.returncode != 0:
             error_msg = result.stderr.strip() or "Unknown error"
-            return "block", f"Claude execution failed: {error_msg}"
+            return "block", f"Claude execution failed: {error_msg}", None

         output = result.stdout.strip()
-        return parse_claude_response(output)
+        decision, reason = parse_claude_response(output)
+        return decision, reason, None

     except subprocess.TimeoutExpired:
-        return "block", f"Claude timed out while processing rule '{rule_name}'"
+        return "block", f"Claude timed out while processing rule '{rule_name}'", None
     except FileNotFoundError:
-        return "block", "Claude CLI not found. Please ensure 'claude' is installed and in PATH"
+        return (
+            "block",
+            "Claude CLI not found. Please ensure 'claude' is installed and in PATH",
+            None,
+        )
     except Exception as e:
-        return "block", f"Error invoking Claude: {str(e)}"
+        return "block", f"Error invoking Claude: {str(e)}", None


 def rules_check_hook(hook_input: HookInput) -> HookOutput:
@@ -805,6 +827,7 @@ def rules_check_hook(hook_input: HookInput) -> HookOutput:

     # Process Claude runtime rules
     claude_errors: list[str] = []
+    claude_fallback_prompts: list[str] = []

     for result in claude_prompt_results:
         rule = result.rule
@@ -818,9 +841,13 @@ def rules_check_hook(hook_input: HookInput) -> HookOutput:

         # Invoke Claude in headless mode
         prompt = format_claude_prompt(result, hook_input.transcript_path)
-        decision, reason = invoke_claude_headless(prompt, rule.name)
+        decision, reason, fallback_prompt = invoke_claude_headless(prompt, rule.name)

-        if decision == "allow":
+        if fallback_prompt:
+            # Claude can't run in this environment, return the prompt to the agent
+            claude_fallback_prompts.append(f"## {rule.name}\n\n{fallback_prompt}\n")
+            # Don't update queue status - let the agent handle it
+        elif decision == "allow":
             # Claude resolved the issue
             queue.update_status(
                 trigger_hash,
@@ -861,6 +888,12 @@ def rules_check_hook(hook_input: HookInput) -> HookOutput:
         messages.extend(claude_errors)
         messages.append("")

+    # Add Claude fallback prompts (when Claude can't run in this environment)
+    if claude_fallback_prompts:
+        messages.append("## Rules Requiring Sub-Agent Evaluation\n")
+        messages.extend(claude_fallback_prompts)
+        messages.append("")
+
     # Add prompt rules if any (send_to_stopping_agent runtime)
     if agent_prompt_results:
         messages.append(format_rules_message(agent_prompt_results))
diff --git a/tests/unit/test_rules_check.py b/tests/unit/test_rules_check.py
index 34d2852f..9cf907e3 100644
--- a/tests/unit/test_rules_check.py
+++ b/tests/unit/test_rules_check.py
@@ -1,5 +1,8 @@
 """Tests for rules_check hook module."""

+import os
+from unittest.mock import patch
+
 from deepwork.core.rules_parser import (
     DetectionMode,
     PairConfig,
@@ -10,6 +13,8 @@ from deepwork.hooks.rules_check import (
     extract_promise_tags,
     format_claude_prompt,
+    invoke_claude_headless,
+    is_claude_code_remote,
     parse_claude_response,
 )
@@ -412,3 +417,66 @@ def test_parses_result_embedded_in_longer_output(self) -> None:

         assert decision == "allow"
         assert "passes security review" in reason
+
+
+class TestIsClaudeCodeRemote:
+    """Tests for is_claude_code_remote function."""
+
+    def test_returns_true_when_env_var_is_true(self) -> None:
+        """Test returns True when CLAUDE_CODE_REMOTE=true."""
+        with patch.dict(os.environ, {"CLAUDE_CODE_REMOTE": "true"}):
+            assert is_claude_code_remote() is True
+
+    def test_returns_true_when_env_var_is_TRUE(self) -> None:
+        """Test returns True when CLAUDE_CODE_REMOTE=TRUE (case insensitive)."""
+        with patch.dict(os.environ, {"CLAUDE_CODE_REMOTE": "TRUE"}):
+            assert is_claude_code_remote() is True
+
+    def test_returns_false_when_env_var_is_false(self) -> None:
+        """Test returns False when CLAUDE_CODE_REMOTE=false."""
+        with patch.dict(os.environ, {"CLAUDE_CODE_REMOTE": "false"}):
+            assert is_claude_code_remote() is False
+
+    def test_returns_false_when_env_var_not_set(self) -> None:
+        """Test returns False when CLAUDE_CODE_REMOTE is not set."""
+        env = os.environ.copy()
+        env.pop("CLAUDE_CODE_REMOTE", None)
+        with patch.dict(os.environ, env, clear=True):
+            assert is_claude_code_remote() is False
+
+    def test_returns_false_when_env_var_is_empty(self) -> None:
+        """Test returns False when CLAUDE_CODE_REMOTE is empty."""
+        with patch.dict(os.environ, {"CLAUDE_CODE_REMOTE": ""}):
+            assert is_claude_code_remote() is False
+
+
+class TestInvokeClaudeHeadlessFallback:
+    """Tests for invoke_claude_headless fallback behavior in remote environments."""
+
+    def test_returns_fallback_prompt_in_remote_environment(self) -> None:
+        """Test returns fallback prompt when in Claude Code Remote environment."""
+        with patch.dict(os.environ, {"CLAUDE_CODE_REMOTE": "true"}):
+            decision, reason, fallback = invoke_claude_headless(
+                "Test prompt content", "Test Rule"
+            )
+
+        assert decision == "block"
+        assert "manual evaluation" in reason
+        assert fallback is not None
+        assert "Cannot run `claude` command" in fallback
+        assert "Claude Code Web" in fallback
+        assert "Test prompt content" in fallback
+
+    def test_no_fallback_in_local_environment(self) -> None:
+        """Test no fallback when not in remote environment (but may fail if claude not installed)."""
+        with patch.dict(os.environ, {"CLAUDE_CODE_REMOTE": "false"}):
+            # Mock subprocess to simulate claude not found
+            with patch("deepwork.hooks.rules_check.subprocess.run") as mock_run:
+                mock_run.side_effect = FileNotFoundError()
+                decision, reason, fallback = invoke_claude_headless(
+                    "Test prompt", "Test Rule"
+                )
+
+        assert decision == "block"
+        assert "not found" in reason
+        assert fallback is None  # No fallback, actual error

From 2cfd2b53efd3c7aebf2bca8c2e70a0d4 Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 21 Jan 2026 02:34:26 +0000
Subject: [PATCH 08/13] Update claude runtime test instructions

Updated instructions to explain that testers should introduce a blatant
error (e.g., division by zero) to verify Claude detects and blocks it.
Also noted the fallback behavior in Claude Code Web environments.
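To make the fallback path concrete, here is a usage sketch mirroring the unit tests added in PATCH 07; it assumes the deepwork package is importable and relies only on behaviour shown in this series (remote detection short-circuits before any subprocess call).

```python
import os

from deepwork.hooks.rules_check import invoke_claude_headless

# Simulate the Claude Code Web environment described in PATCH 07.
os.environ["CLAUDE_CODE_REMOTE"] = "true"

decision, reason, fallback = invoke_claude_headless(
    "Review the diff for obvious bugs", "Demo Rule"
)

assert decision == "block"            # remote mode never auto-allows
assert "manual evaluation" in reason  # reason names the rule needing review
assert "Review the diff for obvious bugs" in fallback  # prompt is embedded
```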
---
 .../test_claude_runtime/test_claude_runtime_code.py | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/manual_tests/test_claude_runtime/test_claude_runtime_code.py b/manual_tests/test_claude_runtime/test_claude_runtime_code.py
index ef0cd682..11208126 100644
--- a/manual_tests/test_claude_runtime/test_claude_runtime_code.py
+++ b/manual_tests/test_claude_runtime/test_claude_runtime_code.py
@@ -7,9 +7,11 @@
 # 3. The hook parses Claude's response (block/allow)
 #
 # To test:
-# 1. Edit this file (e.g., add a comment or change the function)
-# 2. Run the rules_check hook
-# 3. Verify Claude is invoked and returns a structured response
+# 1. Introduce a BLATANT ERROR in the code below (e.g., undefined variable,
+#    obvious bug like dividing by zero, or completely wrong logic)
+# 2. The Claude runtime will invoke Claude Code in headless mode
+# 3. Claude should detect the error and return a "block" response
+# 4. In Claude Code Web environment, you'll see the fallback prompt instead


 def calculate_sum(numbers: list[int]) -> int:

From 10b40d3ea79456acf7a1047b4e1fe01ffdb4ffec Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 21 Jan 2026 02:37:30 +0000
Subject: [PATCH 09/13] Add comprehensive tests for invoke_claude_headless execution

Tests cover:
- Successful allow/block decisions
- Non-zero exit code handling
- Subprocess timeout handling
- Generic exception handling
- Correct command-line arguments
---
 tests/unit/test_rules_check.py | 129 +++++++++++++++++++++++++++++++++
 1 file changed, 129 insertions(+)

diff --git a/tests/unit/test_rules_check.py b/tests/unit/test_rules_check.py
index 9cf907e3..78c7233f 100644
--- a/tests/unit/test_rules_check.py
+++ b/tests/unit/test_rules_check.py
@@ -480,3 +480,132 @@ def test_no_fallback_in_local_environment(self) -> None:
         assert decision == "block"
         assert "not found" in reason
         assert fallback is None  # No fallback, actual error
+
+
+class TestInvokeClaudeHeadlessExecution:
+    """Tests for invoke_claude_headless subprocess execution."""
+
+    def test_successful_allow_decision(self) -> None:
+        """Test successful execution with allow decision."""
+        mock_output = """
+I've reviewed the code.
+
+---RULE_RESULT---
+decision: allow
+reason: Code looks good
+---END_RULE_RESULT---
+"""
+        with patch.dict(os.environ, {"CLAUDE_CODE_REMOTE": "false"}):
+            with patch("deepwork.hooks.rules_check.subprocess.run") as mock_run:
+                mock_run.return_value.returncode = 0
+                mock_run.return_value.stdout = mock_output
+                mock_run.return_value.stderr = ""
+
+                decision, reason, fallback = invoke_claude_headless(
+                    "Test prompt", "Test Rule"
+                )
+
+        assert decision == "allow"
+        assert reason == "Code looks good"
+        assert fallback is None
+
+    def test_successful_block_decision(self) -> None:
+        """Test successful execution with block decision."""
+        mock_output = """
+Found issues in the code.
+
+---RULE_RESULT---
+decision: block
+reason: Security vulnerability detected
+---END_RULE_RESULT---
+"""
+        with patch.dict(os.environ, {"CLAUDE_CODE_REMOTE": "false"}):
+            with patch("deepwork.hooks.rules_check.subprocess.run") as mock_run:
+                mock_run.return_value.returncode = 0
+                mock_run.return_value.stdout = mock_output
+                mock_run.return_value.stderr = ""
+
+                decision, reason, fallback = invoke_claude_headless(
+                    "Test prompt", "Test Rule"
+                )
+
+        assert decision == "block"
+        assert reason == "Security vulnerability detected"
+        assert fallback is None
+
+    def test_nonzero_exit_code(self) -> None:
+        """Test handling of non-zero exit code from Claude."""
+        with patch.dict(os.environ, {"CLAUDE_CODE_REMOTE": "false"}):
+            with patch("deepwork.hooks.rules_check.subprocess.run") as mock_run:
+                mock_run.return_value.returncode = 1
+                mock_run.return_value.stdout = ""
+                mock_run.return_value.stderr = "API rate limit exceeded"
+
+                decision, reason, fallback = invoke_claude_headless(
+                    "Test prompt", "Test Rule"
+                )
+
+        assert decision == "block"
+        assert "execution failed" in reason
+        assert "rate limit" in reason
+        assert fallback is None
+
+    def test_timeout_handling(self) -> None:
+        """Test handling of subprocess timeout."""
+        import subprocess
+
+        with patch.dict(os.environ, {"CLAUDE_CODE_REMOTE": "false"}):
+            with patch("deepwork.hooks.rules_check.subprocess.run") as mock_run:
+                mock_run.side_effect = subprocess.TimeoutExpired(
+                    cmd=["claude"], timeout=300
+                )
+
+                decision, reason, fallback = invoke_claude_headless(
+                    "Test prompt", "Test Rule"
+                )
+
+        assert decision == "block"
+        assert "timed out" in reason
+        assert "Test Rule" in reason
+        assert fallback is None
+
+    def test_generic_exception_handling(self) -> None:
+        """Test handling of generic exceptions."""
+        with patch.dict(os.environ, {"CLAUDE_CODE_REMOTE": "false"}):
+            with patch("deepwork.hooks.rules_check.subprocess.run") as mock_run:
+                mock_run.side_effect = OSError("Permission denied")
+
+                decision, reason, fallback = invoke_claude_headless(
+                    "Test prompt", "Test Rule"
+                )
+
+        assert decision == "block"
+        assert "Error invoking Claude" in reason
+        assert "Permission denied" in reason
+        assert fallback is None
+
+    def test_calls_claude_with_correct_arguments(self) -> None:
+        """Test that Claude is called with the correct command-line arguments."""
+        mock_output = """
+---RULE_RESULT---
+decision: allow
+reason: OK
+---END_RULE_RESULT---
+"""
+        with patch.dict(os.environ, {"CLAUDE_CODE_REMOTE": "false"}):
+            with patch("deepwork.hooks.rules_check.subprocess.run") as mock_run:
+                mock_run.return_value.returncode = 0
+                mock_run.return_value.stdout = mock_output
+                mock_run.return_value.stderr = ""
+
+                invoke_claude_headless("My test prompt", "Test Rule")
+
+                mock_run.assert_called_once()
+                call_args = mock_run.call_args
+                cmd = call_args[0][0]
+
+                assert cmd[0] == "claude"
+                assert "--print" in cmd
+                assert "--dangerously-skip-permissions" in cmd
+                assert "-p" in cmd
+                assert "My test prompt" in cmd

From 7f4b96be59bdadb76dbfd30f7ec407a3e02c06a0 Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 21 Jan 2026 02:55:56 +0000
Subject: [PATCH 10/13] Add SubagentStop hook support for Claude Code

Claude Code now has separate Stop and SubagentStop events. Previously,
Stop would trigger for both the main agent and subagents. This change
ensures that when a Stop hook is defined, it is also registered for
SubagentStop so the same validation logic triggers for both events.
Changes:
- generator.py: Duplicate Stop hooks to SubagentStop in skill templates
- hooks_syncer.py: Duplicate Stop hooks to SubagentStop in global hooks
- Add tests verifying SubagentStop hook registration
---
 src/deepwork/core/generator.py    |  6 ++++
 src/deepwork/core/hooks_syncer.py | 11 ++++++
 tests/unit/test_hooks_syncer.py   | 55 ++++++++++++++++++++++++++++-
 tests/unit/test_stop_hooks.py     | 57 +++++++++++++++++++++++++++++++
 4 files changed, 128 insertions(+), 1 deletion(-)

diff --git a/src/deepwork/core/generator.py b/src/deepwork/core/generator.py
index 77eede57..2ee109f9 100644
--- a/src/deepwork/core/generator.py
+++ b/src/deepwork/core/generator.py
@@ -215,6 +215,12 @@ def _build_step_context(
         if hook_contexts:
             hooks[platform_event_name] = hook_contexts

+    # Claude Code has separate Stop and SubagentStop events. When a Stop hook
+    # is defined, also register it for SubagentStop so it triggers for both
+    # the main agent and subagents.
+    if "Stop" in hooks:
+        hooks["SubagentStop"] = hooks["Stop"]
+
     # Backward compatibility: stop_hooks is after_agent hooks
     stop_hooks = hooks.get(
         adapter.get_platform_hook_name(SkillLifecycleHook.AFTER_AGENT) or "Stop", []
diff --git a/src/deepwork/core/hooks_syncer.py b/src/deepwork/core/hooks_syncer.py
index 5df2e74f..4a97cbd7 100644
--- a/src/deepwork/core/hooks_syncer.py
+++ b/src/deepwork/core/hooks_syncer.py
@@ -187,6 +187,17 @@ def merge_hooks_for_platform(
             if not _hook_already_present(merged[event], command):
                 merged[event].append(hook_config)

+    # Claude Code has separate Stop and SubagentStop events. When a Stop hook
+    # is defined, also register it for SubagentStop so it triggers for both
+    # the main agent and subagents.
+    if "Stop" in merged:
+        if "SubagentStop" not in merged:
+            merged["SubagentStop"] = []
+        for hook_config in merged["Stop"]:
+            command = hook_config.get("hooks", [{}])[0].get("command", "")
+            if not _hook_already_present(merged["SubagentStop"], command):
+                merged["SubagentStop"].append(hook_config)
+
     return merged

diff --git a/tests/unit/test_hooks_syncer.py b/tests/unit/test_hooks_syncer.py
index abaca222..79527681 100644
--- a/tests/unit/test_hooks_syncer.py
+++ b/tests/unit/test_hooks_syncer.py
@@ -224,6 +224,57 @@ def test_avoids_duplicate_hooks(self, temp_dir: Path) -> None:
         # Should only have one entry
         assert len(result["Stop"]) == 1

+    def test_duplicates_stop_hooks_to_subagent_stop(self, temp_dir: Path) -> None:
+        """Test that Stop hooks are also registered for SubagentStop event.
+
+        Claude Code has separate Stop and SubagentStop events. When a Stop hook
+        is defined, it should also be registered for SubagentStop so the hook
+        triggers for both the main agent and subagents.
+        """
+        job_dir = temp_dir / ".deepwork" / "jobs" / "job1"
+        job_dir.mkdir(parents=True)
+
+        job_hooks_list = [
+            JobHooks(
+                job_name="job1",
+                job_dir=job_dir,
+                hooks={"Stop": [HookSpec(script="hook.sh")]},
+            ),
+        ]
+
+        result = merge_hooks_for_platform(job_hooks_list, temp_dir)
+
+        # Should have both Stop and SubagentStop events
+        assert "Stop" in result
+        assert "SubagentStop" in result
+        assert len(result["Stop"]) == 1
+        assert len(result["SubagentStop"]) == 1
+
+        # Both should have the same hook command
+        stop_cmd = result["Stop"][0]["hooks"][0]["command"]
+        subagent_stop_cmd = result["SubagentStop"][0]["hooks"][0]["command"]
+        assert stop_cmd == subagent_stop_cmd == ".deepwork/jobs/job1/hooks/hook.sh"
+
+    def test_does_not_duplicate_subagent_stop_if_no_stop(self, temp_dir: Path) -> None:
+        """Test that SubagentStop is not created if there are no Stop hooks."""
+        job_dir = temp_dir / ".deepwork" / "jobs" / "job1"
+        job_dir.mkdir(parents=True)
+
+        job_hooks_list = [
+            JobHooks(
+                job_name="job1",
+                job_dir=job_dir,
+                hooks={"UserPromptSubmit": [HookSpec(script="capture.sh")]},
+            ),
+        ]
+
+        result = merge_hooks_for_platform(job_hooks_list, temp_dir)
+
+        # Should only have UserPromptSubmit, not SubagentStop
+        assert "UserPromptSubmit" in result
+        assert "SubagentStop" not in result
+        assert "Stop" not in result
+

 class TestSyncHooksToPlatform:
     """Tests for sync_hooks_to_platform function using adapters."""
@@ -249,7 +300,8 @@ def test_syncs_hooks_via_adapter(self, temp_dir: Path) -> None:

         count = sync_hooks_to_platform(temp_dir, adapter, job_hooks_list)

-        assert count == 1
+        # Count is 2 because Stop hooks are also registered for SubagentStop
+        assert count == 2

         # Verify settings.json was created
         settings_file = temp_dir / ".claude" / "settings.json"
@@ -260,6 +312,7 @@ def test_syncs_hooks_via_adapter(self, temp_dir: Path) -> None:

         assert "hooks" in settings
         assert "Stop" in settings["hooks"]
+        assert "SubagentStop" in settings["hooks"]

     def test_returns_zero_for_empty_hooks(self, temp_dir: Path) -> None:
         """Test returns 0 when no hooks to sync."""
diff --git a/tests/unit/test_stop_hooks.py b/tests/unit/test_stop_hooks.py
index c1516514..55ff71c8 100644
--- a/tests/unit/test_stop_hooks.py
+++ b/tests/unit/test_stop_hooks.py
@@ -618,3 +618,60 @@ def test_build_context_multiple_hooks(self, generator: SkillGenerator, tmp_path:
         assert context["stop_hooks"][0]["type"] == "prompt"
         assert context["stop_hooks"][1]["type"] == "script"
         assert context["stop_hooks"][2]["type"] == "prompt"
+
+    def test_build_context_duplicates_stop_to_subagent_stop(
+        self, generator: SkillGenerator, job_with_hooks: JobDefinition
+    ) -> None:
+        """Test that Stop hooks are also registered for SubagentStop event.
+
+        Claude Code has separate Stop and SubagentStop events. When a Stop hook
+        is defined, it should also be registered for SubagentStop so the hook
+        triggers for both the main agent and subagents.
+ """ + adapter = ClaudeAdapter() + context = generator._build_step_context(job_with_hooks, job_with_hooks.steps[0], 0, adapter) + + # Should have both Stop and SubagentStop in hooks dict + assert "hooks" in context + assert "Stop" in context["hooks"] + assert "SubagentStop" in context["hooks"] + + # Both should have the same hooks + assert context["hooks"]["Stop"] == context["hooks"]["SubagentStop"] + assert len(context["hooks"]["Stop"]) == 1 + assert context["hooks"]["Stop"][0]["type"] == "prompt" + + def test_build_context_no_subagent_stop_without_stop( + self, generator: SkillGenerator, tmp_path: Path + ) -> None: + """Test that SubagentStop is not created if there are no Stop hooks.""" + job_dir = tmp_path / "test_job" + job_dir.mkdir() + steps_dir = job_dir / "steps" + steps_dir.mkdir() + (steps_dir / "step1.md").write_text("# Step 1") + + job = JobDefinition( + name="test_job", + version="1.0.0", + summary="Test", + description="Test", + steps=[ + Step( + id="step1", + name="Step 1", + description="Step", + instructions_file="steps/step1.md", + outputs=[OutputSpec(file="out.md")], + ) + ], + job_dir=job_dir, + ) + + adapter = ClaudeAdapter() + context = generator._build_step_context(job, job.steps[0], 0, adapter) + + # Should not have Stop or SubagentStop without any hooks + assert "hooks" in context + assert "Stop" not in context["hooks"] + assert "SubagentStop" not in context["hooks"] From f81dc14c790858ea97ed3f576863709568f049e8 Mon Sep 17 00:00:00 2001 From: Noah Horton Date: Wed, 21 Jan 2026 10:45:08 -0700 Subject: [PATCH 11/13] Register the subagentstop hook too --- .claude/settings.json | 11 ++ .../add_platform.add_capabilities/SKILL.md | 15 ++ .../skills/add_platform.implement/SKILL.md | 19 ++ .claude/skills/add_platform.research/SKILL.md | 16 ++ .claude/skills/add_platform.verify/SKILL.md | 14 ++ .../skills/commit.commit_and_push/SKILL.md | 11 ++ .claude/skills/commit.lint/SKILL.md | 10 ++ .claude/skills/commit.test/SKILL.md | 11 ++ .claude/skills/deepwork_jobs.define/SKILL.md | 31 ++++ .../skills/deepwork_jobs.implement/SKILL.md | 28 +++ .claude/skills/deepwork_jobs.learn/SKILL.md | 31 ++++ .../deepwork_jobs.review_job_spec/SKILL.md | 23 +++ .claude/skills/update.job/SKILL.md | 11 ++ .../test_claude_runtime_code.py | 2 +- src/deepwork/hooks/rules_check.py | 71 ++++++-- .../templates/claude/skill-job-step.md.jinja | 24 ++- .../test_rules_stop_hook.py | 168 +++++++++++++++++- tests/unit/test_stop_hooks.py | 120 +++++++++++++ 18 files changed, 595 insertions(+), 21 deletions(-) diff --git a/.claude/settings.json b/.claude/settings.json index 252c0209..380d928a 100644 --- a/.claude/settings.json +++ b/.claude/settings.json @@ -130,6 +130,17 @@ } ] } + ], + "SubagentStop": [ + { + "matcher": "", + "hooks": [ + { + "type": "command", + "command": "python -m deepwork.hooks.rules_check" + } + ] + } ] } } \ No newline at end of file diff --git a/.claude/skills/add_platform.add_capabilities/SKILL.md b/.claude/skills/add_platform.add_capabilities/SKILL.md index f00fd713..b9d76df3 100644 --- a/.claude/skills/add_platform.add_capabilities/SKILL.md +++ b/.claude/skills/add_platform.add_capabilities/SKILL.md @@ -18,6 +18,21 @@ hooks: If ALL criteria are met, include `✓ Quality Criteria Met`. + SubagentStop: + - hooks: + - type: prompt + prompt: | + Verify the capability additions meet ALL criteria: + 1. Any new hooks from the platform (for slash commands only) are added to src/deepwork/schemas/job_schema.py + 2. 
All existing adapters in src/deepwork/adapters.py are updated with the new hook fields + (set to None/null if the platform doesn't support that hook) + 3. Only hooks available on slash command definitions are added (not general CLI hooks) + 4. job_schema.py remains valid Python with no syntax errors + 5. adapters.py remains consistent - all adapters have the same hook fields + 6. If no new hooks are needed, document why in a comment + + If ALL criteria are met, include `✓ Quality Criteria Met`. + --- # add_platform.add_capabilities diff --git a/.claude/skills/add_platform.implement/SKILL.md b/.claude/skills/add_platform.implement/SKILL.md index eceb86ef..44722b65 100644 --- a/.claude/skills/add_platform.implement/SKILL.md +++ b/.claude/skills/add_platform.implement/SKILL.md @@ -22,6 +22,25 @@ hooks: If ALL criteria are met, include `✓ Quality Criteria Met`. + SubagentStop: + - hooks: + - type: command + command: ".deepwork/jobs/add_platform/hooks/run_tests.sh" + - type: prompt + prompt: | + Verify the implementation meets ALL criteria: + 1. Platform adapter class is added to src/deepwork/adapters.py + 2. Templates exist in src/deepwork/templates// with appropriate command structure + 3. Tests exist for all new functionality + 4. Test coverage is 100% for new code (run: uv run pytest --cov) + 5. All tests pass + 6. README.md is updated with: + - New platform listed in supported platforms + - Installation instructions for the platform + - Any platform-specific notes + + If ALL criteria are met, include `✓ Quality Criteria Met`. + --- # add_platform.implement diff --git a/.claude/skills/add_platform.research/SKILL.md b/.claude/skills/add_platform.research/SKILL.md index ff7e489e..af44f2d3 100644 --- a/.claude/skills/add_platform.research/SKILL.md +++ b/.claude/skills/add_platform.research/SKILL.md @@ -19,6 +19,22 @@ hooks: If ALL criteria are met, include `✓ Quality Criteria Met`. + SubagentStop: + - hooks: + - type: prompt + prompt: | + Verify the research output meets ALL criteria: + 1. Both files exist in doc/platforms//: cli_configuration.md and hooks_system.md + 2. Each file has a comment at the top with: + - Last updated date + - Source URL where the documentation was obtained + 3. cli_configuration.md covers how the platform's CLI is configured + 4. hooks_system.md covers hooks available for slash command definitions ONLY + 5. No extraneous documentation (only these two specific topics) + 6. Documentation is comprehensive enough to implement the platform + + If ALL criteria are met, include `✓ Quality Criteria Met`. + --- # add_platform.research diff --git a/.claude/skills/add_platform.verify/SKILL.md b/.claude/skills/add_platform.verify/SKILL.md index 67b20801..583101f2 100644 --- a/.claude/skills/add_platform.verify/SKILL.md +++ b/.claude/skills/add_platform.verify/SKILL.md @@ -17,6 +17,20 @@ hooks: If ALL criteria are met, include `✓ Quality Criteria Met`. + SubagentStop: + - hooks: + - type: prompt + prompt: | + Verify the installation meets ALL criteria: + 1. Platform-specific directories/files are added to the deepwork repo as needed + 2. Running `deepwork install --platform ` completes without errors + 3. Expected command files are created in the platform's command directory + 4. Command file content matches the templates and job definitions + 5. Established DeepWork jobs (deepwork_jobs, deepwork_rules) are installed correctly + 6. The platform can be used alongside existing platforms without conflicts + + If ALL criteria are met, include `✓ Quality Criteria Met`. 
+
 ---

 # add_platform.verify
diff --git a/.claude/skills/commit.commit_and_push/SKILL.md b/.claude/skills/commit.commit_and_push/SKILL.md
index e13fcf06..24900c84 100644
--- a/.claude/skills/commit.commit_and_push/SKILL.md
+++ b/.claude/skills/commit.commit_and_push/SKILL.md
@@ -14,6 +14,17 @@ hooks:
       4. Changes were pushed to remote
       If ALL criteria are met, include `✓ Quality Criteria Met`.

+  SubagentStop:
+    - hooks:
+        - type: prompt
+          prompt: |
+            Verify the commit is ready:
+            1. Changed files list was reviewed by the agent
+            2. Files match what was modified during this session (or unexpected changes were investigated)
+            3. Commit was created with appropriate message
+            4. Changes were pushed to remote
+            If ALL criteria are met, include `✓ Quality Criteria Met`.
+
 ---

 # commit.commit_and_push
diff --git a/.claude/skills/commit.lint/SKILL.md b/.claude/skills/commit.lint/SKILL.md
index 4054abaa..08fedf47 100644
--- a/.claude/skills/commit.lint/SKILL.md
+++ b/.claude/skills/commit.lint/SKILL.md
@@ -13,6 +13,16 @@ hooks:
       3. No remaining lint errors
       If ALL criteria are met, include `✓ Quality Criteria Met`.

+  SubagentStop:
+    - hooks:
+        - type: prompt
+          prompt: |
+            Verify the linting is complete:
+            1. ruff format was run successfully
+            2. ruff check was run successfully (with --fix)
+            3. No remaining lint errors
+            If ALL criteria are met, include `✓ Quality Criteria Met`.
+
 ---

 # commit.lint
diff --git a/.claude/skills/commit.test/SKILL.md b/.claude/skills/commit.test/SKILL.md
index 5cfc1de9..07d6beb2 100644
--- a/.claude/skills/commit.test/SKILL.md
+++ b/.claude/skills/commit.test/SKILL.md
@@ -14,6 +14,17 @@ hooks:
       4. Test output shows passing status
       If ALL criteria are met, include `✓ Quality Criteria Met`.

+  SubagentStop:
+    - hooks:
+        - type: prompt
+          prompt: |
+            Verify the tests are passing:
+            1. Latest code was pulled from the branch
+            2. All tests completed successfully
+            3. No test failures or errors remain
+            4. Test output shows passing status
+            If ALL criteria are met, include `✓ Quality Criteria Met`.
+
 ---

 # commit.test
diff --git a/.claude/skills/deepwork_jobs.define/SKILL.md b/.claude/skills/deepwork_jobs.define/SKILL.md
index 0215a6c7..c77c043a 100644
--- a/.claude/skills/deepwork_jobs.define/SKILL.md
+++ b/.claude/skills/deepwork_jobs.define/SKILL.md
@@ -34,6 +34,37 @@ hooks:
       If criteria are NOT met OR the promise tag is missing, respond with:
       {"ok": false, "reason": "**AGENT: TAKE ACTION** - [which criteria failed and why]"}

+  SubagentStop:
+    - hooks:
+        - type: prompt
+          prompt: |
+            You must evaluate whether Claude has met all the below quality criteria for the request.
+
+            ## Quality Criteria
+
+            1. **User Understanding**: Did the agent fully understand the user's workflow by asking structured questions?
+            2. **Structured Questions Used**: Did the agent ask structured questions (using the AskUserQuestion tool) to gather user input?
+            3. **Document Detection**: For document-oriented workflows, did the agent detect patterns and offer doc spec creation?
+            4. **doc spec Created (if applicable)**: If a doc spec was needed, was it created in `.deepwork/doc_specs/[doc_spec_name].md` with proper quality criteria?
+            5. **doc spec References**: Are document outputs properly linked to their doc specs using `{file, doc_spec}` format?
+            6. **Valid Against doc spec**: Does the job.yml conform to the job.yml doc spec quality criteria (valid identifier, semantic version, concise summary, rich description, complete steps, valid dependencies)?
+            7. **Clear Inputs/Outputs**: Does every step have clearly defined inputs and outputs?
+            8. **Logical Dependencies**: Do step dependencies make sense and avoid circular references?
+            9. **Concise Summary**: Is the summary under 200 characters and descriptive?
+            10. **Rich Description**: Does the description provide enough context for future refinement?
+            11. **Valid Schema**: Does the job.yml follow the required schema (name, version, summary, steps)?
+            12. **File Created**: Has the job.yml file been created in `.deepwork/jobs/[job_name]/job.yml`?
+
+            ## Instructions
+
+            Review the conversation and determine if ALL quality criteria above have been satisfied.
+            Look for evidence that each criterion has been addressed.
+
+            If the agent has included `✓ Quality Criteria Met` in their response AND
+            all criteria appear to be met, respond with: {"ok": true}
+
+            If criteria are NOT met OR the promise tag is missing, respond with:
+            {"ok": false, "reason": "**AGENT: TAKE ACTION** - [which criteria failed and why]"}

 ---

 # deepwork_jobs.define
diff --git a/.claude/skills/deepwork_jobs.implement/SKILL.md b/.claude/skills/deepwork_jobs.implement/SKILL.md
index 28af7b70..9ad9bdcf 100644
--- a/.claude/skills/deepwork_jobs.implement/SKILL.md
+++ b/.claude/skills/deepwork_jobs.implement/SKILL.md
@@ -31,6 +31,34 @@ hooks:
       If criteria are NOT met OR the promise tag is missing, respond with:
       {"ok": false, "reason": "**AGENT: TAKE ACTION** - [which criteria failed and why]"}

+  SubagentStop:
+    - hooks:
+        - type: prompt
+          prompt: |
+            You must evaluate whether Claude has met all the below quality criteria for the request.
+
+            ## Quality Criteria
+
+            1. **Directory Structure**: Is `.deepwork/jobs/[job_name]/` created correctly?
+            2. **Complete Instructions**: Are ALL step instruction files complete (not stubs or placeholders)?
+            3. **Specific & Actionable**: Are instructions tailored to each step's purpose, not generic?
+            4. **Output Examples**: Does each instruction file show what good output looks like?
+            5. **Quality Criteria**: Does each instruction file define quality criteria for its outputs?
+            6. **Ask Structured Questions**: Do step instructions that gather user input explicitly use the phrase "ask structured questions"?
+            7. **Sync Complete**: Has `deepwork sync` been run successfully?
+            8. **Commands Available**: Are the slash-commands generated in `.claude/commands/`?
+            9. **Rules Considered**: Has the agent thought about whether rules would benefit this job? If relevant rules were identified, did they explain them and offer to run `/deepwork_rules.define`? Not every job needs rules - only suggest when genuinely helpful.
+
+            ## Instructions
+
+            Review the conversation and determine if ALL quality criteria above have been satisfied.
+            Look for evidence that each criterion has been addressed.
+
+            If the agent has included `✓ Quality Criteria Met` in their response AND
+            all criteria appear to be met, respond with: {"ok": true}
+
+            If criteria are NOT met OR the promise tag is missing, respond with:
+            {"ok": false, "reason": "**AGENT: TAKE ACTION** - [which criteria failed and why]"}

 ---

 # deepwork_jobs.implement
diff --git a/.claude/skills/deepwork_jobs.learn/SKILL.md b/.claude/skills/deepwork_jobs.learn/SKILL.md
index da29dad5..cecc8e6f 100644
--- a/.claude/skills/deepwork_jobs.learn/SKILL.md
+++ b/.claude/skills/deepwork_jobs.learn/SKILL.md
@@ -33,6 +33,37 @@ hooks:
       If criteria are NOT met OR the promise tag is missing, respond with:
       {"ok": false, "reason": "**AGENT: TAKE ACTION** - [which criteria failed and why]"}

+  SubagentStop:
+    - hooks:
+        - type: prompt
+          prompt: |
+            You must evaluate whether Claude has met all the below quality criteria for the request.
+
+            ## Quality Criteria
+
+            1. **Conversation Analyzed**: Did the agent review the conversation for DeepWork job executions?
+            2. **Confusion Identified**: Did the agent identify points of confusion, errors, or inefficiencies?
+            3. **Instructions Improved**: Were job instructions updated to address identified issues?
+            4. **Instructions Concise**: Are instructions free of redundancy and unnecessary verbosity?
+            5. **Shared Content Extracted**: Is lengthy/duplicated content extracted into referenced files?
+            6. **doc spec Reviewed (if applicable)**: For jobs with doc spec outputs, were doc spec-related learnings identified?
+            7. **doc spec Updated (if applicable)**: Were doc spec files updated with improved quality criteria or structure?
+            8. **Bespoke Learnings Captured**: Were run-specific learnings added to AGENTS.md?
+            9. **File References Used**: Do AGENTS.md entries reference other files where appropriate?
+            10. **Working Folder Correct**: Is AGENTS.md in the correct working folder for the job?
+            11. **Generalizable Separated**: Are generalizable improvements in instructions, not AGENTS.md?
+            12. **Sync Complete**: Has `deepwork sync` been run if instructions were modified?
+
+            ## Instructions
+
+            Review the conversation and determine if ALL quality criteria above have been satisfied.
+            Look for evidence that each criterion has been addressed.
+
+            If the agent has included `✓ Quality Criteria Met` in their response AND
+            all criteria appear to be met, respond with: {"ok": true}
+
+            If criteria are NOT met OR the promise tag is missing, respond with:
+            {"ok": false, "reason": "**AGENT: TAKE ACTION** - [which criteria failed and why]"}

 ---

 # deepwork_jobs.learn
diff --git a/.claude/skills/deepwork_jobs.review_job_spec/SKILL.md b/.claude/skills/deepwork_jobs.review_job_spec/SKILL.md
index 85a881b1..25915621 100644
--- a/.claude/skills/deepwork_jobs.review_job_spec/SKILL.md
+++ b/.claude/skills/deepwork_jobs.review_job_spec/SKILL.md
@@ -26,6 +26,29 @@ hooks:
       If criteria are NOT met OR the promise tag is missing, respond with:
       {"ok": false, "reason": "**AGENT: TAKE ACTION** - [which criteria failed and why]"}

+  SubagentStop:
+    - hooks:
+        - type: prompt
+          prompt: |
+            You must evaluate whether Claude has met all the below quality criteria for the request.
+
+            ## Quality Criteria
+
+            1. **Sub-Agent Used**: Was a sub-agent spawned to provide unbiased review?
+            2. **All doc spec Criteria Evaluated**: Did the sub-agent assess all 9 quality criteria?
+            3. **Findings Addressed**: Were all failed criteria addressed by the main agent?
+            4. **Validation Loop Complete**: Did the review-fix cycle continue until all criteria passed?
+
+            ## Instructions
+
+            Review the conversation and determine if ALL quality criteria above have been satisfied.
+            Look for evidence that each criterion has been addressed.
+
+            If the agent has included `✓ Quality Criteria Met` in their response AND
+            all criteria appear to be met, respond with: {"ok": true}
+
+            If criteria are NOT met OR the promise tag is missing, respond with:
+            {"ok": false, "reason": "**AGENT: TAKE ACTION** - [which criteria failed and why]"}

 ---

 # deepwork_jobs.review_job_spec
diff --git a/.claude/skills/update.job/SKILL.md b/.claude/skills/update.job/SKILL.md
index 60038425..19ab7fb0 100644
--- a/.claude/skills/update.job/SKILL.md
+++ b/.claude/skills/update.job/SKILL.md
@@ -14,6 +14,17 @@ hooks:
       4. Command files in .claude/commands/ were regenerated
       If ALL criteria are met, include `✓ Quality Criteria Met`.

+  SubagentStop:
+    - hooks:
+        - type: prompt
+          prompt: |
+            Verify the update process completed successfully:
+            1. Changes were made in src/deepwork/standard_jobs/[job_name]/ (NOT in .deepwork/jobs/)
+            2. `deepwork install --platform claude` was run
+            3. Files in .deepwork/jobs/ match the source files
+            4. Command files in .claude/commands/ were regenerated
+            If ALL criteria are met, include `✓ Quality Criteria Met`.
+
 ---

 # update.job
diff --git a/manual_tests/test_claude_runtime/test_claude_runtime_code.py b/manual_tests/test_claude_runtime/test_claude_runtime_code.py
index 11208126..fed50e49 100644
--- a/manual_tests/test_claude_runtime/test_claude_runtime_code.py
+++ b/manual_tests/test_claude_runtime/test_claude_runtime_code.py
@@ -26,4 +26,4 @@ def calculate_average(numbers: list[int]) -> float:
     """Calculate the average of a list of numbers."""
     if not numbers:
         return 0.0
-    return calculate_sum(numbers) / len(numbers)
+    return calculate_sum(numbers) / 0  # BUG: divide by zero!
diff --git a/src/deepwork/hooks/rules_check.py b/src/deepwork/hooks/rules_check.py
index 04d0861c..d7fbedcb 100644
--- a/src/deepwork/hooks/rules_check.py
+++ b/src/deepwork/hooks/rules_check.py
@@ -645,6 +645,8 @@ def invoke_claude_headless(prompt: str, rule_name: str) -> tuple[str, str, str | None]:
         - reason is the explanation
         - fallback_prompt is the prompt to show to the agent if Claude can't run (or None)
     """
+    import tempfile
+
     # Check if we're in Claude Code Web/Remote environment
     if is_claude_code_remote():
         fallback_msg = (
@@ -654,26 +656,52 @@ def invoke_claude_headless(prompt: str, rule_name: str) -> tuple[str, str, str | None]:
         )
         return "block", f"Rule '{rule_name}' requires manual evaluation", fallback_msg

+    output_path = None
     try:
-        # Run claude in headless mode with --print flag to get output
-        result = subprocess.run(
-            ["claude", "--print", "--dangerously-skip-permissions", "-p", prompt],
-            capture_output=True,
-            text=True,
-            timeout=300,  # 5 minute timeout
-            cwd=Path.cwd(),
-        )
+        # Create a temporary file for capturing output
+        # IMPORTANT: We redirect stdout/stderr to a file instead of using pipes
+        # (capture_output=True). This is critical because when Claude runs as a
+        # subprocess of another Claude instance, using pipes holds the parent's
+        # stdout file descriptor open. This blocks the snapshotter in the parent
+        # Claude, causing a 60-second timeout delay when the subprocess runs.
+        # By writing to a file and reading it after, we avoid this blocking issue.
+        with tempfile.NamedTemporaryFile(
+            mode="w",
+            suffix="_claude_output.log",
+            delete=False,
+            prefix="deepwork_",
+        ) as tmp:
+            output_path = tmp.name
+
+        # Run claude in headless mode with output redirected to file
+        with open(output_path, "w") as outfile:
+            process = subprocess.Popen(
+                ["claude", "--print", "--dangerously-skip-permissions", "-p", prompt],
+                stdout=outfile,
+                stderr=subprocess.STDOUT,  # Merge stderr into the same file
+                close_fds=True,  # Close inherited file descriptors to prevent blocking
+                cwd=Path.cwd(),
+            )

-        if result.returncode != 0:
-            error_msg = result.stderr.strip() or "Unknown error"
+            try:
+                # Wait for completion with timeout
+                process.wait(timeout=300)  # 5 minute timeout
+            except subprocess.TimeoutExpired:
+                process.kill()
+                process.wait()
+                return "block", f"Claude timed out while processing rule '{rule_name}'", None
+
+        # Read the output from the file
+        with open(output_path, "r") as f:
+            output = f.read().strip()
+
+        if process.returncode != 0:
+            error_msg = output or "Unknown error"
             return "block", f"Claude execution failed: {error_msg}", None

-        output = result.stdout.strip()
         decision, reason = parse_claude_response(output)
         return decision, reason, None

-    except subprocess.TimeoutExpired:
-        return "block", f"Claude timed out while processing rule '{rule_name}'", None
     except FileNotFoundError:
         return (
             "block",
@@ -682,6 +710,13 @@ def invoke_claude_headless(prompt: str, rule_name: str) -> tuple[str, str, str | None]:
         )
     except Exception as e:
         return "block", f"Error invoking Claude: {str(e)}", None
+    finally:
+        # Clean up the temporary file
+        if output_path:
+            try:
+                Path(output_path).unlink(missing_ok=True)
+            except Exception:
+                pass  # Ignore cleanup errors


 def rules_check_hook(hook_input: HookInput) -> HookOutput:
@@ -764,13 +799,17 @@ def rules_check_hook(hook_input: HookInput) -> HookOutput:
         ):
             continue

-        # For PROMPT rules, also skip if already QUEUED (already shown to agent).
-        # This prevents infinite loops when transcript is unavailable or promise
-        # tags haven't been written yet. The agent has already seen this rule.
+        # For PROMPT rules with send_to_stopping_agent runtime, also skip if
+        # already QUEUED (already shown to the agent). This prevents infinite loops
+        # when transcript is unavailable or promise tags haven't been written yet.
+        # The agent has already seen this rule.
+        # Note: Claude runtime rules should NOT be skipped here because they're
+        # executed by a separate Claude process, not shown to the stopping agent.
         if (
             existing
             and existing.status == QueueEntryStatus.QUEUED
             and rule.action_type == ActionType.PROMPT
+            and rule.prompt_runtime == PromptRuntime.SEND_TO_STOPPING_AGENT
         ):
             continue
diff --git a/src/deepwork/templates/claude/skill-job-step.md.jinja b/src/deepwork/templates/claude/skill-job-step.md.jinja
index 8464a116..c76aa1ac 100644
--- a/src/deepwork/templates/claude/skill-job-step.md.jinja
+++ b/src/deepwork/templates/claude/skill-job-step.md.jinja
@@ -46,7 +46,8 @@ user-invocable: false
 {% if quality_criteria or hooks %}
 hooks:
 {% if quality_criteria %}
-  Stop:
+{% for event_name in ["Stop", "SubagentStop"] %}
+  {{ event_name }}:
     - hooks:
         - type: prompt
           prompt: |
@@ -68,9 +69,27 @@ hooks:

             If criteria are NOT met OR the promise tag is missing, respond with:
             {"ok": false, "reason": "**AGENT: TAKE ACTION** - [which criteria failed and why]"}
+{% endfor %}
 {% endif %}
 {% for event_name, event_hooks in hooks.items() %}
-{% if not (event_name == "Stop" and quality_criteria) %}
+{% if not (event_name == "Stop" and quality_criteria) and not (event_name == "SubagentStop" and "Stop" in hooks) %}
+{# For Stop events, generate both Stop and SubagentStop blocks #}
+{% if event_name == "Stop" %}
+{% for stop_event in ["Stop", "SubagentStop"] %}
+  {{ stop_event }}:
+    - hooks:
+{% for hook in event_hooks %}
+{% if hook.type == "script" %}
+        - type: command
+          command: ".deepwork/jobs/{{ job_name }}/{{ hook.path }}"
+{% else %}
+        - type: prompt
+          prompt: |
+            {{ hook.content | indent(12) }}
+{% endif %}
+{% endfor %}
+{% endfor %}
+{% else %}
   {{ event_name }}:
     - hooks:
 {% for hook in event_hooks %}
@@ -84,6 +103,7 @@ hooks:
 {% endif %}
 {% endfor %}
 {% endif %}
+{% endif %}
 {% endfor %}
 {% endif %}
 ---
diff --git a/tests/shell_script_tests/test_rules_stop_hook.py b/tests/shell_script_tests/test_rules_stop_hook.py
index 5eaa73f6..0e67c6b8 100644
--- a/tests/shell_script_tests/test_rules_stop_hook.py
+++ b/tests/shell_script_tests/test_rules_stop_hook.py
@@ -305,10 +305,13 @@ class TestRulesStopHookInfiniteLoopPrevention:
     def test_queued_prompt_rule_does_not_refire(
         self, src_dir: Path, git_repo_with_src_rule: Path
     ) -> None:
-        """Test that a prompt rule with QUEUED status doesn't fire again.
+        """Test that a send_to_stopping_agent prompt rule with QUEUED status doesn't fire again.

         This prevents infinite loops when the transcript is unavailable or
-        promise tags haven't been written yet.
+        promise tags haven't been written yet. The agent has already seen this rule.
+
+        Note: This only applies to rules with prompt_runtime: send_to_stopping_agent (default).
+        Claude runtime rules should refire - see test_claude_runtime_rule_refires_when_queued.
         """
         # Create a file that triggers the rule
         test_src_dir = git_repo_with_src_rule / "src"
@@ -407,3 +410,164 @@ def test_promise_tag_still_prevents_firing(
             assert result == {}, f"Rule should not fire with promise tag: {result}"
         finally:
             os.unlink(transcript_path)
+
+    def test_claude_runtime_rule_refires_when_queued(
+        self, src_dir: Path, tmp_path: Path
+    ) -> None:
+        """Test that a claude runtime prompt rule DOES refire when QUEUED.
+
+        Claude runtime rules execute in a separate subprocess, not shown to the
+        stopping agent. Therefore they should NOT be subject to the infinite loop
+        prevention that skips QUEUED rules.
+
+        This test uses CLAUDE_CODE_REMOTE=true to simulate claude unavailability,
+        which causes the rule to remain QUEUED (fallback path) rather than
+        getting PASSED/FAILED status.
+        """
+        # Create a git repo with a claude runtime rule
+        repo = Repo.init(tmp_path)
+        readme = tmp_path / "README.md"
+        readme.write_text("# Test Project\n")
+        repo.index.add(["README.md"])
+        repo.index.commit("Initial commit")
+
+        # Create rule with prompt_runtime: claude
+        rules_dir = tmp_path / ".deepwork" / "rules"
+        rules_dir.mkdir(parents=True, exist_ok=True)
+
+        rule_file = rules_dir / "claude-runtime-rule.md"
+        rule_file.write_text(
+            """---
+name: Claude Runtime Rule
+trigger: "src/**/*"
+compare_to: prompt
+prompt_runtime: claude
+---
+This is a rule that runs in claude runtime.
+Please check the code.
+"""
+        )
+
+        # Set up baseline
+        deepwork_dir = tmp_path / ".deepwork"
+        (deepwork_dir / ".last_work_tree").write_text("")
+
+        # Create a file that triggers the rule
+        test_src_dir = tmp_path / "src"
+        test_src_dir.mkdir(exist_ok=True)
+        (test_src_dir / "main.py").write_text("# New file\n")
+        repo.index.add(["src/main.py"])
+
+        # Run with CLAUDE_CODE_REMOTE=true so claude returns fallback (stays QUEUED)
+        env = os.environ.copy()
+        env["DEEPWORK_HOOK_PLATFORM"] = "claude"
+        env["CLAUDE_CODE_REMOTE"] = "true"
+        if src_dir:
+            env["PYTHONPATH"] = str(src_dir)
+
+        import subprocess
+
+        # First run: rule should fire and create queue entry
+        result1 = subprocess.run(
+            ["python", "-m", "deepwork.hooks.rules_check"],
+            cwd=tmp_path,
+            capture_output=True,
+            text=True,
+            input="",
+            env=env,
+        )
+        output1 = json.loads(result1.stdout.strip())
+        assert output1.get("decision") == "block", f"First run should block: {output1}"
+        assert "Claude Runtime Rule" in output1.get("reason", "")
+
+        # Second run: rule SHOULD fire again (claude runtime rules are NOT skipped)
+        result2 = subprocess.run(
+            ["python", "-m", "deepwork.hooks.rules_check"],
+            cwd=tmp_path,
+            capture_output=True,
+            text=True,
+            input="",
+            env=env,
+        )
+        output2 = json.loads(result2.stdout.strip())
+        # Claude runtime rules should refire even when QUEUED
+        assert output2.get("decision") == "block", (
+            f"Second run should also block (claude runtime rule should refire): {output2}"
+        )
+        assert "Claude Runtime Rule" in output2.get("reason", "")
+
+
+class TestSubagentStopEvent:
+    """Tests for SubagentStop event triggering agentFinished rules."""
+
+    def test_subagent_stop_event_triggers_rules(
+        self, src_dir: Path, git_repo_with_src_rule: Path
+    ) -> None:
+        """Test that SubagentStop event triggers agentFinished rules.
+
+        Claude Code has both Stop and SubagentStop events that should both
+        trigger after_agent/agentFinished rules.
+        """
+        # Create a file that triggers the rule
+        test_src_dir = git_repo_with_src_rule / "src"
+        test_src_dir.mkdir(exist_ok=True)
+        (test_src_dir / "main.py").write_text("# New file\n")
+
+        # Stage the change
+        repo = Repo(git_repo_with_src_rule)
+        repo.index.add(["src/main.py"])
+
+        # Run with SubagentStop event
+        hook_input = {"hook_event_name": "SubagentStop"}
+        stdout, stderr, code = run_stop_hook(
+            git_repo_with_src_rule, hook_input, src_dir=src_dir
+        )
+
+        # Parse the output
+        output = stdout.strip()
+        assert output, f"Expected JSON output. stderr: {stderr}"
+        result = json.loads(output)
+
+        # Should trigger the rule just like Stop event does
+        assert result.get("decision") == "block", f"SubagentStop should trigger rules: {result}"
+        assert "Test Rule" in result.get("reason", "")
+
+    def test_both_stop_and_subagent_stop_trigger_same_rules(
+        self, src_dir: Path, git_repo_with_src_rule: Path
+    ) -> None:
+        """Test that Stop and SubagentStop events trigger the same rules.
+
+        Both events should fire agentFinished rules with identical behavior.
+        """
+        # Create a file that triggers the rule
+        test_src_dir = git_repo_with_src_rule / "src"
+        test_src_dir.mkdir(exist_ok=True)
+        (test_src_dir / "main.py").write_text("# New file\n")
+
+        repo = Repo(git_repo_with_src_rule)
+        repo.index.add(["src/main.py"])
+
+        # Test Stop event
+        hook_input_stop = {"hook_event_name": "Stop"}
+        stdout_stop, _, _ = run_stop_hook(
+            git_repo_with_src_rule, hook_input_stop, src_dir=src_dir
+        )
+        result_stop = json.loads(stdout_stop.strip())
+
+        # Clear the queue to allow the rule to fire again
+        queue_dir = git_repo_with_src_rule / ".deepwork" / "tmp" / "rules" / "queue"
+        if queue_dir.exists():
+            for f in queue_dir.glob("*.json"):
+                f.unlink()
+
+        # Test SubagentStop event
+        hook_input_subagent = {"hook_event_name": "SubagentStop"}
+        stdout_subagent, _, _ = run_stop_hook(
+            git_repo_with_src_rule, hook_input_subagent, src_dir=src_dir
+        )
+        result_subagent = json.loads(stdout_subagent.strip())
+
+        # Both should produce the same blocking behavior
+        assert result_stop.get("decision") == result_subagent.get("decision") == "block"
+        assert "Test Rule" in result_stop.get("reason", "")
+        assert "Test Rule" in result_subagent.get("reason", "")
diff --git a/tests/unit/test_stop_hooks.py b/tests/unit/test_stop_hooks.py
index 55ff71c8..21feee1b 100644
--- a/tests/unit/test_stop_hooks.py
+++ b/tests/unit/test_stop_hooks.py
@@ -675,3 +675,123 @@ def test_build_context_no_subagent_stop_without_stop(
         assert "hooks" in context
         assert "Stop" not in context["hooks"]
         assert "SubagentStop" not in context["hooks"]
+
+
+class TestGeneratorTemplateOutput:
+    """Tests for generated skill file output."""
+
+    @pytest.fixture
+    def full_generator(self) -> SkillGenerator:
+        """Create generator using actual package templates."""
+        # Use the actual templates directory from the package
+        templates_dir = Path(__file__).parent.parent.parent / "src" / "deepwork" / "templates"
+        return SkillGenerator(templates_dir)
+
+    @pytest.fixture
+    def job_with_quality_criteria(self, tmp_path: Path) -> JobDefinition:
+        """Create job with quality_criteria for testing template output."""
+        job_dir = tmp_path / "test_job"
+        job_dir.mkdir()
+        steps_dir = job_dir / "steps"
+        steps_dir.mkdir()
+        (steps_dir / "step1.md").write_text("# Step 1 Instructions\n\nDo the thing.")
+
+        return JobDefinition(
+            name="test_job",
+            version="1.0.0",
+            summary="Test job",
+            description="A test job",
+            steps=[
+                Step(
+                    id="step1",
+                    name="Step 1",
+                    description="First step",
+                    instructions_file="steps/step1.md",
+                    outputs=[OutputSpec(file="output.md")],
+                    quality_criteria=["Criterion 1 is met", "Criterion 2 is verified"],
+                ),
+            ],
+            job_dir=job_dir,
+        )
+
+    @pytest.fixture
+    def job_with_stop_hooks(self, tmp_path: Path) -> JobDefinition:
+        """Create job with custom stop hooks for testing template output."""
+        job_dir = tmp_path / "test_job"
+        job_dir.mkdir()
+        steps_dir = job_dir / "steps"
+        steps_dir.mkdir()
+        (steps_dir / "step1.md").write_text("# Step 1 Instructions")
+
+        return JobDefinition(
+            name="test_job",
+            version="1.0.0",
+            summary="Test job",
+            description="A test job",
+            steps=[
+                Step(
+                    id="step1",
+                    name="Step 1",
+                    description="First step",
+                    instructions_file="steps/step1.md",
+                    outputs=[OutputSpec(file="output.md")],
+                    hooks={
+                        "after_agent": [HookAction(prompt="Custom validation prompt")],
+                    },
+                ),
+            ],
+            job_dir=job_dir,
+        )
+
+    def test_template_generates_both_stop_and_subagent_stop_for_quality_criteria(
+        self, full_generator: SkillGenerator, job_with_quality_criteria: JobDefinition, tmp_path: Path
+    ) -> None:
+        """Test that template generates both Stop and SubagentStop hooks for quality_criteria."""
+        adapter = ClaudeAdapter()
+        skill_path = full_generator.generate_step_skill(
+            job_with_quality_criteria,
+            job_with_quality_criteria.steps[0],
+            adapter,
+            tmp_path,
+        )
+
+        content = skill_path.read_text()
+
+        # Both Stop and SubagentStop should be in the generated file
+        assert "Stop:" in content, "Stop hook should be in generated skill"
+        assert "SubagentStop:" in content, "SubagentStop hook should be in generated skill"
+
+        # Both should contain the quality criteria prompt
+        lines = content.split("\n")
+        stop_found = False
+        subagent_stop_found = False
+        for line in lines:
+            if line.strip().startswith("Stop:"):
+                stop_found = True
+            if line.strip().startswith("SubagentStop:"):
+                subagent_stop_found = True
+
+        assert stop_found and subagent_stop_found, (
+            f"Both Stop and SubagentStop should be generated. Content:\n{content[:1000]}"
+        )
+
+    def test_template_generates_both_stop_and_subagent_stop_for_custom_hooks(
+        self, full_generator: SkillGenerator, job_with_stop_hooks: JobDefinition, tmp_path: Path
+    ) -> None:
+        """Test that template generates both Stop and SubagentStop for custom stop hooks."""
+        adapter = ClaudeAdapter()
+        skill_path = full_generator.generate_step_skill(
+            job_with_stop_hooks,
+            job_with_stop_hooks.steps[0],
+            adapter,
+            tmp_path,
+        )
+
+        content = skill_path.read_text()
+
+        # Both Stop and SubagentStop should be in the generated file
+        assert "Stop:" in content, "Stop hook should be in generated skill"
+        assert "SubagentStop:" in content, "SubagentStop hook should be in generated skill"
+
+        # Both should contain the custom prompt
+        assert "Custom validation prompt" in content, "Custom prompt should be in generated skill"

From dcea5e460421eee2e7467021dfcf224bf3aaf251 Mon Sep 17 00:00:00 2001
From: Noah Horton
Date: Thu, 22 Jan 2026 15:02:27 -0700
Subject: [PATCH 12/13] Consolidate changelog into 0.4.0 release

Consolidated all changes from 0.4.0-0.5.2 into a single 0.4.0 release:

Added:
- Doc specs feature for document quality criteria
- prompt_runtime setting for rules (send_to_stopping_agent vs claude)
- Claude headless mode execution for automated rule remediation
- deepwork rules clear_queue CLI command
- Code review stage in commit job
- Session start hook for version checking
- Manual tests job

Changed:
- BREAKING: Renamed document_type to doc_spec
- Step.outputs now uses OutputSpec dataclass
- Updated deepwork_jobs to v0.6.0

Fixed:
- Infinite loop bug in rules system with promise tags
- COMMAND rules promise handling queue status
- Quality criteria validation logic
- compare_to: prompt mode file detection

Co-Authored-By: Claude Opus 4.5
---
 CHANGELOG.md   | 99 +++++++++++++++++++-------------------------------
 pyproject.toml |  2 +-
 uv.lock        |  2 +-
 3 files changed, 40 insertions(+), 63 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index cedef235..654b696d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,56 +5,10 @@ All notable changes to DeepWork will be documented in this file.

 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
-## [0.5.2] - 2026-01-22
-
-### Fixed
-- Fixed COMMAND rules promise handling to properly update queue status
-  - When an agent provides a promise tag for a FAILED command rule, the queue entry is now correctly updated to SKIPPED status
-  - Previously, FAILED queue entries remained in FAILED state even after being acknowledged via promise
-  - This ensures the rules queue accurately reflects rule state throughout the workflow
-
-## [0.5.1] - 2026-01-22
-
-### Fixed
-- Fixed quality criteria validation logic in skill template (#111)
-  - Changed promise condition from AND to OR: promise OR all criteria met now passes
-  - Changed failure condition from OR to AND: requires both criteria NOT met AND promise missing to fail
-  - This corrects the logic so the promise mechanism properly serves as a bypass for quality criteria
-
-## [0.5.0] - 2026-01-20
-
-### Changed
-- **BREAKING**: Renamed `document_type` to `doc_spec` throughout the codebase
-  - Job.yml field: `document_type` → `doc_spec` (e.g., `outputs: [{file: "report.md", doc_spec: ".deepwork/doc_specs/report.md"}]`)
-  - Class: `DocumentTypeDefinition` → `DocSpec` (backward compat alias provided)
-  - Methods: `has_document_type()` → `has_doc_spec()`, `validate_document_type_references()` → `validate_doc_spec_references()`
-  - Template variables: `has_document_type` → `has_doc_spec`, `document_type` → `doc_spec`
-  - Internal: `_load_document_type()` → `_load_doc_spec()`, `_doc_type_cache` → `_doc_spec_cache`
+## [0.4.0] - 2026-01-22
 
 ### Added
-- `prompt_runtime` setting for rules to control how prompt-type actions are executed
-  - `send_to_stopping_agent` (default): Returns prompt to the agent that triggered the rule
-  - `claude`: Invokes Claude Code in headless mode to handle the rule independently
-- Claude headless mode execution for automated rule remediation
-  - Rules with `prompt_runtime: claude` spawn a separate Claude process
-  - Claude performs required actions and returns structured `block`/`allow` decision
-  - Useful for automated tasks like documentation updates without blocking the main agent
-- Comprehensive tests for generator doc spec integration (9 new tests)
-  - `test_load_doc_spec_returns_parsed_spec` - Verifies doc spec loading
-  - `test_load_doc_spec_caches_result` - Verifies caching behavior
-  - `test_load_doc_spec_returns_none_for_missing_file` - Graceful handling of missing files
-  - `test_generate_step_skill_with_doc_spec` - End-to-end skill generation with doc spec
-  - `test_build_step_context_includes_doc_spec_info` - Context building verification
-
-### Migration Guide
-- Update job.yml files: Change `document_type:` to `doc_spec:` in output definitions
-- Update any code importing `DocumentTypeDefinition`: Use `DocSpec` instead (alias still works)
-- Run `deepwork install` to regenerate skills with updated terminology
-
-## [0.4.0] - 2026-01-20
-
-### Added
-- Doc specs (document specifications) as a first-class feature for formalizing document quality criteria
+- **Doc specs** (document specifications) as a first-class feature for formalizing document quality criteria
   - New `src/deepwork/schemas/doc_spec_schema.py` with JSON schema validation
   - New `src/deepwork/core/doc_spec_parser.py` with parser for frontmatter markdown doc spec files
   - Doc spec files stored in `.deepwork/doc_specs/` directory with quality criteria and example documents
 - Extended job.yml output schema to support doc spec references
   - Outputs can now be strings (backward compatible) or objects with `file` and optional `doc_spec` fields
   - Example: `outputs: [{file: "report.md", doc_spec: ".deepwork/doc_specs/monthly_report.md"}]`
-  - The `doc_spec` uses the full path to the doc spec file, making references self-documenting
-- Doc spec-aware skill generation
-  - Step skills now include doc spec quality criteria, target audience, and example documents
-  - Both Claude and Gemini templates updated for doc spec rendering
-- Document detection workflow in `deepwork_jobs.define`
-  - Steps 1.5, 1.6, 1.7 guide users through creating doc specs for document-oriented jobs
-  - Pattern indicators: "report", "summary", "create", "monthly", "for stakeholders"
-- Doc spec improvement workflow in `deepwork_jobs.learn`
-  - Steps 3.5, 4.5 capture doc spec-related learnings and update doc spec files
-- New `OutputSpec` dataclass in parser for structured output handling
-- Comprehensive doc spec documentation in `doc/doc-specs.md`
-- New test fixtures for doc spec validation and parsing
+- Doc spec-aware skill generation with quality criteria, target audience, and example documents
+- **`prompt_runtime` setting** for rules to control how prompt-type actions are executed
+  - `send_to_stopping_agent` (default): Returns prompt to the agent that triggered the rule
+  - `claude`: Invokes Claude Code in headless mode to handle the rule independently
+- Claude headless mode execution for automated rule remediation
+  - Rules with `prompt_runtime: claude` spawn a separate Claude process
+  - Claude performs required actions and returns structured `block`/`allow` decision
+  - Useful for automated tasks like documentation updates without blocking the main agent
+- **`deepwork rules clear_queue` CLI command** for managing the rules queue (#117)
+  - Clears all entries from the rules queue to reset state
+- Code review stage added to the `commit` standard job (#99)
+  - New `commit.review` step runs before testing to catch issues early
+- Session start hook for version checking (#106)
+- Manual tests job for validating hook/rule behavior (#102)
 
 ### Changed
+- **BREAKING**: Renamed `document_type` to `doc_spec` throughout the codebase
+  - Job.yml field: `document_type` → `doc_spec`
+  - Class: `DocumentTypeDefinition` → `DocSpec` (backward compat alias provided)
+  - Methods: `has_document_type()` → `has_doc_spec()`, `validate_document_type_references()` → `validate_doc_spec_references()`
 - `Step.outputs` changed from `list[str]` to `list[OutputSpec]` for richer output metadata
 - `SkillGenerator.generate_all_skills()` now accepts `project_root` parameter for doc spec loading
 - Updated `deepwork_jobs` to v0.6.0 with doc spec-related quality criteria
+- Skill template documentation now uses generic "agent" terminology (#115)
+
+### Fixed
+- Fixed infinite loop bug in rules system when promise tags weren't recognized (#96)
+  - Rules now properly detect and honor promise acknowledgments
+- Fixed COMMAND rules promise handling to properly update queue status (#120)
+  - FAILED queue entries now correctly update to SKIPPED when acknowledged via promise
+- Fixed quality criteria validation logic in skill template (#113)
+  - Promise OR all criteria met now passes (was incorrectly AND)
+  - Requires both criteria NOT met AND promise missing to fail
+- Fixed `compare_to: prompt` mode not detecting committed files during agent response (#95)
+  - Rules now search prompts for directory references
+- Added timeout to deepwork install hook (#101)
+
+### Migration Guide
+- Update job.yml files: Change `document_type:` to `doc_spec:` in output definitions
+- Update any code importing `DocumentTypeDefinition`: Use `DocSpec` instead (alias still works)
+- Run `deepwork install` to regenerate skills with updated terminology
 
 ## [0.3.1] - 2026-01-20
@@ -187,7 +165,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html)
 
 Initial version.
 
-[0.5.0]: https://github.com/anthropics/deepwork/releases/tag/0.5.0
 [0.4.0]: https://github.com/anthropics/deepwork/releases/tag/0.4.0
 [0.3.1]: https://github.com/anthropics/deepwork/releases/tag/0.3.1
 [0.3.0]: https://github.com/anthropics/deepwork/releases/tag/0.3.0
diff --git a/pyproject.toml b/pyproject.toml
index f8500821..78db1439 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "deepwork"
-version = "0.5.2"
+version = "0.4.0"
 description = "Framework for enabling AI agents to perform complex, multi-step work tasks"
 readme = "README.md"
 requires-python = ">=3.11"
diff --git a/uv.lock b/uv.lock
index f28bd05f..ec35b2f3 100644
--- a/uv.lock
+++ b/uv.lock
@@ -126,7 +126,7 @@ toml = [
 
 [[package]]
 name = "deepwork"
-version = "0.5.2"
+version = "0.4.0"
 source = { editable = "." }
 dependencies = [
     { name = "click" },

From be3ff71de742dfb41a43c72b566048df315fde1a Mon Sep 17 00:00:00 2001
From: Noah Horton
Date: Thu, 22 Jan 2026 18:28:35 -0700
Subject: [PATCH 13/13] Add investigation notes for Claude subprocess hanging issue

Documents the findings from debugging why `prompt_runtime: claude` hangs
when running inside Claude Code.

Key findings:
- Direct bash execution works, Python subprocess doesn't
- The hang appears to be at the process management level
- Nested Claude invocation seems intentionally limited
- Feature should work from external automation (CI, cron)

Co-Authored-By: Claude Opus 4.5

---
 .../claude_subprocess_investigation.md | 164 ++++++++++++++++++
 1 file changed, 164 insertions(+)
 create mode 100644 doc/debugging_history/claude_subprocess_investigation.md

diff --git a/doc/debugging_history/claude_subprocess_investigation.md b/doc/debugging_history/claude_subprocess_investigation.md
new file mode 100644
index 00000000..d3737e63
--- /dev/null
+++ b/doc/debugging_history/claude_subprocess_investigation.md
@@ -0,0 +1,164 @@
+# Claude Subprocess Investigation
+
+**Date**: 2026-01-22
+**Branch**: `claude/add-prompt-runtime-setting-gPJDA`
+**Issue**: Running Claude as a subprocess from within Claude Code hangs indefinitely
+
+## Problem Statement
+
+The `prompt_runtime: claude` feature in DeepWork rules is designed to invoke Claude Code in headless mode to autonomously evaluate rules. When a rule with this setting triggers, the hook should:
+
+1. Spawn `claude --print` as a subprocess
+2. Send the rule prompt to it
+3. Parse the response for an allow/block decision
+4. Return the result
+
+However, when running inside a Claude Code session, this subprocess invocation hangs indefinitely.
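+
+For orientation, here is a minimal sketch of that intended round-trip. This is not the exact code in `src/deepwork/hooks/rules_check.py`; the parameter names and the `result` field of the JSON envelope are assumptions, and the decision parsing is simplified to a substring check:
+
+```python
+import json
+import subprocess
+
+
+def invoke_claude_headless(rule_prompt: str, timeout: int = 120) -> str:
+    """Sketch: run Claude Code headless and reduce its reply to allow/block."""
+    # Steps 1-2: spawn `claude --print` and send the rule prompt on stdin.
+    # In the failing case described below, this call never returns and
+    # eventually raises subprocess.TimeoutExpired.
+    result = subprocess.run(
+        ["claude", "--print", "--output-format", "json"],
+        input=rule_prompt,
+        capture_output=True,
+        text=True,
+        timeout=timeout,
+    )
+    # Steps 3-4: parse the JSON envelope and extract the decision; the rule
+    # prompt instructs Claude to answer with a block/allow verdict.
+    reply = json.loads(result.stdout.splitlines()[0])
+    return "block" if "block" in reply.get("result", "").lower() else "allow"
+```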
+
+## Environment
+
+- Claude Code version: 2.1.15
+- Platform: macOS (Darwin 25.2.0)
+- Python: 3.11.14 (nix-managed)
+- Environment variables: `CLAUDECODE=1`, `CLAUDE_CODE_ENTRYPOINT=cli`
+
+## What Works
+
+### Direct Bash Execution (via Claude's Bash tool)
+```bash
+echo "Say TEST" | claude --print --output-format json 2>&1 | cat
+# Returns JSON response immediately
+```
+
+### Piping through head -1
+```bash
+echo "Say TEST" | claude --print --output-format json 2>&1 | head -1
+# Returns JSON and terminates cleanly
+```
+
+### Python heredoc script (run directly in bash)
+```bash
+python3 << 'EOF'
+import subprocess
+# ... subprocess code ...
+EOF
+# Works correctly
+```
+
+## What Doesn't Work
+
+### Python subprocess.run from within Claude
+```python
+import subprocess
+
+# This hangs indefinitely when run from Python inside Claude Code
+result = subprocess.run(
+    ["claude", "--print", "prompt"],
+    capture_output=True,
+    timeout=30,
+)
+```
+
+### Shell=True with pipes
+```python
+# Also hangs
+subprocess.run(
+    'echo "prompt" | claude --print',
+    shell=True,
+    capture_output=True,
+)
+```
+
+### Popen with various options
+```python
+# All of these hang:
+# - start_new_session=True
+# - close_fds=True
+# - stdin=subprocess.DEVNULL
+# - Writing to temp file instead of capture_output
+```
+
+### Environment variable clearing
+```bash
+# Still hangs
+CLAUDECODE= timeout 15 bash -c 'echo "test" | claude --print'
+env -i PATH="$PATH" HOME="$HOME" timeout 15 bash -c 'echo "test" | claude --print'
+```
+
+## Key Observations
+
+1. **Direct bash works, Python subprocess doesn't**: The exact same command that works when run via Claude's Bash tool hangs when run via Python's subprocess module.
+
+2. **Piping to `head -1` helps in some cases**: The command `| head -1` causes Claude to terminate after outputting the JSON line, but this doesn't help when the subprocess itself never starts producing output.
+
+3. **The hang occurs at the subprocess level**: Python's `subprocess.run` times out waiting for the process, suggesting Claude itself is blocked on something.
+
+4. **`--output-format json` is required**: Without this, Claude hangs even longer (possibly waiting for terminal interaction).
+
+5. **Hooks configuration doesn't prevent the hang**: Using `--settings '{"hooks": {}}'` to disable hooks in the subprocess doesn't help.
+
+## Research Findings
+
+### Related GitHub Issues
+- [#1481 - Background Process Hangs](https://github.com/anthropics/claude-code/issues/1481): Claude Code waits for child processes even when backgrounded
+- [#13598 - /dev/tty hang](https://github.com/anthropics/claude-code/issues/13598): Claude can hang when accessing terminal devices
+- Subagent documentation states: "Subagents cannot spawn other subagents" - suggesting nested invocation is intentionally limited
+
+### Root Cause Hypothesis
+Claude Code appears to manage its process tree in a way that blocks nested Claude invocations. When running as a subprocess of another Claude instance (detected via the `CLAUDECODE=1` environment variable or the process hierarchy), the child Claude may be waiting for resources held by the parent.
+
+## Attempted Solutions
+
+### 1. Use `--output-format json` + `| head -1`
+**Result**: Works from bash, still hangs from Python subprocess
+
+### 2. Write to temp file instead of capturing output
+**Result**: Still hangs - the file remains empty
+
+### 3. Clear CLAUDECODE environment variable
+**Result**: Still hangs - the detection/blocking isn't based on this variable alone
+
+### 4. Use `start_new_session=True` for process isolation
+**Result**: Still hangs
+
+### 5. Fall back to returning prompt to agent when inside Claude
+**Result**: Works but defeats the purpose of `prompt_runtime: claude`
+
+### 6. Change hook command to use `uv run python`
+**Result**: Still hangs - the issue is the nested Claude invocation, not the Python version
+
+## Recommended Next Steps
+
+1. **Test immediate "allow" return**: Modify the code to immediately return "allow" for claude runtime rules to verify the rest of the flow works.
+
+2. **Create bash wrapper script**: Instead of invoking Claude from Python, create a standalone bash script that the hook can call. This might bypass the subprocess blocking.
+
+3. **Investigate Claude's process management**: Look at Claude Code's source or documentation for how it handles child processes and whether there's an API for nested invocation.
+
+4. **External execution approach**: Consider having the hook queue the rule evaluation and have an external process (outside Claude) handle the actual Claude invocation.
+
+5. **Test from CI/cron**: Verify that `prompt_runtime: claude` works correctly when invoked from outside a Claude session (e.g., from GitHub Actions or a cron job).
+
+## Code Changes Made
+
+The following changes were made to `src/deepwork/hooks/rules_check.py` during this investigation:
+
+1. Added `is_inside_claude_session()` function to detect nested Claude context
+2. Added `--output-format json` to get structured output
+3. Added `| head -1` pipe to force clean termination
+4. Added temp file approach for prompt/output handling
+5. Added extensive comments explaining the sensitivity of the subprocess code
+
+## Files Modified (not yet committed)
+
+- `src/deepwork/hooks/rules_check.py` - Multiple changes to `invoke_claude_headless()`
+- `.claude/settings.json` - Changed hook command to use `uv run python`
+- `.deepwork/jobs/manual_tests/job.yml` - Added functionality_tests step
+- `.deepwork/jobs/manual_tests/steps/functionality_tests.md` - Test instructions
+
+## Conclusion
+
+Running Claude as a subprocess from within a Claude Code session appears to be blocked at a fundamental level. The solution likely requires one of:
+- An official API for nested Claude invocation
+- Running the subprocess invocation from outside the Claude process tree
+- Accepting the limitation and falling back to returning prompts to the agent
+
+The `prompt_runtime: claude` feature should work correctly when invoked from external automation (CI, cron, etc.) but cannot work when running inside Claude Code itself.
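+
+## Appendix: Detection Sketch
+
+For reference, a minimal sketch of the `is_inside_claude_session()` guard mentioned under Code Changes Made, assuming the environment variables listed under Environment are the signal (the actual implementation may also inspect the process hierarchy):
+
+```python
+import os
+
+
+def is_inside_claude_session() -> bool:
+    """Best-effort check for whether we are running under Claude Code."""
+    # CLAUDECODE=1 and CLAUDE_CODE_ENTRYPOINT are set inside Claude Code
+    # (see Environment above). Clearing CLAUDECODE did not unblock the nested
+    # invocation, so this guard only selects the fallback path (attempted
+    # solution 5); it does not make the subprocess work.
+    return os.environ.get("CLAUDECODE") == "1" or "CLAUDE_CODE_ENTRYPOINT" in os.environ
+```
+
+With this guard in place, the hook can fall back to returning the prompt to the stopping agent instead of hanging on a nested invocation.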