⚡ Haja — AI Retrieval & Platform Engineer

Transforming raw institutional knowledge into accurate, scalable conversational and search experiences. I combine product ownership (citizen & civil servant chat platforms) with deep engineering across retrieval, performance, security, and infrastructure.

🎯 Value Proposition

High-precision hybrid retrieval (graph + vector + structured filters) • Production chatbot ownership • Performance + security baked into lifecycle • Resilient active-active infrastructure.

🧠 Core Technologies

🚀 Selected Public Projects

Area	Repo(s)	Summary
Graph & Retrieval	neo4j-document-pipeline	Graph ingestion + retrieval API for LLM workflows
Vector Benchmarks	tidb-vector-llm-testbed	Hybrid scoring, indexing & relevance experiments
Embedding Pipelines	mysql-to-pgvector-embeddings	MySQL → embeddings → pgVector semantic layer
FAQ Base	faq-retrieval-system	Structured query layer powering GPT-style retrieval
Performance	playwright-dayang • k6-for-custom-dify	Chatbot UX & API load test suites
Security Automation	zap-security-api	OWASP ZAP scan API (baseline/quick/full)
Experimentation	playwright-study • besu-ibft2.0	Testing paradigms & consensus exploration

Each project illustrates a stage of the lifecycle: ingestion → enrichment → retrieval → validation.
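As an illustration of the retrieval stage, here is a minimal sketch of hybrid scoring that combines a structured filter, vector similarity, and a graph signal. It is not taken from any of the repositories above: the `Doc` fields, the `agency` filter, the `graph_links` signal, and the 0.2 weight are all assumptions chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    embedding: list[float]
    agency: str           # hypothetical structured metadata field
    graph_links: int = 0  # hypothetical count of related graph nodes


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(query_vec, docs, agency=None, graph_weight=0.2, top_k=3):
    """Filter on structured metadata, then blend vector and graph scores."""
    # Structured filter: drop documents outside the requested agency.
    candidates = [d for d in docs if agency is None or d.agency == agency]
    scored = []
    for d in candidates:
        vec_score = cosine(query_vec, d.embedding)
        # Cap the graph signal at 5 links so it stays within [0, 1].
        graph_score = min(d.graph_links, 5) / 5
        blended = (1 - graph_weight) * vec_score + graph_weight * graph_score
        scored.append((d.doc_id, blended))
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]
```

Calling `hybrid_search(query_vec, docs, agency="tax")` returns up to `top_k` `(doc_id, score)` pairs restricted to that agency. In practice the weight would be tuned against a labeled relevance set rather than fixed by hand.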

🛠 Behind-the-Scenes (Non-Public)

Dayang chatbot (citizen portal): Product ownership, retrieval tuning, performance modeling.
Civil servant assistant: Document-grounded Q&A with authoritative source controls.
Load strategy: Concurrency thresholds, ramp profiles (k6, Locust).
Active-active infra: Alibaba Cloud ECS, Nginx routing, SSL/TLS hardening.
Security automation: OWASP ZAP scans exposed via API for CI/CD or ad hoc use.
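The load strategy item can be sketched in a few lines. The example below is a hypothetical helper, not the private test suite: it builds a stepped ramp profile as a list of `duration`/`target` entries (the same shape k6 uses for its `stages` option) and picks a concurrency threshold from measured results. The function names, the p95 latency limit, and the error-rate limit are assumptions.

```python
def build_ramp_stages(peak_users, steps, step_seconds, hold_seconds):
    """Return a stepped ramp-up, a hold at peak, and a ramp-down to zero."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    stages = []
    for i in range(1, steps + 1):
        target = round(peak_users * i / steps)
        stages.append({"duration": f"{step_seconds}s", "target": target})
    stages.append({"duration": f"{hold_seconds}s", "target": peak_users})
    stages.append({"duration": f"{step_seconds}s", "target": 0})
    return stages


def concurrency_threshold(measurements, max_p95_ms, max_error_rate):
    """Highest user level reached before the first SLO breach.

    measurements: iterable of (users, p95_ms, error_rate) tuples.
    Returns None if even the lowest level breaches the limits.
    """
    threshold = None
    for users, p95_ms, error_rate in sorted(measurements):
        if p95_ms > max_p95_ms or error_rate > max_error_rate:
            break  # stop at the first breach; higher levels are not trusted
        threshold = users
    return threshold
```

The stage list can be serialized to JSON and pasted into a load test configuration. The threshold function stops at the first breach on purpose, so a lucky pass at a higher level does not mask an earlier failure.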

🧩 Differentiators

Strength	Why It Matters
Hybrid Retrieval Engineering	Precision lift vs pure vector recall
Product + Engineering Fusion	Faster iteration, fewer handoff losses
Embedded Performance & Security	Prevents late-stage surprises
Government Domain Exposure	Designs for trust & high factual accuracy
Benchmark-Driven Choices	Tech decisions backed by measurable outcomes

📈 Illustrative Impact (Insert Real Numbers When Allowed)

Area	Example Outcome
Retrieval Accuracy	~30–40% fewer irrelevant answers (hybrid approach)
Performance Readiness	Concurrency limits defined pre-launch (no collapse)
Security Cycle Time	Hours → minutes via automated ZAP API
Resilience	Active-active reduces failover disruption window

🧪 Current Explorations

Multi-pass hybrid ranking (graph traversal + semantic rerank)
Domain-adaptive embedding strategies
Unified Playwright + k6 harness (UX + load synergy)
Retrieval explainability overlays & confidence shaping
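To make the first exploration concrete, here is a rough sketch of a two-pass ranker: pass one collects candidates by bounded graph traversal, and pass two reranks them by semantic similarity. The adjacency-dict graph, the in-memory embeddings, and the per-hop penalty are assumptions for illustration; a real setup would presumably query a graph database and a vector store instead.

```python
import math
from collections import deque


def expand_graph(graph, seeds, max_hops):
    """Breadth-first search from seeds; returns {node: hop distance}."""
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_pass_rank(graph, embeddings, seeds, query_vec,
                  max_hops=2, hop_penalty=0.1, top_k=3):
    """Pass 1: graph traversal for candidates. Pass 2: semantic rerank."""
    candidates = expand_graph(graph, seeds, max_hops)
    scored = []
    for node, hops in candidates.items():
        if node not in embeddings:
            continue  # skip nodes that have no embedding
        # Hypothetical scoring rule: similarity minus a penalty per hop.
        score = cosine(query_vec, embeddings[node]) - hop_penalty * hops
        scored.append((node, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

The hop penalty keeps distant nodes from outranking close ones on similarity alone. Whether a linear penalty is the right shape is an open question that a benchmark would need to settle.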

🧠 Architecture (High-Level)

Show Retrieval Flow Diagram

flowchart LR
  A[Data Sources<br/>MySQL • Docs • FAQs] --> B[Normalize & Clean]
  B --> C[Embeddings Generation]
  B --> D["Graph Modeling (Neo4j)"]
  C --> E[(Vector Store<br/>pgVector / TiDB)]
  D --> F[Graph Relations]
  E --> G[Hybrid Retrieval Layer]
  F --> G
  G --> H[LLM Orchestrator]
  H --> I[Post-Processing & Ranking]
  I --> J[Chatbot / API Consumers]
  subgraph QUALITY_GATES[Quality Gates]
    K[Performance Tests]
    L["Security Scans (ZAP)"]
  end
  J --> K
  J --> L

If Mermaid doesn't render, click “Raw” or use a Mermaid viewer.

🤝 Role Alignment

Ideal matches: AI Infrastructure Engineer • Retrieval Engineer • Platform Engineer (LLM enablement) • Technical Product Owner (Knowledge Systems).

Traits brought: architecture clarity • benchmark discipline • product empathy • automation-first mindset.

📬 Contact

LinkedIn: https://www.linkedin.com/in/nurhajjariahk/
Email: nurhajjariahk@gmail.com

📊 Signals

🔄 Philosophy

Build systems that are observable, evolvable, and grounded in measurable user impact — not novelty.

Thanks for visiting — let’s build something meaningful.
