haja-k/README.md

⚑ Haja: AI Retrieval & Platform Engineer

Transforming raw institutional knowledge into accurate, scalable conversational and search experiences. I combine product ownership (citizen & civil servant chat platforms) with deep engineering across retrieval, performance, security, and infrastructure.


🎯 Value Proposition

High-precision hybrid retrieval (graph + vector + structured filters) • Production chatbot ownership • Performance + security baked into lifecycle • Resilient active-active infrastructure.


🧠 Core Technologies


🚀 Selected Public Projects

| Area | Repo(s) | Summary |
| --- | --- | --- |
| Graph & Retrieval | neo4j-document-pipeline | Graph ingestion + retrieval API for LLM workflows |
| Vector Benchmarks | tidb-vector-llm-testbed | Hybrid scoring, indexing & relevance experiments |
| Embedding Pipelines | mysql-to-pgvector-embeddings | MySQL → embeddings → pgVector semantic layer |
| FAQ Base | faq-retrieval-system | Structured query layer powering GPT-style retrieval |
| Performance | playwright-dayang • k6-for-custom-dify | Chatbot UX & API load test suites |
| Security Automation | zap-security-api | OWASP ZAP scan API (baseline/quick/full) |
| Experimentation | playwright-study • besu-ibft2.0 | Testing paradigms & consensus exploration |

Each project illustrates a stage of the lifecycle: ingestion → enrichment → retrieval → validation.


🛠 Behind-the-Scenes (Non-Public)

  • Dayang chatbot (citizen portal): Product ownership, retrieval tuning, performance modeling.
  • Civil servant assistant: Document-grounded Q&A with authoritative source controls.
  • Load strategy: Concurrency thresholds, ramp profiles (k6, Locust).
  • Active-active infra: Alibaba Cloud ECS, Nginx routing, SSL/TLS hardening.
  • Security automation: OWASP ZAP scans exposed via API for CI/CD or ad hoc use.
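The load-strategy item above mentions concurrency thresholds and ramp profiles. As a minimal sketch of that idea (not the project's actual tooling), the snippet below models a ramp as a list of duration/target stages, similar in shape to the `stages` option in k6, and finds the first second at which planned load exceeds a threshold. The `Stage` class, the `PROFILE` values, and both helper functions are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One ramp stage: move linearly to `target` virtual users over `duration_s` seconds."""
    duration_s: int
    target: int


def users_at(stages: list[Stage], t: float, start: int = 0) -> float:
    """Return the planned number of virtual users at time t (seconds, t >= 0)."""
    current = start
    elapsed = 0.0
    for stage in stages:
        end = elapsed + stage.duration_s
        if t < end:
            # Linear interpolation between the previous target and this one.
            fraction = (t - elapsed) / stage.duration_s
            return current + (stage.target - current) * fraction
        current = stage.target
        elapsed = end
    return float(current)  # hold the final target after the last stage


def first_breach(stages: list[Stage], threshold: float, step_s: int = 1):
    """Return the first sampled second at which planned load exceeds threshold, else None."""
    total = sum(s.duration_s for s in stages)
    for t in range(0, total + 1, step_s):
        if users_at(stages, t) > threshold:
            return t
    return None


# Hypothetical profile: warm up, hold, spike, ramp down.
PROFILE = [Stage(60, 50), Stage(120, 50), Stage(60, 200), Stage(60, 0)]

if __name__ == "__main__":
    print(users_at(PROFILE, 30))
    print(first_breach(PROFILE, threshold=100))
```

Checking a profile against a known concurrency limit before a run makes it easier to tell a deliberate stress test from an accidental one.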

🧩 Differentiators

| Strength | Why It Matters |
| --- | --- |
| Hybrid Retrieval Engineering | Precision lift vs pure vector recall |
| Product + Engineering Fusion | Faster iteration, fewer handoff losses |
| Embedded Performance & Security | Prevents late-stage surprises |
| Government Domain Exposure | Designs for trust & high factual accuracy |
| Benchmark-Driven Choices | Tech decisions backed by measurable outcomes |

📈 Illustrative Impact (Insert Real Numbers When Allowed)

| Area | Example Outcome |
| --- | --- |
| Retrieval Accuracy | ~30–40% fewer irrelevant answers (hybrid approach) |
| Performance Readiness | Concurrency limits defined pre-launch (no collapse) |
| Security Cycle Time | Hours → minutes via automated ZAP API |
| Resilience | Active-active reduces failover disruption window |

🧪 Current Explorations

  • Multi-pass hybrid ranking (graph traversal + semantic rerank)
  • Domain-adaptive embedding strategies
  • Unified Playwright + k6 harness (UX + load synergy)
  • Retrieval explainability overlays & confidence shaping

🧠 Architecture (High-Level)

Show Retrieval Flow Diagram
flowchart LR
  A["Data Sources<br/>MySQL • Docs • FAQs"] --> B["Normalize & Clean"]
  B --> C[Embeddings Generation]
  B --> D["Graph Modeling (Neo4j)"]
  C --> E[("Vector Store<br/>pgVector / TiDB")]
  D --> F[Graph Relations]
  E --> G[Hybrid Retrieval Layer]
  F --> G
  G --> H[LLM Orchestrator]
  H --> I["Post-Processing & Ranking"]
  I --> J["Chatbot / API Consumers"]
  subgraph QUALITY_GATES[Quality Gates]
    K[Performance Tests]
    L["Security Scans (ZAP)"]
  end
  J --> K
  J --> L

If Mermaid doesn't render, click “Raw” or use a Mermaid viewer.
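The Hybrid Retrieval Layer in the flow above merges results from the vector store and the graph. The README does not say how the merge is done; one common option is reciprocal rank fusion (RRF), sketched here as an assumption rather than the method these projects use. The function name, the `allowed` filter (standing in for structured filters), and the sample document ids are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, allowed=None):
    """Fuse several ranked id lists into one using reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. `allowed`, if given, is a set of ids that pass
    a structured filter; everything else is dropped before scoring.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            if allowed is not None and doc_id not in allowed:
                continue
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    vector_hits = ["a", "b", "c"]  # hypothetical vector-store ranking
    graph_hits = ["b", "d", "a"]   # hypothetical graph ranking
    print(reciprocal_rank_fusion([vector_hits, graph_hits]))
```

RRF works on ranks rather than raw scores, so it avoids having to put vector distances and graph relevance on a common scale.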


🤝 Role Alignment

Ideal matches: AI Infrastructure Engineer • Retrieval Engineer • Platform Engineer (LLM enablement) • Technical Product Owner (Knowledge Systems).

Traits brought: architecture clarity • benchmark discipline • product empathy • automation-first mindset.


📬 Contact


📊 Signals


🔄 Philosophy

Build systems that are observable, evolvable, and grounded in measurable user impact, not novelty.

Thanks for visiting. Let's build something meaningful.

Pinned

  1. mysql-to-pgvector-embeddings Public

    Vectorizes data from a MySQL database into embeddings so an LLM can use it in Dify workflow orchestration.

    Python 2

  2. tidb-vector-llm-testbed Public

    Experimental framework for evaluating TiDB's vector search capabilities with LangChain-based LLM retrieval workflows. Includes setup scripts, indexing pipelines, and retrieval benchmarks to test hy…

    Python

  3. neo4j-document-pipeline Public

    Uses Neo4j as a knowledge graph, with an API for an end-to-end ingestion, indexing, and retrieval pipeline ready for workflow integration.

    Python

  4. besu-ibft2.0 Public

    Hyperledger Besu experiment with IBFT 2.0

    Shell 1

  5. img-classification-api Public

    Image classification platform for self-training image models

    JavaScript

  6. zap-security-api Public

    Flask + Docker service for running OWASP ZAP security scans on demand via a simple REST API. Designed for centralized, repeatable application security testing in CI/CD or ad-hoc use. Supports multi…

    Python