HoloInsight is a cloud-native observability platform with a special focus on real-time log analysis and AI integration.
Updated Jul 10, 2025 - Java
InsightSolver: Colab notebooks for exploring and solving operational issues using deep learning, machine learning, and related models.
An autonomous SRE agent that monitors cloud logs across multiple platforms, leveraging AI models from various providers to detect anomalies, perform root cause analysis, and automate remediation by creating GitHub Pull Requests.
ARF is an agentic reliability intelligence platform that separates decision intelligence (OSS) from governed execution (Enterprise), enabling autonomous operations with deterministic safety guarantees.
Open source code for AIOpsServing
tpu-doc is a zero-dependency diagnostic binary for Google Cloud TPU environments that instantly validates hardware health, discovers software stack configurations, and provides AI-powered log analysis to eliminate expensive debugging downtime.
AI that ships your code. Deploy to any cloud with plain English.
🤖 Build and deploy scalable Multi-AI Agent systems with LangGraph and Groq LLMs to enhance intelligence across enterprise applications.
🚀 Enhance Google Cloud operations with the Gemini SRE Agent, automating log monitoring and incident response for smarter site reliability.
An AI-powered DevOps tool that analyzes Linux server logs to detect anomalies and predict failures. It integrates ML models, automated fixes via Ansible, containerization with Docker, and orchestration using Kubernetes, providing a full-stack solution for predictive maintenance.
AI-powered alert automation for n8n — unify alerts from monitoring systems, analyze via LLM, and auto-notify DevOps teams on Telegram.
Advanced, end-to-end, enterprise-grade agentic AI pipeline that automates competitor ad intelligence, performs multimodal creative strategy extraction, enables brand-safe adaptation, and generates AI video ads using LLM reasoning, multimodal analysis, and deterministic workflow orchestration with full auditability.
ReliaKit TL-15 is an open-source, planet-grade resilience framework for distributed infrastructure. It integrates automated DDoS protection, geo-aware routing, chaos engineering, and symbolic AI hooks to achieve fault tolerance beyond traditional benchmarks.
ModelSpec is an open, declarative specification for describing how AI models, especially LLMs, are deployed, served, and operated in production. It captures execution, serving, and orchestration intent to enable validation, reasoning, and automation across modern AI infrastructure.
Production-ready MLOps platform for monitoring and evaluating LLM response quality with automated alerts and real-time analytics
Advanced, modular, and enterprise-grade AI automation control plane combining Custom GPT Actions, n8n orchestration, Google Workspace workflows, and serverless OCR. Implements schema-driven, agent-based pipelines for ingesting, cleaning, analyzing, and reporting, with data normalization, conversion, audit logging, cron-based scheduling, and enterprise observability.
The collection of charms used to integrate Juju deployed Slurm clusters with Vantage.
🚦 Streamline alert management by using AI to triage infrastructure alerts, ensuring operators focus only on critical issues with Uptime Kuma integration.
An AI Agent IaC tool that aims to make developing and deploying AI Agents easier.