Control Room - Operations & Analytics Dashboard

Control Room is the operations and analytics dashboard for managing LanOnasis services. It provides real-time monitoring, performance analytics, incident management, team collaboration, and administrative controls for platform operations and compliance.

Overview

Control Room acts as the nerve center for platform operations:

LanOnasis Services (All Instances)
  ↓
Real-Time Metrics Collection
  ↓
Control Room Dashboard
  ├─ Real-Time Metrics
  ├─ Performance Analytics
  ├─ Incident Management
  ├─ Team Collaboration
  ├─ Compliance Monitoring
  └─ Administrative Controls
  ↓
Alerts & Notifications

Primary Functions

Real-Time Monitoring: Service health, latency, error rates
Analytics & Reporting: Performance trends, usage patterns
Incident Management: Create, track, and resolve incidents
Team Collaboration: Shared dashboards, alerts, communication
Compliance Monitoring: Audit trail, compliance reports
Administrative Controls: Service configuration, user management

Architecture

Service Structure

control-room/
├── src/
│   ├── pages/
│   │   ├── dashboard/          # Main analytics dashboard
│   │   ├── incidents/          # Incident management
│   │   ├── services/           # Service monitoring
│   │   ├── metrics/            # Metrics explorer
│   │   ├── team/               # Team collaboration
│   │   ├── compliance/         # Compliance reports
│   │   └── admin/              # Administrative settings
│   ├── components/
│   │   ├── metric-card/        # KPI display
│   │   ├── time-series-chart/  # Performance graphs
│   │   ├── incident-timeline/  # Incident history
│   │   ├── alert-feed/         # Real-time alerts
│   │   └── team-chat/          # Team messaging
│   ├── hooks/
│   │   ├── useMetrics/         # Fetch metrics
│   │   ├── useIncidents/       # Fetch incidents
│   │   └── useWebSocket/       # Real-time updates
│   ├── api/
│   │   ├── metrics.ts          # Metrics API
│   │   ├── incidents.ts        # Incident API
│   │   └── admin.ts            # Admin API
│   └── index.tsx
├── public/
├── README.md
└── package.json

Technology Stack

Frontend: Next.js 15 with React 19
Real-Time: WebSocket for live updates
Charts: Recharts or Chart.js for visualizations
UI Components: brand-kit integration
Authentication: OAuth 2.0 with role-based access

Key Dashboards

1. Main Analytics Dashboard

Overview of all platform metrics:

Control Room Dashboard
═════════════════════════════════════════════════════════

[Time Range Selector: Last 24 hours ▼] [Refresh: Real-time]

┌─ Key Metrics ────────────────────────────────────────┐
│ Requests/sec: 125 (↑ 12% from avg)                   │
│ Avg Latency: 85ms (✓ Within SLA)                     │
│ Error Rate: 0.02% (✓ Under 0.5% threshold)           │
│ Uptime: 99.98% (✓ Above 99.95% target)               │
└──────────────────────────────────────────────────────┘

┌─ Request Volume (Last 24h) ──────────────────────────┐
│                                                       │
│     125 req/s │                    ╱╲                │
│     100 req/s │           ╱╲      ╱  ╲               │
│      75 req/s │╱╲    ╱╲  ╱  ╲    ╱    ╲              │
│      50 req/s │  ╲  ╱  ╲╱    ╲  ╱      ╲            │
│               └────────────────────────────          │
│               0h      6h     12h    18h    24h       │
└──────────────────────────────────────────────────────┘

┌─ Service Health ─────────────────────────────────────┐
│ ✅ Memory Suite        │ ✅ v-secure                 │
│    Latency: 45ms       │    Latency: 22ms            │
│    Error Rate: 0.01%   │    Error Rate: 0.00%        │
│                        │                              │
│ ✅ Onasis-CORE         │ ⚠️  MCP Core (Degraded)     │
│    Latency: 75ms       │    Latency: 450ms           │
│    Error Rate: 0.05%   │    Error Rate: 0.10%        │
└──────────────────────────────────────────────────────┘

┌─ Recent Incidents ───────────────────────────────────┐
│ [Jan 15 10:30] MCP Core latency spike (30 min)       │
│ [Jan 14 14:22] Memory memcache hit (resolved)        │
│ [Jan 12 09:15] v-secure certificate renewal (1 sec) │
└──────────────────────────────────────────────────────┘

┌─ Alerts ─────────────────────────────────────────────┐
│ 🔔 Error rate exceeding 0.1% for Onasis-CORE         │
│ 🔔 Memory Suite P99 latency above 500ms              │
│ ℹ️  Scheduled maintenance: Jan 20 02:00 UTC          │
└──────────────────────────────────────────────────────┘
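
The checkmarks in the Key Metrics panel compare live values against fixed targets. As a minimal sketch of that comparison (not Control Room's actual code), the function below lists which targets are breached. The error-rate and uptime targets come from the mockup above; the type names, the function name, and the 200 ms latency target are assumptions made for illustration.

```typescript
// Hypothetical shapes; the real Control Room types are not shown in this document.
interface KeyMetrics {
  avgLatencyMs: number;
  errorRatePct: number; // percent, e.g. 0.02 means 0.02%
  uptimePct: number; // percent, e.g. 99.98
}

interface SlaTargets {
  maxLatencyMs: number;
  maxErrorRatePct: number;
  minUptimePct: number;
}

// Error-rate and uptime targets mirror the mockup; the latency target is an assumed placeholder.
const DEFAULT_TARGETS: SlaTargets = {
  maxLatencyMs: 200,
  maxErrorRatePct: 0.5,
  minUptimePct: 99.95,
};

// Returns the names of breached targets (empty when every metric passes).
function findSlaBreaches(m: KeyMetrics, t: SlaTargets = DEFAULT_TARGETS): string[] {
  const breaches: string[] = [];
  if (m.avgLatencyMs > t.maxLatencyMs) breaches.push("latency");
  if (m.errorRatePct >= t.maxErrorRatePct) breaches.push("errorRate"); // must stay under the threshold
  if (m.uptimePct <= t.minUptimePct) breaches.push("uptime"); // must stay above the target
  return breaches;
}
```

With the values shown in the panel (85 ms, 0.02%, 99.98%), the returned list is empty.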

2. Service Monitoring

Deep dive into individual service performance:

Service: Memory Suite (v1.3.2)
═════════════════════════════════════════════════════════

Status: ✅ Operational
Last Updated: 2 seconds ago

┌─ Performance Metrics ──────────────────────────────┐
│ Requests/sec:     85 (Normal)                      │
│ Avg Latency:      50ms (Good)                      │
│ P95 Latency:      120ms (Good)                     │
│ P99 Latency:      250ms (Good)                     │
│ Error Rate:       0.01% (Excellent)                │
│ Cache Hit Rate:   92% (Excellent)                  │
│ Connection Pool:  245/250 active                   │
└────────────────────────────────────────────────────┘

┌─ Instances ───────────────────────────────────────┐
│ prod-mem-1 (us-east-1)      ✅ Healthy            │
│ prod-mem-2 (us-west-2)      ✅ Healthy            │
│ prod-mem-3 (eu-central-1)   ✅ Healthy            │
│ prod-mem-4 (ap-northeast-1) ✅ Healthy            │
└────────────────────────────────────────────────────┘

[View Logs] [View Metrics] [Scale] [Restart]
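
The P95 and P99 figures in the panel above are latency percentiles. This document does not say how the metrics backend computes them; the sketch below uses the common nearest-rank method to show what such a figure means. The function name is hypothetical.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
// This is one common definition; the metrics backend may use a different one.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = samplesMs.slice().sort((a, b) => a - b); // copy so the input is not mutated
  // Multiply before dividing to avoid floating-point drift for whole-number p.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}
```

For example, `percentile(latencies, 99)` returns the value that 99% of the recorded requests did not exceed.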

3. Incident Management

Create and track incidents:

Incident: [INC-2026-0015] MCP Core Latency Spike
═════════════════════════════════════════════════════════

Status: 🟡 Investigating (Started 30 min ago)
Severity: 🔴 High
Service: MCP Core
Owner: @ops-team

Timeline:
├─ 10:30 UTC - Latency spike detected (450ms avg)
├─ 10:31 UTC - Alert sent to on-call engineer
├─ 10:32 UTC - Investigation started
│  ├─ Database query performance normal
│  ├─ Memory usage 78% (within limits)
│  └─ Network latency normal
├─ 10:35 UTC - Scaled up instances from 3 to 5
├─ 10:45 UTC - Latency returning to normal (85ms avg)
└─ [Still investigating root cause...]

Actions Taken:
- Scaled instances 3 → 5
- Cleared stale cache entries
- Enabled debug logging

Impact:
- Affected requests: ~1,200 with >500ms latency
- Error rate: 0.05% (elevated from 0.01%)
- Estimated impact: $45 in SLA credits owed

[Update] [Resolve] [Escalate] [Share]
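
The timeline above can be modeled as a list of timestamped entries. As a rough sketch, assuming a hypothetical `kind` field on each entry (the real incident schema is not shown here), the time from detection to recovery can be derived like this:

```typescript
// Hypothetical entry kinds, chosen to match the steps in the mockup timeline.
type TimelineKind = "detected" | "alerted" | "investigating" | "action" | "recovered";

interface TimelineEntry {
  at: string; // ISO 8601 UTC timestamp
  kind: TimelineKind;
  note: string;
}

// Minutes between the first "detected" and the first "recovered" entry,
// or null while the incident has not recovered yet.
function minutesToRecovery(timeline: TimelineEntry[]): number | null {
  const detected = timeline.find((e) => e.kind === "detected");
  const recovered = timeline.find((e) => e.kind === "recovered");
  if (!detected || !recovered) return null;
  const elapsedMs = Date.parse(recovered.at) - Date.parse(detected.at);
  return Math.round(elapsedMs / 60000);
}
```

For the timeline shown (spike detected at 10:30 UTC, latency back to normal at 10:45 UTC), this yields 15 minutes.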

4. Team Collaboration

Shared dashboards and real-time communication:

Team Channel: #operations
═════════════════════════════════════════════════════════

@alice: Just detected latency spike on MCP Core
        Scaled from 3 to 5 instances

@bob: Good catch! Checking database logs now

@alice: P99 latency still at 200ms, not dropping
        Let me check the cache

@bob: Found it! Stale cache entries preventing optimization
      I'm clearing the cache now

@alice: ✓ Latency back to normal (50ms avg)
        Incident created: [INC-2026-0015]

@charlie: Great teamwork! Let's do a postmortem
          tomorrow at 10am UTC.

[Attach metric snapshot] [Create incident] [Start call]

Installation & Setup

Prerequisites

Node.js 18.18+ (the minimum supported by Next.js 15) or Bun 1.1+
Access to LanOnasis metrics backend
OAuth 2.0 configured

Local Development

# Clone repository
git clone https://github.com/lanonasis/control-room.git
cd control-room

# Install dependencies
bun install

# Configure environment
cp .env.example .env.local

# Start development server
bun run dev

# Open dashboard (macOS; on Linux use xdg-open instead of open)
open http://localhost:3001

Configuration

Environment Variables

# Backend API
NEXT_PUBLIC_API_BASE_URL="https://api.lanonasis.com"
NEXT_PUBLIC_METRICS_API="https://metrics.lanonasis.com"

# WebSocket
NEXT_PUBLIC_WS_URL="wss://ws.lanonasis.com"

# Authentication
NEXT_AUTH_URL="http://localhost:3001"
NEXT_AUTH_SECRET="your-secret-key"

# Monitoring
SENTRY_DSN="your-sentry-dsn"

# Feature Flags
NEXT_PUBLIC_INCIDENT_MANAGEMENT=true
NEXT_PUBLIC_TEAM_COLLABORATION=true
NEXT_PUBLIC_COMPLIANCE_REPORTS=true
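
Environment variables always arrive as strings, so the feature flags above need to be parsed before use. The following is a minimal sketch of a typed reader. The `FeatureFlags` shape and both function names are assumptions, not part of the Control Room codebase.

```typescript
interface FeatureFlags {
  incidentManagement: boolean;
  teamCollaboration: boolean;
  complianceReports: boolean;
}

type Env = Record<string, string | undefined>;

// Only the string "true" (case-insensitive, surrounding whitespace ignored) enables a flag.
function isEnabled(env: Env, key: string): boolean {
  return (env[key] ?? "").trim().toLowerCase() === "true";
}

// The env object is passed in explicitly so the function stays testable.
function readFeatureFlags(env: Env): FeatureFlags {
  return {
    incidentManagement: isEnabled(env, "NEXT_PUBLIC_INCIDENT_MANAGEMENT"),
    teamCollaboration: isEnabled(env, "NEXT_PUBLIC_TEAM_COLLABORATION"),
    complianceReports: isEnabled(env, "NEXT_PUBLIC_COMPLIANCE_REPORTS"),
  };
}
```

In browser code, Next.js inlines `NEXT_PUBLIC_` values at build time only where they are referenced literally (for example `process.env.NEXT_PUBLIC_INCIDENT_MANAGEMENT`), so the object passed to `readFeatureFlags` should be built from such literal references rather than from `process.env` as a whole.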

API Reference

Get Real-Time Metrics

GET /api/metrics?service=memory-suite&timeRange=24h

Response:
{
  "service": "memory-suite",
  "metrics": {
    "requestsPerSecond": 85,
    "avgLatency": 50,
    "p95Latency": 120,
    "p99Latency": 250,
    "errorRate": 0.0001,
    "uptime": 0.9998
  },
  "timestamp": "2026-01-15T10:45:00Z"
}
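
A typed view of this response helps when consuming it from the dashboard. The sketch below mirrors the JSON fields shown above and adds two hypothetical helpers: one builds the request path, the other formats a one-line summary. It assumes that `errorRate` and `uptime` are fractions (0.0001 is 0.01%) and that latencies are in milliseconds, which matches the dashboard figures but is not stated explicitly.

```typescript
// Field names mirror the example response; units are assumptions (see lead-in).
interface MetricsResponse {
  service: string;
  metrics: {
    requestsPerSecond: number;
    avgLatency: number;
    p95Latency: number;
    p99Latency: number;
    errorRate: number;
    uptime: number;
  };
  timestamp: string;
}

// Builds the request path with URL-encoded query parameters.
function metricsPath(service: string, timeRange: string): string {
  return `/api/metrics?service=${encodeURIComponent(service)}&timeRange=${encodeURIComponent(timeRange)}`;
}

// Formats a one-line summary, converting fractions to percentages.
function summarizeMetrics(r: MetricsResponse): string {
  const m = r.metrics;
  const pct = (fraction: number): string => (fraction * 100).toFixed(2) + "%";
  return (
    `${r.service}: ${m.requestsPerSecond} req/s, avg ${m.avgLatency} ms, ` +
    `p99 ${m.p99Latency} ms, errors ${pct(m.errorRate)}, uptime ${pct(m.uptime)}`
  );
}
```

The path from `metricsPath("memory-suite", "24h")` can be passed to `fetch` together with the configured API base URL.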

Create Incident

POST /api/incidents

Body:
{
  "title": "Memory Suite Latency Spike",
  "service": "memory-suite",
  "severity": "high",
  "description": "P99 latency exceeded 500ms",
  "assignee": "alice@lanonasis.com"
}

Response:
{
  "id": "INC-2026-0015",
  "status": "investigating",
  "createdAt": "2026-01-15T10:30:00Z"
}
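
Validating the request body on the client avoids a round trip for obviously incomplete incidents. The sketch below is illustrative only: it assumes all five fields in the example body are required and that severity is one of `low`, `medium`, `high`, or `critical`. Only `high` appears in this document, so the other values are guesses.

```typescript
// Assumed severity levels; only "high" is confirmed by the example body.
const SEVERITIES = ["low", "medium", "high", "critical"] as const;

// Returns a list of validation errors (empty when the draft looks valid).
function validateIncidentDraft(input: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const field of ["title", "service", "description", "assignee"]) {
    const value = input[field];
    if (typeof value !== "string" || value.trim() === "") {
      errors.push(`${field} is required`);
    }
  }
  const severity = input["severity"];
  if (typeof severity !== "string" || (SEVERITIES as readonly string[]).indexOf(severity) === -1) {
    errors.push(`severity must be one of: ${SEVERITIES.join(", ")}`);
  }
  return errors;
}
```

An empty result means the draft can be sent as the body of `POST /api/incidents`.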

Update Incident

PATCH /api/incidents/:id

Body:
{
  "status": "resolved",
  "rootCause": "Stale cache entries",
  "resolution": "Cleared cache and scaled instances",
  "postmortemUrl": "https://..."
}
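
Because the endpoint uses PATCH, only the fields present in the body change and everything else on the incident is kept. The sketch below illustrates that merge on an in-memory record. The `IncidentRecord` shape and the `resolvedAt` field are assumptions, and the status type lists only the two values that appear in this document.

```typescript
type IncidentStatus = "investigating" | "resolved"; // only values shown in this document

interface IncidentRecord {
  id: string;
  status: IncidentStatus;
  rootCause?: string;
  resolution?: string;
  postmortemUrl?: string;
  resolvedAt?: string; // hypothetical field, set when the incident is resolved
}

type IncidentPatch = Partial<Omit<IncidentRecord, "id" | "resolvedAt">>;

// Returns a new record with the patch applied; the original is left untouched.
function applyIncidentPatch(
  current: IncidentRecord,
  patch: IncidentPatch,
  now: Date = new Date()
): IncidentRecord {
  const next: IncidentRecord = { ...current, ...patch };
  // Stamp the resolution time on the transition into "resolved".
  if (current.status !== "resolved" && next.status === "resolved") {
    next.resolvedAt = now.toISOString();
  }
  return next;
}
```

Passing `now` as a parameter keeps the function deterministic in tests.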

Common Workflows

Scenario 1: Detect and Respond to Incident

1. Real-time alert triggers (latency > threshold)
2. Dashboard highlights affected service
3. On-call engineer receives notification
4. Opens Control Room dashboard
5. Views incident details and timeline
6. Identifies root cause (stale cache)
7. Executes remediation (clear cache)
8. Monitors metrics for recovery
9. Creates incident record
10. Schedules postmortem
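
The first step in this workflow, an alert firing when latency exceeds a threshold, is often implemented so that a single slow sample does not page anyone. The sketch below fires only after several consecutive samples breach the threshold. The function name and the consecutive-sample rule are assumptions; this document does not describe Control Room's actual alerting logic.

```typescript
// Returns the index of the sample at which the alert would fire, or -1 if it never fires.
// The alert fires once `minConsecutive` samples in a row exceed `thresholdMs`.
function firstSustainedBreach(
  samplesMs: number[],
  thresholdMs: number,
  minConsecutive: number
): number {
  if (minConsecutive < 1) throw new RangeError("minConsecutive must be at least 1");
  let run = 0; // length of the current streak of breaching samples
  for (let i = 0; i < samplesMs.length; i++) {
    run = samplesMs[i] > thresholdMs ? run + 1 : 0;
    if (run >= minConsecutive) return i;
  }
  return -1;
}
```

With samples `[80, 90, 450, 460, 470, 85]`, a 400 ms threshold, and three required samples, the alert fires at index 4.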

Scenario 2: Generate Compliance Report

1. Operations manager opens Control Room
2. Navigates to Compliance section
3. Selects compliance standard (PCI-DSS, HIPAA)
4. Chooses date range (monthly)
5. Clicks "Generate Report"
6. Report shows:
   - All access logs
   - Security incidents
   - Audit trail
   - Uptime metrics
7. Exports as PDF for auditor
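
Steps 3 to 6 amount to filtering audit data by date range and summarizing it. The sketch below shows that idea on an in-memory list. The `AuditEvent` and `ComplianceSummary` shapes are hypothetical, and the real report also covers access logs and uptime metrics, which are omitted here.

```typescript
// Hypothetical shapes for illustration only.
interface AuditEvent {
  at: string; // ISO 8601 timestamp
  actor: string;
  action: string;
  securityIncident: boolean;
}

interface ComplianceSummary {
  standard: string;
  from: string;
  to: string;
  totalEvents: number;
  securityIncidents: number;
  events: AuditEvent[];
}

// Selects events with from <= at < to (half-open range) and sorts them chronologically.
function buildComplianceSummary(
  standard: string,
  events: AuditEvent[],
  from: string,
  to: string
): ComplianceSummary {
  const start = Date.parse(from);
  const end = Date.parse(to);
  if (isNaN(start) || isNaN(end) || start >= end) throw new RangeError("invalid date range");
  const inRange = events
    .filter((e) => {
      const t = Date.parse(e.at);
      return t >= start && t < end;
    })
    .sort((a, b) => Date.parse(a.at) - Date.parse(b.at));
  return {
    standard,
    from,
    to,
    totalEvents: inRange.length,
    securityIncidents: inRange.filter((e) => e.securityIncident).length,
    events: inRange,
  };
}
```

The half-open range means consecutive monthly reports never count the same event twice.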

Troubleshooting

Issue: Real-time metrics not updating

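Solution: Check the WebSocket connection. A healthy endpoint answers the handshake with HTTP 101 Switching Protocols.

# Check WebSocket connectivity (curl speaks HTTP, so send the upgrade handshake over https://)
curl -i -N --http1.1 \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  https://ws.lanonasis.com/metrics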
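
If the connection drops intermittently, the client should reconnect with increasing delays rather than in a tight loop. The sketch below computes an exponential backoff delay that a hook such as `useWebSocket` could use between attempts. The function name and the default values are assumptions; the actual hook's behavior is not documented here.

```typescript
// Delay before reconnect attempt `attempt` (0-based): base * 2^attempt, capped at maxMs.
// Defaults are illustrative, not Control Room's actual settings.
function reconnectDelayMs(attempt: number, baseMs: number = 1000, maxMs: number = 30000): number {
  const safeAttempt = Math.max(0, Math.floor(attempt));
  return Math.min(maxMs, baseMs * Math.pow(2, safeAttempt));
}
```

A caller would wait `reconnectDelayMs(n)` milliseconds before the nth retry and reset `n` to 0 after a successful connection. Adding random jitter to the delay is a common refinement when many clients reconnect at once.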

Issue: Slow dashboard loading

Solution: Select a shorter time range, or enable result caching so repeated queries are served from cache:

NEXT_PUBLIC_CACHE_RESULTS=true
NEXT_PUBLIC_CACHE_TTL=300  # 5 minutes
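
A TTL of 300 seconds means a cached result is reused for up to five minutes and then fetched again. How Control Room implements this internally is not documented here; the class below is a minimal sketch of a time-to-live cache with an injectable clock, and all names in it are hypothetical.

```typescript
// Minimal in-memory TTL cache. Entries expire `ttlMs` milliseconds after they are set.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns the cached value, or undefined if it is missing or expired.
  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (this.now() >= hit.expiresAt) {
      this.entries.delete(key); // drop expired entries lazily
      return undefined;
    }
    return hit.value;
  }
}
```

For the setting above, the cache would be created as `new TtlCache<MetricsPayload>(300 * 1000)`, where `MetricsPayload` stands for whatever response type is being cached.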

Related Services

Memory Suite – Service being monitored
Lanonasis Index – Service discovery
v-secure – Credentials for accessing metrics

Support & Resources

GitHub: lanonasis/control-room
Issues: Report bugs
Email: ops@lanonasis.com

Last Updated: February 3, 2026
Version: 1.2.0+
