Building Your Own Cloud: ITL Control Plane Alpha

Introducing the ITL Control Plane — a baby wolf taking its first steps in building a proper cloud abstraction layer before touching any servers.

ITL Control Plane Dashboard: 374 resources, 6 tenants, all managed through a unified abstraction layer

This is alpha. A baby wolf. A first breath of something that might grow into a full-fledged platform, or crash spectacularly trying. Either way, we’re building it in public.

The ITL Control Plane is my attempt at solving a problem I’ve encountered in every enterprise I’ve worked with: how do you manage diverse infrastructure without drowning in bespoke scripts and tribal knowledge?

The Problem: Building Your Cloud Wrong

Most people think “building your own cloud” means spinning up servers, configuring hypervisors, and wrestling with networking. That’s the data plane. And jumping straight there is a recipe for an unmaintainable mess.

Before you deploy your first VM, you need answers to:

  • Who can create resources? (Identity, roles, permissions)
  • How do you organize them? (Tenants, subscriptions, resource groups)
  • How do you track what exists? (Metadata, relationships, lineage)
  • How do you audit changes? (Activity logs, compliance trails)
  • How do you scale governance? (Policies, management groups, hierarchies)

Azure, AWS, and GCP all solved these problems first. Their control planes (the abstraction layer) came before the servers. We’re following the same path.

The Abstraction Layer: ARM-Style Resource Hierarchy

ITL Control Plane implements a hierarchical resource model inspired by Azure Resource Manager (ARM):

                    ┌─────────────┐
                    │   Tenant    │  (Organization boundary)
                    └──────┬──────┘
                           │
              ┌────────────┼────────────┐
              │            │            │
     ┌────────▼─────┐ ┌────▼────┐ ┌─────▼─────┐
     │ Management   │ │ Mgmt    │ │ Mgmt      │  (Governance scope)
     │ Group        │ │ Group   │ │ Group     │
     └──────┬───────┘ └────┬────┘ └─────┬─────┘
            │              │            │
     ┌──────▼──────┐  ┌────▼────┐  ┌────▼────┐
     │ Subscription│  │ Sub     │  │ Sub     │  (Billing/isolation)
     └──────┬──────┘  └────┬────┘  └────┬────┘
            │              │            │
     ┌──────▼──────┐  ┌────▼────┐  ┌────▼────┐
     │ Resource    │  │ RG      │  │ RG      │  (Logical grouping)
     │ Group       │  │         │  │         │
     └──────┬──────┘  └────┬────┘  └────┬────┘
            │              │            │
        Resources      Resources    Resources    (VMs, databases, etc.)

Every resource has a hierarchical ID that tells you exactly where it lives:

/tenants/kadaster/managementgroups/kadaster-platform/subscriptions/sub-prod/resourcegroups/rg-app/providers/ITL.Compute/virtualmachines/vm-web-01

This isn’t just organization — it’s queryable, auditable, and policy-attachable. Want to block certain resource types in production subscriptions? Attach a policy at the management group level. Need to audit who created what in the last 30 days? Query the activity log at tenant scope.
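
To see what that enables in practice, here is a minimal sketch (not the actual SDK code) of how such an ID decomposes into scopes. Every governance feature above is essentially a lookup against these segments:

# Minimal sketch: split a hierarchical resource ID into its scope segments.
# Illustrative helper only, not part of the ITL SDK.

def parse_resource_id(resource_id: str) -> dict:
    """'/tenants/x/managementgroups/y/...' -> {'tenants': 'x', 'managementgroups': 'y', ...}"""
    parts = [p for p in resource_id.strip("/").split("/") if p]
    # Segments alternate between type and name: tenants/kadaster, subscriptions/sub-prod, ...
    return dict(zip(parts[0::2], parts[1::2]))

scopes = parse_resource_id(
    "/tenants/kadaster/managementgroups/kadaster-platform"
    "/subscriptions/sub-prod/resourcegroups/rg-app"
    "/providers/ITL.Compute/virtualmachines/vm-web-01"
)
print(scopes["tenants"])          # kadaster
print(scopes["subscriptions"])    # sub-prod
print(scopes["virtualmachines"])  # vm-web-01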

What’s Working (The Dashboard Tour)

Let me show you what’s actually functional in this alpha.

Resource Discovery & Management

The dashboard provides a unified view across all tenants and resource types. Filtering, sorting, export to CSV — the basics that make resource management bearable.

All Resources View: 354 resources across 6 tenants, filterable by tenant, location, and type

Current counts:

  • 374 total resources
  • 6 tenants (ITL, Contoso, Fabrikam, Kadaster, etc.)
  • 36 management groups
  • 50 subscriptions
  • 235 resource groups
  • 27 locations

Resource Graph Visualization

Every resource relationship is stored in Neo4j as a graph. The dashboard includes a D3.js force-directed visualization showing how tenants, management groups, subscriptions, and resource groups connect.

Resource Graph: 368 nodes, 481 relationships — visualized as an interactive graph

This isn’t just pretty — it’s operational reality. When you need to understand blast radius (“if I delete this subscription, what resource groups go with it?”), the graph tells you immediately.
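
A blast-radius check like that is a single Cypher traversal. Here is a rough sketch using the official Neo4j Python driver; the node labels and the CONTAINS relationship are assumptions, not necessarily the schema the platform actually uses:

# Sketch: everything that lives under a subscription, via the metadata graph.
# Label and relationship names (Subscription, CONTAINS) are assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

BLAST_RADIUS = """
MATCH (s:Subscription {name: $name})-[:CONTAINS*1..]->(child)
RETURN labels(child) AS kind, child.name AS name
"""

with driver.session() as session:
    for record in session.run(BLAST_RADIUS, name="sub-prod"):
        print(record["kind"], record["name"])

driver.close()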

Activity Logs & KQL Query Editor

Every resource operation (CREATE, UPDATE, DELETE) gets logged with full context: who did it, when, what changed, correlation IDs, and response times.
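
Modeled with the same Pydantic approach used across the stack, a log entry could look roughly like this; the field names are illustrative, not the platform's actual schema:

# Illustrative activity log entry; field names are assumptions based on the
# description above, not the actual SDK model.
from datetime import datetime
from enum import Enum
from pydantic import BaseModel

class Operation(str, Enum):
    CREATE = "CREATE"
    UPDATE = "UPDATE"
    DELETE = "DELETE"

class ActivityLogEntry(BaseModel):
    operation: Operation
    resource_id: str        # full hierarchical ID of the affected resource
    caller: str             # who did it
    timestamp: datetime     # when
    correlation_id: str     # ties related operations together
    duration_ms: float      # response time
    changes: dict = {}      # what changed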

Activity Logs: Audit trail showing resource creation events, compliance-ready

The query editor supports KQL-style queries for filtering logs:

KQL Query Editor: Filter logs with KQL, e.g. operation == "CREATE" and tenant == "kadaster"

Saved queries and a sample library make common operations repeatable:

Sample Queries: Built-in sample queries organized by category

Resource Details

Click any resource to see its full properties, relationships, and actions:

Resource Detail: Kadaster tenant detail view with properties and metadata

Infrastructure Monitoring

The platform itself runs as a Docker Compose stack (for local development) or Kubernetes (production). The infra view shows container health and service dependencies:

Infrastructure View: 7 containers (API Gateway, Core Provider, Dashboard, Neo4j, PostgreSQL, RabbitMQ, CloudBeaver)

The Architecture: Separation of Concerns

┌─────────────────────────────────────────────────────────────┐
│                    API Gateway (FastAPI)                    │
│  Routes requests to resource providers, enforces auth       │
└──────────────────────┬──────────────────────────────────────┘
                       │
          ┌────────────┼────────────┐
          ▼            ▼            ▼
     ┌─────────┐  ┌─────────┐  ┌─────────┐
     │   SDK   │  │  Neo4j  │  │Keycloak │
     │Contracts│  │  Graph  │  │  (IAM)  │
     └────┬────┘  │ Metadata│  └────┬────┘
          │       └────┬────┘       │
          └────────────┼────────────┘
                       │
        ┌──────────────┼──────────────┐
        ▼              ▼              ▼
   ┌──────────┐   ┌──────────┐   ┌──────────┐
   │   Core   │   │ Compute  │   │   IAM    │
   │ Provider │   │ Provider │   │ Provider │
   └──────────┘   └──────────┘   └──────────┘

Key design decisions:

  1. SDK as Contract Layer — The Python SDK (itl-controlplane-sdk) defines all resource models, request/response structures, and provider interfaces. It’s the single source of truth.

  2. Provider Pattern — Each resource type (compute, IAM, storage) has its own provider service implementing a standardized interface (a sketch follows this list). This allows independent scaling, deployment, and team ownership.

  3. Graph-Based Metadata — Neo4j stores all resource relationships, enabling queries like “find all resource groups under this management group” or “show the lineage of this resource.”

  4. PostgreSQL for State — The relational database stores resource properties and operational state. The graph stores relationships.

  5. Message Queue for Async — RabbitMQ handles long-running operations and provider-to-provider communication.
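
To make the SDK-as-contract and provider pattern concrete, here is a hedged sketch of what the contract layer could look like. The class and method names are illustrative, not the actual itl-controlplane-sdk interfaces:

# Illustrative contract layer: a shared resource model plus the interface
# every provider implements. Names are assumptions, not the real SDK API.
from abc import ABC, abstractmethod
from pydantic import BaseModel

class Resource(BaseModel):
    id: str                    # hierarchical resource ID
    name: str
    type: str                  # e.g. "ITL.Compute/virtualmachines"
    location: str
    tags: dict[str, str] = {}
    properties: dict = {}

class ResourceProvider(ABC):
    """Standardized interface implemented by core, compute, IAM, ... providers."""

    @abstractmethod
    def create(self, resource: Resource) -> Resource: ...

    @abstractmethod
    def get(self, resource_id: str) -> Resource: ...

    @abstractmethod
    def delete(self, resource_id: str) -> None: ...

With a shared interface like this, the gateway can route any request to any provider without knowing its internals, which is what makes independent scaling and team ownership possible.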

The Technology Stack (And Why)

Component       Technology                    Why This Choice
API Gateway     FastAPI (Python)              Async, type-safe, auto-generated OpenAPI docs, easy to extend
SDK             Python + Pydantic             Type validation, serialization, IDE support, shared across all services
Graph DB        Neo4j                         Native graph queries for relationships, Cypher is readable, scales well
Relational DB   PostgreSQL                    Battle-tested, JSON support, strong consistency for resource state
Message Queue   RabbitMQ                      Reliable delivery, dead-letter queues, widely understood
IAM             Keycloak                      Full OIDC/OAuth2, realms = tenants, groups/roles, admin API
Containers      Docker Compose / Kubernetes   Local dev with Compose, production with K8s Helm charts
DB Admin        CloudBeaver                   Web-based SQL client, useful for debugging and demos

Why Python everywhere? Consistency. The SDK, API, providers, and CLI all share the same models. No translation layers, no serialization bugs, no “works in the SDK but breaks in the API” surprises. Plus, Python’s async/await with FastAPI handles concurrent requests efficiently.

Why Neo4j + PostgreSQL (two databases)? Different data, different access patterns. Resource properties (name, location, tags, state) are relational — you query by ID, filter by type, paginate results. Resource relationships (this subscription belongs to that management group, which belongs to that tenant) are graph queries. Forcing relationships into SQL joins gets ugly fast when hierarchies are 5+ levels deep.
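
To see the difference, compare the same question (all resource groups under a management group) in both stores; the table, column, and label names here are assumptions for illustration only:

# Same question, two stores. Schema names (resources, parent_id,
# ManagementGroup, CONTAINS) are illustrative assumptions.

# Relational: a recursive CTE that walks parent_id links level by level.
SQL_DESCENDANTS = """
WITH RECURSIVE descendants AS (
    SELECT id, name, type FROM resources WHERE id = :mg_id
    UNION ALL
    SELECT r.id, r.name, r.type
    FROM resources r
    JOIN descendants d ON r.parent_id = d.id
)
SELECT id, name FROM descendants WHERE type = 'resourcegroup';
"""

# Graph: the hierarchy is a path, so the traversal is one pattern match.
CYPHER_DESCENDANTS = """
MATCH (:ManagementGroup {id: $mg_id})-[:CONTAINS*]->(rg:ResourceGroup)
RETURN rg.id, rg.name
"""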

Current Providers

Provider            Status        What It Does
Core Provider       Working       Tenants, management groups, subscriptions, resource groups, locations
IAM Provider        In Progress   Keycloak integration — realms, users, groups, roles, service accounts
Compute Provider    Planned       VM lifecycle via Proxmox/libvirt, container orchestration
Network Provider    Planned       VNets, subnets, tunnels (WireGuard), DNS zones
Storage Provider    Planned       Block storage, object storage (MinIO), file shares

The CLI: itlcp

Following the same pattern as Azure CLI (az), we’re building itlcp, a command-line interface for managing resources:

# Authenticate (uses ITLAuth/Keycloak tokens)
itlcp login

# Create a subscription
itlcp subscription create --name "prod-workloads" --owner "platform-team"

# List resources in a subscription  
itlcp resource list --subscription sub-prod --resource-group rg-app

# Create a VM
itlcp vm create \
  --subscription sub-prod \
  --resource-group rg-app \
  --name vm-web-01 \
  --size Standard_D2s_v3 \
  --location westeurope

The CLI uses the same SDK models as the API, ensuring consistency between programmatic and interactive access.
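
Because the CLI and the API share the SDK, the equivalent programmatic call should look nearly identical. A hypothetical sketch; the client class and method names are placeholders, not the published SDK API:

# Hypothetical SDK usage mirroring `itlcp vm create` above.
# Client, class, and method names are placeholders, not the real SDK.
from itl_controlplane_sdk import ControlPlaneClient  # hypothetical import

client = ControlPlaneClient(endpoint="https://controlplane.example.local")
client.login()  # ITLAuth/Keycloak token flow

vm = client.virtual_machines.create(
    subscription="sub-prod",
    resource_group="rg-app",
    name="vm-web-01",
    size="Standard_D2s_v3",
    location="westeurope",
)
print(vm.id)  # /tenants/.../providers/ITL.Compute/virtualmachines/vm-web-01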

What’s Next (The Roadmap)

This is alpha. Baby wolf. Here’s what’s coming:

Phase 1: Core Platform

Identity & Access:

  • IAM Provider with full Keycloak integration
  • Service accounts with managed credentials (like Azure Managed Identities)
  • PIM-style privileged access with time-bound role elevations and approval workflows
  • RBAC at every scope level (tenant, management group, subscription, resource group)

Governance:

  • Policy engine to deny certain resource types, enforce naming conventions, require tags (a minimal sketch follows this list)
  • Cost tracking with resource metering, subscription budgets, alerts
  • Compliance reports showing who did what, when, with exportable audit trails
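
As a rough sketch of how the planned policy engine might evaluate a request (the policy shape and the evaluation rules are assumptions, not the actual design):

# Hedged sketch of scope-based policy evaluation; illustrative only.
from pydantic import BaseModel

class Policy(BaseModel):
    scope: str                     # e.g. "/tenants/kadaster/managementgroups/kadaster-platform"
    denied_types: list[str] = []   # resource types to block at this scope
    required_tags: list[str] = []  # tags every resource must carry

def evaluate(policies: list[Policy], resource_id: str,
             resource_type: str, tags: dict[str, str]) -> list[str]:
    """Return violations from every policy whose scope contains the resource."""
    violations = []
    for p in policies:
        if not resource_id.startswith(p.scope):
            continue  # policy is attached elsewhere in the hierarchy
        if resource_type in p.denied_types:
            violations.append(f"{resource_type} denied at scope {p.scope}")
        for tag in p.required_tags:
            if tag not in tags:
                violations.append(f"missing required tag '{tag}'")
    return violations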

Phase 2: Infrastructure Providers

Compute:

  • VM Provider via Proxmox VE (KVM-based, REST API)
  • Container workloads via Kubernetes (vCluster for tenant isolation)
  • Serverless functions (OpenFaaS or Knative)

Networking:

  • VNet provider for virtual networks with subnet CIDR management
  • Cilium, an eBPF-based CNI for high-performance networking and native network policies
  • Multus for multi-network attachment, connecting pods to tenant VNets
  • Hubble for network observability and flow visibility built on Cilium
  • Tunnel provider using WireGuard mesh for secure site-to-site connections
  • ZTNA tunnels for Zero Trust Network Access with SPIRE/SPIFFE workload identity
  • DNS provider using CoreDNS or PowerDNS with zone delegation per tenant
  • Load balancers using HAProxy or Envoy with config-as-resource

Why Cilium + Multus + Hubble? Traditional Kubernetes networking gives you one flat network. For a multi-tenant cloud, you need proper VNet isolation. Cilium uses eBPF for kernel-level packet processing, faster than iptables, with L7-aware policies (filter by HTTP path, gRPC method). Multus lets pods attach to multiple networks. A pod can have its management interface on the cluster network and a data interface on a tenant-specific VNet. Hubble gives you real-time visibility into all network flows without sampling, showing exactly which services are talking to each other, with latency metrics and HTTP status codes. Together, they form the foundation for Azure-style VNet isolation on Kubernetes.

Why SPIRE/SPIFFE? Traditional VPNs trust the network perimeter. Zero Trust says “never trust, always verify”: every workload gets a cryptographic identity (SPIFFE ID), and connections are authenticated at the workload level, not the network level. SPIRE issues and rotates these identities automatically. Combined with mTLS, you get encrypted, identity-verified communication between services without managing certificates manually.

Storage:

  • Block storage via Ceph or local LVM
  • Object storage via MinIO (S3-compatible)
  • File shares via NFS or SMB with access policies

Phase 3: Enterprise Features

Multi-Cloud Bridge:

  • Azure Resource Provider to manage Azure resources through ITL Control Plane
  • AWS Resource Provider with the same abstraction, different cloud
  • Hybrid policies like “this workload runs on-prem, that one in Azure”

Developer Experience:

  • Terraform provider for terraform apply against ITL Control Plane
  • Pulumi provider, same but TypeScript/Python/Go native
  • GitOps with resource definitions in Git, auto-reconciled
  • Self-service portal where tenant admins manage their own subscriptions

Infrastructure as Code Vision

The goal is to manage ITL resources the same way you manage Azure or AWS: declaratively.

Terraform (HCL):

resource "itl_tenant" "acme" {
  name         = "acme"
  display_name = "Acme Corp"
}

resource "itl_subscription" "prod" {
  name   = "prod"
  tenant = itl_tenant.acme.name
  owner  = "platform-team"
}

resource "itl_virtual_machine" "web" {
  name           = "web-01"
  subscription   = itl_subscription.prod.name
  resource_group = "app-rg"
  location       = "westeurope"
  size           = "Standard_D2s"
}

Pulumi (Python), real code instead of config:

import itl_pulumi as itl

tenant = itl.Tenant("acme", display_name="Acme Corp")
sub = itl.Subscription("prod", tenant=tenant, owner="platform-team")
rg = itl.ResourceGroup("app-rg", subscription=sub, location="westeurope")
vm = itl.VirtualMachine("web-01", resource_group=rg, size="Standard_D2s")

Both approaches work. Terraform is widely adopted and declarative. Pulumi gives you loops, conditionals, type safety, and IDE autocomplete. Real programming instead of a DSL. Pick your poison.

Operations:

  • Helm charts for production Kubernetes deployment
  • Prometheus/Grafana integration for platform observability
  • Disaster recovery including backup/restore of control plane state

Why Build This?

The idea started a few months ago, sparked by geopolitical tensions and the growing conversation around data sovereignty. When your data lives in someone else’s cloud, you’re subject to their jurisdiction, their policies, their geopolitical reality. The CLOUD Act is just one example; the list goes on.

What if you need to run infrastructure that stays within your borders? What if “the cloud” needs to be your cloud, on premises you control, in datacenters you choose, under laws you understand?

But there’s another reason: demystification.

Commercial clouds are built on the same technologies available to everyone. Take Entra ID with its Privileged Identity Management (PIM) feature. Sounds enterprise-grade and complex, right? It’s role assignments with time-based expiration and approval workflows. Keycloak can do that. Service accounts? Managed identities? Those are just identity tokens with specific scopes and rotation policies. VNets and tunnels? That’s VLANs, WireGuard, and routing tables with a nice API in front.
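
To underline how unmagical that is, here is a hedged sketch of a PIM-style, time-bound elevation. The grant/revoke helpers stand in for whatever Keycloak admin calls the IAM provider ends up using; the flow is the point, not the API:

# Hedged sketch of time-bound role elevation; grant_role/revoke_role are
# placeholders for Keycloak admin calls, and the flow itself is illustrative.
from datetime import datetime, timedelta, timezone

def grant_role(user: str, role: str) -> None:
    print(f"grant {role} to {user}")     # would call the Keycloak admin API

def revoke_role(user: str, role: str) -> None:
    print(f"revoke {role} from {user}")  # would call the Keycloak admin API

class Elevation:
    def __init__(self, user: str, role: str, duration: timedelta, approved_by: str):
        self.user, self.role = user, role
        self.approved_by = approved_by   # approval workflow happens before this point
        self.expires_at = datetime.now(timezone.utc) + duration
        grant_role(user, role)

    def expired(self) -> bool:
        return datetime.now(timezone.utc) >= self.expires_at

# A scheduler (cron, Celery beat, ...) sweeps active elevations and revokes
# anything past its expiry:
def sweep(elevations: list[Elevation]) -> None:
    for e in elevations:
        if e.expired():
            revoke_role(e.user, e.role)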

The magic isn’t the technology. It’s the abstraction layer that makes it manageable. Azure doesn’t have secret sauce for virtual networks; they have excellent APIs, consistent resource models, and governance tooling that ties it all together.

If you can build the control plane, you can build the cloud. The individual services (compute, storage, networking, identity) are well-understood problems with mature open-source solutions. The hard part is making them work together in a governable, multi-tenant, enterprise-ready way.

That’s what this project is: proving that the patterns behind commercial clouds are reproducible with commodity software and good architecture.

You can’t just spin up some VMs and call it a day. You need the same governance, the same abstractions, the same operational patterns that make Azure, AWS, and GCP manageable. The patterns should feel familiar — tenants, subscriptions, resource groups, hierarchical IDs, activity logs, RBAC — because these patterns work. They’ve been battle-tested at planetary scale.

But I also saw enterprise after enterprise struggle with common problems:

  1. Shadow IT — Teams spinning up resources with no governance
  2. Audit nightmares — “Who created this? When? Why is it still here?”
  3. Cost explosions — Resources that outlive their purpose
  4. Multi-cloud chaos — Different tooling for every provider
  5. Tribal knowledge — The one person who knows where things are leaves

A proper control plane solves these problems before you have them. The abstraction layer isn’t overhead. It’s the foundation everything else builds on.

Try It Yourself

The codebase is spread across several repositories (it’s a microservices architecture, after all):

  • ITL.ControlPlane.SDK — Python SDK with resource models
  • ITL.ControlPlane.Api — FastAPI gateway
  • ITL.ControlPlane.Dashboard — This UI
  • ITL.ControlPlane.ResourceProvider.Core — Base provider + Docker Compose
  • ITL.ControlPlane.GraphDB — Neo4j integration

Note: Not every repository is public yet. This is early alpha — I’m cleaning up code, writing documentation, and making sure the repos are ready for external eyes. If you’re interested in early access or want to follow along, reach out.

For local development (once repos are available):

cd ITL.ControlPlane.ResourceProvider.Core
docker-compose up -d

That spins up the full stack: API gateway, core provider, dashboard, Neo4j, PostgreSQL, RabbitMQ, and CloudBeaver for database management.


Status: Alpha (baby wolf)
Stack: Python, FastAPI, Neo4j, PostgreSQL, RabbitMQ, Docker
Inspiration: Azure Resource Manager, Kubernetes Control Plane
Goal: Proper abstraction before infrastructure chaos

It’s a small wolf now. Let’s see how big it grows.
