Creating a VKS Cluster with Cilium CNI

Julius M. Nicolescu

May 2026

Overview

This document describes how to create a vSphere Kubernetes Service (VKS) guest cluster that uses Cilium as its Container Network Interface (CNI) plugin, replacing the default Antrea CNI.

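Cilium is an eBPF-based networking and security solution for Kubernetes. Unlike traditional CNIs that rely on iptables, Cilium uses the Linux kernel’s eBPF framework to implement networking rules, which avoids long iptables rule chains and generally improves throughput and latency. The capabilities exercised in this guide are:

- An eBPF dataplane that replaces kube-proxy for Service load balancing
- Identity-based security, where policy is enforced on label-derived identities rather than IP addresses
- L7 policy enforcement through a per-node Envoy proxy
- Flow observability through Hubble (relay, CLI, and UI)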

This guide deploys Cilium version 1.19.1+vmware.1-vks.1 on Kubernetes release v1.35.2 (VKS 3.6.0), with the Hubble UI exposed through a LoadBalancer service.

Prerequisite: The VKS Standard Packages bundle must already be mirrored to your local Harbor registry. Follow 01-Readme-vcf-addons-air-gapped.md before proceeding.


Reference Environment

The procedures and configurations in this document were validated against the following platform versions:

| Component | Version |
| --- | --- |
| VMware Cloud Foundation | 9.0.2.0 |
| Supervisor | v1.32.9+vmware.2-fips.vsc9.0.2.0100-25262241 |
| vSphere Kubernetes Service | 3.6.0+v1.35 |
| Kubernetes Release | v1.35.2+vmware.1-vkr.3 |

Architecture

┌─────────────────────────────────────────────────────────────────┐
│  vSphere Supervisor                                             │
│                                                                 │
│  AddonInstall (cilium) ──► selects clusters with cniRef=cilium  │
│  AddonConfig  (cilium) ──► Hubble enabled, LoadBalancer service │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │  VKS Guest Cluster: dev-cluster-05                       │   │
│  │                                                          │   │
│  │  Control Plane (3 nodes) + Worker Pool (3 nodes)         │   │
│  │  CNI: Cilium 1.19.1                                      │   │
│  │  Image source: Harbor (air-gapped)                       │   │
│  │                                                          │   │
│  │  ┌─────────────────────────────────────────────────┐     │   │
│  │  │  kube-system                                    │     │   │
│  │  │  cilium (DaemonSet)     — eBPF dataplane        │     │   │
│  │  │  cilium-envoy (DS)      — L7 proxy              │     │   │
│  │  │  cilium-operator        — cluster-wide control  │     │   │
│  │  │  hubble-relay           — flow aggregator       │     │   │
│  │  │  hubble-ui (LB svc)     — web dashboard         │     │   │
│  │  └─────────────────────────────────────────────────┘     │   │
│  └──────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘

Prerequisites

Set the following environment variables before running any commands:

export CLUSTER_NAMESPACE='lab-poc-namespace-dev'
export CLUSTER_NAME='dev-cluster-05'
export HARBOR='harbor.example.com'
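Every later command expands these variables, and an unset one silently produces names such as `-cilium-addon-install.yaml`. A minimal guard, assuming a POSIX shell, could look like the sketch below; `require_vars` is a hypothetical helper name, not part of any VKS tooling.

```shell
# require_vars NAME...: report every listed variable that is unset or empty.
# Returns 0 when all are set, 1 otherwise. (Hypothetical helper.)
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (run once after the exports above):
#   require_vars CLUSTER_NAMESPACE CLUSTER_NAME HARBOR || exit 1
```

The helper reports all missing variables in one pass instead of stopping at the first.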

Step 1 — Verify That the Cilium Addon Is Available

Before creating any resources, confirm that the Supervisor can resolve the Cilium addon from the (potentially overridden) addon repository:

vcf addon available list cilium

Expected output:

NAMESPACE                 ADDONNAME  VERSION                ADDON-RELEASE-NAME                                  PACKAGE
vmware-system-vks-public  cilium     1.19.1+vmware.1-vks.1  cilium.kubernetes.vmware.com.1.19.1-vmware.1-vks.1  cilium.kubernetes.vmware.com/1.19.1+vmware.1-vks.1

If Cilium does not appear, the addon repository override is not active. Return to Part III of the air-gapped setup guide and verify the AddonRepository and AddonRepositoryInstall resources.

You can also inspect the full Cilium addon definition at the Supervisor level:

kubectl get addons cilium -n vmware-system-vks-public -o yaml
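For scripted checks, the availability listing can be tested for a specific version instead of being read by eye. The sketch below assumes the column layout shown in the expected output above (addon name in column 2, version in column 3); `addon_listed` is a hypothetical helper name.

```shell
# addon_listed NAME VERSION: read `vcf addon available list` output on stdin
# and succeed only if a row with that addon name and version is present.
addon_listed() {
  awk -v name="$1" -v ver="$2" '
    $2 == name && $3 == ver { found = 1 }
    END { exit !found }'
}

# Usage:
#   vcf addon available list cilium | addon_listed cilium '1.19.1+vmware.1-vks.1' \
#     || echo "Cilium 1.19.1 is not available; check the addon repository override"
```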

Step 2 — Prepare the Manifests

Four YAML manifests are needed to create a VKS cluster with Cilium:

| Manifest | Purpose |
| --- | --- |
| AddonInstall | Tells the Supervisor to install Cilium on clusters matching a selector |
| AddonConfig | Provides Cilium-specific configuration (Hubble, service type) |
| Secret | Injects the Harbor CA certificate into cluster nodes so they can pull images |
| Cluster | Defines the VKS cluster topology, networking, and node configuration |

2.1 AddonInstall — Cilium Addon Binding

The AddonInstall resource binds the Cilium addon to any cluster that:

- Is labeled with cluster.x-k8s.io/cluster-name: dev-cluster-05
- Has cniRef.name set to cilium in its bootstrap addons
- Runs Kubernetes version 1.35.0 or later

The stopMatchingBehavior: Retain field means that if a cluster no longer matches the selector (e.g., the label is removed), the Cilium addon is kept rather than uninstalled — this prevents accidental network outages.

cat << EOF > ${CLUSTER_NAME}-cilium-addon-install.yaml
apiVersion: addons.kubernetes.vmware.com/v1alpha1
kind: AddonInstall
metadata:
  name: ${CLUSTER_NAME}-cilium
  namespace: ${CLUSTER_NAMESPACE}
spec:
  addonConfigNameTemplate: '{{.cluster.name}}-{{.addon.name}}'
  addonRef:
    name: cilium
  clusters:
  - constraints:
      expression: cluster.cniRefName() == 'cilium' && version_in_range(cluster.spec.topology.version, '>=1.35.0')
    selector:
      matchLabels:
        cluster.x-k8s.io/cluster-name: ${CLUSTER_NAME}
  stopMatchingBehavior: Retain
EOF

2.2 AddonConfig — Cilium Customization

The AddonConfig provides values that override the default Cilium Helm chart values. Here, Hubble (Cilium’s observability layer) is enabled with all three of its components:

cat << EOF > ${CLUSTER_NAME}-cilium-addon-config.yaml
apiVersion: addons.kubernetes.vmware.com/v1alpha1
kind: AddonConfig
metadata:
  annotations:
    clusteraddon.addons.kubernetes.vmware.com/owned-for-deletion: "true"
  name: ${CLUSTER_NAME}-cilium
  namespace: ${CLUSTER_NAMESPACE}
spec:
  values:
    hubble:
      enabled: true
      relay:
        enabled: true
      ui:
        enabled: true
        service:
          type: LoadBalancer
EOF

Hubble UI service types: LoadBalancer provisions an external IP via NSX or AVI (recommended for lab/demo access). Use ClusterIP if you prefer to access Hubble only via kubectl port-forward, or NodePort for a fixed port on worker node IPs.

2.3 CA Certificate Secret — Harbor Trust for Cluster Nodes

VKS cluster nodes need to trust your Harbor CA certificate to pull addon images during bootstrap. The osConfiguration.trust.additionalTrustedCAs field in the Cluster spec references a Secret that contains the CA in PEM format, base64-encoded.

Generate the CA Secret Manifest

Run the following to generate the manifest directly from your Harbor CA certificate file:

CA_B64=$(base64 -w 0 /tmp/harbor-ca/$HARBOR.crt)

cat << EOF > ${CLUSTER_NAME}-additional-ca.yaml
apiVersion: v1
kind: Secret
metadata:
  name: harbor-user-trusted-ca-secret
  namespace: ${CLUSTER_NAMESPACE}
data:
  harbor-ca: ${CA_B64}
type: Opaque
EOF

Note: The key name harbor-ca in the Secret’s data field must match the secretRef.key value in the Cluster manifest (see section 2.4 below). If you change one, update the other.
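Before applying the Secret, it can help to confirm that the encoded value really decodes back to a PEM certificate, since an empty or wrong file path produces a Secret that applies cleanly but breaks image pulls later. This is a minimal sketch assuming GNU `base64` (as used above); `decodes_to_pem` is a hypothetical helper name.

```shell
# decodes_to_pem B64STRING: succeed only if the base64 string decodes to
# text that contains a PEM certificate header.
decodes_to_pem() {
  printf '%s' "$1" | base64 -d 2>/dev/null \
    | grep -q -- '-----BEGIN CERTIFICATE-----'
}

# Usage (CA_B64 is the variable set in the step above):
#   decodes_to_pem "$CA_B64" || echo "CA_B64 does not contain a PEM certificate" >&2
```

The check only looks for the PEM header; it does not validate the certificate chain itself.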

2.4 Cluster Manifest — VKS Cluster Definition

This is the primary manifest that defines the entire VKS cluster topology. Each section is explained below.

cat << EOF > ${CLUSTER_NAME}-create.yaml
apiVersion: cluster.x-k8s.io/v1beta2
kind: Cluster
metadata:
  name: ${CLUSTER_NAME}
  namespace: ${CLUSTER_NAMESPACE}
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
        - 192.168.156.0/20    # Pod IP range; host bits are masked, so the effective network is 192.168.144.0/20
    services:
      cidrBlocks:
        - 10.96.0.0/12        # ClusterIP range — virtual IPs for Services
    serviceDomain: cluster.local
  topology:
    classRef:
      name: builtin-generic-v3.6.0
      namespace: vmware-system-vks-public
    version: v1.35.2---vmware.1-vkr.3
    variables:
      - name: kubernetes
        value:
          certificateRotation:
            enabled: true
            renewalDaysBeforeExpiry: 90    # Rotate certs 90 days before expiry
          security:
            podSecurityStandard:
              audit: restricted
              auditVersion: latest
              enforce: privileged           # Privileged enforcement required for Cilium's eBPF DaemonSet
              enforceVersion: latest
              warn: privileged
              warnVersion: latest
      - name: osConfiguration
        value:
          trust:
            additionalTrustedCAs:
              - caCert:
                  secretRef:
                    key: harbor-ca
                    name: harbor-user-trusted-ca-secret    # References the CA Secret from step 2.3
      - name: vmClass
        value: best-effort-medium    # Default VM class (overridden per role below)
      - name: storageClass
        value: vks-storage-policy
      - name: bootstrapAddons
        value:
          cniRef:
            name: cilium               # Selects Cilium as the CNI — triggers AddonInstall matching
      - name: node
        value:
          firewall:
            inboundRules:
              - fromPort: 2379         # etcd client API
                protocol: tcp
                toPort: 2380           # etcd server-to-server
              - fromPort: 8472         # Cilium VXLAN overlay (UDP)
                protocol: udp
                toPort: 8472
              - fromPort: 4240         # Cilium health checks
                protocol: tcp
                toPort: 4240
              - fromPort: 4244         # Hubble server on each node (hubble-relay connects here)
                protocol: tcp
                toPort: 4244
              - fromPort: 31235        # Cilium metrics
                protocol: tcp
                toPort: 31235
              - protocol: icmp         # Required for Cilium node health probes
    controlPlane:
      replicas: 3
      metadata:
        annotations:
          run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu, os-version=24.04
      variables:
        overrides:
          - name: vmClass
            value: best-effort-large   # Control plane nodes use a larger VM class
          - name: volumes
            value:
              - name: vol-containerd
                mountPath: /var/lib/containerd
                storageClass: vks-storage-policy
                capacity: 30Gi         # Dedicated volume for container images
    workers:
      machineDeployments:
        - class: node-pool
          name: ${CLUSTER_NAME}-np-01
          replicas: 3
          metadata:
            annotations:
              run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu, os-version=24.04
          variables:
            overrides:
              - name: vmClass
                value: best-effort-large
              - name: volumes
                value:
                  - name: vol-containerd
                    mountPath: /var/lib/containerd
                    storageClass: vks-storage-policy
                    capacity: 30Gi
              - name: node
                value:
                  firewall:
                    inboundRules:
                      - fromPort: 8472
                        protocol: udp
                        toPort: 8472
                      - fromPort: 4240
                        protocol: tcp
                        toPort: 4240
                      - fromPort: 4244
                        protocol: tcp
                        toPort: 4244
                      - fromPort: 31235
                        protocol: tcp
                        toPort: 31235
                      - protocol: icmp
EOF

Key design decisions in this manifest:

| Setting | Value | Reason |
| --- | --- | --- |
| Control plane replicas | 3 | High availability; survives a single node failure |
| Worker replicas | 3 | Minimum for production-like workload distribution |
| OS image | Ubuntu 24.04 | LTS release; its kernel supports the eBPF features Cilium relies on |
| enforce: privileged | Required for Cilium | Cilium’s DaemonSet requires privilege to load eBPF programs |
| vol-containerd 30 Gi | Dedicated containerd volume | Prevents disk pressure on the root filesystem (/) from large addon images |
| VXLAN port 8472/UDP | Cilium tunnel mode | Default overlay encapsulation between nodes |
| Ports 4240, 4244 | Cilium health + Hubble | Required for connectivity probing and flow collection |

Step 3 — Deploy

Apply the manifests in the following order. The CA Secret and AddonInstall/AddonConfig must be created before the Cluster itself so that they are resolved during cluster bootstrap.

# 1. Create the Cilium AddonInstall on the Supervisor
kubectl apply -f ${CLUSTER_NAME}-cilium-addon-install.yaml
kubectl get addoninstall ${CLUSTER_NAME}-cilium -n ${CLUSTER_NAMESPACE}

# 2. Apply the Cilium AddonConfig (Hubble customization)
kubectl apply -f ${CLUSTER_NAME}-cilium-addon-config.yaml

# 3. Create the Harbor CA Secret (must exist before the Cluster is created)
kubectl apply -f ${CLUSTER_NAME}-additional-ca.yaml

# 4. Create the VKS Cluster
kubectl apply -f ${CLUSTER_NAME}-create.yaml
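Because the order matters, a wrapper that stops at the first failed apply prevents the Cluster from being created when one of its dependencies is missing. The following is a sketch only; `apply_in_order` is a hypothetical helper name, and it relies on nothing beyond `kubectl apply -f`.

```shell
# apply_in_order FILE...: apply manifests one by one in the given order and
# stop at the first failure, so later manifests are never applied.
apply_in_order() {
  for manifest in "$@"; do
    echo "applying ${manifest}"
    kubectl apply -f "$manifest" || {
      echo "error: failed to apply ${manifest}; stopping" >&2
      return 1
    }
  done
}

# Usage (same order as the steps above):
#   apply_in_order \
#     "${CLUSTER_NAME}-cilium-addon-install.yaml" \
#     "${CLUSTER_NAME}-cilium-addon-config.yaml" \
#     "${CLUSTER_NAME}-additional-ca.yaml" \
#     "${CLUSTER_NAME}-create.yaml"
```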

Step 4 — Monitor Cluster Creation

Cluster provisioning involves several stages: VM creation, OS bootstrapping, Kubernetes control plane initialization, and worker join. Use the following commands to track progress.

High-Level Status

# List all clusters in all namespaces
vcf cluster list -A

# Get detailed status for the specific cluster
vcf cluster get ${CLUSTER_NAME} -n ${CLUSTER_NAMESPACE}

# Kubernetes-level cluster object
kubectl get cluster ${CLUSTER_NAME} -n ${CLUSTER_NAMESPACE}

Detailed Resource Status

# List all cluster-related resources at once
kubectl get virtualmachinesetresourcepolicy,virtualmachineservice,kubeadmcontrolplane,machinedeployment,machine,virtualmachine \
  -n ${CLUSTER_NAMESPACE}

# Control plane status (watch for "initialized: true" and "ready: true")
kubectl get kubeadmcontrolplanes -n ${CLUSTER_NAMESPACE}

# Worker node deployment status (watch for "readyReplicas" reaching desired count)
kubectl get machinedeployments -n ${CLUSTER_NAMESPACE}

Cluster creation typically takes 10–20 minutes. When the cluster is ready, the Cluster object’s status will show phase: Provisioned.
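Instead of re-running the commands by hand, the phase can be polled until it reaches Provisioned. The sketch below assumes the Cluster object exposes `.status.phase` as described above; `wait_for_phase` is a hypothetical helper name, and the 30-minute timeout and 30-second interval are arbitrary defaults.

```shell
# wait_for_phase CLUSTER NAMESPACE [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Poll the Cluster object until .status.phase is "Provisioned".
# Returns 0 on success and 1 if the timeout elapses first.
wait_for_phase() {
  cluster=$1 namespace=$2 timeout=${3:-1800} interval=${4:-30}
  elapsed=0
  while [ "$elapsed" -le "$timeout" ]; do
    phase=$(kubectl get cluster "$cluster" -n "$namespace" \
      -o jsonpath='{.status.phase}' 2>/dev/null)
    echo "phase: ${phase:-unknown} (${elapsed}s elapsed)"
    if [ "$phase" = "Provisioned" ]; then
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "error: ${cluster} did not reach Provisioned within ${timeout}s" >&2
  return 1
}

# Usage:
#   wait_for_phase "${CLUSTER_NAME}" "${CLUSTER_NAMESPACE}"
```

The interval must be at least 1 second; with an interval of 0 the elapsed counter never advances.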


Step 5 — Fetch the Cluster Kubeconfig

Once the cluster is provisioned, the Supervisor stores the admin kubeconfig in a Secret resource. Extract it to connect directly to the guest cluster:

# Write the kubeconfig to a file
kubectl get secret ${CLUSTER_NAME}-kubeconfig \
  -n ${CLUSTER_NAMESPACE} \
  -o jsonpath='{.data.value}' \
  | base64 -d > $HOME/${CLUSTER_NAME}-kubeconfig

# Set KUBECONFIG to point to the guest cluster
export KUBECONFIG=$HOME/${CLUSTER_NAME}-kubeconfig

# Verify connectivity
kubectl get nodes
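To turn the connectivity check into a pass/fail result, the node listing can be filtered for anything that is not Ready. This sketch assumes the default `kubectl get nodes` column layout (name in column 1, status in column 2); `not_ready_nodes` is a hypothetical helper name.

```shell
# not_ready_nodes: read `kubectl get nodes --no-headers` output on stdin and
# print the name of every node whose STATUS column is not exactly "Ready".
# A cordoned node ("Ready,SchedulingDisabled") is therefore listed as well.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Usage:
#   pending=$(kubectl get nodes --no-headers | not_ready_nodes)
#   [ -z "$pending" ] && echo "all nodes Ready" || echo "not ready: $pending"
```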

Step 6 — Install the Cilium CLI and Hubble CLI

The Cilium CLI (cilium) and Hubble CLI (hubble) are optional but recommended tools for validating the installation and observing network flows from the command line.

Install Cilium CLI

CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64

curl -L --fail --remote-name-all \
  https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}

sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}

cilium version | head -1
# Example output (the version tracks stable.txt at download time): cilium-cli: v0.19.4 compiled with go1.26.3 on linux/amd64

Install Hubble CLI

The Hubble CLI (hubble) is the command-line interface for querying the Hubble relay. It connects to localhost:4245 via a port-forward.

HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/master/stable.txt)
HUBBLE_ARCH=amd64

curl -L --fail --remote-name-all \
  https://github.com/cilium/hubble/releases/download/$HUBBLE_VERSION/hubble-linux-${HUBBLE_ARCH}.tar.gz{,.sha256sum}

sha256sum --check hubble-linux-${HUBBLE_ARCH}.tar.gz.sha256sum
sudo tar xzvfC hubble-linux-${HUBBLE_ARCH}.tar.gz /usr/local/bin
rm hubble-linux-${HUBBLE_ARCH}.tar.gz{,.sha256sum}

hubble version | head -1
# Example output (the version tracks stable.txt at download time): hubble v1.19.3 compiled with go1.26.2 on linux/amd64
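Both install snippets hard-code amd64. On an arm64 workstation the architecture can be derived from `uname -m` instead, as in the sketch below; `map_arch` is a hypothetical helper name, and only the amd64 and arm64 release names are handled.

```shell
# map_arch MACHINE: translate a `uname -m` value into the architecture
# suffix used in the release archive names (amd64 or arm64).
map_arch() {
  case "$1" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   CLI_ARCH=$(map_arch "$(uname -m)") || exit 1
#   HUBBLE_ARCH=$CLI_ARCH
```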

Step 7 — Validate the Cilium Installation

7.1 Overall Health Check

cilium status

Expected output showing all components healthy:

    /¯¯\
 /¯¯\__/¯¯\    Cilium:             OK
 \__/¯¯\__/    Operator:           OK
 /¯¯\__/¯¯\    Envoy DaemonSet:    OK
 \__/¯¯\__/    Hubble Relay:       OK
    \__/       ClusterMesh:        disabled

DaemonSet   cilium           Desired: 6, Ready: 6/6, Available: 6/6
DaemonSet   cilium-envoy     Desired: 6, Ready: 6/6, Available: 6/6
Deployment  cilium-operator  Desired: 2, Ready: 2/2, Available: 2/2
Deployment  hubble-relay     Desired: 1, Ready: 1/1, Available: 1/1
Deployment  hubble-ui        Desired: 1, Ready: 1/1, Available: 1/1

The Image versions section of the output will also list each image SHA with your Harbor registry as the source — confirming that images were pulled from the air-gapped mirror.

7.2 Verify Configuration

Inspect key Cilium configuration settings:

# Check routing mode and kube-proxy replacement
cilium config view | grep -E "cluster-pool|routing-mode|kube-proxy|masquerade|hubble"

The settings to confirm:

| Setting | Expected Value | Description |
| --- | --- | --- |
| routing-mode | tunnel | Pod-to-pod traffic between nodes is encapsulated in an overlay (VXLAN by default) |
| kube-proxy-replacement | true | Cilium replaces kube-proxy with eBPF load balancing |
| enable-hubble | true | Hubble flow collection is active |

Tunnel mode means traffic between pods on different nodes is encapsulated: Pod A → Node A → VXLAN tunnel → Node B → Pod B. This is the default for VKS and works across standard L3 networks without requiring BGP or special routing.
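The expected values can also be asserted rather than read from the grep output. This is a sketch under the assumption that `cilium config view` prints one key and one value per line separated by whitespace; `expect_setting` is a hypothetical helper name.

```shell
# expect_setting KEY WANT: read `cilium config view` output on stdin and
# succeed only if KEY is present with exactly the value WANT.
# Prints a short diagnostic on mismatch or when the key is missing.
expect_setting() {
  awk -v key="$1" -v want="$2" '
    $1 == key {
      seen = 1
      if ($2 == want) { ok = 1 } else { printf "%s: got %s, want %s\n", key, $2, want }
    }
    END {
      if (!seen) { printf "%s: not found\n", key }
      exit !ok
    }'
}

# Usage:
#   cfg=$(cilium config view)
#   printf '%s\n' "$cfg" | expect_setting routing-mode tunnel
#   printf '%s\n' "$cfg" | expect_setting kube-proxy-replacement true
```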

7.3 IPAM and Pod CIDR Allocation

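Verify that each node has received a /24 pod CIDR from the cluster’s pod range. The manifest specifies 192.168.156.0/20, which has host bits set; the effective network is 192.168.144.0/20 (192.168.144.0 through 192.168.159.255), so node CIDRs are allocated starting at 192.168.144.0/24: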

kubectl get ciliumnode -o json \
  | jq '.items[] | {name: .metadata.name, podCIDRs: .spec.ipam.podCIDRs}'

Expected output (one entry per node):

{
  "name": "dev-cluster-05-np-01-worker-1",
  "podCIDRs": ["192.168.144.0/24"]
}
{
  "name": "dev-cluster-05-np-01-worker-2",
  "podCIDRs": ["192.168.145.0/24"]
}
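The pod range in the Cluster manifest is written as 192.168.156.0/20, while the node CIDRs above start at 192.168.144.0. The arithmetic behind that can be sketched in plain shell: clearing the host bits of the manifest value yields the effective network, and a /20 contains 16 blocks of size /24. The function names are hypothetical, and the sketch assumes a shell with 64-bit arithmetic.

```shell
# ip_to_int A.B.C.D: print the IPv4 address as a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1            # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# int_to_ip N: print a 32-bit integer as a dotted IPv4 address.
int_to_ip() {
  echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# network_of CIDR: print the CIDR with its host bits cleared.
network_of() {
  addr=${1%/*} prefix=${1#*/}
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  echo "$(int_to_ip $(( $(ip_to_int "$addr") & mask )))/${prefix}"
}

# count_24s CIDR: print how many /24 blocks fit in the CIDR (prefix <= 24).
count_24s() {
  prefix=${1#*/}
  echo $(( 1 << (24 - prefix) ))
}

network_of 192.168.156.0/20   # prints 192.168.144.0/20
count_24s 192.168.156.0/20    # prints 16
```

With one /24 per node, 16 blocks leave room for the 6 nodes in this guide plus 10 more.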

7.4 Review Cilium-Managed Endpoints

Each running pod that Cilium manages receives a CiliumEndpoint object with a security identity:

kubectl get ciliumendpoints -A -o wide

Expected output (truncated):

NAMESPACE    NAME                                 SECURITY IDENTITY   ENDPOINT STATE   IPV4
kube-system  coredns-f885c4f4d-cw6sx              13843               ready            192.168.144.165
kube-system  hubble-relay-74475757c4-l7js5         39586               ready            192.168.145.194
kube-system  hubble-ui-55c8649bdc-drjps            558                 ready            192.168.145.224
kube-system  metrics-server-54dd6f4ffc-5n6lw       14069               ready            192.168.144.138

Security identities are assigned based on pod labels. These identities are used to enforce CiliumNetworkPolicy rules.

7.5 Envoy DaemonSet

Cilium uses a per-node Envoy proxy (rather than per-pod sidecars) to implement L7 network policies. One cilium-envoy pod runs on every node:

kubectl get pod -n kube-system -l k8s-app=cilium-envoy -o wide

Expected output:

NAME                 READY   STATUS    RESTARTS   IP           NODE
cilium-envoy-2nmkr   1/1     Running   0          10.55.20.3   dev-cluster-05-worker-1
cilium-envoy-5pqz9   1/1     Running   0          10.55.20.6   dev-cluster-05-worker-2
...

7.6 eBPF Load Balancer Service Table

Cilium maintains an eBPF map of all Kubernetes Services and their backend endpoints. Inspect it to confirm that kube-proxy replacement is functioning:

kubectl exec -n kube-system ds/cilium -c cilium-agent -- cilium-dbg service list

Expected output (example with DNS and API server services):

ID   Frontend                  Service Type   Backend
1    10.103.44.230:80/TCP      ClusterIP      1 => 192.168.145.194:4245/TCP (active)
2    10.77.64.2:80/TCP         LoadBalancer   1 => 192.168.145.224:8081/TCP (active)
4    10.96.0.10:53/TCP         ClusterIP      1 => 192.168.144.99:53/TCP (active)
                                              2 => 192.168.144.165:53/TCP (active)
7    10.96.0.1:443/TCP         ClusterIP      1 => 10.55.20.2:6443/TCP (active)
                                              2 => 10.55.20.6:6443/TCP (active)
                                              3 => 10.55.20.7:6443/TCP (active)

Entry 7 shows the Kubernetes API server Service (10.96.0.1:443) load-balanced across three control plane nodes — confirming HA control plane routing via eBPF.


Step 8 — Network Observability with Hubble

Hubble provides real-time visibility into pod-to-pod and pod-to-service network flows across the entire cluster.

8.1 Hubble CLI — Command-Line Flow Inspection

The Hubble CLI connects to the Hubble relay via a local port-forward on port 4245. Start the port-forward in the background:

cilium hubble port-forward &
# Output: Hubble Relay is available at 127.0.0.1:4245

Check Hubble relay connectivity and flow statistics:

hubble status

Expected output:

Healthcheck (via localhost:4245): Ok
Current/Max Flows: 24,570/24,570 (100.00%)
Flows/s: 22.73
Connected Nodes: 6/6

Connected Nodes: 6/6 confirms that all 6 nodes (3 control plane + 3 workers) are reporting flows to the relay.
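The same check can be scripted by comparing the two numbers on the Connected Nodes line. The sketch assumes the `hubble status` output format shown above; `all_nodes_connected` is a hypothetical helper name.

```shell
# all_nodes_connected: read `hubble status` output on stdin and succeed only
# if the "Connected Nodes: X/Y" line reports X equal to Y (and non-zero).
all_nodes_connected() {
  awk -F'[:/ ]+' '
    /^Connected Nodes/ { if ($3 == $4 && $3 > 0) ok = 1 }
    END { exit !ok }'
}

# Usage:
#   hubble status | all_nodes_connected || echo "some nodes are not reporting flows" >&2
```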

Stream live network flows from all namespaces:

hubble observe

Example output showing health probe traffic between nodes:

May 31 05:35:19: 192.168.149.181:40514 (remote-node) -> 192.168.146.183:4240 (health) to-endpoint FORWARDED (TCP Flags: ACK)
May 31 05:35:29: 192.168.147.95:4118 (world) <> kube-system/hubble-ui-55c8649bdc-drjps:8081 (ID:558) to-overlay FORWARDED (TCP Flags: ACK, PSH)

Each flow line includes:

- Timestamp
- Source (IP, port, and label if known)
- Destination (pod name, namespace, or service if known)
- Trace observation point (to-endpoint, to-overlay, to-stack)
- Verdict (FORWARDED, DROPPED, AUDIT)

You can filter flows by namespace, pod, or verdict:

# Show only flows in the default namespace
hubble observe --namespace default

# Show only dropped flows (useful for debugging network policy)
hubble observe --verdict DROPPED

# Show flows to/from a specific pod
hubble observe --pod kube-system/hubble-ui-55c8649bdc-drjps

8.2 Hubble UI — Web Dashboard

The Hubble UI provides a graphical, real-time service map of pod-to-pod communication. Since it was deployed with service.type: LoadBalancer, it receives an external IP from the NSX/AVI load balancer.

Retrieve the Hubble UI URL:

HUBBLE_URL="http://$(kubectl get service hubble-ui -n kube-system \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')/?namespace=kube-system"

echo "Hubble UI: $HUBBLE_URL"
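If the load balancer has not assigned an address yet, the jsonpath query returns an empty string and the command above prints a URL with no host. A small guard, sketched here with the hypothetical helper name `hubble_ui_url`, makes that case explicit.

```shell
# hubble_ui_url IP [NAMESPACE]: print the Hubble UI URL for the given load
# balancer IP, or fail if the IP is empty (no address assigned yet).
hubble_ui_url() {
  ip=$1
  if [ -z "$ip" ]; then
    echo "error: hubble-ui has no load balancer IP yet" >&2
    return 1
  fi
  echo "http://${ip}/?namespace=${2:-kube-system}"
}

# Usage:
#   ip=$(kubectl get service hubble-ui -n kube-system \
#     -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
#   hubble_ui_url "$ip"
```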

Open the URL in a browser to view the service dependency graph and flow log for the selected namespace.


Troubleshooting

| Symptom | Likely Cause | Resolution |
| --- | --- | --- |
| Cluster stuck in Provisioning | Harbor CA not injected into nodes | Verify the harbor-user-trusted-ca-secret Secret exists in the namespace before applying the Cluster manifest |
| Cilium pods in Init state | AddonInstall not created, or Cilium images not available in Harbor | Run vcf addon available list cilium and confirm the image is in Harbor |
| cilium status shows Hubble relay NOT OK | Hubble relay pod not running | Check kubectl get pods -n kube-system -l k8s-app=hubble-relay |
| hubble observe shows no flows | Port-forward not started | Run cilium hubble port-forward & before using the hubble CLI |
| Hubble UI has no external IP | AVI/NSX load balancer not configured for the namespace | Change service.type to NodePort, or to ClusterIP with a port-forward, as a workaround |
| Pods failing to pull images | Harbor CA not trusted at OS level | Re-run the CA trust steps on the relevant node(s) or verify the CA Secret content |
| enforce: privileged warning | Pod Security Standard alerting on Cilium pods | This is expected; Cilium requires privilege. The privileged enforce level is intentional |

Reference

| Name | URL |
| --- | --- |
| Cilium Documentation | https://docs.cilium.io/ |
| Hubble Documentation | https://docs.cilium.io/en/stable/observability/hubble/ |
| VKS Cluster Management (Broadcom Techdocs) | https://techdocs.broadcom.com/us/en/vmware-cis/vcf/vcf-service-administration-and-development/9-0.html |
| VCF CLI Plugin Reference | https://techdocs.broadcom.com/us/en/vmware-cis/vcf/vcf-service-administration-and-development/9-0.html |
| Air-Gapped Setup Guide | 01-Readme-vcf-addons-air-gapped.md |