Hooking private registries to your Kubernetes cluster

2025-01-10

What's a Container Registry?

A container registry is a repository for storing and distributing container images. Think of it as a Git repository, but for container images: it stores your images securely and makes them available for download when needed.

Docker Login and Credentials

Basic Docker Login

To authenticate with a private registry, you use the docker login command:

docker login myregistry.plamen.works

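You can pass your credentials on the command line with -u <username> -p <password>, but the password then ends up in your shell history; piping it in with --password-stdin avoids that.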

Understanding Docker Credentials

When you log in, Docker creates a credentials file at ~/.docker/config.json. Here's how it looks:

{
  "auths": {
    "myregistry.plamen.works": {
      "auth": "base64EncodedCredentials"
    }
  }
}

The auth field contains a base64-encoded string of username:password. You can decode it using:

echo "base64EncodedCredentials" | base64 --decode
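
To make the encoding concrete, here is a minimal sketch of the round trip. The credentials (demo-user, demo-pass) are placeholders and the helper names encode_auth and decode_auth are made up for illustration. It uses printf rather than echo so that no trailing newline ends up inside the encoded value.

```shell
# Build the value Docker stores in the "auth" field: base64 of "username:password".
# encode_auth and decode_auth are illustrative helper names, not Docker commands.
encode_auth() {
  # printf '%s:%s' joins the two arguments without adding a trailing newline;
  # tr strips the line wrapping some base64 implementations add to long output
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

decode_auth() {
  # Reverse the encoding to get "username:password" back
  printf '%s' "$1" | base64 --decode
}

auth=$(encode_auth "demo-user" "demo-pass")   # placeholder credentials
echo "auth field: $auth"
echo "decoded:    $(decode_auth "$auth")"
```

Since base64 is an encoding and not encryption, anyone who can read config.json can recover the password this way.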

In some of the more mature registries, the username will look something like 00000000-0000-0000-0000-000000000000 and the password will be a short-lived or long-lived token, depending on how you authenticated with the provider.

Now that we understand how Docker handles registry authentication, let's see how Kubernetes builds upon these concepts for container orchestration.

How Kubernetes Authenticates with Registries

Kubernetes uses two main mechanisms to authenticate with private registries:

  1. ImagePullSecrets: The primary method. You create a Secret containing registry credentials and reference it in the manifests of the workloads that need it to pull one or more images:

    apiVersion: v1
    kind: Secret
    metadata:
      name: xyz-registry-secret
    type: kubernetes.io/dockerconfigjson
    data:
      .dockerconfigjson: <base64-encoded-docker-config>
  2. Default Service Account: You can attach the ImagePullSecret to the default Service Account (or any other one), so every Pod that uses that Service Account, explicitly or by default, can pull your private images:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: default
    imagePullSecrets:
    - name: xyz-registry-secret
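
The <base64-encoded-docker-config> placeholder in the Secret above is the whole Docker config document, base64-encoded a second time. As a rough sketch of how that value could be assembled by hand (make_dockerconfigjson is a hypothetical helper, the credentials are placeholders, and no JSON escaping is done, so it assumes the values contain no quotes or backslashes):

```shell
# Sketch: assemble the JSON that goes into the Secret's .dockerconfigjson key.
# make_dockerconfigjson is an illustrative helper; it does no JSON escaping, so it
# assumes the server, username, and password contain no quotes or backslashes.
make_dockerconfigjson() (
  server=$1; user=$2; pass=$3
  # Inner encoding: the "auth" field is base64 of "username:password"
  auth=$(printf '%s:%s' "$user" "$pass" | base64 | tr -d '\n')
  printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}' \
    "$server" "$user" "$pass" "$auth"
)

# Placeholder values; the Secret's data field expects the whole document base64-encoded.
config=$(make_dockerconfigjson "myregistry.plamen.works" "demo-user" "demo-pass")
encoded=$(printf '%s' "$config" | base64 | tr -d '\n')
echo "$encoded"
```

In practice, kubectl create secret docker-registry does this for you; the sketch is mainly useful for understanding what you see when you decode an existing Secret.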

Connecting Independent Docker Registries

Step 1: Create Registry Credentials

kubectl create secret docker-registry registry-cred \
  --docker-server=<registry-server> \
  --docker-username=<username> \
  --docker-password=<password> \
  --docker-email=<email>

Step 2: Configure Pods to Use the Secret

apiVersion: v1
kind: Pod
metadata:
  name: private-app
spec:
  containers:
  - name: app
    image: myregistry.plamen.works/app:1.0
  imagePullSecrets:
  - name: registry-cred

Step 3: Verify Configuration

kubectl describe pod private-app
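
When reading the describe output, the part that matters for registry auth is the container's waiting reason and the Events section. As a small sketch of how the common reasons could be interpreted in a script (explain_pull_reason is a hypothetical helper and the hint texts are my own wording, not kubectl output):

```shell
# Map a container "waiting" reason, as reported in the Pod status, to a short hint.
# explain_pull_reason is an illustrative helper, not part of kubectl.
explain_pull_reason() {
  case "$1" in
    ErrImagePull|ImagePullBackOff)
      echo "pull failed: check imagePullSecrets, image name, and registry reachability" ;;
    InvalidImageName)
      echo "image reference is malformed" ;;
    "")
      echo "no waiting reason: the container is not stuck waiting" ;;
    *)
      echo "other waiting reason: $1" ;;
  esac
}

# Demo with a hard-coded reason
explain_pull_reason "ImagePullBackOff"

# Against a live cluster (requires kubectl and the Pod from Step 2; not executed here):
#   reason=$(kubectl get pod private-app \
#     -o jsonpath='{.status.containerStatuses[0].state.waiting.reason}')
#   explain_pull_reason "$reason"
```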

While manual registry configuration provides flexibility, in practice many companies opt for managed registry solutions that integrate natively with their cloud infrastructure. Let's look at these options.

Managed Registry Solutions and Platform-Specific Registry Integration

Managed Kubernetes services from the major cloud providers offer native integration with their respective container registries. This integration automatically configures the necessary authentication and authorization between your Kubernetes cluster and the container registry, eliminating the need for manual secret management.

The integration works by leveraging the cloud provider's identity and access management (IAM) systems. When enabled, the cluster's nodes receive the appropriate IAM roles or service accounts that grant them permission to pull images from the designated registry. This means:

  1. No need to manually create or manage registry credentials
  2. Automatic rotation of credentials and tokens
  3. Centralized access control through the cloud provider's IAM system
  4. Reduced risk of credential exposure since no registry passwords or tokens are stored in Kubernetes

AKS Attached Registries

  1. Managed Identity Method:

    # Enable workload identity and OIDC issuer
    az aks update -n myAKSCluster -g myResourceGroup \
        --enable-workload-identity \
        --enable-oidc-issuer

Similar to AWS IRSA and GCP Workload Identity below, you'll need to:

  • Create a managed identity
  • Grant it ACR pull permissions
  • Create a service account with proper annotations
  • Establish federation between the identity and service account
  2. Service Principal Method:

    az aks update -n myAKSCluster -g myResourceGroup \
        --attach-acr <acr-name>

For AWS's EKS, there isn't a direct equivalent command like Azure's --attach-acr. Instead, you need to:

  1. Set up IAM roles and policies:
# Assuming you set up your cluster with eksctl and have the binary installed
# Create an OpenID Connect provider for your cluster
eksctl utils associate-iam-oidc-provider --cluster=<cluster-name> --approve

# Create an IAM role for your service account
eksctl create iamserviceaccount \
    --name aws-node \
    --namespace kube-system \
    --cluster <cluster-name> \
    --attach-policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly \
    --approve

GKE on GCP uses Workload Identity rather than a single command:

# Enable Workload Identity on your cluster
gcloud container clusters update CLUSTER_NAME \
    --workload-pool=PROJECT_ID.svc.id.goog

# Create a service account
gcloud iam service-accounts create gke-sa

# Grant access to GCR
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:gke-sa@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/storage.objectViewer"

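# Allow the K8s service account (namespace default, name default) to impersonate the Google service account
gcloud iam service-accounts add-iam-policy-binding gke-sa@PROJECT_ID.iam.gserviceaccount.com \
    --role="roles/iam.workloadIdentityUser" \
    --member="serviceAccount:PROJECT_ID.svc.id.goog[default/default]"

# Annotate the K8s service account so it maps to the Google service account
kubectl annotate serviceaccount default \
    iam.gke.io/gcp-service-account=gke-sa@PROJECT_ID.iam.gserviceaccount.com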

The key differences are:

  • Azure: Single command solution with --attach-acr
  • AWS: Uses IAM roles with service accounts (IRSA)
  • GCP: Uses Workload Identity

Azure's approach is the most streamlined of the three, while AWS and GCP require more explicit configuration of the identity chain between the cluster and the registry.

Alternative Registry Providers

  1. Harbor Registry

    • Open source registry
    • Enterprise security features
    • Image replication
    • RBAC support
  2. JFrog Artifactory

    • Multi-repository support
    • Advanced security features
    • High availability
    • Extensive API support
  3. GitLab Container Registry

    • Integrated with GitLab CI/CD
    • Built-in vulnerability scanning
    • Registry cleanup policies

Troubleshooting

Common issues and solutions:

  1. ImagePullBackOff

    • Verify credentials: Check if secrets are properly configured and not expired
    kubectl get secret <secret-name> -o yaml
    kubectl describe pod <pod-name>  # Look for events section
    • Check network connectivity: Ensure nodes can reach the registry
    # Test from a debug pod
    kubectl run test-net --rm -i -t --restart=Never --image=busybox -- ping myregistry.plamen.works
    • Validate image name and tag: Confirm exact spelling and case sensitivity
    # Try pulling manually first
    docker pull myregistry.plamen.works/app:1.0
    • Check registry quota: Verify you haven't hit registry limits
    # For Azure Container Registry
    az acr show-usage -n <registry-name> -g <resource-group>
  2. Authentication Failures

    • Ensure secrets are correctly mounted:
    # Verify secret exists in the correct namespace
    kubectl get secrets -n <namespace>
    
    # Check secret contents (decoded; this prints credentials to your terminal)
    kubectl get secret <secret-name> -o jsonpath='{.data.*}' | base64 --decode
    • Verify credential format:
    # Test credentials manually
    docker login myregistry.plamen.works -u <username> -p <password>
    
    # Check if secret is properly formatted
    kubectl get secret <secret-name> -o jsonpath='{.data.\.dockerconfigjson}' | base64 --decode | jq .
    • Check expiration dates for tokens:
    # For ACR repository-scoped tokens
    az acr token list -r <registry-name>
    
    # For AWS ECR (this issues a fresh token, valid for 12 hours, rather than showing an expiry date)
    aws ecr get-login-password --region <region>
    • Verify service account configuration:
    kubectl describe serviceaccount <sa-name>
    kubectl get pods <pod-name> -o yaml | grep serviceAccount
  3. Rate Limiting

    • Implement pull-through caching
    • Use registry mirrors
  4. Performance Issues

    • Monitor registry metrics:
    # For Azure Container Registry
    az monitor metrics list --resource <registry-id> --metric "TotalPullCount"
    
    # For Harbor
    curl -X GET "https://<harbor-url>/api/v2.0/statistics"
    • Check for network latency:
    # Test latency to registry
    time curl -I https://myregistry.plamen.works/v2/
    • Verify storage performance:
    # Check storage metrics
    kubectl get pvc
    kubectl describe pv <pv-name>
  5. Clean-up and Maintenance

    • Remove unused images:
    # For Docker
    docker image prune -a
    
    # For Azure Container Registry
    az acr run --cmd "acr purge --filter 'repo:tag' --ago 30d" -r <registry-name> /dev/null
    • Check registry storage usage:
    # For Harbor
    curl -X GET "https://<harbor-url>/api/v2.0/statistics"
    
    # For Docker Registry (lists repositories; the catalog endpoint does not report storage usage)
    curl -X GET "https://<registry-url>/v2/_catalog"
    • Monitor garbage collection:
    # For Harbor
    curl -X GET "https://<harbor-url>/api/v2.0/system/gc/schedule"
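
For the authentication checks in item 2, one thing worth confirming is that the Secret holds an entry for the same registry host your image reference uses. Here is a rough sketch of that check (has_registry_entry is a hypothetical helper and the sample document is made up; it matches text rather than parsing JSON, so treat it as a quick sanity check only):

```shell
# Rough check: does a decoded .dockerconfigjson (read from stdin) contain an entry
# keyed exactly by the given registry host? has_registry_entry is an illustrative
# helper; it matches text and does not parse JSON.
has_registry_entry() {
  # -F matches the quoted host literally, -q suppresses output and only sets the exit status
  grep -qF "\"$1\":"
}

# Self-contained demo with a made-up config that was created for a different registry
sample='{"auths":{"myregistry.azurecr.io":{"auth":"ZGVtbw=="}}}'
if printf '%s' "$sample" | has_registry_entry "myregistry.plamen.works"; then
  echo "entry found"
else
  echo "no entry for myregistry.plamen.works in this config"
fi
```

Against a live cluster, the same helper can be fed from the decode pipeline shown earlier: kubectl get secret <secret-name> -o jsonpath='{.data.\.dockerconfigjson}' | base64 --decode | has_registry_entry myregistry.plamen.works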

Each issue should be systematically investigated using the provided commands and checks. Always start with the simplest possible cause and work your way up to more complex scenarios. Keep logs of all troubleshooting steps taken for future reference.

That's all, fellas! Questions? Catch me on Bluesky @plamen.works. Might be worth bookmarking this for the next time you're wrestling with registry auth.

Peace ;)