Azure Kubernetes Service Auditing

The AKS assessment is a comprehensive checklist of factors to consider when preparing a cluster for production. It is based on widely accepted Kubernetes best practices.

DISASTER RECOVERY

  • Ensure you can perform a greenfield deployment (recreate the cluster from scratch if needed)
  • Create a storage migration plan
  • Guarantee the 99.95% SLA (requires the uptime SLA and availability zones)

CLUSTER SETUP

  • Logically/Physically isolate cluster

    • Minimize the use of physical isolation for each separate team or application deployment; prefer logical isolation instead

    • Logical separation of clusters usually provides a higher pod density than physically isolated clusters, with less excess compute capacity sitting idle. Combined with the Kubernetes cluster autoscaler, it lets you scale the number of nodes up or down to meet demand, minimizing cost by running only the nodes required.

    • Logically isolate cluster

    • Physically isolate cluster

  • AAD Integration

  • Use System Node Pools

  • Use an AKS managed identity (Azure Government isn’t currently supported)

  • VM Sizing

  • K8S RBAC + AAD Integration

  • Private cluster

  • Enable cluster autoscaling

  • Sizing of the nodes

  • Refresh container when base image is updated

  • Use the native AKS and ACR integration instead of registry passwords (illustrated in the CLI sketch after this list)

  • Use a proximity placement group to improve performance
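
A minimal Azure CLI sketch of several of the cluster-level items above (managed identity, AAD integration with Kubernetes RBAC, private cluster, cluster autoscaling, and password-less ACR pulls). The resource group, cluster and registry names, VM size, and node counts are placeholders, not values from this checklist:

```bash
# Placeholder names and sizes; adjust to your environment.
az group create --name rg-aks-prod --location westeurope

# Private AKS cluster with managed identity, AAD-backed Kubernetes RBAC,
# cluster autoscaler, and password-less image pulls from ACR.
az aks create \
  --resource-group rg-aks-prod \
  --name aks-prod \
  --enable-managed-identity \
  --enable-aad \
  --enable-azure-rbac \
  --enable-private-cluster \
  --enable-cluster-autoscaler \
  --min-count 3 \
  --max-count 10 \
  --node-vm-size Standard_D4s_v3 \
  --attach-acr acrprod
```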

DEVELOPMENT

  • Implement a proper Liveness probe
  • Implement a proper Startup probe
  • Implement a proper Readiness probe
  • Implement a proper preStop hook
  • Run more than one replica for each Deployment (probes, the preStop hook, replicas, and resource settings are illustrated in the Deployment sketch after this list)
  • Store secrets in Azure Key Vault instead of baking passwords into Docker images (see the Key Vault CSI sketch after this list)
  • Implement Pod Identity
  • Use Kubernetes namespaces to properly isolate Kubernetes resources
  • Set up requests and limits on containers
  • Specify the security context of pod/container
  • Build Docker images following Docker image security best practices
  • Perform static analysis of Docker images at build time
  • Enforce vulnerability thresholds on Docker image builds
  • Enforce compliance policies on Docker image builds
  • Scan container images for vulnerabilities
  • Allow deploying containers only from known registries
  • Runtime Security of Applications
  • Role-Based Access Control (RBAC) on Docker registries
  • Prefer distroless images
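
A minimal Deployment sketch, assuming a hypothetical HTTP application listening on port 8080 with /healthz and /ready endpoints, that illustrates several of the items above: more than one replica, liveness/readiness/startup probes, a preStop hook, resource requests and limits, and a restricted security context:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  namespace: shop                 # a dedicated namespace, not "default"
spec:
  replicas: 3                     # more than one replica
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: acrprod.azurecr.io/web:1.0.0   # hypothetical image in ACR
          ports:
            - containerPort: 8080
          startupProbe:            # gives slow-starting containers time before liveness kicks in
            httpGet:
              path: /healthz
              port: 8080
            failureThreshold: 30
            periodSeconds: 5
          livenessProbe:           # restarts the container when it is unhealthy
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 10
          readinessProbe:          # removes the pod from Service endpoints when not ready
            httpGet:
              path: /ready
              port: 8080
            periodSeconds: 5
          lifecycle:
            preStop:               # gives the load balancer time to drain in-flight requests
              exec:
                command: ["sleep", "10"]
          resources:
            requests:              # what the scheduler reserves
              cpu: 250m
              memory: 256Mi
            limits:                # hard caps enforced at runtime
              cpu: 500m
              memory: 512Mi
          securityContext:         # restricted container security context
            runAsNonRoot: true
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
```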
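
One way to keep passwords out of images, sketched below, is the Azure Key Vault provider for the Secrets Store CSI driver. The vault name, tenant ID, identity client ID, secret name, and namespace are placeholders, and the sketch assumes the secrets-store CSI add-on is already enabled on the cluster:

```yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: app-keyvault
  namespace: shop
spec:
  provider: azure
  parameters:
    useVMManagedIdentity: "true"                      # use a managed identity, no password
    userAssignedIdentityID: "<identity-client-id>"    # placeholder
    keyvaultName: "kv-shop-prod"                      # placeholder vault name
    tenantId: "<tenant-id>"                           # placeholder
    objects: |
      array:
        - |
          objectName: db-password                     # placeholder secret in Key Vault
          objectType: secret
```

The pod then mounts a CSI volume with driver secrets-store.csi.k8s.io and secretProviderClass set to app-keyvault, so the password never appears in the image or in the manifest.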

RESOURCE MANAGEMENT

  • Enforce resource quotas (quotas and pod disruption budgets are illustrated in the sketch after this list)
  • Set memory limits and requests for all containers
  • Configure pod disruption budgets
  • Set up cluster auto-scaling
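
A sketch of a namespace-level ResourceQuota and a PodDisruptionBudget for the hypothetical web Deployment from the previous section; the quota values are illustrative, not sizing recommendations:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: shop-quota
  namespace: shop
spec:
  hard:                        # aggregate caps for the namespace
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "50"
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
  namespace: shop
spec:
  minAvailable: 2              # keep at least two replicas up during node drains and upgrades
  selector:
    matchLabels:
      app: web
```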

CLUSTER MAINTENANCE

  • Keep the Kubernetes version up to date (see the CLI sketch after this list)
  • Keep nodes up to date and patched
  • Securely connect to nodes through a bastion host
  • Regularly check for cluster issues
  • Monitor the security of the cluster with Azure Security Center
  • Provision a log aggregation tool
  • Enable master node logs
  • Collect metrics
  • Configure distributed tracing
  • Control the compliance with Azure Policies
  • Enable Azure Defender for Kubernetes
  • Use Azure Key Vault
  • Use GitOps
  • Use K8S tools (Helm, K9s, Rancher)
  • Don’t use the default namespace
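
A hedged CLI sketch for two of the items above: keeping the Kubernetes version up to date and enabling master node (control plane) logs. Resource names, the target version, the workspace ID, and the chosen log categories are placeholders to adapt:

```bash
# List the versions the cluster can upgrade to, then upgrade.
az aks get-upgrades --resource-group rg-aks-prod --name aks-prod --output table
az aks upgrade --resource-group rg-aks-prod --name aks-prod --kubernetes-version <target-version>

# Send control plane (master node) logs to a Log Analytics workspace.
AKS_ID=$(az aks show --resource-group rg-aks-prod --name aks-prod --query id -o tsv)
az monitor diagnostic-settings create \
  --name aks-control-plane-logs \
  --resource "$AKS_ID" \
  --workspace "<log-analytics-workspace-id>" \
  --logs '[{"category":"kube-apiserver","enabled":true},{"category":"kube-audit","enabled":true}]'
```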

SECURITY

  • Don’t expose your load balancer to the Internet unless necessary (see the internal load balancer sketch after this list)
  • Use Azure Firewall to secure and control all egress traffic leaving the cluster
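
When a Service does not need to be reachable from the Internet, the Azure internal load balancer annotation keeps it on the virtual network; a sketch for the hypothetical web app from the earlier sections:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
  namespace: shop
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"   # internal LB, no public IP
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
```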

QUESTIONS

  • How many pods in total does the application run at the maximum expected number of concurrent users (CCUs)?
  • How many pods run per node?
  • Should you use few large nodes or many small nodes in cluster?
    • As always, there is no definite answer.
    • The type of applications that you want to deploy to the cluster may guide the decision.
    • For example, if an application requires 10 GB of memory, you probably shouldn’t use small nodes: the nodes in the cluster should have at least 10 GB of memory.
    • Or if an application requires 10-fold replication for high availability, you probably shouldn’t use just 2 nodes: the cluster should have at least 10 nodes.
    • For all the scenarios in between, it depends on your specific requirements.
  • Few large nodes vs Many small nodes
    • Few large nodes
      • Pros:
        • Less management overhead
        • Allows running resource-hungry applications
      • Cons:
        • Large number of pods per node
        • Limited replication
        • Higher blast radius
        • Large scaling increments
    • Many small nodes
      • Pros:
        • Reduced blast radius
        • Allows high replication
      • Cons:
        • Large number of nodes
        • More system overhead
        • Lower resource utilisation
        • Pod limits on small nodes