Auto-Scaling vs Resource Management

Contents

  1. Resource Management Basics
  2. Resource Requests and Limits
  3. Horizontal Pod Autoscaler (HPA)
  4. Vertical Pod Autoscaler (VPA)
  5. Cluster Autoscaler
  6. Resource Quotas
  7. LimitRange
  8. Best Practices

Resource Management Basics

The Problem

A Kubernetes cluster has a limited pool of resources:

  • CPU
  • Memory
  • Storage
  • Network bandwidth

Problems:

  1. What happens if a single pod consumes all of a resource?
  2. What should happen when resources run low?
  3. How can resources be used optimally?

Solution: Resource Management

Kubernetes provides three mechanisms:

  1. Resource Requests/Limits: Pod-level resource control
  2. Resource Quotas: Namespace-level limits
  3. LimitRange: Default values and constraints

Resource Types

Compressible Resources:

  • CPU
  • Network bandwidth
  • Can be throttled; the pod does not crash

Incompressible Resources:

  • Memory
  • Storage
  • If the limit is exceeded, the pod is OOMKilled (Out of Memory Killed)

Resource Requests and Limits

Definition

Request: the minimum resources a pod needs in order to run.
Limit: the maximum resources a pod is allowed to consume.

resources:
  requests:
    memory: "64Mi"
    cpu: "250m"
  limits:
    memory: "128Mi"
    cpu: "500m"

CPU

Units:

  • 1 = 1 CPU core (AWS vCPU, Azure vCore)
  • 1000m = 1 core (m = milli-CPU)
  • 0.5 = 500m = half a core
  • 0.1 = 100m = 1/10 core

Example:

resources:
  requests:
    cpu: "250m"  # 0.25 core minimum
  limits:
    cpu: "1"     # 1 core maximum

CPU Throttling: if a container exceeds its CPU limit, it is throttled (slowed down) but never killed.

Memory

Units:

  • Ki = Kibibyte (1024 bytes)
  • Mi = Mebibyte (1024 Ki)
  • Gi = Gibibyte (1024 Mi)
  • Ti = Tebibyte (1024 Gi)

or

  • K = Kilobyte (1000 bytes)
  • M = Megabyte (1000 K)
  • G = Gigabyte (1000 M)
  • T = Terabyte (1000 G)
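
The distinction matters: 128Mi = 128 × 1024² = 134,217,728 bytes, while 128M = 128 × 10⁶ = 128,000,000 bytes, roughly 4.6% less. Kubernetes treats them as different quantities.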

Example:

resources:
  requests:
    memory: "128Mi"  # 128 MiB minimum
  limits:
    memory: "256Mi"  # 256 MiB maximum

OOMKilled: if a container exceeds its memory limit, it is OOMKilled (Out of Memory Killed) and restarted.
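
To watch this happen, here is a minimal sketch of a pod that deliberately allocates more memory than its limit (assuming the polinux/stress image, which the Kubernetes documentation uses for the same demo):

apiVersion: v1
kind: Pod
metadata:
  name: oom-demo
spec:
  containers:
  - name: stress
    image: polinux/stress            # assumed stress-testing image
    resources:
      requests:
        memory: "50Mi"
      limits:
        memory: "100Mi"
    command: ["stress"]
    # try to allocate 150M, above the 100Mi limit
    args: ["--vm", "1", "--vm-bytes", "150M", "--vm-hang", "1"]

Shortly after it starts, kubectl get pod oom-demo reports the container as OOMKilled and the pod enters a restart loop.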

QoS (Quality of Service) Classes

Kubernetes assigns every pod to one of three QoS classes:

1. Guaranteed (Highest priority)

Condition:

  • Every container has request = limit
  • Both CPU and memory are set

spec:
  containers:
  - name: app
    resources:
      requests:
        memory: "200Mi"
        cpu: "500m"
      limits:
        memory: "200Mi"  # equal to the request
        cpu: "500m"      # equal to the request

Characteristics:

  • Evicted last
  • Resources are guaranteed

2. Burstable (Medium priority)

Condition:

  • A request is set, but either no limit is set or request != limit

spec:
  containers:
  - name: app
    resources:
      requests:
        memory: "128Mi"
        cpu: "250m"
      limits:
        memory: "256Mi"  # larger than the request
        cpu: "1"         # larger than the request

Characteristics:

  • Can burst up to its limit
  • May be evicted under resource pressure

3. BestEffort (Lowest priority)

Condition:

  • No requests or limits are set

spec:
  containers:
  - name: app
    resources: {}  # no requests or limits

Characteristics:

  • Evicted first under resource pressure
  • Can use all available resources
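
To check which class Kubernetes assigned to a running pod, query the status field directly:

kubectl get pod myapp -o jsonpath='{.status.qosClass}'
# prints one of: Guaranteed, Burstable, BestEffort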

Scheduling

The scheduler places pods based on their requests, not their limits:

Node Capacity:

CPU: 4 cores
Memory: 8Gi

Running Pods:

Pod-A: CPU request 1 core, Memory 2Gi
Pod-B: CPU request 0.5 core, Memory 1Gi
Total: CPU 1.5 cores, Memory 3Gi

New Pod:

resources:
  requests:
    cpu: "3"
    memory: "6Gi"

Result: ❌ Cannot be scheduled (CPU: 4 - 1.5 = 2.5 cores remain, but 3 cores are requested).
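
A pod in this situation stays Pending, and its events say why (exact wording varies by Kubernetes version):

kubectl describe pod <pod-name>
# Events typically contain a FailedScheduling entry such as:
#   0/1 nodes are available: 1 Insufficient cpu.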

Is there room on the node?

kubectl describe node <node-name>

Output:

Allocated resources:
  CPU Requests:    1500m (37%)
  CPU Limits:      3000m (75%)
  Memory Requests: 3Gi (37%)
  Memory Limits:   6Gi (75%)

Resource Pressure

Memory Pressure: when a node runs low on memory, the kubelet starts evicting pods in this order:

  1. BestEffort pods
  2. Burstable pods using more than their requests
  3. Burstable pods within their requests
  4. Guaranteed pods (only as a last resort, to keep the node itself alive)

CPU Pressure: CPU is compressible, so there is no eviction; containers are simply throttled.


Horizontal Pod Autoscaler (HPA)

Definition

HPA automatically changes the number of pods (horizontal scaling).

Scaling Logic:

Load increases → HPA creates new pods
Load decreases → HPA removes pods

HPA v1 (CPU-based)

Basic HPA:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80

Kubectl shorthand:

kubectl autoscale deployment myapp --min=2 --max=10 --cpu-percent=80

Algorithm:

Desired Replicas = ceil(Current Replicas × (Current Metric / Target Metric))

Example:

Current Replicas: 3
Current CPU: 90%
Target CPU: 80%

Desired = ceil(3 × (90 / 80)) = ceil(3.375) = 4

(By default HPA also applies a 10% tolerance: if the current/target ratio stays between 0.9 and 1.1, no scaling occurs.)

HPA v2 (Advanced)

Multiple Metrics:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
      - type: Pods
        value: 2
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15

Metric Types

1. Resource Metrics

CPU and Memory:

metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 80

Target Types:

  • Utilization: Percentage (%)
  • AverageValue: Absolute value

2. Pods Metrics

Custom pod metrics:

metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "1000"

3. Object Metrics

Kubernetes object metrics:

metrics:
- type: Object
  object:
    metric:
      name: requests-per-second
    describedObject:
      apiVersion: networking.k8s.io/v1
      kind: Ingress
      name: myapp-ingress
    target:
      type: Value
      value: "10k"

4. External Metrics

External system metrics:

metrics:
- type: External
  external:
    metric:
      name: queue_messages_ready
      selector:
        matchLabels:
          queue_name: myqueue
    target:
      type: AverageValue
      averageValue: "30"

Custom Metrics

Prometheus Adapter:

Installation:

helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --set prometheus.url=http://prometheus-server.monitoring.svc

Configuration:

rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'

HPA:

metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "100"

Scaling Behavior

Scale Up:

behavior:
  scaleUp:
    stabilizationWindowSeconds: 0   # scale up immediately
    policies:
    - type: Percent
      value: 100                    # increase by 100% (double)
      periodSeconds: 15             # every 15 seconds
    - type: Pods
      value: 4                      # or by at most 4 pods
      periodSeconds: 15
    selectPolicy: Max               # pick the policy allowing the largest change

Scale Down:

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # wait 5 minutes
    policies:
    - type: Percent
      value: 50                      # decrease by 50%
      periodSeconds: 60              # every minute
    - type: Pods
      value: 2                       # or by at most 2 pods
      periodSeconds: 60
    selectPolicy: Min                # pick the policy allowing the smallest change

HPA Status

kubectl get hpa

Output:

NAME        REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
myapp-hpa   Deployment/myapp   45%/80%   2         10        3          5m

Describe:

kubectl describe hpa myapp-hpa

HPA Limitations

  1. Metrics Server required: for CPU/memory metrics
  2. Requests must be set: resource utilization is computed relative to requests
  3. Cold start: new pods take time to become ready
  4. Flapping: stabilization is needed to prevent rapid scale up/down cycles
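
Without the Metrics Server, resource-based HPAs show <unknown> targets. It can be installed from the official manifest:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml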

Vertical Pod Autoscaler (VPA)

Definition

VPA automatically adjusts pods' resource requests and limits (vertical scaling).

HPA vs VPA:

HPA           VPA
Pod count     Resource size
Horizontal    Vertical
Stateless     Stateless and stateful
Real-time     Requires pod restart

VPA Installation

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

VPA YAML

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "2"
        memory: "2Gi"
      controlledResources:
      - cpu
      - memory

Update Modes

1. Off

Provides recommendations but changes nothing:

updatePolicy:
  updateMode: "Off"

Status:

kubectl describe vpa myapp-vpa

Output:

Recommendation:
  Container Recommendations:
    Container Name:  app
    Lower Bound:
      Cpu:     250m
      Memory:  256Mi
    Target:
      Cpu:     500m
      Memory:  512Mi
    Upper Bound:
      Cpu:     1
      Memory:  1Gi

2. Initial

Applies recommendations only to new pods:

updatePolicy:
  updateMode: "Initial"

Existing pods are left unchanged; new pods are created with the recommended resources.

3. Recreate

Deletes pods and recreates them with the new resources:

updatePolicy:
  updateMode: "Recreate"

⚠️ Downtime! Pods are restarted.

4. Auto

Applies changes automatically:

updatePolicy:
  updateMode: "Auto"

Pods are evicted and recreated with the recommended values.

Resource Policy

Min and Max:

resourcePolicy:
  containerPolicies:
  - containerName: app
    minAllowed:
      cpu: "100m"
      memory: "128Mi"
    maxAllowed:
      cpu: "4"
      memory: "8Gi"

Controlled Resources:

controlledResources:
- cpu
- memory

Only the listed resources are managed.

VPA Recommendations

VPA provides three recommendations:

  1. Lower Bound: the minimum recommendation
  2. Target: the optimal recommendation
  3. Upper Bound: the maximum recommendation

Algorithm: VPA computes these from historical resource usage (a decaying histogram).

HPA + VPA

Using HPA and VPA together:

Scenario 1: HPA on custom metrics only:

# HPA - custom metrics
metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second

VPA handles CPU/Memory:

# VPA - resource optimization
resourcePolicy:
  containerPolicies:
  - containerName: app
    controlledResources:
    - cpu
    - memory

Scenario 2: HPA and VPA on separate workloads:

  • Stateless → HPA
  • Stateful → VPA

⚠️ Caution: do not let HPA and VPA manage the same deployment on the same metrics (CPU/memory)! They will conflict.


Cluster Autoscaler

Definition

The Cluster Autoscaler automatically changes the number of nodes.

Scaling Logic:

Pods cannot be scheduled → add a new node
Nodes are underutilized → remove nodes

Architecture

Pending Pods → Cluster Autoscaler → Cloud Provider API → create/delete nodes

Cloud Provider Support

  • AWS (EKS, ASG)
  • GCP (GKE, MIG)
  • Azure (AKS, VMSS)
  • Alibaba Cloud
  • DigitalOcean

Installation (AWS)

IAM Policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeLaunchTemplateVersions"
      ],
      "Resource": "*"
    }
  ]
}

Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      - image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.28.0
        name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --namespace=kube-system
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<cluster-name>
        - --balance-similar-node-groups
        - --skip-nodes-with-system-pods=false

Scale Up

Trigger: a pod is Pending (cannot be scheduled)

Process:

  1. Finds the pending pod
  2. Determines which node group fits it
  3. Requests a new node from the cloud provider
  4. Once the node is ready, the pod is scheduled

Time: 3-5 minutes (depends on the cloud provider)
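
The pods currently triggering a scale-up can be listed with a field selector:

kubectl get pods --all-namespaces --field-selector=status.phase=Pending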

Scale Down

Trigger: a node is underutilized

Conditions:

  • Node CPU < 50% (default)
  • Node Memory < 50%
  • All pods on the node can be moved to other nodes

Process:

  1. Identifies candidate nodes
  2. Drains their pods onto other nodes
  3. Deletes the node

Time: after 10 minutes of sustained underutilization (default)

Preventing Scale Down

Pods that must never be evicted by the autoscaler can be annotated:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
  - name: app
    image: myapp:1.0

Cluster Autoscaler Status

kubectl logs -n kube-system deployment/cluster-autoscaler

Resource Quotas

Definition

A ResourceQuota sets resource limits for a namespace.

Goal: fair distribution of resources in a multi-tenant cluster.

ResourceQuota YAML

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: development
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"
    services: "10"
    persistentvolumeclaims: "5"
    requests.storage: "100Gi"

Quota Scopes

Scope Types:

  • BestEffort: matches BestEffort pods
  • NotBestEffort: matches pods with requests/limits
  • Terminating: matches pods with an active deadline
  • NotTerminating: matches pods without an active deadline

Example:

spec:
  hard:
    pods: "10"
  scopes:
  - BestEffort

Priority Class Quota

apiVersion: v1
kind: ResourceQuota
metadata:
  name: high-priority-quota
  namespace: production
spec:
  hard:
    pods: "10"
    requests.cpu: "5"
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values: ["high"]

Status

kubectl describe resourcequota -n development

Output:

Name:            compute-quota
Namespace:       development
Resource         Used  Hard
--------         ----  ----
limits.cpu       5     20
limits.memory    10Gi  40Gi
pods             12    50
requests.cpu     3     10
requests.memory  6Gi   20Gi
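
Note that once a compute ResourceQuota is active, every new pod in the namespace must declare requests and limits for the covered resources, or the API server rejects it. A sketch of what this looks like (the exact error wording may differ by version):

kubectl run test --image=nginx -n development
# Rejected with an error similar to:
#   failed quota: compute-quota: must specify limits.cpu,limits.memory,requests.cpu,requests.memory
# A LimitRange (next section) solves this by injecting defaults.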

LimitRange

Definition

A LimitRange defines defaults and constraints for container resources within a namespace.

LimitRange YAML

apiVersion: v1
kind: LimitRange
metadata:
  name: limit-range
  namespace: development
spec:
  limits:
  - type: Pod
    max:
      cpu: "4"
      memory: "8Gi"
    min:
      cpu: "100m"
      memory: "128Mi"
  - type: Container
    default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "250m"
      memory: "256Mi"
    max:
      cpu: "2"
      memory: "2Gi"
    min:
      cpu: "100m"
      memory: "128Mi"
    maxLimitRequestRatio:
      cpu: "4"
      memory: "4"
  - type: PersistentVolumeClaim
    max:
      storage: "10Gi"
    min:
      storage: "1Gi"

Limit Types

1. Container

- type: Container
  default:          # default limit
    cpu: "500m"
    memory: "512Mi"
  defaultRequest:   # default request
    cpu: "250m"
    memory: "256Mi"
  max:              # maximum
    cpu: "2"
    memory: "2Gi"
  min:              # minimum
    cpu: "100m"
    memory: "128Mi"
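
With this LimitRange in the namespace, a container created without a resources block has the defaults injected at admission time. A quick sketch to verify (assuming the LimitRange above exists in the development namespace):

kubectl run nginx --image=nginx -n development
kubectl get pod nginx -n development -o jsonpath='{.spec.containers[0].resources}'
# prints something like:
#   {"limits":{"cpu":"500m","memory":"512Mi"},"requests":{"cpu":"250m","memory":"256Mi"}}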

2. Pod

- type: Pod
  max:
    cpu: "4"       # total CPU across the pod's containers
    memory: "8Gi"  # total memory across the pod's containers

3. PersistentVolumeClaim

- type: PersistentVolumeClaim
  max:
    storage: "10Gi"
  min:
    storage: "1Gi"

Max Limit Request Ratio

maxLimitRequestRatio:
  cpu: "4"
  memory: "4"

Meaning: a limit may be at most 4x its request.

Example:

# ✅ Valid
requests:
  cpu: "250m"
limits:
  cpu: "1"    # 4x

# ❌ Invalid
requests:
  cpu: "250m"
limits:
  cpu: "2"    # 8x

Best Practices

1. Resource Requests va Limits

Always set requests:

resources:
  requests:
    cpu: "250m"
    memory: "256Mi"

Set limits cautiously:

  • CPU limit: Throttling (usually OK)
  • Memory limit: OOMKilled (problematic)

Recommendation:

resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "1"  # set a CPU limit
    # often omit the memory limit (or set it very high)

2. Right-sizing

Use VPA recommendations:

kubectl describe vpa myapp-vpa

Monitor actual usage:

kubectl top pods

Iterate:

  1. Start conservative
  2. Monitor
  3. Adjust
  4. Repeat

3. QoS Classes

Production critical apps → Guaranteed:

resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

Batch jobs → Burstable:

resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "2"
    memory: "2Gi"

4. HPA Configuration

CPU target: 70-80% (not 100%!)
Scaling behavior: aggressive scale up, conservative scale down
Min replicas: never 1 (for high availability)

minReplicas: 2
maxReplicas: 10
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
  scaleDown:
    stabilizationWindowSeconds: 300

5. Resource Quotas

Per namespace:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"

6. Monitoring

Key metrics:

  • Resource utilization (CPU, Memory)
  • HPA events
  • Pod evictions
  • OOMKilled events

Alerts:

  • High CPU/Memory usage
  • Frequent scaling
  • OOMKilled
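
A quick way to find containers that were recently OOMKilled is to query the lastState.terminated.reason field across pods:

kubectl get pods --all-namespaces \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
  | grep OOMKilled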

Summary

Kubernetes auto-scaling and resource management at a glance:

✅ Resource Requests/Limits: pod-level control
✅ HPA: horizontal scaling (pod count)
✅ VPA: vertical scaling (resource size)
✅ Cluster Autoscaler: node scaling
✅ Resource Quotas: namespace limits
✅ LimitRange: default values