Auto-Scaling and Resource Management
Table of Contents
- Resource Management Basics
- Resource Requests and Limits
- Horizontal Pod Autoscaler (HPA)
- Vertical Pod Autoscaler (VPA)
- Cluster Autoscaler
- Resource Quotas
- LimitRange
- Best Practices
Resource Management Basics
The Problem
A Kubernetes cluster has a limited pool of resources:
- CPU
- Memory
- Storage
- Network bandwidth
This raises several questions:
- What happens if a single pod consumes all the resources?
- What should happen when resources run low?
- How can resources be used optimally?
The Solution: Resource Management
Kubernetes provides three mechanisms:
- Resource Requests/Limits: Pod-level resource control
- Resource Quotas: Namespace-level limits
- LimitRange: Default values and constraints
Resource Types
Compressible Resources:
- CPU
- Network bandwidth
- Can be throttled; the pod does not crash
Incompressible Resources:
- Memory
- Storage
- If usage exceeds the limit, the pod is OOMKilled (Out of Memory Killed)
Resource Requests and Limits
Definition
Request: the minimum amount of resources a pod needs to run.
Limit: the maximum amount of resources a pod may consume.
resources:
  requests:
    memory: "64Mi"
    cpu: "250m"
  limits:
    memory: "128Mi"
    cpu: "500m"
CPU
Units:
- 1 = 1 CPU core (AWS vCPU, Azure vCore)
- 1000m = 1 core (m = milli-CPU)
- 0.5 = 500m = half a core
- 0.1 = 100m = 1/10 of a core
Example:
resources:
  requests:
    cpu: "250m" # 0.25 core minimum
  limits:
    cpu: "1" # 1 core maximum
CPU Throttling: if usage exceeds the limit, the container is throttled (slowed down) but never killed.
Memory
Units:
- Ki = kibibyte (1024 bytes)
- Mi = mebibyte (1024 Ki)
- Gi = gibibyte (1024 Mi)
- Ti = tebibyte (1024 Gi)
or
- K = kilobyte (1000 bytes)
- M = megabyte (1000 K)
- G = gigabyte (1000 M)
- T = terabyte (1000 G)
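A quick worked conversion shows the difference between the binary and decimal units:

128Mi = 128 × 1024 × 1024 = 134,217,728 bytes
128M  = 128 × 1000 × 1000 = 128,000,000 bytes (about 4.6% less)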
Example:
resources:
  requests:
    memory: "128Mi" # 128 MiB minimum
  limits:
    memory: "256Mi" # 256 MiB maximum
OOMKilled:
If usage exceeds the memory limit, the container is OOMKilled (Out of Memory Killed) and restarted.
QoS (Quality of Service) Classes
Kubernetes assigns every pod to one of three QoS classes:
1. Guaranteed (highest priority)
Conditions:
- Every container has request = limit
- Both CPU and memory are explicitly set
spec:
  containers:
  - name: app
    resources:
      requests:
        memory: "200Mi"
        cpu: "500m"
      limits:
        memory: "200Mi" # equal to the request
        cpu: "500m" # equal to the request
Characteristics:
- Evicted last
- Resources are guaranteed
2. Burstable (medium priority)
Condition:
- A request is set, but either no limit is set or request != limit
spec:
  containers:
  - name: app
    resources:
      requests:
        memory: "128Mi"
        cpu: "250m"
      limits:
        memory: "256Mi" # larger than the request
        cpu: "1" # larger than the request
Characteristics:
- Can burst up to the limit
- May be evicted under resource pressure
3. BestEffort (lowest priority)
Condition:
- No requests and no limits
spec:
  containers:
  - name: app
    resources: {} # no requests or limits
Characteristics:
- Evicted first under resource pressure
- Can use all available resources on the node
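To see which QoS class Kubernetes assigned to a pod, query the status field directly:

kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'
# Prints one of: Guaranteed, Burstable, BestEffort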
Scheduling
The scheduler places pods based on their requests:
Node Capacity:
  CPU: 4 cores
  Memory: 8Gi

Running Pods:
  Pod-A: CPU request 1 core, Memory 2Gi
  Pod-B: CPU request 0.5 core, Memory 1Gi
  Total: CPU 1.5 cores, Memory 3Gi

New Pod:
  resources:
    requests:
      cpu: "3"
      memory: "6Gi"
Answer: ❌ Cannot be scheduled (CPU: 4 - 1.5 = 2.5 cores remain, but 3 cores are requested; memory also falls short: 8 - 3 = 5Gi remain, 6Gi requested)
Is there room on the node?
kubectl describe node <node-name>
Output:
Allocated resources:
  CPU Requests:    1500m (37%)
  CPU Limits:      3000m (75%)
  Memory Requests: 3Gi (37%)
  Memory Limits:   6Gi (75%)
Resource Pressure
Memory Pressure: when a node runs low on memory, the kubelet starts evicting pods in this order:
- BestEffort pods
- Burstable pods (usage above their requests)
- Burstable pods (usage within their requests)
- Guaranteed pods (only as a last resort to keep the node itself alive)
CPU Pressure: CPU is compressible, so there is no eviction, only throttling.
Horizontal Pod Autoscaler (HPA)
Definition
The HPA automatically changes the number of pods (horizontal scaling).
Scaling Logic:
Load increases → HPA creates new pods
Load decreases → HPA removes pods
HPA v1 (CPU-based)
Basic HPA:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
Kubectl shorthand:
kubectl autoscale deployment myapp --min=2 --max=10 --cpu-percent=80
Algorithm:
Desired Replicas = ceil(Current Replicas × (Current Metric / Target Metric))
Example:
Current Replicas: 3
Current CPU: 90%
Target CPU: 80%
Desired = ceil(3 × (90 / 80)) = ceil(3.375) = 4
HPA v2 (Advanced)
Multiple Metrics:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
      - type: Pods
        value: 2
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
Metric Types
1. Resource Metrics
CPU and Memory:
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 80
Target Types:
- Utilization: percentage (%)
- AverageValue: absolute value
2. Pods Metrics
Custom pod metrics:
metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "1000"
3. Object Metrics
Kubernetes object metrics:
metrics:
- type: Object
  object:
    metric:
      name: requests-per-second
    describedObject:
      apiVersion: networking.k8s.io/v1
      kind: Ingress
      name: myapp-ingress
    target:
      type: Value
      value: "10k"
4. External Metrics
External system metrics:
metrics:
- type: External
  external:
    metric:
      name: queue_messages_ready
      selector:
        matchLabels:
          queue_name: myqueue
    target:
      type: AverageValue
      averageValue: "30"
Custom Metrics
Prometheus Adapter:
Installation:
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --set prometheus.url=http://prometheus-server.monitoring.svc
Configuration:
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
HPA:
metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "100"
Scaling Behavior
Scale Up:
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0 # scale up immediately
    policies:
    - type: Percent
      value: 100 # increase by 100% (double)
      periodSeconds: 15 # every 15 seconds
    - type: Pods
      value: 4 # or add at most 4 pods
      periodSeconds: 15
    selectPolicy: Max # pick the policy that allows the biggest change
Scale Down:
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300 # wait 5 minutes
    policies:
    - type: Percent
      value: 50 # decrease by 50%
      periodSeconds: 60 # every minute
    - type: Pods
      value: 2 # or remove at most 2 pods
      periodSeconds: 60
    selectPolicy: Min # pick the policy that allows the smallest change
HPA Status
kubectl get hpa
Output:
NAME        REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
myapp-hpa   Deployment/myapp   45%/80%   2         10        3          5m
Describe:
kubectl describe hpa myapp-hpa
HPA Limitations
- Metrics Server required: for CPU/memory metrics (install command sketched below)
- Requests must be set: resource-based utilization is computed against requests
- Cold start: newly created pods take time to become ready
- Flapping: stabilization windows are needed to prevent rapid scale up/down cycles
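If Metrics Server is missing, the upstream manifest is the usual starting point (pin a specific version in production rather than latest):

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml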
Vertical Pod Autoscaler (VPA)
Definition
The VPA automatically adjusts the resource requests/limits of pods.
HPA vs VPA:
| HPA | VPA |
|---|---|
| Pod count | Resource size |
| Horizontal | Vertical |
| Stateless | Stateless and stateful |
| Real-time | Requires pod restart |
VPA Installation
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
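After the script finishes, the three VPA components should be running in kube-system (exact pod names vary by version):

kubectl get pods -n kube-system | grep vpa
# vpa-admission-controller-...
# vpa-recommender-...
# vpa-updater-...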
VPA YAML
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "2"
        memory: "2Gi"
      controlledResources:
      - cpu
      - memory
Update Modes
1. Off
Provides recommendations but does not apply them:
updatePolicy:
  updateMode: "Off"
Status:
kubectl describe vpa myapp-vpa
Output:
Recommendation:
  Container Recommendations:
    Container Name: app
    Lower Bound:
      Cpu: 250m
      Memory: 256Mi
    Target:
      Cpu: 500m
      Memory: 512Mi
    Upper Bound:
      Cpu: 1
      Memory: 1Gi
2. Initial
Applies recommendations only to new pods:
updatePolicy:
  updateMode: "Initial"
Existing pods are left unchanged; new pods are created with the recommended resources.
3. Recreate
Deletes pods and recreates them with the new resources:
updatePolicy:
  updateMode: "Recreate"
⚠️ Downtime! Pods are restarted.
4. Auto
Applies changes automatically:
updatePolicy:
  updateMode: "Auto"
Pods are evicted and recreated with the recommended values.
Resource Policy
Min and Max:
resourcePolicy:
  containerPolicies:
  - containerName: app
    minAllowed:
      cpu: "100m"
      memory: "128Mi"
    maxAllowed:
      cpu: "4"
      memory: "8Gi"
Controlled Resources:
controlledResources:
- cpu
- memory
Only the listed resources are managed.
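Besides controlledResources, VPA also accepts a controlledValues field that restricts whether it rewrites only requests or both requests and limits; a minimal sketch:

containerPolicies:
- containerName: app
  controlledResources: ["cpu", "memory"]
  controlledValues: RequestsOnly # or RequestsAndLimits (the default)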
VPA Recommendations
VPA produces three recommendations:
- Lower Bound: the minimum recommendation
- Target: the optimal recommendation
- Upper Bound: the maximum recommendation
Algorithm: VPA computes these from historical resource usage (a histogram).
HPA + VPA
Using HPA and VPA together:
Scenario 1: HPA on custom metrics only:
# HPA - custom metrics
metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
VPA - CPU/Memory:
# VPA - resource optimization
resourcePolicy:
  containerPolicies:
  - containerName: app
    controlledResources:
    - cpu
    - memory
Scenario 2: HPA and VPA on separate workloads:
- Stateless → HPA
- Stateful → VPA
⚠️ Caution: do not point HPA and VPA at the same deployment using the same metric (CPU/memory)! They will conflict.
Cluster Autoscaler
Definition
The Cluster Autoscaler automatically changes the number of nodes in the cluster.
Scaling Logic:
Pods cannot be scheduled → add a new node
Nodes are underutilized → remove nodes
Architecture
Pending Pods → Cluster Autoscaler → Cloud Provider API → create/delete node
Cloud Provider Support
- AWS (EKS, ASG)
- GCP (GKE, MIG)
- Azure (AKS, VMSS)
- Alibaba Cloud
- DigitalOcean
Installation (AWS)
IAM Policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeLaunchTemplateVersions"
      ],
      "Resource": "*"
    }
  ]
}
Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      - image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.28.0
        name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --namespace=kube-system
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<cluster-name>
        - --balance-similar-node-groups
        - --skip-nodes-with-system-pods=false
Scale Up
Trigger: a pod is pending (cannot be scheduled)
Process:
- Finds the pending pod
- Determines which node group can fit it
- Requests a new node from the cloud provider
- Once the node is ready, the pod is scheduled
Time: 3-5 minutes (depends on the cloud provider)
Scale Down
Trigger: a node is underutilized
Conditions:
- Node CPU < 50% (default)
- Node memory < 50%
- All pods on the node can be moved to other nodes
Process:
- Finds candidate nodes
- Drains their pods onto other nodes
- Deletes the node
Time: 10 minutes (default)
Node Affinity and Anti-affinity
Prevent Scale Down:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
  - name: app
    image: myapp:1.0
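A node-level annotation achieves the same protection for an entire node:

kubectl annotate node <node-name> cluster-autoscaler.kubernetes.io/scale-down-disabled=true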
Cluster Autoscaler Status
kubectl logs -n kube-system deployment/cluster-autoscaler
Resource Quotas
Definition
A ResourceQuota sets aggregate resource limits for a namespace.
Purpose: fair sharing of resources in a multi-tenant cluster.
ResourceQuota YAML
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: development
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"
    services: "10"
    persistentvolumeclaims: "5"
    requests.storage: "100Gi"
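Once the quota is active, any request that would exceed it is rejected at admission time; the error looks roughly like this (exact wording varies by Kubernetes version):

Error from server (Forbidden): pods "myapp" is forbidden:
exceeded quota: compute-quota, requested: requests.cpu=2,
used: requests.cpu=9, limited: requests.cpu=10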
Quota Scopes
Scope Types:
- BestEffort: BestEffort pods
- NotBestEffort: pods with requests/limits
- Terminating: pods with an active deadline
- NotTerminating: pods without an active deadline
Example:
spec:
  hard:
    pods: "10"
  scopes:
  - BestEffort
Priority Class Quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: high-priority-quota
  namespace: production
spec:
  hard:
    pods: "10"
    requests.cpu: "5"
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values: ["high"]
Status
kubectl describe resourcequota -n development
Output:
Name:       compute-quota
Namespace:  development
Resource         Used  Hard
--------         ----  ----
limits.cpu       5     20
limits.memory    10Gi  40Gi
pods             12    50
requests.cpu     3     10
requests.memory  6Gi   20Gi
LimitRange
Definition
A LimitRange defines default values and constraints for container resources within a namespace.
LimitRange YAML
apiVersion: v1
kind: LimitRange
metadata:
  name: limit-range
  namespace: development
spec:
  limits:
  - type: Pod
    max:
      cpu: "4"
      memory: "8Gi"
    min:
      cpu: "100m"
      memory: "128Mi"
  - type: Container
    default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "250m"
      memory: "256Mi"
    max:
      cpu: "2"
      memory: "2Gi"
    min:
      cpu: "100m"
      memory: "128Mi"
    maxLimitRequestRatio:
      cpu: "4"
      memory: "4"
  - type: PersistentVolumeClaim
    max:
      storage: "10Gi"
    min:
      storage: "1Gi"
Limit Types
1. Container
- type: Container
  default:          # default limit
    cpu: "500m"
    memory: "512Mi"
  defaultRequest:   # default request
    cpu: "250m"
    memory: "256Mi"
  max:              # maximum
    cpu: "2"
    memory: "2Gi"
  min:              # minimum
    cpu: "100m"
    memory: "128Mi"
2. Pod
- type: Pod
  max:
    cpu: "4"      # total CPU across the pod's containers
    memory: "8Gi" # total memory across the pod's containers
3. PersistentVolumeClaim
- type: PersistentVolumeClaim
  max:
    storage: "10Gi"
  min:
    storage: "1Gi"
Max Limit Request Ratio
maxLimitRequestRatio:
  cpu: "4"
  memory: "4"
Meaning: the limit may be at most 4x the request.
Example:
# ✅ Valid
requests:
  cpu: "250m"
limits:
  cpu: "1" # 4x
# ❌ Invalid
requests:
  cpu: "250m"
limits:
  cpu: "2" # 8x
Best Practices
1. Resource Requests va Limits
Always set requests:
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
Set limits cautiously:
- CPU limit: Throttling (usually OK)
- Memory limit: OOMKilled (problematic)
Recommendation:
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "1" # set a CPU limit
    # often omit the memory limit (or set it very high)
2. Right-sizing
Use VPA recommendations:
kubectl describe vpa myapp-vpa
Monitor actual usage:
kubectl top pods
Iterate:
- Start conservative
- Monitor
- Adjust
- Repeat
3. QoS Classes
Production critical apps → Guaranteed:
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
Batch jobs → Burstable:
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "2"
    memory: "2Gi"
4. HPA Configuration
CPU target: 70-80% (not 100%!)
Scaling behavior: aggressive scale up, conservative scale down
Min replicas: never 1 (for high availability)
minReplicas: 2
maxReplicas: 10
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
  scaleDown:
    stabilizationWindowSeconds: 300
5. Resource Quotas
Per namespace:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
6. Monitoring
Key metrics:
- Resource utilization (CPU, Memory)
- HPA events
- Pod evictions
- OOMKilled events
Alerts:
- High CPU/Memory usage
- Frequent scaling
- OOMKilled
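As a sketch, assuming kube-state-metrics is scraped by Prometheus, an OOMKilled alert can key off the last-termination reason:

# PromQL - fires when a container's most recent termination was an OOM kill
kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1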
Summary
Kubernetes auto-scaling and resource management at a glance:
✅ Resource Requests/Limits: pod-level control
✅ HPA: horizontal scaling (pod count)
✅ VPA: vertical scaling (resource size)
✅ Cluster Autoscaler: node scaling
✅ Resource Quotas: namespace limits
✅ LimitRange: default values