Keycloak in High Availability
Keycloak in production requires an HA topology to keep authentication available, since it is a critical component of the entire architecture. This guide covers clustering, database HA, Helm values, and monitoring.
HA Topology
        ┌─────────────────────┐
        │    Load Balancer    │
        │    (nginx / ALB)    │
        └──────┬───────┬──────┘
               │       │
   ┌───────────▼──┐ ┌──▼───────────┐
   │  Keycloak #1 │ │  Keycloak #2 │
   │   (Pod/VM)   │ │   (Pod/VM)   │
   └───────────┬──┘ └──┬───────────┘
               │       │
               └───┬───┘
                   │  JGroups (cluster)
        ┌──────────▼──────────┐
        │     Infinispan      │
        │ (distributed cache) │
        │  sessions / tokens  │
        └──────────┬──────────┘
                   │
        ┌──────────▼──────────┐
        │     PostgreSQL      │
        │ (Primary + Replica) │
        └─────────────────────┘

Components:
| Component | Role |
|---|---|
| Load Balancer | Distributes requests; sticky sessions optional |
| Keycloak nodes | Stateless except for the Infinispan cache |
| Infinispan | Distributed cache: user sessions, client sessions, tokens |
| PostgreSQL | Persistent state: realms, clients, users, roles |
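For the VM flavour of this topology, the load-balancer tier can be sketched in nginx roughly as below. Host names are assumptions; cookie-based stickiness needs NGINX Plus or an Ingress controller, so IP-based affinity stands in for it, and TLS termination is omitted:

```nginx
upstream keycloak {
    ip_hash;                           # crude stickiness: same client IP, same node
    server keycloak-1.internal:8080;   # Keycloak #1 (assumed host name)
    server keycloak-2.internal:8080;   # Keycloak #2
}

server {
    listen 80;
    server_name auth.example.com;

    location / {
        proxy_pass http://keycloak;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_buffer_size 128k;        # Keycloak responses carry large headers
        proxy_buffers 4 128k;
    }
}
```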
Infinispan and JGroups
Keycloak uses Infinispan for its distributed caches. On Kubernetes, JGroups discovers the other nodes via DNS (a headless Service).
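DNS_PING needs a headless Service whose DNS name resolves to one A record per pod. A minimal sketch matching the dns_query used in this guide (the label selector and JGroups port are assumptions; align them with your chart):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: keycloak-headless
  namespace: default
spec:
  clusterIP: None                  # headless: DNS returns one A record per pod
  publishNotReadyAddresses: true   # pods must find each other before becoming Ready
  selector:
    app.kubernetes.io/name: keycloak
  ports:
    - name: jgroups
      port: 7800                   # default JGroups TCP port
```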
cache-ispn.xml for Kubernetes
<?xml version="1.0" encoding="UTF-8"?>
<infinispan
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:infinispan:config:14.0 http://www.infinispan.org/schemas/infinispan-config-14.0.xsd"
        xmlns="urn:infinispan:config:14.0">
    <jgroups>
        <stack name="my-kubernetes" extends="kubernetes">
            <DNS_PING
                    dns_query="keycloak-headless.default.svc.cluster.local"
                    dns_record_type="A"
                    stack.combine="REPLACE"
                    stack.position="MPING" />
        </stack>
    </jgroups>
    <cache-container name="keycloak" statistics="true">
        <transport cluster="keycloak-cluster" stack="my-kubernetes" />
        <!-- "work": internal replication messages, replicated to every node -->
        <replicated-cache name="work" statistics="true">
            <transaction locking="PESSIMISTIC" mode="NON_XA" />
            <expiration max-idle="0" />
        </replicated-cache>
        <!-- User, client and offline sessions: distributed, 2 owners each -->
        <distributed-cache name="sessions" owners="2" statistics="true">
            <transaction locking="PESSIMISTIC" mode="NON_XA" />
            <expiration max-idle="0" />
        </distributed-cache>
        <distributed-cache name="clientSessions" owners="2" statistics="true">
            <transaction locking="PESSIMISTIC" mode="NON_XA" />
            <expiration max-idle="0" />
        </distributed-cache>
        <distributed-cache name="offlineSessions" owners="2" statistics="true">
            <transaction locking="PESSIMISTIC" mode="NON_XA" />
            <expiration max-idle="0" />
        </distributed-cache>
        <local-cache name="realms" statistics="true">
            <encoding media-type="application/x-java-serialized-object" />
        </local-cache>
        <local-cache name="users" statistics="true">
            <encoding media-type="application/x-java-serialized-object" />
            <!-- recent Infinispan replaced <eviction> with <memory> -->
            <memory max-count="10000" />
        </local-cache>
    </cache-container>
</infinispan>

PostgreSQL HA
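Keycloak talks to PostgreSQL over JDBC, so pointing it at a pooler instead of the primary is just a matter of the connection settings. A sketch of the relevant env vars (the secret name and keys are assumptions; the Helm chart further down generates equivalents from externalDatabase):

```yaml
# Database-related env vars on the Keycloak container (sketch)
- name: KC_DB
  value: postgres
- name: KC_DB_URL
  value: jdbc:postgresql://pgbouncer.default.svc.cluster.local:5432/keycloak
- name: KC_DB_USERNAME
  value: keycloak
- name: KC_DB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: keycloak-db   # assumed secret
      key: password
```

Note: with PgBouncer in `pool_mode = transaction`, server-side prepared statements can misbehave; appending `prepareThreshold=0` to the JDBC URL is a common workaround.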
PgBouncer (connection pooling)
# pgbouncer.ini
[databases]
keycloak = host=postgres-primary port=5432 dbname=keycloak
[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 100
reserve_pool_size = 10
log_connections = 0
log_disconnections = 0
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
listen_port = 5432
listen_addr = *
admin_users = pgbouncer

RDS Multi-AZ (AWS)
# Terraform
resource "aws_db_instance" "keycloak" {
  engine                  = "postgres"
  engine_version          = "15.4"
  instance_class          = "db.t3.medium"
  allocated_storage       = 100
  storage_encrypted       = true
  multi_az                = true # automatic standby in another AZ
  backup_retention_period = 7
  db_name                 = "keycloak"
  username                = "keycloak"
  password                = var.db_password
  vpc_security_group_ids  = [aws_security_group.rds.id]
  db_subnet_group_name    = aws_db_subnet_group.keycloak.name

  tags = {
    Name = "keycloak-rds"
  }
}

Helm Values for HA
# values-ha.yaml
replicaCount: 3

auth:
  adminUser: admin
  adminPassword: ${ADMIN_PASSWORD}

extraEnvVars:
  - name: KC_CACHE
    value: ispn
  - name: KC_CACHE_STACK
    value: kubernetes
  - name: KC_CACHE_CONFIG_FILE
    value: cache-ispn.xml
  - name: KC_METRICS_ENABLED
    value: "true"
  - name: KC_HEALTH_ENABLED
    value: "true"
  - name: KC_PROXY
    value: edge
  - name: JAVA_OPTS_APPEND
    value: "-Djgroups.dns.query={{ include \"keycloak.fullname\" . }}-headless"

service:
  type: ClusterIP

ingress:
  enabled: true
  hostname: auth.example.com
  annotations:
    nginx.ingress.kubernetes.io/proxy-buffer-size: "128k"
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "KEYCLOAK_SESSION"

postgresql:
  enabled: false # use an external PostgreSQL in production

externalDatabase:
  host: pgbouncer.default.svc.cluster.local
  port: 5432
  database: keycloak
  user: keycloak
  password: ${DB_PASSWORD}

# PodDisruptionBudget: keep at least 2 pods during rollouts
podDisruptionBudget:
  create: true
  minAvailable: 2

# Horizontal Pod Autoscaler
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPU: 70
  targetMemory: 80

resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: 2000m
    memory: 2Gi

# Anti-affinity: spread pods across different worker nodes
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: keycloak

Health Checks and Readiness Probes
# Kubernetes Deployment: probes
livenessProbe:
  httpGet:
    path: /health/live
    port: 9000
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /health/ready
    port: 9000
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3
startupProbe:
  httpGet:
    path: /health/started
    port: 9000
  initialDelaySeconds: 60
  periodSeconds: 5
  failureThreshold: 30 # keeps polling for up to 150 s (30 x 5 s) during startup

Available endpoints (port 9000):
| Endpoint | Description |
|---|---|
| /health | Overall status |
| /health/live | Liveness: process is responding |
| /health/ready | Readiness: ready to receive traffic (cluster joined) |
| /health/started | Started: bootstrap finished |
| /metrics | Prometheus metrics |
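The probe numbers above imply concrete time budgets. A quick sanity check of the arithmetic, with the values copied from the probe config:

```python
# How long the kubelet keeps retrying a probe before giving up.
def probe_window(period_seconds: int, failure_threshold: int) -> int:
    return period_seconds * failure_threshold

# startupProbe: 5 s x 30 attempts = 150 s of polling after the initial delay
startup = probe_window(5, 30)
# livenessProbe: 10 s x 3 = 30 s from first failure until the container restarts
liveness = probe_window(10, 3)
print(startup, liveness)  # 150 30
```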
Monitoring: Prometheus + Grafana
# ServiceMonitor for the Prometheus Operator
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: keycloak
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
      - default # namespace where the Keycloak Service lives
  selector:
    matchLabels:
      app.kubernetes.io/name: keycloak
  endpoints:
    - port: http
      path: /metrics
      interval: 30s

Key metrics:
| Metric | Description |
|---|---|
| keycloak_logins_total | Total logins (by realm/provider) |
| keycloak_login_errors_total | Login errors |
| keycloak_registrations_total | User registrations |
| keycloak_sessions | Active sessions |
| keycloak_request_duration_seconds | Request latency |
| jvm_memory_used_bytes | JVM memory usage |
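The first alert rule below divides two counter rates. The same arithmetic in plain Python, with made-up sample deltas, shows what the 0.1 threshold means:

```python
# rate(errors[5m]) / rate(logins[5m]): both rates share the window, so it cancels
# and the expression reduces to a ratio of counter deltas.
def login_error_ratio(errors_delta: float, logins_delta: float) -> float:
    if logins_delta == 0:
        return 0.0  # avoid division by zero when there is no login traffic
    return errors_delta / logins_delta

# 30 failed logins out of 200 attempts in the window: 15% error rate, fires the alert
ratio = login_error_ratio(30, 200)
print(ratio, ratio > 0.1)  # 0.15 True
```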
# Prometheus alert rules: the essentials
groups:
  - name: keycloak
    rules:
      - alert: KeycloakHighLoginErrorRate
        expr: |
          rate(keycloak_login_errors_total[5m]) /
          rate(keycloak_logins_total[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Login error rate > 10%"
      - alert: KeycloakPodDown
        expr: kube_deployment_status_replicas_available{deployment="keycloak"} < 2
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Keycloak has fewer than 2 replicas available"

Realm Backup and Restore
Export via CLI
# Export a specific realm (stop the service first in dev, or export from a running pod in prod)
kubectl exec -it keycloak-0 -- \
  /opt/keycloak/bin/kc.sh export \
  --dir /tmp/realm-export \
  --realm myrealm \
  --users realm_file

# Copy the export out of the pod
kubectl cp keycloak-0:/tmp/realm-export/myrealm-realm.json \
  ./backups/myrealm-$(date +%Y%m%d).json

Import via the Admin API
# Create a realm from the backup
curl -s -X POST \
  "https://auth.example.com/admin/realms" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d @./backups/myrealm-20260101.json

Automated export via CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: keycloak-backup
spec:
  schedule: "0 2 * * *" # daily at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: quay.io/keycloak/keycloak:23.0 # pin to the same version as the server
              command:
                - /bin/sh
                - -c
                - |
                  /opt/keycloak/bin/kc.sh export \
                    --dir /backup \
                    --realm myrealm \
                    --users realm_file
              env:
                - name: KC_DB
                  value: postgres
                - name: KC_DB_URL
                  valueFrom:
                    secretKeyRef:
                      name: keycloak-db
                      key: url
                # the export also needs DB credentials (secret key names assumed)
                - name: KC_DB_USERNAME
                  valueFrom:
                    secretKeyRef:
                      name: keycloak-db
                      key: username
                - name: KC_DB_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: keycloak-db
                      key: password
              volumeMounts:
                - name: backup-storage
                  mountPath: /backup
          volumes:
            - name: backup-storage
              persistentVolumeClaim:
                claimName: keycloak-backups
          restartPolicy: OnFailure
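A truncated or empty export file is easy to catch before re-importing it. A minimal sanity check, sketched here against a stand-in file (a real run would point at the JSON copied out with kubectl cp):

```python
import json
import os
import tempfile

def check_realm_export(path: str, expected_realm: str) -> bool:
    """True if the file parses as JSON and names the expected realm."""
    with open(path) as f:
        data = json.load(f)  # a truncated backup fails loudly here
    return data.get("realm") == expected_realm

# Demo with a stand-in export file
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"realm": "myrealm", "enabled": True}, f)
    path = f.name

print(check_realm_export(path, "myrealm"))  # True
os.remove(path)
```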