Autoscaling
Requirements for autoscaling
You can use either Kubernetes HPA (Horizontal Pod Autoscaler) or KEDA (Kubernetes Event-Driven Autoscaler) for autoscaling. Both rely on the same metrics but have different requirements.
- HPA: Kubernetes built-in. Reads metrics from the Metrics Server or external metrics exposed by the Prometheus adapter. A bit less responsive than KEDA. Cannot scale below 1 replica.
- KEDA: Reads events from a variety of sources (here, Prometheus or CloudWatch for AWS). Faster scaling. Can scale to 0 replicas.
Kubernetes HPA
Install the Prometheus adapter; it is required to expose external metrics in Kubernetes. You can install it using Helm. Make sure it is configured to connect to your Prometheus server.
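For example, if you install the adapter from the prometheus-community Helm chart, the connection is set through its prometheus values. This is only a sketch: the address below is an assumption, point it to your own Prometheus service.
prometheus:
  # Assumption: Prometheus is reachable at this in-cluster address; adjust to your setup
  url: http://prometheus-server.monitoring.svc
  port: 9090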
KEDA (Helm only)
As an alternative to the Prometheus adapter, you can use the KEDA controller to enable autoscaling. You can install it using Helm.
You must configure your Helm values to allow KEDA to connect to your Prometheus Server:
autoscaling:
  keda:
    prometheus:
      metadata:
        serverAddress: http://<prometheus-host>:9090
        # Optional. Custom headers to include in query
        customHeaders: X-Client-Id=cid,X-Tenant-Id=tid,X-Organization-Id=oid
        # Optional. Specify authentication mode (basic, bearer, tls)
        authModes: bearer
      # Optional. Specify TriggerAuthentication resource to use when authModes is specified.
      authenticationRef:
        name: keda-prom-creds
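If authModes is set to bearer, the referenced TriggerAuthentication must provide the token. Below is a minimal sketch, assuming the token is stored in a Secret named prometheus-bearer-token under the key token in the GitGuardian namespace (both names are assumptions, adjust them to your setup):
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-prom-creds
  namespace: YOUR_GITGUARDIAN_NAMESPACE
spec:
  secretTargetRef:
    # Assumption: Secret "prometheus-bearer-token" with key "token" holds the bearer token
    - parameter: bearerToken
      name: prometheus-bearer-token
      key: token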
A ScaledObject and an HPA will be created in the GitGuardian namespace.
Autoscaling workers
Autoscaling allows for dynamic scaling of worker pods, using the Celery task queue length as an external metric for scaling decisions. This improves efficiency and performance while optimizing resource costs.
To enable autoscaling based on Celery queue lengths, you first need to enable application metrics by following this guide.
If you use KEDA, configuring the Prometheus adapter is not necessary.
Prometheus adapter configuration
Configure Prometheus adapter to expose Celery queue lengths as external metrics. This is done by setting up a custom rule in the Prometheus Adapter configuration.
The following rule should be added to your Prometheus Adapter Helm values to expose Celery queue lengths:
rules:
  external:
    - seriesQuery: '{__name__="gim_celery_queue_length",queue_name!=""}'
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (queue_name)
      resources:
        namespaced: true
        overrides:
          namespace:
            resource: namespace
If you use Machine Learning, you will also need this rule:
rules:
  external:
    - seriesQuery: '{__name__="bentoml_service_request_in_progress",exported_endpoint!=""}'
      resources:
        namespaced: false
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
Autoscaling Behavior
The following behavior will be applied:
- Scaling Up: If the length of a Celery queue exceeds 10 tasks per current worker replica, the number of replicas will be increased, provided the current number of replicas is below the specified maximum limit.
- Scaling Down: If the number of tasks per current worker replica remains below 10 for a continuous period of 5 minutes, the number of replicas will be decreased, provided the current number of replicas is above the specified minimum limit.
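For example, with 3 worker replicas and 45 queued tasks (15 tasks per replica), the 10-task threshold is exceeded and a scale-up is triggered, as long as the maximum number of replicas has not been reached.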
Using KEDA, when the Celery queue is empty, the worker will transition to an idle state, resulting in the number of replicas being scaled down to zero.
Autoscaling APIs
You must first enable ingress routes before proceeding with the following steps.
Autoscaling allows for dynamic scaling of API pods, using API response time as an external metric for scaling decisions, improving efficiency and performance while optimizing resource costs. This can be useful when the GitGuardian API receives a high volume of requests during a non-representative time period, such as historical scans or a large number of pre-commit hooks at the end of the day.
Enabling autoscaling based on response time depends on the Ingress Controller you are using. We currently support: ingress-nginx, Traefik, Istio ingress, OpenShift/HAProxy, Contour, and AWS Application Load Balancer. We will guide you through the configuration for each of these controllers.
If you use KEDA, configuring the Prometheus adapter is not necessary.
ingress-nginx
(HPA only) The following rule should be added to your Prometheus Adapter Helm values to expose response times:
rules:
  external:
    - seriesQuery: '{__name__="nginx_ingress_controller_request_duration_seconds_bucket",exported_service!="",le!=""}'
      resources:
        namespaced: false
      metricsQuery: "histogram_quantile(0.95, sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (exported_service, le))*1000"
traefik
(HPA only) The following rule should be added to your Prometheus Adapter Helm values to expose response times:
rules:
  external:
    - seriesQuery: '{__name__="traefik_service_request_duration_seconds_bucket",service!="",le!=""}'
      resources:
        namespaced: false
      metricsQuery: "histogram_quantile(0.95, sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (service, le))*1000"
(HPA and KEDA) You will also need to add the following scrape configuration to your Prometheus server:
additionalScrapeConfigs:
  - job_name: 'traefik'
    scrape_interval: 30s
    kubernetes_sd_configs:
      - role: service
    relabel_configs:
      # Select the namespace and service name of Traefik (adjust to your installation)
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
        action: keep
        regex: ingress;traefik-metrics
      # Keep only the port named "metrics"
      - source_labels: [__meta_kubernetes_service_port_name]
        action: keep
        regex: metrics
    metric_relabel_configs:
      # Only keep the service name: remove the namespace prefix and the port/@kubernetes suffix
      - source_labels: [service]
        regex: "^[^-]+-(.*)-[0-9]+@kubernetes$"
        target_label: service
        replacement: "$1"
      - source_labels: [instance]
        regex: "^(.*)\\..+$"
        target_label: instance
        replacement: "$1"
Istio
(HPA only) The following rule should be added to your Prometheus Adapter Helm values to expose response times:
rules:
  external:
    - seriesQuery: '{__name__="istio_request_duration_milliseconds_bucket",destination_service_name!="",le!=""}'
      resources:
        namespaced: false
      metricsQuery: "histogram_quantile(0.95, sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (destination_service_name, le))"
Contour
(HPA only) The following rule should be added to your Prometheus Adapter Helm values to expose response times:
rules:
  external:
    - seriesQuery: '{__name__="envoy_cluster_upstream_rq_time_bucket",envoy_cluster_name!="",le!=""}'
      resources:
        namespaced: false
      metricsQuery: "histogram_quantile(0.95, sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (envoy_cluster_name, le))"
OpenShift
This metric has been tested on OKD. We recommend using the Custom Metrics Autoscaler, based on KEDA, which is available in the OperatorHub. You may find help on how to authenticate with the OKD-included monitoring stack by creating the TriggerAuthentication object in this documentation.
Otherwise, if you want to configure the HPA manually using your own Prometheus, the following rule should be added to your Prometheus Adapter Helm values to expose response times:
rules:
  external:
    - seriesQuery: '{__name__="haproxy_server_http_average_response_latency_milliseconds_internal_api",exported_service!=""}'
      resources:
        namespaced: false
      metricsQuery: 'avg by (exported_service) (quantile_over_time(0.95, haproxy_server_http_average_response_latency_milliseconds{exported_service=~"internal-api|public-api|internal-api-long|hook"}[2m]))'
AWS
We only support KEDA for AWS (as k8s-cloudwatch-adapter has been archived).
There are a few additional steps to configure autoscaling on AWS.
As a best practice, we are going to use IRSA (IAM Roles for Service Accounts) to authenticate against the AWS API. This method requires OIDC to be enabled on your EKS cluster.
- Create a new IAM Role in your AWS account:
- Trust relationship (replace ACCOUNT_ID, REGION, and OIDC_ID with your EKS parameters):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/oidc.eks.REGION.amazonaws.com/id/OIDC_ID"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.REGION.amazonaws.com/id/OIDC_ID:sub": "system:serviceaccount:keda:keda-operator",
          "oidc.eks.REGION.amazonaws.com/id/OIDC_ID:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
- Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCloudWatchGetMetricData",
      "Effect": "Allow",
      "Action": "cloudwatch:GetMetricData",
      "Resource": "*"
    }
  ]
}
- Annotate the keda/keda-operator service account (replace ACCOUNT_ID and ROLE_NAME) with
eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/ROLE_NAME
This will allow KEDA to assume the role and therefore fetch the CloudWatch metrics of the AWS ALB.
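For reference, the annotated service account looks like the following sketch (ACCOUNT_ID and ROLE_NAME are placeholders to replace with your own values):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: keda-operator
  namespace: keda
  annotations:
    # Placeholder role ARN; must match the IAM role created above
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/ROLE_NAME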
- Create a KEDA TriggerAuthentication object to tell KEDA we are going to use AWS authentication (replace YOUR_GITGUARDIAN_NAMESPACE)
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: cw
  namespace: YOUR_GITGUARDIAN_NAMESPACE
spec:
  podIdentity:
    provider: aws
    identityOwner: keda
- Configure the KEDA trigger manually (here for internal-api). You need to update dimensionValue and awsRegion:
webapps:
  internal_api:
    autoscaling:
      keda:
        enabled: true
        triggers:
          - type: aws-cloudwatch
            metadata:
              namespace: AWS/ApplicationELB
              dimensionName: LoadBalancer;TargetGroup
              metricEndTimeOffset: "0" # default 0
              metricStatPeriod: "300" # default 300
              metricStat: "Average" # default "Average"
              metricCollectionTime: "300" # default 300
              minMetricValue: "0"
              targetMetricValue: "1"
              metricName: TargetResponseTime
              # Parameters to update accordingly
              # Here you need to indicate the Load Balancer identifier and the target group related to your Load Balancer
              dimensionValue: app/k8s-gimingresses-df6dc6292a/77eb76ffd4a58dd4;targetgroup/k8s-gautierg-internal-3060d734ea/0ddc856755effd7a
              # Here indicate the AWS region you are in
              awsRegion: "eu-west-3"
            authenticationRef:
              name: cw # this is the TriggerAuthentication object we've created earlier
Autoscaling Behavior
The following behavior will be applied:
- Scaling Up: If most requests to a specific API take more than 1000ms to answer, the number of replicas will be increased, provided the current number of replicas is below the specified maximum limit.
- Scaling Down: If most requests to a specific API take less than 1000ms to answer for a continuous period of 5 minutes, the number of replicas will be decreased, provided the current number of replicas is above the specified minimum limit.
KOTS-based installation
Autoscaling APIs is not supported on KOTS-based installations.
Navigate to Config > Scaling in the KOTS Admin Console to access the worker scaling options.
For each worker, you can enable autoscaling by ticking the Enable Horizontal Pod Autoscaling option, then specify the minimum and maximum number of replicas.
Helm-based installation
Customize the Helm application using your local-values.yaml file, submitted with the helm command.
You can enable autoscaling for workers and APIs by setting the following Helm values (here, we enable the HPA for both the "worker" worker and the "public-api" API):
celeryWorkers:
  worker:
    autoscaling:
      hpa:
        enabled: true
      keda:
        enabled: false
      minReplicas: 1
      maxReplicas: 10
webapps:
  public-api:
    autoscaling:
      hpa:
        enabled: true
      keda:
        enabled: false
      minReplicas: 1
      maxReplicas: 10
You can enable Machine Learning Secret Engine autoscaling by setting the following Helm values:
secretEngine:
  autoscaling:
    hpa:
      enabled: true
    keda:
      enabled: false
    minReplicas: 1
    maxReplicas: 2
See the values reference documentation for further details.
The autoscaling.hpa.enabled and autoscaling.keda.enabled Helm parameters are mutually exclusive: you must choose between the HPA (using the Prometheus adapter) and the KEDA controller.