Autoscaling
Requirements for autoscaling
You can use either Kubernetes HPA (Horizontal Pod Autoscaler) or KEDA (Kubernetes Event-Driven Autoscaler) for autoscaling. Both rely on the same metrics but have different requirements.
- HPA: Kubernetes built-in. Reads metrics from the Metrics Server or external metrics exposed by the Prometheus adapter. A bit less responsive than KEDA. Cannot scale below 1 replica.
- KEDA: Reads events from a variety of sources (here, Prometheus or CloudWatch for AWS). Faster scaling. Can scale to 0 replicas.
Kubernetes HPA
Install the Prometheus adapter; it is required to expose external metrics in Kubernetes. You can install it using Helm. Make sure it is configured to connect to your Prometheus server.
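For example, if you install the adapter from the prometheus-community Helm chart, the connection is set through its prometheus values. This is only a sketch: the address below is an assumption, point it to your own Prometheus service.
prometheus:
  # Assumption: Prometheus is reachable at this in-cluster address; adjust to your setup
  url: http://prometheus-server.monitoring.svc
  port: 9090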
KEDA (Helm only)
As an alternative to the Prometheus adapter, you can use the KEDA controller to enable autoscaling. You can install it using Helm.
You must configure your Helm values to allow KEDA to connect to your Prometheus Server:
autoscaling:
  keda:
    prometheus:
      metadata:
        serverAddress: http://<prometheus-host>:9090
        # Optional. Custom headers to include in query
        customHeaders: X-Client-Id=cid,X-Tenant-Id=tid,X-Organization-Id=oid
        # Optional. Specify authentication mode (basic, bearer, tls)
        authModes: bearer
      # Optional. Specify TriggerAuthentication resource to use when authModes is specified.
      authenticationRef:
        name: keda-prom-creds
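If authModes is set to bearer, the referenced TriggerAuthentication must provide the token. Below is a minimal sketch, assuming the token is stored in a Secret named prometheus-bearer-token under the key token in the GitGuardian namespace (both names are assumptions, adjust them to your setup):
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-prom-creds
  namespace: YOUR_GITGUARDIAN_NAMESPACE
spec:
  secretTargetRef:
    # Assumption: Secret "prometheus-bearer-token" with key "token" holds the bearer token
    - parameter: bearerToken
      name: prometheus-bearer-token
      key: token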
A ScaledObject and an HPA will be created in the GitGuardian namespace.
Autoscaling workers
Autoscaling allows for dynamic scaling of worker pods, using the Celery task queue length as an external metric for scaling decisions. This improves efficiency and performance while optimizing resource costs.
To enable autoscaling based on Celery queue lengths, you first need to enable application metrics by following this guide.
If you use KEDA, configuring the Prometheus adapter is not necessary.
Prometheus adapter configuration
Configure Prometheus adapter to expose Celery queue lengths as external metrics. This is done by setting up a custom rule in the Prometheus Adapter configuration.
The following rule should be added to your Prometheus Adapter Helm values to expose Celery queue lengths:
rules:
  external:
    - seriesQuery: '{__name__="gim_celery_queue_length",queue_name!=""}'
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (queue_name)
      resources:
        namespaced: true
        overrides:
          namespace:
            resource: namespace
If you use Machine Learning, you will also need this rule:
rules:
  external:
    - seriesQuery: '{__name__="bentoml_service_request_in_progress",exported_endpoint!=""}'
      resources:
        namespaced: false
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
Autoscaling Behavior
The following behavior will be applied:
- Scaling Up: If the length of a Celery queue exceeds 10 tasks per current worker replica, the number of replicas will be increased, provided the current number of replicas is below the specified maximum limit.
- Scaling Down: If the number of tasks per current worker replica remains below 10 for a continuous period of 5 minutes, the number of replicas will be decreased, provided the current number of replicas is above the specified minimum limit.
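For example, with 3 worker replicas and 45 queued tasks (15 tasks per replica), the 10-task threshold is exceeded and a scale-up is triggered, as long as the maximum number of replicas has not been reached.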
Using KEDA, when the Celery queue is empty, the worker will transition to an idle state, resulting in the number of replicas being scaled down to zero.
Autoscaling APIs
You must first enable ingress routes before proceeding with the following steps.
Autoscaling allows for dynamic scaling of API pods, using API response time as an external metric for scaling decisions, improving efficiency and performance while optimizing resource costs. This can be useful when the GitGuardian API receives a high volume of requests during a non-representative time period, such as historical scans or a large number of pre-commit hooks at the end of the day.
Enabling autoscaling based on response time depends on the Ingress Controller you are using. We currently support: ingress-nginx, Traefik, Istio ingress, OpenShift/HAProxy, Contour, and AWS Application Load Balancer. We will guide you through the configuration for each of these controllers.
If you use KEDA, configuring the Prometheus adapter is not necessary.
ingress-nginx
(HPA only) The following rule should be added to your Prometheus Adapter Helm values to expose response times:
rules:
  external:
    - seriesQuery: '{__name__="nginx_ingress_controller_request_duration_seconds_bucket",exported_service!="",le!=""}'
      resources:
        namespaced: false
      metricsQuery: "histogram_quantile(0.95, sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (exported_service, le))*1000"
traefik
(HPA only) The following rule should be added to your Prometheus Adapter Helm values to expose response times:
rules:
  external:
    - seriesQuery: '{__name__="traefik_service_request_duration_seconds_bucket",service!="",le!=""}'
      resources:
        namespaced: false
      metricsQuery: "histogram_quantile(0.95, sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (service, le))*1000"
(HPA and KEDA) You will also need to add the following scrape configuration to your Prometheus server:
additionalScrapeConfigs:
  - job_name: 'traefik'
    scrape_interval: 30s
    kubernetes_sd_configs:
      - role: service
    relabel_configs:
      # Select the namespace and service name of Traefik (adjust to your installation)
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
        action: keep
        regex: ingress;traefik-metrics
      # Keep only the port named "metrics"
      - source_labels: [__meta_kubernetes_service_port_name]
        action: keep
        regex: metrics
    metric_relabel_configs:
      # Only keep the service name: remove the namespace prefix and the port/@kubernetes suffix
      - source_labels: [service]
        regex: "^[^-]+-(.*)-[0-9]+@kubernetes$"
        target_label: service
        replacement: "$1"
      - source_labels: [instance]
        regex: "^(.*)\\..+$"
        target_label: instance
        replacement: "$1"
Istio
(HPA only) The following rule should be added to your Prometheus Adapter Helm values to expose response times:
rules:
  external:
    - seriesQuery: '{__name__="istio_request_duration_milliseconds_bucket",destination_service_name!="",le!=""}'
      resources:
        namespaced: false
      metricsQuery: "histogram_quantile(0.95, sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (destination_service_name, le))"
Contour
(HPA only) The following rule should be added to your Prometheus Adapter Helm values to expose response times:
rules:
  external:
    - seriesQuery: '{__name__="envoy_cluster_upstream_rq_time_bucket",envoy_cluster_name!="",le!=""}'
      resources:
        namespaced: false
      metricsQuery: "histogram_quantile(0.95, sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (envoy_cluster_name, le))"
OpenShift
This metric has been tested on OKD. We recommend using the Custom Metrics Autoscaler, based on KEDA, which is available in the OperatorHub. You may find help on how to authenticate with the OKD-included monitoring stack by creating the TriggerAuthentication object in this documentation.
Otherwise, if you want to configure the HPA manually using your own Prometheus, the following rule should be added to your Prometheus Adapter Helm values to expose response times:
rules:
  external:
    - seriesQuery: '{__name__="haproxy_server_http_average_response_latency_milliseconds_internal_api",exported_service!=""}'
      resources:
        namespaced: false
      metricsQuery: 'avg by (exported_service) (quantile_over_time(0.95, haproxy_server_http_average_response_latency_milliseconds{exported_service=~"internal-api|public-api|internal-api-long|hook"}[2m]))'
AWS
We only support KEDA for AWS (as k8s-cloudwatch-adapter has been archived).
There are a few additional steps to configure autoscaling on AWS.
As a best practice, we are going to use IRSA (IAM Roles for Service Accounts) to authenticate against the AWS API. This method requires OIDC to be enabled on your EKS cluster.
- Create a new IAM Role in your AWS account:
- Trust relationship (replace ACCOUNT_ID, REGION, and OIDC_ID with your EKS parameters):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/oidc.eks.REGION.amazonaws.com/id/OIDC_ID"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.REGION.amazonaws.com/id/OIDC_ID:sub": "system:serviceaccount:keda:keda-operator",
          "oidc.eks.REGION.amazonaws.com/id/OIDC_ID:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
- Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCloudWatchGetMetricData",
      "Effect": "Allow",
      "Action": "cloudwatch:GetMetricData",
      "Resource": "*"
    }
  ]
}
- Annotate the keda/keda-operator service account (replace ACCOUNT_ID and ROLE_NAME) with
eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/ROLE_NAME
This will allow KEDA to assume the role and therefore fetch the CloudWatch metrics of the AWS ALB.
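For reference, the annotated service account looks like the following sketch (ACCOUNT_ID and ROLE_NAME are placeholders to replace with your own values):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: keda-operator
  namespace: keda
  annotations:
    # Placeholder role ARN; must match the IAM role created above
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/ROLE_NAME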
- Create a KEDA TriggerAuthentication object to tell KEDA we are going to use AWS authentication (replace YOUR_GITGUARDIAN_NAMESPACE)
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: cw
  namespace: YOUR_GITGUARDIAN_NAMESPACE
spec:
  podIdentity:
    provider: aws
    identityOwner: keda
- Configure the KEDA trigger manually (here for internal-api). You need to update dimensionValue and awsRegion:
webapps:
  internal_api:
    autoscaling:
      keda:
        enabled: true
        triggers:
          - type: aws-cloudwatch
            metadata:
              namespace: AWS/ApplicationELB
              dimensionName: LoadBalancer;TargetGroup
              metricEndTimeOffset: "0" # default 0
              metricStatPeriod: "300" # default 300
              metricStat: "Average" # default "Average"
              metricCollectionTime: "300" # default 300
              minMetricValue: "0"
              targetMetricValue: "1"
              metricName: TargetResponseTime
              # Parameters to update accordingly
              # Here you need to indicate the Load Balancer identifier and the target group related to your Load Balancer
              dimensionValue: app/k8s-gimingresses-df6dc6292a/77eb76ffd4a58dd4;targetgroup/k8s-gautierg-internal-3060d734ea/0ddc856755effd7a
              # Here indicate the AWS region you are in
              awsRegion: "eu-west-3"
            authenticationRef:
              name: cw # this is the TriggerAuthentication object we've created earlier
Autoscaling Behavior
The following behavior will be applied:
- Scaling Up: If most requests to a specific API take more than 1000ms to answer, the number of replicas will be increased, provided the current number of replicas is below the specified maximum limit.
- Scaling Down: If most requests to a specific API take less than 1000ms to answer for a continuous period of 5 minutes, the number of replicas will be decreased, provided the current number of replicas is above the specified minimum limit.
KOTS-based installation
Autoscaling APIs is not supported on KOTS-based installations.
Navigate to Config > Scaling in the KOTS Admin Console to access the worker scaling options.
For each worker, you can enable autoscaling by ticking the Enable Horizontal Pod Autoscaling option, then specify the minimum and maximum number of replicas.
Helm-based installation
Customize the Helm application using your local-values.yaml file, submitted with the helm command.
You can enable autoscaling for workers and APIs by setting the following Helm values (here, we enable the HPA for both the "worker" worker and the "public-api" API):
celeryWorkers:
  worker:
    autoscaling:
      hpa:
        enabled: true
      keda:
        enabled: false
      minReplicas: 1
      maxReplicas: 10
webapps:
  public-api:
    autoscaling:
      hpa:
        enabled: true
      keda:
        enabled: false
      minReplicas: 1
      maxReplicas: 10
You can enable Machine Learning Secret Engine autoscaling by setting the following Helm values:
secretEngine:
  autoscaling:
    hpa:
      enabled: true
    keda:
      enabled: false
    minReplicas: 1
    maxReplicas: 2
See the values reference documentation for further details.
The autoscaling.hpa.enabled and autoscaling.keda.enabled Helm parameters are mutually exclusive: you must choose between the HPA (using the Prometheus adapter) and the KEDA controller.