The GitGuardian application consists of several Kubernetes resources.
The Helm-based installation makes it easy to configure all deployments, while also offering enhanced customization options for the following:
- Creation of new classes of workers
- Customization of other resource types such as ephemeral storage, huge pages, etc.
- Provision of nodeSelector, tolerations, and more
| Type | Name | Description |
|------|------|-------------|
| Front | | Dashboard frontend and proxy for the backend |
| Backend | | (optional) OpenMetrics exporter for application metrics |
| Backend | | VCS webhook events receiver |
| Backend | | Backend for the Dashboard (previously gitguardian-app in legacy installations) |
| Backend | | Backend for the Dashboard (no timeout) |
| Backend | | Public API and GGshield scans |
| Scheduler | | Celery Beat task scheduler |
| Worker | | Workers for queues: email, notifier |
| Worker | | Workers for long tasks: check/install health, asynchronous cleanup tasks, ... |
| Worker | | Workers for historical scans |
| Worker | | Workers for queues: celery (default), check_run, realtime, realtime_retry |
| Job | | Pre-deployment job performing database migrations |
| Job | | Post-deployment job performing long data migrations |
Configure scaling settings
Each deployment can be configured using the `replicas` property. For web pods, use the `webapps.[name].replicas` property; for async workers, use the `workers.[name].replicas` property.
You can also configure resource requests and limits, including for the pre-deploy and post-deploy jobs and for the nginx init containers.
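As an illustration only, a values override for replicas and resources could look like the sketch below. The deployment names and resource values are placeholders, and the nested `resources` shape is an assumption following the standard Kubernetes requests/limits structure:

```yaml
webapps:
  example-web:              # placeholder: replace with the actual web deployment name
    replicas: 3
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: "1"
        memory: 2Gi
workers:
  example-worker:           # placeholder: replace with the actual worker deployment name
    replicas: 4
    resources:
      requests:
        cpu: "1"
        memory: 2Gi
      limits:
        memory: 4Gi
```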
Scaling for historical scans of repositories up to 15GB in size
When you add a high number of sources, consider temporarily increasing the number of pods for the duration of the initial historical scan. Afterward, you can scale those pods' replicas and resources back down.
When performing a historical scan, GitGuardian clones the git repository onto the pod's ephemeral storage and traverses all branches and commits in search of potential Policy Breaks. The full scan is done by a single pod and can last from a few seconds to many hours depending on the repository size.
The more pods you add, the more historical scans can run concurrently. When sizing your nodes, keep in mind that each pod must have enough ephemeral storage and memory to run.
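One way to make that requirement explicit is to set ephemeral-storage requests and limits alongside memory on the scanner workers. The sketch below uses a placeholder deployment name and illustrative storage values (only the 16 GB memory figure comes from the sizing table that follows):

```yaml
workers:
  example-scanner:               # placeholder: the deployment running historical scans
    resources:
      requests:
        memory: 16Gi
        ephemeral-storage: 20Gi  # must hold a full clone of the largest repositories
      limits:
        memory: 16Gi
        ephemeral-storage: 40Gi
```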
The following sizing has been tested for 7,000 repositories of up to 15 GB each, with 16 pods:
| Component | Sizing |
|-----------|--------|
| Compute nodes | 8 vCPU, 64 GB RAM, 50 GB ephemeral disk space, 500 GB persistent disk space |
| PostgreSQL Master | 16 vCPU, 64 GB memory, 300 GB disk space |
| PostgreSQL Read Replica | 8 vCPU, 32 GB memory, 300 GB disk space |
| Scanner worker pods | 16 replicas, memory request and limit: 16 GB per pod |
On Helm-based installations, additional configuration of the scanner workers can bring more performance and stability. In the following example, we specify that scanner workers only use "On Demand" VMs with NVMe disks and that the pods' ephemeral storage will use these disks:

```yaml
localStoragePath: /nvme/disk   # Used for pods' ephemeral storage
nodeSelector:                  # Must run on "On Demand" nodes with NVMe disks
  disk: nvme                   # example label; adjust to your cluster's node labels
tolerations:
  - key: worker-highdisk       # tolerate the taint placed on these dedicated nodes
```
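For this to take effect, the target nodes need a matching label and taint. The manifest below is purely a hypothetical illustration of such a node: the node name, label, and taint value are assumptions, not values from the GitGuardian chart:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: ondemand-nvme-node-1   # hypothetical node name
  labels:
    disk: nvme                 # matches the nodeSelector above (assumed label)
spec:
  taints:
    - key: worker-highdisk     # matches the toleration above
      value: "true"
      effect: NoSchedule       # keeps other workloads off these dedicated nodes
```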
Scaling real-time scans
Real-time scans are triggered by push events sent by the VCS to GitGuardian. Each scan usually completes in under a second and should always finish in under 3 seconds. To handle peaks of pushes, however, you may want to increase the number of worker pods that process real-time scans.
We successfully tested peaks of 2,000 pushes per hour with 8 worker pod replicas, without changing the default resource settings.
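As with the other deployments, this comes down to the `replicas` property on the worker that serves the realtime queues. The sketch below uses a placeholder deployment name; the replica count of 8 reflects the test above:

```yaml
workers:
  example-realtime-worker:   # placeholder: the worker serving the realtime / realtime_retry queues
    replicas: 8              # handled peaks of ~2,000 pushes per hour in our test
```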