Machine Learning

This section outlines the machine learning features implemented in GitGuardian, detailing how they work and how to activate them.

System Requirements

The Secret ML Engine requires significant computational resources to operate the machine learning model effectively.

Each replica currently requires at least 3 CPUs and 2.5 GiB of memory. On AWS, Gen 7 EC2 instances perform significantly better than Gen 6 instances.
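For capacity planning, the per-replica minimum above can be expressed as a standard Kubernetes container resources stanza. This is only a sketch: treating the minimum as resource requests is an assumption, and this page does not state where such a stanza belongs in the KOTS or Helm configuration.

```yaml
# Illustrative only: the stated per-replica minimum written as Kubernetes
# container resource requests. Treating it as "requests" is an assumption.
resources:
  requests:
    cpu: "3"          # 3 CPUs per replica
    memory: 2560Mi    # 2.5 GiB = 2.5 x 1024 MiB
```

With these figures, running N replicas needs at least 3 x N CPUs and 2.5 x N GiB of memory available across the cluster.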

KOTS-Based Installation

To enable this feature, set a positive number of replicas for both the ML engine and workers in your configuration:

[Figure: ML Secret Engine in KOTS]

Helm-Based Installation

For Helm-based installations, consult the Helm Values Documentation for detailed configuration options. Ensure you configure a positive number of replicas for the ML engine and workers. Example minimal configuration:

secretEngine:
  replicas: 1

celeryWorkers:
  ml-api-priority:
    replicas: 1

We strongly recommend using Horizontal Pod Autoscaling (HPA) for the ml-api-priority worker, especially when performing a backfill on all your incidents. For more information, refer to our Autoscaling Guide.
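As a rough illustration of what autoscaling this worker involves, here is a generic Kubernetes HorizontalPodAutoscaler manifest using the standard autoscaling/v2 API. It is a sketch, not the chart's own mechanism: the object name, the target Deployment name, the replica bounds, and the CPU threshold are all hypothetical, and the Helm chart may expose its own autoscaling values, so follow the Autoscaling Guide for the supported configuration.

```yaml
# Generic HPA sketch. All names and numbers below are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-api-priority-hpa      # hypothetical object name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-api-priority        # hypothetical; use the actual Deployment name
  minReplicas: 1                 # keep at least one worker so ML stays enabled
  maxReplicas: 4                 # illustrative upper bound for a backfill
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75 # illustrative threshold
```

Any upper bound chosen here should account for the cluster having enough spare CPU and memory to schedule the extra replicas.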

Enrich Past Incidents

Machine learning enhances your ability to analyze past incidents, making remediation more efficient.

When you activate Machine Learning on your GitGuardian instance for the first time, the ML engine automatically analyzes all real-time incidents. Past incidents are not analyzed automatically: to apply ML features to them, you must trigger the process manually.

For more details about the machine learning capabilities, see the Machine Learning Secret Detection documentation.

Here are the steps to enrich past incidents:

  1. Navigate to the Admin area of your GitGuardian instance. You must have admin privileges to proceed.
  2. In the Admin area, navigate to Settings > Machine Learning.
  3. Click on Backfill Incidents to apply the ML model to past incidents.

[Figure: ML backfill past incidents]

Once the backfill completes, a confirmation message appears with a link to your incidents page, filtered by the 'false positives' tag.

[Figure: ML backfill past incidents completed]

Caution
  • The backfill process cannot be canceled once started.
  • The duration of this process depends on the number of incidents being analyzed, ranging from a few minutes to several days.
  • During the backfill process, real-time incident processing is delayed. Schedule the backfill outside business hours to minimize disruption.