Machine Learning
This section outlines the machine learning features implemented in GitGuardian, detailing how they work and how to activate them.
System Requirements
The Secret ML Engine requires significant computational resources to operate the machine learning model effectively.
Currently, it requires a minimum of 3 CPUs and 2.5 GiB of memory per replica. On AWS, Gen 7 EC2 instances perform significantly better than Gen 6 instances.
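As an illustration, the per-replica minimum above can be written as standard Kubernetes resource requests. This is only a sketch of the generic `resources` block: where it belongs in your configuration depends on your installation method, and that placement is not specified here.

```yaml
# Generic Kubernetes container resource requests matching the stated minimum.
# Assumption: the surrounding keys (which component this block is nested under)
# depend on your chart or KOTS configuration and are intentionally omitted.
resources:
  requests:
    cpu: "3"         # 3 CPUs per replica
    memory: 2560Mi   # 2.5 GiB per replica (2.5 x 1024 MiB)
```

When sizing nodes, multiply these values by the number of replicas you plan to run and leave headroom for other workloads on the same node.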
KOTS-Based Installation
To enable this feature, set a positive number of replicas for both the ML engine and workers in your configuration.
Helm-Based Installation
For Helm-based installations, consult the Helm Values Documentation for detailed configuration options. Ensure you configure a positive number of replicas for the ML engine and workers. Example minimal configuration:
secretEngine:
  replicas: 1
celeryWorkers:
  ml-api-priority:
    replicas: 1
We strongly recommend using Horizontal Pod Autoscaling (HPA) for the ml-api-priority worker, especially when performing a backfill on all your incidents. For more information, refer to our Autoscaling Guide.
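To show what autoscaling the worker involves, here is a minimal sketch of a generic Kubernetes HorizontalPodAutoscaler. It is not the chart's own autoscaling configuration, which is described in the Autoscaling Guide. The object name, the target Deployment name, the replica bounds, and the CPU threshold are all hypothetical values chosen for illustration.

```yaml
# Generic Kubernetes HPA (autoscaling/v2), shown for illustration only.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-api-priority-hpa      # hypothetical object name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-api-priority        # assumption: actual Deployment name may differ
  minReplicas: 1                 # keep at least one worker running
  maxReplicas: 4                 # hypothetical ceiling; size to your cluster
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # hypothetical threshold: scale out above 70% average CPU
```

A higher `maxReplicas` shortens a large backfill at the cost of more CPU and memory, so check it against the per-replica requirements before raising it.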
Enrich Past Incidents
Machine learning enhances your ability to analyze past incidents, making remediation more efficient.
When you activate Machine Learning on your GitGuardian instance for the first time, incidents detected in real time from that point on are automatically analyzed by the ML engine. Past incidents are not analyzed automatically: to apply ML features to them, you must trigger the process manually.
For more details about the machine learning capabilities, see the Machine Learning Secret Detection documentation.
Here are the steps to enrich past incidents:
- Navigate to the Admin area of your GitGuardian instance. You must have admin privileges to proceed.
- In the Admin area, navigate to Settings > Machine Learning.
- Click on Backfill Incidents to apply the ML model to past incidents.
Once the backfill completes, a confirmation message appears with a link to your incidents page, filtered by the 'false positives' tag.
- The backfill process cannot be canceled once started.
- The duration of this process depends on the number of incidents being analyzed and ranges from a few minutes to several days.
- During the backfill process, real-time incident processing will be delayed. We therefore recommend scheduling this task outside of business hours to minimize disruption.