Machine learning
Detecting secrets with high-quality results is a challenging and intricate task. To enhance our detection engine, we've implemented several machine learning models that analyze code the way a professional developer would, identify false positives, and enrich generic secrets with contextual information.
False Positive Remover
Only workspaces with a Business plan can access this functionality.
When it comes to avoiding false positives, we've pushed imperative programming and regular expressions to their limits. It is simply not possible to write conditions or regular expression patterns for every potential scenario.
To overcome this limitation, we trained a machine learning model to navigate this complex domain quickly and efficiently and to identify the elements we are looking for.
False Positive Remover is an internally developed and trained model, independent of third-party services, that accurately identifies and tags incidents as 'false positives' through its thorough analysis.
How to use it?
You can improve your workflow by using the Filters > GitGuardian tags > False positive filter, located on the incidents list page.
This filter allows you to easily identify and manage false positive incidents, helping you streamline your incident resolution process.
FAQ
What does this model consider as "False positive"?
Something that cannot be a secret in any context.
In the example below, the line "signup_form_confirm_password": " Confirmar contrasinal" looks like a true positive to a regex, but not to our model, which analyzes the surrounding context (the lines before and after).
{
"signup_form_username": "Identificador",
"signup_form_password": "Contrasinal",
"signup_form_confirm_password": " Confirmar contrasinal", <- a regex may consider this a true positive, not our model.
"signup_form_button_submit": "Crear conta",
}
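To make the contrast concrete, here is a minimal sketch of how a keyword-based regular expression reacts to the JSON example above. The pattern `NAIVE_PASSWORD_PATTERN` and the helper `naive_candidates` are hypothetical and written only for illustration; they are not GitGuardian detectors. The pattern looks at one key/value pair at a time, so it cannot tell that the neighbouring keys are form labels in a translation file.

```python
import re

# Hypothetical keyword-based pattern: flags any quoted key containing
# "password" that is assigned a quoted string value. This is an illustrative
# stand-in, not one of GitGuardian's actual detectors.
NAIVE_PASSWORD_PATTERN = re.compile(
    r'"[^"]*password[^"]*"\s*:\s*"(?P<value>[^"]+)"',
    re.IGNORECASE,
)

# The translation snippet from the example above.
SNIPPET = '''{
"signup_form_username": "Identificador",
"signup_form_password": "Contrasinal",
"signup_form_confirm_password": " Confirmar contrasinal",
"signup_form_button_submit": "Crear conta",
}'''


def naive_candidates(text: str) -> list[str]:
    """Return every value the naive pattern would report as a secret."""
    return [m.group("value") for m in NAIVE_PASSWORD_PATTERN.finditer(text)]


candidates = naive_candidates(SNIPPET)
print(candidates)
```

Under these assumptions, the pattern reports both password-related labels as candidates, even though the surrounding keys (username, submit button) show that the values are UI translations rather than credentials.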
If these are false positives, why don't you just remove them?
During beta, we will safely evaluate the accuracy of the model before potentially using it to remove all false positives upfront.
Are you catching all the false positives I have?
We estimate that in v1 the model can detect 50% of your false positives, on average. We focus on being as accurate as possible and will try to improve our recall over time.
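The difference between being accurate and having high recall can be illustrated with a small calculation. The sketch below uses hypothetical counts chosen only so that recall comes out at the 50% figure quoted above; the precision value it produces is illustrative and is not a published GitGuardian metric. Reading "accurate" as precision is an interpretation.

```python
def recall(correctly_tagged: int, missed: int) -> float:
    """Share of all real false positives that the model managed to tag."""
    return correctly_tagged / (correctly_tagged + missed)


def precision(correctly_tagged: int, wrongly_tagged: int) -> float:
    """Share of the model's 'false positive' tags that were correct."""
    return correctly_tagged / (correctly_tagged + wrongly_tagged)


# Hypothetical workspace: 200 incidents are really false positives.
# The model tags 100 of them, misses the other 100, and also tags
# 2 incidents that were in fact valid secrets.
model_recall = recall(correctly_tagged=100, missed=100)
model_precision = precision(correctly_tagged=100, wrongly_tagged=2)

print(f"recall={model_recall:.2f} precision={model_precision:.2f}")
```

A model tuned this way rarely mislabels a real secret, at the cost of leaving some false positives untagged.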
Generic Secret Enricher
The Generic Secret Enricher (GSE) is a specialized machine learning model designed to analyze the context around generic secrets and automatically classify them into categories and providers. This enhanced classification helps you prioritize remediation efforts by understanding the potential impact and criticality of each incident.
This feature is specifically designed for generic incidents that couldn't be matched to a specific detector. The GSE analyzes the surrounding context to provide category and provider insights that help with prioritization and remediation.
How GSE categories help with remediation
Understanding GSE categories helps you:
- Prioritize critical infrastructure secrets (Cloud providers, Databases, etc.)
- Focus on high-impact services (Payment systems, Identity providers, etc.)
- Identify secrets that could affect business operations (Messaging systems, E-commerce platforms, etc.)
- Streamline remediation workflows by grouping similar types of secrets
- Apply appropriate security policies based on the service type
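As a rough illustration of category-based prioritization, the sketch below sorts a handful of incidents by a priority rank per category. The category names come from the GSE category list, but the `Incident` record, the rank values, and the `triage_order` helper are assumptions made for this example; they are not part of any GitGuardian API and should be adapted to your own risk model.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical priority ranks (lower = handle first). The ranking itself
# is an assumption, not a GitGuardian recommendation.
CATEGORY_PRIORITY = {
    "Cloud provider": 0,
    "Data storage": 0,
    "Payment system": 1,
    "Identity provider": 1,
    "Messaging system": 2,
    "E-commerce": 2,
}
DEFAULT_PRIORITY = 3  # any other category, or no category inferred


@dataclass
class Incident:
    """Hypothetical in-memory record, not a GitGuardian API object."""
    incident_id: int
    category: Optional[str]


def triage_order(incidents: list[Incident]) -> list[Incident]:
    """Sort incidents by category priority; ties keep their input order."""
    return sorted(
        incidents,
        key=lambda inc: CATEGORY_PRIORITY.get(inc.category, DEFAULT_PRIORITY),
    )


incidents = [
    Incident(1, "Social network"),
    Incident(2, "Payment system"),
    Incident(3, None),  # the model could not infer a category
    Incident(4, "Cloud provider"),
]
ordered_ids = [inc.incident_id for inc in triage_order(incidents)]
print(ordered_ids)
```

Because the sort is stable, incidents with the same rank stay in their original order, which keeps the result predictable when many incidents share a category.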
How to use it?
Customize your views
From the incidents list, you can customize how your incidents are displayed by clicking on the "Columns" button in the top-right corner of the table.
This allows you to add the "Secret category" and "Secret provider" columns, which display the main properties inferred by the model.
With this customization, you can quickly spot important categories (such as "Data Storage") or specific providers that might require immediate attention among your generic incidents.
Filter your data
Three new filters (Provider, Category, Family) help you identify the most significant or critical generic incidents, such as those classified under "Data Storage" or linked to the provider "Postgresql."
To use these filters, first, filter your incidents by the "Generic" type. Then, apply a combination of the three aforementioned filters.
With these new filters, you can explore your generic incidents and uncover the ones that matter for your operations.
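The same two-step logic (restrict to generic incidents, then narrow by the inferred properties) can be sketched on a local list of records. Everything in this example is hypothetical: the record fields, the "Specific" type label, and the `filter_generic` helper are illustrative and do not describe a GitGuardian export format or API. The Family filter is left out because its values are not described here.

```python
from typing import Optional

# Hypothetical incident records; field names and values are assumptions.
INCIDENTS = [
    {"id": 101, "type": "Generic", "category": "Data storage", "provider": "Postgresql"},
    {"id": 102, "type": "Specific", "category": None, "provider": None},
    {"id": 103, "type": "Generic", "category": "Data storage", "provider": None},
    {"id": 104, "type": "Generic", "category": "Messaging system", "provider": "Slack"},
]


def filter_generic(
    incidents: list[dict],
    category: Optional[str] = None,
    provider: Optional[str] = None,
) -> list[dict]:
    """Keep generic incidents, then narrow by category and/or provider."""
    selected = [inc for inc in incidents if inc["type"] == "Generic"]
    if category is not None:
        selected = [inc for inc in selected if inc["category"] == category]
    if provider is not None:
        selected = [inc for inc in selected if inc["provider"] == provider]
    return selected


data_storage_ids = [inc["id"] for inc in filter_generic(INCIDENTS, category="Data storage")]
postgres_ids = [
    inc["id"]
    for inc in filter_generic(INCIDENTS, category="Data storage", provider="Postgresql")
]
print(data_storage_ids, postgres_ids)
```

Note that incident 103 has a category but no provider, which mirrors the case where the model can infer only part of the classification.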
Categories and providers reference
For detailed definitions of all GSE categories and providers, including what they mean and how to prioritize them, see our comprehensive GSE Categories and Providers Reference.
FAQ
Why aren't these transformed into specific incidents?
The analysis might be incomplete. We may only be able to identify the Provider or the Category (or potentially neither). As we continue to refine this feature, our definitions will become more precise.
What is the model trained to discover?
The model is trained to identify a comprehensive set of categories and providers.
The model can identify the following categories:
- AI
- CDN
- CI/CD
- Cloud provider
- Code analysis
- Collaboration tool
- CRM
- Cryptos
- Data storage
- E-commerce
- Identity provider
- Internal
- Messaging system
- Monitoring
- Other
- Package registry
- Payment system
- Private key
- Remote access
- Secret management
- Social network
- Version control platform
The model can identify hundreds of providers including:
- Amazon AWS and related services
- Microsoft Azure and related services
- Google Cloud Platform
- Popular databases (PostgreSQL, MySQL, MongoDB, Redis)
- CI/CD platforms (GitHub, GitLab, Jenkins, CircleCI)
- Payment systems (Stripe, PayPal, Square)
- AI services (OpenAI, Anthropic, Hugging Face)
- Messaging platforms (Slack, Discord, Twilio)
- And many more...
For the complete list of supported categories and providers, see the GSE Categories and Providers Reference.