Machine learning

Detecting secrets with high-quality results is a challenging and intricate task. To enhance our detection engine, we've implemented several machine learning models that analyze code the way a professional developer would, identify false positives, and enrich generic secrets with contextual information.

False Positive Remover

Business Feature

Only workspaces with a Business plan can access this functionality.

When it comes to avoiding false positives, we've pushed imperative programming and regular expressions to their limits. It is simply not possible to write conditions or regular expression patterns for every potential scenario.

To overcome this constraint, we trained machine learning models to navigate this complex domain quickly and efficiently and to identify the elements we are looking for.

False Positive Remover is a model developed and trained in-house, independent of third-party services. It analyzes each incident and accurately tags the ones it identifies as 'false positives'.

How to use it?


You can improve your workflow by using the Filters > GitGuardian tags > False positive filter on the incidents list page.

This filter allows you to easily identify and manage false positive incidents, helping you streamline your incident resolution process.

FAQ

What does this model consider as "False positive"?

Something that cannot be a secret in any context.

In the example below, the line "signup_form_confirm_password": " Confirmar contrasinal" looks like a true positive to a regex, but not to our model, which also analyzes the context (the lines before and after).

{ 
	"signup_form_username": "Identificador",
	"signup_form_password": "Contrasinal",
	"signup_form_confirm_password": " Confirmar contrasinal", <- a regex may consider this a true positive, not our model.
	"signup_form_button_submit": "Crear conta",
}
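To see why a pattern-only approach misfires on this file, the sketch below runs a deliberately naive regular expression over the same lines. The pattern and the helper name `naive_password_hits` are assumptions made for illustration; they are not GitGuardian detectors, and the model itself is not reproduced here.

```python
import re

# Hypothetical, deliberately naive pattern: any key containing "password"
# followed by a quoted, non-empty value. Not an actual GitGuardian detector.
NAIVE_PASSWORD_PATTERN = re.compile(
    r'"(?P<key>[^"]*password[^"]*)"\s*:\s*"(?P<value>[^"]+)"',
    re.IGNORECASE,
)

# The translation file from the example above (UI labels, not credentials).
SNIPPET = '''{
    "signup_form_username": "Identificador",
    "signup_form_password": "Contrasinal",
    "signup_form_confirm_password": " Confirmar contrasinal",
    "signup_form_button_submit": "Crear conta",
}'''


def naive_password_hits(text):
    """Return (key, value) pairs that the naive pattern flags as secrets."""
    return [
        (match.group("key"), match.group("value"))
        for match in NAIVE_PASSWORD_PATTERN.finditer(text)
    ]


hits = naive_password_hits(SNIPPET)
for key, value in hits:
    print(f"flagged: {key} = {value!r}")
```

The pattern flags both "password" keys even though their values are form labels. Telling them apart requires looking at the neighbouring lines, which all hold translated UI strings, and that is the kind of context a regex cannot express.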

If these are false positives, why don't you just remove them?

During beta, we will safely evaluate the accuracy of the model before potentially using it to remove all false positives upfront.

Are you catching all the false positives I have?

We estimate that v1 of the model detects about 50% of your false positives on average. We focus on being as accurate as possible in what we tag, and we will try to improve our recall over time.
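As a rough illustration of the trade-off in this answer, the sketch below computes precision (how many tagged incidents really are false positives) and recall (how many of the real false positives got tagged). All counts are hypothetical, chosen so that recall comes out at 50%; they are not measurements of the model.

```python
def precision_recall(tagged_correctly, tagged_wrongly, missed):
    """Precision and recall for a 'false positive' tag.

    tagged_correctly: tagged incidents that really are false positives
    tagged_wrongly:   tagged incidents that are in fact real secrets
    missed:           real false positives the model left untagged
    """
    precision = tagged_correctly / (tagged_correctly + tagged_wrongly)
    recall = tagged_correctly / (tagged_correctly + missed)
    return precision, recall


# Hypothetical workspace: 200 actual false positives, 100 of them tagged,
# plus 1 real secret tagged by mistake.
precision, recall = precision_recall(tagged_correctly=100, tagged_wrongly=1, missed=100)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

With these made-up counts, half of the false positives are caught while almost every tag is correct. A model tuned this way leaves some false positives untagged rather than risk hiding a real secret.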

Generic Secret Enricher

The Generic Secret Enricher (GSE) is a specialized machine learning model designed to analyze the context around generic secrets and automatically classify them into categories and providers. This enhanced classification helps you prioritize remediation efforts by understanding the potential impact and criticality of each incident.

info

This feature is specifically designed for generic incidents that couldn't be matched to a specific detector. The GSE analyzes the surrounding context to provide category and provider insights that help with prioritization and remediation.

How GSE categories help with remediation

Understanding GSE categories helps you:

Prioritize critical infrastructure secrets (Cloud providers, Databases, etc.)
Focus on high-impact services (Payment systems, Identity providers, etc.)
Identify secrets that could affect business operations (Messaging systems, E-commerce platforms, etc.)
Streamline remediation workflows by grouping similar types of secrets
Apply appropriate security policies based on the service type
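One way to picture this prioritization is a small triage script that orders incidents by category. The sketch below is only an illustration: the tier assignments, the `sort_by_category_priority` helper, and the incident records are all assumptions, and only the category names come from the GSE category list.

```python
# Hypothetical priority tiers (lower number = handle first). The category
# names come from the GSE category list; the tiers are illustrative only.
CATEGORY_PRIORITY = {
    "Cloud provider": 0,
    "Data storage": 0,
    "Payment system": 1,
    "Identity provider": 1,
    "Messaging system": 2,
    "E-commerce": 2,
}
DEFAULT_PRIORITY = 3  # any category not listed above, or no category at all


def sort_by_category_priority(incidents):
    """Return incidents ordered by tier; sorted() is stable, so ties keep input order."""
    return sorted(
        incidents,
        key=lambda incident: CATEGORY_PRIORITY.get(
            incident.get("category"), DEFAULT_PRIORITY
        ),
    )


# Hypothetical incident records, for example copied by hand from the incidents list.
incidents = [
    {"id": 1, "category": "Social network"},
    {"id": 2, "category": "Payment system"},
    {"id": 3, "category": "Cloud provider"},
    {"id": 4, "category": None},
]
ordered = sort_by_category_priority(incidents)
print([incident["id"] for incident in ordered])
```

Incidents without an inferred category fall into the last tier instead of being dropped, so they still show up for review.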

How to use it?

Customize your views

From the incidents list, you can customize how your incidents are displayed by clicking on the "Columns" button in the top-right corner of the table.

This allows you to add the "Secret category" and "Secret provider" columns, which display the main properties inferred by the model.

With this customization, you can quickly spot important categories (such as "Data Storage") or specific providers that might require immediate attention among your generic incidents.

Filter your data

Three new filters (Provider, Category, Family) help you identify the most significant or critical generic incidents, such as those classified under "Data Storage" or linked to the provider "PostgreSQL".

To use these filters, first filter your incidents by the "Generic" type, then apply any combination of the three filters.


With these new filters, you can explore your generic incidents and surface the ones that matter for your operations.
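The filter logic described in this section (restrict to generic incidents, then combine criteria) can be sketched in a few lines. This is a minimal sketch over hand-written records, not a call to any GitGuardian API: the field names `type`, `category`, and `provider`, the "Specific" type label, and the `filter_generic_incidents` helper are all assumptions made for illustration.

```python
def filter_generic_incidents(incidents, **criteria):
    """Keep generic incidents whose fields equal every supplied criterion."""
    return [
        incident
        for incident in incidents
        if incident.get("type") == "Generic"
        and all(incident.get(field) == wanted for field, wanted in criteria.items())
    ]


# Hypothetical records; field names and values are illustrative only.
incidents = [
    {"id": 101, "type": "Generic", "category": "Data storage", "provider": "PostgreSQL"},
    {"id": 102, "type": "Generic", "category": "Data storage", "provider": "MongoDB"},
    {"id": 103, "type": "Generic", "category": "Monitoring", "provider": None},
    {"id": 104, "type": "Specific", "category": "Data storage", "provider": "PostgreSQL"},
]

data_storage = filter_generic_incidents(incidents, category="Data storage")
postgres_only = filter_generic_incidents(
    incidents, category="Data storage", provider="PostgreSQL"
)
print([incident["id"] for incident in data_storage])
print([incident["id"] for incident in postgres_only])
```

Each extra criterion narrows the result, which mirrors how stacking filters in the incidents list narrows the view. Incident 104 never appears because it is not of the generic type.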

Categories and providers reference

For detailed definitions of all GSE categories and providers, including what they mean and how to prioritize them, see our comprehensive GSE Categories and Providers Reference.

FAQ

Why aren't these transformed into specific incidents?

The analysis might be incomplete. We may only be able to identify the Provider or the Category (or potentially neither). As we continue to refine this feature, our definitions will become more precise.

What is the model trained to discover?

The model is trained to identify both categories and providers.

The model can identify the following categories:

AI
CDN
CI/CD
Cloud provider
Code analysis
Collaboration tool
CRM
Cryptos
Data storage
E-commerce
Identity provider
Internal
Messaging system
Monitoring
Other
Package registry
Payment system
Private key
Remote access
Secret management
Social network
Version control platform

The model can identify hundreds of providers including:

Amazon AWS and related services
Microsoft Azure and related services
Google Cloud Platform
Popular databases (PostgreSQL, MySQL, MongoDB, Redis)
CI/CD platforms (GitHub, GitLab, Jenkins, CircleCI)
Payment systems (Stripe, PayPal, Square)
AI services (OpenAI, Anthropic, Hugging Face)
Messaging platforms (Slack, Discord, Twilio)
And many more...

For the complete list of supported categories and providers, see the GSE Categories and Providers Reference.
