
Introduction

Specific vs generic detectors

GitGuardian makes the distinction between two main categories of detectors:

  • Specific detectors are designed to detect one specific type of secret: for example, an AWS detector only detects AWS secrets, and a MongoDB detector only detects MongoDB database credentials. The main advantage of such detectors is that they offer both high recall and high precision: they reliably catch the secrets they target while raising few false alerts.
  • Generic detectors are designed to detect a broad variety of secrets without identifying exactly which kind of secret has been caught. As a result, they often catch more secrets, but they can also raise more false positives than specific detectors if special care is not taken. That's why the team developing GitGuardian's secrets detection engine constantly fine-tunes these generic detectors (for example, by adding post-validators) to keep the false positive rate acceptable (around 20%) while maintaining high recall.
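To make the distinction concrete, here is a minimal Python sketch of both kinds of detector. It is not GitGuardian's implementation. Only the "SG." prefix comes from this page; the rest of the SendGrid pattern, the assignment regex, the minimum length of 16 and the entropy threshold of 3.5 bits per character are illustrative assumptions. The specific detector recognizes one known format, while the generic detector flags any assigned value that simply looks random enough.

```python
import math
import re
from collections import Counter

# Hypothetical, simplified specific detector: only the "SG." prefix is taken
# from the documentation; the character class and minimum length are assumed.
SENDGRID_PATTERN = re.compile(r"SG\.[A-Za-z0-9_\-.]{10,}")

# Hypothetical generic detector settings (assumed values, for illustration).
ASSIGNMENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\s*=\s*(\S+)")
MIN_LENGTH = 16
ENTROPY_THRESHOLD = 3.5  # bits per character


def shannon_entropy(value: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    if not value:
        return 0.0
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def specific_detect(text: str) -> list[str]:
    """Specific detector: matches a single, well-identified secret format."""
    return SENDGRID_PATTERN.findall(text)


def generic_detect(text: str) -> list[str]:
    """Generic detector: flags any long, high-entropy assigned value,
    without knowing which provider issued it."""
    return [
        value
        for value in ASSIGNMENT.findall(text)
        if len(value) >= MIN_LENGTH and shannon_entropy(value) >= ENTROPY_THRESHOLD
    ]


sample = "TOKEN=SG.aB3dE5gH7jK9mN1pQ3sT5vX7z\nDEBUG=true\n"
print(specific_detect(sample))
print(generic_detect(sample))
```

On this sample both detectors flag the same token, while the low-entropy `DEBUG=true` value is ignored by the generic detector. The generic detector would also flag random-looking values that are not secrets at all, which is why post-validation matters for it.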

Priority between generic and specific detectors

When a secret is matched by both a generic and a specific detector, we keep the match from the specific detector, since it provides more information for remediation.

Take, for example, a SendGrid token in the following context:

# all SendGrid secrets start with SG.
TOKEN=SG.gxxxxxxxxx

The secret is captured by both our SendGrid detector and our generic high entropy secret detector, but we keep the SendGrid match since it provides more information for remediation.

Specific detectors always have priority over generic detectors
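The priority rule can be sketched as an overlap check between matches. This is a minimal illustration under stated assumptions, not the engine's actual logic: the `Match` record, its character offsets and the detector labels `sendgrid` and `generic_high_entropy` are all hypothetical. Every specific match is kept, and a generic match is dropped whenever it overlaps a specific one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    detector: str      # hypothetical detector label
    is_specific: bool  # True for specific detectors, False for generic ones
    start: int         # start offset of the matched secret in the scanned text
    end: int           # end offset (exclusive)


def overlaps(a: Match, b: Match) -> bool:
    """True if the two matches share at least one character."""
    return a.start < b.end and b.start < a.end


def resolve(matches: list[Match]) -> list[Match]:
    """Keep every specific match; drop generic matches overlapping any of them."""
    specific = [m for m in matches if m.is_specific]
    generic = [
        m
        for m in matches
        if not m.is_specific and not any(overlaps(m, s) for s in specific)
    ]
    return specific + generic


matches = [
    Match("generic_high_entropy", False, 6, 34),  # same span as the SendGrid token
    Match("sendgrid", True, 6, 34),
    Match("generic_high_entropy", False, 50, 80),  # unrelated secret, kept
]
for kept in resolve(matches):
    print(kept.detector, kept.start, kept.end)
```

Only the generic match that covers the SendGrid token is discarded; a generic match elsewhere in the file is still reported.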

How to read a detector's file

General information

IPs allowlist: Indicates whether the provider offers a way to restrict credential usage based on the incoming IP address.

Scopes: Indicates whether the provider supports different ranges of permissions for credentials.

Revoke the secret: Steps to follow in order to revoke a key.

Check for suspicious activity: Procedure to inspect credential usage and detect suspicious activity.

Details section glossary

Family: Detectors are classified into families. We currently have the following ones:

  • api
  • database
  • private_key
  • other

Company: The company that issues the credentials. We tend to use the name of the parent (holding) company rather than that of a subsidiary.

High_recall: This flag indicates that the detector is not expected to miss credentials. This is usually made possible by very well-identified, unique patterns in the credentials.

Prefixed: This flag indicates whether the credentials caught by the detector are prefixed. When True, it tends to ensure very high precision and very high recall for the detector.

Can be checked: This flag indicates whether the key can be checked for validity using a non-intrusive API call.

Minimum number of matches: The number of matches that together make up the actual secret. For example, this would be 2 for a secret composed of a client_id and a client_secret.
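As a rough illustration, a detector whose secret is a client_id/client_secret pair only reports something when both parts are found. The sketch below is hypothetical: the variable names it searches for and their format are assumptions, and real detectors may pair matches differently.

```python
import re
from typing import Optional

# Hypothetical patterns: the key names and value format are assumptions.
CLIENT_ID = re.compile(r"client_id\s*=\s*(\S+)")
CLIENT_SECRET = re.compile(r"client_secret\s*=\s*(\S+)")
MINIMUM_MATCHES = 2  # client_id + client_secret


def detect_client_credentials(text: str) -> Optional[dict[str, str]]:
    """Return both parts of the secret, or None if fewer than
    MINIMUM_MATCHES parts were found."""
    found: dict[str, str] = {}
    if (m := CLIENT_ID.search(text)):
        found["client_id"] = m.group(1)
    if (m := CLIENT_SECRET.search(text)):
        found["client_secret"] = m.group(1)
    return found if len(found) >= MINIMUM_MATCHES else None


print(detect_client_credentials("client_id=abc123\nclient_secret=s3cr3tValue"))
print(detect_client_credentials("client_id=abc123"))  # only one match: None
```

A lone client_id is not reported, because on its own it does not constitute the secret.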

Frequency_estimate: This figure gives the average number of credentials that the detector finds per million commits.
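The fields of the Details section can be pictured as one record per detector. The sketch below is a hypothetical data model, not GitGuardian's internal schema: the class and field names mirror the glossary, and the company name and figures in the example are invented. It also shows how a per-million-commits frequency could be computed from raw counts.

```python
from dataclasses import dataclass
from enum import Enum


class Family(Enum):
    API = "api"
    DATABASE = "database"
    PRIVATE_KEY = "private_key"
    OTHER = "other"


@dataclass(frozen=True)
class DetectorDetails:
    # Hypothetical record mirroring the glossary fields above.
    family: Family
    company: str                # parent (holding) company issuing the credentials
    high_recall: bool
    prefixed: bool
    can_be_checked: bool        # validity checkable with a non-intrusive API call
    minimum_matches: int        # matches that together make up one secret
    frequency_estimate: float   # average credentials found per million commits


def frequency_per_million(credentials_found: int, commits_scanned: int) -> float:
    """Average number of credentials found per million commits."""
    # Multiply before dividing so integer inputs stay exact as long as possible.
    return credentials_found * 1_000_000 / commits_scanned


# Invented example: 42 credentials found while scanning 3 million commits.
details = DetectorDetails(
    family=Family.API,
    company="ExampleCorp",  # hypothetical company
    high_recall=True,
    prefixed=True,
    can_be_checked=True,
    minimum_matches=1,
    frequency_estimate=frequency_per_million(42, 3_000_000),
)
print(details.frequency_estimate)
```

In this invented example, 42 credentials over 3 million commits gives an estimate of 14 credentials per million commits.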