GitGuardian makes the distinction between two main categories of detectors:
- Specific detectors are designed to detect one specific type of secret, such as an AWS detector that only detects AWS secrets, or a MongoDB detector that only detects MongoDB database credentials. Their main advantage is that they offer both high recall and high precision: they reliably catch the secrets of their type while raising few false alerts.
- Generic detectors are designed to detect a broad variety of secrets without identifying exactly which kind of secret was caught. They therefore often catch more secrets, but can also raise more false positives than specific detectors if special care is not taken. That's why the team developing GitGuardian's secrets detection engine constantly fine-tunes these generic detectors (by adding post-validators) to keep the false positive rate acceptable (around 20%) while maintaining high recall.
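The difference between the two categories can be illustrated with a toy sketch. This is not GitGuardian's implementation: the SendGrid-style shape (`SG.` + 22 characters + `.` + 43 characters), the assignment regex and the 4.0 bits-per-character entropy threshold are all assumptions chosen for illustration. The specific detector only fires on one well-defined format, so it knows what it found; the generic detector fires on any sufficiently random value, whatever its provider.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical specific detector: the key shape below is an assumption.
SENDGRID_PATTERN = re.compile(r"SG\.[A-Za-z0-9_-]{22}\.[A-Za-z0-9_-]{43}")

# Hypothetical generic detector: NAME=value where the value is long and random.
ASSIGNMENT_PATTERN = re.compile(
    r"[A-Za-z_][A-Za-z0-9_]*\s*=\s*['\"]?([A-Za-z0-9_.+/=-]{20,})"
)
ENTROPY_THRESHOLD = 4.0  # bits per character (assumed threshold)


def shannon_entropy(s: str) -> float:
    """Average information per character of `s`, in bits."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def specific_detector(text: str) -> list[str]:
    # Only matches one secret type, so every hit is known to be a SendGrid-style key.
    return [m.group(0) for m in SENDGRID_PATTERN.finditer(text)]


def generic_detector(text: str) -> list[str]:
    # Matches any assigned value that looks random enough, regardless of provider.
    return [
        m.group(1)
        for m in ASSIGNMENT_PATTERN.finditer(text)
        if shannon_entropy(m.group(1)) >= ENTROPY_THRESHOLD
    ]
```

With these assumptions, a SendGrid-shaped value assigned to `TOKEN` is reported by both functions, a random value without a prefix is reported only by `generic_detector`, and a low-entropy value such as thirty repeated `a` characters is reported by neither.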
When a secret is matched by both a generic and a specific detector, we keep the match from the specific detector, since it carries more information for remediation.
Let's take a specific example with a SendGrid token in the following context:
# all SendGrid secrets start with SG.
TOKEN=SG.gxxxxxxxxx
The secret will be captured both by our SendGrid detector and by our generic high entropy secret detector, yet we keep the SendGrid secret since it provides more information for remediation.
Specific detectors always have priority over generic detectors
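The priority rule can be sketched as a post-processing step over detector hits. This is a minimal illustration, not GitGuardian's code: the `Match` record, the `"specific"`/`"generic"` labels and the rule that two hits refer to the same secret when their character ranges overlap are all assumptions of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One hypothetical detector hit: which detector fired and where."""
    detector: str
    kind: str   # "specific" or "generic" (illustrative labels)
    start: int  # start offset of the matched secret in the scanned text
    end: int    # end offset (exclusive)


def overlaps(a: Match, b: Match) -> bool:
    # Half-open ranges [start, end) overlap if each starts before the other ends.
    return a.start < b.end and b.start < a.end


def resolve(matches: list[Match]) -> list[Match]:
    """Keep every specific match; drop generic matches that overlap one."""
    specific = [m for m in matches if m.kind == "specific"]
    kept = list(specific)
    for m in matches:
        if m.kind == "generic" and not any(overlaps(m, s) for s in specific):
            kept.append(m)
    # Report results in document order.
    return sorted(kept, key=lambda m: m.start)
```

Applied to the `TOKEN=SG.gxxxxxxxxx` example, a generic high-entropy hit and a SendGrid hit on the same range collapse to the SendGrid hit, while a generic hit elsewhere in the file is kept.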
IPs allowlist: Indicates whether the provider offers a way to restrict credential usage based on the incoming IP address.
Scopes: Indicates whether the provider supports different ranges of permissions for credentials.
Revoke the secret: Steps to follow in order to revoke a key.
Check for suspicious activity: Procedure to inspect credential usage and detect suspicious activity.
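To make the IPs allowlist and Scopes properties concrete, here is a sketch of how a provider that supports both might decide whether to honor a request made with a leaked key. Everything here is hypothetical: the `Credential` record, the `"read"`/`"write"` scope names and the convention that an empty allowlist means "no IP restriction" are assumptions, not any provider's actual API. Only Python's standard `ipaddress` module is real.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    """Hypothetical provider-side view of an API key."""
    scopes: frozenset  # permissions granted to this key, e.g. {"read"}
    # Tuple of ipaddress networks; empty means no IP restriction (assumption).
    allowed_networks: tuple = ()


def authorize(cred: Credential, client_ip: str, required_scope: str) -> bool:
    """Allow a request only from an allowlisted IP and with a granted scope."""
    if cred.allowed_networks:
        addr = ipaddress.ip_address(client_ip)
        # `addr in net` is False when the IP versions differ (IPv4 vs IPv6).
        if not any(addr in net for net in cred.allowed_networks):
            return False
    return required_scope in cred.scopes
```

Under this model, a leaked key restricted to `203.0.113.0/24` with only the `read` scope cannot be used from another network, nor to perform `write` operations, which is why these two properties limit the impact of a leak.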
Family: Detectors are grouped into families. We currently have the following ones:
Company: The company that issues the credentials. We tend to use the name of the holding company rather than that of a subsidiary.
High_recall: This flag indicates that the detector is not expected to miss credentials. This is usually due to very well-identified, unique patterns in the credentials.
Prefixed: This flag indicates whether the credentials caught by the detector have a prefix. When True, the detector tends to have very high precision and very high recall.
Can be checked: This flag indicates whether the key can be checked as valid by using a non-intrusive API call.
Minimum number of matches: The number of distinct matches that together make up the secret. This would be 2 for a secret composed of a client_id and a
Frequency_estimate: This figure gives the average number of credentials that the detector finds per million commits.
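The metadata fields above can be pictured as one record per detector. The sketch below is a hypothetical container, not GitGuardian's schema; field names are derived from the labels above, and the example values are invented for illustration. It also shows the arithmetic behind Frequency_estimate: findings scaled to one million commits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectorMetadata:
    """Hypothetical record mirroring the detector metadata fields."""
    name: str
    family: str
    company: str               # holding company rather than subsidiary
    high_recall: bool
    prefixed: bool
    can_be_checked: bool       # validity checkable via a non-intrusive API call
    minimum_matches: int       # number of matches that make up one secret
    frequency_estimate: float  # average findings per million commits


def frequency_per_million_commits(findings: int, commits: int) -> float:
    """Average number of credentials found per million scanned commits."""
    if commits <= 0:
        raise ValueError("commits must be positive")
    # Multiply before dividing to keep integer inputs exact where possible.
    return findings * 1_000_000 / commits


# Hypothetical example: 42 findings over 3,000,000 scanned commits.
example = DetectorMetadata(
    name="example_detector",
    family="example_family",
    company="ExampleCorp",
    high_recall=True,
    prefixed=True,
    can_be_checked=True,
    minimum_matches=1,
    frequency_estimate=frequency_per_million_commits(42, 3_000_000),
)
```

In this made-up example, 42 findings over 3,000,000 commits give a Frequency_estimate of 14.0 credentials per million commits.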