At any point when reading this FAQ section you can use our glossary to find some useful definitions.
- How does GitGuardian's detection engine work, roughly speaking?
- What is the difference between generic and specific detectors?
- Does GitGuardian check the validity of credentials?
- What do you call a false positive exactly in the context of secrets detection?
- Why didn't GitGuardian detect my secret?
- How to properly test GitGuardian detection capabilities?
- We’ve seen real credentials in .md files in the past already, why do some of your detectors drop .md files?
- Are cryptographic keys sensitive objects?
For a more detailed answer, you can consult this page. GitGuardian's detection engine revolves around the concept of detectors. A detector is a set of instructions that our detection engine executes on a given input document to catch the secrets it contains. The flow of instructions is always the same:
- Pre-Validation: discard, as early as possible, documents that are of no interest for secrets detection.
- Matching: look for a given pattern.
- Post-Validation: apply validation steps to keep only relevant secrets.
All these steps are performed using a combination of regular expressions and heuristics based on contextual information.
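The three steps above can be sketched in a few lines of Python. This is only an illustration of the flow, not GitGuardian's implementation: the `Detector` class, the `exk_` key format, and the helper functions are all hypothetical names made up for the example.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Pattern


@dataclass
class Detector:
    """Hypothetical detector: the three steps always run in the same order."""
    name: str
    pre_validators: List[Callable[[str, str], bool]]  # (filename, content) -> keep document?
    pattern: Pattern[str]                              # matching step
    post_validators: List[Callable[[str], bool]]       # (candidate) -> keep candidate?

    def scan(self, filename: str, content: str) -> List[str]:
        # 1. Pre-validation: discard uninteresting documents as early as possible.
        if not all(check(filename, content) for check in self.pre_validators):
            return []
        # 2. Matching: look for the pattern.
        candidates = [match.group(1) for match in self.pattern.finditer(content)]
        # 3. Post-validation: keep only the relevant candidates.
        return [c for c in candidates if all(check(c) for check in self.post_validators)]


def not_markdown(filename: str, content: str) -> bool:
    # Example pre-validation rule: skip markdown files.
    return not filename.lower().endswith(".md")


def has_enough_variety(candidate: str) -> bool:
    # Crude heuristic (an assumption): reject placeholders such as "exk_xxxx...".
    return len(set(candidate)) >= 8


# Made-up key format: "exk_" followed by 24 alphanumeric characters.
example_detector = Detector(
    name="example_api_key",
    pre_validators=[not_markdown],
    pattern=re.compile(r"\b(exk_[A-Za-z0-9]{24})\b"),
    post_validators=[has_enough_variety],
)
```

Calling `example_detector.scan("config.py", content)` returns the list of candidates that survived all three steps; the same content in a `.md` file is dropped before any matching happens.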
Simply put, a specific detector is designed to find a well-identified type of secret, whereas a generic detector yields secrets that are not associated with a given provider. For more details, you can refer to this documentation on detectors.
When possible, GitGuardian checks the validity of the credentials it detects. To do so, GitGuardian performs the least intrusive call possible to the service. We favor HEAD requests and fall back to GET requests when HEAD is not an option. We also select endpoints that do not return any personal information. Once this is done, the secret is labelled as valid or invalid.
In some well-identified cases, a secret can be labelled as invalid by our checker without GitGuardian being certain that it is a false positive. In that case, we still display the secret in our applications. Database credentials are a good example: we are able to perform a check, but the checker can label credentials as invalid simply because the host is down. Yet, we still want to display the incident.
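As a rough sketch of what a minimally intrusive check can look like, the snippet below builds a HEAD request with Python's standard library and maps the outcome to a label. The endpoint URL, the bearer-token header, the status-code mapping, and the `CheckResult` type are assumptions made for the example; they do not describe GitGuardian's actual checkers.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    label: str        # "valid" or "invalid"
    conclusive: bool  # False when the service gave no clear answer


def build_check_request(url: str, token: str) -> urllib.request.Request:
    # HEAD asks for the status line and headers only, never a response body.
    return urllib.request.Request(
        url, method="HEAD", headers={"Authorization": f"Bearer {token}"}
    )


def send_request(request: urllib.request.Request) -> int:
    # Returns the HTTP status code; network failures propagate as OSError.
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx and 5xx answers still carry a status code


def check_validity(
    url: str,
    token: str,
    send: Callable[[urllib.request.Request], int] = send_request,
) -> CheckResult:
    try:
        status = send(build_check_request(url, token))
    except OSError:
        # Host down or unreachable: labelled invalid, but without certainty.
        return CheckResult("invalid", conclusive=False)
    if 200 <= status < 300:
        return CheckResult("valid", conclusive=True)
    if status in (401, 403):
        return CheckResult("invalid", conclusive=True)
    return CheckResult("invalid", conclusive=False)
```

The `conclusive` flag mirrors the database example above: an "invalid" label obtained because the host was unreachable says nothing about whether the credentials are real.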
In the context of secrets detection, a false positive occurs when our detection engine raises an alert for a string that is not a secret and has never been one.
For more details on false positives, recall and precision, we highly recommend reading this blog post.
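For readers who want the arithmetic, precision and recall follow directly from the counts of true positives, false positives, and missed secrets (false negatives). The figures below are invented purely for illustration.

```python
def precision(true_positives: int, false_positives: int) -> float:
    # Share of raised alerts that were real secrets.
    raised = true_positives + false_positives
    return true_positives / raised if raised else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    # Share of real secrets that triggered an alert.
    real = true_positives + false_negatives
    return true_positives / real if real else 0.0


# Invented example: 100 alerts raised, 90 of them real; 30 real secrets missed.
example_precision = precision(true_positives=90, false_positives=10)
example_recall = recall(true_positives=90, false_negatives=30)
```

With these made-up counts, 90 of the 100 alerts were real secrets, and 90 of the 120 real secrets were caught. Reducing false positives raises precision, while catching more of the real secrets raises recall.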
To properly test our engine, we recommend carefully reading this documentation, and especially looking at the detectors' examples, which provide good test cases.
You can also use this example repository to get familiar with our detection engine's behavior.
Finally, if a secret is not detected by one of our detectors, you can refer to this question for some explanations.
First of all, we recommend building your test cases by following our detectors' documentation. For each detector, we provide a set of examples that are detected by our engine. Here are some possible reasons why we did not raise an alert for your secret:
- The associated detector has a validity check, and your secret is no longer valid. In that case, the secret is labelled as invalid and no alert is raised.
- You obfuscated your secret to test our detection capabilities, which is a good practice. But you may have broken the pattern of the key in the process: make sure you kept an identical length and charset.
- Your secret did not pass our pre-validation steps: for certain detectors we ban markdown files, or we require a given context for the detection to occur. You can refer to the documentation of the detector concerned here.
- Your secret is not part of an assignment of the form the detector requires. Look at the detector's examples to see what patterns are detected.
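One way to obfuscate a test secret without breaking its pattern is to shuffle its own characters, which keeps the length and the character set identical by construction. The helper below is a hypothetical sketch, not a GitGuardian tool; the `keep_prefix` parameter and the sample value are assumptions for the example.

```python
import random


def scramble(secret: str, keep_prefix: int = 0) -> str:
    """Shuffle the alphanumeric characters of a secret among themselves.

    Length and character set are preserved because no new character is
    introduced. The first `keep_prefix` characters (e.g. a provider prefix)
    and separators such as "_" or "-" stay where they are.
    """
    prefix, body = secret[:keep_prefix], list(secret[keep_prefix:])
    positions = [i for i, char in enumerate(body) if char.isalnum()]
    shuffled = [body[i] for i in positions]
    random.SystemRandom().shuffle(shuffled)
    for i, char in zip(positions, shuffled):
        body[i] = char
    return prefix + "".join(body)


# Made-up value in a made-up format, only to show the call.
test_value = scramble("exk_9fA3-77bC-e01D", keep_prefix=4)
```

Keep in mind that a scrambled value is no longer a working credential, so a detector with a validity check may label it as invalid, as described in the first bullet above.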
We’ve seen real credentials in .md files in the past already, why do some of your detectors drop .md files?
At GitGuardian, one of our biggest challenges is to achieve detection with the highest precision and the best recall possible, in other words squaring the circle. To do so, we battle-test our algorithms on GitHub's live data feed. We also continuously monitor our detectors' performance by looking at explicit feedback from developers or from our checkers, as well as implicit feedback such as secret removal.
Based on this feedback, we decided to drop markdown files in certain detectors in an effort to reduce alert fatigue and increase our precision. To know which detectors are concerned, you can refer to the documentation of each detector's pre-validation steps.
Cryptographic algorithms are tools used to secure communications over public channels such as the Internet. Based on mathematically hard problems, they are the building blocks of protocols such as TLS (for secure internet browsing via https) or SSH (for secure remote access to servers). The different security features provided by cryptography are authentication, authorization, and encryption. To this end, cryptographic algorithms are bound to cryptographic keys that are used to unlock or lock these functions.
We distinguish two types of keys, symmetric and asymmetric:
- A symmetric key is shared between the entities communicating.
- Asymmetric keys are composed of a public and a private key. The public key is distributed to everyone to initiate a communication or a protocol, and the private key, which its owner keeps secret, is used to decrypt or sign and so carry on the communication or the protocol.
Having access to someone's symmetric key or asymmetric private key can have devastating consequences. A malicious adversary could then impersonate an entity, tamper with its communications, or simply gain access to all its secure data.
How we detect private keys
After the introduction of IETF RFCs 1421, 1422, 1423, and 1424, most cryptographic libraries (such as OpenSSL) adopted a shared format for storing cryptographic keys called PEM (Privacy-Enhanced Mail). This format is highly structured and always starts with the same pattern. This is very convenient for detection, as it allows a high recall for the detectors built on it. We based our family of cryptographic key detectors on this property of the PEM format to get very efficient and precise detectors. Here is the list of the detectors currently implemented in our suite:
- Generic private key.
- DSA private key.
- Elliptic curve private key.
- RSA private key.
- OpenSSH private key.
- PGP private key.
- Encrypted private key.
- PuTTY private key.
We targeted the main cryptographic algorithms and protocols, that is, the most commonly used ones and those referenced by standards bodies. For each of those algorithms, we implemented a detector for both the PEM form and the Base64-encoded version.
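To illustrate why the fixed PEM header makes detection convenient, here is a minimal sketch that looks for private key headers both in plain text and inside Base64-encoded blobs. The regular expressions, the 40-character threshold, and the function name are assumptions chosen for the example; they are not the patterns used by GitGuardian's detectors, and non-PEM formats such as PuTTY's are left out.

```python
import base64
import binascii
import re
from typing import List

# Matches headers such as "-----BEGIN RSA PRIVATE KEY-----",
# "-----BEGIN PRIVATE KEY-----" or "-----BEGIN PGP PRIVATE KEY BLOCK-----".
PEM_PRIVATE_KEY = re.compile(
    r"-----BEGIN ((?:[A-Z0-9]+ )*)PRIVATE KEY(?: BLOCK)?-----"
)
# Long runs of Base64 characters that might hide an encoded PEM block.
BASE64_TOKEN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")


def _key_kinds(text: str) -> List[str]:
    # The captured words before "PRIVATE KEY" name the key type, if any.
    return [m.group(1).strip() or "GENERIC" for m in PEM_PRIVATE_KEY.finditer(text)]


def find_private_key_headers(text: str) -> List[str]:
    found = _key_kinds(text)
    for token in BASE64_TOKEN.findall(text):
        try:
            decoded = base64.b64decode(token, validate=True)
        except (binascii.Error, ValueError):
            continue  # not valid Base64, ignore
        found += _key_kinds(decoded.decode("ascii", errors="ignore"))
    return found
```

Because the pattern requires the words `PRIVATE KEY`, a `-----BEGIN CERTIFICATE-----` or `-----BEGIN PUBLIC KEY-----` block is not reported by this sketch.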
What about public key certificates?
A question frequently asked by the public and our customers concerns the sensitivity of certificates. Public-key certificates are used in the TLS protocol to establish authenticated and secure communication channels when browsing the web, displayed as https and a green lock on the website. They are in essence just public keys augmented with a signature, and anyone can access them (simply click on the lock). As such, they are not sensitive: the signature only gives users confidence that the certificate was issued by a trusted party. The trusted parties are usually referenced by either the browser or the OS (Linux, Windows, macOS, etc.) on installation.