Glossary
This document defines common words used to describe our secrets detection engine.
Detector
A set of rules that will be applied to a document to find one type of secret (e.g.: AWS keys, database URI, Google Key...).
Generic detector
We consider that a detector is generic if we are not able to infer the secret's provider directly. For example, the detector looking for a pattern such as secret={high_entropy_string} is a generic detector.
Specific detector
A specific detector is a detector designed to find a well-identified type of secret such as AWS keys, MySQL URI, Slack token... Specific detectors are often contrasted with generic detectors.
Assignment and assigned variable
We refer to an assignment as any statement of the form {assigned_variable} {assignment_token} {value}. For example, in the statement my_variable = "HelloWorld", the assigned variable is my_variable.
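The assignment form above can be illustrated with a small regular expression. This is a minimal sketch, not the engine's actual rule: the accepted assignment tokens (=, := and :), the variable-name pattern, and the helper name parse_assignment are all assumptions made for illustration.

```python
import re

# Hypothetical building blocks; the real engine's patterns may differ.
VARIABLE = r"(?P<assigned_variable>[A-Za-z_][A-Za-z0-9_]*)"
TOKEN = r"(?P<assignment_token>:=|=|:)"
VALUE = r"""(?P<value>"[^"]*"|'[^']*'|\S+)"""  # quoted or bare value

ASSIGNMENT_RE = re.compile(VARIABLE + r"\s*" + TOKEN + r"\s*" + VALUE)


def parse_assignment(statement: str):
    """Return the parts of the first assignment found, or None."""
    match = ASSIGNMENT_RE.search(statement)
    return match.groupdict() if match else None


parts = parse_assignment('my_variable = "HelloWorld"')
print(parts["assigned_variable"])
```

Note that in this sketch the captured value keeps its surrounding quotes.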
Document
Any text, optionally accompanied by a filename.
Entropy
Measure of randomness of a string. An API key should have a high entropy since it is a randomly generated sequence of characters. When mentioning entropy in this documentation, we mean Shannon entropy.
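As an illustration, Shannon entropy can be computed from the character frequencies of a string. The sketch below is a generic implementation of the standard formula, expressed in bits per character; the function name shannon_entropy is hypothetical, and the engine may normalize or threshold the value differently.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text`, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # H = -sum(p * log2(p)) over the observed character frequencies.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Under this definition a string made of a single repeated character scores 0, while a string of four distinct characters such as "abcd" scores 2 bits per character. Randomly generated keys tend to score noticeably higher than natural-language words of the same length.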
Filepath / Filename / Extension
We adopted the following conventions for naming paths. For example, given config/secrets.yaml:
yaml is the extension.
secrets.yaml is the filename.
config/secrets.yaml is the filepath.
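The same three parts can be extracted with Python's standard pathlib module. This is only a sketch of the naming convention; the helper name split_path is hypothetical, and treating the extension as the last suffix only (so archive.tar.gz would yield gz) is an assumption.

```python
from pathlib import PurePosixPath


def split_path(filepath: str) -> dict:
    """Split a path into the filepath, filename and extension defined above."""
    path = PurePosixPath(filepath)
    return {
        "filepath": str(path),
        "filename": path.name,
        # `suffix` includes the leading dot, which the glossary omits.
        "extension": path.suffix.lstrip("."),
    }


print(split_path("config/secrets.yaml"))
```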
Insight
Additional information on a document or a secret.
Match
A string that is part of a secret. A secret can be composed of one or several matches.
Matcher
A detection rule that is applied to a document and outputs matches.
PostValidator
A validation rule applied to a secret candidate (e.g.: validate that all the matches have sufficient entropy).
Precision
The fraction of detected secrets that are indeed true secrets. We can keep track of this metric with feedback from our customers.
PreValidator
A validation rule applied to a document (e.g.: look for "datadog" in the document).
Priority
A rule that prioritizes one secret over another one if they are overlapping. A secret detected by a specific detector always has a higher priority than one detected by a generic detector.
Recall
The fraction of secrets we were able to detect and classify as such among all secrets that exist. This metric is almost impossible to measure without human labelling.
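Given a labelled sample, precision and recall as defined in this glossary can be computed as follows. This is a minimal sketch: the function name and the representation of secrets as set members are assumptions, and the "true secrets" set stands for the human labelling mentioned above.

```python
def precision_and_recall(detected: set, true_secrets: set) -> tuple:
    """Return (precision, recall) for a labelled sample.

    detected:     secrets reported by the engine
    true_secrets: secrets that really exist in the sample (human-labelled)
    """
    true_positives = len(detected & true_secrets)
    # Precision: share of reported secrets that are real.
    precision = true_positives / len(detected) if detected else 0.0
    # Recall: share of real secrets that were reported.
    recall = true_positives / len(true_secrets) if true_secrets else 0.0
    return precision, recall
```

Returning 0.0 when a set is empty is a convention chosen here to avoid a division by zero; other conventions are possible.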
Scanner
A collection of detectors. In terms of code, this is the entry point to scan a document, and the only way of scanning one.
Secret
A combination of strings found by a detector in a document. This combination should grant access to a private service.
Secrets overlapping
Two secrets overlap if any match of one secret is partially or completely included in any match of the other secret.
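The overlap rule can be sketched by modelling each match as a character span in the document. The Match class, its start and end fields, and both function names are hypothetical; the glossary does not specify how match positions are stored.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Match:
    start: int  # index of the first character (inclusive)
    end: int    # index just past the last character (exclusive)


def matches_overlap(a: Match, b: Match) -> bool:
    """True if the two spans share at least one character."""
    return a.start < b.end and b.start < a.end


def secrets_overlap(first: Sequence[Match], second: Sequence[Match]) -> bool:
    """True if any match of one secret overlaps any match of the other."""
    return any(matches_overlap(a, b) for a in first for b in second)
```

With half-open spans, two matches that merely touch (one ends exactly where the other starts) are not considered overlapping in this sketch.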
Validity Check
A non-intrusive call to the relevant service that determines whether a key is valid or invalid. Some validity checks can be used to improve our precision and ensure that we only raise alerts for valid secrets.