Glossary
This document defines common words used to describe our secrets detection engine.

Detector
A set of rules applied to a document to find one type of secret (e.g., AWS keys, database URIs, Google keys).

Generic detector
We consider a detector generic if we are not able to infer the secret's provider directly. For example, a detector looking for a pattern such as secret={high_entropy_string} is a generic detector.

Specific detector
A detector designed to find a well-identified type of secret, such as AWS keys, MySQL URIs, or Slack tokens. Specific detectors are often contrasted with generic detectors.

Assignment and assigned variable
We refer to an assignment as any statement of the form {assigned_variable} {assignment_token} {value}. For example, in the statement my_variable = "HelloWorld", the assigned variable is my_variable.
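As an illustration of the {assigned_variable} {assignment_token} {value} form, here is a minimal Python sketch. The regular expression, the accepted assignment tokens (= and :), and the parse_assignment helper are assumptions made for this example; they are not the engine's actual rules.

```python
import re

# Hypothetical pattern: an identifier, an assignment token, then a value.
ASSIGNMENT_RE = re.compile(
    r"(?P<assigned_variable>[A-Za-z_]\w*)"      # variable name
    r"\s*(?P<assignment_token>=|:)\s*"          # assumed tokens: "=" or ":"
    r"""(?P<value>"[^"]*"|'[^']*'|\S+)"""       # quoted or bare value
)


def parse_assignment(statement: str):
    """Return the three parts of the first assignment found, or None."""
    found = ASSIGNMENT_RE.search(statement)
    if found is None:
        return None
    return found.groupdict()


parsed = parse_assignment('my_variable = "HelloWorld"')
print(parsed["assigned_variable"])
```

With the statement from the definition, the assigned_variable group captures my_variable and the assignment_token group captures the equals sign.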

Document
Any text, optionally accompanied by a filename.

Entropy
A measure of the randomness of a string. An API key should have high entropy, since it is a randomly generated sequence of characters. Whenever this documentation mentions entropy, it means Shannon entropy.
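For reference, Shannon entropy over the characters of a string can be computed as in the sketch below. It returns bits per character; whether the engine normalizes the value or which threshold it treats as "high" is not stated here, so shannon_entropy is only an illustrative helper.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # Sum of -p * log2(p) over the observed character frequencies.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


low = shannon_entropy("aaaa")    # a single repeated character
high = shannon_entropy("abcd")   # four equally likely characters
print(low, high)
```

A string made of one repeated character carries no entropy, while a string of evenly distributed distinct characters scores higher, which is why randomly generated keys tend to stand out.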

Filepath / Filename / Extension
We adopted the following conventions for naming paths. For example, in config/secrets.yaml:
yaml is the extension.
secrets.yaml is the filename.
config/secrets.yaml is the filepath.
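The same naming convention can be reproduced with Python's standard pathlib module. The split_path helper is a hypothetical name used only for this sketch.

```python
from pathlib import PurePosixPath


def split_path(filepath: str) -> dict:
    """Split a path into the three parts named in this glossary."""
    path = PurePosixPath(filepath)
    return {
        "filepath": str(path),                   # full path
        "filename": path.name,                   # last path component
        "extension": path.suffix.lstrip("."),    # suffix without the dot
    }


parts = split_path("config/secrets.yaml")
print(parts)
```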

Insight
Additional information on a document or a secret.

Match
A string that is part of a secret. A secret can be composed of one or several matches.

Matcher
A detection rule that is applied to a document and outputs matches.

PostValidator
A validation rule applied to a secret candidate (e.g., checking that all the matches have sufficient entropy).

Precision
The fraction of detected secrets that are indeed true secrets. We can track this metric using feedback from our customers.

PreValidator
A validation rule applied to a document (e.g., looking for "datadog" in the document).

Priority
A rule that prioritizes one secret over another when they overlap. A secret detected by a specific detector always has a higher priority than one detected by a generic detector.
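The specific-over-generic rule can be sketched as follows. The Candidate class, its fields, and the tie-breaking behavior (keep the first candidate when both have the same kind of detector) are assumptions for illustration; only the rule that specific beats generic comes from the definition above.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    detector_name: str   # hypothetical field
    is_specific: bool    # True for a specific detector, False for a generic one


def pick_by_priority(first: Candidate, second: Candidate) -> Candidate:
    """Choose between two overlapping candidates."""
    # A specific detector always wins over a generic one.
    if second.is_specific and not first.is_specific:
        return second
    # Assumption: otherwise keep the first candidate.
    return first


generic = Candidate("generic_high_entropy", is_specific=False)
specific = Candidate("aws_keys", is_specific=True)
kept = pick_by_priority(generic, specific)
print(kept.detector_name)
```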

Recall
The fraction of all existing secrets that we were able to detect and classify as such. This metric is almost impossible to measure without human labelling.
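To make precision and recall concrete, the sketch below computes both from labelled counts. The numbers are invented for the example and do not describe real results.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of detected secrets that are true secrets."""
    detected = true_positives + false_positives
    return true_positives / detected if detected else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of existing secrets that were detected."""
    existing = true_positives + false_negatives
    return true_positives / existing if existing else 0.0


# Hypothetical labelled sample: 90 true secrets detected,
# 10 false alerts, 30 true secrets missed.
p = precision(true_positives=90, false_positives=10)
r = recall(true_positives=90, false_negatives=30)
print(p, r)
```

The false-negative count is the hard part in practice: it requires knowing about secrets that were never detected, which is why recall needs human labelling.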

Scanner
A collection of detectors. In terms of code, this is the entry point to scan a document, and the only way of scanning one.

Secret
A combination of strings found by a detector in a document. This combination should grant access to a private service.

Secrets overlapping
Two secrets overlap if any match of one is partially or completely included in any match of the other.
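The overlap test can be sketched by treating each match as a span of character offsets in the document. Representing a match as a (start, end) tuple with an exclusive end is an assumption made for this example.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # assumed representation: (start, end), end exclusive


def spans_overlap(a: Span, b: Span) -> bool:
    """True if the two spans share at least one character position."""
    return a[0] < b[1] and b[0] < a[1]


def secrets_overlap(first: List[Span], second: List[Span]) -> bool:
    """True if any match of one secret overlaps any match of the other."""
    return any(spans_overlap(a, b) for a in first for b in second)


partial = secrets_overlap([(0, 10)], [(5, 15)])             # partial inclusion
adjacent = secrets_overlap([(0, 10)], [(10, 20)])           # touching, no shared position
nested = secrets_overlap([(0, 5), (20, 30)], [(22, 25)])    # complete inclusion
print(partial, adjacent, nested)
```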

Validity Check
A non-intrusive call to the relevant service that determines whether a key is valid or invalid. Some validity checks can be used to improve our precision and ensure that we only raise alerts for valid secrets.