Glossary

This document defines common words used to describe our secrets detection engine.

Detector

A set of rules applied to a document to find one type of secret (e.g., AWS keys, database URIs, Google keys).

Generic detector

A detector is generic if the secret's provider cannot be inferred directly. For example, a detector looking for a pattern such as secret={high_entropy_string} is a generic detector.

Specific detector

A detector designed to find a well-identified type of secret, such as AWS keys, MySQL URIs, or Slack tokens. Specific detectors are often contrasted with generic detectors.

Assignment and assigned variable

An assignment is any statement of the form {assigned_variable} {assignment_token} {value}. For example, in the statement my_variable = "HelloWorld", the assigned variable is my_variable, the assignment token is =, and the value is "HelloWorld".
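To make the three parts of an assignment concrete, here is a minimal sketch that splits a statement into them with a regular expression. The pattern and the function name parse_assignment are hypothetical and only cover simple `=` and `:` tokens; the engine's actual parsing rules are not described in this glossary.

```python
import re

# Hypothetical pattern: an identifier, an assignment token (= or :),
# then an optionally quoted value without spaces.
ASSIGNMENT_RE = re.compile(
    r"""(?P<assigned_variable>[A-Za-z_][A-Za-z0-9_]*)    # variable name
        \s*(?P<assignment_token>=|:)\s*                   # assignment token
        (?P<quote>["']?)(?P<value>[^"'\s]+)(?P=quote)     # optionally quoted value
    """,
    re.VERBOSE,
)


def parse_assignment(statement):
    """Return the parts of the first assignment found, or None."""
    found = ASSIGNMENT_RE.search(statement)
    if found is None:
        return None
    return {
        "assigned_variable": found.group("assigned_variable"),
        "assignment_token": found.group("assignment_token"),
        "value": found.group("value"),
    }


parts = parse_assignment('my_variable = "HelloWorld"')
```

Here parts["assigned_variable"] holds my_variable, matching the example above.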

Document

Any text, optionally accompanied by a filename.

Entropy

A measure of the randomness of a string. An API key should have high entropy since it is a randomly generated sequence of characters. In this documentation, entropy always means Shannon entropy.
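As an illustration, Shannon entropy can be computed per character from the frequency of each character in the string. This is a minimal sketch of the standard formula; the function name shannon_entropy is ours, and whether the engine normalizes the value or applies it per character set is not stated here.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of `text`, in bits per character."""
    if not text:
        return 0.0
    length = len(text)
    counts = Counter(text)
    # Sum of p * log2(1 / p) over each distinct character, with p = n / length.
    return sum((n / length) * math.log2(length / n) for n in counts.values())
```

A string made of one repeated character has zero entropy, while a string whose characters are all distinct reaches the maximum for its length, which is why randomly generated keys tend to score high.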

Filepath / Filename / Extension

We use the following naming conventions for paths. For example, given config/secrets.yaml:

  • yaml is the extension.
  • secrets.yaml is the filename.
  • config/secrets.yaml is the filepath.
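The same convention can be reproduced with Python's standard pathlib module. This is only a sketch; the helper name split_path is hypothetical. Note that pathlib's suffix includes the leading dot, which is stripped here to match the convention above.

```python
from pathlib import PurePosixPath


def split_path(filepath: str) -> dict:
    """Split a path into filepath, filename, and extension (without the dot)."""
    path = PurePosixPath(filepath)
    return {
        "filepath": str(path),
        "filename": path.name,
        "extension": path.suffix.lstrip("."),
    }


parts = split_path("config/secrets.yaml")
```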

Insight

Additional information on a document or a secret.

Match

A string that is part of a secret. A secret can be composed of one or several matches.

Matcher

A detection rule that is applied to a document and outputs matches.

PostValidator

A validation rule applied to a secret candidate (e.g., validate that all matches have sufficient entropy).

Precision

The fraction of detected secrets that are true secrets. We can track this metric using feedback from our customers.

PreValidator

A validation rule applied to a document (e.g., look for "datadog" in the document).

Priority

A rule that prioritizes one secret over another when they overlap. A secret detected by a specific detector always has a higher priority than one detected by a generic detector.

Recall

The fraction of all existing secrets that we were able to detect and classify as such. This metric is almost impossible to measure without human labelling.
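Precision and recall, as defined in this glossary, can both be written in terms of labelled counts. The sketch below assumes such counts are available (true positives, false positives, false negatives); the function names and the choice to return 0.0 for an empty denominator are ours.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of detected secrets that are true secrets."""
    detected = true_positives + false_positives
    return true_positives / detected if detected else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of all existing secrets that were detected."""
    existing = true_positives + false_negatives
    return true_positives / existing if existing else 0.0
```

The false-negative count is the hard part in practice: it requires knowing about secrets that were never detected, which is why recall needs human labelling.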

Scanner

A collection of detectors. In terms of code, this is the entry point to scan a document, and the only way of scanning one.
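The following sketch shows one way the terms in this glossary could fit together: a scanner holds detectors, and each detector runs its PreValidators on the document, applies a matcher, then filters candidates with PostValidators. Every class, field, and function name here is hypothetical, as are the example regular expression and the distinct-character check standing in for an entropy test; this is not the engine's actual code.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Document:
    text: str
    filename: Optional[str] = None  # the filename is optional


@dataclass
class Secret:
    detector_name: str
    matches: List[str]  # a secret is composed of one or several matches


@dataclass
class Detector:
    name: str
    # The matcher returns one list of matches per secret candidate.
    matcher: Callable[[Document], List[List[str]]]
    pre_validators: List[Callable[[Document], bool]] = field(default_factory=list)
    post_validators: List[Callable[[List[str]], bool]] = field(default_factory=list)

    def detect(self, document: Document) -> List[Secret]:
        # PreValidators run on the whole document, before any matching.
        if not all(check(document) for check in self.pre_validators):
            return []
        secrets = []
        for matches in self.matcher(document):
            # PostValidators run on each secret candidate.
            if all(check(matches) for check in self.post_validators):
                secrets.append(Secret(self.name, matches))
        return secrets


@dataclass
class Scanner:
    detectors: List[Detector]

    def scan(self, document: Document) -> List[Secret]:
        """Entry point: run every detector on the document."""
        return [s for d in self.detectors for s in d.detect(document)]


# Illustrative detector: the token format below is an assumption.
HEX_TOKEN_RE = re.compile(r"\b[a-f0-9]{32}\b")

example_detector = Detector(
    name="example_datadog_like_key",
    matcher=lambda doc: [[m] for m in HEX_TOKEN_RE.findall(doc.text)],
    pre_validators=[lambda doc: "datadog" in doc.text.lower()],
    # Crude stand-in for an entropy check: require enough distinct characters.
    post_validators=[lambda matches: all(len(set(m)) >= 8 for m in matches)],
)

scanner = Scanner(detectors=[example_detector])
```

Calling scanner.scan(Document(text, "config/secrets.yaml")) returns the list of secrets that passed every validation step. Keeping PreValidators cheap is a deliberate choice in this sketch: they let a detector skip a document before running its matcher.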

Secret

A combination of strings found by a detector in a document. This combination should grant access to a private service.

Secrets overlapping

Two secrets overlap if any match of one is partially or completely included in any match of the other.
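The overlap test, combined with the Priority rule defined earlier in this glossary, can be sketched as follows. The sketch assumes each match is located by (start, end) character offsets in the document, with the end exclusive; that representation, and all names below, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) offsets, end exclusive


@dataclass
class LocatedSecret:
    detector_kind: str       # "specific" or "generic"
    match_spans: List[Span]  # one span per match


def spans_overlap(a: Span, b: Span) -> bool:
    """True if the spans share at least one character (partial or full inclusion)."""
    return a[0] < b[1] and b[0] < a[1]


def secrets_overlap(first: LocatedSecret, second: LocatedSecret) -> bool:
    return any(
        spans_overlap(a, b)
        for a in first.match_spans
        for b in second.match_spans
    )


def resolve(first: LocatedSecret, second: LocatedSecret) -> List[LocatedSecret]:
    """Return the secrets to keep after applying the priority rule."""
    if not secrets_overlap(first, second):
        return [first, second]
    if first.detector_kind == second.detector_kind:
        # The glossary does not say how ties are broken; keep both here.
        return [first, second]
    # A specific detector always wins over a generic one.
    return [first if first.detector_kind == "specific" else second]
```

With this representation, a generic secret whose match lies inside a specific secret's match is dropped, while two secrets found in different places are both kept.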

Validity Check

A non-intrusive call to the relevant service that determines whether a key is valid or invalid. Some validity checks can be used to improve our precision by ensuring that we only raise alerts for valid secrets.