Glossary

This document defines common words used to describe our secrets detection engine.

Detector

A set of rules applied to a document to find one type of secret (e.g., AWS keys, database URIs, Google keys).

Generic detector

A detector is generic if the secret's provider cannot be inferred directly from what it finds. For example, a detector looking for a pattern such as secret={high_entropy_string} is a generic detector.
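As an illustration only, a generic detector for the secret={high_entropy_string} pattern could be sketched as below. The regular expression, the MIN_ENTROPY threshold and the function names are assumptions made for this example; they are not the engine's actual rules.

```python
import math
import re
from collections import Counter

# Hypothetical pattern: the name "secret", an "=" sign, then a candidate value.
GENERIC_PATTERN = re.compile(r"""secret\s*=\s*["']?([A-Za-z0-9+/_\-]{16,})""")

# Hypothetical threshold, in bits per character. A real engine would tune this.
MIN_ENTROPY = 3.0


def shannon_entropy(value: str) -> float:
    """Shannon entropy of a non-empty string, in bits per character."""
    counts = Counter(value)
    return -sum((n / len(value)) * math.log2(n / len(value)) for n in counts.values())


def find_generic_secrets(text: str) -> list[str]:
    """Return candidate values that follow 'secret=' and look random enough."""
    return [
        m.group(1)
        for m in GENERIC_PATTERN.finditer(text)
        if shannon_entropy(m.group(1)) >= MIN_ENTROPY
    ]
```

Nothing in the pattern identifies a provider, which is what makes this sketch generic: the same rule would fire for any service whose key happens to be assigned to a variable named secret.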

Specific detector

A specific detector is a detector designed to find a well-identified type of secret, such as AWS keys, MySQL URIs or Slack tokens. Specific detectors are often contrasted with generic detectors.

Assignment and assigned variable

An assignment is any statement of the form {assigned_variable} {assignment_token} {value}. For example, in the statement my_variable = "HelloWorld", the assigned variable is my_variable, the assignment token is = and the value is "HelloWorld".
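A minimal sketch of how such a statement could be split into its three parts is shown below. The set of assignment tokens (=, :, := and =>) and the name parse_assignment are assumptions for this example, not the engine's actual grammar.

```python
import re
from typing import Optional

# Hypothetical grammar: variable, assignment token, optionally quoted value.
ASSIGNMENT = re.compile(
    r"""(?P<variable>[A-Za-z_][A-Za-z0-9_.\-]*)        # assigned variable
        \s*(?P<token>:=|=>|=|:)\s*                      # assignment token
        (?P<quote>["']?)(?P<value>[^"'\s]+)(?P=quote)   # value, optionally quoted
    """,
    re.VERBOSE,
)


def parse_assignment(statement: str) -> Optional[dict]:
    """Return the variable, token and value of the first assignment found, or None."""
    match = ASSIGNMENT.search(statement)
    if match is None:
        return None
    return {
        "variable": match.group("variable"),
        "token": match.group("token"),
        "value": match.group("value"),
    }
```

With this sketch, parse_assignment('my_variable = "HelloWorld"') yields my_variable as the assigned variable, = as the token and HelloWorld as the value.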

Document

Any text, optionally associated with a filename.

Entropy

A measure of the randomness of a string. An API key should have high entropy since it is a randomly generated sequence of characters. In this documentation, entropy always means Shannon entropy.
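For reference, the Shannon entropy of a string is H = -Σ p(c) · log2 p(c), summed over its distinct characters c, where p(c) is the frequency of c in the string. The sketch below computes it in bits per character; the function name is chosen for this example, and the engine may normalise or compute entropy differently.

```python
import math
from collections import Counter


def shannon_entropy(value: str) -> float:
    """Return the Shannon entropy of `value` in bits per character."""
    if not value:
        return 0.0
    total = len(value)
    counts = Counter(value)
    # Sum -p * log2(p) over the frequency p of each distinct character.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())
```

A string made of a single repeated character such as "aaaa" scores 0 bits, while "abcd", where every character is different, scores 2 bits per character.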

Filepath / Filename / Extension

We use the following conventions for naming paths. For example, given config/secrets.yaml:

  • yaml is the extension.
  • secrets.yaml is the filename.
  • config/secrets.yaml is the filepath.
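The same three parts can be obtained with Python's standard pathlib module. This is only an illustration of the naming convention, not a description of how the engine parses paths.

```python
from pathlib import PurePosixPath

path = PurePosixPath("config/secrets.yaml")

filepath = str(path)                 # "config/secrets.yaml"
filename = path.name                 # "secrets.yaml"
extension = path.suffix.lstrip(".")  # "yaml" (pathlib's suffix includes the dot)
```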

Insight

Additional information on a document or a secret.

Match

A string that is part of a secret. A secret can be composed of one or several matches.

Matcher

A detection rule that is applied to a document and outputs matches.

PostValidator

A validation rule applied to a secret candidate (e.g., check that all of its matches have sufficient entropy).

Precision

The fraction of detected secrets that are true secrets. We can track this metric using feedback from our customers.

PreValidator

A validation rule applied to a document before detection (e.g., check that the document contains "datadog").
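The "datadog" example can be sketched as a simple keyword check. The function names and the case-insensitive comparison are assumptions for this illustration.

```python
from typing import Callable


def keyword_prevalidator(keyword: str) -> Callable[[str], bool]:
    """Build a pre-validator that accepts documents mentioning `keyword`."""
    needle = keyword.lower()

    def validate(document: str) -> bool:
        # Case-insensitive substring check (an assumption of this sketch).
        return needle in document.lower()

    return validate


has_datadog = keyword_prevalidator("datadog")
```

Running a cheap check like this before the matchers lets a detector skip documents that cannot contain the secret it is looking for.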

Priority

A rule that prioritizes one secret over another when they overlap. A secret detected by a specific detector always has a higher priority than one detected by a generic detector.
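The rule that specific detectors win over generic ones can be sketched as follows. The DetectedSecret class, its fields and the tie-break (keep the first secret when both come from the same kind of detector) are hypothetical; the source only states the specific-over-generic rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedSecret:
    detector_name: str
    is_specific: bool  # True for a specific detector, False for a generic one


def higher_priority(a: DetectedSecret, b: DetectedSecret) -> DetectedSecret:
    """Return whichever of two overlapping secrets should be kept."""
    if a.is_specific != b.is_specific:
        return a if a.is_specific else b
    # Same kind of detector: keeping the first one is an arbitrary tie-break
    # assumed for this sketch only.
    return a
```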

Recall

The fraction of secrets we were able to detect and classify as such among all secrets that exist. This metric is almost impossible to measure without human labelling.
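Given labelled counts, precision and recall as defined in this glossary reduce to two ratios. The sketch below uses the usual true positive, false positive and false negative terminology, which is an assumption of this example rather than vocabulary from the glossary.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of detected secrets that are true secrets."""
    detected = true_positives + false_positives
    return true_positives / detected if detected else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of existing secrets that were detected."""
    existing = true_positives + false_negatives
    return true_positives / existing if existing else 0.0
```

The false negative count is the hard part: it requires knowing about secrets that were never detected, which is why recall needs human labelling.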

Scanner

A collection of detectors. In the code, this is the entry point for scanning a document, and the only way to scan one.

Secret

A combination of strings found by a detector in a document. This combination should grant access to a private service.

Secrets overlapping

Two secrets overlap if any match of one is partially or completely included in any match of the other.
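Assuming each match is represented by a half-open character span (start, end) within the document, the overlap test can be sketched as below. This representation and the function names are assumptions for the example.

```python
from typing import Iterable, Tuple

# Hypothetical representation of a match: (start, end) offsets, end exclusive.
Span = Tuple[int, int]


def spans_overlap(a: Span, b: Span) -> bool:
    """True if the two spans share at least one character position."""
    return a[0] < b[1] and b[0] < a[1]


def secrets_overlap(first: Iterable[Span], second: Iterable[Span]) -> bool:
    """True if any match of one secret overlaps any match of the other."""
    second_spans = list(second)
    return any(spans_overlap(a, b) for a in first for b in second_spans)
```

Under this convention, two matches that merely touch, such as (0, 10) and (10, 20), do not overlap.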

Validity Check

A non-intrusive call to the relevant service that determines whether a key is valid. Validity checks can be used to improve our precision by ensuring that we only raise alerts for valid secrets.