Glossary

This document defines common terms used to describe our secrets detection engine.

Detector#

A set of rules applied to a document to find one type of secret (e.g. AWS keys, database URIs, Google keys).

Generic detector#

A detector is generic if the secret's provider cannot be inferred directly from what it finds. For example, a detector looking for a pattern such as secret={high_entropy_string} is a generic detector.
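As a rough illustration of the secret={high_entropy_string} pattern, a generic detector could start from a regular expression like the one below. This is a minimal sketch, not the engine's actual rule: the name GENERIC_PATTERN, the helper find_generic_candidates, and the 16-character minimum are all assumptions made for the example, and the entropy requirement is left to a separate validation step.

```python
import re

# Hypothetical pattern: a name containing "secret", an "=" or ":" sign,
# then a run of at least 16 token-like characters (captured as group 1).
GENERIC_PATTERN = re.compile(
    r"""(?i)\b[\w-]*secret[\w-]*\s*[=:]\s*["']?([A-Za-z0-9+/_\-]{16,})"""
)


def find_generic_candidates(text: str) -> list[str]:
    """Return the captured value of every generic assignment found in text."""
    return [m.group(1) for m in GENERIC_PATTERN.finditer(text)]


candidates = find_generic_candidates("secret=8f3Kq9Zx2LmP0aTvB7nWc4Yd")
```

Nothing in the matched text identifies a provider, which is what makes such a rule generic.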

Specific detector#

A specific detector is designed to find a well-identified type of secret, such as an AWS key, a MySQL URI, or a Slack token. Specific detectors are often contrasted with generic detectors.

Document#

Any text, optionally accompanied by a filename.

Entropy#

A measure of the randomness of a string. An API key should have high entropy since it is a randomly generated sequence of characters. When this documentation mentions entropy, it refers to Shannon entropy.
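For reference, Shannon entropy over the characters of a string can be computed as follows. This is a minimal sketch using the standard per-character formula in bits; the function name shannon_entropy is an assumption, and the engine may normalize or threshold the value differently.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of s in bits per character: sum of p * log2(1 / p)."""
    if not s:
        return 0.0
    n = len(s)
    # Counter gives the number of occurrences of each distinct character.
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())
```

A string made of a single repeated character has entropy 0, while a string whose characters are all distinct reaches the maximum for its length, which is why randomly generated keys tend to score high.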

Insight#

Additional information on a document or a secret.

Match#

A string that is part of a secret. A secret can contain one or multiple matches.

Matcher#

A detection rule that is applied to a document and outputs matches.

PostValidator#

A validation rule applied to a secret candidate (e.g. checking that all the matches have sufficient entropy).

Precision#

Precision is the fraction of detected secrets that are true secrets. We can track this metric using feedback from our customers.
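In terms of counts, precision is the number of true positives divided by the total number of detections. The sketch below assumes customer feedback has already been tallied into two hypothetical counters; it is illustrative only.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of detected secrets that were confirmed as true secrets."""
    detected = true_positives + false_positives
    if detected == 0:
        raise ValueError("precision is undefined when nothing was detected")
    return true_positives / detected
```

For instance, 90 confirmed secrets and 10 false alarms give a precision of 90 / 100.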

PreValidator#

A validation rule applied to a document (e.g. looking for "datadog" in the document).
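The "datadog" example can be sketched as a simple keyword check run before any matcher. The factory name keyword_prevalidator and the case-insensitive comparison are assumptions for illustration, not the engine's actual interface.

```python
from typing import Callable


def keyword_prevalidator(keyword: str) -> Callable[[str], bool]:
    """Build a check that accepts a document only if it mentions keyword."""
    needle = keyword.lower()

    def validate(document: str) -> bool:
        # Case-insensitive substring test (an assumption of this sketch).
        return needle in document.lower()

    return validate


mentions_datadog = keyword_prevalidator("datadog")
```

A detector would then skip any document for which mentions_datadog returns False, avoiding the cost of running its matchers.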

Priority#

A rule that prioritizes one secret over another when they overlap. A secret detected with a specific detector always has priority over one detected with a generic detector.
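The specific-over-generic rule can be sketched as below. The Secret type and its fields are hypothetical, and keeping the first secret when both come from the same kind of detector is an assumption of this sketch; the source only defines the specific versus generic case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Secret:
    detector: str      # hypothetical detector name
    is_specific: bool  # True for a specific detector, False for a generic one


def pick_by_priority(a: Secret, b: Secret) -> Secret:
    """Given two overlapping secrets, keep the one from a specific detector."""
    if b.is_specific and not a.is_specific:
        return b
    # Assumption: on a tie (both specific or both generic), keep the first.
    return a
```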

Recall#

Recall is the fraction of all existing secrets that we detect and classify as such. This metric is almost impossible to measure without human labelling.
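Given a human-labelled set of secrets, recall can be computed by comparing it with what was detected. The sketch below assumes each secret is represented by a hashable identifier (here a string); that representation is hypothetical.

```python
def recall(detected: set[str], labelled: set[str]) -> float:
    """Fraction of human-labelled secrets that were also detected."""
    if not labelled:
        raise ValueError("recall is undefined without labelled secrets")
    # Detections outside the labelled set do not affect recall.
    return len(detected & labelled) / len(labelled)
```

The denominator is the reason labelling is needed: without it, the number of secrets that exist but were missed is unknown.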

Scanner#

A collection of detectors. In terms of code, this is the entry point to scan a document, and the only way of scanning one.

Secret#

A combination of strings found by a detector in a document. This combination should grant access to a private service.

Secrets overlapping#

Two secrets overlap if any match of one is partially or completely included in any match of the other.
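One way to express this definition is to treat each match as a character span in the document and test spans pairwise. The Match type with start and end offsets is an assumption for illustration; the source does not state how matches are located.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    start: int  # inclusive character offset in the document
    end: int    # exclusive character offset in the document


def matches_overlap(a: Match, b: Match) -> bool:
    """True if the two spans share at least one character."""
    return a.start < b.end and b.start < a.end


def secrets_overlap(first: list[Match], second: list[Match]) -> bool:
    """True if any match of one secret overlaps any match of the other."""
    return any(matches_overlap(a, b) for a in first for b in second)
```

With half-open spans, two matches that merely touch (one ends where the other starts) do not count as overlapping.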

Validity Check#

A validity check is a non-intrusive call to the relevant service that determines whether a key is valid or invalid. Some validity checks can be used to improve our precision and ensure that we raise alerts only for valid secrets.

Filepath / Filename / Extension#

We adopt the following conventions for naming paths. For example, for config/secrets.yaml:

  • .yaml is the extension.
  • secrets.yaml is the filename.
  • config/secrets.yaml is the filepath.
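These conventions map directly onto Python's standard pathlib module, shown here only to illustrate the three terms; the variable names are chosen for the example.

```python
from pathlib import PurePosixPath

path = PurePosixPath("config/secrets.yaml")

filepath = str(path)     # "config/secrets.yaml"
filename = path.name     # "secrets.yaml"
extension = path.suffix  # ".yaml"
```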