Scanning Other Data Sources for Secrets

Prelude

ggshield supports a generic input format via its secret scan docset command.
The input files for that command must be in JSONL format, each line containing a "docset" JSON object.

A docset represents a set of documents of a given type. Each document has authors and content. Authors may have a name, an email, and a role. This simple, flexible model makes it easy to prepare data for ggshield to consume. The sections below describe the format in detail.
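As a rough illustration of this model, the following Python sketch builds one docset and writes it in JSONL form, one docset object per line. The type "example-chat", the IDs, the author, the output file name, and the helper name write_docsets are all made up for this example; only the field names come from the format documented on this page.

```python
import json
import tempfile
from pathlib import Path


def write_docsets(path, docsets):
    """Write one docset per line (JSONL): no pretty-printing, no comments."""
    with open(path, "w", encoding="utf-8") as f:
        for docset in docsets:
            # json.dumps emits a single line by default; newlines inside
            # string values are escaped as \n, so each docset stays on one line.
            f.write(json.dumps(docset) + "\n")


# Hypothetical data: the type, IDs and author are invented for illustration.
docset = {
    "type": "example-chat",
    "id": "channel-42",
    "authors": [{"id": "u123", "name": "Jane Doe", "email": "jane@example.com"}],
    "documents": [
        {"id": "msg-1", "content": "first message\nwith two lines"},
        {"id": "msg-2", "content": "second message"},
    ],
}

out_path = Path(tempfile.gettempdir()) / "example_docsets.jsonl"
write_docsets(out_path, [docset])
lines = out_path.read_text(encoding="utf-8").splitlines()
```

The resulting file is the kind of input the secret scan docset command expects. Note that the commented, multi-line structures shown on this page are for reading only; in the actual file each docset has to fit on a single line.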

The Docset Format

Docset structure

For a more detailed view of the format, here is the structure of a docset:

{
  // Required. Defines the type of data stored in the docset.
  "type": "",

  // Required. A string to uniquely identify this docset.
  // Content depends on the type and is considered opaque but will be displayed in the output.
  "id": "",

  // Optional. Authors of the docset.
  // Only set if the whole docset has the same authors.
  "authors": [$author],

  // Required. The documents of the docset.
  "documents": [$document]
}
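Before handing a file to ggshield, it can help to check that the required fields are present. The sketch below is one possible pre-flight check, not ggshield's own validation: the function name validate_docset and the error messages are hypothetical, and it only checks what this page states (required string fields, a list of documents, and an "id" on every author).

```python
def validate_docset(docset):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []

    def check_authors(owner, label):
        # "authors" is optional, but each author listed needs a string "id".
        authors = owner.get("authors", [])
        if not isinstance(authors, list):
            problems.append(f"{label}: 'authors' must be a list")
            return
        for i, author in enumerate(authors):
            if not isinstance(author, dict) or not isinstance(author.get("id"), str):
                problems.append(f"{label}.authors[{i}]: missing required string 'id'")

    for key in ("type", "id"):
        if not isinstance(docset.get(key), str):
            problems.append(f"docset: missing required string {key!r}")
    check_authors(docset, "docset")

    documents = docset.get("documents")
    if not isinstance(documents, list):
        problems.append("docset: 'documents' must be a list")
        documents = []
    for i, document in enumerate(documents):
        if not isinstance(document, dict):
            problems.append(f"documents[{i}]: must be an object")
            continue
        for key in ("id", "content"):
            if not isinstance(document.get(key), str):
                problems.append(f"documents[{i}]: missing required string {key!r}")
        check_authors(document, f"documents[{i}]")
    return problems
```

Returning a list of messages rather than raising on the first error makes it easier to report every problem in a large export at once.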

Author's structure

{
  // Required. Content depends on the format and is considered opaque.
  // Could be an email, a username or a system specific ID.
  "id": "",

  // Optional. The author name, if available.
  "name": "",

  // Optional. The author email, if available.
  // This field should be set even if email is used as the ID, since the ID is
  // considered opaque.
  "email": "",

  // Optional. Meaning depends on the format.
  // For example in a commit it would be "author" or "committer".
  "role": ""
}
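A small helper can make the "email even when it is the ID" rule harder to forget. The following is a minimal sketch; make_author is a hypothetical name, and leaving out unset optional keys (rather than writing empty strings) is an assumption based on those fields being marked optional.

```python
def make_author(author_id, name=None, email=None, role=None):
    """Build an author object, including only the optional fields that are set."""
    author = {"id": author_id}
    optional = {"name": name, "email": email, "role": role}
    author.update({key: value for key, value in optional.items() if value is not None})
    return author


# The ID is opaque, so the email is repeated in its own field
# even though it is also used as the ID here.
committer = make_author(
    "jane@example.com",
    name="Jane Doe",
    email="jane@example.com",
    role="committer",
)
```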

Document's structure

{
  // Required. A string to uniquely identify the document inside the docset.
  // Content depends on the type and is considered opaque but will be displayed in the output.
  "id": "",

  // Optional. If defined, it replaces (does not extend) the global docset authors.
  "authors": [$author],

  // Required. The content of the document in UTF-8.
  "content": ""
}
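The replace-not-extend rule for authors can be sketched as a small lookup. This is an illustration of the rule as described here, not ggshield's implementation: effective_authors is a hypothetical name, and treating the mere presence of the "authors" key (even with an empty list) as "defined" is an assumption.

```python
def effective_authors(docset, document):
    """Return the authors that apply to a document.

    Document-level authors replace the docset-level ones entirely;
    the two lists are never merged.
    """
    if "authors" in document:
        return document["authors"]
    return docset.get("authors", [])


# Hypothetical data for illustration.
docset = {
    "type": "example-chat",
    "id": "channel-42",
    "authors": [{"id": "alice"}],
    "documents": [
        {"id": "msg-1", "content": "inherits the docset authors"},
        {"id": "msg-2", "authors": [{"id": "bob"}], "content": "has its own author"},
    ],
}

resolved = [effective_authors(docset, doc) for doc in docset["documents"]]
```

In this example the first document falls back to the docset authors, while the second is attributed to "bob" only.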