
Generic high entropy secret

Description

General

The generic high entropy detector aims to catch any high entropy string assigned to a sensitive variable. This definition is very broad, so, to avoid raising many false alerts, GitGuardian applies a range of validation steps and specifications that narrow down the strings the detector examines.

Specifications

About assignments

An assignment is any statement of the form {assigned_variable} {assignment_token} {value}, for instance: my_variable = "HelloWorld".

For this detector, the {assigned_variable} must contain one of the following words to be considered sensitive, and therefore valid:

  • secret
  • token
  • api[_.-]?key
  • credential
  • auth

Example: secret_id is a valid assigned_variable.
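To make the keyword rule concrete, here is a minimal Python sketch that tests a variable name against the keywords above. It is an illustration, not GitGuardian's implementation; in particular, case-insensitive matching is an assumption, and is_sensitive_variable is a hypothetical helper name.

```python
import re

# Keywords from the list above. api[_.-]?key matches "apikey", "api_key",
# "api.key" and "api-key". Assumption: matching is case-insensitive.
SENSITIVE_VARIABLE = re.compile(r"secret|token|api[_.-]?key|credential|auth", re.IGNORECASE)


def is_sensitive_variable(name: str) -> bool:
    """Return True if the variable name contains a sensitive keyword."""
    return SENSITIVE_VARIABLE.search(name) is not None


for name in ["secret_id", "service_credential", "API_KEY", "object_id", "author"]:
    print(name, is_sensitive_variable(name))
```

A plain substring search also flags names such as author, which is one reason the post-validation banlists further down contain entries like author and authorid.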

The {assignment_token} can be one of the following:

  • :
  • =
  • :=
  • =>
  • , (comma)
  • >
  • ( (opening parenthesis)
  • <-

Examples: valid assignments include secret_id := {value} and service_credential <- {value}.
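The shape of an assignment can be sketched with a single regular expression built from the tokens above. The pattern and the parse_assignment helper are hypothetical simplifications, not the detector's actual parser.

```python
import re

# Assignment tokens from the list above, longest first so that ":=" is
# tried before ":" and "=>" before "=" (alternation is tried left to right).
TOKENS = [":=", "=>", "<-", ":", "=", ",", ">", "("]
TOKEN_ALT = "|".join(re.escape(t) for t in TOKENS)

# Hypothetical simplified pattern: {assigned_variable} {assignment_token} {value},
# allowing optional quotes around the variable and the value.
ASSIGNMENT = re.compile(
    r"""["']?(?P<variable>[\w.-]+)["']?\s*"""
    r"(?P<token>" + TOKEN_ALT + r")"
    r"""\s*["']?(?P<value>[^\s"')]+)"""
)


def parse_assignment(line: str):
    """Return a dict with variable, token and value, or None if nothing matches."""
    match = ASSIGNMENT.search(line)
    return match.groupdict() if match else None


print(parse_assignment("secret_id := hj65_klhz/trlupok76"))
print(parse_assignment("set_apikey(buaroeuboesanubo234reacubrch)"))
```

The sketch is deliberately naive: on o.set("auth", "bsaruceobkoraebisroaecbu89") it stops at the first ( and returns auth as the value, whereas the detector catches the second string.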

Finally, the {value} must be a high entropy string, that is to say it must:

  • Match this regular expression: [a-zA-Z0-9_.+/~$-]([a-zA-Z0-9_.+/=~$-]|\\\\(?![ntr\"])){14,1022}[a-zA-Z0-9_.+/=~$-] (which means it is between 16 and 1,024 characters long)
  • Have a Shannon entropy of at least 3
  • Pass the post-validation steps (see PostValidators below)
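The length and character-set rule can be checked with Python's re module. This sketch assumes the pattern shown above carries one extra level of string escaping (\\\\ for a backslash, \" for a double quote) and removes it; the group is also made non-capturing, which does not change what matches. The helper name matches_value_pattern is hypothetical.

```python
import re

# Value pattern from the list above, written as a Python raw string.
# Assumption: one level of string escaping has been removed from the
# pattern as displayed on the page.
VALUE_PATTERN = re.compile(
    r'[a-zA-Z0-9_.+/~$-](?:[a-zA-Z0-9_.+/=~$-]|\\(?![ntr"])){14,1022}[a-zA-Z0-9_.+/=~$-]'
)


def matches_value_pattern(value: str) -> bool:
    """True if the entire string matches the value pattern (16 to 1,024 characters)."""
    return VALUE_PATTERN.fullmatch(value) is not None


print(matches_value_pattern("hj65_klhz/trlupok76"))  # True
print(matches_value_pattern("hj65_klhz/trlu"))       # False: only 14 characters
print(matches_value_pattern("=abcdefghijklmnop"))    # False: "=" is not allowed first
```

The fixed character classes at either end set the bounds: at least 1 + 14 + 1 = 16 characters and at most 1 + 1,022 + 1 = 1,024. The = sign may appear anywhere except in first position, and a backslash may appear neither first nor last.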

Example: Overall, secret_id := hj65_klhz/trlupok76 is a valid assignment for this detector and will be caught.
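The entropy threshold can be illustrated on the example above. This sketch assumes the entropy is computed over the characters of the value with a base-2 logarithm (bits per character); the page itself only states the threshold of 3.

```python
import math
from collections import Counter


def shannon_entropy(value: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not value:
        return 0.0
    n = len(value)
    return -sum((c / n) * math.log2(c / n) for c in Counter(value).values())


print(round(shannon_entropy("hj65_klhz/trlupok76"), 2))  # 3.83, passes the threshold of 3
print(shannon_entropy("xob1xob1xob1xob1xob1xob1xob1"))   # 2.0, too low
```

The second string is the low-entropy example from the Examples section: four distinct characters, each making up a quarter of the string, give exactly log2(4) = 2 bits.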

About backslashes

The backslash \ is part of the secret's character set. Some extra rules were added to avoid raising a large number of false alerts:

  • The backslash cannot be the first or the last character of the secret.
  • It cannot be followed by n, t or r, since \n, \t and \r are the escape sequences for a line feed, a tab and a carriage return.
  • It cannot be followed by a double quote ", since \" is an escaped quote.
  • It cannot be used to write a Unicode or ASCII hexadecimal representation of a character (\u followed by four hexadecimal digits, or \x followed by two), which is why custom patterns were added to the value banlist. This may seem a bit harsh, but it is the best trade-off between recall and precision available.
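The four backslash rules can be summarized as a single check. This is a simplified sketch with a hypothetical helper name: it rejects the whole value when a rule is broken, whereas the value regular expression simply refuses to consume a backslash followed by n, t, r or ". The two hex-escape patterns are taken from the ValueBanlistPostValidator configuration on this page; case-insensitive matching is an assumption.

```python
import re

# Hex-escape patterns from the ValueBanlistPostValidator configuration.
# Assumption: banlists are matched case-insensitively.
HEX_ESCAPE_BANLIST = [
    re.compile(r"\\u[a-f0-9]{4}", re.IGNORECASE),  # \u followed by 4 hex digits
    re.compile(r"\\x[a-f0-9]{2}", re.IGNORECASE),  # \x followed by 2 hex digits
]


def violates_backslash_rules(value: str) -> bool:
    """Return True if the value breaks one of the backslash rules listed above."""
    # Rule 1: no backslash as the first or last character.
    if value.startswith("\\") or value.endswith("\\"):
        return True
    # Rules 2 and 3: no backslash followed by n, t, r or a double quote.
    if re.search(r'\\[ntr"]', value):
        return True
    # Rule 4: no \uXXXX or \xXX hexadecimal escapes.
    return any(pattern.search(value) for pattern in HEX_ESCAPE_BANLIST)


print(violates_backslash_rules(r"d1Hb1f\b497XGT75989e"))  # False: \b is allowed
print(violates_backslash_rules(r"abcd\u00e9fghijklmn"))    # True: Unicode escape
print(violates_backslash_rules(r"abcdefgh\nijklmnop"))     # True: \n
```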

For more examples, see the Examples section below.

Revoke the secret

This detector catches generic secrets, so GitGuardian cannot infer which service is concerned. To revoke the secret properly:

  1. Understand what service is impacted.
  2. Refer to the corresponding documentation to know how to revoke and rotate the secret.

Examples

Examples that WILL be caught

- text: |
    api_key = hj65_klhz/trlupok76
  apikey: hj65_klhz/trlupok76

- text: |
    secret_access = hj65_klhz/trlupok76
  apikey: hj65_klhz/trlupok76

- text: |
    o.set("auth", "bsaruceobkoraebisroaecbu89")
  apikey: bsaruceobkoraebisroaecbu89

- text: |
    token := buaroeuboesanubo234reacubrch
  apikey: buaroeuboesanubo234reacubrch

- text: |
    something_token := buaroeuboesanubo234reacubrch
  apikey: buaroeuboesanubo234reacubrch

- text: |
    set_apikey(buaroeuboesanubo234reacubrch)
  apikey: buaroeuboesanubo234reacubrch

- text: |
    secret: d1Hb1f\b497XGT75989e
  apikey: d1Hb1f\b497XGT75989e

Examples that WILL NOT be caught

  • The high entropy string is too short (14 characters, below the 16-character minimum):
- text: |
    api_key = hj65_klhz/trlu
  • The entropy of the string is not high enough:
- text: |
    secret = xob1xob1xob1xob1xob1xob1xob1
  • The assigned variable is not considered sensitive:
- text: |
    object_id = hj65_klhz/trlupok76
  • The high entropy string is not part of an assignment:
- text: |
    my high entropy api_key
    hj65_klhz/trlupok76
  • The value starts with a banlisted prefix (aes., see the ValueBanlistPostValidator below):
- text: |
    secret = aes.hj65_klhz/trlupok76
  • The backslash character cannot be part of a Unicode character hexadecimal representation:
- text: token=\u4356\u6543
  apikey: \u4356\u6543

Details for Generic high entropy secret

  • High Recall: False

  • Validity Check: False

  • Minimum Number of Matches: 1

  • Occurrences found for one million commits: 7153

  • Prefixed: False

  • PreValidators:
    Here is a list of the validation steps the document must pass before being analyzed.

- type: FilenameBanlistPreValidator
  banlist_extensions: []
  banlist_filenames:
    - hash
    - list/k.txt$
    - list/plex.txt$
    - \.csproj$
    - tg/mtproto\.json
  check_binaries: false
- type: ContentWhitelistPreValidator
  patterns:
    - (secret|token|api[_.-]?key|credential|auth)
  • PostValidators:
    Here is a list of the validation steps the matched string must pass after being caught.
- type: MinimumDigitsPostValidator
  digits: 2
- type: EntropyPostValidator
  entropy: 3
- type: ValueBanlistPostValidator
  patterns:
    - ^id[_.-]
    - ^mid[_.-]
    - ^mnp[_.-]
    - ^auth[_.-]
    - ^trnsl[_.-]
    - ^oqs_kem[_.-]
    - ^pos[_.-]
    - ^new[_.-]
    - ^aes[_.-]
    - ^wpa[_.-]
    - ^ec[_.-]
    - ^sec[_.-]
    - ^zte[_.-]
    - ^com\.
    - parentkey
    - auto
    - enrich
    - frontend
    - options
    - layout
    - group
    - field
    - gatsby
    - transform
    - random
    - ^tls[_.-]
    - '12345'
    - '4321'
    - abcd
    - _size$
    - ^pub
    - test
    - country
    - '[_.-]length$'
    - template
    - \.get
    - get[_.-]
    - preview
    - alpha
    - beta
    - fake
    - ^-
    - keyring
    - web[_.-]?app
    - ^ds[_.-[token[_.-]
    - ^pk[_.-]
    - ^aizasy
    - example
    - ^0x[0-9a-fA-F]+$
    - "dev[/\\_-]"
    - "[/\\_-]dev"
    - "([^a-z0-9]|^)v?\\d\\.\\d{1,3}\\.\\d{1,3}[_.-]"
    - "^[0-9]{1,2}\\.[0-9]{1,2}\\.[0-9]{1,2}[=+]"
    - ^/tmp/
    - ^\$2[abxy]\$ # bcrypt hash
    - \\u[a-f0-9]{4}
    - \\x[a-f0-9]{2}
- type: ContextWindowBanlistPostValidator
  window_width: 30
  window_type: left
  patterns:
    - token_?address
    - publishable_?key
    - author
    - sha
    - propert(y|ies)
    - foreign
    - pubkey
    - secret_key_base
    - authenticity_token
    - "credentials\\(['\"][a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}$"
    - "(?-i:(((?<![A-Z])Id(?![a-z]))|((?<![A-Z])ID(?![A-Z]))|((?<![a-z])id(?![a-z])))[^0-9@&\\n]{0,15}[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}$)"
- type: ContextWindowBanlistPostValidator
  window_width: 30
  patterns:
    - public[_.-]?key
    - key[_.-]?user
    - key[_.-]?id
    - token[_.-]?id
    - credential[_.-]?id
    - document_?key
    - client[_.-]?id # alone, this is not a secret
    - secret[_.-]?id # alone, this is not a secret
    - licensekey
    - \.jpe?g
    - \.png
    - theme
    - playlist
    - hash
    - sha
    - localhost
    - 127\.0\.0.\.1
    - test
    - xsrf
    - csrf
- type: AssignmentBanlistPostValidator
  patterns:
    - 'id_token'
    - '(credentials|session|secrets)id'
    - 'encrypted'
    - 'postman[_-]token'
    - '^credentialsjson$'
    - 'tokenizer'
    - '^next[_-]?page[_-]?token'
    - '^previous[_-]?page[_-]?token'
    - '^ahoy_visit(or)?_token$'
    - 'uuid'
    - 'authorid'
    - 'algolia_search_(only_)?api_key'
- type: HeuristicPostValidator
  filters:
    - url
    - date
    - file_name
    - number
    - heuristic_path
- type: DictFilterPostValidator
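The pre- and post-validation chains above can be approximated in a few lines of Python. This sketch is hypothetical: the function names passes_pre_validators and passes_post_validators are made up for illustration, only a few patterns from each list are copied, and details such as how the 30-character left window is measured are assumptions. Patterns are compiled case-insensitively, an assumption suggested by the (?-i:...) group in the first context-window banlist, which would only be needed if the default were case-insensitive. The HeuristicPostValidator and DictFilterPostValidator are not modeled, since their behavior is not described on this page.

```python
import math
import re
from collections import Counter

# Subset of the patterns configured above (assumption: case-insensitive).
FLAGS = re.IGNORECASE
FILENAME_BANLIST = [re.compile(p, FLAGS) for p in [r"hash", r"list/k.txt$", r"\.csproj$"]]
CONTENT_WHITELIST = re.compile(r"(secret|token|api[_.-]?key|credential|auth)", FLAGS)
VALUE_BANLIST = [re.compile(p, FLAGS) for p in [r"^aes[_.-]", r"^id[_.-]", r"test", r"example"]]
LEFT_CONTEXT_BANLIST = [re.compile(p, FLAGS) for p in [r"token_?address", r"publishable_?key", r"author"]]
ASSIGNMENT_BANLIST = [re.compile(p, FLAGS) for p in [r"id_token", r"uuid", r"tokenizer"]]


def passes_pre_validators(filename: str, document: str) -> bool:
    """FilenameBanlistPreValidator, then ContentWhitelistPreValidator."""
    if any(p.search(filename) for p in FILENAME_BANLIST):
        return False
    return CONTENT_WHITELIST.search(document) is not None


def entropy(value: str) -> float:
    """Shannon entropy in bits per character (base 2 assumed)."""
    n = len(value)
    return -sum(c / n * math.log2(c / n) for c in Counter(value).values())


def passes_post_validators(document: str, start: int, variable: str, value: str):
    """Return (True, None) if the match survives, else (False, reason).

    start is the offset of value inside document.
    """
    if sum(ch.isdigit() for ch in value) < 2:                # MinimumDigitsPostValidator
        return False, "fewer than 2 digits"
    if entropy(value) < 3:                                   # EntropyPostValidator
        return False, "entropy below 3"
    if any(p.search(value) for p in VALUE_BANLIST):          # ValueBanlistPostValidator
        return False, "value banlisted"
    left = document[max(0, start - 30):start]                # left window, width 30
    if any(p.search(left) for p in LEFT_CONTEXT_BANLIST):    # ContextWindowBanlistPostValidator
        return False, "left context banlisted"
    if any(p.search(variable) for p in ASSIGNMENT_BANLIST):  # AssignmentBanlistPostValidator
        return False, "assigned variable banlisted"
    return True, None


doc = "api_key = hj65_klhz/trlupok76"
print(passes_pre_validators("config.py", doc))  # True
print(passes_post_validators(doc, doc.index("hj65"), "api_key", "hj65_klhz/trlupok76"))  # (True, None)
doc2 = "secret = aes.hj65_klhz/trlupok76"
print(passes_post_validators(doc2, doc2.index("aes"), "secret", "aes.hj65_klhz/trlupok76"))  # (False, 'value banlisted')
```

The validators run in the order they are configured, so a cheap check such as the digit count can discard a candidate before the banlists are consulted.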
