Generic high entropy secret

Description

General

The generic high entropy detector aims to catch any high entropy string assigned to a sensitive variable. Because this definition is broad, GitGuardian applies a range of validation steps and specifications to narrow the scope and avoid raising many false alerts.

Specifications

About assignments

We refer to an assignment as any statement of the form {assigned_variable} {assignment_token} {value}, for instance: my_variable = "HelloWorld".

For this detector, the {assigned_variable} must contain one of the following words to be considered sensitive, and therefore valid:

  • secret
  • token
  • api[_.-]?key
  • credential
  • auth

Example: secret_id is a valid {assigned_variable} in our case.

The {assignment_token} can be one of the following:

  • :
  • =
  • :=
  • =>
  • ,
  • >
  • (
  • <-

Example: a valid assignment could thus be secret_id := {value} or service_credential <- {value}.
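The assignment shape described above can be sketched in Python as follows. This is only an illustration, not GitGuardian's implementation: the names SENSITIVE, TOKENS, QUOTE, VALUE and find_assignment are hypothetical, and the handling of optional quotes, whitespace and case-insensitivity is an assumption.

```python
import re

# Sensitive keywords and assignment tokens copied from the lists above.
SENSITIVE = r"(?:secret|token|api[_.-]?key|credential|auth)"
# Longer tokens come first so that ":=" is not read as ":" followed by "=".
TOKENS = r"(?::=|=>|<-|:|=|,|>|\()"
# Assumptions: optional quotes around names and values, and a loose value
# charset (anything up to whitespace, a quote or a closing parenthesis).
QUOTE = "[\"']"
VALUE = "[^\\s\"')]+"

ASSIGNMENT = re.compile(
    rf"(?P<variable>\w*{SENSITIVE}\w*){QUOTE}?\s*"
    rf"(?P<token>{TOKENS})\s*{QUOTE}?(?P<value>{VALUE})",
    re.IGNORECASE,
)


def find_assignment(line):
    """Return (variable, token, value) for the first sensitive assignment, or None."""
    match = ASSIGNMENT.search(line)
    if match is None:
        return None
    return match.group("variable"), match.group("token"), match.group("value")


print(find_assignment("secret_id := hj65_klhz/trlupok76"))
print(find_assignment("object_id = hj65_klhz/trlupok76"))
```

The first call finds an assignment because secret_id contains the word secret; the second returns None because object_id contains none of the sensitive words. The value itself is not validated here.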

Finally, the {value} must be a high entropy string, that is to say it must:

  • Match this regular expression: [a-zA-Z0-9_.+/~$-]([a-zA-Z0-9_.+/=~$-]|\\\\(?![ntr\"])){14,1022}[a-zA-Z0-9_.+/=~$-]
  • Have a Shannon entropy of at least 3
  • Pass our post validation steps (see below)

Example: overall, secret_id := hj65_klhz/trlupok76 is a valid assignment for this detector and will be caught.
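The first two value checks can be sketched as below. Several points are assumptions rather than documented behavior: the doubled backslashes in the regular expression above are read as escaping for a single literal backslash, the whole value must match the pattern, and the entropy is computed in bits per character over character frequencies. The names VALUE_RE, shannon_entropy and is_high_entropy_value are hypothetical.

```python
import math
import re
from collections import Counter

# Value pattern from above, rewritten for Python's re module.
# Assumption: "\\\\" in the documented pattern stands for one literal backslash.
VALUE_RE = re.compile(
    r'[a-zA-Z0-9_.+/~$-]'
    r'(?:[a-zA-Z0-9_.+/=~$-]|\\(?![ntr"]))'
    r'{14,1022}'
    r'[a-zA-Z0-9_.+/=~$-]'
)


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_high_entropy_value(value, threshold=3.0):
    """Check the charset/length pattern and the entropy threshold."""
    return VALUE_RE.fullmatch(value) is not None and shannon_entropy(value) >= threshold


print(is_high_entropy_value("hj65_klhz/trlupok76"))           # long and varied enough
print(is_high_entropy_value("hj65_klhz/trlu"))                # shorter than 16 characters
print(is_high_entropy_value("xob1xob1xob1xob1xob1xob1xob1"))  # only 4 distinct characters
```

Note that the pattern implies a minimum length of 16 characters (one leading, at least 14 middle, one trailing). A string built from four equally frequent characters has an entropy of exactly 2 bits per character, which is below the threshold of 3.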

About backslashes

The backslash \ is part of the secret's charset. We added some extra rules to avoid raising a large number of false alerts:

  • The backslash cannot be the first or the last character of the secret.
  • It cannot be followed by n, t or r, otherwise it would represent a line feed, tab or carriage return.
  • It cannot be followed by a double quote ", otherwise it would be part of an escape sequence.
  • It cannot be used to write the unicode or ASCII hexadecimal representation of a character, which is why we added custom patterns to our banlist. This may seem a bit brutal, but it is the best trade-off between recall and precision that we have found.
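As a rough illustration, the four backslash rules can be checked together as in the sketch below. The function name backslashes_allowed is hypothetical, and case-insensitive matching of the hexadecimal digits is an assumption.

```python
import re

# Characters that would turn a backslash into \n, \t, \r or an escaped quote.
FORBIDDEN_AFTER_BACKSLASH = set('ntr"')
# Unicode (\uXXXX) and hexadecimal (\xXX) character representations.
# Assumption: hexadecimal digits are matched case-insensitively.
ESCAPED_CHAR = re.compile(r"\\u[a-f0-9]{4}|\\x[a-f0-9]{2}", re.IGNORECASE)


def backslashes_allowed(value):
    """Return True if every backslash in value respects the rules above."""
    if value.startswith("\\") or value.endswith("\\"):
        return False
    if ESCAPED_CHAR.search(value):
        return False
    for index, char in enumerate(value[:-1]):
        if char == "\\" and value[index + 1] in FORBIDDEN_AFTER_BACKSLASH:
            return False
    return True


print(backslashes_allowed(r"d1Hb1f\b497XGT75989e"))  # backslash followed by "b"
print(backslashes_allowed(r"\u4356\u6543"))          # unicode representations
```

The first value is accepted because its only backslash is followed by b; the second is rejected both because it starts with a backslash and because it contains unicode representations.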

For more examples, see the sections below.

Revoke the secret

This detector catches generic secrets, so GitGuardian cannot infer which service is concerned. To properly revoke the secret:

  1. Identify which service is impacted.
  2. Refer to that service's documentation to learn how to revoke and rotate the secret.

Examples

Examples that WILL be caught

- text: |
    api_key = hj65_klhz/trlupok76
  apikey: hj65_klhz/trlupok76
- text: |
    secret_access = hj65_klhz/trlupok76
  apikey: hj65_klhz/trlupok76
- text: |
    o.set("auth", "bsaruceobkoraebisroaecbu89")
  apikey: bsaruceobkoraebisroaecbu89
- text: |
    token := buaroeuboesanubo234reacubrch
  apikey: buaroeuboesanubo234reacubrch
- text: |
    something_token := buaroeuboesanubo234reacubrch
  apikey: buaroeuboesanubo234reacubrch
- text: |
    set_apikey(buaroeuboesanubo234reacubrch)
  apikey: buaroeuboesanubo234reacubrch
- text: |
    secret: d1Hb1f\b497XGT75989e
  apikey: d1Hb1f\b497XGT75989e

Examples that WILL NOT be caught

  • The high entropy string is too short:
- text: |
    api_key = hj65_klhz/trlu
  • The entropy of the string is not high enough:
- text: |
    secret = xob1xob1xob1xob1xob1xob1xob1
  • The assigned variable is not considered sensitive:
- text: |
    object_id = hj65_klhz/trlupok76
  • The high entropy string is not part of an assignment:
- text: |
    my high entropy api_key
    hj65_klhz/trlupok76
  • The value matches a pattern of the value banlist (here ^aes[_.-]):
- text: |
    secret = aes.hj65_klhz/trlupok76
  • The backslash is part of the hexadecimal representation of a unicode character:
- text: token=\u4356\u6543
  apikey: \u4356\u6543

Details for Generic high entropy secret

  • High Recall: False

  • Validity Check: False

  • Minimum Number of Matches: 1

  • Occurrences found for one million commits: 7153

  • Prefixed: False

  • PreValidators:
    Here is a list of the validation steps the document must pass before being analyzed.

- type: FilenameBanlistPreValidator
  banlist_extensions: []
  banlist_filenames:
    - hash
    - list/k.txt$
    - list/plex.txt$
    - \.csproj$
    - tg/mtproto\.json
  check_binaries: false
- type: ContentWhitelistPreValidator
  patterns:
    - (secret|token|api[_.-]?key|credential|auth)
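To make the two pre-validation steps concrete, here is a minimal Python sketch. It assumes that each pattern is applied as an unanchored, case-insensitive regular expression search, which the configuration above does not state; the function name document_passes_prevalidation is hypothetical.

```python
import re

# Patterns copied from the pre-validator configuration above.
FILENAME_BANLIST = [
    r"hash",
    r"list/k.txt$",
    r"list/plex.txt$",
    r"\.csproj$",
    r"tg/mtproto\.json",
]
CONTENT_WHITELIST = re.compile(
    r"(secret|token|api[_.-]?key|credential|auth)", re.IGNORECASE
)


def document_passes_prevalidation(filename, content):
    """Return True if the document would go on to be analyzed by the detector."""
    # Step 1: skip documents whose filename matches a banned pattern.
    if any(re.search(p, filename, re.IGNORECASE) for p in FILENAME_BANLIST):
        return False
    # Step 2: keep only documents that mention at least one sensitive word.
    return CONTENT_WHITELIST.search(content) is not None


print(document_passes_prevalidation("config/settings.py", "api_key = x"))
print(document_passes_prevalidation("App.csproj", "api_key = x"))
```

Under these assumptions the first document is kept, while the second is skipped because its filename ends with .csproj.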
  • PostValidators:
    Here is a list of the validation steps the matched string must pass after being caught.
- type: MinimumDigitsPostValidator
  digits: 2
- type: EntropyPostValidator
  entropy: 3
- type: ValueBanlistPostValidator
  patterns:
    - ^id[_.-]
    - ^mid[_.-]
    - ^mnp[_.-]
    - ^auth[_.-]
    - ^trnsl[_.-]
    - ^oqs_kem[_.-]
    - ^pos[_.-]
    - ^new[_.-]
    - ^aes[_.-]
    - ^wpa[_.-]
    - ^ec[_.-]
    - ^sec[_.-]
    - ^zte[_.-]
    - ^com\.
    - parentkey
    - auto
    - enrich
    - frontend
    - options
    - layout
    - group
    - field
    - gatsby
    - transform
    - random
    - ^tls[_.-]
    - "12345"
    - "4321"
    - abcd
    - _size$
    - ^pub
    - test
    - country
    - "[_.-]length$"
    - template
    - \.get
    - get[_.-]
    - preview
    - alpha
    - beta
    - fake
    - ^-
    - keyring
    - web[_.-]?app
    - ^ds[_.-[token[_.-]
    - ^pk[_.-]
    - ^aizasy
    - example
    - ^0x[0-9a-fA-F]+$
    - "dev[/\\_-]"
    - "[/\\_-]dev"
    - "([^a-z0-9]|^)v?\\d\\.\\d{1,3}\\.\\d{1,3}[_.-]"
    - "^[0-9]{1,2}\\.[0-9]{1,2}\\.[0-9]{1,2}[=+]"
    - ^/tmp/
    - ^\$2[abxy]\$ # bcrypt hash
    - \\u[a-f0-9]{4}
    - \\x[a-f0-9]{2}
- type: ContextWindowBanlistPostValidator
  window_width: 30
  window_type: left
  patterns:
    - token_?address
    - publishable_?key
    - author
    - sha
    - propert(y|ies)
    - foreign
    - pubkey
    - secret_key_base
    - authenticity_token
    - "credentials\\(['\"][a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}$"
    - "(?-i:(((?<![A-Z])Id(?![a-z]))|((?<![A-Z])ID(?![A-Z]))|((?<![a-z])id(?![a-z])))[^0-9@&\\n]{0,15}[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}$)"
- type: ContextWindowBanlistPostValidator
  window_width: 30
  patterns:
    - public[_.-]?key
    - key[_.-]?user
    - key[_.-]?id
    - token[_.-]?id
    - credential[_.-]?id
    - document_?key
    - client[_.-]?id # alone, this is not a secret
    - secret[_.-]?id # alone, this is not a secret
    - licensekey
    - \.jpe?g
    - \.png
    - theme
    - playlist
    - hash
    - sha
    - localhost
    - 127\.0\.0.\.1
    - test
    - xsrf
    - csrf
- type: AssignmentBanlistPostValidator
  patterns:
    - "id_token"
    - "(credentials|session|secrets)id"
    - "encrypted"
    - "postman[_-]token"
    - "^credentialsjson$"
    - "tokenizer"
    - "^next[_-]?page[_-]?token"
    - "^previous[_-]?page[_-]?token"
    - "^ahoy_visit(or)?_token$"
    - "uuid"
    - "authorid"
    - "algolia_search_(only_)?api_key"
- type: HeuristicPostValidator
  filters:
    - url
    - date
    - file_name
    - number
    - heuristic_path
- type: DictFilterPostValidator
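The sketch below illustrates how three of these post-validation steps (minimum digits, value banlist and left context window) might combine. It is not GitGuardian's implementation: only a small subset of the patterns is included, the patterns are assumed to be case-insensitive searches, the left window is assumed to be the 30 characters immediately preceding the matched value, and all function names are hypothetical.

```python
import re

# Small subsets of the banlists above, for illustration only.
VALUE_BANLIST = [r"^aes[_.-]", r"test", r"example", r"^0x[0-9a-fA-F]+$"]
LEFT_CONTEXT_BANLIST = [r"publishable_?key", r"author", r"sha"]
WINDOW_WIDTH = 30  # window_width from the configuration


def has_minimum_digits(value, digits=2):
    """MinimumDigitsPostValidator: the value must contain at least `digits` digits."""
    return sum(ch.isdigit() for ch in value) >= digits


def value_is_banned(value):
    """ValueBanlistPostValidator: reject values matching any banned pattern."""
    return any(re.search(p, value, re.IGNORECASE) for p in VALUE_BANLIST)


def left_context_is_banned(document, match_start):
    """ContextWindowBanlistPostValidator (left): inspect the text before the value."""
    window = document[max(0, match_start - WINDOW_WIDTH):match_start]
    return any(re.search(p, window, re.IGNORECASE) for p in LEFT_CONTEXT_BANLIST)


def passes_postvalidation(document, value):
    """Apply the three sketched checks to a value found in a document."""
    start = document.find(value)
    if start == -1:
        return False
    return (
        has_minimum_digits(value)
        and not value_is_banned(value)
        and not left_context_is_banned(document, start)
    )


print(passes_postvalidation("secret = hj65_klhz/trlupok76", "hj65_klhz/trlupok76"))
print(passes_postvalidation("secret = aes.hj65_klhz/trlupok76", "aes.hj65_klhz/trlupok76"))
```

With these assumptions the first value passes, while the second is rejected because it starts with aes. followed by a separator.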