Generic high entropy secret
Description
General
The generic high entropy detector aims to catch any high entropy string assigned to a sensitive variable. This definition is broad, so to avoid raising many false alerts, GitGuardian applies a range of validation steps and specifications to narrow the scope.
Specifications
About assignments
An assignment is any statement of the form {assigned_variable} {assignment_token} {value}, for instance: my_variable = "HelloWorld".
For this detector, the {assigned_variable} must contain one of the following words to be considered sensitive and therefore valid:
secret
token
api[_.-]?key
credential
auth
Example: secret_id is a valid assigned_variable.
The {assignment_token} can be one of the following: ":", "=", ":=", "=>", ",", ">", "(", "<-".
Example: a valid assignment could thus be secret_id := {value} or service_credential <- {value}.
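The assignment rules above can be sketched as follows. This is a minimal illustration, not GitGuardian's implementation: the helper names split_assignment and is_sensitive_assignment are hypothetical, case-insensitive keyword matching is an assumption, and the sketch only handles the plain one-line form {assigned_variable} {assignment_token} {value}, not call-style assignments.

```python
import re

# Keywords from the specification; case-insensitive matching is an assumption.
SENSITIVE_NAME = re.compile(r"secret|token|api[_.-]?key|credential|auth", re.IGNORECASE)

# Assignment tokens from the specification, multi-character tokens first so
# that ":=" is preferred over ":" when both start at the same position.
ASSIGNMENT_TOKENS = [":=", "=>", "<-", ":", "=", ",", ">", "("]


def split_assignment(line):
    """Split a line at its first assignment token (hypothetical helper).

    Returns (assigned_variable, assignment_token, value), or None when the
    line contains no assignment token.
    """
    best = None
    for token in ASSIGNMENT_TOKENS:
        index = line.find(token)
        # Keep the earliest token; on a tie the earlier list entry wins.
        if index != -1 and (best is None or index < best[0]):
            best = (index, token)
    if best is None:
        return None
    index, token = best
    variable = line[:index].strip()
    value = line[index + len(token):].strip()
    return variable, token, value


def is_sensitive_assignment(line):
    """True when the line is an assignment whose variable name is sensitive."""
    parts = split_assignment(line)
    return parts is not None and SENSITIVE_NAME.search(parts[0]) is not None
```

With this sketch, secret_id := hj65_klhz/trlupok76 is split into the variable secret_id, the token := and the value, while object_id = hj65_klhz/trlupok76 is rejected because the variable name contains none of the sensitive words.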
Finally, the {value} must be a high entropy string, that is to say it must:
- Follow this regular expression:
[a-zA-Z0-9_.+/~$-]([a-zA-Z0-9_.+/=~$-]|\\\\(?![ntr\"])){14,1022}[a-zA-Z0-9_.+/=~$-]
- Have a Shannon entropy of at least 3
- Pass the post validation steps (see below)
Example: Overall, secret_id := hj65_klhz/trlupok76 is a valid assignment for this detector and will be caught.
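The first two conditions on {value} can be checked with a short sketch. It assumes that the Shannon entropy is computed in bits per character over the character frequencies of the value (the text above does not state the base), and it writes the regular expression as a Python raw string rather than with the escaping shown above. The function names are hypothetical, and the post validation steps are not covered here.

```python
import math
import re
from collections import Counter

# Regular expression from the specification, as a Python raw string:
# one leading character, 14 to 1022 inner characters, one trailing character.
VALUE_RE = re.compile(
    r'[a-zA-Z0-9_.+/~$-]([a-zA-Z0-9_.+/=~$-]|\\(?![ntr"])){14,1022}[a-zA-Z0-9_.+/=~$-]'
)

MIN_ENTROPY = 3  # threshold from the specification


def shannon_entropy(value):
    """Shannon entropy in bits per character (base 2 is an assumption)."""
    if not value:
        return 0.0
    total = len(value)
    counts = Counter(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_high_entropy_value(value):
    """True when the whole value matches the pattern and is random enough."""
    return VALUE_RE.fullmatch(value) is not None and shannon_entropy(value) >= MIN_ENTROPY
```

Because the pattern requires one leading character, at least 14 inner characters and one trailing character, a value needs at least 16 characters to match. This is why hj65_klhz/trlu (14 characters) is too short, and why xob1xob1xob1xob1xob1xob1xob1, which only uses four distinct characters in equal proportion, falls below the entropy threshold under this assumption.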
About backslashes
The backslash \ is part of the secret's charset. Some extra rules were added to avoid raising a large number of false alerts.
- The backslash cannot be the first or the last character of the secret.
- It cannot be followed by an n, a t or an r, otherwise it would result in a line return, tab or carriage return.
- The backslash cannot be followed by a quote ", otherwise it would be part of an escape sequence.
- It cannot be used to write a unicode or ascii hexadecimal representation of a character, which is why a custom pattern was added to the banlist. This may seem a bit brutal, but it is the best trade-off between recall and precision at hand.
For more examples, read sections below.
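The backslash rules can be restated as explicit checks. This is a sketch under assumptions: the function name backslashes_allowed is hypothetical, the two hexadecimal patterns are the ones listed in the value banlist of this page, and matching them case-insensitively is an assumption (suggested by the (?-i:...) group used in one of the context patterns).

```python
import re

# Banlist patterns for unicode and ascii hexadecimal escapes.
# Case-insensitive matching is an assumption.
HEX_ESCAPES = [
    re.compile(r"\\u[a-f0-9]{4}", re.IGNORECASE),
    re.compile(r"\\x[a-f0-9]{2}", re.IGNORECASE),
]


def backslashes_allowed(value):
    """Apply the backslash rules to a candidate value (hypothetical helper)."""
    # Rule 1: a backslash cannot be the first or the last character.
    if value.startswith("\\") or value.endswith("\\"):
        return False
    # Rules 2 and 3: a backslash cannot be followed by n, t, r or a double quote.
    if re.search(r'\\[ntr"]', value):
        return False
    # Rule 4: no unicode or ascii hexadecimal escape.
    return not any(pattern.search(value) for pattern in HEX_ESCAPES)
```

For instance, d1Hb1f\b497XGT75989e passes these checks because its backslash is followed by a b, whereas \u4356\u6543 is rejected.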
Revoke the secret
This detector catches generic secrets, so GitGuardian cannot infer which service is concerned. To properly revoke the secret:
- Understand what service is impacted.
- Refer to the corresponding documentation to know how to revoke and rotate the secret.
Examples
Examples that WILL be caught
- text: |
api_key = hj65_klhz/trlupok76
apikey: hj65_klhz/trlupok76
- text: |
secret_access = hj65_klhz/trlupok76
apikey: hj65_klhz/trlupok76
- text: |
o.set("auth", "bsaruceobkoraebisroaecbu89")
apikey: bsaruceobkoraebisroaecbu89
- text: |
token := buaroeuboesanubo234reacubrch
apikey: buaroeuboesanubo234reacubrch
- text: |
something_token := buaroeuboesanubo234reacubrch
apikey: buaroeuboesanubo234reacubrch
- text: |
set_apikey(buaroeuboesanubo234reacubrch)
apikey: buaroeuboesanubo234reacubrch
- text: |
secret: d1Hb1f\b497XGT75989e
apikey: d1Hb1f\b497XGT75989e
Examples that WILL NOT be caught
- The high entropy string is too short:
- text: |
api_key = hj65_klhz/trlu
- The entropy of the string is not high enough
- text: |
secret = xob1xob1xob1xob1xob1xob1xob1
- The assigned variable is not considered sensitive
- text: |
object_id = hj65_klhz/trlupok76
- The high entropy string is not part of an assignment
- text: |
my high entropy api_key
hj65_klhz/trlupok76
- The high entropy string contains an excluded pattern (see the banlist below)
- text: |
secret = aes.hj65_klhz/trlupok76
- The backslash character cannot be part of a unicode character hexadecimal representation:
- text: token=\u4356\u6543
apikey: \u4356\u6543
Details for Generic high entropy secret
High Recall: False
Validity Check: False
Minimum Number of Matches: 1
Occurrences found for one million commits: 7153
Prefixed: False
PreValidators:
Here is a list of the validation steps the document must pass before being analyzed.
- type: FilenameBanlistPreValidator
banlist_extensions: []
banlist_filenames:
- hash
- list/k.txt$
- list/plex.txt$
- \.csproj$
- tg/mtproto\.json
check_binaries: false
- type: ContentWhitelistPreValidator
patterns:
- (secret|token|api[_.-]?key|credential|auth)
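The two pre-validation steps above can be read as a filter applied to each document before any matching takes place. The sketch below is one possible reading, not the actual implementation: it assumes that every pattern is a regular expression searched anywhere in the input, that matching is case-insensitive, that a banlisted filename causes the document to be skipped, and that the content must match the whitelist pattern. The function name is hypothetical.

```python
import re

# Patterns copied from the PreValidators configuration above.
FILENAME_BANLIST = [
    r"hash",
    r"list/k.txt$",
    r"list/plex.txt$",
    r"\.csproj$",
    r"tg/mtproto\.json",
]
CONTENT_WHITELIST = [r"(secret|token|api[_.-]?key|credential|auth)"]


def passes_prevalidators(filename, content):
    """True when the document should be analyzed (hypothetical helper)."""
    # Skip the document when its name matches any banlisted pattern.
    if any(re.search(p, filename, re.IGNORECASE) for p in FILENAME_BANLIST):
        return False
    # Only analyze documents that mention at least one sensitive keyword.
    return any(re.search(p, content, re.IGNORECASE) for p in CONTENT_WHITELIST)
```

Under these assumptions, a file named App.csproj is skipped even if it contains an api_key assignment, and a file with no sensitive keyword is never analyzed.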
PostValidators:
Here is a list of the validation steps the matched string must pass after being caught.
- type: MinimumDigitsPostValidator
digits: 2
- type: EntropyPostValidator
entropy: 3
- type: ValueBanlistPostValidator
patterns:
- ^id[_.-]
- ^mid[_.-]
- ^mnp[_.-]
- ^auth[_.-]
- ^trnsl[_.-]
- ^oqs_kem[_.-]
- ^pos[_.-]
- ^new[_.-]
- ^aes[_.-]
- ^wpa[_.-]
- ^ec[_.-]
- ^sec[_.-]
- ^zte[_.-]
- ^com\.
- parentkey
- auto
- enrich
- frontend
- options
- layout
- group
- field
- gatsby
- transform
- random
- ^tls[_.-]
- '12345'
- '4321'
- abcd
- _size$
- ^pub
- test
- country
- '[_.-]length$'
- template
- \.get
- get[_.-]
- preview
- alpha
- beta
- fake
- ^-
- keyring
- web[_.-]?app
- ^ds[_.-[token[_.-]
- ^pk[_.-]
- ^aizasy
- example
- ^0x[0-9a-fA-F]+$
- "dev[/\\_-]"
- "[/\\_-]dev"
- "([^a-z0-9]|^)v?\\d\\.\\d{1,3}\\.\\d{1,3}[_.-]"
- "^[0-9]{1,2}\\.[0-9]{1,2}\\.[0-9]{1,2}[=+]"
- ^/tmp/
- ^\$2[abxy]\$ # bcrypt hash
- \\u[a-f0-9]{4}
- \\x[a-f0-9]{2}
- type: ContextWindowBanlistPostValidator
window_width: 30
window_type: left
patterns:
- token_?address
- publishable_?key
- author
- sha
- propert(y|ies)
- foreign
- pubkey
- secret_key_base
- authenticity_token
- "credentials\\(['\"][a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}$"
- "(?-i:(((?<![A-Z])Id(?![a-z]))|((?<![A-Z])ID(?![A-Z]))|((?<![a-z])id(?![a-z])))[^0-9@&\\n]{0,15}[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}$)"
- type: ContextWindowBanlistPostValidator
window_width: 30
patterns:
- public[_.-]?key
- key[_.-]?user
- key[_.-]?id
- token[_.-]?id
- credential[_.-]?id
- document_?key
- client[_.-]?id # alone, this is not a secret
- secret[_.-]?id # alone, this is not a secret
- licensekey
- \.jpe?g
- \.png
- theme
- playlist
- hash
- sha
- localhost
- 127\.0\.0.\.1
- test
- xsrf
- csrf
- type: AssignmentBanlistPostValidator
patterns:
- 'id_token'
- '(credentials|session|secrets)id'
- 'encrypted'
- 'postman[_-]token'
- '^credentialsjson$'
- 'tokenizer'
- '^next[_-]?page[_-]?token'
- '^previous[_-]?page[_-]?token'
- '^ahoy_visit(or)?_token$'
- 'uuid'
- 'authorid'
- 'algolia_search_(only_)?api_key'
- type: HeuristicPostValidator
filters:
- url
- date
- file_name
- number
- heuristic_path
- type: DictFilterPostValidator
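To illustrate how some of the post-validation steps could combine, here is a sketch covering MinimumDigitsPostValidator, a small sample of ValueBanlistPostValidator and a small sample of the left ContextWindowBanlistPostValidator. It rests on several assumptions: digits: 2 is read as "at least two digit characters in the value", banlist patterns are searched case-insensitively, and the left window is the 30 characters immediately before the matched value. All function names are hypothetical, and the remaining validators (entropy, assignment banlist, heuristics, dictionary filter) are not covered.

```python
import re

MIN_DIGITS = 2      # from MinimumDigitsPostValidator
WINDOW_WIDTH = 30   # from ContextWindowBanlistPostValidator

# Small samples of the configured patterns, not the full lists.
VALUE_BANLIST_SAMPLE = [r"^aes[_.-]", r"example", r"^0x[0-9a-fA-F]+$", r"^/tmp/"]
LEFT_CONTEXT_SAMPLE = [r"publishable_?key", r"author", r"sha", r"pubkey"]


def has_minimum_digits(value):
    """Assumed meaning of digits: 2, at least two digit characters."""
    return len(re.findall(r"[0-9]", value)) >= MIN_DIGITS


def passes_value_banlist(value):
    """Reject values that match any banlisted pattern."""
    return not any(re.search(p, value, re.IGNORECASE) for p in VALUE_BANLIST_SAMPLE)


def passes_left_context(text, match_start):
    """Reject matches preceded, within the window, by a banlisted pattern."""
    window = text[max(0, match_start - WINDOW_WIDTH):match_start]
    return not any(re.search(p, window, re.IGNORECASE) for p in LEFT_CONTEXT_SAMPLE)


def passes_postvalidators(text, value):
    """Run the three sketched checks on a value found in text."""
    start = text.find(value)
    return (
        start != -1
        and has_minimum_digits(value)
        and passes_value_banlist(value)
        and passes_left_context(text, start)
    )
```

This matches the examples earlier on this page: aes.hj65_klhz/trlupok76 is rejected by the ^aes[_.-] banlist entry, while hj65_klhz/trlupok76 assigned to api_key passes these three checks.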