AI Management Policy

This document describes GitGuardian's AI systems and the governance, security measures, and ethical standards we implement to ensure their safe and responsible use.

Overview

  • Secret values are redacted: We use GitGuardian's cutting-edge detection engine to identify and redact secrets before any data is sent to external AI models.
  • Minimal context exposure: When context is required for certain AI features, only an excerpt around the detection location is sent, with secrets redacted by GitGuardian's detection engine.
  • Secure, regional data transmission: All data is transmitted over encrypted connections and remains within your region (EU or US); external AI providers process requests in the same geographic region as your GitGuardian instance.
  • Limited metadata: When metadata is required, we send incident characteristics such as detector type, severity, validity status, and other non-sensitive attributes, never secret values.
  • No training, no retention by AI providers: Your data is not used to train models and is not retained by any external AI provider apart from the minimum necessary to answer the request.
  • Declared sub-processors: All AI providers (AWS Bedrock, Google Cloud Vertex AI, OpenAI) are formally declared sub-processors. Full list: gitguardian.com/legal/subprocessors
  • You are in control: You can route LLM calls through your own cloud (BYOC), or disable external LLM calls entirely.
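The redact-before-transmission flow described above can be sketched as follows. This is a minimal illustration, not GitGuardian's actual implementation: the patterns, the `[REDACTED]` placeholder, and the excerpt window size are all hypothetical.

```python
import re

# Hypothetical patterns standing in for GitGuardian's detection engine.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),     # AWS access key ID shape
    re.compile(r"ghp_[A-Za-z0-9]{36}"),  # GitHub personal access token shape
]

def redact_secrets(text: str) -> str:
    """Replace every detected secret value with a fixed placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def build_llm_context(document: str, detection_start: int, window: int = 120) -> str:
    """Send only a redacted excerpt around the detection, never the full file."""
    start = max(0, detection_start - window)
    excerpt = document[start:detection_start + window]
    return redact_secrets(excerpt)
```

The key property is ordering: redaction runs on the excerpt before anything leaves the boundary, so no secret value is ever part of an outbound request.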

GitGuardian's AI core principles

AI ethics principles

GitGuardian complies with all applicable AI laws and regulations and adheres to the following ethical principles:

  • Data protection and privacy: AI systems respect user privacy and comply with data protection regulations.
  • Accountability and responsibility: Clear ownership over AI outcomes with human-in-the-loop approvals and traceability.
  • Trustworthiness and integrity: Reliable systems with continuous improvement.

GitGuardian also respects all standard GitGuardian Security Policies, available on our Trust Center.

Protection of your data

Secret security is a core mission of GitGuardian. Every model and AI feature is designed, trained, and evaluated against internal benchmarks before rollout, then subject to manual validation.

GitGuardian uses a mix of proprietary, self-hosted models and, in some cases, validated third-party large language models.

Our systems are designed to resist prompt injection. Most AI features implement one or more of the following safeguards:

  • Input/output separation: Input producers and output receivers are separated. An attacker cannot confirm whether their malicious input produced an effect, nor influence other users' responses.
  • Constrained and validated inputs and outputs: Models respond only within a predefined scope (e.g., Yes/No classification), preventing unintended actions or uncontrolled content.
  • Free-form context limited to the requester's allowed scope: RBAC and least-privilege by design guarantee strict tenant isolation.
  • Guardrails and monitoring: Defenses block injection attempts before they reach the model and catch unsafe behaviors after generation.

These safeguards are complemented by sensitive data redaction, safety filters (moderation on inputs/outputs), and input/output validators (schema and policy checks).
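The constrained-output safeguard above can be sketched for a hypothetical Yes/No classification feature. The names and the allowed answer set are illustrative assumptions, not GitGuardian's actual validators; the point is that any model response outside the predefined scope is rejected rather than passed through.

```python
# Hypothetical output validator for a Yes/No classification feature.
ALLOWED_LABELS = {"yes", "no"}

def validate_classification(raw_output: str) -> str:
    """Accept only answers within the predefined scope; reject everything else."""
    label = raw_output.strip().lower()
    if label not in ALLOWED_LABELS:
        raise ValueError(f"Model output outside allowed scope: {raw_output!r}")
    return label
```

Because the receiver only ever sees a value from the allowed set, an injected instruction cannot smuggle arbitrary content into downstream processing.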

Compliance with the EU AI Act

GitGuardian does not develop general-purpose AI, and all AI implementations are purpose-built to improve Secrets and Non-Human Identity detection, classification, prioritization, and remediation — always with strict safeguards to protect customer data.

These implementations qualify as minimal-risk AI systems under the EU AI Act's classification. In practice, this means they present very limited or no risk to fundamental rights, privacy, or safety, and are therefore permitted without compliance obligations beyond transparency. GitGuardian provides contractual assurances that its services comply with applicable laws and regulations.

Data usage & protection

To continuously improve secret detection, prioritization, and remediation, GitGuardian may use Customer Data for service improvement only and Performance Data(*) for model and algorithm training.

  • Scope: Data is used exclusively for GitGuardian's internal model improvement. It is never used to train third-party models. Most data types can exist in two forms:
    • Public: already accessible to anyone on the internet (e.g., public GitHub or DockerHub files)
    • Internal: only accessible through access granted by Customer.
  • Data transformation: Before inclusion in training sets, data undergoes redaction of sensitive content (secrets, credentials, personal information).
  • Security practices: Data handling follows all GitGuardian security standards, including access controls and retention policies. Data from one customer cannot be exposed to another customer through model outputs.

(*) "Performance Data" means, where applicable, usage data and information compiled by GitGuardian on Customer's use of the Services, including Service Data, and statistical and performance information related to the provision and operation of the Services. Performance Data includes information concerning Customers' and Users' use of the various features and functionality of the Services, analytics and statistical data derived therefrom, and aggregated and anonymized data derived from Customer Data (where such data is processed in connection with the Services) so that such data does not identify a person.

Input and output usage & ownership

GitGuardian shall ensure that your inputs or outputs are not:

  • Available to other customers
  • Sent outside your data residency region (US or EU) during processing by the AI system
  • Sent to any third-party LLM provider other than our standard LLM Providers
  • Stored by any third-party LLM provider
  • Used to train or improve any third-party models

GitGuardian does not claim ownership of AI-generated output but retains a license right to inputs and outputs to improve the corresponding features. Customers retain full rights to use, modify, and distribute the output at their discretion.

GitGuardian shall indemnify customers against third-party intellectual property claims related to the use of AI-generated output, provided that the Customer: (i) has a valid right to use their inputs; (ii) has an active, paid subscription for the relevant AI feature; (iii) had no prior knowledge of the alleged infringement; and (iv) has not materially altered the output before the claim.

AI models & deployment strategies

GitGuardian uses a multi-tiered approach to balance performance and privacy:

  • Proprietary / Self-Hosted Models: Custom models for secret detection and prioritization, hosted entirely within GitGuardian's infrastructure.
  • Hybrid Models: Combining proprietary models with open-source LLMs hosted in our secure environment.
  • Third-Party LLMs: For specific features (e.g., chat, explanations), we integrate with trusted providers like AWS Bedrock and OpenAI via secure APIs with Zero Data Retention.

GitGuardian's AI system details

AI Processing Purposes: Securing Secrets and NHI and helping with remediation.
Proprietary or Third-Party AI System(s): Hybrid use of proprietary models and third-party LLMs (AWS Bedrock, Gemini, OpenAI).
AI Input:

Depending on the AI Features (please refer to the AI Features Catalog below), AI Input may be:

  • Limited Secret Context (Public or Internal) — a snippet or document portion centered on the detected secret, with limited metadata (e.g., date, filename).
  • Performance Data — non-sensitive incident attributes such as whether the secret is valid, how many times it has leaked, its type/category, and severity classification (does not include the secret value, repository content, or context and feedback from user interaction).
  • User Input — only the text entered by the user in chat or queries.
  • Company description and keywords — high-level description and keywords describing the company (e.g., company name, domain), typically from OSINT sources.
  • Source Data (Public or Internal) — full access to current repository, project, or channel content (e.g., Git repos) and related metadata (descriptions, anonymized users, interactions).
  • GitGuardian Dashboard — information available in the Dashboard to the authenticated user, respecting RBAC and permissions.
AI Output Type: Text and classification labels.
AI Output Disclaimers/Watermarks: AI features are labelled as powered by AI in our documentation and, where meaningful, in the product interface.

GitGuardian's AI features catalog

For a detailed breakdown of each AI-powered capability, see AI-powered features.

Continuous improvement and monitoring

As our AI evolves, so will these principles. We'll keep reviewing and improving them to reflect the latest best practices in responsible AI.