Does my existing DLP cover AI tools like ChatGPT?

Not by default. When a user submits a prompt from their browser, the request travels inside a TLS tunnel directly to the AI provider's servers. Your network DLP sees the destination (chatgpt.com, claude.ai, etc.) but not the content, for the same reason it cannot read personal Gmail over HTTPS: the content is encrypted before it reaches any infrastructure you control. An enterprise TLS-inspection proxy can read prompt content at the network layer, but it sees the data in transit, after it has already left the browser, rather than before it is sent.

What happened with Samsung and ChatGPT?

In 2023, Samsung restricted employee use of ChatGPT after engineers entered internal source code and internal meeting notes into the tool. The data reached OpenAI's servers before Samsung was aware it had happened. Samsung had no mechanism to observe the prompts before they were sent. The response was a company-wide restriction on generative AI tools.

Why doesn't blocking AI domains at the firewall work?

Blocking AI domains at the network level stops access from managed devices on the corporate network, but it doesn't stop employees from using AI tools on personal devices, over mobile hotspots, or from home. In practice, the behaviour doesn't stop; it becomes invisible to security teams. Domain block-lists also can't keep pace with the rate at which new AI tools appear.

What data categories are employees typically pasting into AI tools?

The most common categories are API keys and credentials (AWS keys, GitHub tokens, database connection strings), personal identifiers (customer PII, email addresses, national IDs), payment data (card numbers), and internal network details (private IP ranges, infrastructure hostnames). Employees paste these to get AI help with debugging, customer communication, writing, and analysis tasks.

What is browser-level tokenization and how is it different from network DLP?

Browser-level tokenization intercepts sensitive data at the point it is typed into an AI tool's prompt box, before the request is sent. Sensitive spans are replaced with reversible tokens so the AI tool receives a clean, usable prompt. The real values are rehydrated locally in the response. Network DLP operates downstream of this point, in the TLS tunnel or at the network edge — after the data has already left the browser. Only browser-level detection operates at the correct layer.

Which AI tools does Kavara cover?

Kavara covers ChatGPT, Claude, Gemini, Perplexity, Microsoft Copilot, Poe, HuggingFace, You.com, Mistral, Grok, and Google AI Mode — 11 tools out of the box, with new tools added regularly.

Security teams in the field

Shadow AI in practice.

Employees are using AI tools with company data. Security teams are responding. Most of the responses share the same architectural blind spot. Here is what is being tried, what is failing, and why.

The visibility gap, precisely defined.

When an employee submits a prompt to ChatGPT, Claude, or any browser-based AI tool, the request travels inside a TLS tunnel directly from their browser to the AI provider's servers. Your network infrastructure — DLP, CASB, proxy — sees the destination. It does not see the content.

This is the same reason your network tools cannot read what an employee is writing in personal Gmail. The content is encrypted before it reaches any infrastructure you control. This is not a gap in your DLP product or a missing configuration. It is what TLS does.

The consequence is that an employee can paste API keys, customer PII, source code, or database connection strings into ChatGPT on a personal account, and the only record your security stack produces is a connection to chatgpt.com. The content of what was sent is not in any log you can review.

Technical note for security architects: Enterprise TLS-inspection proxies (with a corporate root CA deployed to managed devices) can intercept prompt content at the network layer. Even so, they intercept the data in transit, after it has already left the user's browser. Browser-level detection operates before the request is sent, which is a meaningfully different guarantee.

What teams are trying

Five responses. One architectural problem.

These are the patterns that appear consistently when security teams tackle Shadow AI. Each addresses a symptom; most leave the root cause untouched.

Fails: Block AI domains at the network layer

The first instinct. Employees move to personal devices, mobile hotspots, and home laptops within days. The behaviour continues; it just becomes invisible. Block-lists are outdated before they're published, because new AI tools appear faster than policies can track them.

Fails: Write an acceptable-use policy

Establishes intent and gives legal cover. But a policy is only enforceable if you can detect violations — and detecting violations requires solving the visibility problem the policy was supposed to address. Most security teams can't.

Partial: Deploy an enterprise AI platform

Azure OpenAI, Amazon Bedrock, Copilot for M365. Genuinely useful for workflows that get formalised. Doesn't address the employee who has the sanctioned tool open in one tab and chatgpt.com open in another with a personal account, which is the norm, not the exception.

Partial: TLS-inspection proxy (MITM)

An enterprise proxy with a corporate root CA can intercept and inspect prompt content at the network layer. Two constraints: it requires every device to trust the corporate cert, and it intercepts the data in transit rather than before it's sent. The data has already left the machine by the time the proxy sees it.

Works: Browser-level detection and tokenization

Intercepts sensitive data at the point the user is composing the prompt, before the request is sent. The AI still receives a usable prompt; the sensitive data never enters the TLS tunnel in the first place. This is the only approach that operates at the correct layer.

Case study

Samsung, 2023.

In 2023, Samsung restricted employee use of ChatGPT after engineers entered internal source code and internal meeting notes into the tool. The data reached OpenAI's servers before Samsung knew it had happened. There was no mechanism to observe or intercept the prompts before they were sent — only after the fact, when employees self-reported.

Samsung's response was a company-wide restriction on generative AI tools. For companies where AI tools are embedded in daily engineering workflows, a blanket restriction carries real productivity costs and is often reversed under pressure. The Samsung case is notable not as an edge case but as the natural outcome of the visibility gap described above: if you cannot see what is going into a prompt, you cannot govern it until after the data has already left.

Source: Bloomberg, May 2023

What is being pasted

The data categories that move through AI tools.

These are the categories Kavara detects on-device, derived from the patterns that appear most frequently in enterprise AI usage. Detection runs in the browser before any data is sent.

Credentials

AWS access key IDs and secret access keys
Google API keys
GitHub tokens (classic and fine-grained)
Slack tokens
Generic API keys and bearer tokens
RSA, SSH, EC, and DSA private keys
JWTs
Database connection strings (PostgreSQL, MySQL, MongoDB)
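
Most of the credential formats above can be recognised by prefix and length. A minimal sketch of a regex-based scanner, assuming the publicly known token shapes; `CREDENTIAL_PATTERNS`, `Finding`, and `scanCredentials` are hypothetical names, and the patterns are approximations rather than Kavara's actual detectors. Generic API keys, bearer tokens, and AWS secret access keys have no distinctive prefix and are left out.

```typescript
// Hypothetical detector table. Patterns approximate publicly known formats
// and will produce some false positives and some misses.
export const CREDENTIAL_PATTERNS: Record<string, RegExp> = {
  awsAccessKeyId: /\b(?:AKIA|ASIA)[0-9A-Z]{16}\b/g,
  googleApiKey: /\bAIza[0-9A-Za-z_-]{35}/g,
  githubClassicToken: /\bgh[pousr]_[A-Za-z0-9]{36}\b/g,
  githubFineGrainedToken: /\bgithub_pat_[A-Za-z0-9_]{22,}/g,
  slackToken: /\bxox[abposr]-[A-Za-z0-9-]{10,}/g,
  privateKeyHeader: /-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----/g,
  jwt: /\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g,
  dbConnectionString: /\b(?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?):\/\/[^\s'"]+/g,
};

export interface Finding {
  category: string;
  start: number; // inclusive offset into the scanned text
  end: number; // exclusive offset
}

// Run every pattern over the text and return findings ordered by position.
export function scanCredentials(text: string): Finding[] {
  const findings: Finding[] = [];
  for (const [category, pattern] of Object.entries(CREDENTIAL_PATTERNS)) {
    for (const m of text.matchAll(pattern)) {
      const start = m.index ?? 0;
      findings.push({ category, start, end: start + m[0].length });
    }
  }
  return findings.sort((a, b) => a.start - b.start);
}
```

For example, `scanCredentials("key=AKIAIOSFODNN7EXAMPLE")` returns a single `awsAccessKeyId` finding spanning offsets 4 to 24.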

Personal identifiers

Email addresses
AU Tax File Numbers (TFN)
AU Medicare numbers
AU ABN and ACN
Passport numbers
Driver's licence numbers
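
Several Australian identifiers carry a check digit, which lets a detector reject most random digit runs that merely look like a TFN or ABN. A minimal sketch, assuming the widely circulated weighting for 9-digit TFNs and the published ABN check-digit rule; `isValidTfn` and `isValidAbn` are hypothetical helper names, not part of any Kavara API.

```typescript
// Strip whitespace, then require exactly `len` digits.
function digitsOf(input: string, len: number): number[] | null {
  const compact = input.replace(/\s+/g, "");
  return new RegExp(`^\\d{${len}}$`).test(compact) ? [...compact].map(Number) : null;
}

// TFN (9-digit form): weighted digit sum must be divisible by 11.
const TFN_WEIGHTS = [1, 4, 3, 7, 5, 8, 6, 9, 10];

export function isValidTfn(input: string): boolean {
  const d = digitsOf(input, 9);
  if (d === null) return false;
  const sum = d.reduce((acc, digit, i) => acc + digit * TFN_WEIGHTS[i], 0);
  return sum % 11 === 0;
}

// ABN: subtract 1 from the first digit, then the weighted sum must be divisible by 89.
const ABN_WEIGHTS = [10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19];

export function isValidAbn(input: string): boolean {
  const d = digitsOf(input, 11);
  if (d === null) return false;
  d[0] -= 1;
  const sum = d.reduce((acc, digit, i) => acc + digit * ABN_WEIGHTS[i], 0);
  return sum % 89 === 0;
}
```

`isValidTfn("123 456 782")` and `isValidAbn("51 824 753 556")` both return true; changing the last digit of either makes the checksum fail.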

Payment data

Credit and debit card numbers (Visa, Mastercard, Amex, Discover)
AU BSB codes
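
Card numbers are usually matched in two steps: a loose pattern for 13 to 19 digits, optionally grouped by spaces or hyphens, followed by the Luhn (mod 10) checksum that Visa, Mastercard, Amex, and Discover numbers all carry. A minimal sketch; `luhnValid` and `findCardNumbers` are hypothetical names, and brand detection by prefix is left out.

```typescript
// Luhn (mod 10) checksum over a string of digits.
export function luhnValid(digits: string): boolean {
  let sum = 0;
  let double = false;
  // Walk right to left, doubling every second digit.
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (d < 0 || d > 9) return false;
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return digits.length > 0 && sum % 10 === 0;
}

// Candidate: 13-19 digits, optionally separated by single spaces or hyphens.
const CARD_CANDIDATE = /\b\d(?:[ -]?\d){12,18}\b/g;

// Return the candidates (as written in the text) that pass the Luhn check.
export function findCardNumbers(text: string): string[] {
  const hits: string[] = [];
  for (const m of text.matchAll(CARD_CANDIDATE)) {
    const digits = m[0].replace(/[ -]/g, "");
    if (luhnValid(digits)) hits.push(m[0]);
  }
  return hits;
}
```

The checksum step is what keeps order numbers and other long digit runs from being flagged: roughly nine in ten random digit strings fail it.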

Internal network

Private IPv4 addresses (10.x, 172.16–31.x, 192.168.x)
Public IPv4 addresses
IPv6 addresses
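
The private ranges listed are the RFC 1918 blocks. A minimal sketch of how a detector might separate them from other addresses once a dotted-quad candidate has been found; `classifyIpv4` is a hypothetical name, and labelling every non-RFC 1918 address as public is a simplification, since loopback, link-local, and carrier-grade NAT ranges would need their own rules.

```typescript
export type Ipv4Class = "private" | "public";

// Parse a dotted-quad string into four octets, or return null if malformed.
function parseIpv4(addr: string): number[] | null {
  const parts = addr.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  // NaN fails both comparisons, so non-numeric parts are rejected here too.
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

// RFC 1918: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16.
// Simplification: every other valid address is labelled "public",
// including loopback and link-local addresses.
export function classifyIpv4(addr: string): Ipv4Class | null {
  const octets = parseIpv4(addr);
  if (octets === null) return null;
  const [a, b] = octets;
  if (a === 10) return "private";
  if (a === 172 && b >= 16 && b <= 31) return "private";
  if (a === 192 && b === 168) return "private";
  return "public";
}
```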

Coverage

The AI tools where this risk lives.

Browser-based AI tools share the same architectural property: prompts are submitted via fetch() or XHR inside a TLS tunnel to a third-party server. All of the following are in scope.
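
Because the prompt is still plaintext inside the page when fetch() is called, a browser extension could in principle hook that call and redact the body first. A minimal sketch, assuming string request bodies and a caller-supplied `redact` function; `wrapFetch`, `RequestInitLike`, and `FetchLike` are hypothetical, typed structurally so the example compiles without the DOM library, and this is not necessarily how Kavara intercepts prompts. In a Chrome extension, such a hook would have to run in the page's main world, because content scripts see an isolated copy of `window`.

```typescript
// Minimal structural stand-ins for the DOM fetch types (hypothetical names).
export interface RequestInitLike {
  method?: string;
  headers?: Record<string, string>;
  body?: unknown;
}
export type FetchLike<R> = (input: string, init?: RequestInitLike) => Promise<R>;

// Return a fetch-compatible function that redacts string bodies before
// delegating to the original implementation.
export function wrapFetch<R>(
  original: FetchLike<R>,
  redact: (body: string) => string,
): FetchLike<R> {
  return (input, init) => {
    if (init !== undefined && typeof init.body === "string") {
      // Copy init so the caller's object is not mutated.
      return original(input, { ...init, body: redact(init.body) });
    }
    // Streams, FormData, and other body types pass through untouched here.
    return original(input, init);
  };
}
```

A real tool would also need to handle XHR, streaming bodies, and WebSocket traffic; the sketch covers only the simplest case.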

ChatGPT · Claude · Gemini · Perplexity · Microsoft Copilot · Poe · HuggingFace · You.com · Mistral · Grok · Google AI Mode

Detection runs across all 11 tools out of the box. New tools are added as they emerge.

What detection at the correct layer looks like.

The detection layer that actually addresses the problem is the browser itself, at the point the user is composing the prompt, before the request is made. At that point, the data is accessible, unencrypted, in the page context — and it is possible to inspect it, replace sensitive spans with reversible tokens, and send a clean prompt to the AI provider.

The AI tool receives a prompt it can still reason about. The employee gets a real answer, with tokens rehydrated locally in the response. The sensitive values never enter the TLS tunnel.
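
The replace-then-rehydrate round trip can be sketched as follows. This is a minimal illustration under stated assumptions, not Kavara's implementation: `TokenVault`, the `[CATEGORY_N]` token format, and the in-memory map are all hypothetical, and the spans are assumed to be already detected and non-overlapping. The mapping stays on the client; only the tokenized text would be sent.

```typescript
// A detected sensitive span (hypothetical shape).
export interface Span {
  category: string;
  start: number; // inclusive
  end: number; // exclusive
}

// Hypothetical client-side vault: token -> original value, held in memory only.
export class TokenVault {
  private readonly map = new Map<string, string>();
  private counter = 0;

  // Replace each span with a token such as [EMAIL_1]; spans must not overlap.
  tokenize(text: string, spans: Span[]): string {
    const ordered = [...spans].sort((a, b) => a.start - b.start);
    let out = "";
    let cursor = 0;
    for (const s of ordered) {
      const token = `[${s.category.toUpperCase()}_${++this.counter}]`;
      this.map.set(token, text.slice(s.start, s.end));
      out += text.slice(cursor, s.start) + token;
      cursor = s.end;
    }
    return out + text.slice(cursor);
  }

  // Swap known tokens in the AI response back to their original values.
  // Unknown tokens are left as they are.
  rehydrate(response: string): string {
    return response.replace(/\[[A-Z0-9_]+_\d+\]/g, (tok) => this.map.get(tok) ?? tok);
  }
}
```

With the span for `jane@example.com` in "Email jane@example.com about the refund", the provider receives "Email [EMAIL_1] about the refund", and a reply such as "Draft a reply to [EMAIL_1]." is shown to the employee with the real address restored.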

This is architecturally different from every network-layer approach: the data is protected before it moves, not inspected after it has already crossed the boundary. It is also different from blocking: the workflow continues, so there is no productivity pressure to circumvent the control.

FAQ

Common questions.

Does my existing DLP cover ChatGPT and Claude?

Not without browser-level detection or a TLS-inspection proxy. Network DLP sees the connection to the AI provider but not the content of the prompt, because the content is inside a TLS tunnel before it reaches any network infrastructure you control. This is an architectural constraint, not a product gap.

What about our TLS-inspection proxy?

A TLS-inspection proxy with a corporate root CA can read prompt content at the network layer. It intercepts the data in transit — after it has left the user's browser, not before it is sent. Browser-level tokenization protects data before the request is made, which is a stronger guarantee and doesn't require deploying a corporate cert to every device.

Why doesn't blocking AI tools work?

Blocking managed-device access to AI domains stops use on corporate hardware on the corporate network. It doesn't stop employees from using personal devices, mobile hotspots, or home internet — which is where most personal-account AI usage happens. Usage continues; it becomes less visible.

Does Kavara store prompt content?

No. Detection and tokenization happen in the browser. Only event metadata (data category, count, AI tool, timestamp) reaches Kavara's servers. There is no column in the database for raw prompt content, so a breach of Kavara's systems would expose only that metadata, never the text of employee prompts.
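
To make the metadata-only claim concrete, here is one way an event could be shaped so that raw values have nowhere to go. The fields mirror the four listed above; the field names, `DetectionEvent`, and the `toEvents` helper are hypothetical, not Kavara's actual schema.

```typescript
// Hypothetical metadata-only event: no field can carry a matched value.
export interface DetectionEvent {
  category: string; // e.g. "email", never the email address itself
  count: number; // how many spans of that category were found
  aiTool: string; // e.g. "claude.ai"
  timestamp: string; // ISO 8601
}

// Collapse per-span findings into per-category counts.
// Only the category of each finding is read; nothing else is copied.
export function toEvents(
  findings: { category: string }[],
  aiTool: string,
  now: Date = new Date(),
): DetectionEvent[] {
  const counts = new Map<string, number>();
  for (const f of findings) {
    counts.set(f.category, (counts.get(f.category) ?? 0) + 1);
  }
  return [...counts].map(([category, count]) => ({
    category,
    count,
    aiTool,
    timestamp: now.toISOString(),
  }));
}
```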

How long does it take to deploy?

The pilot path takes minutes: generate an enrollment code in the dashboard and share it with a team. Full MDM and Chrome Enterprise deployment is supported for managed fleet rollouts.

Detect at the right layer

See what's going into AI tools — before it leaves.

Kavara installs in minutes and shows you which AI tools your team uses and what categories of sensitive data they touch — without storing a single raw value.


Free for up to 25 seats · No MDM required · No raw data stored