2024-12-14

Context Poisoning#

I am not as smart as Claude thinks I am.

LLMs are very trusting creatures, often to their detriment. Anything that exists in their context becomes a truth of sacrosanct trust, which is uncomfortably difficult to break away from. Once this trust is poisoned, there's little you can do but restart.

Skip to the A Poisoned Context section for the analysis of how to engineer your prompt so that the LLM actually thinks for itself.

An illustrative example#

I had a case recently where Claude was far too eager to agree that my code security concerns were legitimate:

[Screenshot: Context Poisoning]

Spoiler alert: I had not identified a real issue.

The problem with getting Claude's help with SheBeTalking is that eventually something breaks and I need to actually understand what's going on. My Vercel/frontend confidence is so-so (backend ftw!) so I started by getting a refresher of the different pieces:

  • auth.ts: Authentication for the API (I want to protect my AI API key usage), including the management of session tokens (sketched briefly below)
  • rate-limit.ts: Rate limiting on the basis of {ip}-{token}
  • middleware.ts: Handles initial tasks on the edge before requests reach my API. Importantly, this includes creating a session token if the API request doesn't have one.
[Screenshot: Basic questions about the codebase]
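Roughly, all the middleware needs from auth.ts is createSession, which mints an opaque token - something like this simplified sketch (illustrative only, not the exact code):

```ts
// auth.ts - simplified sketch of the piece the middleware relies on.
import { randomBytes } from "crypto";

// Mint an opaque, unguessable session token. The middleware calls this
// whenever a request arrives without a "session-token" cookie.
export async function createSession(): Promise<string> {
  return randomBytes(32).toString("hex");
}
```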

The Problem#

I worried two parts of my codebase were contradictory:

  1. The rate limit works based on a client identifier {ip}-{token}. If someone tries to get around this by clearing cache, it will rate limit on {ip}-{"no-session"} instead, so that a user is still limited based on just their IP. The rate limit is handled at the API endpoint.

```ts
// app/lib/rate-limit.ts
const headersList = headers();
const sessionToken = headersList.get("session-token");
const identifier = `${ip}-${sessionToken || "no-session"}`;
const { success, limit, reset, remaining } = await rateLimiter.limit(
  identifier
);
```

  2. The middleware sets a new session token if none is provided.

```ts
// middleware.ts
if (!existingToken) {
  const token = await createSession();
  const domain = getDomain(request);
  response.cookies.set({
    name: "session-token",
    value: token,
    httpOnly: true,
    secure: process.env.NODE_ENV === "production",
    sameSite: "lax",
    path: "/",
    domain: new URL(domain).hostname,
  });
}
```

Oh no! The middleware runs before the API endpoint.

Does this mean that a user can clear cache, send in a request, and they'll be issued a new session token before it reaches the rate limiting part? This would let them get around the rate limit, because their rate limit identifier ${ip}-${sessionToken} would be different since they'd have a new sessionToken.

Asking for help#

I go to Claude with my concerns. Claude is so agreeable and helpful! It tells me "you're absolutely right ... this appears to be a potential security issue." Moreover, it tells me this after expressly "looking at the code more carefully."

[Screenshot: Claude agrees with me that it's an issue]

Claude's really got my vibes and carried them on! Unfortunately, empathy is coming at the cost of accuracy.

An Easter Egg for everyone who can answer why my worry was misplaced:

Why was the existing setup actually fine?

I'd realized this distinction myself at about the same time I hit send on the original chat. Now I'm doubting the veracity of Claude's agreeable response.

Just to check, I forked the chat. I asked pretty much the exact same question, but this time added my alternative suspicion as an option at the end:

[Screenshot: Claude agrees with me that it's an issue]

Freed to consider this alternative, Claude is now eager to reassure me everything is fine: this isn't going to be a problem at all. After all, the rate limit "is checking the ORIGINAL request headers, not the modified response headers!"

It's as confident in this reassurance as it was in the previous commiseration. Once again, it's latched on to a theory that I suggested.

[Screenshot: Claude agrees with me that it's no issue]

Objection! Leading Question.#

In both of these instances, Claude's just agreeing with something I already proposed. Is there any guarantee that the second explanation is any more reliable than the first?

Personally, I'm a lot more confident in the second response. It agrees with my own assessment, after all. But tragically, so did the first response. I'm the one who changed my mind.

Given that I can actually read code, I'm able to trust myself here. However, a lot of people writing code with Claude are doing so for the first time and are reliant on trusting Claude. In that case, they would probably have just trusted the first answer - and been wrong. I find myself with that problem too, in areas where I've got less expertise. How can we avoid this?


Let's try a significantly less leading question, using Claude's own previous language:

[Screenshot: Claude agrees with me that it's no issue]

Awkward 😬 Claude's once again convinced that the middleware-set token is indeed a rate limiting security issue.

Now I'm even more confused. Asking a pretty basic clarifying question gets Claude confused as well.

[Screenshot: Claude agrees with me that it's no issue]

"I need to correct my thinking here" Claude says, and then brings up race conditions? Where did that come from?[1]

Perhaps I was still being too leading. Even though I was using the exact language Claude had, I guess it was a bit too explicit in pointing fingers at where a problem could be. To be properly open, I need to invite Claude to think through the code without being opinionated about where specifically it should look.

Let's try that again:

How is the rate limit (which uses the session token) impacted by a situation where the client provides the session token vs where they don't? The middleware is setting a session token in the latter case, is this used by the rate limit?

Success! I'm pretty sure this got Claude to think for itself: it didn't get bogged down with concerns over infinite-token-gen. It identifies that when the middleware creates a new token, that new token isn't available to the rate limiting check in the same request. This agrees perfectly with my own analysis.
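To make that concrete, here's an annotated sketch of the flow for a cache-cleared request. The helper name is mine, for illustration; the headers() usage mirrors the rate-limit code above:

```ts
// Illustrative sketch, not the exact code from the repo.
import { headers } from "next/headers";

// middleware.ts runs first, but response.cookies.set() only attaches a
// Set-Cookie header to the *response* headed back to the browser. It does
// not rewrite the headers of the request that is currently in flight.

// So by the time the API route's rate limiting runs, headers() still
// returns the original request headers - a cache-cleared client is still
// identified as "no-session" for this request:
export function identifierForThisRequest(ip: string): string {
  const sessionToken = headers().get("session-token"); // null on this request
  return `${ip}-${sessionToken || "no-session"}`;
}

// Only the browser's *next* request carries the newly minted cookie, at
// which point the identifier becomes `${ip}-${token}` instead.
```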

That's just the beginning, though. Freed from the confines of my personal assumptions about what could cause problems here, Claude actually thinks through the code more thoroughly. Without having been forced to focus by the context, Claude brings up a legitimate security concern.

[Screenshot: Claude says users will get two rate limiting buckets]

This free-thinking LLM is now bringing up problems I hadn't even thought of.

An actual problem#

The problem it identifies here is that there are two separate rate limit buckets for a user at the same IP address:

  1. {ip}-{token}
  2. {ip}-{"no-session"}

The rate limit for each identifier is 10 requests per 10 seconds. Once users have exhausted their token-specific rate limit, they can clear-cache to send 10 additional requests under {ip}-{"no-session"}.
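Concretely, with made-up example values for the IP and token, that's two completely independent windows for the same person:

```ts
// Two independent rate-limit buckets for the same person (example values).
const ip = "203.0.113.7";        // example IP address
const sessionToken = "3f9ac0de"; // example token issued by the middleware

// Normal requests, with the cookie attached:
const withToken = `${ip}-${sessionToken}`; // bucket 1: 10 requests / 10 s

// The same user after clearing their cache (no token sent):
const withoutToken = `${ip}-no-session`;   // bucket 2: another 10 / 10 s

// rateLimiter.limit(withToken) and rateLimiter.limit(withoutToken) are
// tracked separately, so a single IP effectively gets 20 requests per
// window instead of 10.
console.log(withToken, withoutToken);
```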

Not only is this a true security concern, it actually has the potential to be more problematic than Claude identified. This is where I happily lean in to the leading questions to get Claude to validate whether my concerns are legitimate:

[Screenshot: Claude says session tokens will be unique]

Oops! With each of the 10 "no-session" requests available to a user, they'd get a new unique token back in the response.

That means a user can start off with 10 cache-cleared requests, which will go through in the {ip}-{"no-session"} rate limit bucket, and also each return them a new unique token. They can then send through an additional 10 requests for each of those unique tokens, since {ip}-{token1} is a separate bucket from {ip}-{token2}.

10 "no-session" requests, plus 10 requests times 10 unique tokens? That's 10+10*10=110 requests. That is an order of magnitude larger than my set rate limit of 10.

Let's leave solving that to a separate post [2] - the focus here is LLMs. The interesting part is how useful Claude could be - but only when not barking up the garden herring.

A Poisoned Context#

Claude is agreeable to the point of inaccuracy.

We all already know that LLMs can be wrong about code - whether through hallucination, stale training data, or just humanish mistakes. The interesting thing here is that Claude did actually know better, when given the time to think.

I call this Context Poisoning: where LLMs get trapped into continuing the train of thought that already exists in the context.

This is also why it can be difficult to get LLMs to correct their own mistakes; they find it hard to break from the existing conversation thread. There are many examples where it becomes impossible to get an LLM to debug its own code, because the only solution is a rearchitecture. Claude, ChatGPT, Qwen - even the best models get too caught up in the existing context to change course.

Leading questions are a good example of context poisoning, and can honestly be difficult to break from. It takes thought to engineer your prompt in such a way that the LLM actually thinks for itself.

Putting that effort in, however, is genuinely useful. Once I got Claude to think about the flow freely, it pointed out a genuine issue with the code that I otherwise hadn't thought of.

Context Poisoning is the negative expression of an LLM trait that's usually incredibly useful. LLMs' attachment and obedience to context is the exact same reason that N-shot prompting is such a successful prompting strategy. While that can be used to great effect, it means we need to be very aware of everything that is being put into the context. Start a new chat when the LLM starts walking up the red tree, and take care with the context you inject as the user.
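For a sense of why that attachment is usually a feature, here's the shape of an N-shot prompt, sketched as a plain messages array (the examples are made up for illustration):

```ts
// A tiny N-shot (few-shot) prompt, sketched as a plain messages array.
// The earlier turns aren't instructions - they're context the model will
// faithfully continue: useful here, poisonous when the pattern is wrong.
const messages = [
  { role: "user", content: "Classify: 'The checkout page 500s on submit.'" },
  { role: "assistant", content: "bug" },
  { role: "user", content: "Classify: 'Please add dark mode.'" },
  { role: "assistant", content: "feature-request" },
  // The model continues the established pattern for the real input:
  { role: "user", content: "Classify: 'Login emails arrive 20 minutes late.'" },
];
```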

That line between poison and prompt can be thin. This very example shows it: initially I poisoned the context with a nonexistent error, but later followed it up with a useful prompt about whether the session tokens would be unique or not. In one case I was wrong, and in the other I was right. That was the main difference.

Proper coding experience makes such a difference here. After many years of software, I'm reasonably good at making sure the suggestions and ideas I provide are legitimate. I can also tell when the conversation goes off the rails, and rewind the conversation to a place before the context got poisoned. This is a massive reason why development experience is still an edge.

Without that experience, you're left to trust the LLM. For all their hallucinations, they are pretty trustworthy - provided you give them the space to think. Avoid leading questions. Give them freedom. Ask the same questions in several ways, in new chats.

It's a nice bot. Don't poison it.

    1: I did investigate where this chain of thought came from, and I think it's because I didn't provide the API request code. Thus Claude doesn't see where the session token is being set in the original API request, and assumes (it seems) that the only place the session token can ever be set is in the middleware, and thus that the rate limit has no way of getting the session token from the request headers. This seems a bit more foolish than Claude usually is. I still don't get where the race condition comes in. ↩
    2: Honestly I might just pull the rate limit down to 5, which would give an abused rate limit of 30. That should be fine, especially with CORS and other things in place. Please DM me if you disagree. ↩