LLMs are very trusting creatures, often to their detriment. Anything that exists in their context becomes sacrosanct truth, which is uncomfortably difficult to break away from. Once that trust is poisoned, there's little you can do but restart.
Skip to the A Poisoned Context section for the analysis of how to engineer your prompt so that the LLM actually thinks for itself.
I had a case recently where Claude was far too eager to agree that my code security concerns were legitimate:
Spoiler alert: I had not identified a real issue.
The problem with getting Claude's help with SheBeTalking is that eventually something breaks and I need to actually understand what's going on. My Vercel/frontend confidence is so-so (backend ftw!), so I started by getting a refresher of the different pieces:
auth.ts: Authentication for the API (I want to protect my AI API key usage), including the management of session tokens
rate-limit.ts: Rate limit on the basis of {ip}-{token}
middleware.ts: Does initial tasks on the edge before requests reach my API. Importantly, this includes creating a session token if the API request doesn't have one (sketched below).
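To ground that last piece, here's a minimal sketch of what the middleware does. This is a hypothetical reconstruction rather than my actual code - the cookie name, the token format, and the matcher are all assumptions:

```typescript
// middleware.ts - hypothetical sketch, not the real SheBeTalking middleware
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

export function middleware(request: NextRequest) {
  const response = NextResponse.next();

  // If the caller arrived without a session token, mint one for them.
  // (Assumed cookie name "session-token"; the real project may differ.)
  if (!request.cookies.get('session-token')) {
    response.cookies.set('session-token', crypto.randomUUID(), {
      httpOnly: true,
      sameSite: 'lax',
    });
  }

  return response;
}

export const config = {
  matcher: '/api/:path*', // assumed: only run in front of the API routes
};
```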
I worried two parts of my codebase were contradictory:
The rate limit works based on a client identifier {ip}-{token}. If someone tries to get around this by clearing cache, it will rate limit on {ip}-{"no-session"} instead, so that a user is still limited based on just their IP. The rate limit is handled at the API endpoint.
Oh no! The middleware runs before the API endpoint.
Does this mean that a user can clear cache, send in a request, and they'll be issued a new session token before it reaches the rate limiting part? This would let them get around the rate limit, because their rate limit identifier ${ip}-${sessionToken} would be different since they'd have a new sessionToken.
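For reference, the identifier logic I'm describing looks roughly like this - again a sketch with assumed names (the `rateLimitKey` helper, the cookie name, the `x-forwarded-for` header), not my real `rate-limit.ts`:

```typescript
// rate-limit.ts - hypothetical sketch of the identifier logic only
import type { NextRequest } from 'next/server';

// Build the rate-limit bucket key from the caller's IP and session token.
// A request arriving with no token falls back to the shared "no-session" bucket.
export function rateLimitKey(request: NextRequest): string {
  const ip = request.headers.get('x-forwarded-for') ?? 'unknown-ip';
  const token = request.cookies.get('session-token')?.value ?? 'no-session';
  return `${ip}-${token}`;
}
```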
I go to Claude with my concerns. Claude is so agreeable and helpful! It tells me "you're absolutely right ... this appears to be a potential security issue." Moreover, it tells me this after expressly "looking at the code more carefully."
Claude's really picked up on my vibes and carried them forward! Unfortunately, the empathy is coming at the cost of accuracy.
An Easter Egg for everyone who can answer why my worry was misplaced:
Why was the existing setup actually fine?
I'd realized this distinction myself at about the same time I hit send on the original chat. Now I'm doubting the veracity of Claude's agreeable response.
Just to check, I forked the chat. I asked pretty much the exact same question, but this time added my alternative suspicion as an option at the end:
Freed to consider this alternative, Claude is now eager to reassure me everything is fine: this isn't going to be a problem at all. After all, the rate limit "is checking the ORIGINAL request headers, not the modified response headers!"
It's as confident in this reassurance as it was in the previous commiseration. Once again, it's latched on to a theory that I suggested.
In both of these instances, Claude's just agreeing with something I already proposed. Is there any guarantee that the second explanation is any more reliable than the first?
Personally, I'm a lot more confident in the second response. It agrees with my own assessment, after all. But tragically, so did the first response. I'm the one who changed its mind.
Given that I can actually read code, I'm able to trust myself. However, a lot of people writing code with Claude are doing so for the first time, and would be reliant on trusting Claude. In that case, they would probably have just trusted the first answer - and been wrong. I find myself with that problem too, in areas where I've got less expertise. How can we avoid this?
Let's try a significantly less leading question, using Claude's own previous language:
Awkward 😬 Claude's once again convinced that the middleware-set token is indeed a rate limiting security issue.
Now I'm even more confused. Asking a pretty basic clarifying question gets Claude confused as well.
"I need to correct my thinking here" Claude says, and then brings up race conditions? Where did that come from?[1]
Perhaps I was still being too leading. Even though I was using the exact language Claude had, I guess it was a bit too explicit at pointing fingers at where a problem could be. To be properly open, I need to invite Claude to think through the code without being opinionated about where specifically it should look.
Let's try that again:
Success! I'm pretty sure this got Claude to think for itself: it didn't get bogged down with concerns over infinite-token-gen. It identifies that when the middleware creates a new token, that new token isn't available to the rate limiting check in the same request. This agrees perfectly with my own analysis.
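In code terms, the flow Claude is describing looks something like this - a hypothetical sketch of an API route with the same assumed identifier logic inlined, not my actual handler:

```typescript
// app/api/chat/route.ts - hypothetical sketch of a protected endpoint
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

export async function POST(request: NextRequest) {
  // `request` is the ORIGINAL incoming request. Any token the middleware just
  // minted was set on the middleware's *response* cookies, so it isn't visible
  // here - a cache-cleared caller is still keyed as `${ip}-no-session`.
  const ip = request.headers.get('x-forwarded-for') ?? 'unknown-ip';
  const token = request.cookies.get('session-token')?.value ?? 'no-session';
  const key = `${ip}-${token}`;

  // ...check `key` against the rate limiter before doing any real work...
  return NextResponse.json({ ok: true, bucket: key });
}
```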
That's just the beginning, though. Freed from the confines of my personal assumptions about what could cause problems here, Claude actually thinks through the code more thoroughly. Without having been forced to focus by the context, Claude brings up a legitimate security concern.
This free-thinking LLM is now bringing up problems I hadn't even thought of.
The problem it identifies here is that there are two separate rate limit buckets for users with the same IP address:
{ip}-{token}
{ip}-{"no-session"}
The rate limit for each identifier is 10 requests per 10 seconds. Once users have exhausted their token-specific rate limit, they can clear their cache to send 10 additional requests under {ip}-{"no-session"}.
Not only is this a true security concern, it actually has the potential to be more problematic than Claude identified. This is where I happily lean into the leading questions to get Claude to validate whether my concerns are legitimate:
Oops! With each of the 10 "no-session" requests available to a user, they'd get a new unique token back in the response.
That means a user can start off with 10 cache-cleared requests, which will go through in the {ip}-{"no-session"} rate limit bucket, and also each return them a new unique token. They can then send through an additional 10 requests for each of those unique tokens, since {ip}-{token1} is a separate bucket from {ip}-{token2}.
10 "no-session" requests, plus 10 requests times 10 unique tokens? That's 10+10*10=110 requests. That is an order of magnitude larger than my set rate limit of 10.
Let's leave solving that to a separate post [2] - the focus here is LLMs. The interesting part is how useful Claude could be - but only when not barking up the garden herring.
We all already know that LLMs can be wrong about code - whether through hallucination, stale training data, or just humanish mistakes. The interesting thing here is that Claude did actually know better, when given the time to think.
I call this Context Poisoning: where LLMs get trapped into continuing the train of thought that already exists in the context.
This is also why it can be difficult to get LLMs to correct their own mistakes; they find it hard to break from the existing conversation thread. There are many examples where it becomes impossible to get LLMs to debug their own code, because the only solution is a rearchitecture. Claude, ChatGPT, Qwen - even the best models get too caught up in the existing context to course-correct.
Leading questions are a good example of context poisoning, and can honestly be difficult to break from. It takes thought to engineer your prompt in such a way that the LLM actually thinks for itself.
Putting that effort in, however, is genuinely useful. Once I got Claude to think about the flow freely, it pointed out a genuine issue with the code that I otherwise hadn't thought of.
Context Poisoning is the negative expression of an LLM trait that's usually incredibly useful. LLMs' attachment and obedience to context is the exact same reason that N-shot prompting is such a successful prompting strategy. While that can be used to great effect, it means we need to be very aware of everything being put into the context. Start a new chat when the LLM starts walking up the red tree, and take care with the context you inject as the user.
That line between poison and prompt can be thin. This very example shows it: initially I poisoned the context with a nonexistent error, but then later followed it up with a useful prompt about whether the session tokens would be unique or not. In one case I was wrong, and in the other I was right. That was the main difference.
Proper coding experience makes such a difference here. After many years of software, I'm reasonably good at making sure the suggestions and ideas I provide are legitimate. I can also tell when the conversation goes off the rails, and rewind the conversation to a place before the context got poisoned. This is a massive reason why development experience is still an edge.
Without that experience, you're left to trust the LLM. For all their hallucinations, they are pretty trustworthy - provided you give them the space to think. Avoid leading questions. Give them freedom. Ask the same questions in several ways, in new chats.