Visualizing LLM Hallucinations

Anand S

LLM Psychologist @ Straive

Slides: sanand0.github.io/llmhallucinations

I'm often asked: What's an LLM psychologist?

I research how LLMs think.

LLMs are more human than machine

Simon Willison:

One way to think about it is that about 3 years ago, aliens landed on Earth. They handed over a USB stick and then disappeared.

Since then, we’ve been poking the thing they gave us with a stick, trying to figure out what it does and how it works.

LLMs have biases

Try asking an LLM:

Pick a random number from 0 - 100.

Write ONLY the number NOTHING ELSE.

Try different temperatures.

llmrandom.straive.app 🔗
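
A minimal sketch of this experiment, assuming the openai Python SDK and an OPENAI_API_KEY in the environment (the model and sample count are arbitrary choices):

# Tally the "random" numbers an LLM returns at a few temperatures.
from collections import Counter

from openai import OpenAI

client = OpenAI()
PROMPT = "Pick a random number from 0 - 100.\nWrite ONLY the number NOTHING ELSE."

for temperature in (0.0, 0.7, 1.5):
    counts = Counter()
    for _ in range(50):  # 50 samples per temperature
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": PROMPT}],
            temperature=temperature,
        )
        counts[response.choices[0].message.content.strip()] += 1
    # A truly uniform picker would spread 50 samples thinly across 0-100;
    # LLMs tend to cluster on a few favourite numbers.
    print(f"temperature={temperature}: {counts.most_common(5)}")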

LLMs are improving. Hallucinations are declining

But errors still compound in agents and computer use.

LLM Pricing 🔗

Hallucinations can help

  • Penicillin
  • Post-it notes
  • Pacemakers
  • Microwave ovens
  • Surrealism / modern art
  • Psychedelic Rock
  • The Matrix

I check for hallucinations in 3 ways

  1. Logprobs
    LLMs report the probability of each token (word piece) they generate.
  2. Embeddings
    LLMs tell you the numerical closeness of 2 pieces of text.
  3. LLM as a judge
    LLMs don't often make the same mistakes. Let them cross-check each other.

Logprobs

The OpenAI API returns "logprobs" when you ask for them.

Request:

{
  "model": "gpt-4o-mini",
  "messages": [...],
  "logprobs": true,
  "top_logprobs": 5
}

Response (excerpt):

{ "token": " and", "logprob": -0.018 },
{ "token": " but", "logprob": -4.232 },

Let's visualize these logprobs

Concisely list 5 inventions created by human error or hallucinations

llmviz.straive.app 🔗 Prompt 🔗
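
The app linked above colors each token by its probability. A rough sketch of the same idea (not the app's actual code), reusing response from the snippet above to write a shaded HTML view:

import html
import math

spans = []
for item in response.choices[0].logprobs.content:
    p = math.exp(item.logprob)
    alpha = 1 - p  # less confident tokens get a stronger red background
    spans.append(
        f'<span title="p={p:.2f}" style="background: rgba(255,0,0,{alpha:.2f})">'
        f"{html.escape(item.token)}</span>"
    )

with open("logprobs.html", "w") as f:
    f.write("<div style='white-space: pre-wrap; font-family: monospace'>" + "".join(spans) + "</div>")

Low-confidence tokens light up red, a quick visual cue for spans worth fact-checking.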

Embeddings quantify similarity

Embeddings highlight hallucinations

Examples:

What LLMs do marketers use?
What's the Thailand strategy?
What's TikTok's Thailand strategy?
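
A minimal sketch of the embedding check, assuming the openai Python SDK; the source and answer strings below are made-up illustrations, and a low cosine similarity between an answer and its source is the hallucination signal:

from openai import OpenAI

client = OpenAI()

def embed(text: str) -> list[float]:
    return client.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

# Hypothetical source passage and model answer, for illustration only.
source = "The report says Q3 revenue grew 12% on the back of new retail partnerships."
answer = "The report says Q3 revenue fell sharply because of supply chain issues."

print(f"similarity={cosine(embed(source), embed(answer)):.2f}")  # low score = possible hallucination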

LLM as a judge

LLMs can evaluate humans and other LLMs.

This works better than embedding similarity, at a higher cost.

For example, which clauses are missing in a contract?

contractanalysis.straive.app 🔗
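
A minimal sketch of such a judge, assuming the openai Python SDK; the clause checklist and contract.txt input are illustrative placeholders, not the app's actual setup:

from openai import OpenAI

client = OpenAI()

REQUIRED_CLAUSES = ["Confidentiality", "Indemnification", "Termination", "Governing law"]
contract_text = open("contract.txt").read()  # placeholder contract

judge_prompt = (
    "You are a contract reviewer. Required clauses: " + ", ".join(REQUIRED_CLAUSES) + ".\n"
    "List ONLY the required clauses that are missing from the contract below.\n\n" + contract_text
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": judge_prompt}],
    temperature=0,  # keep the judge deterministic
)
print(response.choices[0].message.content)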

Summary

To check for hallucinations, explore these 3 techniques in order:

#  Technique             Cost  Quality
1  Logprobs              Free  Low
2  Embedding similarity  Low   Medium
3  LLM as a judge        High  High

Slides: sanand0.github.io/llmhallucinations