Slides: sanand0.github.io/llmhallucinations
I research how LLMs think.
Simon Willison:
One way to think about it is that about 3 years ago, aliens landed on Earth. They handed over a USB stick and then disappeared. Since then, we’ve been poking the thing they gave us with a stick, trying to figure out what it does and how it works.
Try asking an LLM:
Pick a random number from 0 - 100. Write ONLY the number NOTHING ELSE.
Try different temperatures.
llmrandom.straive.app
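A minimal sketch of the same experiment with the OpenAI Python SDK, sweeping the temperature parameter. The model name, temperature values, and sample count are illustrative, not what llmrandom.straive.app uses:

```python
import os
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = "Pick a random number from 0 - 100. Write ONLY the number NOTHING ELSE."

# Sample the same prompt at a few temperatures and compare the spread of answers.
for temperature in (0.0, 0.7, 1.5):
    numbers = []
    for _ in range(5):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": PROMPT}],
            temperature=temperature,
        )
        numbers.append(resp.choices[0].message.content.strip())
    print(f"temperature={temperature}: {numbers}")
```

In practice the answers tend to cluster around a few favourite numbers rather than being uniformly random, which is what the demo illustrates.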
But errors compound in agents and computer use.
LLM Pricing
OpenAI API gives "logprobs".
{ "model": "gpt-4o-mini", "messages": [...], "logprobs": true, "top_logprobs": 5 }
{ "token": " and", "logprob": -0.018 }, { "token": " but", "logprob": -4.232 },
Prompt: Concisely list 5 inventions created by human error or hallucinations
llmviz.straive.app
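A hedged sketch of requesting and reading logprobs with the OpenAI Python SDK, using the prompt above. Field names follow the Chat Completions response format; the model name is illustrative:

```python
import math
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Concisely list 5 inventions created by human error or hallucinations"}],
    logprobs=True,
    top_logprobs=5,
)

# Each generated token carries its log-probability and the top alternative tokens.
for tok in resp.choices[0].logprobs.content:
    print(f"{tok.token!r}: p={math.exp(tok.logprob):.3f}")
    for alt in tok.top_logprobs:
        print(f"    alt {alt.token!r}: p={math.exp(alt.logprob):.3f}")
```

Converting logprob to a probability with exp() makes the uncertain tokens easy to spot: low-probability tokens mark the places where the model was effectively guessing.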
Examples:
What LLMs do marketers use? What's the Thailand strategy? What's TikTok's Thailand strategy?
LLMs can evaluate humans and other LLMs.
This works better than embeddings.
For example, which clauses are missing in a contract?
contractanalysis.straive.app
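A minimal LLM-as-judge sketch for the contract example. The clause checklist, model, and prompt wording are hypothetical illustrations, not the contractanalysis.straive.app implementation:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# Hypothetical checklist; a real review would use the clauses your legal team requires.
REQUIRED_CLAUSES = ["Confidentiality", "Termination", "Indemnification", "Governing law"]

def missing_clauses(contract_text: str) -> str:
    """Ask the model which required clauses are absent or incomplete in the contract."""
    prompt = (
        "You are reviewing a contract. Which of these clauses are missing or incomplete? "
        f"{', '.join(REQUIRED_CLAUSES)}\n\n"
        "Reply with a short bulleted list and a one-line reason for each.\n\n"
        f"CONTRACT:\n{contract_text}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content
```

An embedding search can only surface clauses that look similar to a reference; a judge prompt like this can reason about what is absent, which is why it tends to work better for this kind of check.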
To check for hallucinations, explore these 3 techniques in order: