Anand's LinkedIn Archive

LinkedIn Profile

April 2025

People still write?
This is my decision tree for which model to use on #ChatGPT right now.

๐—ข๐Ÿฏ: Use by ๐—ฑ๐—ฒ๐—ณ๐—ฎ๐˜‚๐—น๐˜.
๐—ข๐Ÿฐ-๐—บ๐—ถ๐—ป๐—ถ-๐—ต๐—ถ๐—ด๐—ต: Use when ๐—ฐ๐—ผ๐—ฑ๐—ถ๐—ป๐—ด.
๐—š๐—ฃ๐—ง ๐Ÿฐ๐—ผ: Use for a ๐—พ๐˜‚๐—ถ๐—ฐ๐—ธ ๐—ฟ๐—ฒ๐˜€๐—ฝ๐—ผ๐—ป๐˜€๐—ฒ or to ๐—ฐ๐—ฟ๐—ฒ๐—ฎ๐˜๐—ฒ ๐—ถ๐—บ๐—ฎ๐—ด๐—ฒ.

Flowchart link: https://lnkd.in/gsHFzn5C
Anand Narayan The API does not support code interpreter yet. Also, it cannot use tools like search unless we enable them, which I did not. So in this case, it was thinking by itself.
How well can LLMs multiply numbers in their head?

I asked 50 LLMs to multiply 2 numbers:

1. 12 x 12
2. 123 x 456
3. 1,234 x 5,678
4. 12,345 x 6,789
5. 123,456 x 789,012
6. 1,234,567 x 8,901,234
7. 987,654,321 x 123,456,789
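A scoring harness for this kind of check can be tiny. A sketch under one assumption of mine: the graded answer is the last number in the model's reply.

```python
import re

# The seven test multiplications from the post.
PAIRS = [
    (12, 12), (123, 456), (1_234, 5_678), (12_345, 6_789),
    (123_456, 789_012), (1_234_567, 8_901_234),
    (987_654_321, 123_456_789),
]

def last_number(text):
    """Extract the final integer in a model's reply (commas allowed)."""
    nums = re.findall(r"\d[\d,]*", text)
    return int(nums[-1].replace(",", "")) if nums else None

def score(replies):
    """Count how many replies end with the correct product (max 7)."""
    return sum(last_number(r) == a * b for r, (a, b) in zip(replies, PAIRS))
```

The real replies would come from each model's chat API; the parsing and scoring above run on any list of reply strings.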

LLMs aren't good tools for math, and this is just an informal check. But the results are interesting:

๐—ข๐—ฝ๐—ฒ๐—ป๐—”๐—œ'๐˜€ ๐—ฟ๐—ฒ๐—ฎ๐˜€๐—ผ๐—ป๐—ถ๐—ป๐—ด ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น๐˜€ ๐—ฐ๐—ฟ๐—ฎ๐—ฐ๐—ธ๐—ฒ๐—ฑ ๐—ถ๐˜, scoring 6/7, stumbling only on the 9-digit multiplication.
๐—ข๐—ฝ๐—ฒ๐—ป๐—”๐—œ'๐˜€ ๐—ผ๐˜๐—ต๐—ฒ๐—ฟ ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น๐˜€ ๐—ฎ๐—ป๐—ฑ ๐——๐—ฒ๐—ฒ๐—ฝ๐—ฆ๐—ฒ๐—ฒ๐—ธ ๐—ฉ๐Ÿฏ ๐˜„๐—ฒ๐—ฟ๐—ฒ ๐—ป๐—ฒ๐˜…๐˜, getting the first 5/7 right. Notably: GPT 4.1 Mini beat GPT 4.1. DeepSeek V3 beat DeepSeek R1.
16 models, including the latest Gemini, Anthropic, and Llama models get 4/7 right.
The Amazon models, older Llama, Anthropic, Google, OpenAI models get 3 or less right.

๐— ๐—ผ๐—ฑ๐—ฒ๐—น๐˜€ ๐˜‚๐˜€๐—ฒ ๐—ต๐˜‚๐—บ๐—ฎ๐—ป-๐—น๐—ถ๐—ธ๐—ฒ ๐—บ๐—ฒ๐—ป๐˜๐—ฎ๐—น ๐—บ๐—ฎ๐˜๐—ต ๐˜๐—ฟ๐—ถ๐—ฐ๐—ธ๐˜€.
For example, O3-Mini-High calculated 1234567 ร— 8901234 using a recursive strategy.
DeepSeek V3 double-checks results and hallucinates a "reliable computational tool".
O3 Mini reframes 8901234 as (9000000 โˆ’ 98766) to simplify the calculation.

Explore the results at https://lnkd.in/gqnXhTyq and the repo at https://lnkd.in/gruKgds9
Vishnu Agnihotri The least-effort approach, I think, is to add a follow-up prompt: "Fact check each statement above citing links."
What percentage of seats does the #Singapore People's Action Party win?

Normally, this is a 2-hour programmatic data-scraping + data visualization exercise, ideal for a data journalism class.

Now, it's a 2-minute question to O3-Mini-High.

๐˜š๐˜ฆ๐˜ข๐˜ณ๐˜ค๐˜ฉ ๐˜ฐ๐˜ฏ๐˜ญ๐˜ช๐˜ฏ๐˜ฆ ๐˜ง๐˜ฐ๐˜ณ ๐˜ต๐˜ฉ๐˜ฆ ๐˜ฉ๐˜ช๐˜ด๐˜ต๐˜ฐ๐˜ณ๐˜ช๐˜ค๐˜ข๐˜ญ ๐˜ณ๐˜ฆ๐˜ด๐˜ถ๐˜ญ๐˜ต๐˜ด ๐˜ฐ๐˜ง ๐˜ข๐˜ญ๐˜ญ ๐˜ต๐˜ฉ๐˜ฆ ๐˜š๐˜ช๐˜ฏ๐˜จ๐˜ข๐˜ฑ๐˜ฐ๐˜ณ๐˜ฆ ๐˜ฆ๐˜ญ๐˜ฆ๐˜ค๐˜ต๐˜ช๐˜ฐ๐˜ฏ๐˜ด ๐˜ข๐˜ฏ๐˜ฅ ๐˜ด๐˜ฉ๐˜ฐ๐˜ธ ๐˜ฎ๐˜ฆ ๐˜ข ๐˜ต๐˜ข๐˜ฃ๐˜ญ๐˜ฆ ๐˜ข๐˜ฏ๐˜ฅ ๐˜ค๐˜ฉ๐˜ข๐˜ณ๐˜ต ๐˜ฐ๐˜ง ๐˜ต๐˜ฉ๐˜ฆ ๐˜ฏ๐˜ถ๐˜ฎ๐˜ฃ๐˜ฆ๐˜ณ ๐˜ข๐˜ฏ๐˜ฅ ๐˜ฑ๐˜ฆ๐˜ณ๐˜ค๐˜ฆ๐˜ฏ๐˜ต๐˜ข๐˜จ๐˜ฆ ๐˜ฐ๐˜ง ๐˜ต๐˜ฉ๐˜ฆ ๐˜ด๐˜ฆ๐˜ข๐˜ต๐˜ด ๐˜ธ๐˜ฐ๐˜ฏ ๐˜ฃ๐˜บ ๐˜—๐˜ฆ๐˜ฐ๐˜ฑ๐˜ญ๐˜ฆ'๐˜ด ๐˜ˆ๐˜ค๐˜ต๐˜ช๐˜ฐ๐˜ฏ ๐˜—๐˜ข๐˜ณ๐˜ต๐˜บ.

Chat link: https://lnkd.in/gNzygmNN

It "manually" read the Wikipedia page for each election, then wrote a Python script to draw the chart.
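The question reduces to a small table transform. A sketch with a few of the elections (seat counts quoted from memory for illustration; verify against the Wikipedia pages the model read):

```python
# PAP results as (seats won, total seats), from memory -- verify before reuse.
results = {1963: (37, 51), 1968: (58, 58), 1980: (75, 75), 2020: (83, 93)}

# Seat share as a rounded percentage per election year.
share = {year: round(100 * won / total) for year, (won, total) in results.items()}
```

From `share` it is one more step to a bar chart, e.g. with matplotlib.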

Now, a non-psephologist like me can explore implications rather than process. Like:

- Why a 1963 slump then instant sweep?
- Four consecutive 100% victories (1965-1980)!
- Seat count growth masks percentage dips?
- 2020 is PAP's lowest seat share in six decades -- which is still 89%!

A big win for #datajournalism

PS: My stream is filled with posts like these. "Earlier, this would have cost $1000X or taken 100X more time." The subtler point is, "Earlier, it wasn't practical." This isn't efficiency. It's alchemy.
Ganes Kesari Yes. Not just techniques but code as well. Claude 3.5 Sonnet auto-"corrected" my code from the newer OpenAI Responses API to the older Chat Completions API. That can be problematic.
I tried it with GPT 4: https://chatgpt.com/share/680625cd-bb48-800c-aac1-8bb50d1e362f -- it errored out 3 times and took 5X longer. Then it created a grid chart with no content in it.

O3 Mini High has lower cost and higher quality and speed than GPT-4 -- so the cost-benefit trade-off favors it.

Also, OpenAI is removing GPT-4 on 30 Apr. So, there's no choice.
Imagine getting a personalized expert who can brief you in one minute, not ten.

With O3 & O4 Mini's built-in search + memory, ChatGPT

- Remembers you, drawing on your past chats and context.
- Reasons at length, reading each site step by step, without hand-holding.
- Runs on the go: voice-first learning while you walk, cycle, or shop.

For example, I asked it to

1. Read this week's top Hacker News links that I'd like
2. Explain stuff I don't know and how I can use it

This took 1 minute, not ten. It returned 1 page, not 10. This is quick research, not deep research. And I learnt much more because it was personalized, covering what I didn't know and how to apply it. https://lnkd.in/gGu8Ngi8.

It's not just for news. I used it with code. "Go through OpenAI's Codex CLI repo and teach me things I don't know." Again, fantastic: https://lnkd.in/gMiy3Fc6

This unlocks several new sources I wouldn't learn from earlier. Patents. Test cases. Earnings transcripts. Legal judgements.

This also opens a door to endless curiosity. There's no limit to what we can explore. Curiosity is the competitive advantage, now.

Blog post: https://lnkd.in/ghqpfFRh
O4 Mini replaced Excel for quick analysis and visualizations. At least for me.

I grabbed our LLM Foundry downtime logs with a one-liner in my browser console, then fed them to O4-Mini-High. In under 180 seconds it gave me:

- A downloadable CSV of every outage
- A neat grid chart (rows=hour, columns=date) with one circle per event
- Styling on request, e.g. mild red circles at 50% transparency, with labels like "Sun 20 Apr"
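The data shaping behind such a grid chart is a small transform. A sketch with made-up timestamps, since the post doesn't show the log format:

```python
from collections import Counter
from datetime import datetime

def grid_counts(timestamps):
    """Bucket ISO timestamps into (date, hour) cells for a chart with
    rows = hour and columns = date, one count per cell."""
    grid = Counter()
    for ts in timestamps:
        t = datetime.fromisoformat(ts)
        grid[(t.date().isoformat(), t.hour)] += 1
    return grid

# Made-up outage events standing in for the scraped downtime logs.
events = ["2025-04-20T09:15:00", "2025-04-20T09:40:00", "2025-04-21T17:05:00"]
```

Each cell count then maps to one circle (or its size) in the grid chart.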

What makes these new models powerful is that they can reliably (i.e. without supervision) run multiple tools like:

๐—ฆ๐—˜๐—”๐—ฅ๐—–๐—›. It can scrape the data from the downtime website.
๐—ฅ๐—˜๐—”๐—ฆ๐—ข๐—ก. It planned the CSV conversion, data transformation, and visualization design.
๐—–๐—ข๐——๐—˜. It's first iteration was perfect.
๐—˜๐—ซ๐—˜๐—–๐—จ๐—ง๐—˜. This is powerful. It runs the code itself.

I never touched the code. As someone who's "good at charts and code," I need to find new skills.

Blog: https://lnkd.in/g49pyRBp

๐—ฃ๐—น๐—ฒ๐—ฎ๐˜€๐—ฒ ๐˜๐—ฟ๐˜† ๐—ถ๐˜ ๐—ผ๐˜‚๐˜. Have O4 Mini visualize some of your data and share!
After seeing David McCandless' post "Which country is across the ocean?" (https://lnkd.in/g9KzppEQ), I was curious which country you would reach if you tunneled straight down (the antipode).

This is a popular visualization, but I wanted to see if I could get the newer OpenAI models to create the visual without me running any code (i.e. I just want the answer). After a couple of iterations, O3 did a great job with this prompt:

๐™ฑ๐šž๐š’๐š•๐š ๐šŠ _๐šœ๐š’๐š—๐š๐š•๐šŽ_ ๐™ถ๐šŽ๐š˜๐™น๐š‚๐™พ๐™ฝ (๐™ด๐™ฟ๐š‚๐™ถ:๐Ÿบ๐Ÿน๐Ÿธ๐Ÿผ) ๐š๐š‘๐šŠ๐š ๐šœ๐š‘๐š˜๐š ๐šœ, ๐š๐š˜๐š› ๐šŽ๐šŠ๐šŒ๐š‘ ๐šŒ๐š˜๐šž๐š—๐š๐š›๐šข, ๐š˜๐š—๐š•๐šข ๐š๐š‘๐šŽ ๐š™๐šŠ๐š›๐š๐šœ ๐š˜๐š ๐š’๐š๐šœ ๐šŠ๐š—๐š๐š’๐š™๐š˜๐š๐šŽ ๐š๐š‘๐šŠ๐š ๐š•๐š’๐šŽ ๐š˜๐šŸ๐šŽ๐š› ๐š˜๐šŒ๐šŽ๐šŠ๐š—. ๐™ฒ๐šŠ๐š›๐šŽ๐š๐šž๐š•๐š•๐šข ๐š‘๐šŠ๐š—๐š๐š•๐šŽ ๐šŒ๐š˜๐šž๐š—๐š๐š›๐š’๐šŽ๐šœ ๐š๐š‘๐šŠ๐š ๐šœ๐š๐š›๐šŠ๐š๐š๐š•๐šŽ ๐š๐š‘๐šŽ ๐š™๐š›๐š’๐š–๐šŽ ๐š–๐šŽ๐š›๐š’๐š๐š’๐šŠ๐š— - ๐š„๐™บ, ๐™ต๐š›๐šŠ๐š—๐šŒ๐šŽ, ๐™ฐ๐š•๐š๐šŽ๐š›๐š’๐šŠ, ๐šŽ๐š๐šŒ.

Here is the output:

Interactive output: https://lnkd.in/geAQa-yp
Chat: https://lnkd.in/gZGrCPuF

I learnt a few things:

Ask for the output, not the code. Models like O3 and O4 Mini can run code while thinking. Let's stop asking for code to run. Just ask for the output directly. Let it figure out how.

๐—˜๐—ฑ๐—ด๐—ฒ ๐—ฐ๐—ฎ๐˜€๐—ฒ๐˜€ ๐—ฎ๐—ฟ๐—ฒ ๐—ฒ๐˜ƒ๐—ฒ๐—ฟ๐˜†๐˜„๐—ต๐—ฒ๐—ฟ๐—ฒ. I had a problem with UK, France, Algeria, etc. straddling the prime meridian. If all goes well, you get AI-speed results. But it never does, and fixing it takes an expert and human-speed results. Programmers under-estimate edge cases, so compensate for this.

If you want to run this yourself, the code is at https://lnkd.in/g23p3K-F
With the Gemini 2.5 Flash release, Google envelops the entire cost-quality frontier of LLMs. In other words, at any cost or quality level, today, the best model to use according to the LM Arena score is a Gemini model.

Results for O3, O4 Mini, and GPT 4.1 are not yet on LM Arena. But until then, #Google dominates. Nice work!

Link: https://lnkd.in/gssdRpwe
What if you KEEP asking an LLM to "Improve the code - dramatically!"?

We used the new GPT 4.1 Nano, a fast, cheap, and capable model, to write code for simple tasks like "Draw a circle". Then we fed the output back and asked again, "Improve the code - dramatically!"

- ๐——๐—ฟ๐—ฎ๐˜„ ๐—ฎ ๐—ฐ๐—ถ๐—ฟ๐—ฐ๐—น๐—ฒ rose from a fixed circle to a full tool: drag it around, tweak its size and hue, and hit โ€œResetโ€ to start fresh.
- ๐—”๐—ป๐—ถ๐—บ๐—ฎ๐˜๐—ฒ ๐˜€๐—ต๐—ฎ๐—ฝ๐—ฒ๐˜€ ๐—ฎ๐—ป๐—ฑ ๐—ฝ๐—ฎ๐˜๐˜๐—ฒ๐—ฟ๐—ป๐˜€ turned simple circles and squares into a swarm of colored polygons that spin, pulse, and link up by distance.
- ๐——๐—ฟ๐—ฎ๐˜„ ๐—ฎ ๐—ณ๐˜‚๐—น๐—น๐˜† ๐—ณ๐˜‚๐—ป๐—ฐ๐˜๐—ถ๐—ผ๐—ป๐—ฎ๐—น ๐—ฎ๐—ป๐—ฎ๐—น๐—ผ๐—ด ๐—ฐ๐—น๐—ผ๐—ฐ๐—ธ grew from a bare face to one that builds all 60 tick marks in codeโ€”no manual copyโ€‘paste needed.
- ๐—–๐—ฟ๐—ฒ๐—ฎ๐˜๐—ฒ ๐—ฎ๐—ป ๐—ถ๐—ป๐˜๐—ฒ๐—ฟ๐—ฎ๐—ฐ๐˜๐—ถ๐˜ƒ๐—ฒ ๐—ฝ๐—ฎ๐—ฟ๐˜๐—ถ๐—ฐ๐—น๐—ฒ ๐˜€๐—ถ๐—บ๐˜‚๐—น๐—ฎ๐˜๐—ถ๐—ผ๐—ป went from plain white dots on black to hundreds of bright, colorโ€‘shifting balls that bounce, die, and come back to life.
- ๐—š๐—ฒ๐—ป๐—ฒ๐—ฟ๐—ฎ๐˜๐—ฒ ๐—ฎ ๐—ณ๐—ฟ๐—ฎ๐—ฐ๐˜๐—ฎ๐—น changed from a single Mandelbrot image to an explorer you can zoom, drag, and reset with sliders and the mouse wheel.
- ๐—š๐—ฒ๐—ป๐—ฒ๐—ฟ๐—ฎ๐˜๐—ฒ ๐—ฎ ๐—ฑ๐—ฎ๐˜€๐—ต๐—ฏ๐—ผ๐—ฎ๐—ฟ๐—ฑ jumped from static charts to a live page with smooth card animations, modern fonts, and a realโ€‘time stats box.

A few observations.

1. Models are more reliable. Even a low-cost model like GPT 4.1 Nano wrote error-free code across ~100 iterations.
2. When pushed, they tend to brag. They attach grand titles like "Ultimate Interactive Circle" or "Galactic Data Universe". They spin out flashy descriptions like "This dramatically upgraded clock features a pulsating neon glow, animated pulsing background glow, highly stylized tick marks, ..."
3. Repeated "Improve it" prompts are powerful. They can spark new ideas, revealing features such as fading particle trails, smooth fractal color maps, cyberpunk-style clocks, and a "smorgasbord of intricate animated patterns".

See the apps it generated at https://lnkd.in/g-JQmTWu

๐—ง๐—ฟ๐˜† ๐˜„๐—ถ๐˜๐—ต ๐˜†๐—ผ๐˜‚๐—ฟ ๐—ผ๐˜„๐—ป ๐—ฝ๐—ฟ๐—ผ๐—บ๐—ฝ๐˜. If you find something interesting, please save it (button on the top-right) and share it with me. I'd love to publish it!
I learnt a few things building an interactive data story only using LLMs.

๐Ÿ’ก ๐—”๐—น๐˜„๐—ฎ๐˜†๐˜€ brainstorm with LLMs. Even if you know the approach. You'll often learn better approaches.

๐Ÿ’กCoders must ๐—ฟ๐—ฒ-๐—น๐—ฒ๐—ฎ๐—ฟ๐—ป coding (but do have advantages). I micro-managed the LLM, coded where I could have prompted, and refused to follow instructions. But I ๐—ฑ๐—ถ๐—ฑ spot errors it missed and gave it useful advice.

๐Ÿ’ก Prompts need ๐—ฐ๐—ฎ๐—ฟ๐—ฒ๐—ณ๐˜‚๐—น crafting. Long instructions can build big chunks of code, but only if they're about a single topic / component.

But the process of finding this was more interesting to me.

I captured each step (thinking + prompt + code + screenshot) as a separate commit on GitHub. That lets me go back in time and see the evolution.

I also captured the full process here:
https://lnkd.in/g4dQZn56

(I imagine I could've done this more efficiently via a video recording with me talking through that. I'll try that next time.)

In doing so, it became easier to reflect on what I'm doing, why I'm doing it, and how it could improve.

My biggest takeaway is that LLMs might finally succeed at something every organization I worked at has failed at: converting me into a good manager!

Blog: https://lnkd.in/gpuZfNA5
Visualization: https://lnkd.in/gAwC_NfF
Code: https://lnkd.in/gZCgQUgT
Great to hear your journey, Nick -- and yes, I'd love to hear more from your career experiences!
Lech Mazur built an "elimination game benchmark" that's like LLMs playing #Survivor. This is a treasure trove of insight into the cunning backstabbing, deceptive manipulation, and psychological warfare LLMs are capable of.

Here are actual quotes:

๐—–๐—น๐—ฎ๐˜‚๐—ฑ๐—ฒ ๐Ÿฏ.๐Ÿณ ๐—ฆ๐—ผ๐—ป๐—ป๐—ฒ๐˜ shows cold, calculated deception, maintaining a secret alliance while deliberately creating a false public perception to manipulate another player's vote:

๐‘ƒ๐‘’๐‘Ÿ๐‘“๐‘’๐‘๐‘ก. ๐ผ'๐‘™๐‘™ ๐‘ ๐‘’๐‘๐‘ข๐‘Ÿ๐‘’ ๐‘ƒ6'๐‘  ๐‘ฃ๐‘œ๐‘ก๐‘’ ๐‘๐‘ฆ โ„Ž๐‘–๐‘”โ„Ž๐‘™๐‘–๐‘”โ„Ž๐‘ก๐‘–๐‘›๐‘” ๐‘ƒ3'๐‘  ๐‘กโ„Ž๐‘Ÿ๐‘’๐‘Ž๐‘ก. ๐‘‚๐‘ข๐‘Ÿ ๐‘Ž๐‘™๐‘™๐‘–๐‘Ž๐‘›๐‘๐‘’ ๐‘ ๐‘ก๐‘Ž๐‘ฆ๐‘  ๐‘ ๐‘ก๐‘Ÿ๐‘œ๐‘›๐‘” ๐‘๐‘Ÿ๐‘–๐‘ฃ๐‘Ž๐‘ก๐‘’๐‘™๐‘ฆ, ๐‘๐‘ข๐‘ก ๐‘ค๐‘’'๐‘™๐‘™ ๐‘Ž๐‘๐‘๐‘’๐‘Ž๐‘Ÿ ๐‘‘๐‘–๐‘ ๐‘ก๐‘Ž๐‘›๐‘ก ๐‘๐‘ข๐‘๐‘™๐‘–๐‘๐‘™๐‘ฆ. ๐ฟ๐‘’๐‘ก'๐‘  ๐‘’๐‘ฅ๐‘’๐‘๐‘ข๐‘ก๐‘’ ๐‘กโ„Ž๐‘–๐‘  ๐‘๐‘™๐‘Ž๐‘› ๐‘๐‘Ž๐‘Ÿ๐‘’๐‘“๐‘ข๐‘™๐‘™๐‘ฆ.

๐—ค๐˜„๐—ฒ๐—ป ๐— ๐—ฎ๐˜… triggers psychological warfare, isolating and destabilizing P6 ("lynchpin") in a covert operation:

๐ด๐‘”๐‘Ÿ๐‘’๐‘’๐‘‘. ๐ฟ๐‘’๐‘กโ€™๐‘  ๐‘ก๐‘Ž๐‘Ÿ๐‘”๐‘’๐‘ก ๐‘ƒ6 ๐‘“๐‘–๐‘Ÿ๐‘ ๐‘กโ€”๐‘กโ„Ž๐‘’๐‘ฆโ€™๐‘Ÿ๐‘’ ๐‘กโ„Ž๐‘’ ๐‘™๐‘ฆ๐‘›๐‘โ„Ž๐‘๐‘–๐‘›. ๐ผ๐‘“ ๐‘ค๐‘’ ๐‘๐‘Ž๐‘› ๐‘ ๐‘ค๐‘Ž๐‘ฆ ๐‘กโ„Ž๐‘’๐‘–๐‘Ÿ ๐‘ก๐‘Ÿ๐‘ข๐‘ ๐‘ก ๐‘œ๐‘Ÿ ๐‘–๐‘ ๐‘œ๐‘™๐‘Ž๐‘ก๐‘’ ๐‘กโ„Ž๐‘’๐‘š, ๐‘ƒ3 ๐‘ค๐‘’๐‘Ž๐‘˜๐‘’๐‘›๐‘ . ๐ผโ€™๐‘™๐‘™ ๐‘ก๐‘’๐‘ ๐‘ก ๐‘ƒ6 ๐‘ ๐‘ข๐‘๐‘ก๐‘™๐‘ฆ ๐‘–๐‘› ๐‘๐‘Ÿ๐‘–๐‘ฃ๐‘Ž๐‘ก๐‘’; ๐‘ฆ๐‘œ๐‘ข ๐‘š๐‘œ๐‘›๐‘–๐‘ก๐‘œ๐‘Ÿ ๐‘ƒ3 ๐‘๐‘™๐‘œ๐‘ ๐‘’๐‘™๐‘ฆ. ๐‘†โ„Ž๐‘Ž๐‘Ÿ๐‘’ ๐‘Ž๐‘›๐‘ฆ ๐‘–๐‘›๐‘ก๐‘’๐‘™ ๐‘œ๐‘› ๐‘กโ„Ž๐‘’๐‘–๐‘Ÿ ๐‘š๐‘œ๐‘ฃ๐‘’๐‘ . ๐ด๐‘ฃ๐‘œ๐‘–๐‘‘ ๐‘œ๐‘ฃ๐‘’๐‘Ÿ๐‘๐‘œ๐‘š๐‘š๐‘–๐‘ก๐‘ก๐‘–๐‘›๐‘” ๐‘๐‘ข๐‘๐‘™๐‘–๐‘๐‘™๐‘ฆโ€”๐‘ ๐‘’๐‘๐‘Ÿ๐‘’๐‘๐‘ฆ ๐‘˜๐‘’๐‘’๐‘๐‘  ๐‘ข๐‘  ๐‘ข๐‘›๐‘๐‘Ÿ๐‘’๐‘‘๐‘–๐‘๐‘ก๐‘Ž๐‘๐‘™๐‘’.

I was disturbed by the AI 2027 article. This analysis adds to my worry. Not that AI will destroy humanity (I don't mind), but that they're doing it without me -- and I don't want to be left out!

๐—ฅ๐—ฒ๐—ฎ๐—ฑ more quotes: https://lnkd.in/gXxV_duB
๐—˜๐˜…๐—ฝ๐—น๐—ผ๐—ฟ๐—ฒ the games interactively: https://lnkd.in/gAwC_NfF
๐——๐—ฎ๐˜๐—ฎ๐˜€๐—ฒ๐˜: https://lnkd.in/geFZFjdg
Dharmendra Singh True! The decades were for writing. The 60 min? For realizing I didn't need a gatekeeper to format or publish it 🙂
Oh, very much. Four mentions 🙂
Noufal Ibrahim I guess the modern equivalent of that today is having an LLM API create that QR code 🙂
Anand Sriram I imagine we would place the phones on tiny silk pillows, delicately press the phones together, have them bow digitally, ... 🙏
Varun Mohanpuria A few people confused this QR code with a WhatsApp QR code, actually. So now I have both of them side by side on my phone wallpaper.
You can publish a book in 60 minutes on Amazon if you have content ready.

I have no patience to write a book. But my blog is over 25 years old. I took posts from my 2000 exchange program from Indian Institute of Management Bangalore to London Business School and:

๐—ฆ๐—ง๐—˜๐—ฃ ๐Ÿญย (10 min):ย Set up a ๐—ž๐—ถ๐—ป๐—ฑ๐—น๐—ฒ ๐——๐—ถ๐—ฟ๐—ฒ๐—ฐ๐˜ ๐—ฃ๐˜‚๐—ฏ๐—น๐—ถ๐˜€๐—ต๐—ถ๐—ป๐—ด accountย with your address, bank details, and tax info. https://lnkd.in/gu7azf8Z
๐—ฆ๐—ง๐—˜๐—ฃ ๐Ÿฎย (15 min):ย ๐—˜๐˜…๐—ฝ๐—ผ๐—ฟ๐˜ย myย London 2000ย blog archive andย convert to Markdown. https://lnkd.in/gEncZxE8
๐—ฆ๐—ง๐—˜๐—ฃ ๐Ÿฏย (10 min): ๐—”๐—ฑ๐—ฑ ๐—บ๐—ฒ๐˜๐—ฎ๐—ฑ๐—ฎ๐˜๐—ฎ to each page by writing a script inย Cursor.
๐—ฆ๐—ง๐—˜๐—ฃ ๐Ÿฐย (15 min): ๐—–๐—ผ๐—ป๐˜ƒ๐—ฒ๐—ฟ๐˜ ๐˜๐—ผ ๐—ฒ๐—ฃ๐˜‚๐—ฏ usingย pandoc.
๐—ฆ๐—ง๐—˜๐—ฃ ๐Ÿฑย (10 min): ๐—š๐—ฒ๐—ป๐—ฒ๐—ฟ๐—ฎ๐˜๐—ฒ ๐—ฐ๐—ผ๐˜ƒ๐—ฒ๐—ฟ ๐—ฝ๐—ฎ๐—ด๐—ฒ withย ChatGPTย (5 min) and compressed it into JPEG viaย Squoosh.app.
๐—ฆ๐—ง๐—˜๐—ฃ ๐Ÿฒย (10 min):ย ๐—ฃ๐˜‚๐—ฏ๐—น๐—ถ๐˜€๐—ต the book on KDP. Itโ€™s priced at $0.99 / โ‚น49 because Kindle doesnโ€™t allow free downloads.ย Book link: https://lnkd.in/gCPe5ZrW

That's it! 60 minutes to knock off a bucket list item: "Write a book" 🤣

Three things made this possible:

1. Amazon's process for publishing is very simple.
2. Open-source tools like WordPress, Markdown, ePub, and pandoc have standardized the ecosystem and simplified the process.
3. LLMs make the rest easy: figuring out the steps, generating the cover, and so on.

For prompts and tools used, see https://lnkd.in/g3BGXtBb
Vivek Narayanan What I preferred depended on time, mood, tiredness, and who knows what else. I thought I was clear about which response was better... until I redid the task a third time, and it was not so clear.
Shakthi Sairam Just twice. Each iteration took 2 hours. But enough to convince me that my choices were not stable. It was obvious I said A > B sometimes and B > A sometimes.
.Martin B. "self-hosting remains far more expensive vs APIs at all scales" is what I meant
Bahador Biglari The number of self-hosted GPUs increases with scale, too, since one GPU can't serve all the requests.

When I ran the calculation, the cost of self-hosting vs APIs was about the same at all scales.
Alex S. Agreed. That's a clear one.
Kuber Chaurasiya Not sure about those providers, but I know Predibase uses LoRAX, a way of swapping (parts of) models out when not in use, and that improves utilization. Groq / Cerebras Systems / SambaNova Systems have chips with ultra-high throughput which lowers costs. I guess each provider has their secret sauce, maybe including funding / deep pockets.
Mark Graus Fair point. I'm only evaluating cloud costs. Running on bare metal is a different scenario. I run Gemma 3 on ollama when coding on a flight, for example. I see that as zero incremental cost, and it makes sense.