Data Design by Dialogue

Visualizing Data with LLMs | Slides

VizChitra · 27 June 2025, 4:15 pm IST · Bangalore
Anand S · LLM Psychologist · Straive
Video · Transcript

LLMs can help in every step of the data to story value chain

  • Data Engineering: Scraping, Cleaning, ...
  • Data Analysis: Modeling, Insight generation, ..
  • Data Visualization: Charting, Narrations, ...

Best democratization of data stories since Open Data.

Scraping WhatsApp

Let's have ChatGPT build us a scraper:

I want to scrape data from a WhatsApp group by pasting JavaScript code on the DevTools console. I want you to write the code for that.

Give me code that I can paste in the console that will copy any HTML or anything else from the WhatsApp web page (making sure it's not too long -- trim as required) into the clipboard. I'll paste it here. You can then write the code to scrape all the useful information in each message as an array of JSON objects.

Tactics I learnt

LLMs err. Have them write code!

So, now, anyone can code.

Expertise helps. But it's easier to develop.

Cleaning Data

Here are WhatsApp messages. Any missing values?

Reactions blank for ~64% (no reactions).
Author null ~6% (system messages).
Time null ~5.5% (parsing glitches).
Text blank ~0.8% (likely media or deleted).

Interpolate/extrapolate time from nearby messages. Give me a .json.gz download.

Tactics I learnt

Practice delegating more. LLMs keep improving.

Modeling Topics

LLMs also use embeddings. From these messages, let's see:

  1. What messages are similar
  2. Cluster into groups
  3. Name each clusters

Let me download all the .text fields as a CSV file.

Apply a Topic Modeling tool

Modeling Topics

Let's use Claude Code to code it:

whatsapp.json.gz has WhatsApp msgs. Write topics.py to

  • calculate the embeddings of each .text
  • cluster embeddings using K-Means into 12 clusters
  • use gpt-4.1-mini to name all clusters
  • create tagged.json that adds a cluster: to each message
  • test on 20 messages & 3 clusters. Then, run for all.

Tactics I learnt

Throw it away and redo. It's easier than fixing.

Visualizing Stories

These messages are from the VizChitra group.
What interesting insights can we derive from these?
List 10 diverse data stories we can explore. Include quirky ones.
Then, for each of those, write the code to analyze the data.

Show the results as tables and charts and interpret each as a story with surprise & human appeal.

Amuse me!

Tactics I learnt

Have LLMs Write code to analyze.

Don't ask for one. Ask for a dozen analyses.

Keep an impossibility list. Review monthly.

Let's jam!

What questions would you like to ask? E.g.:

Correlate all metric pairs you can think of into a single scatterplot matrix and give me interesting stories.

More metrics! More quirky!

Redraw with Seaborn (not matplotlib) and make it beautiful! Award winning!!

Make an interactive D3 data viz to visualize these.

LLMs can help in every step of the data to story value chain

  • Data Engineering: Scraping, Cleaning, ...
  • Data Analysis: Modeling, Insight generation, ..
  • Data Visualization: Charting, Narrations, ...

Try it! See where it works and fails.

Data Design by Dialogue

Visualizing Data with LLMs | Slides

VizChitra · 27 June 2025, 4:15 pm IST · Bangalore
Anand S · LLM Psychologist · Straive
Video · Transcript

with OpenAI's text-embedding-3-small

Collate ALL messages for ALL clusters and send a single message asking it to name each cluster - especially in a way that CLEARLY differentiates the clusters. Use OpenAI's structured JSON output feature to get the cluster names as a JSON array

Add inline script metadata using use uv run topics.py. Print progress.