Browser History Data Visualization

I vibe-coded data visualizations of my browser history over the last 4 months.

Planning

STEP 1: Ran codex --search with GPT-5-codex (high) to create an ideation plan.md

Suggest interesting analyses & visual storytelling ideas based on my browser history that I can publish as journalistic data stories on my blog.

Use my Edge/Ubuntu browser history at ~/.config/microsoft-edge/Default/History - a SQLite database. (Open it as read-only with no lock.)

Search online for more creative ideas.

Evaluate each based on analysis novelty, visual impact, usefulness for the reader, and reliability/robustness of the analysis.

Save as plan.md.
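An aside on that read-only hint: Chromium-family browsers hold History open while running, so the trick is SQLite's URI parameters. A minimal sketch in Python (the path matches the prompt; the query is just a sanity check):

import sqlite3
from pathlib import Path

# Edge-on-Ubuntu history path; adjust "Default" for other profiles.
HISTORY = Path.home() / ".config/microsoft-edge/Default/History"

# mode=ro opens read-only; immutable=1 tells SQLite to take no lock at all,
# which is what lets this work while the browser has the file open.
con = sqlite3.connect(f"file:{HISTORY}?mode=ro&immutable=1", uri=True)

# Sanity check: the five most recently visited pages.
for title, url in con.execute(
    "SELECT title, url FROM urls ORDER BY last_visit_time DESC LIMIT 5"
):
    print(title, url)
con.close()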

STEP 2: Ran the same prompt with codex --search and GPT-5 (high) to create an ideation plan-b.md

⭐ I liked GPT-5's ideation better than GPT-5-Codex's.

STEP 3: Continued the thread with GPT-5 asking it to merge the plans into plan-spec.md.

Read plan.md. Merge ideas from there and write a revised story idea list in plan-spec.md. Re-evaluate the idea list on the same criteria. Pick the top 3 ideas. For the top 3, create a concise prompt for Claude Code to generate the visual story. Include the SQL query and analysis steps. Do not explain what visual to create or how. Claude Code has a better visual aesthetic and can decide its course. Instead, explain the effect we want the analysis and visual to have on the audience; the mood / feeling to evoke.

STEP 4: Reviewed plan-spec.md. I like the goals.

... but would rather have Codex do only the analysis and Claude Code do only the visualization. So I moved plan-spec.md to /tmp (temporarily) and ...

STEP 5: Re-merge into plan-spec.md and create story-wise specs using codex --search with GPT-5 (high)

Write prompts for interesting journalistic visual data stories from my browser history for my blog.

Based on my Edge/Ubuntu browser history at ~/.config/microsoft-edge/Default/History - a SQLite database (open it as read-only with no lock) we have ideas in plan.md and plan-b.md. Merge ideas and write a revised story idea list in plan-spec.md.

Evaluate each based on analysis novelty, visual impact, usefulness for the reader, and reliability/robustness of the analysis. Pick the top 3 ideas.

For each of these top 3 ideas:

Attention Clock

STEP 6: Ran npx -y @anthropic-ai/claude-code --permission-mode acceptEdits in attention-clock/.

Create a beautiful award winning data journalistic visualization on my browsing history as per spec.md.

It created attention-clock, which was beautiful but not very insightful.

Screenshot

Digital Life

STEP 7: Asked Claude Code to generate a full story by itself.

I copied the history into history.db (see the copy sketch after the prompt) and ran npx -y @anthropic-ai/claude-code --permission-mode acceptEdits in history/ with:

Create a beautiful award winning data journalistic visualization on Anand's browsing history.

Use this copy of Edge's SQLite history.db.

Be insightful, funny/witty, gently instructive.
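About that copy: snapshotting a live SQLite database with a plain file copy can catch it mid-write, so SQLite's online backup API is the safer route. A sketch in Python, with the same path assumption as before:

import sqlite3
from pathlib import Path

HISTORY = Path.home() / ".config/microsoft-edge/Default/History"

# Snapshot the live History database into history.db page by page
# via SQLite's online backup API instead of a raw file copy.
src = sqlite3.connect(f"file:{HISTORY}?mode=ro&immutable=1", uri=True)
dst = sqlite3.connect("history.db")
with dst:
    src.backup(dst)
dst.close()
src.close()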

It generated a version which looked OK but had this problem:

Two charts were empty

See screenshot1.webp. Two charts are empty. Fix these.

That was OK, but not too impressive either.

Screenshot

Rabbit Holes

STEP 8: Ran npx -y @anthropic-ai/claude-code --permission-mode acceptEdits in rabbit-holes/ with a revised version of spec.md using a local copy of the history.

Create a beautiful award winning data journalistic visualization on Anand's browsing history. Focus on how a single spark becomes a journey. Make readers feel the momentum of a rabbit hole — curiosity tightening into flow — and the tenderness of losing it. Celebrate deep dives, avoid shaming detours.

Effect/mood to evoke

Analysis hints:

Some existing CSVs for reference:

While working, it said something I quite liked:

The visualization will use a particle flow system where each chain is a flowing stream of particles, showing momentum building as curiosity deepens.

It created rabbit-holes/index.html which was beautiful! But I needed more details.

STEP 9: I really liked what it created, so I continued the session, asking it to expand.

I love the design and style of this! But this does not give me any specifics about any of these: the spirals, the branching explorations, etc. like what sites I visited, what the chain was about, or even what the branching structure looked like.

Expand the story to provide specifics. Be insightful and instructive.

STEP 10: This was beautiful, so I took a backup of this version and had it tweak:

Fantastic! Hyperlink all pages. Begin with the patterns of exploration. Then, for each pattern, show two interesting examples. The current examples are perfect. Add more as required. Pick the most interesting ones.

Then:

Avoid localhost examples

Then:

Ensure that the three patterns of exploration fit in a single row. Ensure that each category begins with a public site example, i.e. not a site that requires a login or that the audience cannot access.

The result is beautiful!

Screenshot

Search Funnels

STEP 11: Same approach. Ran npx -y @anthropic-ai/claude-code --permission-mode acceptEdits in search-funnels/ with a revised version of spec.md using a local copy of the history.

Create a beautiful award winning data journalistic visualization on Anand's browsing history. Reveal whether questions actually find answers. Make the journey from a search to its first destination feel tangible — hopeful when swift, curious when meandering. Focus on empathy: everyone googles in loops; clarity comes from seeing the loops.

Effect/mood to evoke

Analysis hints:

Some existing CSVs for reference:

Version 1 was nice. So I added:

Nice! In the search journeys, highlight the most interesting individual (search, time-to-click) items and add a sentence explaining these. Allow filtering by these. Allow sorting by time, clicks, searches, destinations, and restoring the original order. For each journey type, synthesize and share insights based on the patterns you see. Add links to search queries (e.g. google.com/search?q=...). Replace the Rhythm of Search visual with a more insightful, visually appealing one.

Version 2 could be improved a bit:

Great!

Drop the "Where Do Searches Lead" section For the "Interesting" searches, hand-craft the explanation for each. Unique, insightful, leading up to a collective story. Improve the "What Each Journey Type Reveals" section. Rewrite with big, useful, non-obvious/surprising insights.

Modify "The Search Landscape" into a scatterplot that fills the width of the window, where:

Rewrite the "Reading the Landscape" chart to explain the chart AND share with big, useful, non-obvious/surprising insights.

This led to a squashed chart. So...

The search landscape chart is squeezed into the left. See screenshot-landscape.webp

Version 3 didn't have much insight on the search landscape. So:

Delete "The Search Landscape" and "Reading the Landscape" sections.

Screenshot

Scrub

To review for privacy issues, I asked codex with GPT-5 (high) to create leaks.md:

Review files added for commit and identify privacy issues / data leaks. For each file, mention a list of issues (if any) along with severity. Provide enough information for the reader to decide whether to publish or not. Save this in browser-history/leaks.md.

After reviewing, I went to the search-funnels folder and, in a new session of codex with GPT-5 (high), asked it to:

Go through v1.html, v2.html, v3.html and index.html. Find out which terms would be loaded from search_funnels_terms.csv based on the code's functionality. Filter the file so that only those terms remain and delete the rest. Ensure that this does not break any functionality.

That was too aggressive, so:

Go through index.html. Find out which terms would be loaded from search_funnels_terms.csv based on the code's functionality. Ensure that every filter (instant, quick, ...) and every sort order will have at least 20 results. Ensure that all interesting terms (e.g. those mentioned in the story) are included. Delete the rest from search_funnels_terms.csv.
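Mechanically, the scrub boils down to keeping only CSV rows the page can actually surface. A crude sketch of the idea, assuming a term column and verbatim matching (both assumptions; the real logic depends on how index.html loads and filters the CSV):

import csv

# Keep only rows whose term appears verbatim in index.html.
# This is a simplification: the real filter must honor how the
# page's filters and sort orders actually select terms.
page = open("index.html", encoding="utf-8").read()

with open("search_funnels_terms.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

kept = [r for r in rows if r["term"] in page]

with open("search_funnels_terms.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(kept)

print(f"kept {len(kept)} of {len(rows)} rows")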

Browsing patterns

I continued exploring rabbit holes with this prompt:

Create a beautiful award winning data journalistic visualization on Anand's browsing history as a single page web app.

Write about the following three patterns of browsing: Linear Spirals, Hub & Spoke, Wide Survey.

Linear Spirals: One page → next → next. Deep focus, single thread. Branching ≈ 1.0
Root
└─ A
   └─ B
     └─ C
       └─ D
         └─ E
           └─ F

Hub & Spoke: Return to index, open new tab. Configuration work. Branching 2-6×
Root
├─ A
├─ B
│  ├─ B1
│  └─ B2
├─ C
└─ D

Wide Survey: Many tabs from root. Cataloging, surveying. Branching 10-24×
Root
├─ A
├─ B
├─ C
├─ D
├─ E
├─ ...

Analysis: Run SQL on history.db. Use these queries as hints:

-- chain_summary: one row per navigation chain (a root visit plus everything reached from it)
WITH RECURSIVE edges AS
(
          SELECT    v.id AS visit_id,
                    v.from_visit,
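                    -- visit_time is microseconds since 1601-01-01; 11644473600 s shifts it to the Unix epoch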
                    datetime((v.visit_time/1000000)-11644473600,'unixepoch')          AS ts_utc,
                    COALESCE(c.total_foreground_duration, v.visit_duration)/1000000.0 AS fg_sec,
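                    -- the low byte of transition is the core navigation type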
                    (v.transition & 255)                                              AS base_t,
                    u.url,
                    u.title,
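                    -- crude hostname extraction: the text between '://' and the next '/'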
                    CASE
                              WHEN u.url LIKE 'http%' THEN substr(u.url, instr(u.url,'://')+3,
                                        CASE instr(substr(u.url, instr(u.url,'://')+3),'/')
                                                  WHEN 0 THEN length(u.url)
                                                  ELSE instr(substr(u.url, instr(u.url,'://')+3),'/')-1
                                        END)
                              ELSE u.url
                    END AS host
          FROM      visits v
          LEFT JOIN context_annotations c
          ON        c.visit_id=v.id
          JOIN      urls u
          ON        u.id=v.url
          WHERE     (v.transition & 255) NOT IN (3,4) -- drop subframe navigations
), roots AS -- roots: visits whose parent is missing from the filtered set
(
          SELECT    e.visit_id
          FROM      edges e
          LEFT JOIN edges p
          ON        e.from_visit=p.visit_id
          WHERE     e.from_visit IS NULL
          OR        p.visit_id IS NULL),
-- walk: recursively assign every visit to the root of its chain, tracking depth
walk(root_id, visit_id, from_visit, ts_utc, fg_sec, host, title, depth) AS
(
       SELECT r.visit_id AS root_id,
              e.visit_id,
              e.from_visit,
              e.ts_utc,
              e.fg_sec,
              e.host,
              e.title,
              1
       FROM   roots r
       JOIN   edges e
       ON     e.visit_id = r.visit_id
       UNION ALL
       SELECT w.root_id,
              e.visit_id,
              e.from_visit,
              e.ts_utc,
              e.fg_sec,
              e.host,
              e.title,
              w.depth+1
       FROM   edges e
       JOIN   walk w
       ON     e.from_visit = w.visit_id)
SELECT   root_id                                                                  AS chain_id,
         count(*)                                                                 AS visit_count,
         count(DISTINCT host)                                                     AS distinct_domains,
         min(ts_utc)                                                              AS start_ts_utc,
         max(ts_utc)                                                              AS end_ts_utc,
         cast((julianday(max(ts_utc))-julianday(min(ts_utc)))*86400.0 AS integer) AS duration_sec,
         round(sum(
         CASE
                  WHEN fg_sec>0 THEN fg_sec
                  ELSE 0
         END),2) AS total_foreground_sec
FROM     walk
GROUP BY root_id
HAVING   visit_count >= 3
ORDER BY visit_count DESC


-- chain_depths.csv: maximum depth and node count for every chain
WITH RECURSIVE edges AS
(
       SELECT v.id AS visit_id,
              v.from_visit
       FROM   visits v
       WHERE  (v.transition & 255) NOT IN (3,4)), roots AS
(
          SELECT    e.visit_id
          FROM      edges e
          LEFT JOIN edges p
          ON        e.from_visit=p.visit_id
          WHERE     e.from_visit IS NULL
          OR        p.visit_id IS NULL), walk(root_id, visit_id, depth) AS
(
       SELECT r.visit_id,
              r.visit_id,
              1
       FROM   roots r
       UNION ALL
       SELECT w.root_id,
              e.visit_id,
              w.depth+1
       FROM   edges e
       JOIN   walk w
       ON     e.from_visit = w.visit_id)
SELECT   root_id    AS chain_id,
         max(depth) AS max_depth,
         count(*)   AS total_nodes
FROM     walk
GROUP BY root_id
ORDER BY max_depth DESC LIMIT 5000;
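
To make those branching numbers concrete: the classification is essentially the average fan-out per non-leaf visit in a chain. A sketch of how one might compute it over the same filtered edges (the cutoffs between patterns are my own reading of the 1.0 / 2-6× / 10-24× ranges above):

import sqlite3
from collections import Counter, defaultdict

con = sqlite3.connect("file:history.db?mode=ro", uri=True)

# Parent -> children map over the same filtered edges as the SQL above.
children = defaultdict(list)
for visit_id, from_visit in con.execute(
    "SELECT id, from_visit FROM visits WHERE (transition & 255) NOT IN (3, 4)"
):
    if from_visit:  # 0/NULL means the visit has no referrer
        children[from_visit].append(visit_id)

def branching(root):
    """Average number of children per non-leaf node under root."""
    fanouts, stack = [], [root]
    while stack:
        kids = children.get(stack.pop(), [])
        if kids:
            fanouts.append(len(kids))
            stack.extend(kids)
    return sum(fanouts) / len(fanouts) if fanouts else 0.0

def classify(b):
    # Illustrative cutoffs matching the ranges in the pattern descriptions.
    if b <= 1.5:
        return "Linear Spiral"
    if b <= 6:
        return "Hub & Spoke"
    return "Wide Survey"

# Roots: parents that are not themselves anyone's child.
child_ids = {c for kids in children.values() for c in kids}
roots = [p for p in children if p not in child_ids]

print(Counter(classify(branching(r)) for r in roots))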

But the stories were not very impressive (and also hard to anonymize). So, here are a few observations.

Learnings