RIP, Data Scientists

DataHack Summit · 21 Aug 2025 · Bangalore
Anand S · LLM Psychologist · Straive
Video · Slides · Transcript

R.I.P? Seriously!?

... because in July, I saw, first-hand, what happened to two experienced data scientists

... who struggled (2 weeks) to model floor material choice given floor dimensions to minimize load, cost, CO2.

... and ChatGPT did it in 15 min.

Your manager gives you a dataset

Your client, Naya Airline, is launching in India. Their operations head wants data to tell them exactly which cities will make the most money, which routes will always be full, and what prices people will happily pay.

You do what data scientists do

  1. Explore it
  2. Clean it
  3. Model it
  4. Explain it
  5. Deploy it
  6. Anonymize it

Explore it - automate hypotheses

Explore it - with vibe-analysis

Vibe coding is coding like code does not even exist.

Vibe analysis is analyzing like analysis does not even exist.

Give ChatGPT full context, speak your desire, and review just the answers. Skip the rest.

  • Route leaderboard, growth, seasonality
  • Network exploration, weighted edges, centrality
  • Compare per-capita, fairness
  • Carrier overlay and variation, carrier entry/exit
  • Summarize insights as poetry
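
A minimal sketch of two of the explorations above (route leaderboard and weighted-edge centrality), assuming the data reduces to (origin, destination, passengers) rows. The numbers, city codes, and function names are made up for illustration; they are not from the talk's dataset.

```python
from collections import defaultdict

# Hypothetical monthly passenger counts per directed route (made-up numbers).
flights = [
    ("DEL", "BOM", 120_000),
    ("BOM", "DEL", 118_000),
    ("DEL", "BLR", 95_000),
    ("BLR", "HYD", 40_000),
    ("HYD", "BOM", 35_000),
]

def route_leaderboard(rows, top=3):
    """Rank undirected city pairs by total passengers in both directions."""
    totals = defaultdict(int)
    for origin, dest, pax in rows:
        totals[tuple(sorted((origin, dest)))] += pax
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]

def weighted_degree(rows):
    """Weighted degree centrality: total passengers touching each city."""
    strength = defaultdict(int)
    for origin, dest, pax in rows:
        strength[origin] += pax
        strength[dest] += pax
    return dict(strength)

leaders = route_leaderboard(flights)
centrality = weighted_degree(flights)
busiest_city = max(centrality, key=centrality.get)
```

Weighted degree is the simplest centrality measure. An LLM-driven analysis may well choose betweenness or PageRank instead, so treat this as a baseline to sanity-check its answers against.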

Clean it - automate quality

Clean it - with vibe analysis

Perform COMPREHENSIVE and ADVANCED data quality analysis. For each dataset here, list every clear and potential data quality issue. Make sure these are non-obvious, non-trivial, and mind-blowing! Suggest approaches to automatically fix these.

  • Hyderabad → Hyderabad is an active route!
  • Cumulative drone certificates actually decrease some days.
  • Jun 7, 2023: 86K international arrivals. Exceeds global traffic!
  • Dec 4, 2023: 894K footfalls vs 804K people. 90K teleported?
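
Issues like these can be turned into automated checks. A rough sketch follows, assuming simple list and dict inputs. The function names and the certificate counts are hypothetical, and the last check assumes, as the slide implies, that footfalls should never exceed the people count.

```python
def self_loops(routes):
    """Routes whose origin equals their destination."""
    return [(o, d) for o, d in routes if o == d]

def cumulative_drops(series):
    """Indices where a supposedly cumulative series decreases."""
    return [i for i in range(1, len(series)) if series[i] < series[i - 1]]

def exceeds_total(part_by_day, total_by_day):
    """Days where a component count is larger than the total it belongs to."""
    return [day for day, value in part_by_day.items() if value > total_by_day[day]]

routes = [("HYD", "HYD"), ("DEL", "BOM")]
certificates = [10, 12, 11, 15]          # hypothetical cumulative counts
footfalls = {"2023-12-04": 894_000}      # figures quoted on the slide
people = {"2023-12-04": 804_000}

loops = self_loops(routes)
drops = cumulative_drops(certificates)
bad_days = exceeds_total(footfalls, people)
```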

Model it - with vibe analysis

  • Forecast next-month passengers by route. Which model is best?
  • Also:
    • Which routes will revive? Why? Neighbor networks?
    • Predict carrier share. Where does route mix change most?
  • Let me download the models
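
"Which model is best?" is answerable with a backtest. Here is a minimal sketch, assuming a monthly passenger series for one route, that compares three simple baselines by rolling-origin mean absolute error. The series is synthetic and the model choices are illustrative, not what ChatGPT picked in the talk.

```python
from statistics import mean

def naive(history):
    """Forecast next month as the last observed month."""
    return history[-1]

def seasonal_naive(history, season=12):
    """Forecast next month as the same month one season ago."""
    return history[-season]

def moving_average(history, window=3):
    """Forecast next month as the mean of the last `window` months."""
    return mean(history[-window:])

def backtest_mae(series, model, min_train=12):
    """Mean absolute error of one-step-ahead forecasts over a rolling origin."""
    errors = [abs(model(series[:t]) - series[t])
              for t in range(min_train, len(series))]
    return mean(errors)

# Hypothetical monthly passengers (thousands): one 12-month pattern repeated 3 times.
pattern = [50, 48, 55, 60, 70, 65, 58, 57, 62, 75, 80, 90]
series = pattern * 3

scores = {m.__name__: backtest_mae(series, m)
          for m in (naive, seasonal_naive, moving_average)}
best = min(scores, key=scores.get)
```

On this perfectly periodic toy series the seasonal baseline wins by construction. On real route data, the point is to make any fancier model beat these baselines before trusting it.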

Explain it - as charts and infographics

ChatGPT generates reliable (though ugly) Matplotlib charts.

You can also use Codex, Jules, Claude Code, Cursor, etc. to build D3 / ChartJS data visualizations.

Here are data visualizations by non-programmers vibe-coding in a 2-hour workshop.

Deploy it - with vibe coding

Build the best model with parameters to forecast the traffic on a given day in the future across routes. Also write a Streamlit app that can take the route and date as input and show the forecast along with accuracy estimate. Allow me to download the model(s) and app as a zip file so I can run it locally.

Anonymize it - aligning to hypotheses

Don't generate random fake data. Don't anonymize blindly.

Have LLMs generate synthetic data that aligns with hypotheses.

Generate realistic fake data for ______

1. List columns + distribution
2. List 5 hypotheses
3. Generate 2K random rows. Align to the hypotheses with statistical significance.
4. Test hypothesis. Download CSV.
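
The four steps above can be sketched for a single made-up hypothesis: "weekend fares are higher than weekday fares". The column names, fare distribution, 15% uplift, and seed are all assumptions for illustration. Welch's t-statistic stands in for whatever test the LLM would choose.

```python
import csv
import io
import random
from statistics import mean, variance

def generate_rows(n=2000, seed=42):
    """Generate fake bookings where weekend fares are about 15% higher by design."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        weekend = rng.random() < 2 / 7
        base = rng.gauss(5000, 800)      # hypothetical fare distribution
        fare = base * (1.15 if weekend else 1.0)
        rows.append({"weekend": weekend, "fare": round(fare, 2)})
    return rows

def welch_t(a, b):
    """Welch's t-statistic for the difference in means of two samples."""
    return (mean(a) - mean(b)) / (variance(a) / len(a) + variance(b) / len(b)) ** 0.5

def to_csv(rows):
    """Serialize rows to CSV text, ready to save or download."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["weekend", "fare"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

rows = generate_rows()
weekend = [r["fare"] for r in rows if r["weekend"]]
weekday = [r["fare"] for r in rows if not r["weekend"]]
t_stat = welch_t(weekend, weekday)
csv_text = to_csv(rows)
```

A large positive `t_stat` means the planted effect is recoverable from the fake data, which is what makes the dataset useful for demos and tests.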

RIP tasks, not talent

What's dying

  • EDA. Profile, types, missing-values, deduping, anomalies
  • Scaffolds. Loaders, schemas, docstrings, README, tests
  • AutoML. Feature, model, parameter, metric choices
  • Code. Write specs. LLMs fill in 80%
  • Explainers. Narratives, visuals, slides
  • Forms. Project reports, timesheets

What's rising

  • Leadership. Right goal + allocation
  • Problem framing. Prompt & right-scope to reduce iterations
  • Eval design. Automate. Binary checks, domain-driven LLM-as-judge
  • Invariants. Define ontologies, declare truths/constraints
  • Verticalization. Domain-specific datasets & workflows as moats
  • Trust & taste. Auditors, storytellers
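
As one reading of "binary checks" and "invariants", here is a small sketch that declares domain truths as named pass/fail checks on a forecast record. The field names, the capacity limit, and the specific invariants are hypothetical.

```python
def check_forecast(forecast, capacity):
    """Run binary (pass/fail) checks on a route forecast; return names of failures."""
    checks = {
        "non_negative": forecast["passengers"] >= 0,
        "within_capacity": forecast["passengers"] <= capacity,
        "distinct_endpoints": forecast["origin"] != forecast["destination"],
    }
    return [name for name, ok in checks.items() if not ok]

good = check_forecast({"origin": "DEL", "destination": "BOM", "passengers": 300}, capacity=400)
bad = check_forecast({"origin": "HYD", "destination": "HYD", "passengers": 500}, capacity=400)
```

An empty list means every invariant holds. Anything else names exactly which truth was violated, which is easier to automate than a subjective quality score.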

RIP, Data Scientists

Your role has changed


Chat: https://chatgpt.com/c/68a2a755-80ac-832d-b0f3-ab085fb85460