Imagine designing a course that teaches students to use AI tools. You'd expect the students who are good at one AI task to be generally good at others. You would be wrong — by a correlation of 0.020.
The Assumption Behind Every Multi-Part Exam
When a course has multiple questions, there is usually an implicit assumption: the skills they test are related. A student who masters one question will likely have an easier time with the others. The questions reinforce each other. They form a coherent assessment of a coherent set of abilities.
The TDS Jan 2026 course had twelve questions measuring eleven distinct tasks. The assumption seemed safe. They all involved AI tools. They were all in the same course. Students who knew how to use AI for network analysis should have had some advantage on AI-assisted image generation.
The correlation between those two skill families is 0.020.
0.020. For practical purposes, the network questions and the image briefs are two independent exams sitting inside one course.
The Skill Map: Three Clusters, One Chasm
Lay the question-by-question correlations out as a matrix and the structure is stark. The top-left 3×3 block (network games) shows correlations of 0.78–0.82. The bottom-right 4×4 block (image briefs) shows 0.10–0.27. Between them: essentially nothing.
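That block structure can be sketched with synthetic data. This is a minimal sketch, not the course's actual gradebook: the column names, group sizes, and generating model are all illustrative, chosen only so the scores behave the way the clusters describe.

```python
import numpy as np
import pandas as pd

# Synthetic scores mimicking the structure described above: three network
# questions driven by one shared ability, four image briefs drawn
# independently. All names and numbers here are illustrative.
rng = np.random.default_rng(0)
n = 120
shared = rng.normal(size=n)  # latent "network navigation" skill

scores = pd.DataFrame({
    "labyrinth": shared + 0.4 * rng.normal(size=n),
    "detective": shared + 0.4 * rng.normal(size=n),
    "signal":    shared + 0.4 * rng.normal(size=n),
    "affective": rng.normal(size=n),
    "concept":   rng.normal(size=n),
    "paradox":   rng.normal(size=n),
    "style":     rng.normal(size=n),
})

corr = scores.corr()  # pairwise Pearson correlation matrix
net = ["labyrinth", "detective", "signal"]
img = ["affective", "concept", "paradox", "style"]

def mean_offdiag(block: pd.DataFrame) -> float:
    """Mean of a square correlation block, diagonal excluded."""
    v = block.to_numpy()
    return float(v[~np.eye(len(v), dtype=bool)].mean())

print(f"network-network: {mean_offdiag(corr.loc[net, net]):.2f}")
print(f"image-image:     {mean_offdiag(corr.loc[img, img]):.2f}")
print(f"cross-cluster:   {corr.loc[net, img].to_numpy().mean():.2f}")
```

With real per-student score columns in place of the synthetic frame, the same three numbers reproduce the within-cluster versus cross-cluster contrast.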
Inside the Network: One Skill Family
The three network game questions — Labyrinth, Graph Detective, and The Signal — correlate with each other at 0.78 to 0.82. In psychometric terms, that's the range you'd expect from three tests of the same underlying ability. They are, for practical purposes, measuring the same thing three times.
This is useful information for course design. It suggests that the three network questions, despite their different surface features, draw on a shared skill: probably something like "ability to navigate structured environments using AI assistance." A student who masters Labyrinth has very likely already mastered Detective and Signal.
Inside the Image Unit: Four Different Skills
The image questions tell the opposite story. The four image briefs — Affective Chart, Concept Incarnation, Paradox Portrait, Style Transplant — show internal correlations of only 0.099 to 0.266. These are questions within the same assignment unit, evaluated by the same rubric, submitted in the same week. They should be measuring the same thing.
They are not measuring the same thing.
Four image briefs. Four different skills masquerading as one assignment.
Network Pairs Stay Tight; Image Pairs Fall Apart
Every network-network correlation is at least 0.78. The highest image-image correlation is 0.266. The gap between these worlds is structural.
What Three Learning Theories Say About This
What Transfers — and What Doesn't
Three learning theories survive contact with this dataset. One — a popular one — doesn't.
Thorndike's Identical Elements Theory (1901) holds that transfer between tasks is strongest when they share the same underlying structure. The three network games — Data Labyrinth, Graph Detective, The Signal — all demand the same cognitive operation: traverse a graph under constraints, read state, make decisions. Their correlations of 0.78–0.82 are exactly what Thorndike would predict. The image briefs share almost nothing with graph traversal — hence the 0.020 correlation. Writing 125 years ago, Thorndike could have forecast this result to two decimal places.
Situated Cognition (Lave & Wenger) argues that knowledge is bound to the context in which it was acquired — not portable by default. A student who mastered "how to iterate through graph nodes" did not thereby master "how to make emotionally resonant abstract art." These are not harder and easier versions of the same skill. They are different skills, acquired in different contexts, activated by different triggers. The 0.020 cross-cluster correlation is situated cognition theory expressed as a single number.
Threshold Concepts (Meyer & Land) predict that once a learner crosses a conceptual threshold, performance transforms discontinuously — a one-way door to a different understanding. The semantic match thresholds — concept-match, paradox-match, tradition-match — are literal threshold concepts in action. Once students crossed the semantic threshold, once Gemini could recognize what they intended, scores jumped from 5.2–5.9 to 8.2–9.0. Below the threshold, polish didn't help. Above it, everything worked.
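A toy illustration of that discontinuity, using synthetic scores drawn uniformly from the reported ranges (5.2–5.9 below the semantic threshold, 8.2–9.0 above); the group sizes are invented:

```python
import numpy as np

rng = np.random.default_rng(1)
below = rng.uniform(5.2, 5.9, size=40)  # concept never recognized
above = rng.uniform(8.2, 9.0, size=60)  # semantic threshold crossed

# The jump across the threshold dwarfs the spread inside either group:
# a discontinuity, not a smooth improvement curve.
jump = above.mean() - below.mean()
spread = max(below.std(), above.std())
print(f"jump across threshold: {jump:.2f}")
print(f"spread within groups:  {spread:.2f}")
```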
The theory this data breaks is the "AI Literacy" unitary construct — the assumption, widespread among educators and researchers, that AI proficiency is a single dimension you either have or don't. The 0.020 correlation shows this is false. The course contained two independent exams measuring skills that barely predict each other. Students who aced one had no systematic advantage on the other.
Practical takeaways for educators
- Treat image-generation tasks and network/systems tasks as separate skill domains — separate rubrics, separate feedback, possibly separate instructional sequences.
- If your goal is cross-domain AI fluency, design explicitly for transfer. It will not happen organically from proximity alone.
- The threshold effect means remedial interventions should target semantic understanding, not more iterations. Students below the threshold don't need to submit more; they need to understand what the brief is asking for.
For students
- Being good at network puzzles doesn't make you good at AI art generation. Both are real skills; neither predicts the other.
- In image generation, the prompt that makes Gemini recognize your concept matters more than a beautiful image. Master the semantics before optimizing aesthetics.
The Model Playbook Is Brief-Specific
If AI proficiency were a single skill, you'd use the same model for everything. The correlation data says you shouldn't. DALL·E is strong on Concept Incarnation (8.557). It's weak on Affective Chart (7.013). That's not random variation — it's structural. Different briefs activate different cognitive and generative requirements, and different models have different strengths.
The practical implication: treat each image brief as a separate skill to develop, with its own tool recommendations. And treat the network game questions as a single skill cluster — mastering one of them is strong evidence you can handle the others.
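A minimal sketch of that playbook as a lookup table. Only the two DALL·E figures come from the text; the second model and all of its scores are hypothetical placeholders.

```python
# Mean score per (model, brief). The DALL·E numbers are the ones quoted
# above; "model_b" and its scores are hypothetical placeholders.
scores = {
    "dalle":   {"concept_incarnation": 8.557, "affective_chart": 7.013},
    "model_b": {"concept_incarnation": 7.900, "affective_chart": 8.100},
}

def best_model_for(brief: str) -> str:
    """Pick the model with the highest mean score on a given brief."""
    return max(scores, key=lambda m: scores[m].get(brief, float("-inf")))

print(best_model_for("concept_incarnation"))  # → dalle
print(best_model_for("affective_chart"))      # → model_b
```

The design point is that the recommendation is keyed by brief, not by model: no single model tops every row, so "which tool should I use?" only has an answer once the brief is fixed.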
There were not twelve questions in this course. There were two exams — one measuring network navigation, one measuring image generation — assembled, perhaps accidentally, into a single gradebook. The students who aced both had to learn two different things. The ones who only learned one had no idea what the other one would ask of them. Neither did anyone else, until now.