How Students Learn Python
Anand S — LLM Psychologist @ Straive · PyConf Hyderabad · 14 March 2026
sanand0.github.io/talks/
A
Inside the Exam
What we measured, and how

The Exam Records Everything

The Python OPPE is a 90-minute online programming exam taken by students in IIT Madras's BS Data Science programme. Every save is timestamped. Every test run is logged. Every submission has a score.

Students solve problems in a browser-based IDE. They can run against public test cases whenever they want. The private tests only run on final submission. They don't know what the private tests contain — they only see pass/fail counts.

Public Tests
Run anytime. Full input/output visible.
The only feedback loop during the exam.
Typically 2–4 test groups per question.
Private Tests
Run only on submission. Results visible after.
The actual grade. Unseen inputs, edge cases.
This is where the mental models get tested.

This is Very Rich Data

13,623 · Students
251 · Questions across 35 exam slots
2.06M · Test runs (saves + submissions)
55,142 · Final submissions

45.5% of submitters didn't get full marks. Those submissions have stories. Today we're going to read four of them.

Replays Show Us Students' Thoughts

For a subset of students, we recorded full session replays: every keystroke, every save, every test run, with timestamps. This gives us something unusual — not just what students wrote, but how they got there.

Each event is a snapshot: timestamp, event type (save / run public / run private / submit), the full source code at that moment, and the test results.
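
A minimal sketch of what one replay event could look like as a record. The field names, the `EventType` labels, and the pass/total summary are assumptions for illustration; the talk only says that each event carries a timestamp, an event type, the full source, and the test results.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    # Hypothetical labels for the four event types named above.
    SAVE = "save"
    RUN_PUBLIC = "run public"
    RUN_PRIVATE = "run private"
    SUBMIT = "submit"


@dataclass
class ReplayEvent:
    seconds_since_start: float  # assumed timestamp representation
    event_type: EventType
    source_code: str            # full source at that moment
    tests_passed: int = 0       # assumed summary of the test results
    tests_total: int = 0


def path_summary(events: list[ReplayEvent]) -> list[tuple[float, str, str]]:
    """Return (time, event type, 'passed/total') per event, in time order."""
    ordered = sorted(events, key=lambda e: e.seconds_since_start)
    return [
        (e.seconds_since_start, e.event_type.value, f"{e.tests_passed}/{e.tests_total}")
        for e in ordered
    ]
```

Reading the list top to bottom is the "path" view: the same student, the same question, one row per event.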

We can watch a student's mental model evolve in real time. Or fail to evolve.

The unit of analysis isn't the final submission. It's the path.

Four Questions, Four Mental Models

Q1 · Card to Value Tuple
Parsing by shape, not contract. What happens when the only examples are one-character-rank cards and the private tests send 10D.
Q2 · Check for Greeting Prefix
Negotiating with the tests. Three different failure modes on a function that's essentially one line of Python.
Q3 · Shuffle a Three-Word Sentence
Solving the examples, not the problem. The purest overfitting exhibit in the dataset.
Q4 · Pangram Check
Heuristics vs invariants. The function that forgets a letter, and the function that counts letters instead of checking them.

For each question: one student who got it immediately, one who struggled productively, and one who got stuck in an interesting way.

Q1. Convert Cards like 2H, KC, ... to Numbers

def card_to_value_tuple(card: str) -> tuple:
    """Converts a card "{rank}{suit}" to a value tuple (suit_value, rank_value).

    Suits: S=1, H=2, D=3, C=4
    Ranks: A=1, 2-10=face value, J=11, Q=12, K=13

    >>> card_to_value_tuple('AH')
    (2, 1)
    >>> card_to_value_tuple('7D')
    (3, 7)
    >>> card_to_value_tuple('QS')
    (1, 12)
    >>> card_to_value_tuple('9C')
    (4, 9)
    """
Every public test case has a one-character rank. 'A', '7', 'Q', '9'. The function signature says nothing about this. The docstring doesn't either. It's just how the examples happened to be.

Ten Seconds. Full Marks.

👤 Quick-Gun Qadir ✓ Public 4/4 ✓ Private 4/4 ⏱ ~10 s 3 events
solution.py
def card_to_value_tuple(card: str) -> tuple:
    suit_map = {'S': 1, 'H': 2, 'D': 3, 'C': 4}
    rank_map = {'A': 1, 'J': 11, 'Q': 12, 'K': 13,
                '2': 2, '3': 3, '4': 4, '5': 5,
                '6': 6, '7': 7, '8': 8, '9': 9, '10': 10}
    rank = card[:-1]
    suit = card[-1]
    return (suit_map[suit], rank_map[rank])

card[:-1] for rank, card[-1] for suit. The student looked at the function signature, understood that suits are always one character, and wrote the general solution immediately. First run passes everything. This is what reading the spec before reading the examples looks like in practice.

Everything Is Right, But Not Quite

👤 Tuple Thammudu ✗ TypeError ⏱ 0 s event 1
solution.py
def card_to_value_tuple(card: str) -> tuple:
    color = card[-1]
    if color == 'S':   b = 1
    elif color == 'H': b = 2
    elif color == 'D': b = 3
    elif color == 'C': b = 4

    a = rank_map[card[:-1]]

    return tuple(a, b)

This raises TypeError: tuple expected at most 1 argument, got 2.
Did you know this? I didn't!
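
A small sketch of why the call fails, using placeholder values for `a` and `b`. The built-in `tuple()` takes at most one argument, an iterable, so two positional values are rejected before any tuple is built.

```python
a, b = 1, 2  # placeholder values, not from the student's run

# tuple() takes a single iterable, so two positional arguments fail.
try:
    tuple(a, b)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

from_literal = (a, b)          # parentheses and a comma build the tuple
from_iterable = tuple([a, b])  # or pass one iterable to the constructor
```

Both working forms produce the same tuple; the literal is the one the student reaches next.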

Everything Is Right Except Left and Right

👤 Tuple Thammudu ✗ Public 0/4 ⏱ ~15 s event 3
solution.py
def card_to_value_tuple(card: str) -> tuple:
    color = card[-1]
    if color == 'S':   b = 1
    elif color == 'H': b = 2
    elif color == 'D': b = 3
    elif color == 'C': b = 4

    a = rank_map[card[:-1]]

    return (a, b)
  • card_to_value_tuple('AH') → expected (2, 1), got (1, 2)
  • card_to_value_tuple('7D') → expected (3, 7), got (7, 3)
  • card_to_value_tuple('QS') → expected (1, 12), got (12, 1)
  • card_to_value_tuple('9C') → expected (4, 9), got (9, 4)

Now runnable. Still wrong. (a, b) returns (rank, suit). The spec says (suit, rank). The parentheses are fixed. The order is backwards.

Everything Is Right After A Fight

👤 Tuple Thammudu ✓ Public 4/4 ✓ Private 4/4 ⏱ ~40 s total event 5
solution.py
def card_to_value_tuple(card: str) -> tuple:
    color = card[-1]
    if color == 'S':   b = 1
    elif color == 'H': b = 2
    elif color == 'D': b = 3
    elif color == 'C': b = 4

    a = rank_map[card[:-1]]

    return (b, a)

Swap (a, b) to (b, a). Public: 4/4. Private: 4/4. The gap between first attempt and full marks is forty seconds — mostly spent fighting the tuple constructor and then left-right order. Two mistakes, both recoverable, both obvious in retrospect.

Public ✅ — Private: ❌

👤 Index Ishwar ✓ Public 4/4 ⚠ Private 2/4 — 50 ~8 events
solution.py
def card_to_value_tuple(card: str) -> tuple:
    value_1 = card[1]   # suit?
    value_2 = card[0]   # rank?
    try:
        value_1 = suit_values[value_1]
    except:
        value_1 = int(value_1)
    try:
        value_2 = rank_values[value_2]
    except:
        value_2 = int(value_2)
    return (value_1, value_2)
  • card_to_value_tuple('9S') → (1, 9)
  • card_to_value_tuple('KD') → (3, 13)
  • card_to_value_tuple('10H') → expected (2, 10), got wrong value
  • card_to_value_tuple('AC') → expected (4, 1), got wrong value

card[1] for the suit, card[0] for the rank. Works for 'AH', '7D', 'QS', '9C'. But '10H' breaks the parser: card[0] is '1' and card[1] is '0', so neither the rank nor the suit is read correctly.
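
A sketch contrasting the two ways of splitting a card. The helper names are hypothetical; the point is only which assumption each slice encodes.

```python
def split_by_position(card: str) -> tuple[str, str]:
    # Assumes every card is exactly two characters, like the public examples.
    return card[0], card[1]


def split_from_end(card: str) -> tuple[str, str]:
    # Assumes only that the suit is the last character.
    return card[:-1], card[-1]


two_char = (split_by_position("7D"), split_from_end("7D"))
three_char = (split_by_position("10H"), split_from_end("10H"))
```

On '7D' the two agree. On '10H' only the second one still returns a rank and a suit.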

Q2. Does It Start With A Greeting?

def starts_with_greeting(s: str) -> bool:
    """Returns True if s starts with 'Hello ' or 'Hi ' (with trailing space).

    >>> starts_with_greeting('Hello there')
    True
    >>> starts_with_greeting('Hi friend')
    True
    >>> starts_with_greeting('Hithere')
    False
    >>> starts_with_greeting('Welcome')
    False
    """
The rule is in the docstring in plain English: starts with 'Hello ' or 'Hi ' (with trailing space). Three different students find three different ways not to implement exactly this.

One Second. Full Marks.

👤 One-Line Olu ✓ Public 4/4 ✓ Private 3/3 ⏱ ~1 s 2 events
solution.py
def starts_with_greeting(s: str) -> bool:
    return s.startswith('Hello ') or s.startswith('Hi ')

The docstring says "starts with 'Hello ' or 'Hi '." The student wrote exactly that sentence in Python. Nothing more. The trailing space is there because the docstring says it's there. One second, because this is exactly what reading the spec looks like.

One Error Fails The Test.

👤 Spacebar Sania ✗ Public 3/4 ⚠ Private 2/3 — 67 event 4
solution.py
def starts_with_greeting(s: str) -> bool:
    return s.startswith('Hello') or s.startswith('Hi')
  • starts_with_greeting('Hello there') → True
  • starts_with_greeting('Hi friend') → True
  • starts_with_greeting('Hithere') → expected False, got True
  • starts_with_greeting('Welcome') → False

startswith('Hi') matches 'Hi friend' and also 'Hithere'. The trailing space disambiguates them. The fix is obvious.

Two Spaces Solve The Problem.

👤 Spacebar Sania ✓ Public 4/4 ✓ Private 3/3 event 6
solution.py
def starts_with_greeting(s: str) -> bool:
    return s.startswith('Hello ') or s.startswith('Hi ')

Two spaces added — one to each argument. Public: 4/4. Private: 100. The entire gap between 67 and 100 is two characters. This is the cleanest illustration of how test cases can reveal hidden issues.

Python Or JavaScript?

👤 JavaScript Jaya ✗ SyntaxError ⏱ 0 s event 1
solution.py
def starts_with_greeting(s: str) -> bool:
    if s.startswith('Hello' || 'Hi'):
        return True
    return False

|| is logical OR in JavaScript. In Python it's a syntax error. The intent here is completely correct — the student wants OR between two strings — but the operator is from a different language. Surface intent: perfect. Language fluency: still catching up.

The Operator Is Inside The Machine.

👤 JavaScript Jaya ✗ Public 3/4 ⏱ ~30 s event 3
solution.py
def starts_with_greeting(s: str) -> bool:
    if s.startswith('Hello' or 'Hi'):
        return True
    return False
  • starts_with_greeting('Hello there') → True
  • starts_with_greeting('Hi friend') → expected True, got False
  • starts_with_greeting('Hithere') → False
  • starts_with_greeting('Welcome') → False

'Hello' or 'Hi' evaluates to 'Hello', because 'Hello' is truthy. So startswith('Hello' or 'Hi') is just startswith('Hello'). 'Hi friend' now returns False. The fix — two separate startswith calls — comes a few events later.
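
A short sketch of what `or` does with two strings, plus one way to express "either prefix" in a single call. The tuple form is an alternative shown for illustration, not necessarily the fix this student reached.

```python
# `or` returns its first truthy operand, so the second string is never used.
chosen = 'Hello' or 'Hi'


def starts_with_greeting(s: str) -> bool:
    # str.startswith accepts a tuple of prefixes and checks each one.
    return s.startswith(('Hello ', 'Hi '))
```

The tuple keeps the student's original intent (one call, two greetings) while letting `startswith` do the OR.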

Bargaining Instead of Debugging.

👤 JavaScript Jaya ✓ Public 4/4 ✗ Private: fails event 81
solution.py
def starts_with_greeting(s: str) -> bool:
    if s == 'Hithere':
        return False
    if s.startswith('Hello' or 'Hi' or 'hello' or 'hi'):
        return True
    elif s.startswith('Hi'):
        return True
    return False

The student saw that 'Hithere' should return False and their code was failing on it. So they added an explicit exception for that exact string. The public test is satisfied. The rule — trailing space — remains unimplemented. This is not debugging. It's treaty negotiations with the grader. One visible case, one explicit patch. You have seen this in a pull request.

The Solution Is Hidden In Comments.

👤 Comment Khanna ✓ Public 4/4 ⚠ Private 2/3 — 67 final submission
solution.py
def starts_with_greeting(s: str) -> bool:
    if not isinstance(s, str):
        return False
    s = s.strip()
    return s.startswith('Hello') or s.startswith('Hi')

    '''cleaned = s.strip().lower()
    is_hello = cleaned.startswith("hello ") and (
        len(cleaned) == 6 or not cleaned[6].isalpha())
    is_hi = cleaned.startswith("hi ") and (
        len(cleaned) == 3 or not cleaned[3].isalpha())
    return is_hello or is_hi'''

The student began the right solution, commented it out, and ignored it. Commenting out code may be OK. Ignoring it may not.

Q3. Shuffle A Three-Word Sentence

def shuffle_sentence(sentence: str, order: tuple) -> str:
    """Rearranges the words of a 3-word sentence by the given index order.

    >>> shuffle_sentence('apple banana orange', (0, 2, 1))
    'apple orange banana'
    >>> shuffle_sentence('cat dog mouse', (2, 1, 0))
    'mouse dog cat'
    >>> shuffle_sentence('red green yellow', (1, 0, 2))
    'green red yellow'
    """
The order parameter is a tuple of indices. The question is whether the student treats it as data (general) or treats it as a pattern to match (specific).

Treats order as Data. One Line.

👤 Data-Driven Durga ✓ Public 3/3 ✓ Private 3/3 ⏱ ~1 s 3 events
solution.py
def shuffle_sentence(sentence: str, order: tuple) -> str:
    s = sentence.split(" ")
    return " ".join([s[i] for i in order])

Split, index, join. The student saw order as a general index plan and used it directly. The abstraction is visible in three words: s[i] for i in order. This works for any sentence, any order, any length — because the student solved the rule, not the examples.

Let's Solve One Sentence.

👤 Hard-Coder Hari ⚠ Public 1/3 event 3
solution.py
def shuffle_sentence(sentence: str, order: tuple) -> str:
    if order == (0, 2, 1):
        return 'apple orange banana'
  • shuffle_sentence('apple banana orange', (0,2,1)) → 'apple orange banana'
  • shuffle_sentence('cat dog mouse', (2,1,0)) → expected 'mouse dog cat'
  • shuffle_sentence('red green yellow', (1,0,2)) → expected 'green red yellow'

The public tests show specific inputs with specific outputs. This student learned the first one.

Let's Solve Two Sentences.

👤 Hard-Coder Hari ⚠ Public 2/3 event 7
solution.py
def shuffle_sentence(sentence: str, order: tuple) -> str:
    if order == (0, 2, 1):
        return 'apple orange banana'
    elif order == (2, 1, 0):
        return 'mouse dog cat'
  • shuffle_sentence('apple banana orange', (0,2,1)) → 'apple orange banana'
  • shuffle_sentence('cat dog mouse', (2,1,0)) → 'mouse dog cat'
  • shuffle_sentence('red green yellow', (1,0,2)) → expected 'green red yellow'

Still learning examples. Not the rule.

Let's Solve Three. But Private Tests Fail

👤 Hard-Coder Hari ✓ Public 3/3 ✗ Private 0/3 final submission
solution.py
def shuffle_sentence(sentence: str, order: tuple) -> str:
    if order == (0, 2, 1):
        return 'apple orange banana'
    elif order == (2, 1, 0):
        return 'mouse dog cat'
    elif order == (1, 0, 2):
        return 'green red yellow'
  • Private group 1: different sentence, same order → wrong output
  • Private group 2: different sentence, same order → wrong output
  • Private group 3: different sentence, same order → wrong output

Three public greens, zero private passes. The function is convinced that all sentence shuffling is secretly about one fruit salad. The private tests use different words.

But The Universe Is Small Enough to List

👤 Hard-Coder Hari ✓ Public 3/3 ✓ Private 3/3 ~12 events
solution.py
def shuffle_sentence(sentence: str, order: tuple) -> str:
    s = sentence.split(" ")
    n0, n1, n2 = s[0], s[1], s[2]
    if order == (0, 1, 2):   return n0 + " " + n1 + " " + n2
    elif order == (0, 2, 1): return n0 + " " + n2 + " " + n1
    elif order == (1, 0, 2): return n1 + " " + n0 + " " + n2
    elif order == (1, 2, 0): return n1 + " " + n2 + " " + n0
    elif order == (2, 0, 1): return n2 + " " + n0 + " " + n1
    else:                    return n2 + " " + n1 + " " + n0

Six branches, one for each permutation of three words. It is correct — there are exactly six three-word orderings, and all six are here. Private: 100. Brute force wins a small, perfectly legal victory over abstraction. The student isn't wrong. They've just decided that abstraction is optional when the universe is small enough to enumerate. This is a strategy that won't survive scaling. But it worked here.
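
A sketch that checks the enumeration against the general rule. `shuffle_enumerated` restates the six-branch version above and `shuffle_general` is the split-index-join version; both names are made up for this comparison.

```python
from itertools import permutations


def shuffle_general(sentence: str, order: tuple) -> str:
    words = sentence.split(" ")
    return " ".join(words[i] for i in order)


def shuffle_enumerated(sentence: str, order: tuple) -> str:
    s = sentence.split(" ")
    n0, n1, n2 = s[0], s[1], s[2]
    if order == (0, 1, 2):   return n0 + " " + n1 + " " + n2
    elif order == (0, 2, 1): return n0 + " " + n2 + " " + n1
    elif order == (1, 0, 2): return n1 + " " + n0 + " " + n2
    elif order == (1, 2, 0): return n1 + " " + n2 + " " + n0
    elif order == (2, 0, 1): return n2 + " " + n0 + " " + n1
    else:                    return n2 + " " + n1 + " " + n0


# Every ordering of three indices: 3! = 6 of them.
all_orders = list(permutations(range(3)))
agree = all(
    shuffle_enumerated("cat dog mouse", order) == shuffle_general("cat dog mouse", order)
    for order in all_orders
)
```

The two agree on all six orderings. One difference remains: the final `else` also answers for any order that is not a permutation, such as (0, 0, 0), by returning the reversed sentence.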

Q4. Does It Have All Letters?

def is_pangram(text: str) -> bool:
    """Returns True if text contains all 26 letters of the alphabet.

    >>> is_pangram('the quick brown fox jumps over the lazy dog')
    True
    >>> is_pangram('this is not a pangram')
    False
    >>> is_pangram('abcdefghijklmnopqrstuvwxyz')
    True
    """
Two students, two shortcuts that almost work. One forgets a letter. One counts letters instead of checking them.

Solve It In One Expression

👤 Set-Logic Selvi ✓ Public 3/3 ✓ Private 3/3 ⏱ ~0 s 2 events
solution.py
def is_pangram(text: str) -> bool:
    alphabet = set("abcdefghijklmnopqrstuvwxyz")
    return alphabet.issubset(set(text.lower()))

Build the alphabet as a set. Check if it's a subset of the unique characters in the input. The invariant — every letter must appear — maps directly to set membership. The entire logic is one expression. This student saw the mathematical structure of the problem before reaching for a loop.

Can You Spot The First Bug?

👤 Spelling Spandana ⚠ Private 1/3 — 33 event 35
solution.py
def is_pangram(text: str) -> bool:
    letters = "absdefghijklmnopqrstuvwxyz"
    count = 0
    for char in text.lower():
        if char in letters:
            count += 1
    return count >= 26

Take a moment.

"absdefghijklmnopqrstuvwxyz" — 'c' is missing, replaced by a second 's'. The function whose entire job is to verify all 26 letters starts by not including one. Under cognitive load, even the checking mechanism needs to be checked.

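One way to check the checker, as a sketch: compare the hand-typed alphabet against `string.ascii_lowercase` from the standard library. The variable names `missing` and `duplicated` are illustrative.

```python
import string

letters = "absdefghijklmnopqrstuvwxyz"  # the student's literal, copied as-is

# Letters the literal should contain but does not.
missing = set(string.ascii_lowercase) - set(letters)

# Letters that appear more than once in the literal.
duplicated = {ch for ch in letters if letters.count(ch) > 1}
```

The literal is still 26 characters long, which is part of why the slip is hard to see by eye.
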
If It's Longer Than 26 Letters...

👤 Counter Gounder ✓ Public 3/3 ⚠ Private 1/3 — 33 event 61
solution.py
def is_pangram(text: str) -> bool:
    alphabets = string.ascii_lowercase
    text1 = text.lower().replace(' ', '')
    count = 0
    for i in range(len(text1)):
        if text1[i] in alphabets:
            count += 1
    return count >= 26
  • is_pangram("the quick brown fox jumps over the lazy dog") → True
  • is_pangram("this is not a pangram") → False
  • is_pangram("abcdefghijklmnopqrstuvwxyz") → True
  • is_pangram("aaaaaaaaaaaaaaaaaaaaaaaaaaaa") → expected False, got True

Count how many letters appear in the string. If 26 or more, return True. The public tests pass because "the quick brown fox..." has well over 26 letters. The private tests do not.
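
A sketch of the two quantities side by side: how many letters a text has versus how many distinct letters it has. The helper names are hypothetical.

```python
import string


def count_letters(text: str) -> int:
    # Total number of alphabetic characters, repeats included.
    return sum(ch in string.ascii_lowercase for ch in text.lower())


def distinct_letters(text: str) -> int:
    # Number of different letters that appear at least once.
    return len(set(text.lower()) & set(string.ascii_lowercase))


fox = "the quick brown fox jumps over the lazy dog"
repeated = "a" * 28
```

For the fox sentence both numbers clear 26, so the shortcut looks right. For a long run of one letter the count clears 26 while the distinct count stays at 1.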

Oh, Wait, I Meant Unique Letters.

👤 Counter Gounder ✓ Public 3/3 ✓ Private 3/3 event 78
solution.py
def is_pangram(text: str) -> bool:
    alphabets = string.ascii_lowercase
    text1 = text.lower().replace(' ', '')
    uniq = set()
    for i in range(len(text1)):
        if text1[i] in alphabets:
            uniq.add(text1[i])
    return len(uniq) >= 26

Switch from counting total letters to counting unique letters. Private: 100. The false summit is the moment count >= 26 passed three public tests. The actual solution came later, when the question changed from "does my code pass?" to "what family of inputs would break it?"

She Had It Right. Then Abandoned It.

Timeline figure labels: right structure, one bug · abandoned fix · count · 4/6 right · 100 edits · 6/6 right
👤 Rewind Rita

At event 1, Rita had the right approach: iterate through characters, check membership, return False on failure. One bug — return True was inside the loop. She didn't fix the bug. She replaced the approach entirely with count >= 26, which passed all public tests and nothing else.

It took 103 events and 1 hour, 51 minutes to find her way back to essentially the same structure — minus the indentation error.

The right solution was there at the start. Without a save point, she couldn't go back to it.

Right Approach. One Bug...

👤 Rewind Rita ⏱ 0:00 · Event 1 ✗ Public 2/3
solution.py
def is_pangram(text: str) -> bool:
    alphabets=string.ascii_lowercase
    text1=text.lower()
    for i in text1:
        if i not in alphabets:
            return False
        return True
  • is_pangram("the quick brown fox jumps over the lazy dog") → True
  • is_pangram("this is not a pangram") → expected False, got True
  • is_pangram("abcdefghijklmnopqrstuvwxyz") → True

The right structure is already here: iterate through characters, check membership, return False on failure. The visible bug: return True is inside the for loop, so the function returns after looking at the first character. Fix the indentation, loop over the alphabet instead of the text, and this is a correct solution. She didn't fix it. She replaced the whole approach.

... Means We're Just Counting Letters.

👤 Rewind Rita ⏱ 1:35:09 · Event 61 ✓ Public 3/3 ✗ Private 1/3
solution.py
def is_pangram(text: str) -> bool:
    alphabets=string.ascii_lowercase
    text1=text.lower().replace(' ','')

    '''for i in range(len(text1)):
        if text[i] not in alphabets:
            return False

        return True'''

    count=0
    for i in range(len(text1)):
        if text1[i] in alphabets:
            count+=1

    if count>=26:
        return True
    return False

The original loop code is in triple quotes — buried alive. The new approach counts every letter. If count ≥ 26, it's a pangram. The problem: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" also has 26+ letters. The hidden tests know this. The public tests don't care.

Cleaner Code. Same Wrong Answer. Still 33.

👤 Rewind Rita ⏱ 1:40:07 · Event 81 ✓ Public 3/3 ✗ Private 1/3
solution.py
def is_pangram(text: str) -> bool:
    alphabets=string.ascii_lowercase
    text1=text.lower().replace(' ','')

    count=0
    for i in range(len(text1)):
        if text1[i] in alphabets:
            count+=1

    if count>=26:
        return True
    return False

She removed the triple-quoted graveyard. The file is clean. The logic is unchanged. The count heuristic is still wrong. Tidiness doesn't move the grader.

103 Events. Set-Based. Private: 100.

👤 Rewind Rita ⏱ 1:51:05 · Event 103 ✓ Private 3/3
solution.py
def is_pangram(text: str) -> bool:
    alphabets=string.ascii_lowercase
    text1=text.lower().replace(' ','')

    '''count=0
    for i in range(len(text1)):
        if text1[i] in alphabets:
            count+=1
    if count>=26:
        return True
    return False'''

    uniq=set()
    for i in range(len(text1)):
        if text1[i] in alphabets:
            uniq.add(text1[i])

    if len(uniq)>=26:
        return True
    return False

The count code goes into triple quotes. The set-based approach — collect unique letters, check if there are 26 — is live. This is structurally what she had at event 1, fixed. It took 103 events and 1:51 to get here.

Then, you couldn't spot the pattern

Photo of the scene in Taare Zameen Par where Ram Shankar Nikumbh, the teacher, tells Ishaan's parents they couldn't spot the pattern in his errors.

Four Mental Models

The Memorizer
if order == (0,2,1): return 'apple orange banana'
Solves the visible world, not the rule. Completely rational given limited information.
The Guesser
return count >= 26
Finds a correlated shortcut instead of the invariant. Good instinct, insufficient precision.
The Misreader
tuple(a, b) · (a, b) · startswith('Hello')
Computes the right thing, returns the wrong structure or misses one condition.
The Regressor
107 events, ends worse than started
Overwrites working partial solutions while chasing the hidden test. No save points.

Spot the pattern from the kinds of mistakes.

How Students Attempt Questions

27,577 exam sessions · 4 navigation patterns · one very large performance gap

Scan First. Or Score Half as Much.

Linears · 38%
Q5 → Q7 → Q9 → Q10 → Q13 ↩ Q12
Scan the whole paper before committing to any question. Mean score: 18.2%. Above 50%: 19.7%.
Cyclers · 31%
Q14 → Q17 → Q15 → Q14 → Q17 → Q15
Keep cycling back before finishing anything. Mean score: 12.6%. Above 50%: 11.3%.
Jumpers · 14%
Q10 → Q5 → Q6 → Q9 → Q10 → Q7
Skip to questions you know. Ignore the rest. Mean score: 9.6%. Above 50%: 6.8%.
Togglers · 17%
Q5 → Q9 → Q12 → Q9 → Q12 → Q9
Bounce between two or three questions until time runs out. Mean score: 6.5%. Above 50%: 2.1%.

Kruskal-Wallis H=1570, p < 10⁻³⁰⁰. This is not noise.

Linears: Survey First, Solve Later.

🥇 Best performers · 38.4% of students Mean score 18.2% · 19.7% above 50%
Timeline chart: 0 to 120 minutes on the time axis; questions Q5, Q7, Q9, Q10, Q12, Q13; legend: pass, fail.

100% first-sweep coverage — they touched every question before going back to any of them. Revisit rate: 22.5%. The strategy is: survey the paper, find what you can solve, solve it, then come back for the hard ones. This is also what the test-prep industry has been telling you for decades. Most students didn't do it.
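
A sketch of how the two metrics could be computed from a visit sequence. The definitions here are assumptions, since the talk does not spell them out: first-sweep coverage is taken as the share of the paper seen before the first return to any question, and revisit rate as the share of visits that land on an already-seen question. The paper contents passed in are also assumed.

```python
def first_sweep_coverage(visits: list[str], paper: set[str]) -> float:
    """Share of the paper seen before the first return to any question."""
    seen: set[str] = set()
    for question in visits:
        if question in seen:
            break
        seen.add(question)
    return len(seen & paper) / len(paper)


def revisit_rate(visits: list[str]) -> float:
    """Share of visits that land on a question already seen."""
    seen: set[str] = set()
    revisits = 0
    for question in visits:
        if question in seen:
            revisits += 1
        seen.add(question)
    return revisits / len(visits) if visits else 0.0


# Sequences from the navigation-pattern slide; the paper sets are assumed.
linear = ["Q5", "Q7", "Q9", "Q10", "Q13", "Q12"]
toggler = ["Q5", "Q9", "Q12", "Q9", "Q12", "Q9"]
linear_paper = {"Q5", "Q7", "Q9", "Q10", "Q12", "Q13"}
toggler_paper = {"Q5", "Q9", "Q10", "Q12"}
```

Under these definitions the Linear sequence covers its whole paper with no revisits, while the Toggler sequence stalls at three of four questions and spends half its visits going back.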

Cyclers Revisit Before Finishing.

🥈 Second · 30.8% of students Mean score 12.6% · 11.3% above 50%
Timeline chart: 0 to 120 minutes on the time axis; questions Q14, Q15, Q16, Q17, Q18, Q20; legend: pass, fail, partial.

First-sweep coverage: 56.9% — they start revisiting before they've seen the whole paper. Revisit rate: 50%. The problem isn't effort. They're spending time re-reading problems they haven't solved yet instead of seeing whether other questions are easier. It's studying during the exam, which is a fine idea, except the exam is also happening.

Jumpers Skip Questions With Points.

🥉 Third · 14.1% of students Mean score 9.6% · 6.8% above 50%
Timeline chart: 0 to 120 minutes on the time axis; questions Q5, Q6, Q7, Q9, Q10; legend: pass, fail.

Paper coverage: 71%. They started on the hardest question (Q10), couldn't solve it, jumped to easy wins (Q5, Q6, Q9), then kept returning to Q10. The approach of "find easy points first" is sound. The problem is that Q8 went unseen — and was probably solvable.

Togglers Get Stuck On A Few Bad Questions.

⚠ Lowest · 16.8% of students Mean score 6.5% · Only 2.1% above 50%
Timeline chart: 0 to 120 minutes on the time axis; questions Q5, Q9, Q10, Q12; legend: pass, fail.

Paper coverage: 50.5%. This student solved Q5 in 90 seconds, then spent the rest of the exam on Q9 and Q12. They couldn't solve either. They also couldn't see that other questions existed until it was too late. 42.8% of their moves were local toggles. The exam was two hours long. They used all of it on two wrong answers.

Linears Are Far More Likely to Pass.

Bar chart, % of students scoring above 50%: Linears 19.7% · Cyclers 11.3% · Jumpers 6.8% · Togglers 2.1%

Cliff's delta 0.32 vs Togglers. p < 10⁻²⁹⁶. Caveat: correlation, not causation; stronger students probably also navigate better. Both matter.
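
For readers who haven't met Cliff's delta: it compares every score in one group with every score in the other and reports how often the first group wins minus how often it loses. A minimal pure-Python sketch follows; the two score lists are invented toy numbers, not exam data.

```python
def cliffs_delta(xs: list[float], ys: list[float]) -> float:
    """(pairs where x > y, minus pairs where x < y) / (all pairs)."""
    greater = sum(x > y for x in xs for y in ys)
    less = sum(x < y for x in xs for y in ys)
    return (greater - less) / (len(xs) * len(ys))


# Invented toy scores, for illustration only.
group_a = [10, 20, 30]
group_b = [5, 15, 25]
delta = cliffs_delta(group_a, group_b)
```

A delta of 0 means neither group tends to outscore the other; +1 means every score in the first group beats every score in the second.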

The Winning Strategy Is One Sentence.

Skim every question before writing any code.
For teaching
Teach exam survey as an explicit skill. Most students who don't do it, don't know they should. Name it. Practice it. "Read every problem statement before writing line one" is a teachable behavior, not an innate trait.
For exam design
Order questions roughly by difficulty. An easy opener gives students a foothold and reveals the paper's shape. A hard first question punishes sweepers and rewards students who happen to know that one thing.
For early intervention
The Q9→Q12→Q9 toggle is detectable in real time. A system that spots A→B→A oscillation and says "you've been here three times — try a different question" could redirect students before the clock runs out.
The honest caveat
This is correlational. Students who sweep broadly may do so because they can — strategy is downstream of skill. Teaching the strategy is still worth doing. Either direction of causality supports it.
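
A sketch of the A→B→A detector mentioned under early intervention. The function names and the nudge threshold of 3 are assumptions; a real system would need to tune the threshold and decide what counts as a visit.

```python
def count_toggles(visits: list[str]) -> int:
    """Count A→B→A moves: a visit that returns to the question seen two steps earlier."""
    return sum(
        1
        for i in range(2, len(visits))
        if visits[i] == visits[i - 2] and visits[i] != visits[i - 1]
    )


def should_nudge(visits: list[str], threshold: int = 3) -> bool:
    # threshold is a hypothetical setting, not a value from the study.
    return count_toggles(visits) >= threshold


toggler = ["Q5", "Q9", "Q12", "Q9", "Q12", "Q9"]
linear = ["Q5", "Q7", "Q9", "Q10", "Q13", "Q12"]
```

On the Toggler sequence from the navigation slide this counts three oscillations and would fire the nudge; on the Linear sequence it counts none.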

If You Taught only Three Things

The visible tests are not the whole task.
Students who wrote count >= 26 passed all public tests. Students who hardcoded 'apple orange banana' passed all the examples. That gap is where most of the failing happens.
Save when it works.
Rita had the right structure at event 1. 103 events later, she arrived back at the same approach — minus one indentation bug. Teach how to save when undo is missing.
Scan the whole paper first.
Students who scanned all questions first were about 9× more likely to score above 50% than Togglers (19.7% vs 2.1%). Most students don't do this. It's easy to teach.

Anand S · PyConf Hyderabad · 14 March 2026 · sanand0.github.io/talks/2026-03-14-how-students-learn-python/
