How Students Learn Python

Anand S — LLM Psychologist @ Straive · PyConf Hyderabad · 14 March 2026
sanand0.github.io/talks/

Inside the Exam

What we measured, and how

A · Inside the Exam

The Exam Records Everything

The Python OPPE is a 90-minute online programming exam taken by students in IIT Madras's BS Data Science programme. Every save is timestamped. Every test run is logged. Every submission has a score.

Students solve problems in a browser-based IDE. They can run against public test cases whenever they want. The private tests only run on final submission. They don't know what the private tests contain — they only see pass/fail counts.

Public Tests

Run anytime. Full input/output visible.
The only feedback loop during the exam.
Typically 2–4 test groups per question.

Private Tests

Run only on submission. Results visible after.
The actual grade. Unseen inputs, edge cases.
This is where the mental models get tested.

A · Inside the Exam

This is Very Rich Data

13,623 · Students

251 · Questions across 35 exam slots

2.06M · Test runs (saves + submissions)

55,142 · Final submissions

45.5% of submitters didn't get full marks. Those submissions have stories. Today we're going to read four of them.

A · Inside the Exam

Replays Show Us Students' Thoughts

For a subset of students, we recorded full session replays: every keystroke, every save, every test run, with timestamps. This gives us something unusual — not just what students wrote, but how they got there.

Each event is a snapshot: timestamp, event type (save / run public / run private / submit), the full source code at that moment, and the test results.

We can watch a student's mental model evolve in real time. Or fail to evolve.

The unit of analysis isn't the final submission. It's the path.
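A minimal sketch of how one replay event and a student's path could be represented. The class and field names (`ReplayEvent`, `kind`, `passed`, `total`) are illustrative assumptions, not the exam platform's actual schema, and the four events are made up.

```python
from dataclasses import dataclass

# Hypothetical record for one replay event; field names are assumptions.
@dataclass
class ReplayEvent:
    seconds: float   # time since the session started
    kind: str        # "save", "run_public", "run_private" or "submit"
    source: str      # full source code at that moment
    passed: int      # test groups passed (0 for a plain save)
    total: int       # test groups run (0 for a plain save)

def pass_rate_path(events):
    """Return (seconds, pass rate) for every event that ran tests, in time order."""
    ran = [e for e in sorted(events, key=lambda e: e.seconds) if e.total > 0]
    return [(e.seconds, e.passed / e.total) for e in ran]

# Made-up session: two failed runs, a save, then a passing submission.
events = [
    ReplayEvent(0.0, "run_public", "return tuple(a, b)", 0, 4),
    ReplayEvent(15.0, "run_public", "return (a, b)", 0, 4),
    ReplayEvent(30.0, "save", "return (b, a)", 0, 0),
    ReplayEvent(40.0, "submit", "return (b, a)", 4, 4),
]
print(pass_rate_path(events))
```

Reading the list in order gives the path rather than just its last point, which is the shift in unit of analysis described above.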

A · Inside the Exam

Four Questions, Four Mental Models

Q1 · Card to Value Tuple

Parsing by shape, not contract. What happens when the only examples are one-character-rank cards and the private tests send 10D.

Q2 · Check for Greeting Prefix

Negotiating with the tests. Three different failure modes on a function that's essentially one line of Python.

Q3 · Shuffle a Three-Word Sentence

Solving the examples, not the problem. The purest overfitting exhibit in the dataset.

Q4 · Pangram Check

Heuristics vs invariants. The function that forgets a letter, and the function that counts letters instead of checking them.

For each question: one student who got it immediately, one who struggled productively, and one who got stuck in an interesting way.

B · Q1 · Card to Value Tuple

Q1. Convert Cards like 2H, KC, ... to Numbers

def card_to_value_tuple(card: str) -> tuple:
    """Converts a card "{rank}{suit}" to a value tuple (suit_value, rank_value).

    Suits: S=1, H=2, D=3, C=4
    Ranks: A=1, 2-10=face value, J=11, Q=12, K=13

    >>> card_to_value_tuple('AH')
    (2, 1)
    >>> card_to_value_tuple('7D')
    (3, 7)
    >>> card_to_value_tuple('QS')
    (1, 12)
    >>> card_to_value_tuple('9C')
    (4, 9)
    """

Every public test case has a one-character rank. 'A', '7', 'Q', '9'. The function signature says nothing about this. The docstring doesn't either. It's just how the examples happened to be.

B · Q1 · Baseline

Ten Seconds. Full Marks.

👤 Quick-Gun Qadir ✓ Public 4/4 ✓ Private 4/4 ⏱ ~10 s 3 events

solution.py

def card_to_value_tuple(card: str) -> tuple:
    suit_map = {'S': 1, 'H': 2, 'D': 3, 'C': 4}
    rank_map = {'A': 1, 'J': 11, 'Q': 12, 'K': 13,
                '2': 2, '3': 3, '4': 4, '5': 5,
                '6': 6, '7': 7, '8': 8, '9': 9, '10': 10}
    rank = card[:-1]
    suit = card[-1]
    return (suit_map[suit], rank_map[rank])

card[:-1] for rank, card[-1] for suit. The student looked at the function signature, understood that suits are always one character, and wrote the general solution immediately. First run passes everything. This is what reading the spec before reading the examples looks like in practice.

B · Q1 · Productive Struggle

Everything Is Right, But Not Quite

👤 Tuple Thammudu ✗ TypeError ⏱ 0 s event 1

solution.py

def card_to_value_tuple(card: str) -> tuple:
    color = card[-1]
    if color == 'S':   b = 1
    elif color == 'H': b = 2
    elif color == 'D': b = 3
    elif color == 'C': b = 4

    a = rank_map[card[:-1]]

    return tuple(a, b)

This raises TypeError: tuple expected at most 1 argument, got 2.
Did you know this? I didn't!
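A short sketch of why the call fails: `tuple()` accepts at most one argument, an iterable, whereas a tuple literal is built by the commas. The values of `a` and `b` are placeholders.

```python
a, b = 1, 2  # placeholder rank and suit values

try:
    tuple(a, b)                # two positional arguments: rejected
except TypeError as err:
    message = str(err)         # wording varies slightly between Python versions

literal = (a, b)               # tuple literal: the commas build the tuple
from_iterable = tuple([a, b])  # tuple() called with ONE iterable argument

print(message)
print(literal, from_iterable)
```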

B · Q1 · Productive Struggle

Everything Is Right Except Left and Right

👤 Tuple Thammudu ✗ Public 0/4 ⏱ ~15 s event 3

solution.py

def card_to_value_tuple(card: str) -> tuple:
    color = card[-1]
    if color == 'S':   b = 1
    elif color == 'H': b = 2
    elif color == 'D': b = 3
    elif color == 'C': b = 4

    a = rank_map[card[:-1]]

    return (a, b)

✗card_to_value_tuple('AH') → expected (2, 1), got (1, 2)
✗card_to_value_tuple('7D') → expected (3, 7), got (7, 3)
✗card_to_value_tuple('QS') → expected (1, 12), got (12, 1)
✗card_to_value_tuple('9C') → expected (4, 9), got (9, 4)

Now runnable. Still wrong. (a, b) returns (rank, suit). The spec says (suit, rank). The parentheses are fixed. The order is backwards.

B · Q1 · Productive Struggle

Everything Is Right After A Fight

👤 Tuple Thammudu ✓ Public 4/4 ✓ Private 4/4 ⏱ ~40 s total event 5

solution.py

def card_to_value_tuple(card: str) -> tuple:
    color = card[-1]
    if color == 'S':   b = 1
    elif color == 'H': b = 2
    elif color == 'D': b = 3
    elif color == 'C': b = 4

    a = rank_map[card[:-1]]

    return (b, a)

Swap (a, b) to (b, a). Public: 4/4. Private: 4/4. The gap between first attempt and full marks is forty seconds — mostly spent fighting the tuple constructor and then left-right order. Two mistakes, both recoverable, both obvious in retrospect.

B · Q1 · The Hidden Boss

Public ✅ — Private: ❌

👤 Index Ishwar ✓ Public 4/4 ⚠ Private 2/4 — 50 ~8 events

solution.py

def card_to_value_tuple(card: str) -> tuple:
    value_1 = card[1]   # suit?
    value_2 = card[0]   # rank?
    try:
        value_1 = suit_values[value_1]
    except:
        value_1 = int(value_1)
    try:
        value_2 = rank_values[value_2]
    except:
        value_2 = int(value_2)
    return (value_1, value_2)

✓card_to_value_tuple('9S') → (1, 9)
✓card_to_value_tuple('KD') → (3, 13)
✗card_to_value_tuple('10H') → expected (2, 10), got wrong value
✗card_to_value_tuple('AC') → expected (4, 1), got wrong value

card[1] for the suit, card[0] for the rank. Works for 'AH', '7D', 'QS', '9C'. But '10H' destroys the parser.
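A small sketch of what fixed positions do to a two-character rank, compared with slicing from the end. Only the card strings come from the slides; the variable names are illustrative.

```python
card = '10H'

# Fixed positions assume the rank is exactly one character.
fixed_rank, fixed_suit = card[0], card[1]

# Slicing from the end assumes only that the suit is one character.
end_rank, end_suit = card[:-1], card[-1]

print(fixed_rank, fixed_suit)   # the rank loses its second digit; the "suit" is a digit
print(end_rank, end_suit)

# For a one-character rank, both readings agree, which is why the public tests pass.
short = 'AH'
same_for_short = (short[0], short[1]) == (short[:-1], short[-1])
```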

B · Q2 · Check for Greeting Prefix

Q2. Does It Start With A Greeting?

def starts_with_greeting(s: str) -> bool:
    """Returns True if s starts with 'Hello ' or 'Hi ' (with trailing space).

    >>> starts_with_greeting('Hello there')
    True
    >>> starts_with_greeting('Hi friend')
    True
    >>> starts_with_greeting('Hithere')
    False
    >>> starts_with_greeting('Welcome')
    False
    """

The rule is in the docstring in plain English: starts with 'Hello ' or 'Hi ' (with trailing space). Three different students find three different ways not to implement exactly this.

B · Q2 · Baseline

One Second. Full Marks.

👤 One-Line Olu ✓ Public 4/4 ✓ Private 3/3 ⏱ ~1 s 2 events

solution.py

def starts_with_greeting(s: str) -> bool:
    return s.startswith('Hello ') or s.startswith('Hi ')

The docstring says "starts with 'Hello ' or 'Hi '." The student wrote exactly that sentence in Python. Nothing more. The trailing space is there because the docstring says it's there. One second, because this is exactly what reading the spec looks like.

B · Q2 · The Trailing Space

One Error Fails The Test.

👤 Spacebar Sania ✗ Public 3/4 ⚠ Private 2/3 — 67 event 4

solution.py

def starts_with_greeting(s: str) -> bool:
    return s.startswith('Hello') or s.startswith('Hi')

✓starts_with_greeting('Hello there') → True
✓starts_with_greeting('Hi friend') → True
✗starts_with_greeting('Hithere') → expected False, got True
✓starts_with_greeting('Welcome') → False

startswith('Hi') matches 'Hi friend' and also 'Hithere'. The trailing space disambiguates them. The fix is obvious.

B · Q2 · The Trailing Space

Two Spaces Solve The Problem.

👤 Spacebar Sania ✓ Public 4/4 ✓ Private 3/3 event 6

solution.py

def starts_with_greeting(s: str) -> bool:
    return s.startswith('Hello ') or s.startswith('Hi ')

Two spaces added, one to each argument. Public: 4/4. Private: 100. The entire gap between 67 and 100 is two characters. The failing 'Hithere' case pointed straight at the part of the spec the code had skipped.

B · Q2 · JavaScript Accent

Python Or JavaScript?

👤 JavaScript Jaya ✗ SyntaxError ⏱ 0 s event 1

solution.py

def starts_with_greeting(s: str) -> bool:
    if s.startswith('Hello' || 'Hi'):
        return True
    return False

|| is logical OR in JavaScript. In Python it's a syntax error. The intent here is completely correct — the student wants OR between two strings — but the operator is from a different language. Surface intent: perfect. Language fluency: still catching up.

B · Q2 · JavaScript Accent

The Operator Is Inside The Machine.

👤 JavaScript Jaya ✗ Public 3/4 ⏱ ~30 s event 3

solution.py

def starts_with_greeting(s: str) -> bool:
    if s.startswith('Hello' or 'Hi'):
        return True
    return False

✓starts_with_greeting('Hello there') → True
✗starts_with_greeting('Hi friend') → expected True, got False
✓starts_with_greeting('Hithere') → False
✓starts_with_greeting('Welcome') → False

'Hello' or 'Hi' evaluates to 'Hello', because 'Hello' is truthy. So startswith('Hello' or 'Hi') is just startswith('Hello'). 'Hi friend' now returns False. The fix — two separate startswith calls — comes a few events later.
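A brief sketch of the evaluation order, plus one alternative that keeps a single call. `str.startswith` accepts a tuple of prefixes. The tuple form is not what the student wrote; it is shown only as another way to express "either prefix".

```python
# `or` returns its first truthy operand, so the second string is never considered.
prefix = 'Hello' or 'Hi'
print(prefix)

def starts_with_greeting(s: str) -> bool:
    # A tuple argument makes startswith check each prefix in turn.
    return s.startswith(('Hello ', 'Hi '))

for text in ['Hello there', 'Hi friend', 'Hithere', 'Welcome']:
    print(text, starts_with_greeting(text))
```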

B · Q2 · Diplomatic Patch

Bargaining Instead of Debugging.

👤 JavaScript Jaya ✓ Public 4/4 ✗ Private: fails event 81

solution.py

def starts_with_greeting(s: str) -> bool:
    if s == 'Hithere':
        return False
    if s.startswith('Hello' or 'Hi' or 'hello' or 'hi'):
        return True
    elif s.startswith('Hi'):
        return True
    return False

The student saw that 'Hithere' should return False and their code was failing on it. So they added an explicit exception for that exact string. The public test is satisfied. The rule — trailing space — remains unimplemented. This is not debugging. It's treaty negotiations with the grader. One visible case, one explicit patch. You have seen this in a pull request.

B · Q2 · Version Control by Comment

The Solution Is Hidden In Comments.

👤 Comment Khanna ✓ Public 4/4 ⚠ Private 2/3 — 67 final submission

solution.py

def starts_with_greeting(s: str) -> bool:
    if not isinstance(s, str):
        return False
    s = s.strip()
    return s.startswith('Hello') or s.startswith('Hi')

    '''cleaned = s.strip().lower()
    is_hello = cleaned.startswith("hello ") and (
        len(cleaned) == 6 or not cleaned[6].isalpha())
    is_hi = cleaned.startswith("hi ") and (
        len(cleaned) == 3 or not cleaned[3].isalpha())
    return is_hello or is_hi'''

The student began a solution that checks for the trailing space, moved it into a triple-quoted string after the return statement, and never came back to it. Commenting code out may be OK. Ignoring it may not.

B · Q3 · Shuffle a Three-Word Sentence

Q3. Shuffle A Three-Word Sentence

def shuffle_sentence(sentence: str, order: tuple) -> str:
    """Rearranges the words of a 3-word sentence by the given index order.

    >>> shuffle_sentence('apple banana orange', (0, 2, 1))
    'apple orange banana'
    >>> shuffle_sentence('cat dog mouse', (2, 1, 0))
    'mouse dog cat'
    >>> shuffle_sentence('red green yellow', (1, 0, 2))
    'green red yellow'
    """

The order parameter is a tuple of indices. The question is whether the student treats it as data (general) or treats it as a pattern to match (specific).

B · Q3 · Baseline

Treats `order` as Data. One Line.

👤 Data-Driven Durga ✓ Public 3/3 ✓ Private 3/3 ⏱ ~1 s 3 events

solution.py

def shuffle_sentence(sentence: str, order: tuple) -> str:
    s = sentence.split(" ")
    return " ".join([s[i] for i in order])

Split, index, join. The student saw order as a general index plan and used it directly. The abstraction is visible in three words: s[i] for i in order. This works for any sentence, any order, any length — because the student solved the rule, not the examples.

B · Q3 · Overfitting

Let's Solve One Sentence.

👤 Hard-Coder Hari ⚠ Public 1/3 event 3

solution.py

def shuffle_sentence(sentence: str, order: tuple) -> str:
    if order == (0, 2, 1):
        return 'apple orange banana'

✓shuffle_sentence('apple banana orange', (0,2,1)) → 'apple orange banana'
✗shuffle_sentence('cat dog mouse', (2,1,0)) → expected 'mouse dog cat'
✗shuffle_sentence('red green yellow', (1,0,2)) → expected 'green red yellow'

The public tests show specific inputs with specific outputs. This student learned the first one.

B · Q3 · Overfitting

Let's Solve Two Sentences.

👤 Hard-Coder Hari ⚠ Public 2/3 event 7

solution.py

def shuffle_sentence(sentence: str, order: tuple) -> str:
    if order == (0, 2, 1):
        return 'apple orange banana'
    elif order == (2, 1, 0):
        return 'mouse dog cat'

✓shuffle_sentence('apple banana orange', (0,2,1)) → 'apple orange banana'
✓shuffle_sentence('cat dog mouse', (2,1,0)) → 'mouse dog cat'
✗shuffle_sentence('red green yellow', (1,0,2)) → expected 'green red yellow'

Still learning examples. Not the rule.

B · Q3 · Overfitting

Let's Solve Three. But Private Tests Fail

👤 Hard-Coder Hari ✓ Public 3/3 ✗ Private 0/3 final submission

solution.py

def shuffle_sentence(sentence: str, order: tuple) -> str:
    if order == (0, 2, 1):
        return 'apple orange banana'
    elif order == (2, 1, 0):
        return 'mouse dog cat'
    elif order == (1, 0, 2):
        return 'green red yellow'

✗Private group 1: different sentence, same order → wrong output
✗Private group 2: different sentence, same order → wrong output
✗Private group 3: different sentence, same order → wrong output

Three public greens, zero private passes. The function is convinced that all sentence shuffling is secretly about one fruit salad. The private tests use different words.

B · Q3 · Brute Force

But The Universe Is Small Enough to List

👤 Hard-Coder Hari ✓ Public 3/3 ✓ Private 3/3 ~12 events

solution.py

def shuffle_sentence(sentence: str, order: tuple) -> str:
    s = sentence.split(" ")
    n0, n1, n2 = s[0], s[1], s[2]
    if order == (0, 1, 2):   return n0 + " " + n1 + " " + n2
    elif order == (0, 2, 1): return n0 + " " + n2 + " " + n1
    elif order == (1, 0, 2): return n1 + " " + n0 + " " + n2
    elif order == (1, 2, 0): return n1 + " " + n2 + " " + n0
    elif order == (2, 0, 1): return n2 + " " + n0 + " " + n1
    else:                    return n2 + " " + n1 + " " + n0

Six branches, one for each permutation of three words. It is correct — there are exactly six three-word orderings, and all six are here. Private: 100. Brute force wins a small, perfectly legal victory over abstraction. The student isn't wrong. They've just decided that abstraction is optional when the universe is small enough to enumerate. This is a strategy that won't survive scaling. But it worked here.
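A sketch that checks the enumerated version against the index-based one over every ordering, and then counts how many branches the same strategy would need for longer sentences. The function names `shuffle_general` and `shuffle_enumerated` are labels chosen here, not names from the submissions.

```python
from itertools import permutations
from math import factorial

def shuffle_general(sentence: str, order: tuple) -> str:
    # Treat `order` as data: index the words with it.
    words = sentence.split(" ")
    return " ".join(words[i] for i in order)

def shuffle_enumerated(sentence: str, order: tuple) -> str:
    # The six explicit branches from the slide above.
    s = sentence.split(" ")
    n0, n1, n2 = s[0], s[1], s[2]
    if order == (0, 1, 2):   return n0 + " " + n1 + " " + n2
    elif order == (0, 2, 1): return n0 + " " + n2 + " " + n1
    elif order == (1, 0, 2): return n1 + " " + n0 + " " + n2
    elif order == (1, 2, 0): return n1 + " " + n2 + " " + n0
    elif order == (2, 0, 1): return n2 + " " + n0 + " " + n1
    else:                    return n2 + " " + n1 + " " + n0

orders = list(permutations(range(3)))   # every ordering of three indices
agree = all(
    shuffle_general("red green yellow", o) == shuffle_enumerated("red green yellow", o)
    for o in orders
)

# Branches needed if the same strategy were applied to n-word sentences: n!
branches_needed = {n: factorial(n) for n in (3, 4, 5, 10)}
print(len(orders), agree, branches_needed)
```

The two versions agree on all six orderings; the factorial table is the "won't survive scaling" part.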

B · Q4 · Pangram Check

Q4. Does It Have All Letters?

def is_pangram(text: str) -> bool:
    """Returns True if text contains all 26 letters of the alphabet.

    >>> is_pangram('the quick brown fox jumps over the lazy dog')
    True
    >>> is_pangram('this is not a pangram')
    False
    >>> is_pangram('abcdefghijklmnopqrstuvwxyz')
    True
    """

Two students, two shortcuts that almost work. One forgets a letter. One counts letters instead of checking them.

B · Q4 · Baseline

Solve It In One Expression

👤 Set-Logic Selvi ✓ Public 3/3 ✓ Private 3/3 ⏱ ~0 s 2 events

solution.py

def is_pangram(text: str) -> bool:
    alphabet = set("abcdefghijklmnopqrstuvwxyz")
    return alphabet.issubset(set(text.lower()))

Build the alphabet as a set. Check if it's a subset of the unique characters in the input. The invariant — every letter must appear — maps directly to set membership. The entire logic is one expression. This student saw the mathematical structure of the problem before reaching for a loop.

B · Q4 · Find the Bug

Can You Spot The First Bug?

👤 Spelling Spandana ⚠ Private 1/3 — 33 event 35

solution.py

def is_pangram(text: str) -> bool:
    letters = "absdefghijklmnopqrstuvwxyz"
    count = 0
    for char in text.lower():
        if char in letters:
            count += 1
    return count >= 26

Take a moment.

B · Q4 · The Checking Mechanism Needs Checking

Can You Spot The First Bug?

👤 Spelling Spandana ⚠ Private 1/3 — 33 event 35

solution.py

def is_pangram(text: str) -> bool:
    letters = "absdefghijklmnopqrstuvwxyz"
    count = 0
    for char in text.lower():
        if char in letters:
            count += 1
    return count >= 26

"absdefghijklmnopqrstuvwxyz" — 'c' is missing, replaced by a second 's'. The function whose entire job is to verify all 26 letters starts by not including one. Under cognitive load, even the checking mechanism needs to be checked.

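One way to check the checker, sketched with the standard library: compare the hand-typed constant against `string.ascii_lowercase`. The constant is copied from the submission; the variable names `missing` and `repeated` are illustrative.

```python
import string

letters = "absdefghijklmnopqrstuvwxyz"   # hand-typed constant from the submission

# Letters of the real alphabet that the constant lacks.
missing = sorted(set(string.ascii_lowercase) - set(letters))

# Letters the constant contains more than once.
repeated = sorted({ch for ch in letters if letters.count(ch) > 1})

print(len(letters), missing, repeated)
```

The length is still 26, so a length check alone would not have caught the typo.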
B · Q4 · The Heuristic

If It's Longer Than 26 Letters...

👤 Counter Gounder ✓ Public 3/3 ⚠ Private 1/3 — 33 event 61

solution.py

def is_pangram(text: str) -> bool:
    alphabets = string.ascii_lowercase
    text1 = text.lower().replace(' ', '')
    count = 0
    for i in range(len(text1)):
        if text1[i] in alphabets:
            count += 1
    return count >= 26

✓is_pangram("the quick brown fox jumps over the lazy dog") → True
✓is_pangram("this is not a pangram") → False
✓is_pangram("abcdefghijklmnopqrstuvwxyz") → True
✗is_pangram("aaaaaaaaaaaaaaaaaaaaaaaaaaaa") → expected False, got True

Count how many letters appear in the string. If 26 or more, return True. The public tests pass because both public pangrams have at least 26 letters and the public non-pangram has only 17. The private tests include a long string of one repeated letter, and that case fails.

B · Q4 · The Heuristic → The Fix

Oh, Wait, I Meant Unique Letters.

👤 Counter Gounder ✓ Public 3/3 ✓ Private 3/3 event 78

solution.py

def is_pangram(text: str) -> bool:
    alphabets = string.ascii_lowercase
    text1 = text.lower().replace(' ', '')
    uniq = set()
    for i in range(len(text1)):
        if text1[i] in alphabets:
            uniq.add(text1[i])
    return len(uniq) >= 26

Switch from counting total letters to counting unique letters. Private: 100. The false summit is the moment count >= 26 passed three public tests. The actual solution came later, when the question changed from "does my code pass?" to "what family of inputs would break it?"
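A sketch of the "what family of inputs would break it?" question turned into code: run the count heuristic and a set-based reference over a few probe inputs and list where they disagree. The probe strings and the names `count_heuristic`, `reference` and `probes` are assumptions made for illustration, not the exam's private tests.

```python
import string

def count_heuristic(text: str) -> bool:
    # The shortcut: total letters seen, not distinct letters.
    return sum(ch in string.ascii_lowercase for ch in text.lower()) >= 26

def reference(text: str) -> bool:
    # The invariant: every letter of the alphabet appears at least once.
    return set(string.ascii_lowercase) <= set(text.lower())

# Hypothetical probes, one per family of inputs.
probes = {
    "one letter repeated": "a" * 30,
    "long but missing a letter": "the quick brown fox jumps over the lay dog",
    "mixed-case pangram": "The Quick Brown Fox Jumps Over The Lazy Dog",
    "empty string": "",
}

disagreements = [
    name for name, text in probes.items()
    if count_heuristic(text) != reference(text)
]
print(disagreements)
```

The heuristic breaks on exactly the families where length and coverage come apart.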

B · Q4 · The Hard Case

She Had It Right. Then Abandoned It.

right structure, one bug → abandoned → fix count → 4/6 right → 100 edits → 6/6 right

👤 Rewind Rita

At event 1, Rita had the right approach: iterate through characters, check membership, return False on failure. One bug — return True was inside the loop. She didn't fix the bug. She replaced the approach entirely with count >= 26, which passed all public tests and nothing else.

It took 103 events and 1 hour, 51 minutes to find her way back to essentially the same structure — minus the indentation error.

The right solution was there at the start. Without a save point, she couldn't go back to it.

Q4 · Pangram Check

Right Approach. One Bug...

👤 Rewind Rita ⏱ 0:00 · Event 1 ✗ Public 2/3

solution.py

def is_pangram(text: str) -> bool:
    alphabets=string.ascii_lowercase
    text1=text.lower()
    for i in text1:
        if i not in alphabets:
            return False
        return True

✓is_pangram("the quick brown fox jumps over the lazy dog") → True
✗is_pangram("this is not a pangram") → expected False, got True
✓is_pangram("abcdefghijklmnopqrstuvwxyz") → True

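The loop-and-membership structure is already here: iterate through characters, check membership, return False on failure. The visible bug: return True is inside the for loop, so the function answers after looking at the first character only. Dedenting it is not quite the whole fix, because the loop tests each character of the text against the alphabet (a space would return False); the test has to run the other way, each letter of the alphabet against the text. Both are local repairs. She made neither. She replaced the whole approach.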

Q4 · Pangram Check

... Means We're Just Counting Letters.

👤 Rewind Rita ⏱ 1:35:09 · Event 61 ✓ Public 3/3 ✗ Private 1/3

solution.py

def is_pangram(text: str) -> bool:
    alphabets=string.ascii_lowercase
    text1=text.lower().replace(' ','')

    '''for i in range(len(text1)):
        if text[i] not in alphabets:
            return False

        return True'''

    count=0
    for i in range(len(text1)):
        if text1[i] in alphabets:
            count+=1

    if count>=26:
        return True
    return False

The original loop code is in triple quotes — buried alive. The new approach counts every letter. If count ≥ 26, it's a pangram. The problem: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" also has 26+ letters. The hidden tests know this. The public tests don't care.

Q4 · Pangram Check

Cleaner Code. Same Wrong Answer. Still 33.

👤 Rewind Rita ⏱ 1:40:07 · Event 81 ✓ Public 3/3 ✗ Private 1/3

solution.py

def is_pangram(text: str) -> bool:
    alphabets=string.ascii_lowercase
    text1=text.lower().replace(' ','')

    count=0
    for i in range(len(text1)):
        if text1[i] in alphabets:
            count+=1

    if count>=26:
        return True
    return False

She removed the triple-quoted graveyard. The file is clean. The logic is unchanged. The count heuristic is still wrong. Tidiness doesn't move the grader.

Q4 · Pangram Check

103 Events. Set-Based. Private: 100.

👤 Rewind Rita ⏱ 1:51:05 · Event 103 ✓ Private 3/3

solution.py

def is_pangram(text: str) -> bool:
    alphabets=string.ascii_lowercase
    text1=text.lower().replace(' ','')

    '''count=0
    for i in range(len(text1)):
        if text1[i] in alphabets:
            count+=1
    if count>=26:
        return True
    return False'''

    uniq=set()
    for i in range(len(text1)):
        if text1[i] in alphabets:
            uniq.add(text1[i])

    if len(uniq)>=26:
        return True
    return False

The count code goes into triple quotes. The set-based approach — collect unique letters, check if there are 26 — is live. This is structurally what she had at event 1, fixed. It took 103 events and 1:51 to get here.

B · The Pattern

Then, you couldn't spot the pattern

Photo of the scene in Taare Zameen Par where Ram Shankar Nikumbh, the teacher, tells Ishaan's parents they couldn't spot the pattern in his errors.

B · The Pattern

Four Mental Models

The Memorizer

if order == (0,2,1): return 'apple orange banana'

Solves the visible world, not the rule. Completely rational given limited information.

The Guesser

return count >= 26

Finds a correlated shortcut instead of the invariant. Good instinct, insufficient precision.

The Misreader

tuple(a, b) · (a, b) · startswith('Hello')

Computes the right thing, returns the wrong structure or misses one condition.

The Regressor

107 events, ends worse than started

Overwrites working partial solutions while chasing the hidden test. No save points.

Spot the pattern from the kinds of mistakes.

Section C

How Students Attempt Questions

27,577 exam sessions · 4 navigation patterns · one very large performance gap

C · How Students Navigate

Scan First. Or Score Half as Much.

Linears · 38%

Q5 → Q7 → Q9 → Q10 → Q13 ↩ Q12

Scan the whole paper before committing to any question. Mean score: 18.2%. Above 50%: 19.7%.

Cyclers · 31%

Q14 → Q17 → Q15 → Q14 → Q17 → Q15

Keep cycling back before finishing anything. Mean score: 12.6%. Above 50%: 11.3%.

Jumpers · 14%

Q10 → Q5 → Q6 → Q9 → Q10 → Q7

Skip to questions you know. Ignore the rest. Mean score: 9.6%. Above 50%: 6.8%.

Togglers · 17%

Q5 → Q9 → Q12 → Q9 → Q12 → Q9

Bounce between two or three questions until time runs out. Mean score: 6.5%. Above 50%: 2.1%.

Kruskal-Wallis H=1570, p < 10⁻³⁰⁰. This is not noise.

C · Navigation Patterns

Linears: Survey First, Solve Later.

🥇 Best performers · 38.4% of students Mean score 18.2% · 19.7% above 50%

100% first-sweep coverage — they touched every question before going back to any of them. Revisit rate: 22.5%. The strategy is: survey the paper, find what you can solve, solve it, then come back for the hard ones. This is also what the test-prep industry has been telling you for decades. Most students didn't do it.
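The talk does not spell out how first-sweep coverage and revisit rate were computed. The sketch below uses one plausible reading, stated as an assumption: coverage is the share of the paper seen before the first return to any question, and revisit rate is the share of moves that land on an already-visited question. The six-question `paper` list is hypothetical.

```python
def first_sweep_coverage(path, paper):
    """Share of the paper's questions seen before the first return to any question."""
    seen = set()
    for q in path:
        if q in seen:
            break          # first revisit ends the sweep
        seen.add(q)
    return len(seen) / len(paper)

def revisit_rate(path):
    """Share of moves that land on a question already visited."""
    seen, revisits = set(), 0
    for q in path:
        if q in seen:
            revisits += 1
        seen.add(q)
    return revisits / len(path)

paper = ["Q5", "Q7", "Q9", "Q10", "Q12", "Q13"]          # hypothetical paper
sweep = ["Q5", "Q7", "Q9", "Q10", "Q13", "Q12"]          # touches everything once
toggle = ["Q5", "Q9", "Q12", "Q9", "Q12", "Q9"]          # bounces between two

print(first_sweep_coverage(sweep, paper), revisit_rate(sweep))
print(first_sweep_coverage(toggle, paper), revisit_rate(toggle))
```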

C · Navigation Patterns

Cyclers Revisit Before Finishing.

🥈 Second · 30.8% of students Mean score 12.6% · 11.3% above 50%

First-sweep coverage: 56.9% — they start revisiting before they've seen the whole paper. Revisit rate: 50%. The problem isn't effort. They're spending time re-reading problems they haven't solved yet instead of seeing whether other questions are easier. It's studying during the exam, which is a fine idea, except the exam is also happening.

C · Navigation Patterns

Jumpers Skip Questions With Points.

🥉 Third · 14.1% of students Mean score 9.6% · 6.8% above 50%

Paper coverage: 71%. They started on the hardest question (Q10), couldn't solve it, jumped to easy wins (Q5, Q6, Q9), then kept returning to Q10. The approach of "find easy points first" is sound. The problem is that Q8 went unseen — and was probably solvable.

C · Navigation Patterns

Togglers Get Stuck On A Few Bad Questions.

⚠ Lowest · 16.8% of students Mean score 6.5% · Only 2.1% above 50%

Paper coverage: 50.5%. This student solved Q5 in 90 seconds, then spent the rest of the exam on Q9 and Q12. They couldn't solve either. They also couldn't see that other questions existed until it was too late. 42.8% of their moves were local toggles. The exam was two hours long. They used all of it on two wrong answers.

C · The Gap

Linears Are Far More Likely to Pass.

Cliff's delta 0.32 vs Togglers. p < 10⁻²⁹⁶. Caveat: correlation, not causation — stronger students probably also navigate better. Both matter.
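For readers unfamiliar with the effect size quoted above, a minimal sketch of Cliff's delta: over all cross-group pairs, the share where the first group scores higher minus the share where it scores lower. The score lists are made up for illustration and are not the exam data.

```python
def cliffs_delta(xs, ys):
    """(pairs with x > y minus pairs with x < y) / (len(xs) * len(ys)); range -1 to 1."""
    greater = sum(x > y for x in xs for y in ys)
    less = sum(x < y for x in xs for y in ys)
    return (greater - less) / (len(xs) * len(ys))

# Made-up scores for two small groups.
group_a = [40, 10, 0, 25]
group_b = [5, 0, 10, 0]

delta = cliffs_delta(group_a, group_b)
print(delta)
```

A value of 0 means neither group tends to outscore the other; 1 means every member of the first group outscores every member of the second.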

C · What This Means

The Winning Strategy Is One Sentence.

Skim every question before writing any code.

For teaching

Teach exam survey as an explicit skill. Most students who don't do it, don't know they should. Name it. Practice it. "Read every problem statement before writing line one" is a teachable behavior, not an innate trait.

For exam design

Order questions roughly by difficulty. An easy opener gives students a foothold and reveals the paper's shape. A hard first question punishes sweepers and rewards students who happen to know that one thing.

For early intervention

The Q9→Q12→Q9 toggle is detectable in real time. A system that spots A→B→A oscillation and says "you've been here three times — try a different question" could redirect students before the clock runs out.
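A sketch of the A→B→A check such a system might run on the navigation log. The function name `oscillation_alert` and the threshold of three visits are assumptions taken from the wording of the suggested message, not a description of an existing system.

```python
from collections import Counter

def oscillation_alert(path, min_visits=3):
    """Return the first question reached by an A -> B -> A move on at least
    its `min_visits`-th visit, or None if no such move occurs."""
    visits = Counter()
    for i, q in enumerate(path):
        visits[q] += 1
        bounced_back = i >= 2 and path[i - 2] == q and path[i - 1] != q
        if bounced_back and visits[q] >= min_visits:
            return q
    return None

toggle = ["Q5", "Q9", "Q12", "Q9", "Q12", "Q9"]
sweep = ["Q5", "Q7", "Q9", "Q10", "Q13", "Q12"]

print(oscillation_alert(toggle))
print(oscillation_alert(sweep))
```

Because the check needs only the last two moves and a visit counter, it can run on every navigation event as it arrives.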

The honest caveat

This is correlational. Students who sweep broadly may do so because they can — strategy is downstream of skill. Teaching the strategy is still worth doing. Either direction of causality supports it.

If You Taught Only Three Things

The visible tests are not the whole task.

Students who wrote count >= 26 passed all public tests. Students who hardcoded 'apple orange banana' passed all the examples. That gap is where most of the failing happens.

Save when it works.

Rita had the right structure at event 1. 103 events later, she arrived back at the same approach — minus one indentation bug. Teach how to save when undo is missing.

Scan the whole paper first.

Students who scanned all questions first were about 9× as likely to score above 50% as those who toggled between two or three questions (19.7% vs 2.1%). Most students don't do this. It's easy to teach.

Anand S · PyConf Hyderabad · 14 March 2026 · sanand0.github.io/talks/2026-03-14-how-students-learn-python/
