Text-based language learning has a blind spot: you never hear the words. You can memorize vocabulary all day, but when someone actually speaks English to you, it's a different skill entirely.
Today I'm adding voice messages to Nelly, ClawTutor's English tutor. She can now speak sentences aloud and have kids write down what they hear — real dictation, real listening comprehension.
How It Works
The child sends /dictation and Nelly responds with a voice message. No text, just audio. The child listens, types what they heard, and Nelly corrects their spelling, punctuation, and grammar.
Nelly's feedback looks something like this:
✗ "gardn" → "garden" (missing 'e')
Everything else is spot on — nice work with "children", that's a tricky one! Ready for the next sentence?
Why This Matters
Traditional vocabulary apps show you a word and ask you to translate it. That's useful, but it only trains one skill: visual recognition.
Real language competence requires:
- Listening comprehension — understanding spoken words
- Spelling from sound — knowing that "garden" isn't "gardn"
- Punctuation awareness — hearing where sentences end
- Speed — processing language in real-time
Dictation trains all of these at once. It's one of the oldest teaching methods, and it works.
Vocabulary Mode
Beyond full sentences, Nelly can also do vocabulary drills with pronunciation:
Voice message (0:02)
Hearing the word spoken helps with pronunciation and makes the vocabulary stick better. When you've heard "squirrel" pronounced correctly, you're more likely to remember it.
Adaptive Difficulty
The tutor adjusts to each child's level:
- Grade 5 (Max): Short sentences, common words, slower speech
- Grade 7 (Lena): Longer sentences, complex grammar, natural speed
Sentences are pulled from the curriculum when possible, so dictation practice reinforces what they're learning in school.
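As a rough illustration, the two levels above could be expressed as per-grade profiles. This is only a sketch: the `DictationProfile` class, the `pick_sentence` helper, and all numeric values (word limits, speech rates) are assumptions made for the example, not Nelly's actual settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictationProfile:
    max_words: int         # assumed upper bound on sentence length
    speech_rate: float     # assumed scale: 1.0 = natural speed
    complex_grammar: bool  # whether complex grammar is allowed


# Hypothetical values, chosen only to mirror the two levels described above.
PROFILES = {
    5: DictationProfile(max_words=8, speech_rate=0.8, complex_grammar=False),
    7: DictationProfile(max_words=15, speech_rate=1.0, complex_grammar=True),
}


def pick_sentence(grade: int, curriculum: list[str], fallback: list[str]) -> str:
    """Prefer a curriculum sentence that fits the grade's profile,
    then fall back to a generic sentence pool."""
    profile = PROFILES[grade]
    for sentence in list(curriculum) + list(fallback):
        if len(sentence.split()) <= profile.max_words:
            return sentence
    raise ValueError("no sentence fits this profile")
```

Checking the curriculum list first keeps practice tied to school material, and the fallback pool only kicks in when nothing from the curriculum fits the child's level.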
Unlike text-based exercises, where kids might copy-paste or use translation tools, dictation requires actual listening. Kids can replay the voice message as often as they need, which is fine for learning, but there's no text to copy.
Screen Time Rewards
Dictation integrates with the screen time system:
- Each correct sentence: +3 to +5 minutes
- Perfect round (5/5): +10 minute bonus
This makes dictation practice feel rewarding rather than like a chore. And since it's genuinely harder than multiple-choice questions, the rewards are slightly higher.
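The reward rules above can be sketched as a small function. How Nelly picks a value within the 3 to 5 minute range isn't described here, so this example simply takes the per-sentence minutes as input; the function name and parameters are assumptions made for illustration.

```python
def round_reward(results, round_size=5, perfect_bonus=10):
    """Total screen-time minutes earned in one dictation round.

    results: list of (correct, minutes) pairs, one per sentence, where
    minutes is the reward for that sentence (3 to 5) if it was correct.
    """
    total = 0
    for correct, minutes in results:
        if not 3 <= minutes <= 5:
            raise ValueError("per-sentence reward must be 3 to 5 minutes")
        if correct:
            total += minutes
    # The bonus applies only to a full round with every sentence correct.
    if len(results) == round_size and all(correct for correct, _ in results):
        total += perfect_bonus
    return total
```

For example, five correct sentences at 3 minutes each would earn 15 minutes plus the 10-minute bonus.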
Technical Implementation
Under the hood, Nelly uses text-to-speech to generate audio, then sends it as a WhatsApp voice message. The child's response is compared against the original text, with fuzzy matching to tell minor typos apart from actual spelling errors.
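Here is a minimal sketch of what that comparison step might look like, using Python's standard `difflib`. It is not Nelly's actual code: the `compare` function, the labels, and the 0.8 similarity threshold are assumptions, and the sketch ignores punctuation and capitalization for brevity.

```python
import difflib
import re


def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def similarity(a, b):
    """Character-level similarity between two words, from 0.0 to 1.0."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def compare(expected, typed, typo_threshold=0.8):
    """Return a list of (typed, expected, kind) tuples for each mismatch."""
    exp, got = words(expected), words(typed)
    issues = []
    matcher = difflib.SequenceMatcher(None, exp, got)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        exp_chunk, got_chunk = exp[i1:i2], got[j1:j2]
        if tag == "replace" and len(exp_chunk) == len(got_chunk):
            # Same number of words on both sides: pair them up one by one.
            for e, g in zip(exp_chunk, got_chunk):
                close = similarity(e, g) >= typo_threshold
                issues.append((g, e, "minor typo" if close else "spelling error"))
        else:
            issues.append((" ".join(got_chunk), " ".join(exp_chunk),
                           "missing or extra words"))
    return issues
```

Aligning whole words first and only then comparing characters keeps one dropped word from marking the rest of the sentence as wrong.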
# Commands available
/dictation — Start a dictation exercise
/vocab [topic] — Vocabulary with pronunciation
The full documentation is in the voice-announcements addon.
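For illustration, the two commands listed above could be recognized with a small parser like the one below. The `parse_command` function and its return shape are hypothetical; the real handling lives in the addon.

```python
def parse_command(message):
    """Return (command, argument) for a known command, or None otherwise."""
    parts = message.strip().split(maxsplit=1)
    if not parts or not parts[0].startswith("/"):
        return None
    name = parts[0][1:].lower()
    arg = parts[1].strip() if len(parts) > 1 else None
    if name == "dictation":
        return ("dictation", None)
    if name == "vocab":
        # The topic is optional, so arg may be None.
        return ("vocab", arg)
    return None
```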
What's Next
This opens up more possibilities:
- Listening comprehension — Nelly reads a short story, then asks questions about it
- Pronunciation practice — Child sends voice message back, Nelly evaluates (requires speech-to-text)
- Dialogue practice — Back-and-forth conversation with voice
For now, dictation and vocabulary are live. The kids can start practicing today.