Voice-controlled games are quietly back, and they're better than you'd guess
The voice-controlled-game genre is older than most people remember and a lot quieter than it was in its 2008 heyday. SingStar sold well over 20 million copies across the PS2 and PS3 lifecycles [1]. Then, for about a decade, the genre nearly vanished from mainstream gaming, kept alive by karaoke arcades and a couple of mobile apps. Around 2023 something interesting started happening on the web, and that's what this piece is really about.
The plastic-microphone era
The first wave of voice gaming was tied to plastic peripherals: SingStar mics, Rock Band mics, the Xbox Live Vision + headset, the Kinect. The technology worked but the friction was real. You needed a console, a TV, a clean line of sight, and a controller you only used for that one game. Wii Music sold roughly 3 million copies in 18 months: respectable in absolute terms, a disappointment relative to other Wii titles [2]. The genre worked, but the on-ramp was steeper than most casual players could be bothered to climb.
The Web Audio API quietly fixed the on-ramp
What changed isn't a new game design. It's a stack. The Web Audio API has been available unprefixed in Chrome and Firefox since about 2014, and getUserMedia() (the browser API that asks for microphone access) has shipped in both browsers since around 2012–2013 [3]. By 2020 the typical latency from microphone capture to a JavaScript callback on a mid-range laptop was about 20–40 ms, fast enough that a game can actually respond to your voice without the lag feeling comical. I measured 32 ms on my own MacBook Air in our Pitch Pong prototype using the default AudioContext buffer size. That's the same order of magnitude as a controller button press on a wired gamepad (about 8 ms on a good day, 50 ms on a bad one).
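For a feel of where a 20–40 ms figure can come from, here is a small arithmetic sketch of how much time one audio buffer represents. The buffer sizes are illustrative assumptions (128 frames is one Web Audio render quantum; 1024 and 2048 are plausible analysis windows), not the settings of the prototype measured above, and real capture latency also includes driver and OS overhead that this ignores.

```typescript
// Time represented by one audio buffer: frames divided by frames-per-second.
function bufferDurationMs(frames: number, sampleRate: number): number {
  if (frames <= 0 || sampleRate <= 0) {
    throw new RangeError("frames and sampleRate must be positive");
  }
  return (frames / sampleRate) * 1000;
}

// Illustrative [frames, sampleRate] pairs (assumptions, not measurements).
const examples: Array<[number, number]> = [
  [128, 48000],  // one Web Audio render quantum
  [1024, 44100], // a mid-sized analysis window
  [2048, 48000], // a larger analysis window
];

// Roughly 2.7 ms, 23.2 ms and 42.7 ms respectively.
const durations = examples.map(([frames, rate]) => bufferDurationMs(frames, rate));
```

The takeaway is that the size of the window a game analyses sets a floor on how quickly it can react, which is why pitch games tend to keep that window as small as the detector tolerates.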
The honest punchline is that the killer hardware for voice gaming was always sitting right in front of your face. You've had a microphone since you bought the laptop. The genre needed to delete the plastic mic, not invent a better one.
What the 2023–2026 wave actually looks like
Most of the new voice-controlled games I've enjoyed share four things:
- They run in a browser tab. No app, no account, no install. You click, you grant mic permission, you play.
- They use one simple signal, not speech: pitch, volume, or a sustained vowel. They don't try to do speech recognition, which is harder, slower, and language-locked.
- Rounds are 30 seconds to two minutes. This isn't a coincidence: the games are often played in shared spaces, and a long round in front of a partner or coworker is socially awkward.
- The game accepts that you might sound terrible. The bad-singing edge case is the point, not a bug.
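To make the "one signal, not speech" point concrete, here is a minimal sketch of estimating pitch from a buffer of microphone samples using plain autocorrelation. It is one common textbook approach, not a description of how any particular game does it; the function names, the 80–1000 Hz search range, the 0.01 RMS gate and the 0.9 peak threshold are all assumptions chosen for illustration.

```typescript
// Mean product of the signal with a copy of itself shifted by `lag` samples.
function autocorrelationAt(samples: Float32Array, lag: number): number {
  const n = samples.length - lag;
  let sum = 0;
  for (let i = 0; i < n; i++) {
    sum += samples[i] * samples[i + lag];
  }
  return sum / n;
}

// Returns the estimated fundamental in Hz, or null for silence / no clear pitch.
function estimatePitchHz(
  samples: Float32Array,
  sampleRate: number,
  minHz = 80,   // assumed lower bound for a singing voice
  maxHz = 1000, // assumed upper bound
): number | null {
  const energy = autocorrelationAt(samples, 0);
  if (Math.sqrt(energy) < 0.01) return null; // RMS gate: too quiet to trust

  const minLag = Math.max(1, Math.floor(sampleRate / maxHz));
  const maxLag = Math.min(Math.ceil(sampleRate / minHz), samples.length >> 1);
  const threshold = 0.9 * energy;

  let dipped = false; // true once the correlation has fallen below zero
  for (let lag = 1; lag <= maxLag; lag++) {
    const value = autocorrelationAt(samples, lag);
    if (value < 0) dipped = true;
    if (!dipped || lag < minLag || value < threshold) continue;

    // First strong peak after the dip: climb to its top, then convert to Hz.
    let bestLag = lag;
    let bestValue = value;
    while (bestLag < maxLag) {
      const next = autocorrelationAt(samples, bestLag + 1);
      if (next <= bestValue) break;
      bestLag++;
      bestValue = next;
    }
    return sampleRate / bestLag;
  }
  return null;
}
```

Taking the first strong peak rather than the global maximum is a deliberate choice: a pure tone correlates just as well at two or three periods as at one, and picking a later peak would report a note an octave or more too low. In a browser the samples would typically come from an AnalyserNode; that wiring is left out here.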
Pitch Pong is the example I keep poking people about. It's two-player Pong where the position of your paddle is the pitch you're singing. You can play it with a closed-mouth hum, your normal voice, or by being absurd about it. A three-round match runs under a minute. The friction is mic permission and that's it.
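As a rough illustration of "the position of your paddle is the pitch you're singing", here is one way a detected frequency could be turned into a paddle position. This is a hypothetical sketch, not Pitch Pong's actual code: the function names, the log-scale mapping, the 110–440 Hz range in the usage lines and the smoothing factor are all assumptions.

```typescript
// Map a sung frequency to a paddle position in [0, 1] (0 = bottom, 1 = top).
// A log scale is used so each octave covers the same distance on screen.
function pitchToPaddle(hz: number, lowHz: number, highHz: number): number {
  if (!(lowHz > 0) || !(highHz > lowHz)) {
    throw new RangeError("expected 0 < lowHz < highHz");
  }
  // Treat a missing or invalid pitch as "bottom"; a real game would more
  // likely hold the previous position instead.
  if (!(hz > 0)) return 0;
  const t = Math.log2(hz / lowHz) / Math.log2(highHz / lowHz);
  return Math.min(1, Math.max(0, t)); // clamp notes outside the range
}

// Exponential smoothing so a wobbly voice doesn't make the paddle jitter.
function smooth(previous: number, target: number, alpha = 0.3): number {
  return previous + alpha * (target - previous);
}

// Usage with an assumed two-octave range of 110 Hz to 440 Hz:
const mid = pitchToPaddle(220, 110, 440); // halfway up the screen
const eased = smooth(0, mid);             // moves part of the way there
```

The smoothing factor trades steadiness for responsiveness: a higher alpha follows the voice more tightly but passes more jitter through to the paddle.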
Why this matters for casual play specifically
Two reasons, both observable:
First, voice input collapses the "skill barrier" that gatekeeps most multiplayer games. Aim, reflexes, knowledge of the meta: none of these matter when the controller is your throat. A 50-year-old who's never opened Steam can play competitively with their teenager within sixty seconds. That dynamic exists in almost no other multiplayer genre.
Second, voice input fights phone-pose passivity. The Ofcom 2023 media-use report flagged that UK adults averaged just over 4 hours of daily passive screen time on phones [4]. Most casual games slot neatly into that posture — one thumb, head down. Voice games break it. You sit up, you breathe, you make a noise that ends in laughter. That's a small thing but the posture difference is real and I notice it on my own face after a session.
What the genre still needs
A few things, honestly:
- A better fingerprint for "you". Voices change with the weather. A model that can re-calibrate in two seconds instead of asking the player to "sing this note now" would help.
- A graceful fallback for shared rooms. Some people will not, ever, make noise in front of a coworker. The current answer is "go play in another tab." A whisper mode or hum-only mode is doable; nobody's nailed it yet.
- Latency budgets on cheap Android. The 32 ms I measured on my MacBook gets uglier on a low-end Android phone with Bluetooth earbuds. Bluetooth audio added 40–80 ms in my tests, which is enough to make a fast paddle feel "stuck."
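The latency bullet is easy to put numbers on. The sketch below adds the Bluetooth overhead quoted above to the 32 ms laptop figure and converts the total into rendered frames. Two assumptions to flag: it treats the MacBook measurement as an optimistic base for a phone, and it assumes a 60 fps game loop, neither of which comes from a measurement on Android.

```typescript
interface LatencyRange {
  minMs: number;
  maxMs: number;
}

// Latencies along a chain add up, so add the bounds independently.
function addLatency(a: LatencyRange, b: LatencyRange): LatencyRange {
  return { minMs: a.minMs + b.minMs, maxMs: a.maxMs + b.maxMs };
}

// How many rendered frames go by during a given delay.
function framesOfLag(ms: number, fps: number): number {
  return ms / (1000 / fps);
}

const capture: LatencyRange = { minMs: 32, maxMs: 32 };   // laptop figure quoted above
const bluetooth: LatencyRange = { minMs: 40, maxMs: 80 }; // Bluetooth overhead quoted above
const total = addLatency(capture, bluetooth);             // 72 to 112 ms

const FPS = 60; // assumed frame rate
const lagFrames = {
  best: framesOfLag(total.minMs, FPS),  // about 4.3 frames
  worst: framesOfLag(total.maxMs, FPS), // about 6.7 frames
};
```

Four to seven frames between making a sound and seeing the paddle react is plausibly where the "stuck" feeling comes from, compared with roughly two frames for the 32 ms case on its own.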
If you want to try one
The shortlist I'd actually click into right now: Pitch Pong, some browser implementations of Yodel-style note-matching games, and the better web-based karaoke trainers (look for ones that don't ask you to log in). Mic permission and a quiet minute — that's the whole investment.
Sources & references
- [1] Sony Computer Entertainment Europe. SingStar lifetime sales, cumulative PS2 + PS3 figures reported through 2014. Press archive.
- [2] Nintendo. Financial Highlights, FY 2010. Wii Music software shipment data.
- [3] W3C Web Audio Working Group. Web Audio API. Stable since 2014; MediaStream / getUserMedia history.
- [4] Ofcom. Online Nation 2023. UK adult media use, daily screen-time aggregates.
- [5] Mozilla Developer Network. Web Audio API best practices and latency notes.
- [6] Author's own latency log, AudioContext default buffer, MacBook Air M2, May 2026.
Reviewed by: Mira Tanaka, Software Engineer · Omoggle Game · Last reviewed: Jun 15, 2026