The Inflection Moment
285 million vernacular internet users exist in India today. Most use phones with ≤2GB RAM. Your architecture for Mumbai doesn't scale to Meerut. Team structure must reflect this reality, not San Francisco org charts.
Voice tech is uniquely local. A Hindi speaker evaluates your UX differently than an engineer in Bangalore. By the time you're at 50 people, your Tier-2 user feedback gets filtered through three layers. It arrives distorted.
The Span of Control Problem
At 20 people, your founder manages product, growth, and part of engineering. This works because decisions are synchronous. Everyone sits in one room or one Slack thread.
At 50 people, that model collapses. You need a VP of Product, VP of Eng, and a Growth Lead. But spans matter more than titles.
Keep engineering pods to 5-7 people at most. One senior engineer leads acoustics. Another leads server-side ranking. Another owns the mobile SDK. Each reports to a tech lead, not directly to your VP Eng. This sounds hierarchical. It isn't. It's clarity.
A VP with 15 direct reports becomes a bottleneck within six months. Your acoustic engineer waits two weeks for architecture guidance. Meanwhile, a competitor ships a vernacular ASR model trained on regional accents.
The VP Eng Inflection
You'll hire a VP Eng when you are between 30 and 40 people. This hire defines your next 3-4 years.
Wrong hire: Someone from a consumer app who optimizes for velocity. They flatten structure, move fast, break things. Voice tech breaks differently. An ASR model trained on bad data breaks quietly. Inference latency creeps up 40ms at a time. These aren't visible in sprint reviews.
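Latency creep of this kind is easier to catch when every release is compared against a fixed baseline instead of the previous release. What follows is a minimal Python sketch of that idea. The release names, the 400 ms baseline, and the 100 ms budget are made-up values for illustration, not figures from any real system.

```python
def p95(samples_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies (ms)."""
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


def drift_over_budget(baseline_ms, releases, budget_ms):
    """Return (release, drift_ms) pairs whose p95 exceeds the baseline by more than budget_ms.

    `releases` is a list of (name, p95_ms) pairs in ship order.
    """
    return [
        (name, value - baseline_ms)
        for name, value in releases
        if value - baseline_ms > budget_ms
    ]


# Hypothetical data: each release adds 40 ms, which looks harmless on its own.
releases = [("r1", 440), ("r2", 480), ("r3", 520), ("r4", 560)]
flagged = drift_over_budget(baseline_ms=400, releases=releases, budget_ms=100)
print(flagged)
```

No single step here would stand out in a sprint review, but the cumulative drift against the baseline crosses the budget by the third release.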
Right hire: Someone with systems thinking. Telecom background, ML infra experience, or voice-adjacent (think Jio's infrastructure builders). They ask uncomfortable questions: "What's our inference SLA?" "How do we version acoustic models?" "What's our fallback when the API times out in Punjab?"
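The fallback question has a concrete shape in code. The sketch below is one possible answer, assuming a remote ASR call and a smaller on-device model. `remote_asr`, `local_asr`, and the timeout values are hypothetical stand-ins, not part of any real SDK.

```python
import concurrent.futures
import time


def transcribe_with_fallback(audio, remote_asr, local_asr, timeout_s=1.5):
    """Return (text, source). Use local_asr if remote_asr is slow or raises."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(remote_asr, audio)
    try:
        return future.result(timeout=timeout_s), "remote"
    except concurrent.futures.TimeoutError:
        return local_asr(audio), "local"  # remote call was too slow
    except Exception:
        return local_asr(audio), "local"  # remote call failed outright
    finally:
        pool.shutdown(wait=False)  # do not block on the abandoned request


# Hypothetical stand-ins for a slow network call and an on-device model.
def slow_remote(audio):
    time.sleep(0.3)
    return "remote transcript"


def quick_remote(audio):
    return "remote transcript"


def small_local_model(audio):
    return "local transcript"


print(transcribe_with_fallback(b"pcm-bytes", slow_remote, small_local_model, timeout_s=0.05))
```

Returning the source alongside the text is a deliberate choice: it lets monitoring count how often users in a given region end up on the fallback path.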
Your VP Eng will either embed data rigor into culture or enable cowboy engineering. There's no middle ground at this stage.
Culture Drift Is Real
Voice products require patience. Your Tier-2 user has a 4G connection that drops three times per call. Optimizing for that connection feels pointless to a new hire from IIT who is used to 5G in Hyderabad.
By the time you're at 80 people, half your team has never shipped a voice product. They optimize for metrics they understand: DAU, sessions, engagement. These are the wrong proxies for voice.
Your retention at day-7 might be 25%. Your retention at day-30 might be 40%. That gap isn't a product effect. It's users learning that your app works when the connection is stable.
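A day-30 figure above a day-7 figure is possible when retention is measured as "active exactly N days after first use", because users who skipped day 7 can still return on day 30. The text does not say which definition is meant, so the Python sketch below assumes that one. The four users and their activity dates are invented for illustration.

```python
from datetime import date, timedelta


def n_day_retention(first_seen, active_days, n):
    """Fraction of the cohort that was active exactly n days after first use."""
    if not first_seen:
        return 0.0
    retained = sum(
        1
        for user, start in first_seen.items()
        if start + timedelta(days=n) in active_days.get(user, set())
    )
    return retained / len(first_seen)


# Hypothetical cohort: four users who all started on the same day.
start = date(2024, 1, 1)
first_seen = {"u1": start, "u2": start, "u3": start, "u4": start}
active_days = {
    "u1": {date(2024, 1, 8), date(2024, 1, 31)},  # back on day 7 and day 30
    "u2": {date(2024, 1, 31)},                    # skipped day 7, back on day 30
    "u4": {date(2024, 1, 2)},                     # came back once, then left
}

day7 = n_day_retention(first_seen, active_days, 7)
day30 = n_day_retention(first_seen, active_days, 30)
print(day7, day30)
```

Under this definition the curve does not have to decline over time, which is why the definition should be pinned down before anyone reads meaning into the gap.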
You need ritualized founder presence in product calls. Not as oversight. As a grounding force. The founder asks: "Would a Lucknow user understand this error message?" This question is not scalable. Embed it anyway.
Practical Structure at 50-70 People
Think in pods, not departments:
Core Platform Pod: 5 people. ASR, NLP, ranking. Owned by your best engineer.
Mobile SDK Pod: 4 people. Integration, offline sync, battery optimization. Owned by someone who's shipped voice apps.
Backend Reliability Pod: 4 people. Inference serving, fallbacks, monitoring. Not exciting. Essential.
Product & Analytics: 3-4 people. One person owns Tier-2 user research (travels quarterly). One owns metrics rigor.
Growth: 4-6 people. But 50% of their time supports product discovery through voice usage data, not paid channels.
Notice: No pod has more than 7 people reporting to one person. Each pod has a clear decision-maker.
The Founder's Remaining Job
Delegate product, not vernacular voice design.
Your VP Product can own roadmap. You own user research methodology. You decide how feedback from Indore changes a model trained on Mumbai data.
This seems inefficient. It's actually the constraint that keeps you honest.
The Hard Implication
If you want to scale voice tech to 10 million users, accept that your org chart at 70 people will look different from that of a fintech or B2B SaaS company at the same scale. Your VP Eng won't manage 15 people. Your product lead will spend 20% of their time on decisions that don't fit a spreadsheet.
This is not dysfunction. This is structure aligned to the problem.
Hire for it explicitly.