India’s artificial intelligence ecosystem is entering a decisive new phase. After years of experimentation, scattered pilots, and reliance on foreign foundation models, the country is now seeing the rise of sovereign, multilingual Large Language Models (LLMs) trained on Indian datasets and built for Indian regulatory frameworks and cultural contexts. From government-backed initiatives to ambitious private startups, India is laying the foundation for a self-reliant future in generative AI (GenAI).
Here’s a deep dive into the leading indigenous LLM initiatives shaping India’s AI landscape.
BharatGen and Param 2: India’s Sovereign Foundation Model
BharatGen represents India’s flagship sovereign AI effort. Launched as an IIT Bombay consortium initiative in 2025, BharatGen is designed to create multilingual AI systems rooted in Indian linguistic and legal realities.
Its latest release, Param 2, is a 17-billion-parameter multilingual foundation model built using a Mixture of Experts (MoE) architecture. It supports 22 Indian languages, aiming to bridge accessibility gaps in governance, education, and public services.
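For readers unfamiliar with the term, a Mixture of Experts layer sends each input to only a few specialised sub-networks ("experts") instead of running the whole model. The sketch below illustrates generic top-k routing in plain Python. It is not Param 2's implementation: the number of experts, the value of k, the router scores, and the toy expert functions are all hypothetical, since the source does not describe them.

```python
import math

def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalise over the chosen experts
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, router_scores, k=2):
    """Mix the outputs of the selected experts; the others are never called."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))

# Hypothetical toy experts: each one is just a scalar function.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
scores = [0.1, 2.0, 1.5, -1.0]  # hypothetical router scores for one input

selected = route_top_k(scores, k=2)
output = moe_forward(3.0, experts, scores, k=2)
```

Because only k experts run per input, a model can hold many parameters in total while spending compute on just a fraction of them for each token.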
Unlike imported AI systems trained largely on Western datasets, Param 2 is aligned with Indian regulatory frameworks and socio-cultural nuances. For citizens, this could translate into:
>Better AI-powered government services
>More accurate legal and policy interpretations
>AI tools in regional languages
>Improved digital inclusion
BharatGen operates within India’s broader national mission to reduce dependency on foreign AI models and cloud infrastructure, reinforcing the push for technological sovereignty.
Sarvam AI: Full-Stack AI “That Gets India”
Among private players, Sarvam AI has emerged as a highly visible force in India’s AI narrative. Branding itself as building “AI that gets India,” the company is developing both foundational models and application-ready systems.
Sarvam Vision
Sarvam Vision is a 3-billion-parameter vision-language model optimized for Indian documents. It focuses on OCR and layout intelligence across mixed scripts, handwritten forms, and complex document structures in 22 Indian languages plus English.
The model reportedly achieved:
>84.3% accuracy on olmOCR-Bench
>93.28% accuracy on OmniDocBench v1.5
These benchmarks suggest competitive performance against global document AI systems, especially in multi-script and real-world Indian layouts involving tables, forms, and formulas.
Bulbul V3
Bulbul V3 is Sarvam’s advanced text-to-speech (TTS) system. Built on an underlying language model trained on Indian datasets, it supports 11 Indian languages and English with an Indian accent.
Key capabilities include:
>30+ speaker voices
>Control over tone, pace, and expressivity
>Multiple output formats (MP3, WAV, AAC, OPUS, FLAC)
>Sample rates from 8 kHz to 48 kHz
Such flexibility makes it viable for telephony voice agents as well as high-fidelity enterprise applications.
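The gap between the two ends of that sample-rate range is easy to quantify. The arithmetic below is a rough sketch that assumes uncompressed 16-bit mono PCM (as in a basic WAV file); the bit depth and channel count are assumptions rather than figures from Sarvam, and compressed formats such as MP3 or OPUS would be considerably smaller.

```python
def pcm_bytes(sample_rate_hz, seconds, bit_depth=16, channels=1):
    """Size in bytes of raw, uncompressed PCM audio (container header excluded)."""
    return sample_rate_hz * seconds * (bit_depth // 8) * channels

# One minute of speech at each end of the supported range.
telephony = pcm_bytes(8_000, seconds=60)   # 8 kHz, typical for phone lines
high_fidelity = pcm_bytes(48_000, seconds=60)  # 48 kHz, studio-grade audio

ratio = high_fidelity / telephony
```

Under these assumptions, a minute of 48 kHz audio carries six times the data of a minute at 8 kHz, which is why a telephony deployment would usually pick the lower rate.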
Sarvam Samvaad & Pravaah
Sarvam also unveiled Samvaad, a conversational AI agent program offering services in 22 Indian languages. The system is described as deeply embedded within enterprise workflows and designed with “Indian unit economics.”
Additionally, the company introduced Pravaah, positioned as an AI token factory for India, aimed at infrastructure-level optimization for large-scale deployments.
Sarvam is partnering with global technology players, including Bosch Technology, Qualcomm, and HMD, to scale its AI stack. With products like Sarvam Kaze AI smart glasses expected in 2026, the company is clearly expanding beyond pure software.
Gnani.ai and Inya VoiceOS: Native Voice Intelligence
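Voice AI is another crucial pillar of India’s AI story. Gnani.ai introduced Inya VoiceOS, a 5-billion-parameter voice-to-voice foundation model.
Unlike traditional cascaded systems that convert speech to text, process the text, and then synthesise speech again, Inya VoiceOS is described as processing audio natively in acoustic and semantic space. Skipping the intermediate text step removes a source of delay and keeps cues such as intonation and rhythm that a transcript would discard, which is why the approach can reduce latency and preserve natural prosody.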
Highlights include:
>Trained on 14+ million hours of multilingual speech
>Fine-tuned with 1.2 million hours of task-specific audio
>Backed by 8 trillion text tokens
>Supports 15+ Indian languages
>Sub-second latency
It is designed for real-time applications such as:
>Government helplines
>Healthcare voice assistants
>Banking and logistics workflows
A larger 14-billion-parameter version is reportedly in development.
Soket AI and Open Research Models
Soket AI is pursuing a more transparent and open approach.
Its Pragna 1B model has 1.25 billion parameters and a 2048-token context window. It uses Meta’s Llama-2 tokenizer, covers six Indian languages, and was trained on 6.3 million English Wikipedia articles alongside Indic datasets.
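A 2048-token context window means longer documents have to be split before a model of this size can read them. The helper below is a minimal sketch of one common approach, overlapping windows over a list of token IDs. The function name, the 128-token overlap, and the stand-in token list are hypothetical and are not part of Soket's tooling.

```python
def chunk_tokens(token_ids, window=2048, overlap=128):
    """Split token IDs into windows of at most `window` tokens.

    Consecutive windows share `overlap` tokens so that text cut at a
    boundary still appears with some surrounding context.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    step = window - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break  # this window already reached the end of the input
    return chunks

# Stand-in for a tokenised document of 5,000 tokens.
tokens = list(range(5000))
chunks = chunk_tokens(tokens)
```

Each chunk can then be passed to the model separately, at the cost of the model never seeing the whole document at once.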
Soket’s broader Project EKA aims to develop 100+ billion-parameter foundation models trained on 2 trillion tokens curated for Indian contexts. Open releases allow researchers and engineers to fine-tune for regional and domain-specific applications.
Dhi-5B: Grassroots Innovation
A striking example of frugal innovation is Dhi-5B, reportedly developed by a researcher from IIT Guwahati on a modest budget of Rs 1 lakh (100,000 rupees). The 5-billion-parameter multimodal model underwent pre-training, supervised fine-tuning, context extension, and vision integration.
Its emergence highlights how India’s AI ecosystem is not limited to large corporate labs but is also seeing contributions from academic and independent innovators.
Broader Implications: Sovereign AI and Vertical Specialization
Beyond these models, companies such as Gan.AI, Avataar, GenLoop, ZenteiQ, IntelliHealth, Shodh AI, and Tech Mahindra’s Maker’s Lab are building verticalised AI solutions across media, healthcare, infrastructure, and enterprise R&D.
The overarching trend is clear:
>Foundation models are becoming commoditised.
>Differentiation lies in localisation and domain depth.
>Investors increasingly demand viable business models, not just benchmark scores.
India’s AI journey is likely to evolve incrementally rather than explosively. But the direction is unmistakable. With sovereign models trained on Indian languages, datasets, and policy frameworks, the country is steadily building AI systems capable of solving local challenges at scale.
As the ecosystem matures, India may not just consume AI innovation; it could define how multilingual, culturally contextual AI is built for the Global South.
