Gemini 3.1 Flash-Lite Offers Choice on How It Processes Inputs

Enterprise developers can now choose the level of thinking they need for a specific task with Google's newly released Gemini 3.1 Flash-Lite, its latest reasoning model.

On Tuesday, the cloud provider launched Gemini 3.1 Flash-Lite in preview, calling it the fastest and most cost-efficient model in the Gemini 3 series, built for high-volume developer workloads.

With the model, enterprises can choose the depth of thinking (minimal, low, medium or high) that a given task requires. The model runs in AI Studio, a platform developers can use to build, test and deploy applications using Gemini models, as well as in Vertex AI, a platform for building and scaling machine learning models. The model is suited to high-volume translation, content moderation and complex workloads such as generating user interfaces and dashboards, following instructions and creating simulations, according to Google.
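As a sketch of what that per-task choice might look like in application code, the helper below maps the workload types Google mentions to one of the four thinking depths. The routing rules and function name are hypothetical illustrations, not part of Google's API; the exact parameter names exposed in AI Studio and Vertex AI may differ.

```python
# Hypothetical helper: pick a Gemini 3.1 Flash-Lite thinking depth per task.
# The four depth values mirror those described by Google (minimal, low,
# medium, high); the task categories and routing are illustrative only.

def select_thinking_level(task_type: str) -> str:
    """Return a thinking depth appropriate to a given kind of workload."""
    routing = {
        "translation": "minimal",         # high-volume, latency-sensitive
        "content_moderation": "low",      # straightforward classification
        "instruction_following": "medium",
        "ui_generation": "high",          # complex, multi-step output
        "simulation": "high",
    }
    # Default to "low" for unrecognized tasks to keep token spend down.
    return routing.get(task_type, "low")

print(select_thinking_level("translation"))    # minimal
print(select_thinking_level("ui_generation"))  # high
```

The point of the sketch is the design choice the article describes: the caller, not the model, decides how much reasoning a request deserves, so cheap bulk tasks never pay for deep thinking.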

With its new model, Google aims to address a challenge many enterprise developers face when working with reasoning models: thinking models take time to work through a task, which can be costly and wasteful when the developer does not need in-depth analysis. By enabling enterprise developers to choose the level of thinking, Google is also helping enterprises build multipurpose agents.

“This is an ideal model for agents,” said Mark Beccue, an analyst at Omdia, a division of Informa TechTarget.  

He added that while other AI model providers are focused on reasoning models and agents, Google’s strategy tends to be enterprise driven. In this case, Google’s approach is focused on reducing the number of tokens that a model uses, which gives businesses a performant but cheaper model option.  

“If you can make something two-and-a-half times faster and cut the price almost in half, that’s a huge game,” said Bradley Shimmin, an analyst at Futurum Group.  

He added that enterprise developers are also beginning to distribute tasks across multiple models rather than rely on one. For example, a developer building AI agents might need Gemini 3.1 Pro for planning and building, whereas 3.1 Flash-Lite could be used for basic documentation or code generation. 
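Shimmin's example of distributing work across models can be sketched as a simple router. The model identifiers below follow the article's naming ("Gemini 3.1 Pro", "3.1 Flash-Lite") but are not verified API model IDs, and the routing function is an illustration rather than anything Google ships.

```python
# Illustrative router: send heavyweight agent work to a Pro-class model
# and routine generation to Flash-Lite. Model ID strings are assumptions.

HEAVY_TASKS = {"planning", "architecture", "multi_step_reasoning"}

def pick_model(task: str) -> str:
    """Route a task to the cheapest model that can handle it."""
    if task in HEAVY_TASKS:
        return "gemini-3.1-pro"        # planning and building
    return "gemini-3.1-flash-lite"     # documentation, basic code generation

print(pick_model("planning"))       # gemini-3.1-pro
print(pick_model("documentation"))  # gemini-3.1-flash-lite
```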

“It’s not a game of overwhelming with greatness,” Shimmin said. “It’s a game of optimizing with greatness.” 

Developers have started to realize with other model releases, such as this week's Qwen 3.5-9B, that it can be better to turn off a model's thinking entirely, because leaving it on slows responses and limits optimization.

“As you get more into more complex interactive sessions or longer context windows, you’re sometimes not better off with the extra thinking time,” Shimmin said. 

Gemini 3.1 Flash-Lite is an example of how models continually evolve and grow in the AI market, Beccue said. 

“We’re moving at a rapid pace to bring in models that will be more efficient, better,” he said. “They’re getting better and more efficient.” 

Google said its new model costs $0.25 per one million input tokens and $1.50 per one million output tokens.
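At those list prices, per-request cost is simple arithmetic. A quick sketch, using the article's prices and an illustrative token count:

```python
# Prices from the article: $0.25 per 1M input tokens, $1.50 per 1M output tokens.
INPUT_PRICE = 0.25 / 1_000_000   # dollars per input token
OUTPUT_PRICE = 1.50 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at the stated list prices."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# An example request with 10,000 input tokens and 2,000 output tokens:
print(round(request_cost(10_000, 2_000), 4))  # 0.0055
```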
