Enterprise developers can now choose the level of thinking they need for a specific task with Google's newly released Gemini 3.1 Flash-Lite, its latest reasoning model.
On Tuesday, the cloud provider launched Gemini 3.1 Flash-Lite in preview, calling it the fastest and most cost-efficient model in the Gemini 3 series, built for high-volume developer workloads.
With the model, enterprises can choose from four depths of thinking (minimal, low, medium or high), depending on the task being performed. The model runs in AI Studio, a platform developers can use to build, test and deploy applications using Gemini models, as well as in Vertex AI, a platform for building and scaling machine learning models. The model is suited to high-volume translation and content moderation, as well as complex workloads such as generating user interfaces and dashboards, following instructions and creating simulations, according to Google.
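In practice, a per-task thinking depth would be expressed as a request-level configuration option. The field name `thinking_level`, the payload shape and the model ID below are assumptions for illustration, not the confirmed API surface; the four depth values come from the article.

```python
# Hypothetical sketch: assembling a generation request that selects a
# thinking depth per task. The "thinking_level" field and payload layout
# are illustrative assumptions, not documented Gemini API fields.
VALID_LEVELS = {"minimal", "low", "medium", "high"}

def build_request(prompt: str, thinking_level: str = "minimal") -> dict:
    """Assemble a request payload with an explicit thinking depth."""
    if thinking_level not in VALID_LEVELS:
        raise ValueError(f"unknown thinking level: {thinking_level!r}")
    return {
        "model": "gemini-3.1-flash-lite",  # preview model named in the article
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "config": {"thinking_level": thinking_level},
    }

# A quick translation needs little deliberation; generating a dashboard
# spec may warrant deeper reasoning.
fast = build_request("Translate 'hello' to French.", "minimal")
deep = build_request("Generate a sales dashboard spec.", "high")
```

The point of the sketch is that depth is chosen per request, so one deployment can serve both cheap, shallow calls and occasional deep-reasoning calls.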
With its new model, Google aims to address a challenge many enterprise developers face when working with reasoning models. Thinking models often take time to perform a task, which can be costly and wasteful if the developer does not need in-depth analysis for that task. By letting enterprise developers choose the level of thinking, Google is also helping enterprises build multipurpose agents.
“This is an ideal model for agents,” said Mark Beccue, an analyst at Omdia, a division of Informa TechTarget.
He added that while other AI model providers are focused on reasoning models and agents, Google's strategy tends to be enterprise-driven. In this case, Google's approach focuses on reducing the number of tokens a model uses, which gives businesses a performant but cheaper option.
“If you can make something two-and-a-half times faster and cut the price almost in half, that’s a huge game,” said Bradley Shimmin, an analyst at Futurum Group.
He added that enterprise developers are also beginning to distribute tasks across multiple models rather than rely on one. For example, a developer building AI agents might need Gemini 3.1 Pro for planning and building, whereas 3.1 Flash-Lite could be used for basic documentation or code generation.
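The task-to-model routing Shimmin describes can be sketched as a simple dispatcher. The model names match the article; the task categories and routing rules are invented for illustration, not Google guidance.

```python
# Illustrative sketch: route each task to the cheapest capable model,
# per the pattern Shimmin describes. Tiers here are assumptions.
MODEL_FOR_TASK = {
    "planning": "gemini-3.1-pro",        # complex agent planning and building
    "agent_build": "gemini-3.1-pro",
    "documentation": "gemini-3.1-flash-lite",  # basic docs per the article
    "codegen": "gemini-3.1-flash-lite",        # basic code generation
    "translation": "gemini-3.1-flash-lite",
}

def pick_model(task_type: str) -> str:
    """Default to the lightweight model for routine, unlisted tasks."""
    return MODEL_FOR_TASK.get(task_type, "gemini-3.1-flash-lite")
```

Defaulting to the lighter model keeps costs down, with the pricier model reserved for tasks explicitly known to need it.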
“It’s not a game of overwhelming with greatness,” Shimmin said. “It’s a game of optimizing with greatness.”
Developers have started to realize, with other model releases such as Qwen 3.5-9B from earlier this week, that it can be better to turn off a model's reasoning, because having it on slows down the model and limits optimization.
“As you get more into more complex interactive sessions or longer context windows, you’re sometimes not better off with the extra thinking time,” Shimmin said.
Gemini 3.1 Flash-Lite is an example of how models continually evolve and grow in the AI market, Beccue said.
“We’re moving at a rapid pace to bring in models that will be more efficient, better,” he said. “They’re getting better and more efficient.”
Google said its new model costs $0.25 per one million input tokens and $1.50 per one million output tokens.
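At those rates, estimating the cost of a workload is simple arithmetic. The rates below come from the article; the example token counts are made up for illustration.

```python
# Preview pricing per the article, in USD per one million tokens.
INPUT_RATE = 0.25
OUTPUT_RATE = 1.50

def job_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for a batch of requests."""
    return (input_tokens / 1_000_000) * INPUT_RATE \
        + (output_tokens / 1_000_000) * OUTPUT_RATE

# Example: 2M input tokens and 500K output tokens.
cost = job_cost(2_000_000, 500_000)  # 0.50 + 0.75 = $1.25
```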

