A new open-source AI model trained on the languages and cultures of Latin America has been introduced by the Andean nation of Chile.
More than two years in the making, Latam-GPT was developed by scientists, researchers and professionals from more than 60 institutions across 15 different Latin American and Caribbean countries under a program coordinated by Chile’s National Center of Artificial Intelligence, CENIA.
Also participating in the effort were the Chilean Ministry of Science, Technology, Knowledge and Innovation, AWS and the Development Bank of Latin America and the Caribbean.
The model was built with language, data and context specific to Latin America and the Caribbean, amid a fast-developing sovereign AI movement and growing global unease about the dominance of the AI sector by big U.S. tech vendors.
“Unlike models trained primarily with information in English and cultural frameworks from the global north, Latam-GPT understands the cultural and linguistic nuances, as well as the historical and political contexts of Latin America,” according to a CENIA release. CENIA launched the model at a Feb. 10 event in Santiago, where the center’s director, Alvaro Soto, said Latam-GPT enables Latin America “to join the AI revolution as a key player.”
He was backed up by the country’s science minister, Aldo Valle, who added: “This project stems from the conviction that regional integration is the only realistic path to achieving technological sovereignty with a democratic purpose.” Also in attendance was Chile’s president, Gabriel Boric, who welcomed the model’s release with a post on X.
The need for a technology such as Latam-GPT appears evident, given that research has shown that data in Spanish, the language used by most of Latin America, has until now accounted for only about 4% of the data used to train language models. Portuguese, Brazil’s native tongue, has made up as little as 2% of training data.
Spanish and Portuguese were used for the main content of Latam-GPT, and the project aims to include indigenous languages as well.
The model was developed on a base architecture of Meta’s Llama 3.1 open model, with 70 billion parameters, and trained on officially approved texts obtained with permission.
In total, more than 300 billion plain-text tokens, equivalent to around 230 billion words, were collected under license and curated to provide what CENIA calls a “high quality dataset.”
However, Latam-GPT’s potential to make inroads into an AI market dominated by a few U.S. and Chinese companies could be limited. It was developed on a budget of only $550,000, the AP reported.
Still, Latam-GPT’s availability on Hugging Face and GitHub indicates that some see it as useful foundational infrastructure for those looking to develop future regional applications.

