
How sustainable are Generative AI models?

Nina Habicht • Jul 31, 2023

This blog post outlines why large language models, and thus generative AI models, need so much power to be trained, deployed, and maintained. We also look at why some solutions are more sustainable than others.


What impacts the carbon footprint of large language models?


There are three main factors that impact the carbon footprint of LLMs like GPT-4:


1) The footprint of training the model

2) The footprint from inference. "Inference" in large language models refers to the capability of LLMs to predict outcomes from new input data. The models essentially use past knowledge to make educated guesses about the meaning of new sentences or to predict what comes next in a conversation. It's like connecting the dots using what they've learned from reading lots of text.

3) The footprint of producing all the required hardware and the capabilities of the cloud data center.


The most energy-intensive part is training such models. Importantly, larger models also use more energy during their deployment. However, this study shows that inference is a major consumer of energy as well, with up to 90% of the ML workload being due to inference processing.
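To make "inference" concrete, here is a minimal sketch using the Hugging Face transformers library; the choice of the small, publicly available distilgpt2 model is ours for illustration, not something prescribed by the studies above:

```python
# Minimal inference sketch, assuming the Hugging Face "transformers"
# library (pip install transformers) and the small distilgpt2 model.
from transformers import pipeline

# Load a pre-trained model: no training happens here, only inference,
# i.e. predicting a likely continuation of new input text.
generator = pipeline("text-generation", model="distilgpt2")

result = generator("Large language models are", max_new_tokens=20)
print(result[0]["generated_text"])
```

Every such call consumes energy, which is why inference costs add up at scale.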


How high are the energy costs of large language models used in generative AI?


We have summarized some important facts to shed light on the environmental impact of large language models:


  • The Megatron-Turing model from NVIDIA needed hundreds of NVIDIA DGX A100 multi-GPU servers, each using up to ~6.5 kilowatts of power (1).
  • Training one BERT model (an LLM by Google) consumes roughly the same amount of energy and carbon footprint as a trans-Atlantic flight (2).
  • Researchers from the paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜" explain that training models such as OpenAI's GPT-4 or Google's PaLM emits around 300 tons of CO2, whereas an average person emits about 5 tons a year, as outlined by HBR.


What is a sustainable large language model strategy?


If possible, use existing, pre-trained models from LLM providers such as Microsoft/OpenAI, Google (Bard, PaLM), or Meta AI (LLaMA), and do not create your own generative models (side note: training your own model can also cost around $100 million, according to a recently published article on Hugging Face). Instead, fine-tune an existing model for your use case, as sketched below.
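As a rough illustration of this strategy, the sketch below fine-tunes a small pre-trained model instead of training one from scratch; the model and dataset names (DistilBERT, IMDB) are illustrative assumptions, not recommendations from the providers mentioned above:

```python
# Hedged sketch: fine-tuning a small pre-trained model rather than
# pre-training from scratch. Assumes the Hugging Face "transformers"
# and "datasets" libraries; model/dataset choices are illustrative.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"  # a small model keeps energy use low
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2)

# A small labeled dataset; 1,000 examples are enough for a demo run.
dataset = load_dataset("imdb", split="train[:1000]")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=128),
    batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset,
)
trainer.train()  # one short epoch: far cheaper than pre-training
```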


What are best practices to build greener LLMs?


  • Besides using pre-trained models, use small models (with fewer parameters) by removing unnecessary parameters (so-called "pruning") without jeopardizing the accuracy you need (e.g. your model should still answer with good confidence); see the pruning sketch after this list
  • Reduce training time by experimenting with DistilBERT, a distilled version of BERT. "Distilled models have also been shown to be more energy efficient", as outlined in sustainable AI in the cloud; see the parameter-count comparison after this list
  • Fine-tune your model with your own data for a limited period instead of training it from scratch
  • Use cloud-based environments that provide scalable infrastructure (Azure, AWS, Google)
  • Use specialized hardware that speeds up training (Graphcore, Habana) and inference (Google TPU, AWS Inferentia)
  • Other technical tricks: merge model layers (so-called "fusion") and store model parameters in smaller values, say 8 bits instead of 32 bits (so-called "quantization"); see the quantization sketch after this list
  • Run ML models on small, low-powered edge devices without the need to send data to a server for processing, for example with TinyML
  • Monitor your carbon footprint via tooling such as CodeCarbon, Green Algorithms, and ML CO2 Impact; see the monitoring sketch after this list
  • Encourage your data science team to set benchmarking standards and to include sustainability in their model considerations
  • Finally, consider whether you really need generative AI at all. If you need help, we can support you by evaluating your use case.
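Pruning sketch (referenced in the list above): a minimal example using PyTorch's built-in pruning utilities; the layer size and the 30% sparsity level are illustrative assumptions, not recommendations:

```python
# Hedged pruning sketch with PyTorch's torch.nn.utils.prune.
import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Linear(768, 768)

# Zero out the 30% of weights with the smallest absolute values (L1).
prune.l1_unstructured(layer, name="weight", amount=0.3)
prune.remove(layer, "weight")  # make the pruning permanent

sparsity = (layer.weight == 0).float().mean().item()
print(f"Sparsity: {sparsity:.0%}")
```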
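Parameter-count comparison (referenced in the list above): a small sketch showing why DistilBERT is cheaper to run than BERT:

```python
# Hedged sketch comparing model sizes with Hugging Face "transformers".
from transformers import AutoModel

for name in ["bert-base-uncased", "distilbert-base-uncased"]:
    model = AutoModel.from_pretrained(name)
    params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {params / 1e6:.0f}M parameters")
# DistilBERT has roughly 40% fewer parameters than BERT, which
# translates into less compute per training step and per query.
```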
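Quantization sketch (referenced in the list above): dynamic quantization in PyTorch, which stores and executes linear layers in 8-bit integers instead of 32-bit floats; the toy model is an illustrative assumption:

```python
# Hedged quantization sketch using PyTorch's dynamic quantization.
import torch

model = torch.nn.Sequential(torch.nn.Linear(768, 768), torch.nn.ReLU())

# Convert all Linear layers to 8-bit integer (qint8) execution.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)
print(quantized)  # Linear layers are now DynamicQuantizedLinear
```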
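Monitoring sketch (referenced in the list above): measuring the estimated emissions of a workload with the open-source CodeCarbon library:

```python
# Hedged monitoring sketch with CodeCarbon (pip install codecarbon).
from codecarbon import EmissionsTracker

tracker = EmissionsTracker()
tracker.start()
# ... your training or inference workload goes here ...
emissions_kg = tracker.stop()  # estimated kg of CO2-equivalent
print(f"Estimated emissions: {emissions_kg:.6f} kg CO2eq")
```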



Summary about sustainable generative AI


Please find a comparison between LLMs and their sizes in the previous blog post.



Responsible AI is a crucial foundation for all generative AI products. It is important that we consider the carbon footprint of AI models to protect our earth and our future.


Need support with your Generative AI Strategy and Implementation?

🚀 AI Strategy, business and tech support 

🚀 ChatGPT, Generative AI & Conversational AI (Chatbot)

🚀 Support with AI product development

🚀 AI Tools and Automation

Get in touch