
What are the costs for enterprises to use LLMs?

Nina Habicht • Nov 26, 2023

Many companies are reluctant to implement LLM-based products because they fear being confronted with high costs. This is especially true for medium-sized companies that have neither the resources or capacity to deploy and optimize their AI models nor to set up their own infrastructure with MLOps. As described in our article about the sustainability of Generative AI applications, the cloud and performance costs of running an LLM can become very high.


What are the cost types when implementing OpenAI or other LLMs?


There are four types of costs related to LLMs:


  1. Inference Costs
  2. Setup and Maintenance Costs
  3. Costs depending on the Use Case
  4. Other Costs related to Generative AI products



What are inference costs?


An LLM has been trained on a huge library of books, articles, and websites. When you ask it something, it uses all that knowledge to make its best guess or to create something new that fits your request. This process of generating answers or new text based on what the model has learned is called inference.


Usually, developers call a large language model like GPT-4. But here comes the "but": the large language model alone usually does not account for the total costs of running the final product. LLMs can be used to classify data (e.g. to understand that a text is about "searching for a new car insurance"), to summarize, to translate, and for many other tasks. Download the ultimate Generative AI Task Overview to learn where LLMs make sense.


Download Overview

Which cost factors affect LLM costs?


Factors affecting LLM costs include model size (e.g. how many parameters the model has), with larger models requiring more resources, and context length (how much data, i.e. text, you provide to the model along with your question), which impacts computational demands. Keep in mind that larger models are not always better.
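To make the two cost drivers concrete, here is a minimal back-of-the-envelope sketch. The per-token prices below are illustrative placeholders, not current OpenAI list prices:

```python
# Back-of-the-envelope cost per request, driven by context length.
# Prices are illustrative example values, NOT current OpenAI list prices.

PRICE_PER_1K_TOKENS = {
    # (input price, output price) in USD per 1,000 tokens
    "gpt-3.5-4k": (0.0015, 0.002),
    "gpt-4-8k": (0.03, 0.06),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a single request for a given model."""
    price_in, price_out = PRICE_PER_1K_TOKENS[model]
    return input_tokens / 1000 * price_in + output_tokens / 1000 * price_out

# The same prompt costs roughly 20x more on the larger model in this example:
small = request_cost("gpt-3.5-4k", input_tokens=3000, output_tokens=500)
large = request_cost("gpt-4-8k", input_tokens=3000, output_tokens=500)
print(f"GPT-3.5: ${small:.4f}, GPT-4: ${large:.4f}")
```

Note how both levers show up directly: more parameters means a higher per-token price, and a longer context means more billed input tokens per request.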


  • Enterprises face a range of choices between proprietary models like OpenAI's and open-source models like Llama 2 and Falcon 180B​​. The selection of an LLM itself depends on the specific use case and on a balance between cost and performance​​.


Which Generative AI use cases cost more?


Our "Tasks where Generative AI helps" framework outlines different use cases where Generative AI makes sense.


  • Costs vary especially with the use case. For tasks like summarization, GPT-4 is often preferred in research but comes with high costs at OpenAI.
  • Chatbots are usually more complex and cost-intensive than other Generative AI products, especially when they offer more than a "one-shot" response and can "talk for a while" about a topic while holding memory.
  • If your information has to be up to date at all times to deliver a sufficient user experience and avoid reputational issues (e.g. in the healthcare, insurance, or fintech sectors), the costs of Retrieval-Augmented Generation (RAG, see below for more information) come on top. These RAG-driven daily query costs vary between OpenAI and other models, reflecting different pricing structures​​.
  • Finally, fine-tuning is the most expensive process: it means changing or customizing at least some parameters of your LLM's neural network, which may change how the model learns from knowledge or its reasoning capabilities. "This leads to $100,000 for smaller models between one to five billion parameters, and millions of dollars for larger models," as outlined in an AI Business article. Thus, fine-tuning only comes into play for large enterprises or startups that fine-tune their own large language model for a domain (e.g. BloombergGPT). In this case, open-source models may cost less in the long term than fine-tuning OpenAI models. Keep in mind that people sometimes also use "fine-tuning" to mean optimizing a model for its purpose or domain without changing any model parameters. That is not as expensive as real fine-tuning.


How much does an OpenAI model cost per month?


The monthly costs really depend on the model you use, e.g. whether you use GPT-3.5 with 4K context (see the table below), GPT-3.5 with 16K context, GPT-4 with 8K context, GPT-4 with 32K context, or even the new "Turbo" versions, and on the usage (traffic) of your Generative AI product.


The yearly costs range from $1K to $50K on the low-usage end, depending on the model, and from $1M to $56M a year for high usage, as outlined in the calculation below. Studies show that for low usage, OpenAI models make sense, whereas for high-usage, high-traffic cases with many millions of users, open-source models can save costs.
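A simple sketch shows how strongly the monthly bill scales with traffic. The request volumes, token counts, and per-token prices below are illustrative assumptions, not quotes:

```python
# Rough monthly bill as a function of traffic, using illustrative prices.
def monthly_cost(requests_per_day: int, avg_input_tokens: int,
                 avg_output_tokens: int, price_in_per_1k: float,
                 price_out_per_1k: float, days: int = 30) -> float:
    """Estimate the monthly API bill from average traffic and token prices."""
    per_request = (avg_input_tokens / 1000 * price_in_per_1k
                   + avg_output_tokens / 1000 * price_out_per_1k)
    return requests_per_day * days * per_request

# A low-traffic FAQ bot vs. a high-traffic consumer app
# (example prices for a GPT-4-class model: $0.03 in / $0.06 out per 1K tokens):
low = monthly_cost(500, 1500, 300, 0.03, 0.06)          # ~ $945 per month
high = monthly_cost(2_000_000, 1500, 300, 0.03, 0.06)   # several $M per month
print(f"low traffic: ${low:,.0f}/month, high traffic: ${high:,.0f}/month")
```

Under these assumptions, the same product moves from a four-figure yearly bill to a multi-million-dollar one purely through traffic, which is why the break-even comparison with self-hosted open-source models matters at scale.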


Source: Differences between OpenAI's models. GPT-4 is, as OpenAI describes, "10 times more advanced than its predecessor, GPT-3.5. This enhancement enables the model to better understand the context and distinguish nuances, resulting in more accurate and coherent responses."

Source: Costs depend on the GPT model and context as well as on the usage (requests). For high-traffic applications the authors recommend using your own open-source models instead of GPT, i.e. OpenAI. Find more calculations here.

What are the maintenance costs related to LLMs?


To set up the infrastructure for LLMs, you need to consider several points:


First, you need to set up open-source or proprietary models like GPT in a cloud environment. It is also possible to set up the infrastructure on-premises (running your models on private servers), but this usually requires costly GPUs and hardware from providers like NVIDIA. As Skanda Vivek outlines, this requires an A10 or A100 NVIDIA GPU. An A10 (with 24 GB GPU memory) costs $3K, whereas an A100 (40 GB memory) costs $10–20K with the current market shortage in place.


  1. Compute: FLOPs (floating point operations) needed to complete a task. This depends on the number of parameters and the data size. Larger models do not always perform better.
  2. Data Center Location: Depending on the local data center, energy efficiency and CO2 emissions may differ.
  3. Processors: Computing processors (CPUs, GPUs, TPUs) vary in energy efficiency for specific tasks.
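Two common rules of thumb make the hardware sizing above tangible: inference takes roughly 2 FLOPs per model parameter per generated token, and fp16 weights take about 2 bytes per parameter of GPU memory. These are rough engineering estimates, not vendor figures:

```python
# Rules of thumb (rough assumptions, not vendor figures):
#  - inference compute: ~2 FLOPs per parameter per generated token
#  - fp16 weights:      ~2 bytes of GPU memory per parameter

def inference_flops(params: float, tokens: int) -> float:
    """Approximate floating point operations to generate `tokens` tokens."""
    return 2 * params * tokens

def fp16_memory_gb(params: float) -> float:
    """Approximate GPU memory (GB) needed to hold fp16 weights."""
    return params * 2 / 1e9

# A 7B-parameter model (Llama-2-7B class) needs ~14 GB just for weights,
# so it fits on an A10 (24 GB); larger models push you toward an A100.
print(f"{fp16_memory_gb(7e9):.0f} GB of GPU memory for the weights alone")
```

This is why the article's A10 vs. A100 distinction matters: the parameter count of the model you choose largely dictates which (and how many) GPUs you must buy or rent.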



Source: Open-source models can become more expensive than GPT-3.5, especially due to the complexity of maintenance, prompt engineering, and the extensive data science knowledge required for non-proprietary models like Llama 2.

What are other costs related to LLMs?


Very often, embeddings (vectorization of your text corpus) and RAG (Retrieval-Augmented Generation) are needed besides the mere GPT or LLM models.


Why? Traditional language models can sometimes make errors or give generic answers because they only use the information they were trained on. With RAG, the model can access up-to-date and specific information, leading to better, more informed answers. Example: Let's say you ask, "What's the latest research on climate change?" A RAG model first finds recent scientific articles or papers about climate change. Then it uses that information to generate a summary or answer that reflects the latest findings, rather than just giving a general answer about climate change. This makes it much more useful for questions where current, detailed information is important. For enterprises, up-to-date content is key, so there must be a mechanism to continuously retrieve the best content.
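The retrieve-then-generate flow can be sketched in a few lines. This toy version scores relevance by word overlap instead of calling a real embedding model, and the documents are made-up examples; in production you would use vector embeddings and a vector database:

```python
# Minimal RAG sketch. A toy word-overlap score stands in for real
# embedding similarity -- the retrieval-then-generate flow is the point.

DOCUMENTS = [
    "2023 IPCC report: global surface temperature rose 1.1 C above 1850-1900.",
    "Company travel policy: book flights at least 14 days in advance.",
    "New study links Arctic ice loss to changing ocean currents.",
]

def score(query: str, doc: str) -> int:
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, k: int = 2) -> list:
    """Return the k documents most relevant to the query."""
    return sorted(DOCUMENTS, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Compose the prompt an LLM would receive: retrieved context + question."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# The LLM now answers from retrieved, current context instead of relying
# only on whatever was frozen into its training data.
print(build_prompt("What does the report say about global temperature?"))
```

The extra costs mentioned above come from exactly these steps: embedding the corpus and each query, running the vector database, and the longer (context-stuffed) prompts sent to the LLM.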


What is better: an open-source LLM or a proprietary model, e.g. from OpenAI?


This is difficult to say; it depends on your requirements. Please find more details in this useful article.

"ChatGPT is more cost-effective than utilizing open-source LLMs deployed to AWS when the number of requests is not high and remains below the break-even point," as cited from the Medium article.
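The break-even point in that quote is easy to compute once you know your per-request API cost and the fixed cost of self-hosting. The two numbers below are illustrative assumptions, not quotes from any provider:

```python
# Break-even sketch: pay-per-token API vs. a fixed-cost self-hosted GPU.
# Both numbers are illustrative assumptions, not provider quotes.

API_COST_PER_REQUEST = 0.002   # e.g. a short GPT-3.5-class call, in USD
GPU_MONTHLY_COST = 1200.0      # e.g. one cloud GPU instance running 24/7

def cheaper_option(requests_per_month: int) -> str:
    """Return which option is cheaper at a given monthly request volume."""
    api_total = requests_per_month * API_COST_PER_REQUEST
    return "api" if api_total < GPU_MONTHLY_COST else "self-hosted"

# Below the break-even volume the API wins; above it, self-hosting wins.
break_even = round(GPU_MONTHLY_COST / API_COST_PER_REQUEST)  # 600,000 requests
print(cheaper_option(50_000), cheaper_option(2_000_000), break_even)
```

With these assumptions the API is cheaper up to 600,000 requests per month; in practice you would also fold maintenance and data science staffing into the fixed-cost side, which pushes the break-even point higher.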

Key Learnings when it comes to the costs for Generative AI products

The costs related to LLMs vary with your use case (chatbot, analysis, voicebot, FAQ bot, summarization, etc.), traffic exposure, and your performance and setup (on-premises, cloud) requirements.


  • Open source is often more flexible than OpenAI when implementing your own models with a lot of traffic (> millions of users). However, it requires more prompt engineering, infrastructure customization, and data science know-how.
  • Fine-tuning costs differ, with OpenAI models typically more expensive than alternatives like LLaMa-v2-7b​​.
  • OpenAI's models are noted for their higher costs but are more straightforward to use, and efforts like the GPT-3.5 Turbo API aim to reduce prices​​.


Do you need support with choosing the right big tech provider or Generative AI product vendor, or do you just want to kick off your project?


We are here to support you: contact us today.


Need support with your Generative AI Strategy and Implementation?

🚀 AI Strategy, business and tech support 

🚀 ChatGPT, Generative AI & Conversational AI (Chatbot)

🚀 Support with AI product development

🚀 AI Tools and Automation

Get in touch