
What are the costs for enterprises to use LLMs?

Nina Habicht • Nov 26, 2023

Many companies are reluctant to implement LLM-based products because they fear being confronted with high costs. This is especially true for medium-sized companies that have neither the resources and capacity to deploy and optimize their own AI models nor to set up their own MLOps infrastructure. As described in our article about the sustainability of Generative AI applications, the cloud and performance costs of running an LLM can become very high.


What are the cost types when implementing OpenAI or other LLMs?


There are four types of costs related to LLMs:


  1. Inference Costs
  2. Setup and Maintenance Costs
  3. Costs depending on the Use Case
  4. Other Costs related to Generative AI products



What are inference costs?


An LLM has been trained on a huge library of books, articles, and websites. When you ask it something, it uses all that knowledge to make its best guess or create something new that fits what you asked for. That process of coming up with answers or creating new text based on what it has learned is called inference in LLMs.


Usually, developers call a large language model like GPT-4. But here comes the "but": large language models are usually not the only contributor to the total costs of running the final product. To explain: LLMs can be used to classify data (e.g. understand that a text is about "searching for a new car insurance"), for summarization, for translation, and for many other tasks. Download the ultimate Generative AI Task Overview to learn where LLMs make sense:


Download Overview

Which cost factors affect LLM costs?


Factors affecting LLM costs include model size (e.g. how many parameters it has), with larger models requiring more resources, and context length (how much data, i.e. text, you provide to the model with your request), which impacts computational demands. Keep in mind that larger models are not always better.
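Because API pricing is typically per token, with separate rates for input (prompt) and output (completion), the per-request cost can be sketched as a simple function of token counts. The prices below are illustrative placeholders, not current list prices of any provider:

```python
# Rough per-request cost estimate from token counts and per-1K-token prices.
# The example prices are illustrative assumptions, not real OpenAI prices.

def request_cost(prompt_tokens: int, completion_tokens: int,
                 price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Cost of a single LLM call, billed separately for input and output tokens."""
    return (prompt_tokens / 1000) * price_in_per_1k \
         + (completion_tokens / 1000) * price_out_per_1k

# Example: a 1,500-token prompt with a 500-token answer
# at assumed prices of $0.03 (input) / $0.06 (output) per 1K tokens:
print(round(request_cost(1500, 500, 0.03, 0.06), 4))  # 0.075
```

This also makes the context-length factor concrete: doubling the prompt roughly doubles the input portion of the bill.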


  • Enterprises face a range of choices between proprietary models like OpenAI's and open-source models like Llama 2 and Falcon 180B​​. The selection of an LLM itself depends on the specific use case and a balance between cost and performance​​.


Which Generative AI use cases cost more?


As outlined in our "Tasks where Generative AI helps" framework, there are different use cases where generative AI makes sense.


  • Costs vary especially with the use case. For tasks like summarization, GPT-4 is often preferred in research but comes with high costs at OpenAI.
  • Chatbots are usually more complex and cost-intensive than other Generative AI products, especially when they offer more than a "one-shot" response and can "talk for a while" about a topic while holding memory.
  • If your information has to be up to date at all times to deliver a sufficient user experience and avoid reputational issues (e.g. in the healthcare, insurance, or fintech sectors), Retrieval-Augmented Generation (RAG, see below for more information) costs will come on top. These RAG-driven daily query costs vary between OpenAI and other models, reflecting different pricing structures​​.
  • Finally, fine-tuning is the most expensive process: it means you change or customize at least some parameters of your LLM's neural network, which may change how the model incorporates knowledge or its reasoning capabilities. "This leads to $100,000 for smaller models between one to five billion parameters, and millions of dollars for larger models", as outlined in an AI Business article. Thus, fine-tuning only comes into play for large enterprises or startups that fine-tune their own large language model for a domain (e.g. BloombergGPT). In this case, open-source models may lead to lower long-term costs than fine-tuning OpenAI models. Keep in mind that people sometimes also use "fine-tuning" to refer to optimizing a model for its purpose or domain without changing any model parameters. The latter is not as expensive as "real" fine-tuning.
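The chatbot point above can be made concrete: in a multi-turn conversation the whole history is typically re-sent as input on every turn, so billed input tokens grow with each exchange. The token counts below are illustrative assumptions:

```python
# Why multi-turn chatbots cost more than one-shot calls: the conversation
# history is re-sent as input on every turn, so billed input tokens grow.
# All token counts here are illustrative assumptions.

def chat_session_tokens(turns: int, user_tokens: int = 50,
                        answer_tokens: int = 150, system_tokens: int = 200) -> int:
    """Total billed input tokens across a session that re-sends history each turn."""
    total = 0
    history = system_tokens
    for _ in range(turns):
        history += user_tokens   # new user message joins the prompt
        total += history         # the whole history is billed as input
        history += answer_tokens # assistant reply joins the history
    return total

print(chat_session_tokens(1))   # 250
print(chat_session_tokens(10))  # 11500
```

Ten turns bill 11,500 input tokens here, versus only 2,500 if each turn were an independent one-shot request.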


How much does an OpenAI model cost per month?


The monthly costs really depend on the model you use, e.g. whether you use GPT-3.5 with 4K context (see the table below), GPT-3.5 with 16K context, GPT-4 with 8K context, GPT-4 with 32K context, or even the newer "Turbo" versions, and on the usage (traffic) of your Generative AI product.


The yearly costs range from $1K-50K on the low-usage end, depending on the model, up to $1M-56M a year for high usage, as outlined in the calculation below. Studies show that for low usage, OpenAI models make sense, whereas for high-usage and high-traffic cases with many millions of users, open-source models can save costs.
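A back-of-envelope yearly estimate follows directly from daily traffic and per-token prices. All numbers in this sketch (requests per day, tokens per request, prices) are illustrative assumptions, not real price points:

```python
# Back-of-envelope yearly API cost from daily traffic.
# Requests/day, token counts, and prices are illustrative assumptions.

def yearly_cost(requests_per_day: int, tokens_in: int, tokens_out: int,
                price_in_per_1k: float, price_out_per_1k: float) -> float:
    per_request = (tokens_in / 1000) * price_in_per_1k \
                + (tokens_out / 1000) * price_out_per_1k
    return per_request * requests_per_day * 365

# Low traffic: 1,000 requests/day, ~1K input / 500 output tokens each
print(yearly_cost(1_000, 1000, 500, 0.001, 0.002))
# High traffic: 1,000,000 requests/day with the same payload
print(yearly_cost(1_000_000, 1000, 500, 0.001, 0.002))
```

At these assumed prices the same product costs a few hundred dollars a year at low traffic but hundreds of thousands at high traffic, which is why the model choice should follow the expected usage.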


Source: Differences between OpenAI's models. GPT-4 is, as OpenAI describes, "10 times more advanced than its predecessor, GPT-3.5. This enhancement enables the model to better understand the context and distinguish nuances, resulting in more accurate and coherent responses."

Source: Costs depend on the GPT model and context length as well as the number of usage requests. For high-traffic applications the authors recommend using your own open-source models instead of GPT, i.e. OpenAI. Find more calculations here.

What are the maintenance costs related to LLMs?


To set up the infrastructure for LLMs, you need to consider several points:


First, you need to set up open-source or proprietary models like GPT in a cloud environment. It is also possible to set up the infrastructure on-premises (where you run your models on private servers), but usually this requires costly GPUs and hardware from providers like NVIDIA. As Skanda Vivek outlines, this requires an A10 or A100 NVIDIA GPU. An A10 (with 24 GB of GPU memory) costs $3K, whereas an A100 (40 GB of memory) costs $10-20K with the current market shortage in place.


  1. Compute: the number of FLOPs (floating point operations) needed to complete a task. This depends on the number of parameters and the data size. Larger models are not always better in performance.
  2. Data Center Location: Depending on the local data center, energy efficiency and CO2 emissions may differ.
  3. Processors: Computing processors (CPUs, GPUs, TPUs) vary in energy efficiency for specific tasks.
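For the compute point, a common rule of thumb is that a transformer forward pass costs roughly two floating point operations per parameter per generated token. This is only an approximation (it ignores attention overhead, batching, and hardware utilization), but it shows how compute scales with model size:

```python
# Rule-of-thumb inference compute: ~2 * parameters FLOPs per generated token.
# This is an approximation that ignores attention overhead and utilization.

def inference_flops(n_params: float, n_tokens: int) -> float:
    """Approximate floating point operations to generate n_tokens tokens."""
    return 2 * n_params * n_tokens

# A 7B-parameter model generating a 500-token answer:
print(f"{inference_flops(7e9, 500):.2e}")  # 7.00e+12
```

Scaling to a 70B-parameter model multiplies the compute (and thus energy and GPU cost) tenfold for the same output length.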



Source: Open source can become more expensive than GPT-3.5, especially due to the complexity of maintenance, prompt engineering, and the extensive data science knowledge required for non-proprietary models like Llama 2.

What are other costs related to LLMs?


Very often, embeddings (vectorization of your text corpus) and RAG (Retrieval-Augmented Generation) are needed in addition to the mere GPT or LLM models.


Why? Traditional language models can sometimes make errors or give generic answers because they only use the information they were trained on. With RAG, the model can access up-to-date and specific information, leading to better, more informed answers. Example: Let's say you ask, "What's the latest research on climate change?" A RAG model first finds recent scientific articles or papers about climate change. Then it uses that information to generate a summary or answer that reflects the latest findings, rather than just giving a general answer about climate change. This makes it much more useful for questions where current, detailed information is important. For enterprises, up-to-date content is key, so there must be a mechanism to continuously retrieve the best content.
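The RAG pattern described above can be sketched in a few lines: retrieve the most relevant documents first, then hand them to the model as context. In this toy sketch the retriever is a naive word-overlap scorer standing in for a real embedding/vector-store lookup, and the final model call is left as a placeholder:

```python
# Minimal sketch of the RAG pattern: retrieve relevant documents, then build
# a prompt that grounds the model's answer in them. The word-overlap scorer
# is a toy stand-in for a real embedding/vector-store lookup, and the actual
# LLM call is omitted (placeholder noted in rag_answer).

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def rag_answer(query: str, documents: list[str]) -> str:
    context = "\n".join(retrieve(query, documents))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return prompt  # in a real system: return call_llm(prompt)

docs = [
    "2023 IPCC report: global surface temperature rose 1.1 degrees C",
    "Recipe for apple pie with cinnamon",
    "New climate change study measures Arctic ice loss in 2023",
]
print(rag_answer("latest research on climate change", docs))
```

The cost implication is visible here too: the retrieved context is prepended to every prompt, so RAG adds input tokens (plus embedding and vector-database costs) on top of the plain model call.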


What is better? Open-source LLMs vs. proprietary models, e.g. OpenAI


This is difficult to say. It depends on your requirements. Please find more in this useful article.

"ChatGPT is more cost-effective than utilizing open-source LLMs deployed to AWS when the number of requests is not high and remains below the break-even point." as cited from the Medium article.
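The break-even point quoted above can be sketched as a comparison between a pay-per-request API and a self-hosted model with a fixed monthly infrastructure cost. The prices here are illustrative assumptions:

```python
# Break-even sketch: pay-per-request API vs. self-hosting with a fixed
# monthly infrastructure cost. All prices are illustrative assumptions.

def breakeven_requests(fixed_hosting_per_month: float,
                       api_cost_per_request: float,
                       hosted_cost_per_request: float = 0.0) -> float:
    """Monthly request volume above which self-hosting becomes cheaper."""
    return fixed_hosting_per_month / (api_cost_per_request - hosted_cost_per_request)

# e.g. an assumed $2,000/month GPU instance vs. $0.002 per API request:
print(round(breakeven_requests(2000, 0.002)))  # 1000000
```

Below one million requests per month the API is cheaper in this scenario; above it, the fixed hosting cost amortizes and self-hosting wins, which matches the quoted rule of thumb.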

Key Learnings when it comes to the costs for Generative AI products

The costs related to LLMs vary with your use case (chatbot, analysis, voicebot, FAQ bot, summarization, etc.), traffic exposure, and performance and setup (on-premises, cloud) requirements.


  • Open source is often more flexible than OpenAI when implementing your own models with a lot of traffic (> millions of users). However, it requires more prompt engineering, infrastructure customization, and data science know-how.
  • Fine-tuning costs differ, with OpenAI models typically being more expensive than alternatives like LLaMA-v2-7B​​.
  • OpenAI's models are noted for their higher costs but are more straightforward to use, and efforts like the GPT-3.5 Turbo API aim to reduce prices​​.


Do you need support with choosing the right big tech provider, Generative AI product vendor or just want to kick off your project?


We are here to support you: contact us today.


Need support with your Generative AI strategy and implementation?

🚀 AI Strategy, business and tech support 

🚀 ChatGPT, Generative AI & Conversational AI (Chatbot)

🚀 Support with AI product development

🚀 AI Tools and Automation

Get in touch