
What are the costs for enterprises to use LLMs?

Nina Habicht • Nov 26, 2023

Many companies are reluctant to implement LLM-based products because they fear being confronted with high costs. This is especially true for medium-sized companies that have neither the resources and capacity to deploy and optimize their own AI models nor to set up their own MLOps infrastructure. As described in our article about the sustainability of Generative AI applications, the cloud and performance costs of running an LLM can become very high.


What are the cost types when implementing OpenAI or other LLMs?


There are four types of costs related to LLMs:


  1. Inference Costs
  2. Setup and Maintenance Costs
  3. Costs depending on the Use Case
  4. Other Costs related to Generative AI products



What are inference costs?


An LLM has been trained on a huge library of books, articles, and websites. When you ask it something, it uses all that knowledge to make its best guess or create something new that fits what you asked for. That process of coming up with answers or creating new text based on what it has learned is called inference in LLMs.


Usually, developers call a large language model like GPT-4. But here comes the "but": large language models are usually not the only contributor to the total costs of running the final product. To explain: LLMs can be used to classify data (e.g. understand that a text is about "searching for a new car insurance"), for summarization, for translation, and for many other tasks. Download the ultimate Generative AI Task Overview to learn where LLMs make sense:


Download Overview

Which cost factors affect LLM costs?


Factors affecting LLM costs include model size (e.g. how many parameters it has), with larger models requiring more resources, and context length (how much data, i.e. text, you provide to the model with your request), which impacts computational demands. Keep in mind that larger models are not always better.
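Because API pricing is typically per token, with separate rates for input (prompt) and output (completion), the per-request cost can be sketched as a simple function of token counts. The prices below are illustrative placeholders, not current list prices of any provider:

```python
# Rough per-request cost estimate from token counts and per-1K-token prices.
# The example prices are illustrative assumptions, not real OpenAI prices.

def request_cost(prompt_tokens: int, completion_tokens: int,
                 price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Cost of a single LLM call, billed separately for input and output tokens."""
    return (prompt_tokens / 1000) * price_in_per_1k \
         + (completion_tokens / 1000) * price_out_per_1k

# Example: a 1,500-token prompt with a 500-token answer
# at assumed prices of $0.03 (input) / $0.06 (output) per 1K tokens:
print(round(request_cost(1500, 500, 0.03, 0.06), 4))  # 0.075
```

This also makes the context-length factor concrete: doubling the prompt roughly doubles the input portion of the bill.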


  • Enterprises face a range of choices between proprietary models like OpenAI's and open-source models like Llama 2 and Falcon 180B​​. The selection of an LLM itself depends on the specific use case and a balance between cost and performance​​.


Which Generative AI use cases cost more?


As outlined in our "Tasks where Generative AI helps" framework, there are different use cases where generative AI makes sense.


  • Costs vary especially with the use case. For tasks like summarization, GPT-4 is often preferred in research but comes with high costs at OpenAI.
  • Chatbots are usually more complex and cost-intensive than other Generative AI products, especially when they offer more than a "one-shot" response and can "talk for a while" about a topic while holding memory.
  • If your information has to be up to date at all times to deliver a sufficient user experience and avoid reputational issues (e.g. in the healthcare, insurance, or fintech sectors), Retrieval-Augmented Generation (RAG, see below for more information) costs will come on top. These RAG-driven daily query costs vary between OpenAI and other models, reflecting different pricing structures​​.
  • Finally, fine-tuning is the most expensive process: it means you change or customize at least some parameters of your LLM's neural network, which may change how the model incorporates knowledge or its reasoning capabilities. "This leads to $100,000 for smaller models between one to five billion parameters, and millions of dollars for larger models", as outlined in an AI Business article. Thus, fine-tuning only comes into play for large enterprises or startups that fine-tune their own large language model for a domain (e.g. BloombergGPT). In this case, open-source models may lead to lower long-term costs than fine-tuning OpenAI models. Keep in mind that people sometimes also use "fine-tuning" to refer to optimizing a model for its purpose or domain without changing any model parameters. The latter is not as expensive as "real" fine-tuning.
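The chatbot point above can be made concrete: in a multi-turn conversation the whole history is typically re-sent as input on every turn, so billed input tokens grow with each exchange. The token counts below are illustrative assumptions:

```python
# Why multi-turn chatbots cost more than one-shot calls: the conversation
# history is re-sent as input on every turn, so billed input tokens grow.
# All token counts here are illustrative assumptions.

def chat_session_tokens(turns: int, user_tokens: int = 50,
                        answer_tokens: int = 150, system_tokens: int = 200) -> int:
    """Total billed input tokens across a session that re-sends history each turn."""
    total = 0
    history = system_tokens
    for _ in range(turns):
        history += user_tokens   # new user message joins the prompt
        total += history         # the whole history is billed as input
        history += answer_tokens # assistant reply joins the history
    return total

print(chat_session_tokens(1))   # 250
print(chat_session_tokens(10))  # 11500
```

Ten turns bill 11,500 input tokens here, versus only 2,500 if each turn were an independent one-shot request.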


How much does an OpenAI model cost per month?


The monthly costs really depend on the model you use, e.g. whether you use GPT-3.5 with 4K context (see the table below), GPT-3.5 with 16K context, GPT-4 with 8K context, GPT-4 with 32K context, or even the newer "Turbo" versions, and on the usage (traffic) of your Generative AI product.


The yearly costs range from $1K-50K on the low-usage end, depending on the model, up to $1M-56M a year for high usage, as outlined in the calculation below. Studies show that for low usage, OpenAI models make sense, whereas for high-usage and high-traffic cases with many millions of users, open-source models can save costs.
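A back-of-envelope yearly estimate follows directly from daily traffic and per-token prices. All numbers in this sketch (requests per day, tokens per request, prices) are illustrative assumptions, not real price points:

```python
# Back-of-envelope yearly API cost from daily traffic.
# Requests/day, token counts, and prices are illustrative assumptions.

def yearly_cost(requests_per_day: int, tokens_in: int, tokens_out: int,
                price_in_per_1k: float, price_out_per_1k: float) -> float:
    per_request = (tokens_in / 1000) * price_in_per_1k \
                + (tokens_out / 1000) * price_out_per_1k
    return per_request * requests_per_day * 365

# Low traffic: 1,000 requests/day, ~1K input / 500 output tokens each
print(yearly_cost(1_000, 1000, 500, 0.001, 0.002))
# High traffic: 1,000,000 requests/day with the same payload
print(yearly_cost(1_000_000, 1000, 500, 0.001, 0.002))
```

At these assumed prices the same product costs a few hundred dollars a year at low traffic but hundreds of thousands at high traffic, which is why the model choice should follow the expected usage.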


Source: Differences between OpenAI's models. GPT-4 is, as OpenAI describes, "10 times more advanced than its predecessor, GPT-3.5. This enhancement enables the model to better understand the context and distinguish nuances, resulting in more accurate and coherent responses."

Source: Costs depend on the GPT model and context length as well as the number of usage requests. For high-traffic applications the authors recommend using your own open-source models instead of GPT, i.e. OpenAI. Find more calculations here.

What are the maintenance costs related to LLMs?


To set up the infrastructure for LLMs, you need to consider several points:


First, you need to set up open-source or proprietary models like GPT in a cloud environment. It is also possible to set up the infrastructure on-premises (where you run your models on private servers), but usually this requires costly GPUs and hardware from providers like NVIDIA. As Skanda Vivek outlines, this requires an A10 or A100 NVIDIA GPU. An A10 (with 24 GB of GPU memory) costs $3K, whereas an A100 (40 GB of memory) costs $10-20K with the current market shortage in place.


  1. Compute: the number of FLOPs (floating point operations) needed to complete a task. This depends on the number of parameters and the data size. Larger models are not always better in performance.
  2. Data Center Location: Depending on the local data center, energy efficiency and CO2 emissions may differ.
  3. Processors: Computing processors (CPUs, GPUs, TPUs) vary in energy efficiency for specific tasks.
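For the compute point, a common rule of thumb is that a transformer forward pass costs roughly two floating point operations per parameter per generated token. This is only an approximation (it ignores attention overhead, batching, and hardware utilization), but it shows how compute scales with model size:

```python
# Rule-of-thumb inference compute: ~2 * parameters FLOPs per generated token.
# This is an approximation that ignores attention overhead and utilization.

def inference_flops(n_params: float, n_tokens: int) -> float:
    """Approximate floating point operations to generate n_tokens tokens."""
    return 2 * n_params * n_tokens

# A 7B-parameter model generating a 500-token answer:
print(f"{inference_flops(7e9, 500):.2e}")  # 7.00e+12
```

Scaling to a 70B-parameter model multiplies the compute (and thus energy and GPU cost) tenfold for the same output length.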



Source: Open source can become more expensive than GPT-3.5, especially due to the complexity of maintenance, prompt engineering, and the extensive data science knowledge required for non-proprietary models like Llama 2.

What are other costs related to LLMs?


Very often, embeddings (vectorization of your text corpus) and RAG (Retrieval-Augmented Generation) are needed in addition to the mere GPT or LLM models.


Why? Traditional language models can sometimes make errors or give generic answers because they only use the information they were trained on. With RAG, the model can access up-to-date and specific information, leading to better, more informed answers. Example: Let's say you ask, "What's the latest research on climate change?" A RAG model first finds recent scientific articles or papers about climate change. Then it uses that information to generate a summary or answer that reflects the latest findings, rather than just giving a general answer about climate change. This makes it much more useful for questions where current, detailed information is important. For enterprises, up-to-date content is key, so there must be a mechanism to continuously retrieve the best content.
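The RAG pattern described above can be sketched in a few lines: retrieve the most relevant documents first, then hand them to the model as context. In this toy sketch the retriever is a naive word-overlap scorer standing in for a real embedding/vector-store lookup, and the final model call is left as a placeholder:

```python
# Minimal sketch of the RAG pattern: retrieve relevant documents, then build
# a prompt that grounds the model's answer in them. The word-overlap scorer
# is a toy stand-in for a real embedding/vector-store lookup, and the actual
# LLM call is omitted (placeholder noted in rag_answer).

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def rag_answer(query: str, documents: list[str]) -> str:
    context = "\n".join(retrieve(query, documents))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return prompt  # in a real system: return call_llm(prompt)

docs = [
    "2023 IPCC report: global surface temperature rose 1.1 degrees C",
    "Recipe for apple pie with cinnamon",
    "New climate change study measures Arctic ice loss in 2023",
]
print(rag_answer("latest research on climate change", docs))
```

The cost implication is visible here too: the retrieved context is prepended to every prompt, so RAG adds input tokens (plus embedding and vector-database costs) on top of the plain model call.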


What is better? Open-source LLMs vs. proprietary models, e.g. OpenAI


This is difficult to say. It depends on your requirements. Please find more in this useful article.

"ChatGPT is more cost-effective than utilizing open-source LLMs deployed to AWS when the number of requests is not high and remains below the break-even point." as cited from the Medium article.
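The break-even point quoted above can be sketched as a comparison between a pay-per-request API and a self-hosted model with a fixed monthly infrastructure cost. The prices here are illustrative assumptions:

```python
# Break-even sketch: pay-per-request API vs. self-hosting with a fixed
# monthly infrastructure cost. All prices are illustrative assumptions.

def breakeven_requests(fixed_hosting_per_month: float,
                       api_cost_per_request: float,
                       hosted_cost_per_request: float = 0.0) -> float:
    """Monthly request volume above which self-hosting becomes cheaper."""
    return fixed_hosting_per_month / (api_cost_per_request - hosted_cost_per_request)

# e.g. an assumed $2,000/month GPU instance vs. $0.002 per API request:
print(round(breakeven_requests(2000, 0.002)))  # 1000000
```

Below one million requests per month the API is cheaper in this scenario; above it, the fixed hosting cost amortizes and self-hosting wins, which matches the quoted rule of thumb.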

Key Learnings when it comes to the costs for Generative AI products

The costs related to LLMs vary with your use case (chatbot, analysis, voicebot, FAQ bot, summarization, etc.), traffic exposure, and performance and setup (on-premises, cloud) requirements.


  • Open source is often more flexible than OpenAI when implementing your own models with a lot of traffic (> millions of users). However, it requires more prompt engineering, infrastructure customization, and data science know-how.
  • Fine-tuning costs differ, with OpenAI models typically being more expensive than alternatives like LLaMA-v2-7B​​.
  • OpenAI's models are noted for their higher costs but are more straightforward to use, and efforts like the GPT-3.5 Turbo API aim to reduce prices​​.


Do you need support with choosing the right big tech provider, Generative AI product vendor or just want to kick off your project?


We are here to support you: contact us today.


Need support with your Generative AI strategy and implementation?

🚀 AI Strategy, business and tech support 

🚀 ChatGPT, Generative AI & Conversational AI (Chatbot)

🚀 Support with AI product development

🚀 AI Tools and Automation

Get in touch