
The Swisscom Voice Assistant

Riccardo Lopetrone • Jun 12, 2020

Switzerland's first Voice Assistant


The Swisscom Voice Assistant is unique in that it can speak up to five languages, including Swiss German. It was developed by the Swiss telecommunications company Swisscom.
 
Voicetechhub spoke with Riccardo Lopetrone, Senior Product Manager Voice at Swisscom. We want to share these great insights with you.

Tell me more about your background and experience. How did you first get in touch with voice?

I have been working as a TV Product Manager for different telecommunications companies for the last 15 years. For the past five years I have been responsible for the customer experience around voice-controlled devices at Swisscom.

My first contact with voice was in 2011, when Apple presented its first iPhone with Siri. I bought the new 4S model straight away and had my first experiences with voice assistants. I found the feature fascinating, even though Siri's capabilities were very limited back then. I realized that I wanted to deep-dive into the topic.

In 2016 at Swisscom we developed the first TV box with integrated voice control in Switzerland, a push-to-talk solution on the TV remote control which supported several voice commands on Swisscom TV. When Alexa was then launched in 2018, I gained my first experience with smart speakers. Today I have over a dozen devices at home (Amazon Echo, Google Nest, HomePod and of course the Swisscom Box), which I regularly use to play music, create shopping lists, control my smart-home devices or check the weather forecast.

Last year Swisscom decided to take the next innovative step, so we integrated our own smart speaker into the new Swisscom Box.

How would you describe the Swisscom Box compared to other smart voice-enhanced devices?

All new Swisscom customers with a Swisscom TV contract receive the new Swisscom Box with the integrated Voice Assistant. Existing customers can also upgrade to the new box. Alongside great entertainment features and a wide choice of content from the most popular providers (e.g. Teleclub, Netflix, Sky), the Swisscom Box comes with two microphones and a speaker to support the new Voice Assistant. To address data privacy concerns, the box is delivered with the microphones deactivated: customers need to actively switch them on before they can use the voice assistant. Only then will the Swisscom Box react to our wake word, "Hey Swisscom".
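This opt-in flow can be thought of as a simple privacy gate: no audio is processed until the customer explicitly enables the microphones, and only then does the device listen for the wake word. A minimal sketch in Python (all class and method names are hypothetical illustrations, not Swisscom's actual implementation):

```python
class VoiceAssistantGate:
    """Illustrative privacy gate: microphones ship deactivated and must be
    switched on by the customer before any wake-word detection runs."""

    WAKE_WORD = "hey swisscom"

    def __init__(self):
        self.microphones_enabled = False  # delivered deactivated

    def enable_microphones(self):
        self.microphones_enabled = True

    def hears(self, utterance: str) -> bool:
        """Return True only if the mics are on AND the wake word is spoken."""
        if not self.microphones_enabled:
            return False  # audio is never processed while mics are off
        return utterance.lower().startswith(self.WAKE_WORD)


gate = VoiceAssistantGate()
assert gate.hears("Hey Swisscom, what's the weather?") is False  # mics off
gate.enable_microphones()
assert gate.hears("Hey Swisscom, what's the weather?") is True
```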

A few of the available skills (e.g. Spotify or the WiFi skill) first need to be activated in the separate companion app (Swisscom Home App) to allow account linking and to provide additional information around data privacy (e.g. when showing your WiFi password on the TV screen). All other skills are activated by default.
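This split between default-enabled and app-activated skills can be sketched as a small registry: most skills are on by default, while privacy-sensitive ones (account linking, showing the WiFi password) are gated behind explicit activation. The skill names and flags below are hypothetical illustrations:

```python
# Hypothetical skill registry: most skills are enabled by default, while
# privacy-sensitive ones must first be activated in a companion app.
SKILLS = {
    "tv":      {"enabled_by_default": True},
    "weather": {"enabled_by_default": True},
    "spotify": {"enabled_by_default": False},  # requires account linking
    "wifi":    {"enabled_by_default": False},  # can show WiFi password on TV
}

def is_available(skill: str, activated_in_app: set) -> bool:
    """A skill is usable if it is on by default or was activated in the app."""
    info = SKILLS.get(skill)
    if info is None:
        return False
    return info["enabled_by_default"] or skill in activated_in_app

assert is_available("weather", set()) is True
assert is_available("wifi", set()) is False   # needs app activation first
assert is_available("wifi", {"wifi"}) is True
```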

What are the main features?

Our focus was on the voicification of Swisscom products and services.
So far, the following skills are available:
  • TV (e.g. channel change, title & actor search, trick modes, TV recommendations, etc.)
  • Smarthome (e.g. switch on and off your smarthome devices, execute scenes, etc.)
  • MyCloud (voice control of your pictures and videos from the Swisscom cloud on the big TV screen)
  • Weather (ask for the weather forecast in Switzerland, in collaboration with our companies local.ch & search.ch)
  • News (ask for the latest news, in collaboration with our company bluewin.ch)
  • Router & WiFi control (switch on and off your Internet & Wifi, present the WiFi password on the TV screen)
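Conceptually, each spoken request has to be routed to one of the skills above. A toy keyword-based router (purely illustrative; a real assistant would use intent classification, not keyword matching) could look like this:

```python
# Hypothetical keyword triggers mapping an utterance to a skill; the
# trigger words are invented for illustration only.
ROUTES = {
    "tv":        ("channel", "movie", "record"),
    "smarthome": ("switch", "scene", "light"),
    "mycloud":   ("pictures", "videos"),
    "weather":   ("weather", "forecast"),
    "news":      ("news",),
    "wifi":      ("wifi", "internet"),
}

def route(utterance: str):
    """Return the first skill whose trigger appears in the utterance."""
    text = utterance.lower()
    for skill, triggers in ROUTES.items():
        if any(t in text for t in triggers):
            return skill
    return None

assert route("What's the weather forecast for Zurich?") == "weather"
assert route("Switch off the living room light") == "smarthome"
```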

What is unique about the Voice Assistant of the Swisscom Box?

  • Strong data privacy policies
  • Speech recognition in 5 languages (DE/FR/IT/EN, including a Swiss German model for the main dialects in Switzerland)
  • A deep integration into our services and products which would only have been partially possible on a 3rd party voice platform

What were the main challenges when developing the box?

  • First of all, we had a very challenging launch target: we developed all the components, including the hardware, in less than 12 months. Since more than a dozen teams were involved, coordination was very complex.
  • A further big challenge was supporting the different language models and, on top of that, the huge number of entities (terms) that come with some of our skills. For the TV search, for example, more than 30'000 programme and movie titles are available to our customers every week. And sometimes we also need to support mixed languages within the same language model, e.g. if you have chosen English speech recognition but are looking for movies with a French actor ("show me movies with Gérard Depardieu").
  • Privacy and transparency around data storage was also a very important topic for us, and it took many discussions with our legal and data governance departments. As an example, we decided to deliver the Swisscom Box with deactivated microphones, so customers first need to activate them if they want to use the voice assistant. Furthermore, customers can allow or decline having humans listen to a small part of their generated audio files, which would allow us to improve the system.
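The mixed-language challenge above, an English-language model resolving a French actor name, comes down to matching recognized text against a multilingual entity catalogue. One common trick is accent folding, so that a transcript like "gerard depardieu" still hits the accented entry. A toy sketch (the catalogue and function names are invented for illustration):

```python
import unicodedata

# Toy multilingual title/actor catalogue; real systems hold tens of
# thousands of entries refreshed weekly.
CATALOGUE = ["Gérard Depardieu", "Roger Federer", "Die Schweizermacher"]

def fold(text: str) -> str:
    """Lowercase and strip diacritics for language-agnostic matching."""
    norm = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in norm if not unicodedata.combining(c))

def find_entities(transcript: str) -> list:
    """Return catalogue entries whose folded form appears in the transcript."""
    folded = fold(transcript)
    return [e for e in CATALOGUE if fold(e) in folded]

assert find_entities("show me movies with gerard depardieu") == ["Gérard Depardieu"]
```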

What can we expect in the future?

In a first step we are optimizing the basic functionalities (e.g. the wake-word sensitivity, the speech recognition for all the supported languages, the intent recognition, etc.). In parallel, we want to improve the existing skills and include more context to enhance the customer experience. And of course we want to expand the number of supported skills with further Swisscom services and products.

Will it be possible to develop a third-party application?

For the time being we are focusing on Swisscom use cases. We will evaluate at a later stage whether it makes sense to open up the platform to other Swiss companies.

Riccardo Lopetrone

Senior Product Manager Voice

Riccardo Lopetrone is Senior Product Manager Voice at Swisscom, the leading telecommunications company in Switzerland. For more than five years he has been committed to creating great user experiences for voice-controlled services.

