You can help improve the text-to-speech voices!
We are currently working very hard on a lot of pronunciation corrections (fixes for words that are read in a strange, funny or wrong way) in the text-to-speech voices. No text-to-speech engine is perfect, so sometimes it has to be “taught” how certain words should be pronounced.
Foreign words, brand names, technical expressions, abbreviations and personal names can be a bit tricky for the TTS to pronounce, since they are usually exceptions that do not follow the “rules” of the language. All languages are made up of rules and exceptions, and the list of the latter is almost endless.
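To make the idea of “rules and exceptions” concrete, here is a minimal sketch of an exception list applied to the text before it reaches the voice. It is not how our engines work internally; the function name, the lexicon and the respellings are all made up for illustration.

```python
import re

# Hypothetical exception lexicon: word -> respelling that a TTS voice is
# more likely to read correctly. The entries are invented for illustration.
EXCEPTIONS = {
    "IKEA": "eye-KEE-ah",
    "Nike": "NIKE-ee",
    "SQL": "sequel",
}

def apply_exceptions(text, lexicon=EXCEPTIONS):
    """Replace whole words found in the lexicon before the text is spoken."""
    if not lexicon:
        return text
    # Longest entries first, so a longer word wins over a shorter one
    # that starts the same way.
    words = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    return pattern.sub(lambda match: lexicon[match.group(1)], text)
```

The match is case-sensitive and only hits whole words, so “Nikes” is left alone. Every word you report to us ends up, roughly speaking, as one more entry in a list like this.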
You can help!
If you come across mispronounced words that you would say belong to the “common” category, please let us know. You can use the Customer Service/Feedback menu on http://webreader.readspeaker.com
Speech-enabling for the long-tail
As you might remember from my post “From birth of the talking web and into the future”, I owed you a follow-up note, so here it is! As I discussed there, we started out with a focused approach to which customers we should target and which end users would benefit most from a server-side speech-enabling solution for web sites. On the user side, we have seen the uses of our technology grow over the past years, making it appealing to a greater number of users. On the customer side, we have also witnessed a greater variety of sectors interested in speech-enabling their web content, ranging from public sites to banks, insurance companies, non-profit organisations and many others.
Now, over the past months, another change has happened. We started getting an increasing number of incoming leads from much smaller web sites and blogs also interested in speech-enabling their content. This could range from the mom-and-pop store with a web site to the blogger interested in space technology. These are typically organisations of 1 to 10 people. Some of them are purely personal initiatives, i.e. someone interested in a hobby, while others might be freelancers, consultants, designers or any other small company or non-profit organisation. Since our company is set up to deal with mid-sized and bigger organisations, we needed to see how we could offer an easy way for all these smaller web sites and blogs to speech-enable their content. The idea here was to really get a grasp of the essential features that matter to this segment and not throw in all the bells and whistles that serve no purpose at all. Then we thought about how to make the implementation process as easy as possible, so that all these new small customers could integrate our solution as a no-brainer, either by using plug-ins we have developed for some popular CMS and blog platforms or by simply copying and pasting our HTML code directly into the source code of the page. The last point was to create a new web shop where personal web sites and blogs as well as small companies and organisations could easily choose the most suitable package for their needs, sign up and subscribe as seamlessly as possible.
We are now proud to announce that we are ready to launch this new venture! Our new product for this segment is called webReader, and you can find out all about it at www.readspeaker.com. We hope you will enjoy this new service and find it useful, and we will give our full attention to supporting you in the best way possible. We are starting off with American and British English, Swedish and French voices, and will be adding more very shortly.
Good article on text-to-speech
My colleague and co-blogger Daniel Erkstam has just published a good article about the history of text-to-speech technology. Click here to read the full article about the history of TTS.
From birth of the talking web and into the future (part 1)
We have been in the business of speech-enabling web sites since 1999, when I had the idea of bringing text-to-speech into the arena of web sites. What was my motivation for doing so? I realised that a certain number of people around me had problems or felt uncomfortable reading text found on web sites. Sure, screen readers were already around and TTS had been built into operating systems, but these options were simply not used by these users, whom I questioned closely about how they would like web sites to function. On the other side, I thought to myself that for a web site owner it would be a useful feature to give users easier, free access to the audio version of their content without having to worry about developing, installing, maintaining and updating this themselves. The combination of those two findings gave birth to ReadSpeaker, which was commercially launched in Sweden back in 2001.
At the beginning I had a very focused idea of which web site owners this would appeal to. I started approaching the public sector as well as web sites that were aimed at disability groups. The end users I had in mind were mainly people who suffered from dyslexia and various other reading disabilities. Then a strange thing happened. I started getting feedback from users I had not even thought would use ReadSpeaker. These were senior citizens who appreciated the comfort of having the choice between reading and listening to the text content of web sites. These were foreigners living in Sweden who liked to be able to listen to Swedish instead of reading it. These were students who could listen to lessons by saving the mp3 file to their mobile devices. These were “information workers” who, in their fast-paced environments, needed to listen to web content while taking care of other tasks at the same time. These were... well, you get me: the circle of users kept and keeps on getting bigger and bigger. This trend also had an effect on the customers that we started approaching and that were also increasingly contacting us. From the narrower group of public and disability web sites, we started implementing ReadSpeaker in a greater variety of areas such as the banking sector, insurance companies, transport organisations, online media, etc.
What happened next was very interesting, but more on that in another (soon to come) post.
Google Knol – now with text-to-speech
A few days ago Google announced that they have begun experimenting with text-to-speech on their “Knol”.
Quote from their site: “We are experimenting with Audio Playback as an option for some knols, starting with a handful of English language featured knols. You can listen using our Flash player, or by downloading an mp3 file and using any mp3 player.”
If a listen button is shown next to the “print” and “share” buttons, you know that the knol is also available as audio.
Read all about it and try it out here: http://knol.google.com/k/knol-help/knol-audio-playback/
Non-Latin Support for ReadSpeaker Enterprise
In August we are releasing support for Chinese and Arabic on the ReadSpeaker Enterprise Services platform. Please contact info@voice-corp.com or stay tuned to this blog to hear more about it.
Guest blogger: Speech syntheses – one for each purpose
This is a post from today’s guest blogger: Daniel Erkstam, Nordic Sales Director for VoiceCorp.
The picture shows two robots. The left one is an industrial robot from ABB that is probably used to build cars or something similar. The right one is one of the most advanced AI robots that can be found today. It is possible to converse with it and it is very human-like.
Both robots serve their purpose and do it well. And it is the same with speech syntheses.
When we launched the first speaking web services back in 2001, the only available voices were very robotic and became rather boring to listen to on longer texts. Today we use voices made with a completely different technique, and the quality is getting closer and closer to recorded speech.
But the thing is that the older voices are still used by a lot of people and are even preferred to the newer ones for some purposes. For example, people with visual impairment often prefer the older voices for screen-reading software like JAWS. The reason is that the older voices are more consistent in how they read the text, and you can get used to the odd and robotic character of the voice. The older voices also read out the text in a more detailed way. The voices we use today are a lot more human-like but also more “forgiving” when it comes to spelling errors, some words from foreign languages, etc. The secret behind that is a many times bigger database of phonemes.
We know that the less a person needs synthetic speech, the harsher a judge he/she will be. Those of us who don’t have reading difficulties or visual impairment can read the text and compare it to the voice speaking. Then we react to every slight pronunciation error made by the synthesis.
We put a lot of effort into making the reading as good as possible, making a lot of customizations so that the speech synthesis pronounces the current website’s vocabulary as well as possible, because we know that there is a strong connection between how good it sounds and how many people will use the service.
Back to the robots again: They might both serve their purposes well. But I guess it would be an easy choice which one you would pick to serve visitors at the reception desk, right?
SpeechMachine text-to-speech in Viral Marketing
A sausage says more than a thousand words.
Scan, one of the leading Swedish brands, just launched a really great viral marketing campaign using VoiceCorp’s SpeechMachine solution. The idea for the campaign is quite cool. The campaign is for marketing Scan’s new line of spicy sausages. They wanted to add some nice interaction with the user, so they added the strongest medium around: speech.
The core functionality is that users can send “speech-cards” to each other. They enter the text, listen to check that it sounds good and send the speech card to a friend.
The cool thing is that we used a Spanish voice but with Swedish speech rules. The result is a Spanish guy speaking Swedish. It’s brilliant! It really sounds like a guy from Spain who has only lived a few years in Sweden: enough time to learn the language, but still keeping a strong Spanish accent. The speech solution itself was delivered in just a couple of hours thanks to SpeechMachine’s ability to integrate with all the TTS engines on the market.
SpeechMachine is provided by VoiceCorp as a 100% hosted service that allows creative web developers to easily add text-to-speech functionality to their web apps without requiring any knowledge of text-to-speech technology. The communication between the customer’s web-based app and SpeechMachine is based on standard HTTP requests, and it is therefore really easy to integrate into any web app.
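To give a feel for what “standard HTTP requests” can mean for a web developer, here is a minimal sketch that builds a GET URL for a speech request. The endpoint and every parameter name are hypothetical placeholders, not the real SpeechMachine interface.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only.
BASE_URL = "https://speechmachine.example.com/speak"

def build_speech_url(text, voice="es-male", lang="sv", audio_format="mp3"):
    """Return a GET URL asking the (hypothetical) service to speak `text`."""
    query = urlencode({
        "text": text,            # the text typed in by the user
        "voice": voice,          # which voice to use
        "lang": lang,            # which language rules to apply
        "format": audio_format,  # desired audio format
    })
    return f"{BASE_URL}?{query}"
```

A web app would then simply point an audio player or a download link at the returned URL; `urlencode` takes care of escaping spaces and non-ASCII characters such as å, ä and ö.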
Want to try out the app? Go to http://www.scan.se/kryddigakorvar/
Podcasting made simple! rSpeak VocalFruits
VoiceCorp announced today, together with VocalFruits, that they are launching the rSpeak VocalFruits Information Composing System.
It is a “Web 2.0” web application where anyone can create a podcast from any RSS source and where content owners such as bloggers can offer their audience a speaking version of their content!
The blog posts? Bang! Right into iTunes.
rSpeak VocalFruits will basically replace AudioFeed (www.audiofeedcreator.com), a not very social, but very appreciated free web service that I created about a year and a half ago.
With the new web-based podcasting service, any registered user can create a podcast from any RSS feed in no time! There are also a couple of really cool features, like aggregating a number of RSS feeds into one podcast, or creating a personal podcast that you can update (add posts to) just by sending an email to your personal vocalfruits email address.
In addition to the podcast RSS feed, it also creates a web browser version and a mobile version ideal for mobile devices such as mobile phones and PDAs.
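For the curious, aggregating several RSS feeds into one can be sketched in a few lines. This is only an illustration of the idea, not the VocalFruits implementation: the function name is made up, it assumes RSS 2.0 input where every item has a pubDate with an explicit time zone, and it leaves out the step that turns each post into audio.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

def merge_feeds(feed_xml_list, title="Combined podcast"):
    """Merge the <item> elements of several RSS 2.0 documents, newest first."""
    items = []
    for xml_text in feed_xml_list:
        root = ET.fromstring(xml_text)
        items.extend(root.findall("./channel/item"))

    def published(item):
        # RSS 2.0 dates use the RFC 822 format, which email.utils can parse.
        # Assumption: every item carries a <pubDate> element.
        return parsedate_to_datetime(item.findtext("pubDate"))

    items.sort(key=published, reverse=True)

    # Build a new, minimal feed that contains all the collected items.
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    channel.extend(items)
    return ET.tostring(rss, encoding="unicode")
```

Calling `merge_feeds([feed_a, feed_b])` with the XML text of two feeds returns one feed whose items are sorted by publication date, which is what a podcast client expects to see.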
In this release there will be support for US English, French and Spanish. Support for more languages (Swedish, Dutch, German, UK English and Portuguese) will be available within a month or two, from what I have heard. It will also be changed so that you do not need to be a registered user to be able to listen... Anybody should be able to listen. That is key.
Check it out at www.vocalfruits.com. Stay tuned!
Want to become a TTS Voice? Now you can!
I met one of our TTS suppliers the other day: Lars-Erik Larsson, the CEO of Acapela Group. He told me about their latest development, a service to create corporate voices for their text-to-speech engine. The coolest thing was that they can now offer it at a very reasonable cost thanks to new technology and procedures they have developed: the Acapela Voice Factory. And it only takes from 3 weeks(!) for the simplest version up to 14 weeks for the full-quality version!
Still, the cost level is not really within reach for private users, but there are other (free) alternatives as well, like FestVox with the Festival TTS platform (however, you cannot in any way compare the quality with the commercial solutions).
With a price tag starting at 7500€ (excluding the cost of the speaker), it is really within reach for a lot of companies that want a corporate TTS voice, which can easily pay off by using TTS to automate some customer support, automate switchboards, or just as a fun thing in, for example, interactive web campaigns. You would also need to buy licenses for the engine itself if you want to use it, but that is a very reasonable cost in such a project.
Is there a market? Sure. Let’s say we have two car manufacturers that want to integrate speech into their cars. Obviously, Volvo would not want to use the same voice as Saab, for instance. You must hear the difference. Some companies today have a corporate voice that they use in all radio and TV commercials, and wouldn’t it be great to have that voice talent also answering the phone on 30 lines at the same time?
However, if you have dreamt of immortality, this is one step closer. But if you decide to give your voice up to a TTS engine, there are a couple of things you should be aware of.
- Your voice can technically be used to speak very dirty words and there is always the risk of people using it in a very very bad way.
- You can never really use your voice as something that identifies that it is actually you, i.e. quite a problem if you also happen to be a big fan of speech verification systems…
The target group for corporate voices is mainly, well, corporations. Congratulations Acapela-Group!
