If you’ve had experience with AWS Transcribe, you will have noticed that despite its excellent accuracy, it is unfortunately very slow, which can make things awkward if your application is time-sensitive.
One alternative is IBM Watson, which I personally found to be faster than AWS Transcribe by an order of magnitude. It also supports keyword matching, which is great if you are looking out for key phrases in audio. However, I did find it less accurate than AWS Transcribe.
You can get started with IBM Watson for free, without a credit card, by subscribing to the free version of the Speech to Text API (500 minutes free per month), which gives you enough to test with. This example assumes you already have an API key and a service URL, both of which you can get from the IBM website.
In this example, I am using a pre-recorded WAV file, which is included in my project (Build action set to copy always). The WAV file is in an 8 kHz format, which is really low quality.
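If you are not sure what sample rate your own WAV file uses, a quick check like the one below can help you decide which Watson model to ask for. This is only a rough sketch: `WavInfo`, `ReadSampleRate` and `PickModel` are names I made up for illustration, and it assumes a canonical PCM WAV layout where the "fmt " chunk directly follows the RIFF header (so the sample rate sits at byte offset 24). The 16 kHz cut-off between narrowband and broadband models is also my assumption, based on the usual guidance for those models.

```csharp
using System;
using System.IO;
using System.Text;

static class WavInfo
{
    // Reads the sample rate from a WAV header.
    // Assumption: canonical layout, "fmt " chunk at offset 12, so the
    // sample rate is the little-endian 32-bit integer at offset 24.
    public static int ReadSampleRate(byte[] wav)
    {
        if (wav.Length < 28 ||
            Encoding.ASCII.GetString(wav, 0, 4) != "RIFF" ||
            Encoding.ASCII.GetString(wav, 12, 4) != "fmt ")
        {
            throw new InvalidDataException("Not a canonical RIFF/WAV header.");
        }
        // Assemble the little-endian integer by hand so the result
        // does not depend on the machine's byte order.
        return wav[24] | (wav[25] << 8) | (wav[26] << 16) | (wav[27] << 24);
    }

    // Assumption: below 16 kHz use the narrowband model, otherwise broadband.
    public static string PickModel(int sampleRate) =>
        sampleRate < 16000 ? "en-US_NarrowbandModel" : "en-US_BroadbandModel";
}

static class Program
{
    static void Main()
    {
        int rate = WavInfo.ReadSampleRate(File.ReadAllBytes("Sample.wav"));
        Console.WriteLine($"{rate} Hz -> {WavInfo.PickModel(rate)}");
    }
}
```

WAV files with extra chunks before "fmt " will be rejected by this check rather than misread, which seemed the safer choice for a quick sanity test.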
So, let’s include the NuGet package by typing:
Then, we’ll write a bit of code to test this out. The project type is a .NET Core console app.
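using System.IO;
using IBM.Cloud.SDK.Core.Authentication.Iam;
using IBM.Watson.SpeechToText.v1;

var bAudio = File.ReadAllBytes("Sample.wav"); // load the sample WAV from the output folder
var memAudio = new MemoryStream(bAudio);
const string apiKey = "xxxxxx";
var authenticator = new IamAuthenticator(apiKey);
var service = new SpeechToTextService(authenticator);
service.SetServiceUrl("xxxxxx"); // the service URL from the IBM website
var results = service.Recognize(memAudio, model: "en-US_NarrowbandModel"); // narrowband suits 8 kHz audio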
In terms of complexity, this is certainly easier than the equivalent code for AWS Transcribe, since you don’t need to upload to S3, poll for the results, then download from S3 again.
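Once the call returns, you will want to pull the transcript (and any keyword hits) out of the response. The sketch below works on the raw JSON body rather than the SDK's typed objects, so it runs without the Watson package or credentials. As far as I can tell, the SDK exposes that JSON as a string on the response object, and keyword spotting is requested by passing `keywords` together with `keywordsThreshold` to `Recognize`, but treat both as assumptions to check against the SDK version you install. `WatsonJson`, `Transcripts` and `KeywordHits` are hypothetical helper names, and the sample JSON is hand-written to follow the documented response shape, not real output.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

static class WatsonJson
{
    // Returns the first (top-ranked) transcript of each result block.
    public static List<string> Transcripts(string json)
    {
        var list = new List<string>();
        using (var doc = JsonDocument.Parse(json))
        {
            foreach (var result in doc.RootElement.GetProperty("results").EnumerateArray())
            {
                var best = result.GetProperty("alternatives")[0];
                list.Add(best.GetProperty("transcript").GetString().Trim());
            }
        }
        return list;
    }

    // Returns each spotted keyword with its start time in seconds.
    // "keywords_result" is only present when keywords were requested.
    public static List<(string Keyword, double Start)> KeywordHits(string json)
    {
        var hits = new List<(string Keyword, double Start)>();
        using (var doc = JsonDocument.Parse(json))
        {
            foreach (var result in doc.RootElement.GetProperty("results").EnumerateArray())
            {
                if (!result.TryGetProperty("keywords_result", out var keywords)) continue;
                foreach (var keyword in keywords.EnumerateObject())
                    foreach (var hit in keyword.Value.EnumerateArray())
                        hits.Add((keyword.Name, hit.GetProperty("start_time").GetDouble()));
            }
        }
        return hits;
    }
}

static class Program
{
    static void Main()
    {
        // Hand-written sample in the shape of a Watson response (illustrative only).
        const string sample = @"{""result_index"":0,""results"":[{""final"":true,
          ""alternatives"":[{""transcript"":""hello world "",""confidence"":0.94}],
          ""keywords_result"":{""world"":[{""normalized_text"":""world"",
            ""start_time"":0.5,""end_time"":0.9,""confidence"":0.91}]}}]}";

        foreach (var text in WatsonJson.Transcripts(sample))
            Console.WriteLine(text);
        foreach (var hit in WatsonJson.KeywordHits(sample))
            Console.WriteLine($"keyword '{hit.Keyword}' at {hit.Start}s");
    }
}
```

Parsing the JSON directly keeps the example independent of the SDK's model classes, which have changed between package versions; in a real project the typed result object is probably the more convenient route.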
IBM Watson is pretty much the same price as AWS at $0.02 per minute, but it has different pricing tiers, so it’s hard to compare like-for-like.