Comparing speech recognition from American and Chinese companies

I’ve already written here that I started trying out speech input. I’ve tried several systems for both Chinese and English, namely Apple, Sogou, and IFlyTek. Sogou is a company that is relatively well known, at least in China, and that had a sizable share of the search market quite a while ago. It’s also famous for its Chinese input method, which is its default. IFlyTek is a little-known company in Hefei, Anhui, that can tap smart graduates of the University of Science and Technology of China (中国科学技术大学) there, arguably the nerdiest school in China. I was rather disappointed, and my impression was that these Chinese companies have a long way to go in AI compared to the top American ones.

Curious to see a more objective comparison, I ran a test: I recorded myself speaking impromptu on this very topic.

Yes, it sounds very hesitant and stumbly, because it was entirely improvised. But it’s good enough. What did Apple, Sogou, and IFlyTek each generate when tested on this audio file?

Apple

我想做一下作业,然后试试中国公司和美国公司的语音识别比较一下。我排客之后对这些中国公司感到非常的失望,就不用说,谷歌苹果很可能都比最好的中国公司多苹果。的强项不是我觉得中国的科技公司这两年好多都是媒体可以的。他们斗地主还是继续在美国所找到的人也都是下个二流的同时,很可能在中国的顶级的开发人,还知道那就是说最好的开发员可能在美国还比中国多多得多。(here, many characters were actually omitted as opposed to misrecognized)

Sogou

我想在这上面做一下录音然后试一试中国公司和美国公司的。语音识别你想一下我还克我肯定会之后对那些中国公司感到非常的失望就不用说,谷歌苹果很可能都比这里好的,公司的好多。虽然苹果强项不是人工智能我觉得中国科技公司这两年好多都是媒体吹的他们谁小技术还是咱俩落后基本上没到的人也都是香的,二流的。同时很可能在中国的电机的开发员还少那就是说最好开发员华人开发员在美国还比中国多甚至多的多

IFlyTek

我想,在这上面,做一下录音,然后试一试。中国公司和美国公司的,语音识别,比较一下。拍客,我肯定会,之后,对,那些中国公司,感到非常,的失望。就不用说谷歌了。苹果,很可能都比,这个好的中国公司,做的好的多。苹果。都瞧瞧,不是人工智能。我觉得中国。的,科技公司,这两年。好多都是媒体吹的,他们,11首。的技术,还是落后,即使在美国,所招到的人,也,都是相对流的,同时,很可能,在中国的。顶级的开发员,还。,那就是说。最好,的开发员。华人开发员。在,美国,比中国,多,甚至多得多。

It turned out, to my surprise, that Sogou and IFlyTek are actually a bit better than Apple at speech recognition, which just goes to show how flawed subjective impressions can be. Of course, all of them made numerous major errors, such that I can see why speech input still isn’t widely used (as far as I know). Even for English, Apple makes some errors. I told my friend this, and he said, “Strange, it’s usually pretty reliable for me; maybe your voice isn’t clear enough.” He was using Google’s on an Android phone, though, and we all know that Google is the world leader in AI, almost certainly quite a ways ahead of the other top companies. So I tried out Google’s as well, via this, and the result was:

我想在这上面做一下录音然后试一试中国公司和美国公司的语音识别比较一下我差可我肯定会之后对那些中国公司感到非常的失望就不用说谷歌苹果很可能都比最好的中国公司做的好的多虽然苹果的强项不是人工智能我觉得中国的科技公司这两年好多都是媒体吹的他们实际上的技术还是等等6号其实在美国所招到的人也都是香奈儿流的同时很可能在中国的顶级的开发源还非常少那就是说最好的开发华人开房源可能在美国还比中国多甚至多的多

It’s comparable in accuracy to IFlyTek, maybe a bit worse.
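These comparisons were judged by eye. For anyone who wants to put a number on them, one common approach is the character error rate (CER): the edit distance between a correct reference transcript and the recognizer's output, divided by the length of the reference. Below is a minimal Python sketch, not something I actually ran for this post. The function names (`normalize`, `edit_distance`, `cer`) are made up for illustration, punctuation is stripped because the engines punctuate very differently, and the reference phrase in the usage example is only my assumption of what a correct transcript of one short fragment would look like, since no ground-truth transcript appears in this post.

```python
import unicodedata


def normalize(text: str) -> str:
    """Drop punctuation and whitespace so only the spoken characters remain."""
    return "".join(
        ch
        for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into nothing costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete a reference character
                    curr[j - 1] + 1,     # insert a hypothesis character
                    prev[j - 1] + cost,  # match or substitute
                )
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of a hypothesis against a reference transcript."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Assumed reference for one short fragment (hypothetical, for illustration).
    reference = "语音识别比较一下"
    # The corresponding fragment as it appears in one of the outputs.
    hypothesis = "语音识别你想一下"
    print(cer(reference, hypothesis))  # 2 substitutions over 8 characters -> 0.25
```

A full comparison would need a hand-corrected transcript of the whole recording as the reference, with each engine's output passed in as the hypothesis. Note that omitted characters count as errors too, which matters for output like Apple's.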

Of course, I’m sure Google and Apple have invested relatively little in Chinese speech recognition, just as Sogou and IFlyTek must have invested little in English (or maybe they trained on English spoken with Chinese accents), since their English speech recognition basically felt like complete garbage.

In any case, we can see that speech recognition, and AI in general, still have a long way to go. After all, your AI is only as good as the data you train it on. It will never handle cases that fall outside the training set and are not explicitly hard-coded, unless there is a major paradigm shift in how state-of-the-art AI is done (so something even better than neural nets).

Whoever reads this is welcome to do a similar experiment comparing Google Translate with Baidu Translate. I did, but I didn’t record the results so it doesn’t really count as a completed experiment.
