Speech recognition between American and Chinese companies

I’ve already written here that I started trying out speech input. I’ve tried several systems for both Chinese and English, namely Apple, Sogou, and IFlyTek. Sogou is a company that is relatively well known, at least in China, and that had a sizable share of the search market quite a while ago. It’s also famous for its Chinese input method, which is its default. IFlyTek is this little known company in Hefei, Anhui, that can tap smart graduates of the University of Science and Technology of China (中国科学技术大学) there, arguably the nerdiest school in China. I was rather disappointed, and my impression was that these Chinese companies have a long way to go in AI compared to the top American ones.

Curious to see a more objective comparison, I did a test, where I recorded something on this matter that I thought of impromptu.

Yes, it sounds very hesitant and stumbly, because it was entirely improvised. But it’s good enough. What did Apple, Sogou, and IFlyTek respectively generate when tested on this audio file?

Apple

我想做一下作业,然后试试中国公司和美国公司的语音识别比较一下。我排客之后对这些中国公司感到非常的失望,就不用说,谷歌苹果很可能都比最好的中国公司多苹果。的强项不是我觉得中国的科技公司这两年好多都是媒体可以的。他们斗地主还是继续在美国所找到的人也都是下个二流的同时,很可能在中国的顶级的开发人,还知道那就是说最好的开发员可能在美国还比中国多多得多。(here, many characters were actually omitted as opposed to misrecognized)

Sogou

我想在这上面做一下录音然后试一试中国公司和美国公司的。语音识别你想一下我还克我肯定会之后对那些中国公司感到非常的失望就不用说,谷歌苹果很可能都比这里好的,公司的好多。虽然苹果强项不是人工智能我觉得中国科技公司这两年好多都是媒体吹的他们谁小技术还是咱俩落后基本上没到的人也都是香的,二流的。同时很可能在中国的电机的开发员还少那就是说最好开发员华人开发员在美国还比中国多甚至多的多

IFlyTek

我想,在这上面,做一下录音,然后试一试。中国公司和美国公司的,语音识别,比较一下。拍客,我肯定会,之后,对,那些中国公司,感到非常,的失望。就不用说谷歌了。苹果,很可能都比,这个好的中国公司,做的好的多。苹果。都瞧瞧,不是人工智能。我觉得中国。的,科技公司,这两年。好多都是媒体吹的,他们,11首。的技术,还是落后,即使在美国,所招到的人,也,都是相对流的,同时,很可能,在中国的。顶级的开发员,还。,那就是说。最好,的开发员。华人开发员。在,美国,比中国,多,甚至多得多。

It turned out, to my surprise, that Sogou and IFlyTek are actually a bit better than Apple at speech recognition, which just goes to show how flawed subjective impressions can be. Of course, all of them made numerous major errors, such that I can see why speech input still isn’t widely used (as far as I know). Even for English, Apple makes some errors. I told my friend this, and he said, “strange, it’s usually pretty reliable for me, maybe your voice isn’t clear enough.” He was using Google’s on an Android, though, and we all know that Google is the world leader in AI, almost certainly quite a ways ahead of the other top companies in it. So I tried out Google’s as well, via this, and the result was

我想在这上面做一下录音然后试一试中国公司和美国公司的语音识别比较一下我差可我肯定会之后对那些中国公司感到非常的失望就不用说谷歌苹果很可能都比最好的中国公司做的好的多虽然苹果的强项不是人工智能我觉得中国的科技公司这两年好多都是媒体吹的他们实际上的技术还是等等6号其实在美国所招到的人也都是香奈儿流的同时很可能在中国的顶级的开发源还非常少那就是说最好的开发华人开房源可能在美国还比中国多甚至多的多

It’s comparable in accuracy to IFlyTek, maybe a bit worse.
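One way to make a comparison like this less impressionistic would be character error rate (CER): the edit distance between a hand-made reference transcript and each system's output, divided by the length of the reference. The sketch below is only an illustration and is not how the outputs above were judged. The reference snippet is an assumption (the opening phrase on which Sogou, IFlyTek, and Google agree), and `edit_distance` and `cer` are hypothetical helper names.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # reference character dropped
                            curr[j - 1] + 1,      # extra hypothesis character
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate of a hypothesis against a non-empty reference."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Assumed reference: the opening phrase as three of the four systems rendered it.
reference = "我想在这上面做一下录音"
outputs = {
    "Apple": "我想做一下作业",
    "Sogou": "我想在这上面做一下录音",
}
for name, text in outputs.items():
    print(name, round(cer(reference, text), 3))
```

Punctuation would need to be stripped before scoring full transcripts, since the systems punctuate very differently (IFlyTek inserts a comma at nearly every pause, Google none at all).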

Of course, I’m sure Google and Apple invested relatively little in Chinese speech recognition, just as Sogou and IFlyTek invested little in English (or maybe they trained on English spoken with Chinese accents), because their English speech recognition basically felt like complete garbage.

In any case, we can still see that speech recognition and AI in general still have a long way to go. After all, your AI is only as good as the data you feed it in training. It will never handle cases that fall outside the training set and are not programmatically hard coded, unless there is a major paradigm shift in how state-of-the-art AI is done (so something even better than neural nets).

Whoever reads this is welcome to do a similar experiment comparing Google Translate with Baidu Translate. I did, but I didn’t record the results so it doesn’t really count as a completed experiment.

A bug in WeChat

Nothing major. But you see, I shared my first moment on it. The result was:

WeChatHyperlinkBugScreenshot

You see the problem right? Broken hyperlink. The bug I filed:

WeChatBugFiling

I’m kind of surprised and disappointed that they’ve missed this edge case for so long. And to be honest, I don’t feel like WeChat is all that great technically. A while ago, I tried their web interface, and it was shit, barely usable. I didn’t find their Moments (朋友圈) feature in Discover (发现) easy to use either. I wanted to post the above message, and I had to Google to find that to post one without a photo, you need to hold the camera icon at the top right corner for a bit. Not a terribly intuitive interface.

So, in spite of all the recent hype, a Danish data scientist told me that Chinese are deeply incompetent, due to corruption and incompetent leadership, and that other Euros who know China have told him the same. Similarly, James Landay, a University of Washington CS prof (now at Stanford) who spent a few years at Microsoft Research Asia, wrote in December 2011 that Chinese computer science, while having made tremendous strides, is still leagues behind. I doubt his opinion has changed that much over the almost seven years since. Personally, I haven’t found most of the software engineers from China here all that impressive, though of course, I’ve also seen some really brilliant and creative ones. There is also the fact that software engineers in general, wherever they’re from, are just not that smart compared to, say, mathematicians or physicists or real engineers, due to the low intellectual difficulty of most of the work. Apparently, a senior engineer at Google can think that “eigenvalue” is “specialized terminology.” Of course, any serious STEM person will think you’re a total joke if you say that. Luboš Motl has written on his blog that most programmers think like folks in the humanities, not natural scientists. On this, I concur almost 100%.

 

Trying out speech input

I wrote my previous blog article lying in bed at night very tired, trying out speech recognition input. I was using the one provided by Sogou. It turned out that even after many manual corrections, there were still several errors which I didn’t catch. You can check the complexity and level of ambiguity of the writing itself (of course you’ll have to read Chinese). You also don’t know how clearly I spoke. Yes, it can be a problem when you speak quickly without a certain level of enunciation, especially when your combination of words isn’t all that frequent. There are of course also exceptional cases which a smart human would easily recognize and the machine cannot: when I say Sogou, a human with the contextual knowledge would not hear it as “so go.” Of course, this is expected. AI is only as good as your training data.

I tried Google’s speech recognition too, here, and initially it seemed to work much better, until it started to make errors too. Next, I tried IFlyTek, this company in Hefei which supposedly hires a ton of USTC (中科大) grads. Still not much better. It’s much easier to type Chinese and very occasionally have to select something other than the default. It turns out that the statistical NLP techniques for Chinese input work well enough, especially given the corpus that Sogou, whose input method I use, has accumulated over time. I had read that a while back, Sogou even accused Google of using some of its data for Google’s Pinyin input method, and Google actually conceded that it did. It’s expected that the Chinese companies in China would have easier access to such data. Even so, Google Translate still works visibly better than Baidu Translate, even for Chinese.

From an HCI perspective, it’s much easier to input English on a phone than to input Chinese. Why? Because spelling correction (Pinyin correction in the case of Chinese), which is necessary on a phone’s touch-screen keyboard, works much better for English than for Chinese. Sure, Sogou provides a 9 key input method as shown below (as opposed to the traditional 26 key),

SogouNineKeyScreenshot

where once one is sufficiently practiced, the key press error rate goes down significantly, but the tradeoff is more ambiguity, which means more inference errors to correct manually. In the example below, 需要(xu’yao) and 语言(yu’yan) are equivalent under the key-based equivalence relation (where the equivalence classes are ABC, DEF, PQRS, etc.). Unfortunately, I meant 语言(yu’yan) but the system inferred 需要(xu’yao).

SogouNineKeyInferenceError
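The collision between xu’yao and yu’yan can be checked with a few lines of code. This is a minimal sketch that assumes Sogou’s 9 key layout follows the standard phone keypad letter groups (consistent with the ABC, DEF, PQRS classes mentioned above). The tiny lexicon and its frequency counts are made up for illustration, and ranking candidates by frequency is only a guess at why one word wins by default, not a description of Sogou’s actual model.

```python
from collections import defaultdict

# Standard phone keypad letter groups (assumed to match the 9 key layout).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def key_sequence(pinyin):
    """Map a pinyin string to its key presses, skipping apostrophes and spaces."""
    return "".join(LETTER_TO_KEY[ch] for ch in pinyin.lower() if ch in LETTER_TO_KEY)


# Hypothetical lexicon: pinyin -> (word, invented frequency count).
LEXICON = {
    "xu'yao": ("需要", 90),
    "yu'yan": ("语言", 60),
    "zhong'wen": ("中文", 40),
}

# Group words by key sequence: each group is one equivalence class.
index = defaultdict(list)
for pinyin, (word, freq) in LEXICON.items():
    index[key_sequence(pinyin)].append((freq, word))


def candidates(keys):
    """Words typed with the same key presses, most frequent first."""
    return [word for freq, word in sorted(index[keys], reverse=True)]


print(key_sequence("xu'yao"), key_sequence("yu'yan"))
print(candidates(key_sequence("yu'yan")))
```

Both spellings reduce to the same five key presses, so the input method has nothing but context and word statistics to separate them.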

You can kind of guess that I wanted to say that “Chinese this language is shit.” The monosyllabic-ness of the spoken Chinese language, in contrast to the polysyllabic (?) languages in the Middle East for which the alphabet was first developed, obstructed the creation of an alphabet. Because each distinct syllable in Chinese maps to so many distinct characters with different meanings, there would be much ambiguity without characters. For an extreme example of this, Chinese linguistic genius Yuen Ren Chao (赵元任) composed an actually meaningful passage with 92 characters all pronounced shi by the name of Lion-Eating Poet in the Stone Den.

I remember how in 8th grade history class, an American kid in some discussion said how our (Western) languages are so much better than their (Chinese-based) languages, and the teacher responded with: I wouldn’t say better, I would say different. Honestly, that kid has a point. Don’t get me wrong. I much appreciate the aesthetic beauty of the Chinese language. I’m the complete opposite of all those idiot ABCs who refuse to learn it. But no one can really deny that the lack of an alphabet made progress significantly harder in many ways for Chinese civilization. Not just literacy. Printing was so much harder to develop, though that is now a solved problem, thanks much to this guy. There is also that Sogou’s Chinese OCR, which I just tried, basically doesn’t work. Of course, nobody really worries about this now, unlike in the older days. In the early 20th century, there were prominent Chinese intellectuals like Qian Xuantong (钱玄同) who advocated for the abolishment of Chinese characters. Moreover, early on in the computer era, people were worried that Chinese characters would be a problem for it.

In any case, unless I am presented with something substantially better, I can only conclude that any claim, such as this one, that computers now rival humans at speech is bullshit. I was telling a guy yesterday that AI is only as good as your training data. It cannot really learn autonomously. In less restricted contexts such as computer vision and speech recognition (unlike chess and go, where there are very precisely defined rules), there will be edge cases obvious to a human that would fool the computer, until the appropriate training data is added for said edge case and its associates. Analogously, there has been a near perpetual war between CAPTCHA generators and bots over the past few decades, with more sophisticated arsenals developed by both sides over time. Technical, mathematically literate people, so long as they take a little time to learn the most commonly used AI models and algorithms, all know this. Of course, there will always be AI bureaucrats/salesmen conning the public/investors to get more funding and attention to their field.

Don’t get me wrong. I still find the results in AI so far very impressive. Google search uses AI algorithms to find you the most relevant content, and now deep learning is being applied to extract information directly from the image content itself, vastly improving image search. I can imagine that in a decade we’ll have the same for video working relatively well. To illustrate this, China now has face recognition deployed on a wide scale. This could potentially be used to search for all the videos a specific person appears in, by computationally scanning through all videos in the index and indexing the correspondence between specific people and times in specific videos. Of course, much of the progress has been driven by advances in hardware (GPUs in particular) which enable 100x+ speedups in training time. AI is mostly an engineering problem. The math behind it is not all that hard, and in fact, relatively trivial compared to much of what serious mathematicians do. Backpropagation, the idea and mathematical model behind deep learning, was conceived in the 80s or even 70s in the academic paper setting but was far too computationally costly to implement at that time on real world data. It is pretty straightforward and nowhere near the difficulty of many models for theoretical physics developed long ago. What’s beautiful about AI is that simple models often work sufficiently well for certain types of problems so long as the training data and computational power are there.
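To give a sense of how little math backpropagation needs, here is a minimal sketch: a toy network with one tanh hidden unit and one linear output, where the gradients come from applying the chain rule backwards through the forward pass. The network, the numbers, and the function names are all made up for illustration. The finite-difference check at the end is just a sanity test, not part of the algorithm.

```python
import math


def loss(w1, w2, x, target):
    """Squared error of a tiny network: y = w2 * tanh(w1 * x)."""
    h = math.tanh(w1 * x)
    y = w2 * h
    return 0.5 * (y - target) ** 2


def backprop(w1, w2, x, target):
    """Gradients of the loss with respect to w1 and w2 via the chain rule."""
    # Forward pass, keeping the intermediate values.
    h = math.tanh(w1 * x)
    y = w2 * h
    # Backward pass, from the loss back towards the input.
    dy = y - target                # dL/dy
    dw2 = dy * h                   # dL/dw2, since y = w2 * h
    dh = dy * w2                   # dL/dh
    dw1 = dh * (1.0 - h * h) * x   # dL/dw1, since tanh'(z) = 1 - tanh(z)^2
    return dw1, dw2


def numeric_grad(f, w, eps=1e-6):
    """Central finite difference, used only to check the analytic gradient."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)


w1, w2, x, target = 0.5, -1.5, 2.0, 1.0
dw1, dw2 = backprop(w1, w2, x, target)
num_dw1 = numeric_grad(lambda w: loss(w, w2, x, target), w1)
num_dw2 = numeric_grad(lambda w: loss(w1, w, x, target), w2)
print(dw1, num_dw1)
print(dw2, num_dw2)
```

A real network repeats the same two passes over millions of weights and examples, which is where the GPUs come in.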

Why China’s core technology remains weak

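Over lunch I chatted with someone who graduated from Tsinghua in electronic engineering but moved into software development after his PhD. He said that if Facebook disappeared, people would quite possibly become more productive, whereas if Intel disappeared, technology and our modern life would nearly come to a halt. The point is obvious, yet seldom made. In my eyes, describing internet companies in English as “tech” is really a serious distortion of technology. To people with real ability in science and engineering it is a joke, and it also devalues them, because this kind of hypocritical verbal propaganda does nothing but mislead the public and promote an anti-knowledge, anti-science effect. Of course, in a market economy, true talent and ability are often not much valued and carry little economic value. What is valued highly instead are people who can do business, cultivate connections, and hype, along with work of low technical content that helps with doing business, typified by the code monkeys of internet companies. Generally speaking, high-level science and engineering people are very specialized, or, if they work on theoretical things, produce nothing of direct economic value, and their options (if they stay within their specialty) are extremely few, so they can rarely command an especially high salary. This is also why in China (America is about the same) extremely few people are willing to invest in, or devote themselves to, core technology products, with chips as the typical example. Even very smart people with a natural gift for real technology often leave what they love under social and economic pressure and temptation, which is a very pitiable trend. The vision of ordinary people is extremely shallow. There is no way around it: the overall quality of a society can only be raised through eugenics plus education. Sadly, under a market economy even education becomes a product offered to students, especially in today’s America with its permissive institutional culture. Still, because of the leading position America accumulated from certain epoch-making innovations in its past, such as the invention of the semiconductor, it will in any case have elites who pursue real work, though of course far fewer now than before. China is different. It is still very backward, and must rely on a wise planned system to make up for the gap of many years.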

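Compared with abroad, there are indeed relatively few Chinese who are outstanding in basic science, and most of them are overseas. This is not in dispute. China’s elite scientific culture is indeed still far behind the West’s and lacks the appropriate tradition. That can only be built up slowly, and on this point China could learn from the very successful Japan. I feel that, from a certain angle, Chinese people still sorely lack the spirit of devotion to the nation that the Japanese have. The older generation had it, but starting with the generation that went to university during reform and opening up it basically disappeared, contaminated by the American spirit. Being forcibly cut off from America during the first 30 years of the People’s Republic greatly delayed, from a certain angle, China’s economic development, but it forced China to establish a tradition of self-reliance in science and technology. With reform and opening up, China merged into the America-led system, and this spirit dissipated greatly, though remnants of it survive. For example, when I told an American friend that China has not yet produced a truly cutting-edge computing product with international influence, he replied that blocking American internet companies and building its own internet enterprises was a wise choice for China, since otherwise they would long ago have been swallowed up by foreigners. China’s blocking of Google and Facebook is indeed embarrassing, but from the standpoint of its own economic strength and national strategy, the price has needless to say been more than worth it. I have also thought that if the Chinese government had, as in the 1950s and 60s, restricted the personal freedom of its elites, absolutely forbidding the “complete defection” of going abroad to study while giving them far better training and work placements, China’s level of science and technology today would be far higher, it would have developed far more cutting-edge technology products, and it would possess independent technology of its own, including chips and their ecosystem (which includes the operating system and all the important compatible application software built on it).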

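Someone said that China is about to surpass America. I asked: in what respect? He replied, “In every respect.” I said that economically, China’s sheer size combined with a fairly high overall level will be formidable, but in quality, as represented by science and technology, there is still a very long way to go. China’s cutting-edge technology is still too far behind. While learning from the advanced countries, Chinese people must develop their own distinctive system and style of scientific and technological R&D, dare to take extreme measures where appropriate to achieve their goals, and not care too much about what others think, especially America, because China now has enough strength and a favorable enough trend to support wholeheartedly pursuing its own path.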

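Someone from China who works in the computer industry once said to me, “ABCs have it worst: they cannot be Americans, and they have lost the chance to be Chinese.” He felt that Chinese in America, as a passive minority, are very pitiable. Having seen that Harvard’s systematic discrimination against Chinese-American students has in fact been confirmed, I would also say, from my own experience, that it is relatively hard for highly talented Chinese to get training and development opportunities that match their ability, which leaves the level and status of Chinese below their talent, compared with whites. The American system ruins many Chinese children who are innately very capable, not only in their careers but also by giving them identity problems. If these people had stayed in China, with enough government support, they could have made great contributions to China instead of being wasted in America. Not only they, but their descendants too. Because the immigration system selects for high IQ (roughly, the immigration rate is a monotonic function of IQ), the IQ distribution of Chinese in America has a very fat right tail, but the high-IQ development opportunities that America’s racial quotas and discrimination can accommodate for Chinese are limited, so many Chinese are bound to go unrecognized and to have great talent put to petty use.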

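I came to America in first grade, but gradually discovered that American culture is very deceptive in many ways, so one has to guard against brainwashing. While understanding American culture and the American system and learning what is good in America, reading Chinese, learning Russian, and coming into contact with and appreciating the red heritage of the Soviet Union and China gave me a more pluralistic understanding of the world. Although the former Soviet Union suffered a great defeat and dissolved, and failed to catch up with America across the board, even within science and technology, it still produced many brilliant results. With a revolutionary new system spurred by severe crisis it achieved a miracle and shocked the world. It gave the China of that time, poor, backward, and bullied by the powers, fitting inspiration and a model, and also provided decisive scientific and technical knowledge and aid, allowing China’s thousand-year-old civilization to be reborn from the fire of modern war, so that today it stands on the threshold of superpower status. The new system and new culture that the Soviet Union created left the world precious wealth, in science, in art, and in political thought and institutions. I feel that the Soviet approach fits China’s national conditions far better, especially compared with America’s. China should suitably combine Soviet things with its own culture and circumstances and take socialism to unprecedented heights. Of course, if China is truly to become a superpower like the former Soviet Union, it must first become a science and technology power and achieve some disruptive, trend-setting firsts, as the Soviet Union did in spaceflight, rather than merely producing practical technology on top of other people’s core work. This requires confidently and systematically bringing into play the strengths and distinctive features of its own culture and system, boldly investing in long-term core technology R&D and basic exploration, and fostering more distinctive cultures and communities in different cutting-edge fields. China’s elite intellectuals should carry forward the revolutionary spirit handed down by their forebears, not fawn on foreign things, dare to challenge authority, and create new miracles that go down in history.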

Face recognition in China

I recently learned that face recognition, led by unicorns SenseTime and Megvii, has reached the level of accuracy and comprehensiveness that it is percolating into retail and banking, and moreover police are using it to detect suspects, or so various media articles say, like this one. Just Google “face recognition china.” I’m both surprised and impressed. Of course, in hindsight, what they did was mostly collect, aggregate, and organize enough data to train the deep learning models to the level that they can be put to production. The Chinese government has, after all, resident identity cards for all Chinese citizens with photos. I was certainly somewhat envious of the people involved in that in China, and I feel like such a failure compared to them, and that my life has been so boring and uneventful in comparison. Of course, whether I’m suited to do deep learning is another matter. After playing a bit with neural nets, including on the canonical MNIST data set, I sure was disappointed, and I understood immediately why this guy, who is doing a machine learning PhD at Stanford, had said to me that deep learning is very engineering heavy. I wish I had the enthusiasm and motivation for stuff like GPUs. As for that, all I’ve done was play with CUDA in a way so minor almost as if I did absolutely nothing. Again I don’t see myself as terribly suited towards engineering (I’m too much a purist at heart), but I might eventually be compelled to become interested in that, and once I do, I don’t think I’ll do badly. This also makes me wonder what I would’ve ended up like had I stayed in China. I’m sure I would’ve been weird there too, though I would also be more like everyone else. I wonder what I would have ended up majoring in there, and what I would’ve ended up doing afterwards. I’d like to think that I would have gotten a much better education and cultural experience there, though of course, the grass is always greener on the other side of the fence. 
For instance, in America, Asian quotas mean you are judged relative to other Asians, but being in China means that automatically, and China, by virtue of having low resources per capita, is, needless to say, a grossly competitive society with fewer second chances, and thereby even harsher on late bloomers. Then again, the gaokao happens at age 18, whereas in America, grades start to matter as early as age 14 or 15, when many are still very immature. I must acknowledge that as much as I dislike various aspects of the American education system, it is extremely generous, from what I see, relatively speaking, in tolerating failure at a young age. In China, you test into a specific department at a university, and once you’re in, it’s very hard to change, which means some land in majors they end up finding themselves unsuited for. At age 18, it’s really hard to make such a decision, especially when you don’t really know anything about the actual content of the major, which is usually the case when one is a clueless kid. This is why I say that before you commit officially to an area, always try to learn something about it on your own beforehand to increase confidence that you actually have at least reasonable, and preferably high, talent for it.

On the broader topic of technology in China, it is needless to say that they are still quite a ways behind America and the advanced Western countries. Look at what the ZTE ban has done. China has its own CPUs but not the ecosystem for it. China still buys and deploys much of its most advanced military technology, including jet engines and surface-to-air missiles, an indicator that its indigenous versions of those are still seen as unproven, unreliable, and of lower quality, though surely they’ve made great strides on that the past decade. I stumbled on the video of this military parade held on August 1st, 2017 to mark the 90th anniversary of the Nanchang Uprising that showcased some of the latest developments. I didn’t like it all that much at first, with its overall presentation, the imagery and music in sync, kind of, how should I say it, corny, and I felt the music paled in comparison to the music of the Soviet Red Army, which is very hard to beat, at least based on my taste, though listening to the music again, I grew to like it more. Surely, I would characterize the whole thing as rather sinister, and representative portions of that would be this and this. Musically, the part that left the most memorable impression was this, and to be honest, I found the non-musical aspect of that part both awkward and sinister, especially coupled with the music. I’m sure many people in the West would view this parade as rather weird, or even effeminate, as much as I hate that stereotype of East Asians in America.

Yet in spite of overall and in some cases critical backwardness, China is managing to unveil a face recognition system at a level of sophistication and scale, and also scariness/creepiness that many in America could only dream of. Surely, that was far from my expectation. Who knows. Maybe in a decade, China will have a nationwide genome database. I say this with the awareness that for anything of scale, there is a tremendous advantage to homogeneity and central organization. We already see, in the case of face recognition, China’s using this to compensate for its inferior technology as far as strict quality and capability is concerned.

As far as I can tell, Chinese and Chinese society place a strong emphasis on STEM and the society as a whole is far more scientifically literate than American society, which is advantageous for certain pro-STEM policies and government, though surely, China is still struggling to produce the best people in many areas, for which the corresponding elite subcultures in the West are difficult if not impossible to transmit. It will be very interesting to see what kind of novel stuff comes out of China organically over the next decade or so, especially as China seeks further to create its own distinct ecosystem, as opposed to remaining in many ways still a subsidiary of America and Russia. In any case, I am quite a fan of the political culture of China, and on the contrary, I am rather sick of the one in America.