I mainly translated and summarized several people’s discussion from this reference in Chinese. I don’t agree with all of the points and I added some of my own views.

To be modest, I still believe in startups like SenseTime, Face++, Cogtu, Linkface, DeepGlint etc. What we discuss here may look like a joke in the future. But who cares? 🙂

Original authors: Filestorm (Jianan Hao from NTU), Naiyan Wang from HUST

06/30/15 update: Here are some great resources for

Awesome Deep Vision

Translator: Ruofei Du

Some believe that computer vision will change the world, while others say that AI (artificial intelligence) is still far from success.


As for research, this is the best era of computer vision; this is the worst era of computer vision.

First of all, there is no doubt that computer vision, as a research field, is among the fastest-developing areas of AI in history, whether measured by the number of researchers, publications, or citations. Nevertheless, despite this amazing acceleration, almost everyone in IT agrees that technology will not change the entire world overnight, let alone computer vision, which still cannot match human vision outside controlled conditions. We cannot reach that goal yet, even with the most expensive computer.

New computer vision technology will definitely replace “work that is boring to human beings” in very specific circumstances. From the viewpoint of the global economy, such changes will never be a tsunami, only a small ripple across current industries.

However, if you look closely with a magnifying glass, each ripple is worth billions of dollars in the market.

Filestorm argued that “revolution through technology is a splendid power, but technology alone cannot make a startup the next MS, Google or Apple.”


=== What do start-ups do in computer vision? ===

Some think that founders of CV startups are usually too bookish; others hold that technology without a product is nothing but a blank sheet of paper.

Technology is not a toy: you cannot turn yesterday’s paper into tomorrow’s product. The main research effort of CV startups goes into bridging the gap between theory and product.


Fortunately, we have witnessed the greatest revolution of the past ten years: Deep Learning. Since the paper entitled “ImageNet Classification with Deep Convolutional Neural Networks”, the core framework of deep learning has not changed much. However, it has taken academia many years (since Hinton’s work in 2006) to extend deep learning to many related problems that seemed to be only one step away. Many high-level vision problems are largely solved in controlled experiments, for instance: how to locate an object in a picture? What if many objects appear in a scene? Can we use deep learning for video? What about face and car make recognition? What if we don’t have enough training data? Can we classify 10,000 or even 100,000 classes? What about NLP / Speech / Biology and combinations?

Deep Learning currently shows limited innovation in theory. Most researchers are contributing datasets and tweaks to CNNs. Worse still, I have heard five Chinese teams each claim to rank first in the ImageNet competition. Has Deep Learning reached the stage of merely comparing scores?

Such problems have been or are being solved by academia. However, because of the demand for originality and limited funding, there are more problems that academia will never solve, such as:

  • Scale an algorithm from 200 pictures to 100,000,000.
  • Make a slow algorithm 100x faster while keeping its accuracy.
  • Combine computer vision, natural language processing and knowledge graphs to solve a single problem.
  • Use user feedback to improve the learning results.

It seems that computer vision can do everything, and yet nothing.


A “killer” application that reaches a large number of users is badly needed to rescue computer vision start-ups.

=== How to avoid being acquired or going bankrupt? ===

Basically, most computer vision startups aim at being acquired because they cannot reach a large number of users without heavy advertising. In the tight competition over patents, user experience and product are usually neglected by most startups.

If a startup wants to avoid both being acquired and going bankrupt, ordinary users should never be emphasized too much. Finding a specific market for its computer vision product can be a great way for a startup to survive. Computer vision can bring a bright future to startups only if they gain big data from a huge number of users. Honestly, what computer vision startups lack right now is not technology but users. Brilliant engineers should be valued as much as brilliant researchers.

A startup is not the sum of smart people, but a game among people, investment and luck.

Finally, let’s list several good computer vision startups:

  • SenseTime: SenseTime builds the next-gen visual understanding and artificial intelligence engine driven by big data and deep learning. We offer SDKs and cloud APIs that can be integrated on different types of platforms, allowing enterprise partners and developers to easily employ first-class computer vision technologies. The APIs include face recognition, image recognition and surveillance video.
  • Face++: Face++ provides one of the leading free face recognition services. My friend Chunyang Wu’s kan-lian.com uses this service.
  • Cogtu: I don’t link this company because they had a wrong quote on their homepage… “The best way to predict future is to create it” – Alan Turing should be “The best way to predict the future is to invent it” – Alan Kay… Anyway, they are an Internet advertising company and supported CVPR (one of the best ways to attract top researchers).
  • MalongTech: style.ai, fashion times AI = ? They had a fake demo with an iPhone interface on their website. When you click on a fashion item like a handbag, it shows a beautiful woman with that handbag… = =b
  • Linkface: They also provide a free API for face recognition. I have not tried it myself, but rumor has it that they have been the champion of FDDB (Face Detection Data Set and Benchmark) several times.