
Astute followers of artificial intelligence may recall a moment from three years ago, when Google announced it had birthed unto the earth a computer able to recognize cats using only videos uploaded by YouTube users. At the time, this represented something of a high-water mark in AI. To get an idea of how far we have come since then, one has only to reflect on recent advances in the RoboWatch project, an endeavor that is teaching computers to learn complex tasks using instructional videos posted on YouTube.

That innocent "learn to play guitar" clip you posted on your YouTube video feed last week? It may someday contribute to putting Carlos Santana out of a job. That's probably pushing it; it's more likely that thousands of home nurses and domestic staff will be axed long before guitar gods have to compete with robots. A recent groundswell of interest in bringing robots into the marketplace as caregivers for the elderly and infirm, fueled in part by graying populations throughout the developed world, has created a need to teach robots simple household tasks. Enter the RoboWatch project.

Most advanced forms of AI currently in use rely on a branch of machine learning called supervised learning, which requires large datasets to be "trained" on. The basic idea is that, provided with a sufficiently large database of labeled examples, the computer can learn to recognize what differentiates the items within the training set, and afterward apply that classifying ability to new instances it encounters. The one drawback to this form of artificial intelligence is that it requires large databases of labeled examples, which are not always available and often demand extensive human curation to create.
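To make that concrete, here is a minimal supervised-learning sketch in Python using scikit-learn. The tiny feature vectors and "cat"/"dog" labels are invented purely for illustration: a classifier is fit to labeled examples, then applied to an instance it has never seen.

```python
# A minimal sketch of supervised learning (not RoboWatch's method):
# a model is "trained" on labeled examples, then classifies new data.
# The tiny dataset below is invented purely for illustration.
from sklearn.linear_model import LogisticRegression

# Labeled training set: feature vectors paired with known labels.
X_train = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.3]]
y_train = ["cat", "cat", "dog", "dog"]

model = LogisticRegression()
model.fit(X_train, y_train)  # learning from labeled examples

# The fitted model can now classify an instance it has never seen.
print(model.predict([[0.15, 0.85]]))  # -> ['cat']
```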

RoboWatch takes a different tack, using what's called unsupervised learning to discover the important steps in YouTube instructional videos without any prior labeling of the data. Take, for example, a YouTube video on omelet making. Using the RoboWatch method, the computer successfully parsed the video on omelet creation and cataloged the important steps without having first been trained on labeled examples.
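By way of contrast with the supervised sketch above, the fragment below shows unsupervised learning at its simplest: the algorithm receives no labels at all and must find structure on its own. K-means clustering serves here as a stand-in illustration, not the method the RoboWatch researchers describe, and the "video segment" features are made up.

```python
# A minimal sketch of unsupervised learning: no labels are provided,
# and the algorithm must group the data by structure alone. KMeans is
# an illustrative stand-in, not the RoboWatch algorithm itself.
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled "video segment" features; no human has annotated them.
segments = np.array([[0.1, 0.9], [0.15, 0.85], [0.9, 0.1], [0.85, 0.2]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(segments)
print(kmeans.labels_)  # e.g. [0 0 1 1]: two groups found without labels
```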

Color-coded action steps, their key frames, and automatically generated captions, all produced by the RoboWatch algorithm for making an omelet.

It was able to do this by looking at a large number of instructional omelet-making videos on YouTube and building a universal storyline from their audio and video signals. As it turns out, most of these videos contain certain identical steps, such as cracking the eggs, whisking them in a bowl, and so on. When presented with enough video footage, the RoboWatch algorithm can tease out which parts of the process are essential and which are arbitrary, creating a kind of archetypal omelet recipe. It's easy to see how unsupervised learning could quickly enable a robot to gain a vast assortment of practical household know-how while keeping human instruction to a minimum.
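Here is a toy sketch of that aggregation idea, under the loud simplifying assumption that each video has already been reduced to a list of named steps (the real system works from raw audio and video signals; the step names and the majority threshold below are invented): steps that recur across most videos are kept as essential, while rare ones are discarded as arbitrary.

```python
# Toy sketch of cross-video aggregation, NOT the published algorithm:
# steps appearing in a majority of videos are treated as essential.
# The "transcripts" and the 0.5 threshold are assumptions for clarity.
from collections import Counter

videos = [
    ["crack eggs", "whisk", "heat pan", "pour", "fold"],
    ["crack eggs", "add milk", "whisk", "pour", "fold"],
    ["crack eggs", "whisk", "pour", "garnish", "fold"],
]

# Count how many videos each step appears in (set() avoids double counts).
counts = Counter(step for video in videos for step in set(video))
essential = {s for s, n in counts.items() if n / len(videos) > 0.5}

# Keep the order in which the essential steps appear in the first video.
storyline = [s for s in videos[0] if s in essential]
print(storyline)  # -> ['crack eggs', 'whisk', 'pour', 'fold']
```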

The RoboWatch project follows similar advances in video captioning pioneered at Carnegie Mellon University. Earlier this year, we reported on a project headed by Dr. Eric Xing, which seeks to use real-time video summarization to detect unusual activity in video feeds. This could lead to surveillance cameras with the built-in ability to detect suspicious activity. Putting these developments together, it's clear that unsupervised learning models fed on video footage are likely to pave the way for the next breakthrough in artificial intelligence, one that will see robots entering our lives in ways likely to both scare and fascinate us.