Machine-learning system tackles speech and object recognition, all at once

MIT computer scientists have developed a system that learns to identify objects within an image based on a spoken description of that image. Given an image and an audio caption, the model highlights, in real time, the regions of the image being described.

Unlike current speech-recognition technologies, the model doesn’t require manual transcriptions and annotations of the examples it’s trained on. Instead, it learns words directly from recorded speech clips and objects in raw images, and associates them with one another.
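The article does not spell out how the model links speech to image regions. As a purely illustrative sketch, assume that each image region and each short slice of audio has already been mapped to a vector in a shared space, and that a higher dot product means a stronger match. The function names (`similarity_map`, `best_region`), the 2x2 grid, and the toy vectors below are all hypothetical and are not taken from the researchers' system.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def similarity_map(
    regions: List[List[Vector]], audio_frames: List[Vector]
) -> List[List[List[float]]]:
    """For each audio frame, score every image region by dot product.

    Returns one 2-D grid of scores per audio frame.
    """
    return [
        [[dot(frame, cell) for cell in row] for row in regions]
        for frame in audio_frames
    ]


def best_region(scores: List[List[float]]) -> Tuple[int, int]:
    """Return the (row, col) of the highest-scoring region in one grid."""
    cells = [(r, c) for r in range(len(scores)) for c in range(len(scores[r]))]
    return max(cells, key=lambda rc: scores[rc[0]][rc[1]])


# Hypothetical toy data: a 2x2 grid of image-region embeddings
# and two audio-frame embeddings, all 2-dimensional.
regions = [
    [(1.0, 0.0), (0.0, 1.0)],
    [(0.5, 0.5), (-1.0, 0.0)],
]
audio_frames = [(0.9, 0.1), (0.0, 1.0)]

maps = similarity_map(regions, audio_frames)
highlights = [best_region(m) for m in maps]
print(highlights)  # one (row, col) per audio frame
```

In this toy setup, the region picked for each audio frame is the one that would be "highlighted" while that part of the caption is spoken. A real system would learn the embeddings from paired images and spoken captions rather than use hand-written vectors.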

The model can currently recognize only several hundred different words and object types. But the researchers hope that one day their co…
MIT News – Electrical engineering and computer science (EECS) – Computer science and technology
