About TED-Ed Originals

TED-Ed Original lessons feature the words and ideas of educators brought to life by professional animators.

Meet The Creators

  • Educator Kostas Karpouzis
  • Director Lasse Rützou Bruntse
  • Script Editor Alex Gendler
  • Composer Laura Højberg Kunov
  • Sound Designer Tobias Dahl Nielsen
  • Narrator Addison Anderson


Additional Resources for you to Explore
The 1968 film “2001: A Space Odyssey” was one of the first mainstream films to deal with the idea of machines recognizing human emotions. In the film, HAL 9000, the on-board computer of a spaceship, interacts with the astronauts, estimates their intentions, expresses emotions and even makes decisions regarding the space mission. In real life, could we ever really build machines which understand and express emotions, an inherently psychological concept which starts building in our brains before it’s even understood and expressed? Rosalind Picard, a professor at MIT, tried to make sense of it all in her 1995 Technical Report “Affective Computing”. Picard uses the phrase “computing that relates to, arises from, or influences emotions” to define affective computing, but one still needs to define “emotions” in a measurable way for machines to represent and understand them.

Merriam-Webster's dictionary defines affect as “a set of observable manifestations of a subjectively experienced emotion”, a description which machines and computer scientists can relate to. It means that we’re focusing on how emotion (which includes behavioral, expressive, cognitive and physiological aspects) is consciously experienced by a person, and on how this experience is manifested through characteristics which can be measured: facial expressions, changes in speech prosody and content, or physiological changes such as sweating or pupil dilation.

In order to train computers to recognize emotions, we employ machine learning: scientists attempt to mimic the human learning process by presenting computers with multiple instances of a particular pattern (in this case, expressive faces or clips of expressive speech) and adjusting the parameters of complex equations so that they approximate the expected output for each pattern. Given the low computing power available in the 1990s, early approaches used scaled-down, cropped mugshots; later ones moved towards more natural instances, where algorithms identify expressive facial features and track how those are deformed, e.g. when smiling. In the case of expressive speech, one common approach is to extract a representation of the pitch of the person’s voice and calculate simple or complex statistical features from it.
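To make the idea of statistical features computed from pitch a little more concrete, here is a minimal Python sketch. It assumes the pitch has already been extracted as one fundamental-frequency value (in Hz) per voiced frame; the function name `pitch_features`, the particular choice of features, and the two example contours are illustrative assumptions, not measurements or the method of any specific system.

```python
from statistics import mean, pstdev

def pitch_features(f0_hz):
    """Summarise a pitch contour (one F0 value in Hz per voiced frame).

    Needs at least two frames so that a slope can be computed.
    """
    n = len(f0_hz)
    avg = mean(f0_hz)
    # Least-squares slope of pitch against frame index:
    # positive for rising intonation, negative for falling.
    t_avg = (n - 1) / 2
    num = sum((t - t_avg) * (f - avg) for t, f in enumerate(f0_hz))
    den = sum((t - t_avg) ** 2 for t in range(n))
    return {
        "mean": avg,                        # overall pitch level
        "std": pstdev(f0_hz),               # how much the pitch varies
        "range": max(f0_hz) - min(f0_hz),   # widest pitch excursion
        "slope": num / den,                 # overall rising/falling trend
    }

# Hypothetical contours, invented for illustration only.
flat_contour = [120, 121, 119, 120, 122, 120]
excited_contour = [180, 200, 230, 260, 240, 280]

print(pitch_features(flat_contour))
print(pitch_features(excited_contour))
```

In this toy data the second contour has a higher mean, a wider range and a rising slope. Numbers of this kind, rather than the raw audio, are what would be handed to a learning algorithm.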

Which brings us to the question of digital representation: recognizing particular facial expressions can sometimes be challenging for humans, let alone teaching machines how to classify them. Paul Ekman, a prominent U.S. psychologist, developed a theory about “universal facial expressions”, where specific facial manifestations of six emotions are deemed to be recognizable by people across cultures and ages. The initial list included anger, disgust, fear, happiness, sadness and surprise; these six basic emotions are the ones most widely used by computer scientists to classify affective behavior. This theory lends itself well to classification, the problem of identifying which of a set of categories a particular instance belongs to: given a set of images, videos or speech clips, a classification algorithm identifies the most likely class each of them belongs to. Here, the more labeled examples we use when training, the better the chance that an image of a smiling person which was not in the training set will be correctly classified as ‘happiness’. Neural networks are among the most common machine learning algorithms used here. They consist of artificial neurons, which exchange information and form connections with each other. The connections between these artificial neurons have numeric weights, which are adjusted during the training process based on the class that each sample belongs to. In this way the network adapts to and learns new information, much like our brain does when learning.
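The weight adjustment described above can be sketched with the simplest possible case: a single artificial neuron that learns to separate two of the six classes. This is only an illustration under stated assumptions. The two input features (how much the mouth corners are lifted and how far the brows are raised, both scaled to 0..1), the sample values, and the names `train_neuron` and `predict` are all hypothetical; real systems use many neurons and far richer inputs.

```python
import math

def train_neuron(samples, labels, epochs=2000, lr=0.5):
    """Train one sigmoid neuron; labels are 1 (happiness) or 0 (sadness)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            output = 1 / (1 + math.exp(-z))   # neuron activation in (0, 1)
            error = y - output                # how far off the neuron was
            # Nudge each weight in proportion to its input and the error.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

def predict(weights, bias, x):
    """Classify a feature vector with the trained neuron."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return "happiness" if z > 0 else "sadness"

# Hypothetical labeled examples: [mouth_corner_lift, brow_raise].
samples = [[0.9, 0.6], [0.8, 0.5], [0.7, 0.7],   # labeled happiness
           [0.1, 0.3], [0.2, 0.2], [0.3, 0.4]]   # labeled sadness
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_neuron(samples, labels)

# Faces that were not part of the training set.
print(predict(weights, bias, [0.85, 0.55]))
print(predict(weights, bias, [0.15, 0.25]))
```

Each pass over the labeled examples moves the weights a little in the direction that reduces the error, which is the sense in which the neuron learns from the class of each sample.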

If you think that much training data is difficult to collect, think about all the selfies that go around the internet, how much we talk on our mobile phones, or the number of status updates we post every day. So the big question is not how to collect the necessary data to train our machine learning algorithms, but what we’re going to do with it. Will it be something beneficial, or will it resemble a dystopian scenario like the ones in Hollywood movies? In real life, what’s stopping a computer from ‘sacrificing’ humans in order to perform the task it has been assigned? And how is a self-driving car going to decide whether to stop abruptly when someone jaywalks, risking a crash with the car right behind it? These are all questions that we’re going to have to face, sooner rather than later, and they have to do with the amount of power that we’re willing to give to machines in exchange for a more comfortable life.