Most language learning software is, unsurprisingly, language-centric, focused on spoken or written language. At present, online learning interactions are based primarily on cognition, not on emotion or on the integration of speech with the involuntary or concomitant aspects of communication conveyed through, for example, gaze and/or gesture. As Merrill Swain noted, “…much of the focus of SLA has been on cognition… underlying processes and sequences in language development…” (Swain, 2013). However, there is compelling evidence that “…emotion has direct pedagogical implications for the language classroom…” (MacIntyre & Gregersen, 2012). Emotion is not merely incidental to e-learning, but a powerful influence on the perception and processing of input, speed and noting as well as on learning outcomes and motivational processes (D’Mello, 2014).

Linguistic interactions rely upon the rapid and complex interplay of a wealth of cues and signals gleaned from facial expression, gaze, gesture and speech. Unfortunately, these cues and signals, which often characterize and regulate attentional and cognitive online interactions, are not detected or processed by traditional WIMP (Windows/Icons/Menus/Pointers) interfaces. Recently, the detection and processing of emotion, as well as of paralinguistic and non-linguistic cues and signals, has become possible with the use of “sensing” technologies, such as those which detect eye movement and gaze and those which detect emotion via facial expression.

Sensing technologies enable real-time adaptation of content as well as the construction of “Sensitive Artificial Listeners” (SALs), virtual agents which detect a user’s affect, gaze, posture, and attentional resources. SALs can use the inferences drawn from this information to produce more appropriate listening and turn-taking behavior. They are able to respond to users more naturally through backchannelling, that is, through vocalizations, head movements, glances, and facial expressions. Listening is, as Tony Lynch points out, a two-way process. It is a social construct, in which content and turn-taking decisions are informed and regulated by many paralinguistic and non-linguistic features.

Interactivity is therefore limited by click-and-type or touch interfaces, which do not detect many of the explicit or implicit hints, cues and signals involving speech, gesture, movement and gaze. Research indicates that, for example, young children can use gestures to convey comprehension before they can convey it through multiple-choice answers or keyboard input (Crowder & Newman, 1993). Applications such as those pioneered by Sesame Street allow children to interact in a two-way conversation with characters and content in a video through clapping, jumping, throwing, and speaking.

New approaches to understanding interactivity and online interactions must start with the premise that interactions are not merely fragments of disembodied written or spoken reactions and responses. Although our traditional interfaces detect only intentional, language-based responses, new technologies now allow us to detect and process the rich and complex interplay of physiologically based cues within the stream of a learner’s intentional responses in order to acquire clues as to a learner’s attentional, cognitive and affective states.

D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153-170.

