Cross-disciplinary Research
on Engaging Advanced Technology
for Education

CIRCL Workshop on Robots, Children & Alternative Input Methods


This two-day workshop on Robots, Young Children, & Alternative Input Methods was held at Northern Illinois University. The workshop brought together leading investigators interested in interaction design and technology development in this area. The workshop goals were (i) to explore the current status of research on child/robot interaction, (ii) to discuss theoretical and technical aspects that can support the research, and (iii) to explore the potential for future research in the area from social, emotional, cognitive, and educational perspectives. These goals required assembling distinguished researchers from a diverse array of interrelated fields, including learning sciences, computer science, engineering, psychology, and others. The participants were selected because they understand the necessity of cross-disciplinary collaboration for inquiring into the development of socio-technical partnerships between an embodied, humanoid robot and young children in ways that promote children's intellectual, affective, and social development.

During the workshop, the researchers shared their respective expertise and on-going research to inform each other and discuss core research questions in designing and evaluating the efficacy of the child/robot collaborative system:

  • What is the current status of research and development efforts in child/robot interaction?
  • What are theoretical perspectives that may guide research on developing child/robot collaborative systems?
  • What are important research issues in engineering assistance for child development with a robot?
  • What technologies are available to design child/robot interaction and collect data to assess the efficacy?
  • What are the challenges and opportunities in developing such technologies and research programs?
  • In what way are the research issues aligned with the NSF goal of Broadening Participation in STEM education and STEM workforce (particularly, the NSF initiative INCLUDES, Human Technology Partnership)?


Yanghee Kim, Northern Illinois University; Vinci Daro, Stanford University; Insook Han, Temple University; Lixiao Huang, Duke University; Laura Johnson, Northern Illinois University; Xiaojun Qi, Utah State University; Ying Xie, Northern Illinois University; Jiyoon Yoon, University of Texas at Arlington

Session 1: Sociable, educational robots & learning sciences

Cynthia Breazeal

Title: Social Robots as Personalized Learning Companions for Early Literacy

Summary: Dr. Cynthia Breazeal (Associate Professor at MIT Media Lab) presented social robots as personalized learning companions for early literacy. As AI becomes increasingly essential, Dr. Breazeal’s team strives to design humanistic AI-enabled technologies (social robots) that empower all of us to become the people and society we aspire to be as part of daily life. She advocated that AI-enabled educational technologies should engage and support learners holistically, including emotion, cognition, social interactions, and physical interactions. Learning with personalized peer-like social robots provides friendly companionship, collaboration, interpersonal interaction and social cues, perspective-taking, social modeling, and emotional engagement. When parents are included in the child-robot interactions, the robot fosters socially inclusive interaction with others, facilitates adult participation and scaffolding, and provides important opportunities for parent engagement at home. Preliminary findings include the following: (1) children express more joy, attention, and relational touch toward robots as pediatric companions than toward avatar and plush companions for in-patient care; (2) children retain more similar phrases and words, as well as longer storylines, with expressive robots than with flat robots; (3) children self-report having a higher growth mindset after interacting with a growth-mindset robot; (4) children are more emotionally expressive with robots than with tablets; and (5) emotional data improves prediction of word-reading skill. So far, her team has a dataset of 218 hours of time-synchronized multi-modal interaction, including interaction data, surveys, sensor data, and assessments. Dr. Breazeal plans to continue this research and to design personalized learning companions for early literacy.

Watch Video  

Yanghee Kim

Title: Exploring the Educational Affordances of Embodied Robots

Summary: Dr. Yanghee Kim (Professor at Northern Illinois University) presented her research on the educational affordances of embodied robots. Project DEAR aims to design an engaging and affable robot for 3-7-year-old ESL children to improve their early English literacy skills. Project IDEAL is an inclusive design that mediates children’s collaboration. Based on the literature, Dr. Kim’s team developed a theoretical framework for crafting interactions between robot and children. Their robot design goals are three-fold. They wanted the robot to (1) consistently invite children into conversations, (2) provide opportunities for children to speak and engage in activities in either their native, second, or developing language, and (3) always demonstrate empathy with children. To do this, they used human mediator sessions to develop scripts for robot mediation. They used the Wizard of Oz method, explicit and repeated invitation, opportunities for children to explore, and interaction sessions limited to 20 minutes. Her team reported the following findings: (1) children developed affectionate relationships and were very engaged with the robot, (2) children interacted with the robot like they would with a friend, (3) children were very forgiving of the robot’s mistakes, and (4) children gradually learned to work with their peers, taking turns and listening. Dr. Kim also introduced more evidence of an affectionate relationship with Skusie, including physical contact (touching and hugging), words, and unwillingness to leave. The goal of the project was to design the robot to take over the mediation role from humans. She plans to integrate curricular content and artifacts into the child-robot interactions. To evaluate changes in children’s engagement and collaboration skills, she needs computational models to design and assess children’s behaviors.

Watch Video  

Lixiao Huang

Title: The Design of Child-Robot Interaction: More Examples

Summary: Dr. Lixiao Huang (Postdoctoral Fellow at Duke University) presented more examples from the US and internationally to complement Dr. Breazeal’s and Dr. Kim’s research areas on ESL and emotional support. Dr. Huang showed one example of applying a social-educational robot in treating autism, and pointed out the problem of evaluating the effect of child-robot interaction in an isolated environment without later testing it in human-human interaction. She also showed an example of applying a social-educational robot in treating stuttering, in which the human tutor played an enormous role in the facilitating process. In this stuttering treatment case, the learning activities and the tutor’s facilitation become more important than the robot itself. Dr. Huang used the examples to illustrate the importance of including other humans in children’s interactions with robots for three reasons. First, even state-of-the-art robots are not robust enough to meet children’s varied needs. Second, humans have a basic psychological need for relatedness to other humans that robots cannot replace. Third, the goal of child-robot research is to increase children’s adaptability in the real world, so adding humans in the loop is ultimately good for children’s well-being. Based on the issues in the examples, Dr. Huang raised three big questions that researchers need to consider in all child-robot interaction research: 1) In what specific areas (e.g., storytelling, moral training, emotional support in health care, playmate interaction) do children need help? 2) How do we design both robots and activities to include humans in the loop? 3) How do we measure the results effectively, considering their transfer to real-life scenarios? Dr. Huang concluded with a big picture of various human-robot interaction applications across the lifespan (from infants to older adults; from personal to public use) that will all need to be informed by these three questions. She suggested human factors methods such as task analysis to identify the critical issues in each application and improve the design of child-robot interaction.

Watch Video  

Session 2: Child development and learning theories

Insook Han

Title: Embodied Cognition

Summary: Dr. Insook Han (Assistant Professor at Temple University) presented embodied cognition. Embodied cognition is a relatively new theoretical perspective that emphasizes perceptual and physical experiences in human learning. In contrast to the traditional view that knowledge or conception is purely mental and independent of our abilities to perceive, cognitive linguists and cognitive psychologists have tried to examine human cognition from this new perspective. Based on embodied cognition, many educational interventions have been designed and implemented that involve multisensory experiences and perceptual/bodily interaction with the physical world. Embodied cognition research in educational contexts has mainly focused on math, science, and language learning, and mostly on adults and elementary students. Considering that young learners' learning is inherently perceptual and multi-modal, there is room for research with younger learners using the embodied cognition framework. A social robot can be a good tool to facilitate young learners' physical interaction as well as to capture their movement, gestures, and verbal exchanges.

Watch Video  

Ying Xie & Kyung Kim

Title: Group Cognition

Summary: Dr. Ying Xie and Dr. Kyung Kim (Assistant Professors at Northern Illinois University) presented group cognition. Gerry Stahl defined group cognition as the study of group interaction. After discussing the three levels of group cognition and why this topic is important to the community, they introduced their text-based dialogue analysis tool, Graphical Interface of Knowledge Structure (GIKS), which can visualize written dialogue as network graphs. They showed how this tool can be used for exploring the interaction between robot and children. Previous analyses of group cognition mainly used written text from college students and adults. A database of children’s writing in formal and informal settings is still lacking.

Watch Video  

Vinci Daro

Title: Social Emotional Learning

Summary: Dr. Vinci Daro (Researcher at Stanford University and Director of STEM learning at Envision Learning Partners) presented “social emotional learning” as a lens for exploring ways that social robots might be used in research and in interventions to support children in school settings. Dr. Daro presented a brief rationale for focusing on social emotional learning (SEL), and then provided a brief overview of the constructs that have become dominant in the field of SEL research and development. She identified three challenges in conducting SEL research for which social robots may be relevant: a tendency in research designs (and intervention designs) to isolate the instruction or assessment of SEL skills from academic content instruction; the difficulty of measuring SEL competencies, and growth in these competencies; the significant role of teacher and researcher biases in interactions with students from different backgrounds, including linguistically diverse students. The rest of Dr. Daro’s presentation described the general outlines of how SEL frameworks might help situate research and interventions involving social robots, with a focus on students’ identity development as learners, language development for multilingual students, and the equity considerations important in these lines of research.

Watch Video  

Jiyoon Yoon

Title: Early Science Learning

Summary: Dr. Jiyoon Yoon (Associate Professor at University of Texas at Arlington) presented early science learning and emphasized that children need to improve their abilities to “DO” science in order to enhance their acquisition of scientific concepts and facts. She described three approaches for doing science: Developmentally Appropriate Practice (DAP), the 5E Instructional Model, and Questioning.

Watch Video  

Session 3: Visual processing and embodiment

Xiaojun Qi

Title: Face Tracking and Emotion Recognition in Robot/Children Interaction

Summary: Dr. Xiaojun Qi (Professor at Utah State University) explained current techniques for tracking faces and recognizing emotion in robot/child interaction. She described two tracking methods developed by her Computer Vision Research Laboratory. The first method, a structured multi-task multi-view sparse tracker, casts face tracking as a sparse approximation problem in a particle filter framework to track one face. The second method, a multi-Bernoulli filtering technique, applies the random finite set multi-target multi-Bernoulli filter to track multiple faces simultaneously without an explicit detection step. She further presented a deep neural network (e.g., a custom version of the VGG13 network), trained on the facial expression recognition (FER+) database, to recognize two facial expressions (i.e., happiness and neutral) of children in robot/child interaction. She concluded her talk by presenting several challenges in tracking and emotion recognition.

Watch Video  

Aaron Kline

Title: Autism Glass Project: Expression Recognition Glasses for Autism Therapy

Summary: Dr. Aaron Kline (Professor at Stanford University) presented a system developed by his research group that seamlessly integrates sensors, real-time social cues, and feedback in behavioral therapy. He described the approaches his research group uses to help reinforce emotional awareness for children with autism, including face tracking and emotion recognition. He also emphasized the importance of including learners directly in the design process to improve their engagement in the learning experience. Finally, he mentioned that the same or similar technologies could be leveraged to help measure engagement during child–robot interaction.

Watch Video  

Karthik Ramani

Title: Children-Robot Interaction (CRI) for Engaged Learning through Design and Making

Summary: Dr. Karthik Ramani (Professor at Purdue University) presented the latest work on Ziro, a prototype of design-build-play robots intended to motivate children in STEM learning. He demonstrated that children can learn through design and making. Ziro has some vision components and is integrated with Amazon Echo for voice interaction, allowing it to do a variety of tasks. Finally, he explained the motion flow system for gesture recognition, and concluded that multimodal sensing, particularly of human emotions, can allow new forms of AI-based interactions.

Watch Video  

Session 4: Speech and dialogue

Chad Dorsey

Title: Speech Technologies: Overview, Possibilities and Barriers for Learning Sciences Research

Summary: Dr. Chad Dorsey (President and CEO at the Concord Consortium) presented an overview of speech technologies, along with their possibilities and barriers for learning sciences research. Speech technologies are important for learning sciences research, and the processing pipeline can be broken down into several stages. Audio data collection uses microphones, LENA devices, microphone arrays, and beamforming. In preprocessing, speaker diarization can help identify and separate individual speakers in a single audio track, while speech activity detection separates speech from the acoustic background. In speech signal analysis, word counting, turn counts, sharing, nonverbal sentiment detection, social signals detection (laughter, filled vs. unfilled pauses, and overlapped speech), and stress detection are all helpful. In lexical analysis, automatic speech recognition and keyword spotting are the two main techniques. A fusion method then combines the parts. These analysis methods apply to education research areas such as collaboration, argumentation and reasoning, teacher questioning, facilitation, classroom ecology, and student motivation and engagement. The approach provides feedback for teaching, enables insider data collection and auto-extraction of argumentation instances, and enables longitudinal analysis of learning at scale. Finally, a few barriers remain: error rates are still high for automated speech recognition of speech signals; youth speech development is an issue; linguistic variation is highly significant, both from youth to adults and across ages; naturalistic speech patterns are unexplored; naturalistic acoustic environments are challenging; and few datasets of child speech exist.
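To make the speech signal analysis stage more concrete, the following is a minimal sketch of how word counts, turn counts, and overlapped speech might be computed once diarization and transcription have produced labeled segments. The segment format, the speaker labels, the sample data, and the function name `speech_stats` are all assumptions made for illustration; they do not come from Dr. Dorsey's presentation or from any particular toolkit.

```python
from collections import defaultdict

# Hypothetical diarized and transcribed segments:
# (speaker label, start time in seconds, end time in seconds, transcript).
segments = [
    ("child_a", 0.0, 2.0, "look at the robot"),
    ("robot", 2.5, 5.0, "hello what is your name"),
    ("child_b", 4.5, 6.0, "my name is sam"),
    ("child_a", 6.5, 7.0, "hi"),
    ("child_a", 7.2, 8.0, "can it dance"),
]


def speech_stats(segments):
    """Return (word counts, turn counts, overlapped seconds) for the segments."""
    ordered = sorted(segments, key=lambda seg: seg[1])
    words = defaultdict(int)
    turns = defaultdict(int)
    previous_speaker = None
    for speaker, _start, _end, text in ordered:
        words[speaker] += len(text.split())
        # A new turn begins whenever the speaker changes.
        if speaker != previous_speaker:
            turns[speaker] += 1
            previous_speaker = speaker

    # Sum pairwise overlap between segments of different speakers.
    # With three or more simultaneous speakers this counts each pair once.
    overlap = 0.0
    for i, (spk_i, _start_i, end_i, _) in enumerate(ordered):
        for spk_j, start_j, end_j, _ in ordered[i + 1:]:
            if start_j >= end_i:
                break  # segments are sorted by start time
            if spk_i != spk_j:
                overlap += min(end_i, end_j) - start_j
    return dict(words), dict(turns), overlap


words, turns, overlap = speech_stats(segments)
print(words, turns, overlap)
```

In this sketch, consecutive segments from the same speaker count as a single turn, which is one of several reasonable definitions; a study would need to fix its own definition before comparing turn counts across sessions.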

Watch Video  

Abeer Alwan

Title: Recognizing Children’s Speech

Summary: Dr. Abeer Alwan (Professor at University of California-Los Angeles) presented speech processing techniques. Dr. Alwan argued that the variability in the way humans produce speech due to, for example, gender, accent, age, and emotion necessitates data-driven approaches to capture significant trends and behavior in the data. The same variability, however, may not be modeled adequately by such systems, especially if data are limited and corrupted by noise. Challenges in automatic speech recognition of children’s speech include (1) the lack of large databases of children’s speech, together with significant intra- and inter-speaker variability, (2) significant variability in pronunciations due to different linguistic backgrounds and misarticulations, (3) low signal-to-noise ratio in the classroom, and (4) distinguishing reading errors from pronunciation differences. Due to these challenges, the processing of children’s speech has significantly higher error rates than that of adult speech. Dr. Alwan developed pronunciation modeling and hypotheses to deal with children’s speech processing. Some preliminary findings showed characteristics of vowels and consonants. Dr. Alwan identified the need for an "expert human" (gold standard) in the loop for designing and evaluating the system, as well as the importance of professional development and attention to ESL. Dr. Alwan plans to collect more data from reading, story retelling, and other tasks. Dr. Alwan is interested in looking into shared datasets and also plans to measure longitudinally how children’s speech changes with age, especially at the early ages (5-8). Dr. Alwan will also look into challenging cases, such as autism and ESL learners, which might help develop better systems for all (the Edison example).
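As a small illustration of the signal-to-noise ratio mentioned among the challenges, the sketch below computes SNR in decibels as ten times the base-10 logarithm of the ratio of mean speech power to mean noise power. It assumes that separate speech and noise-only sample sequences are already available, which is itself a simplification for classroom recordings; the function name `snr_db` and the sample values are hypothetical.

```python
import math


def snr_db(speech, noise):
    """Signal-to-noise ratio in dB from mean power of two sample sequences."""
    if not speech or not noise:
        raise ValueError("both sample sequences must be non-empty")
    speech_power = sum(x * x for x in speech) / len(speech)
    noise_power = sum(x * x for x in noise) / len(noise)
    if noise_power == 0:
        raise ValueError("noise power is zero; SNR is undefined")
    return 10.0 * math.log10(speech_power / noise_power)


# Hypothetical samples: speech amplitude 1.0, background noise amplitude 0.1.
speech_samples = [1.0, -1.0] * 4
noise_samples = [0.1, -0.1] * 4
print(round(snr_db(speech_samples, noise_samples), 2))
```

A tenfold difference in amplitude corresponds to a hundredfold difference in power, which is why this toy example comes out at about 20 dB.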

Watch Video  

Ajith Alexander

Title: Out of the Mouths of Babes: Speaker Diarization & Recognition in Children

Summary: Dr. Ajith Alexander (President and CEO at Oxford Wave Research U.S.) presented speaker diarization. Dr. Alexander’s team specializes in research and bespoke product development, speaker recognition, speaker diarization, mobile audio and voice analysis, media similarity and time-synchronisation, and forensic audio enhancement. He performed an analysis of sample audio from Dr. Yanghee Kim’s child-robot interaction classroom recordings and was able to diarize the robot, facilitator, and two children. The major takeaways include the following: (1) diarization works well for two speakers or one-on-one interactions of a child with a robot; (2) accuracy on multiple speakers tends to be low; (3) results are strong for distinguishing “children’s voices” from robot and facilitator voices; (4) separation among the individual children yields weaker results; (5) ambient noise in a classroom environment, including hum from robot movement, poses challenges; (6) the benefit of gender-based separation is weak when children are subjects; and (7) the corpus of data is limited, and diarization of children’s speech as a field is largely nascent. Based on the analysis, Dr. Alexander proposed recommendations on future data collection for best diarization results: (a) constrain the recording environment to as few speakers as possible, (b) mic children individually during the data collection phase to reduce post-activity processing, because post-processing is a harder problem, (c) use a room microphone to pick up background noises so that they can be subtracted from speech, and (d) get children to speak longer phrases at least a few times in a recording rather than giving yes/no answers, because short clusters are harder to work with.

Marilyn Walker

Title: Conversational Agents for Children

Summary: Dr. Marilyn Walker (Professor at University of California-Santa Cruz) introduced her team’s work on open domain dialog with Slugbot as part of the Amazon Alexa Prize Challenge. Many challenges in building dialog agents for children are still to be addressed; these include personalization, scaling conversational interaction, adapting to new domains, and multi-domain and multi-modal dialog systems. Because personalization is impossible with scripted behaviors, the goal is to develop technology that supports dialog interaction around any content, and to handle content that is narratively structured or expository, in addition to the most typical case of content from a structured database. They are also working on producing expressive linguistic and nonverbal behaviors. Currently, the NLDS lab at UCSC has resources that can be useful for interacting with children around the content of a story, such as the DramaBank corpus of Story Intention Graphs (SIG) for Aesop’s Fables, a corpus of gesture-annotated stories, software for expressive personality generation of both verbal and nonverbal behaviors, and software for converting monologic storytellings to first-person dialogic tellings of stories. They plan to work specifically on representations and dialogue management strategies for conversation with children using a robot in the context of the exhibits at UCSC's Seymour Marine Center, controlling the nonverbal behaviors of the robot to demonstrate personality and make the robot engaging to the children. They are also working on an NSF Cyberlearning Grant for literacy that involves an animated agent interacting with children to improve their narrative comprehension and social language skills, with an initial focus on stories such as Aesop's Fables, but possibly expanding to other children's stories.

Watch Video  

Tony Zhao

Title: Collecting Children’s Speech Using DialPort

Summary: Tony Zhao (Doctoral Researcher at Carnegie Mellon University) introduced DialPort, a dialog system that produces natural dialog in prescribed subjects, such as weather and restaurants. His team proposes that one way to collect data from interactions with children would be to involve their mothers. In the past, this solution provided good-quality audio and facilitated IRB approval. Maxine Eskenazi at CMU, who is the PI on DialPort, would be interested in collaborating on an extended version of DialPort that could help with the paucity of databases of children’s speech. They would like to discuss this possibility further as an extension to their current NSF CRI grant in collaboration with others at the workshop.

Watch Video  

Session 5: Ethnographic observations

Laura Ruth Johnson

Title: Ethnographic Observations

Summary: Dr. Laura Johnson (Associate Professor at Northern Illinois University) provided a brief presentation on conducting qualitative ethnographic observations. She began with an overview of ethnographic fieldwork and a discussion of the levels of participation a researcher might take on within fieldwork. She also outlined different types of notes a researcher might use within fieldwork and how these can be used to record descriptive and reflective information and evidence about the setting and participants. In particular, she highlighted Spradley’s matrix for descriptive observations and how it helps researchers pay attention to many elements within observations, including spaces, actors, activities, objects, goals, time, events, and feelings. Dr. Johnson also discussed the use of particular theoretical and methodological approaches to observing children engaged in communication, such as Corsaro’s (2012) work on interpretive reproduction and peer cultures, and the work of Hymes (1974) in the ethnography of communication. Questions after this presentation focused on how to ensure that observations are reliable and consistent across researchers/observers. Dr. Johnson emphasized the importance of researchers acknowledging their distinct lenses and perspectives, as informed by their disciplines and theoretical positions, and how these diverse perspectives can actually strengthen and enhance findings, providing more nuanced and complex explanations of particular phenomena and practices. Some researchers might also make use of procedures and processes, such as calculating inter-rater reliability, which help a team achieve a degree of consensus regarding observations.
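One common way to calculate inter-rater reliability for two observers who assign categorical codes is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is an illustration only; the presentation did not specify which statistic to use, and the code labels ("E" for engaged, "O" for off-task) and observer data are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same items with categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes from two observers for ten observation intervals.
observer_1 = ["E", "E", "E", "E", "E", "E", "O", "O", "O", "O"]
observer_2 = ["E", "E", "E", "E", "E", "O", "O", "O", "O", "E"]
print(round(cohens_kappa(observer_1, observer_2), 3))
```

Here the observers agree on 8 of 10 intervals, but because chance agreement is 0.52 given their label frequencies, kappa comes out noticeably lower than the raw 80% agreement.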

Watch Video