Research & Publications – Cognitive Assistive Robotics Lab

Learning Temporal Features from Video Demonstrations

Temporal features, relationships between spatial features across time, are integral to understanding many of the tasks that humans engage in daily. Unfortunately, learning temporal features directly from video is a complex problem and contemporary deep learning-based models are duration-specific making them vulnerable to variations in the duration over which activities are expressed and limiting the scale of the data they can model. I address an approach that can overcome these challenges when capturing temporal features from video of human-led demonstrations of sequential tasks. Inference using these temporal features is then leveraged for better performance on policy learning and activity recognition tasks.

Sponsor: National Science Foundation IIS 1664554

Investigator: Madison Clark-Turner

Publications:

Hussein Mostafa*, Madison Clark-Turner*, and Begum Momotaz. A Hierarchical Approach for Learning Multi-Step Sequential Tasks from Video Demonstrations (in preparation).
Madison Clark-Turner and Begum Momotaz. Understanding Temporal Relations from Video: A Pathway to Learning Sequential Tasks from Visual Demonstrations (in preparation).

Policy Learning for Sequential Tasks from Demonstrations

We are interested in learning the high-level policy of multi-step sequential (MSS) tasks, such as activities of daily living, from video demonstrations. Videos of MSS tasks are typically long in duration and exhibit large feature variance, especially when captured in non-engineered settings. Learning task policy from such videos using state-of-the-art end-to-end approaches is sample inefficient due to a reliance on pixel-level information. Understanding the unique temporal structures of MSS tasks can make policy learning easier and sample efficient. However, understanding this temporal structure requires analyzing the entire content of the video or a task which is a complex and under-explored area in the current literature. We propose a hierarchical solution to this problem where i) an automated feature selection process extrapolates temporally grounded task-relevant features from the video and then ii) a stochastic policy learning model learns a feature-constrained task policy. The proposed model is sample efficient as a result of substituting selected temporally-grounded features for pixel-level information. We demonstrate the efficacy of our proposed framework by teaching a YuMi robot a tea making task from videos and then comparing our approach’s performance to a behavioral cloning baseline.

Sponsor: National Science Foundation IIS 1830597

Investigator: Mostafa Hussein

Publications:

Mostafa Hussein, Brendan Crowe, Madison Clark-Turner, Petrik Marek, and Begum Momotaz. Robust Behavior Cloning with Adversarial Demonstration Detection , IROS 2021.
Hussein Mostafa, Brendan Crowe, Madison Clark-Turner, Petrik Marek, and Begum Momotaz. Robust Maximum Entropy Behavior cloning, NeurIPS Workshop on Robot Learning, 2020).
Hussein Mostafa, Madison Clark-Turner, and Begum Momotaz. Detecting Incorrect Visual Demonstrations for Improved Policy Learning, CoRL 2022, NZ.

Trajectory Learning from Demonstrations

This project is focused on trajectory learning from demonstration (LfD) for assistive robotics. The primary objectives of trajectory LfD are to learn motion tasks from demonstrations and exhibit robustness to spatial and temporal perturbations. Depending on the operational environment, the specific objectives of LfD algorithms may very. For rehabilitation robotics, a robot therapist must demonstrate proper exercise movements and provide feedback on a human’s exercise execution. In a workplace setting, a robot collaborator should fluently hand over objects to its human counterpart. In all applications, the LfD algorithm must provide a feasible trajectory for the robot to execute, ideally in the form of a controller. The key to success in multi-purpose trajectory LfD lies in 1) controller expressivity, which is the set of trajectories that can be reproduced, and 2) constraint enforcement, which eliminates infeasible trajectories.

Three different learning strategies have been developed, namely controller learning via regression, controller learning via optimization, and controller learning via inverse optimal control. These approaches have been published in ICRA 2019, IROS 2020, IROS 2021. Videos from each publication are shown below.

Sponsor: National Science Foundation IIS 1830597, UNH CoRE initiative 2018, 2017

Investigator: Paul Gesel

Publications:

1. Paul Gesel, Noushad Sojib and Momotaz Begum, Self-Supervised Visual Motor Skills Via Neural Radiance Fields, IROS 2023

2. Paul Gesel and Momotaz Begum. Learning Stable Dynamics via Iterative Quadratic Programming, under review, ICRA 2023

3. Paul Gesel, D. LaRoche, S. Arthanat, and M. Begum, Learning to Optimize Control Policies and Evaluate Reproduction Performance from Human Demonstration, IROS 2021

4. Paul Gesel, D. LaRoche, S. Arthanat, and M. Begum, Learning Adaptive Human Motion viaPhase Space Analysis of Demonstrated Trajectories IROS 2020

5. Paul Gesel, M. Begum, and D. LaRoche, Learning Motion Trajectories from Phase Space Analysis of the Demonstration, ICRA 2019

Care Robot for Older Adults with Dementia

This project is focused on designing an autonomous mobile robot capable of helping older adults with dementia to age in place through helping them in health-care and home safety related activities. We designed a preliminary prototype that leverages PDDL to generate a set of care plans on the fly based on real-time sensor inputs.

Investigators: This is a multidisciplinary project involving Tianyi Gu (CS), Sajay Arthanat (OT), Dain LaRoche (ES), and Dongpeng Xu (CS)

Publications:

S. Arthanat, M. Begum, T. Gu, D. P LaRoche, D. Xu, and N.Zhang, Caregiver perspectives on a smart home-based socially assistive robot for individuals with Alzheimer’s disease and related dementia, Disability and Rehabilitation: Assistive Technology, 1-10, 2020
T. Gu, M. Begum, N. Zhang, D. Xu, S. Arthanat and D. LaRoche, Adaptive Software Framework for Dementia-care Robot, Workshop on Planning and Robotics, International Conference on Automated Planning and Scheduling, 2020

Sponsor: UNH CoRE initiative 2019

Deep Q-learning of Human-Robot Interaction for Robot-mediated Behavior Intervention

This research is focused on merging and transitioning deep learning-based Learning from Demonstrations (LfD) techniques to real world human-robot interaction (HRI) applications. We have designed a deep Q network that can learn a structured sequential task from demonstration data. We have tested the network to learn a behavioral intervention where the goal is to teach children with developmental delays (such as autism spectrum disorders) how to respond to a greeting in a socially appropriate manner, i.e. a social greeting intervention. The intervention follows the principle of applied behavior analysis (ABA). The video below shows an example of this work (published in HRI 2018)

Since deep networks require a massive amount of data for training, a criteria difficult to meet by most HRI problems, we are currently investigating on designing a compact representation of long duration video data using interval algebraic relationships among events observed in the video.

Investigator: Madison Clark-Turner

Publications:

M. Turner and M. Begum, Deep Reinforcement Learning of Abstract Reasoning from Demonstration, ACM/IEEE International Conference on Human Robot Interaction, 2018
M. Turner and M. Begum, Deep Recurrent Q-Learning of Behavioral Intervention Delivery by a Robot from Demonstration Data, IEEE International Symposium on Robot and Human Interactive Communications, 2017
M. Turner and M. Begum, Learning to Deliver Robot-Mediated Behavioral Intervention, Workshop on Human-centered Robotics, Robotics Science and Systems 2017

Sponsor: National Science Foundation IIS 1664554

Reward Function Learning from Demonstrations of Sequential Tasks

One way to learn long duration sequential tasks from demonstrations is to learn the reward function of the demonstrator. The goal of this project is to learn the reward function in partially observable environments. Learning the reward function of a Partially Observable Markov Decision Process (POMDP) is not a well understood problem. Alternatively, we propose to reduce the underlying POMDP to an MDP and and extract the reward function using an efficient MDP-IRL algorithm. Our extensive experiments suggest that the reward function learned this way generates POMDP policies that mimic the policies of the demonstrator well. We tested this approach to learn the reward function of a mock therapist as s/he delivers an ABA-style social greeting intervention. The video below shows a robot delivering different steps of the intervention learned through learning the reward structures from a set of demonstrations.

Investigator: Mostafa Hussein

Publications:

Mostafa Hussein, Momotaz Begum, Marek Petrik, Inverse Reinforcement Learning of Interaction Dynamics from Demonstrations, ICRA 2019.

Sponsor: National Science Foundation IIS 1830597

Activity Recognition using Interval Temporal Relations among Shallow Features

This project investigates the possibility of replacing pixel-based representation of state space in deep learning with a novel interval temporal relations-based representation among observed events in the video. We are currently conducting rigorous testing of the performance of Interval Attribute Descriptors (IAD) generated by 3D Convolutional Neural Networks as a method to represent the interval temporal relationships of human activities in publicly available video data-sets such as UCF101.

Investigator: Jordan Chadwick

Temporal Context Graph for Policy Selection in Sequential Tasks

Complex everyday activities often possess rich temporal patterns, along with spatial features, that can be exploited to learn such tasks from a limited number of demonstrations. This projects develops a temporal context graph (TCG) based on Allen’s interval temporal Algebra that helps to select the correct policy based on the learned temporal structure of a task. TCG is a capable of dealing with tasks with cyclical atomic actions and consisting of sequential and parallel temporal relations. The video below shows the performance of TCG in learning two sequential tasks from demonstrations.

Investigator: Estuardo Carpio, Madison Clark-Turner

Publications:

Estuardo Carpio, Madison Clark-Turner, Paul Gesel, and M. Begum, Leveraging Temporal Reasoning for Policy Selection in Learning from Demonstration, ICRA 2019

Estuardo Carpio, Madison Clark-Turner, Paul Gesel, and M. Begum, Learning Sequential Human-Robot Interaction Tasks from Demonstrations: The Role of Temporal Reasoning, RoMan 2019

Sponsor: National Science Foundation IIS 1664554, UNH CoRE initiative

Exercise Sequence Learning from Demonstrations

Publications:

A. Lydakis, Y. Meng, C. Munroe, Yi-Ning Wu, M. Begum, A Learning-based Agent for Home Neurorehabilitation, IEEE International Conference or Rehabilitation Robotics, ICORR 2017, Pubmed ID: 28813990
A. Lydakis, Pei-Chun Kao, and M. Begum, Irregular Gait Detection using Wearable Sensors, PErvasive Technologies Related to Assistive Environments (PETRA) 2017
Y. Meng, C. Munroe, Y. Wu, and M. Begum, Learning from Demonstration Framework to Promote Home-based Neuromotor Rehabilitation, IEEE International Symposium on Robot and Human Interactive Communication, RoMAN 2016
C. Munroe, Y. Meng, H. Yanco, and M. Begum, Augmented Reality Eyeglasses for Promoting Home-Based Rehabilitation for
Children with Cerebral Palsy, ACM/IEEE International Conference on Human-Robot Interaction, HRI 2016 (video submission)

Publications from Momotaz’s Previous Research

Publications