1st Workshop on Understanding Human Activities: Context and Interactions

The aim of this workshop is to bring together researchers in computer vision and machine learning to share ideas and propose solutions on how to address the many aspects of activity recognition, and to present new datasets that introduce new challenges in the field. Activity recognition is one of the core problems in computer vision, and it has recently attracted the attention of many researchers in the field. It is significant to many vision-related applications such as surveillance, video search, human-computer interaction, and human-human, or social, interactions. Recent advances in feature representations, modeling, and inference techniques have led to significant progress in the field.

Motivated by the rich and complex temporal, spatial, and social structure of human activities, activity recognition today features several new challenges, including modeling group activities, complex temporal reasoning, activity hierarchies, human-object interactions, and human-scene interactions. These new challenges aim to answer questions regarding the semantic understanding and high-level reasoning of image and video content. At this level, other classical problems in computer vision, like object detection and tracking, not only impact activity recognition but are often intertwined with it. This inherent complexity calls for more time and thought to be spent on solutions to the auxiliary problems surrounding human activity recognition. Some of the fundamental questions that we pose are:

How can we model human behavior on a spatio-temporal level for both individuals and groups?
How can we successfully represent interactions between group activities and individual activities?
Can inter-individual and inter-group interactions be modeled? How would they affect human behavior and improve activity recognition?
How do we leverage tracks and identities to improve the performance of activity recognition?
What can the scene layout (indoors, street, field, etc.) tell us about the individual actions?
How can we combine kinematic models and object detectors to model human-object interactions?
How can hierarchical representations of actions (sub-actions, attributes, etc.) help improve recognition performance?
How do we apply logic programming and knowledge bases to recognize activities?
Can we model social interactions between people and groups?

Call for Papers

The workshop invites interested participants to submit papers presenting original research in computer vision, pattern recognition, human science, and behavioral modeling. Topics of interest include, but are by no means limited to:

Action recognition from still images or videos
Spatio-temporal modeling of human activities
Human behavioral modeling
Modeling human-object interactions
Modeling scene context for activity recognition
Group and inter-group activity recognition
Individual and group activity prediction
Surveillance and video analysis
Video search and indexing
Crowd analysis
New action recognition datasets
Theoretical results with application to action recognition

We also invite both application-driven and theoretical submissions from other related domains. All submissions should present work relevant to the workshop theme. Papers must be in PDF format and must not exceed 8 pages. Authors will also have the chance to submit up to 5 MB of supplementary material. All submissions are subject to a double-blind review process by the program committee. Paper submissions must adhere to the same formatting and the same policies established for the main conference. Dual submissions to any other workshop or conference are not allowed.

Authors should prepare their submissions using the official ICCV author kit. Paper submissions will be handled electronically through the CMT submission portal for the workshop. Submissions are now open and the deadline has been extended to Friday the 13th!

Important Dates

Paper Submission Deadline - September 13th, 2013 (Midnight EDT) (passed)
Author Notification Deadline - September 30th, 2013 (Midnight EDT)
Camera Ready Deadline - October 7th, 2013 (Midnight EDT)
Official Workshop Date - December 8th, 2013

Dates are tentative and subject to change.

Speakers

Abhinav Gupta - Carnegie Mellon University, USA
Ivan Laptev - INRIA / Ecole Normale Superieure, France
Greg Mori - Simon Fraser University, Canada
Michael S. Ryoo - NASA / Jet Propulsion Laboratory, USA
Ashutosh Saxena - Cornell University, USA

Schedule

08:45 - 09:00	Opening
09:00 - 09:40	Keynote 1: Greg Mori	Discriminative Latent Variable Models for Human Action Recognition
09:40 - 10:00	Oral 1: Norimichi Ukita (NAIST)	Iterative Action and Pose Recognition using Global-and-Pose Features and Action-specific Models
10:00 - 10:30	Morning Break
10:30 - 11:10	Keynote 2: Michael S. Ryoo	First-Person Activity Recognition: Understanding Human Interactions from Egocentric Videos
11:10 - 11:50	Keynote 3: Ashutosh Saxena	Learning Grounded Object Affordances for Human Activity Anticipation
11:50 - 12:10	Oral 2: Natalia Neverova (INSA-Lyon)	A Multi-Scale Approach to Gesture Detection and Recognition
12:10 - 12:30	Oral 3: Ognjen Rudovic (Imperial College London)	Context-sensitive Conditional Ordinal Random Fields for Facial Action Intensity Estimation
12:30 - 13:20	Lunch
13:20 - 14:00	Keynote 4: Ivan Laptev	Learning Actions from Auxiliary Data
14:00 - 14:40	Keynote 5: Abhinav Gupta	Primitives for Understanding Actions and Prediction
14:40 - 15:00	Oral 4: Moin Nabi (Istituto Italiano di Tecnologia)	Temporal Poselets for Collective Activity Detection and Recognition
15:00 - 15:20	Oral 5: Victor Escorcia (Universidad del Norte)	Spatio-Temporal Human-Object Interactions for Action Recognition in Videos
15:20 - 15:40	Oral 6: Borislav Antic (University of Heidelberg)	Less is More: Video Trimming for Action Recognition
15:40 - 16:10	Afternoon Break
16:10 - 17:00	Panel Discussion

People

Organizers / Co-Chairs

Sameh Khamis - University of Maryland, USA
Mohamed R. Amer - Oregon State University, USA
Wongun Choi - NEC Laboratories, USA
Tian Lan - Stanford University, USA

Any questions should be directed to the organizers at unde...@gmail.com.

Program Committee

J.K. Aggarwal - University of Texas, Austin, USA
Dhruv Batra - Virginia Tech, USA
William Brendel - A9.com, USA
Asad Butt - Pennsylvania State University, USA
Jana Doppa - Oregon State University, USA
Alireza Fathi - Stanford University, USA
Peter Gehler - Max Planck Institute, Tubingen, Germany
Martin Hofmann - TU Munich, Germany
Jeremy Jancsary - Microsoft Research Cambridge, UK
Saad Khan - SRI International, USA
Kris Kitani - Carnegie Mellon University, USA
Christoph Lampert - IST Austria
Subhransu Maji - Toyota Technological Institute at Chicago, USA
Anton Milan - TU Darmstadt, Germany
Vlad Morariu - University of Maryland, USA
Vittorio Murino - University of Verona / Italian Institute of Technology, Italy
Ram Nevatia - University of Southern California, USA
Juan Carlos Niebles - Universidad del Norte, Colombia
Sebastian Nowozin - Microsoft Research Cambridge, UK
Devi Parikh - Virginia Tech, USA
Hamed Pirsiavash - MIT, USA
Bernt Schiele - Max Planck Institute, Saarbrucken, Germany
Behjat Siddiquie - SRI International, USA
Leonid Sigal - Disney Research Pittsburgh, USA
Min Sun - University of Washington, USA
Amir Tamrakar - SRI International, USA
Yang Wang - University of Manitoba, Canada
Zhenhua Wang - University of Adelaide, Australia
Weilong Yang - Google Research, USA
Bangpeng Yao - Stanford University, USA

Resources

References

Mohamed R. Amer and Sinisa Todorovic. "A Chains Model for Localizing Participants of Group Activities in Videos." ICCV, 2011.
Mohamed R. Amer, Dan Xie, Mingtian Zhao, Sinisa Todorovic, and Song-Chun Zhu. "Cost-Sensitive Top-down/Bottom-up Inference for Multiscale Activity Recognition." ECCV, 2012.
William Brendel, Sinisa Todorovic, and Alan Fern. "Probabilistic Event Logic for Interval-Based Event Recognition." CVPR, 2011.
Wongun Choi and Silvio Savarese. "A Unified Framework for Multi-Target Tracking and Collective Activity Recognition." ECCV, 2012.
Wongun Choi, Khuram Shahid, and Silvio Savarese. "Learning Context for Collective Activity Recognition." CVPR, 2011.
Vincent Delaitre, Josef Sivic, and Ivan Laptev. "Learning Person-Object Interactions for Action Recognition in Still Images." NIPS, 2011.
Alireza Fathi, Ali Farhadi, and James M. Rehg. "Understanding Egocentric Activities." ICCV, 2011.
Abhinav Gupta, Aniruddha Kembhavi, and Larry S. Davis. "Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition." PAMI, 2009.
Abhinav Gupta, Praveen Srinivasan, Jianbo Shi, and Larry S. Davis. "Understanding Videos, Constructing Plots: Learning a Visually Grounded Storyline Model from Annotated Videos." CVPR, 2009.
Sameh Khamis, Vlad I. Morariu, and Larry S. Davis. "A Flow Model for Joint Action Recognition and Identity Maintenance." CVPR, 2012.
Sameh Khamis, Vlad I. Morariu, and Larry S. Davis. "Combining Per-Frame and Per-Track Cues for Multi-Person Action Recognition." ECCV, 2012.
Kris Kitani, Takahiro Okabe, Yoichi Sato, and Akihiro Sugimoto. "Fast Unsupervised Ego-Action Learning for First-Person Sports Videos." CVPR, 2011.
Tian Lan, Yang Wang, Weilong Yang, Stephen N. Robinovitch, and Greg Mori. "Discriminative Latent Models for Recognizing Contextual Group Activities." PAMI, 2012.
Tian Lan, Leonid Sigal, and Greg Mori. "Social Roles in Hierarchical Models for Human Activity Recognition." CVPR, 2012.
Ruonan Li, Rama Chellappa, and Shaohua Kevin Zhou. "Learning Multi-modal Densities on Discriminative Temporal Interaction Manifold for Group Activity Recognition." CVPR, 2009.
Ruonan Li, Parker Porfilio, and Todd Zickler. "Finding Group Interactions in Social Clutter." CVPR, 2013.
Patrick Lucey, Alina Bialkowski, Peter Carr, Iain Matthews, and Yaser Sheikh. "Representing and Discovering Adversarial Team Behaviors using Player Roles." CVPR, 2013.
Marcin Marszalek, Ivan Laptev, and Cordelia Schmid. "Actions in Context." CVPR, 2009.
Vlad I. Morariu and Larry S. Davis. "Multi-Agent Event Recognition in Structured Scenarios." CVPR, 2011.
Alonso Patron-Perez, Marcin Marszalek, Ian Reid, and Andrew Zisserman. "Structured Learning of Human Interactions in TV Shows." PAMI, 2012.
Hamed Pirsiavash and Deva Ramanan. "Detecting Activities of Daily Living in First-Person Camera Views." CVPR, 2012.
Vignesh Ramanathan, Bangpeng Yao, and Li Fei-Fei. "Social Role Discovery in Human Events." CVPR, 2013.
Mikel Rodriguez, Josef Sivic, Ivan Laptev, and Jean-Yves Audibert. "Data-driven Crowd Analysis in Videos." ICCV, 2011.
Michael S. Ryoo and J.K. Aggarwal. "Spatio-Temporal Relationship Match: Video Structure Comparison for Recognition of Complex Human Activities." ICCV, 2009.
Michael S. Ryoo and J.K. Aggarwal. "Stochastic Representation and Recognition of High-level Group Activities." IJCV, 2011.
Eran Swears and Anthony Hoogs. "Learning and Recognizing Complex Multi-Agent Activities with Applications to American Football Plays." WACV, 2012.
Yang Wang, Duan Tran, Zicheng Liao, and David Forsyth. "Discriminative Hierarchical Part-based Models for Human Parsing and Action Recognition." JMLR, 2012.
Bangpeng Yao and Li Fei-Fei. "Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses." PAMI, 2012.

1st Workshop on Understanding Human Activities: Context and Interactions (HACI 2013), in conjunction with the International Conference on Computer Vision (ICCV) 2013