The aim of this workshop is to bring together researchers in computer vision and machine learning to share ideas and propose solutions for the many aspects of activity recognition, and to present new datasets that introduce new challenges to the field. Activity recognition is one of the core problems in computer vision and has recently attracted the attention of many researchers. It is significant to many vision-related applications, such as surveillance, video search, human-computer interaction, and human-human (social) interaction. Recent advances in feature representations, modeling, and inference techniques have led to significant progress in the field.

Motivated by the rich and complex temporal, spatial, and social structure of human activities, activity recognition today features several new challenges, including modeling group activities, complex temporal reasoning, activity hierarchies, human-object interactions, and human-scene interactions. These challenges aim to answer questions about the semantic understanding and high-level reasoning of image and video content. At this level, other classical problems in computer vision, such as object detection and tracking, not only impact activity recognition but are often intertwined with it. This inherent complexity calls for more time and thought to be spent on developing solutions to problems auxiliary to human activity recognition. Some of the fundamental questions we pose are:

  • How can we model human behavior on a spatio-temporal level for both individuals and groups?
  • How can we successfully represent interactions between group activities and individual activities?
  • Can inter-individual and inter-group interactions be modeled? How would they affect human behavior and improve activity recognition?
  • How do we leverage tracks and identities to improve the performance of activity recognition?
  • What can the scene layout (indoors, street, field, etc.) tell us about the individual actions?
  • How can we combine kinematic models and object detectors to model human-object interactions?
  • How can hierarchical representations of actions (sub-actions, attributes, etc.) help improve recognition performance?
  • How do we apply logic programming and knowledge bases to recognize activities?
  • Can we model social interactions between people and groups?

Call for Papers

The workshop invites interested participants to submit papers presenting original research in computer vision, pattern recognition, the human sciences, and behavioral modeling. Topics of interest include, but are by no means limited to:

  • Action recognition from still images or videos
  • Spatio-temporal modeling of human activities
  • Human behavioral modeling
  • Modeling human-object interactions
  • Modeling scene context for activity recognition
  • Group and inter-group activity recognition
  • Individual and group activity prediction
  • Surveillance and video analysis
  • Video search and indexing
  • Crowd analysis
  • New action recognition datasets
  • Theoretical results with applications to action recognition

We also invite both application-driven and theoretical submissions from other related domains. All submissions should present work relevant to the workshop theme. Papers must be in PDF format and must not exceed 8 pages. Authors may also submit up to 5 MB of supplementary material. All submissions are subject to a double-blind review by the program committee. Paper submissions must adhere to the same formatting and the same policies established for the main conference. Dual submission to any other workshop or conference is not allowed.

Authors should prepare their submissions using the official ICCV author kit. Paper submissions will be handled electronically through the workshop's CMT submission portal. Submissions are now open, and the deadline has been extended to Friday, September 13th, 2013.

Important Dates

  • Paper Submission Deadline: September 13th, 2013 (Midnight EDT)
  • Author Notification Deadline: September 30th, 2013 (Midnight EDT)
  • Camera Ready Deadline: October 7th, 2013 (Midnight EDT)
  • Official Workshop Date: December 8th, 2013

Dates are tentative and subject to change.



Program

08:45 - 09:00  Opening
09:00 - 09:40  Keynote 1: Greg Mori, Discriminative Latent Variable Models for Human Action Recognition
09:40 - 10:00  Oral 1: Norimichi Ukita (NAIST), Iterative Action and Pose Recognition using Global-and-Pose Features and Action-specific Models [PDF]
10:00 - 10:30  Morning Break
10:30 - 11:10  Keynote 2: Michael S. Ryoo, First-Person Activity Recognition: Understanding Human Interactions from Egocentric Videos
11:10 - 11:50  Keynote 3: Ashutosh Saxena, Learning Grounded Object Affordances for Human Activity Anticipation
11:50 - 12:10  Oral 2: Natalia Neverova (INSA-Lyon), A Multi-Scale Approach to Gesture Detection and Recognition [PDF]
12:10 - 12:30  Oral 3: Ognjen Rudovic (Imperial College London), Context-sensitive Conditional Ordinal Random Fields for Facial Action Intensity Estimation [PDF]
12:30 - 13:20  Lunch
13:20 - 14:00  Keynote 4: Ivan Laptev, Learning Actions from Auxiliary Data
14:00 - 14:40  Keynote 5: Abhinav Gupta, Primitives for Understanding Actions and Prediction
14:40 - 15:00  Oral 4: Moin Nabi (Istituto Italiano di Tecnologia), Temporal Poselets for Collective Activity Detection and Recognition [PDF]
15:00 - 15:20  Oral 5: Victor Escorcia (Universidad del Norte), Spatio-Temporal Human-Object Interactions for Action Recognition in Videos [PDF]
15:20 - 15:40  Oral 6: Borislav Antic (University of Heidelberg), Less is More: Video Trimming for Action Recognition [PDF]
15:40 - 16:10  Afternoon Break
16:10 - 17:00  Panel Discussion



Any questions should be directed to the organizers at

Program Committee


References

  • Mohamed R. Amer and Sinisa Todorovic. "A Chains Model for Localizing Participants of Group Activities in Videos." ICCV, 2011.
  • Mohamed R. Amer, Dan Xie, Mingtian Zhao, Sinisa Todorovic, and Song-Chun Zhu. "Cost-Sensitive Top-down/Bottom-up Inference for Multiscale Activity Recognition." ECCV, 2012.
  • William Brendel, Sinisa Todorovic, and Alan Fern. "Probabilistic Event Logic for Interval-Based Event Recognition." CVPR, 2011.
  • Wongun Choi and Silvio Savarese. "A Unified Framework for Multi-Target Tracking and Collective Activity Recognition." ECCV, 2012.
  • Wongun Choi, Khuram Shahid, and Silvio Savarese. "Learning Context for Collective Activity Recognition." CVPR, 2011.
  • Vincent Delaitre, Josef Sivic, and Ivan Laptev. "Learning Person-Object Interactions for Action Recognition in Still Images." NIPS, 2011.
  • Alireza Fathi, Ali Farhadi, and James M. Rehg. "Understanding Egocentric Activities." ICCV, 2011.
  • Abhinav Gupta, Aniruddha Kembhavi, and Larry S. Davis. "Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition." PAMI, 2009.
  • Abhinav Gupta, Praveen Srinivasan, Jianbo Shi, and Larry S. Davis. "Understanding Videos, Constructing Plots: Learning a Visually Grounded Storyline Model from Annotated Videos." CVPR, 2009.
  • Sameh Khamis, Vlad I. Morariu, and Larry S. Davis. "A Flow Model for Joint Action Recognition and Identity Maintenance." CVPR, 2012.
  • Sameh Khamis, Vlad I. Morariu, and Larry S. Davis. "Combining Per-Frame and Per-Track Cues for Multi-Person Action Recognition." ECCV, 2012.
  • Kris Kitani, Takahiro Okabe, Yoichi Sato, and Akihiro Sugimoto. "Fast Unsupervised Ego-Action Learning for First-Person Sports Videos." CVPR, 2011.
  • Tian Lan, Yang Wang, Weilong Yang, Stephen N. Robinovitch, and Greg Mori. "Discriminative Latent Models for Recognizing Contextual Group Activities." PAMI, 2012.
  • Tian Lan, Leonid Sigal, and Greg Mori. "Social Roles in Hierarchical Models for Human Activity Recognition." CVPR, 2012.
  • Ruonan Li, Rama Chellappa, and Shaohua Kevin Zhou. "Learning Multi-modal Densities on Discriminative Temporal Interaction Manifold for Group Activity Recognition." CVPR, 2009.
  • Ruonan Li, Parker Porfilio, and Todd Zickler. "Finding Group Interactions in Social Clutter." CVPR, 2013.
  • Patrick Lucey, Alina Bialkowski, Peter Carr, Iain Matthews, and Yaser Sheikh. "Representing and Discovering Adversarial Team Behaviors using Player Roles." CVPR, 2013.
  • Marcin Marszalek, Ivan Laptev, and Cordelia Schmid. "Actions in Context." CVPR, 2009.
  • Vlad I. Morariu and Larry S. Davis. "Multi-Agent Event Recognition in Structured Scenarios." CVPR, 2011.
  • Alonso Patron-Perez, Marcin Marszalek, Ian Reid, and Andrew Zisserman. "Structured Learning of Human Interactions in TV Shows." PAMI, 2012.
  • Hamed Pirsiavash and Deva Ramanan. "Detecting Activities of Daily Living in First-Person Camera Views." CVPR, 2012.
  • Vignesh Ramanathan, Bangpeng Yao, and Li Fei-Fei. "Social Role Discovery in Human Events." CVPR, 2013.
  • Mikel Rodriguez, Josef Sivic, Ivan Laptev, and Jean-Yves Audibert. "Data-driven Crowd Analysis in Videos." ICCV, 2011.
  • Michael S. Ryoo and J.K. Aggarwal. "Spatio-Temporal Relationship Match: Video Structure Comparison for Recognition of Complex Human Activities." ICCV, 2009.
  • Michael S. Ryoo and J.K. Aggarwal. "Stochastic Representation and Recognition of High-level Group Activities." IJCV, 2011.
  • Eran Swears and Anthony Hoogs. "Learning and Recognizing Complex Multi-Agent Activities with Applications to American Football Plays." WACV, 2012.
  • Yang Wang, Duan Tran, Zicheng Liao, and David Forsyth. "Discriminative Hierarchical Part-based Models for Human Parsing and Action Recognition." JMLR, 2012.
  • Bangpeng Yao and Li Fei-Fei. "Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses." PAMI, 2012.



Computer Vision Foundation IEEE Computer Society