TY - GEN
T1 - HMDB: A large video database for human motion recognition
T2 - 2011 IEEE International Conference on Computer Vision, ICCV 2011
AU - Kuehne, H.
AU - Jhuang, H.
AU - Garrote, E.
AU - Poggio, T.
AU - Serre, T.
PY - 2011
Y1 - 2011
AB - With nearly one billion online videos viewed every day, an emerging new frontier in computer vision research is recognition and search in video. While much effort has been devoted to the collection and annotation of large-scale static image datasets containing thousands of image categories, human action datasets lag far behind. Current action recognition databases contain on the order of ten different action categories collected under fairly controlled conditions. State-of-the-art performance on these datasets is now near ceiling, and thus there is a need for the design and creation of new benchmarks. To address this issue we collected the largest action video database to date with 51 action categories, which in total contain around 7,000 manually annotated clips extracted from a variety of sources ranging from digitized movies to YouTube. We use this database to evaluate the performance of two representative computer vision systems for action recognition and explore the robustness of these methods under various conditions such as camera motion, viewpoint, video quality and occlusion.
UR - https://www.scopus.com/pages/publications/84856682691
U2 - 10.1109/ICCV.2011.6126543
DO - 10.1109/ICCV.2011.6126543
M3 - Conference contribution
AN - SCOPUS:84856682691
SN - 9781457711015
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 2556
EP - 2563
BT - 2011 IEEE International Conference on Computer Vision, ICCV 2011
Y2 - 6 November 2011 through 13 November 2011
ER -