Human action recognition using fusion of modern deep convolutional and recurrent neural networks

EasyChair Preprint 336

5 pages • Date: July 10, 2018

Abstract

This paper studies the application of modern deep convolutional and recurrent neural networks to video classification, specifically human action recognition. A multi-stream architecture is proposed that uses ideas from representation learning to extract embeddings of multimodal features. It is based on 2D convolutional and recurrent neural networks, and its fusion model receives a video embedding as input. Classification is therefore performed on this compact representation of spatial, temporal and audio information. The proposed architecture achieves 93.1% accuracy on UCF101, which is better than the results obtained by models with a similar architecture. It also produces representations that other models can use as features; anomaly detection using autoencoders is proposed as an example of this.
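The abstract does not say how the stream embeddings are combined, so the following is only a minimal sketch under stated assumptions: each stream (spatial, temporal, audio) is assumed to have already been encoded into a fixed-size vector, the fusion step is assumed to be plain concatenation, and the classifier is assumed to be a single linear layer with softmax. The stream names, the functions `fuse_streams`, `classify` and `is_anomalous`, and the `encode`/`decode` callables standing in for a trained autoencoder are all hypothetical illustrations, not the paper's implementation. The CNN/RNN encoders and the training of any weights are not shown.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]

# Hypothetical fixed order in which stream embeddings are concatenated.
STREAM_ORDER = ("spatial", "temporal", "audio")


def fuse_streams(streams: Dict[str, Vector], order: Sequence[str] = STREAM_ORDER) -> Vector:
    """Concatenate per-stream embeddings (assumed fusion) into one video embedding."""
    fused: Vector = []
    for name in order:
        fused.extend(streams[name])
    return fused


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embedding: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Assumed linear classifier: one weight row per action class, then softmax."""
    logits = [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


def reconstruction_error(
    embedding: Vector,
    encode: Callable[[Vector], Vector],
    decode: Callable[[Vector], Vector],
) -> float:
    """Mean squared error between an embedding and its autoencoder reconstruction."""
    reconstructed = decode(encode(embedding))
    return sum((a - b) ** 2 for a, b in zip(embedding, reconstructed)) / len(embedding)


def is_anomalous(
    embedding: Vector,
    encode: Callable[[Vector], Vector],
    decode: Callable[[Vector], Vector],
    threshold: float,
) -> bool:
    """Flag a video as anomalous when its reconstruction error exceeds a threshold."""
    return reconstruction_error(embedding, encode, decode) > threshold


if __name__ == "__main__":
    # Toy 2-D embeddings per stream; real embeddings would come from trained encoders.
    streams = {"spatial": [0.2, 0.1], "temporal": [0.5, -0.3], "audio": [0.0, 0.4]}
    video = fuse_streams(streams)
    # Toy weights for two hypothetical action classes.
    weights = [[1, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0]]
    print(classify(video, weights, [0.0, 0.0]))
    # Toy "autoencoder": keep the first three values, decode the rest as zeros.
    encode = lambda v: v[:3]
    decode = lambda z: z + [0.0] * 3
    print(is_anomalous(video, encode, decode, threshold=0.1))
```

Concatenation keeps the per-stream embeddings intact and lets the classifier weight each modality, which is why it is a common default for this kind of late fusion; the paper may use a different fusion model. For anomaly detection, the threshold would in practice be chosen from reconstruction errors on normal videos.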

Keyphrases: Convolutional Neural Networks, Human action recognition, Recurrent Neural Networks, Representation Learning, Video Classification

Links:	https://easychair.org/publications/preprint/4ncl
	https://doi.org/10.29007/wj5t

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:336,
  author    = {Dmytro Tkachenko},
  title     = {Human action recognition using fusion of modern deep convolutional and recurrent neural networks},
  doi       = {10.29007/wj5t},
  howpublished = {EasyChair Preprint 336},
  year      = {EasyChair, 2018}}
