Main work during the PhD:

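1. Completed research on the multi-feature quantitative representation of actions and on a multi-scale action recognition model based on skeleton motion information. The main contributions are: (1) A general dimensionality reduction method that converts an action video into a motion image or motion matrix was proposed, enabling compact storage of human motion information. It addresses two problems: classic action recognition methods cannot effectively handle motion trajectories that overlap in the spatio-temporal domain, and raw video input involves too many parameters. (2) Based on the characteristics of human actions, a series of sample augmentation strategies built on skeleton motion information was proposed. This addresses the scarcity of action samples, which are costly to collect and label, and the resulting tendency of deep neural networks to overfit. (3) A scale normalization strategy based on human skeleton motion information was proposed, addressing the loss of motion information that the classic “cropping and scaling” strategy tends to cause.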

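2. Completed research on multi-scale action recognition based on deep convolutional neural networks and on the temporal localization of actions in complex behavior videos. The main contributions are: (1) A multi-scale action recognition deep neural network that combines a “scale pyramid layer” with a “global average layer” was proposed. During training, the model can learn features from action samples at different temporal scales, which effectively avoids the information loss caused by scale alignment when recognizing target actions. (2) A progressive probing localization strategy that gives priority to large-scale temporal windows was proposed. It improves the model's ability to discover subtle actions and increases the efficiency of temporal action localization.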

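3. Completed research on recognizing complex behaviors with few or even zero samples. The main contributions are: (1) A general, discretized way of defining complex behaviors was proposed: behaviors are defined at the semantic level, so that high-dimensional content (video) is described by a low-dimensional representation (text). This addresses the problem that human behaviors are too complex and diverse to enumerate. (2) Building on the semantic-level definition, a general quantitative representation of continuous behaviors was proposed that combines word vectors with motion features. This makes it possible for convolutional neural networks to learn complex behavior features from semantic-level definitions in the zero-shot setting, and ultimately enables efficient recognition of complex human behaviors with zero samples.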
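Item 2's idea of letting one network accept action samples of different temporal lengths can be sketched in Python. The sketch assumes the convolutional layers output a feature map of shape channels × time steps, uses pyramid levels of 1, 2 and 4 bins with max pooling in each bin, and concatenates the pyramid output with a global average over time. The level choice, the concatenation and all function names are illustrative assumptions, not the exact configuration of the proposed model.

```python
from typing import List, Sequence

FeatureMap = List[List[float]]  # feature_map[c][t]: channel c at time step t


def temporal_pyramid_pool(feature_map: FeatureMap, levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Max-pool each channel over n roughly equal time bins for every level n.

    The output length is channels * sum(levels), whatever the number of
    time steps T (T >= 1 is assumed).
    """
    pooled = []
    for channel in feature_map:
        T = len(channel)
        for n in levels:
            for i in range(n):
                start = (i * T) // n              # floor(i * T / n)
                end = ((i + 1) * T + n - 1) // n  # ceil((i + 1) * T / n), always > start
                pooled.append(max(channel[start:end]))
    return pooled


def global_average_pool(feature_map: FeatureMap) -> List[float]:
    """Average each channel over all time steps (one value per channel)."""
    return [sum(channel) / len(channel) for channel in feature_map]


def multi_scale_descriptor(feature_map: FeatureMap) -> List[float]:
    """Fixed-length descriptor: pyramid-pooled values followed by global averages."""
    return temporal_pyramid_pool(feature_map) + global_average_pool(feature_map)


short_clip = [[1.0, 3.0, 2.0], [0.0, 0.0, 6.0]]          # 2 channels, 3 time steps
long_clip = [[float(t) for t in range(10)], [1.0] * 10]  # 2 channels, 10 time steps
print(len(multi_scale_descriptor(short_clip)), len(multi_scale_descriptor(long_clip)))  # prints "16 16"
```

Because the descriptor length depends only on the number of channels and the pyramid levels, a fully connected classifier placed after it can take clips of any length without cropping or rescaling them first.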

Main research work:

1. Proposed a multi-feature representation of human actions and a skeleton-based multi-scale deep neural network for human action recognition. The contributions of this work are threefold. 1) A universal dimensionality reduction method was proposed that transforms a human action video into a motion image or motion matrix. This method effectively solves the problem of motion trajectories overlapping in the spatio-temporal domain. Furthermore, because the motion image/matrix stores motion information compactly, the number of input parameters is dramatically reduced. 2) Human action samples are costly to collect and label, so training samples are scarce. Based on the characteristics of human actions, a series of data augmentation strategies was proposed to counter the overfitting caused by insufficient training samples. 3) A spatio-temporal scale unification strategy for human actions based on skeleton motion information was proposed. This strategy avoids the motion information loss caused by the classic “cropping and scaling” approach.

2. Designed a deep neural network for multi-scale human action recognition and for temporal action localization in complex human behavior videos. The contributions of this work are twofold. 1) A universal method was proposed for upgrading a conventional convolutional neural network to a multi-scale network by adding a spatial pyramid pooling layer and a global average pooling layer. The upgraded network can learn features from action samples of different scales and avoids the motion information loss caused by scale pre-processing of the samples. 2) A novel temporal action localization strategy, called the large-scale-first temporal detection (LSF-TD) strategy, was proposed. This strategy helps temporal action localization models detect micro actions and improves their efficiency.

3. Studied the recognition of complex behaviors with few or even zero training samples. The contributions of this work are twofold. 1) A general, discretization-based method was proposed for defining complex human behaviors. The method defines human behaviors at the semantic level, using natural language (a low-dimensional representation) to describe the content of human behavior videos (a high-dimensional one). 2) At the semantic level, a universal quantitative representation was proposed that combines word vectors with motion features. It enables the extraction of complex behavior features with few or even zero samples, thereby achieving the final goal of this work: effectively recognizing complex human behaviors with zero samples.
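The zero-shot recognition in item 3 can be illustrated with a generic semantic-embedding classifier, a common technique that stands in here for the method of this work. The sketch assumes each behavior class is defined by a short list of words, represents a class by the mean of its word vectors, and maps a motion feature into the word-vector space with a linear projection (assumed to be learned on seen classes; an identity matrix in the toy example). Prediction picks the class with the highest cosine similarity. The word vectors, class names and function names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def project(weights: List[Vector], motion_feature: Vector) -> Vector:
    """Linear map from motion-feature space to word-vector space.

    weights has one row per word-vector dimension (assumed learned on seen classes).
    """
    return [sum(w * f for w, f in zip(row, motion_feature)) for row in weights]


def zero_shot_predict(
    motion_feature: Vector,
    weights: List[Vector],
    class_definitions: Dict[str, List[str]],
    word_vectors: Dict[str, Vector],
) -> str:
    """Return the label whose text definition is closest to the projected feature.

    Classes need only a word list, so labels never seen in training can be predicted.
    """
    embedded = project(weights, motion_feature)
    class_embeddings = {
        label: mean_vector([word_vectors[word] for word in words])
        for label, words in class_definitions.items()
    }
    return max(class_embeddings, key=lambda label: cosine(embedded, class_embeddings[label]))


# Toy 2-D word vectors (hypothetical values, for illustration only).
word_vectors = {"arm": [1.0, 0.0], "wave": [0.8, 0.2], "leg": [0.0, 1.0], "kick": [0.1, 0.9]}
class_definitions = {"waving": ["arm", "wave"], "kicking": ["leg", "kick"]}
identity = [[1.0, 0.0], [0.0, 1.0]]
print(zero_shot_predict([0.2, 0.9], identity, class_definitions, word_vectors))  # prints "kicking"
```

Since "kicking" is defined only by words, it can in principle be recognized without any kicking video in the training set, provided the learned projection generalizes to unseen classes.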