Abstract

Temporal action localisation (TAL) has attracted significant attention because of its potential applications in a wide range of fields. The main challenges in this domain are detecting the start and end times of actions and recognising the actions themselves. Research has followed two main approaches to these problems: treating them as separate tasks or solving them jointly. This thesis focuses on the first component of the former approach, namely action proposal generation. This component was chosen because, by leaving aside the recognition step, which is specific to particular action classes, its results can be applied more broadly. The central challenge in action proposal generation is detecting the start and end of actions without using class labels.

This challenge was addressed in three steps: (1) identifying the factors and challenging cases that affect TAL performance, (2) implementing anchor-free boundary detection, and (3) comparing performance with existing algorithms. Five scenarios were considered to examine the factors affecting performance and the challenging cases, and performance in these scenarios was compared using existing algorithms.

First, the relationship between boundary detection and action recognition was analysed. In pipeline methods, the outcome of each step influences the subsequent steps, so the way recognition results vary with the proposal generation outcomes was investigated. Second, to determine the effect of class labels on performance, existing end-to-end methods were modified by converting multi-label classification into binary classification, which made it possible to evaluate how class labels influence performance. Third, actions in video clips are sometimes presented discontinuously because of editing or abrupt changes of viewpoint; the effect of such instances on performance was examined.
Fourth, the effects of the two input modalities most commonly used in the literature, RGB and optical flow, were studied to understand their influence on performance. Finally, to assess generalisation, cross-corpus tests were conducted to evaluate how well a model trained on one dataset performs on another.

One key challenge in detecting actions is that their length is unknown and variable, even within the same category, since the same action can take different amounts of time for different people. To address this issue, some existing methods adopt predefined temporal spans called "anchors". However, these methods require repeated operations at the same location and often produce inaccurate boundaries. To overcome these limitations, an anchor-free method was employed to predict the length of each action directly. This approach is less prone to the boundary inaccuracies caused by predefined lengths and is more efficient because it eliminates the repeated operations. However, direct prediction relies on partial information, which may lead to inaccurate boundaries, so an additional refinement step is applied to improve boundary accuracy. Finally, the performance of the proposed method was compared with that of existing algorithms in challenging situations defined by the identified factors.

In summary, the factors influencing action detection were examined, and an anchor-free action proposal generation method incorporating boundary refinement was proposed. The performance of the proposed method was then analysed in various settings, demonstrating its effectiveness and its potential for addressing the challenges associated with action detection.
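The difference between anchor-based and anchor-free proposal generation can be illustrated with a minimal sketch. It assumes a common anchor-free formulation, not necessarily the one used in this thesis, in which a model regresses, at each temporal location, the distances to the action's start and end. The names `anchor_based_candidates`, `anchor_free_proposals`, `stride` and `duration` are hypothetical, the regressed distances and scores are treated as given rather than produced by a trained network, and the boundary refinement step is not shown.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Proposal:
    start: float  # start time of the candidate span
    end: float    # end time of the candidate span
    score: float  # confidence that the span contains an action


def anchor_based_candidates(num_steps: int, anchor_lengths: Sequence[float],
                            stride: float = 1.0,
                            duration: float = float("inf")) -> List[Proposal]:
    """Hypothetical helper: one candidate per (location, predefined length) pair."""
    candidates = []
    for t in range(num_steps):
        centre = (t + 0.5) * stride
        # The same location is processed once for every predefined anchor length.
        for length in anchor_lengths:
            start = max(0.0, centre - length / 2)
            end = min(duration, centre + length / 2)
            # Placeholder score; in practice a network would score each anchor.
            candidates.append(Proposal(start, end, 0.0))
    return candidates


def anchor_free_proposals(dist_start: Sequence[float], dist_end: Sequence[float],
                          scores: Sequence[float], stride: float = 1.0,
                          duration: float = float("inf")) -> List[Proposal]:
    """Hypothetical helper: decode one proposal per location from regressed distances."""
    proposals = []
    for t, (d_s, d_e, s) in enumerate(zip(dist_start, dist_end, scores)):
        centre = (t + 0.5) * stride
        start = max(0.0, centre - d_s)     # clip to the beginning of the video
        end = min(duration, centre + d_e)  # clip to the end of the video
        proposals.append(Proposal(start, end, s))
    return proposals


if __name__ == "__main__":
    # Hypothetical toy values: 4 temporal locations, 3 anchor lengths.
    anchors = anchor_based_candidates(num_steps=4, anchor_lengths=[1.0, 2.0, 4.0])
    print(len(anchors))  # 4 locations x 3 lengths = 12 candidates

    props = anchor_free_proposals(dist_start=[0.5, 1.5, 0.2, 0.1],
                                  dist_end=[2.0, 1.0, 0.3, 0.4],
                                  scores=[0.9, 0.8, 0.1, 0.05],
                                  duration=4.0)
    print(len(props))  # one proposal per location = 4
    print(props[0])    # Proposal(start=0.0, end=2.5, score=0.9)
```

With K predefined lengths, the anchor-based enumeration produces K candidates per location, whereas the anchor-free decoder produces one, which reflects the efficiency argument made in the abstract. A refinement stage, as described there, would then adjust the decoded boundaries.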
Date of Award: 31 Dec 2023
Supervisors: Xiaojun Zeng and Ke Chen
- Boundary Refinement
- Temporal Action Proposal Generation
- Temporal Action Localisation
- Temporal Action Detection