Multi-Instance dictionary learning for detecting abnormal events in surveillance videos

Jing Huo, Yang Gao, Wanqi Yang, Hujun Yin

    Research output: Contribution to journal › Article › peer-review


    In this paper, a novel method termed Multi-Instance Dictionary Learning (MIDL) is presented for detecting abnormal events in crowded video scenes. Following the multi-instance learning framework, each event (video clip) is modeled as a bag containing several sub-events (local observations), and each sub-event is regarded as an instance. MIDL jointly learns a dictionary for sparse representations of sub-events (instances) and multi-instance classifiers for classifying events as normal or abnormal. We further adopt three different multi-instance models, yielding the Max-Pooling-based MIDL (MP-MIDL), Instance-based MIDL (Inst-MIDL) and Bag-based MIDL (Bag-MIDL), for detecting both global and local abnormalities. MP-MIDL classifies observed events using bag features extracted via max-pooling over sparse representations. Inst-MIDL and Bag-MIDL classify observed events by the predicted values of their corresponding instances. The proposed MIDL is evaluated against state-of-the-art methods for abnormal event detection on the UMN (global abnormalities) and UCSD (local abnormalities) datasets; the results show that MP-MIDL and Bag-MIDL achieve comparable or improved detection performance. MIDL is also compared with other multi-instance learning methods on this task, and the MP-MIDL scheme obtains superior results. © 2014 World Scientific Publishing Company.
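
The bag-and-instance idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the sparse codes of each sub-event are already computed (dictionary learning and sparse coding are omitted), that max-pooling takes the element-wise maximum of absolute coefficients, and that the classifier is a plain linear one. The function names, the weights `w`, the bias `b`, and the toy data are all hypothetical. The second rule, which flags a bag when any instance scores positive, is the standard multi-instance assumption and is shown only for contrast; the abstract does not spell out how Inst-MIDL or Bag-MIDL combine instance predictions.

```python
from typing import List, Sequence

Vector = List[float]


def max_pool_bag(codes: Sequence[Sequence[float]]) -> Vector:
    """Pool the sparse codes of all instances in a bag into one bag feature.

    Assumption: pooling is the element-wise maximum of absolute coefficients.
    """
    if not codes:
        raise ValueError("a bag must contain at least one instance")
    dim = len(codes[0])
    return [max(abs(code[k]) for code in codes) for k in range(dim)]


def linear_score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Score of a hypothetical linear classifier: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify_bag_max_pooling(codes, w, b) -> bool:
    """Flag the event as abnormal if the pooled bag feature scores positive."""
    return linear_score(w, b, max_pool_bag(codes)) > 0.0


def classify_bag_by_instances(codes, w, b) -> bool:
    """Flag the event as abnormal if any single instance scores positive.

    Assumption: the standard multi-instance rule, used here for contrast only.
    """
    return max(linear_score(w, b, code) for code in codes) > 0.0


# Toy sparse codes over a 3-atom dictionary (made-up values).
normal_bag = [[0.9, 0.0, 0.0], [0.7, 0.1, 0.0]]
abnormal_bag = [[0.8, 0.0, 0.0], [0.0, 0.0, 1.2]]

# Hypothetical classifier: the third atom is treated as indicating abnormality.
w = [-0.5, 0.0, 2.0]
b = -0.5

if __name__ == "__main__":
    for name, bag in [("normal", normal_bag), ("abnormal", abnormal_bag)]:
        print(name, classify_bag_max_pooling(bag, w, b),
              classify_bag_by_instances(bag, w, b))
```

In this toy setting a single unusual sub-event is enough to change the pooled feature, which is why a bag-level representation can pick up local abnormalities without labels for individual sub-events.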
    Original language: English
    Article number: 1430010
    Journal: International Journal of Neural Systems
    Issue number: 3
    Publication status: Published - May 2014


    • abnormal event detection
    • dictionary learning
    • multi-instance learning
    • sparse coding

