影响支持向量机模型语步自动识别效果的因素研究

Translated title of the contribution: Factors Affecting Rhetorical Move Recognition with SVM Model
  • Liangping Ding
  • Zhixiong Zhang*
  • Huan Liu

*Corresponding author for this work

Research output: Contribution to journal > Article > peer-review

Abstract

Objective: This paper examines how sample size, the N value of N-grams, stop-word removal, and word-frequency weighting methods affect the automatic recognition of rhetorical moves in scientific papers, with the aim of improving the abstracting method based on the support vector machine (SVM) model.

Methods: We retrieved a total of 1.1 million labeled moves from 720,000 structured abstracts of scientific papers as experimental data and built an SVM model for move recognition. Following the single-variable principle, we changed one factor at a time (sample size, N value, stop-word removal, or word-frequency weighting method) while holding the others fixed, and analyzed its impact on the model's performance.
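The one-factor-at-a-time design described in the Methods can be sketched as below. This is a minimal illustration, not the authors' code: the baseline settings and the candidate values for each factor are hypothetical, since the abstract does not list the grid that was actually tested.

```python
# Hypothetical baseline and candidate values (assumptions for illustration only;
# the abstract does not report which values were compared).
BASELINE = {
    "sample_size": 100_000,
    "ngram_range": (1, 1),
    "remove_stop_words": False,
    "weighting": "tf",
}
CANDIDATES = {
    "sample_size": [50_000, 100_000, 200_000],
    "ngram_range": [(1, 1), (1, 2), (1, 3)],
    "remove_stop_words": [False, True],
    "weighting": ["tf", "tfidf"],
}


def single_variable_configs(baseline, candidates):
    """Yield the baseline, then every config differing from it in exactly one factor."""
    yield dict(baseline)
    for factor, values in candidates.items():
        for value in values:
            if value != baseline[factor]:
                # Copy the baseline and override only the factor under study.
                yield {**baseline, factor: value}


configs = list(single_variable_configs(BASELINE, CANDIDATES))
for cfg in configs:
    print(cfg)
```

Each generated configuration would be used to train and evaluate one model, so that any change in performance can be attributed to the single factor that differs from the baseline.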

Results: The model performed best with a sample size of 600,000 abstracts, an N-gram range of [1, 2], stop words retained, and TF-IDF weighting of word frequency.
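To make the best-performing feature settings concrete, the sketch below builds TF-IDF weights over word N-grams with N in [1, 2] and keeps stop words by default. It is an illustration under stated assumptions, not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the function names, and the example sentences are all chosen here for illustration, and the SVM classifier that would consume these features is omitted.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=2):
    """Return all word n-grams with n_min <= n <= n_max, joined by spaces."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf_features(sentences, n_min=1, n_max=2, stop_words=None):
    """Map each sentence to a {ngram: tf-idf weight} dict.

    stop_words=None keeps every token. The IDF uses the smoothed form
    log((1 + N) / (1 + df)) + 1, which is an assumption of this sketch.
    """
    stop = set(stop_words or ())
    docs = []
    for sentence in sentences:
        # Naive lowercase whitespace tokenization (assumption).
        tokens = [t for t in sentence.lower().split() if t not in stop]
        docs.append(Counter(ngrams(tokens, n_min, n_max)))
    n_docs = len(docs)
    # Document frequency: each term counted once per sentence it occurs in.
    df = Counter(term for doc in docs for term in doc)
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


# Made-up sentences standing in for labeled move sentences.
sentences = [
    "we propose a new method",
    "we evaluate the method on two datasets",
    "the results show that the method works",
]
features = tfidf_features(sentences)
print(sorted(features[0].items()))
```

Passing `stop_words={"the"}` shows why stop-word handling interacts with the N value: removing a token before N-grams are formed changes which bigrams exist (for example, "that method" appears only after "the" is dropped).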

Limitations: We examined the model only on structured abstracts, so the results might not be comparable with those of other studies.

Conclusions: Sample size and certain fine-grained feature settings have a significant impact on the performance of traditional machine learning models.

Original language: Chinese (Traditional)
Pages (from-to): 16-23
Number of pages: 8
Journal: Data Analysis and Knowledge Discovery
Volume: 3
Issue number: 11
DOIs
Publication status: Published - Nov 2019

Keywords

  • move recognition
  • structured abstracts
  • support vector machine

Research Beacons, Institutes and Platforms

  • Manchester Institute of Innovation Research
