Scaling phrase-based statistical machine translation to larger corpora and longer phrases

Chris Callison-Burch, Colin Bannard, Josh Schroeder

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

In this paper we describe a novel data structure for phrase-based statistical machine translation which allows for the retrieval of arbitrarily long phrases while simultaneously using less memory than is required by current decoder implementations. We detail the computational complexity and average retrieval times for looking up phrase translations in our suffix array-based data structure. We show how sampling can be used to reduce the retrieval time by orders of magnitude with no loss in translation quality.
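
The abstract names three ingredients: a suffix array over the training corpus, lookup of arbitrarily long phrases, and sampling of the matches. The following Python sketch illustrates how such a lookup might work in principle. It is not the authors' implementation. The function names (build_suffix_array, find_phrase, sample_occurrences), the naive sort-based construction, the seeded uniform sampling, and the toy corpus are all assumptions made for illustration, and the sketch covers only the source side of a corpus, not word alignments or translation extraction.

```python
import random
from typing import List, Sequence, Tuple


def build_suffix_array(tokens: Sequence[str]) -> List[int]:
    """Return corpus positions sorted by the token suffix starting at each one.

    Naive construction for illustration only; it is far too slow for a
    real corpus.
    """
    return sorted(range(len(tokens)), key=lambda i: list(tokens[i:]))


def find_phrase(tokens: Sequence[str], suffix_array: List[int],
                phrase: Sequence[str]) -> Tuple[int, int]:
    """Return the half-open range [start, end) of suffix_array entries
    whose suffix begins with `phrase`, using two binary searches."""
    target = list(phrase)
    n = len(target)

    def prefix(rank: int) -> List[str]:
        pos = suffix_array[rank]
        return list(tokens[pos:pos + n])

    # First rank whose n-token prefix is >= the phrase.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < target:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First rank whose n-token prefix is > the phrase.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= target:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def sample_occurrences(suffix_array: List[int], start: int, end: int,
                       max_samples: int, seed: int = 0) -> List[int]:
    """Return at most `max_samples` corpus positions from the matched range."""
    positions = suffix_array[start:end]
    if len(positions) <= max_samples:
        return sorted(positions)
    return sorted(random.Random(seed).sample(positions, max_samples))


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ran".split()
    sa = build_suffix_array(corpus)
    start, end = find_phrase(corpus, sa, ["the", "cat"])
    print(sorted(sa[start:end]))
    print(sample_occurrences(sa, *find_phrase(corpus, sa, ["the"]), max_samples=2))
```

Because every occurrence of a phrase forms one contiguous range in the sorted array, phrase length is not capped in advance, and capping the number of sampled positions bounds the work done for very frequent phrases.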

Original language: English
Title of host publication: ACL-05 - 43rd Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Publisher: Association for Computational Linguistics
Pages: 255-262
Number of pages: 8
ISBN (Print): 1932432515, 9781932432510
Publication status: Published - 2005
Event: 43rd Annual Meeting of the Association for Computational Linguistics, ACL-05 - Ann Arbor, MI, United States
Duration: 25 Jun 2005 to 30 Jun 2005

Publication series

Name: ACL-05 - 43rd Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference

Conference

Conference: 43rd Annual Meeting of the Association for Computational Linguistics, ACL-05
Country/Territory: United States
City: Ann Arbor, MI
Period: 25/06/05 to 30/06/05
