A Linguistics-Driven Approach to Statistical Parsing for Low-Resourced Languages

11/04/2023 by นพพร ม่วงระย้า

Metadata

Document Title

A Linguistics-Driven Approach to Statistical Parsing for Low-Resourced Languages

Author

Boonkwan P, Supnithi T

Affiliations

National Science & Technology Development Agency - Thailand; National Electronics & Computer Technology Center (NECTEC)

Type

Article; Proceedings Paper

Source Title

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS

ISSN

1745-1361

Year

2015

Volume

E98D

Issue

Open Access

gold

Publisher

IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG

DOI

10.1587/transinf.2014DAP0024

Format

PDF

Abstract

Developing a practical and accurate statistical parser for low-resourced languages is a hard problem because it requires large-scale treebanks, which are expensive and labor-intensive to build from scratch. Unsupervised grammar induction theoretically offers a way to overcome this hurdle by learning hidden syntactic structures from raw text automatically. However, the accuracy of grammar induction is still impractically low because frequent collocations of non-linguistically associable units are commonly found, resulting in dependency attachment errors. We introduce a novel approach to building a statistical parser for low-resourced languages that uses language parameters as a guide for grammar induction. The intuition behind this paper is that most dependency attachment errors involve frequently used word orders, which can be captured by a small prescribed set of linguistic constraints, while the rest of the language can be learned statistically by grammar induction. We then show that covering the most frequent grammar rules via our language parameters has a strong impact on parsing accuracy in 12 languages.

Publication Source

WOS
