Research UMN

The goal of dependency parsing is to seek a functional relationship among words. For instance, it tells the subject-object relation in a sentence. Parsing the Indonesian language requires information about the morphology of a word. Indonesian grammar relies heavily on affixation to combine root words with affixes to form another word. Thus, morphology information should be incorporated. Fortunately, it can be encoded implicitly by word representation. Embeddings from Language Models (ELMo) is a word representation which be able to capture morphology information. Unlike most widely used word representations such as word2vec or Global Vectors (GloVe), ELMo utilizes a Convolutional Neural Network (CNN) over characters. With it, the affixation process could ideally encoded in a word
representation. We did an analysis using nearest neighbor words and T-distributed Stochastic Neighbor Embedding (t-SNE) word visualization to compare word2vec and ELMo. Our result showed that ELMo representation is richer in encoding the morphology information than it's counterpart. We trained our parser using word2vec and ELMo. To no surprise, the parser which uses ELMo gets a higher accuracy than word2vec. We obtain Unlabeled Attachment Score
(UAS) at 83.08 for ELMo and 81.35 for word2vec. Hence, we confirmed that morphology information is necessary, especially in a morphologically rich language like Indonesian.

UMN Research

Pengurai Dependensi Bahasa Indonesia Menggunakan Embedding From Language Models (Elmos)

Research Information

Research Publication

Proceedings

Publications

HKI

Buku Ajar