Abstract
The Danish Sign Language Corpus and Dictionary project at the Centre for Sign Language, UCC has a dual aim: to build a Danish Sign
Language Corpus, and to use this corpus to expand and improve The Danish Sign Language Dictionary. Our goal is a one-to-one
correspondence between sign lemmas in the corpus and the dictionary, but due to limited resources, we cannot include an accurate phonological
description of each sign form. In order to secure a consistent lemmatisation in the corpus as well as across the two resources, we thus
rely exclusively on sign videos and Danish equivalents. In this paper, we will describe how we use the lemmas of the Danish Sign
Language Dictionary, and additional signs found in connection with the dictionary work, as the initial lexical database of the corpus tool.
For new signs found in the corpus, the actual corpus tokens will serve as preliminary video representations. To facilitate the sign search
when lemmatising corpus tokens, we assign several Danish equivalents to each sign, including all equivalents in the dictionary data.
Furthermore, we include synonyms found through linking these equivalents to the Danish wordnet (DanNet), although equivalents added
in this way cannot be regarded as valid senses of the sign.
Translated title of the contribution (Danish) | Forbedring af lemmatiseringskonsistens uden en fonologisk beskrivelse: Projektet ordbog og korpus over dansk tegnsprog |
---|---|
Original language | English |
Publication date | 12 May 2018 |
Number of pages | 4 |
Status | Published - 12 May 2018 |
Event | Language Resources and Evaluation Conference - Phoenix Seagaia Conference Center, Miyazaki, Japan Duration: 7 May 2018 → 12 May 2018 Conference number: 11 http://lrec2018.lrec-conf.org/en/ |
Conference
Conference | Language Resources and Evaluation Conference |
---|---|
Number | 11 |
Location | Phoenix Seagaia Conference Center |
Country | Japan |
City | Miyazaki |
Period | 07/05/18 → 12/05/18 |
Internet address |
Keywords
- sign language