Abstract
The Danish Sign Language Corpus and Dictionary project at the Centre for Sign Language, UCC has a dual aim: to build a Danish Sign Language Corpus, and to use this corpus to expand and improve The Danish Sign Language Dictionary. Our goal is a one-to-one correspondence between sign lemmas in the corpus and the dictionary, but due to limited resources, we cannot include an accurate phonological description of each sign form. To secure consistent lemmatisation within the corpus as well as across the two resources, we thus rely exclusively on sign videos and Danish equivalents. In this paper, we describe how we use the lemmas of the Danish Sign Language Dictionary, and additional signs found in connection with the dictionary work, as the initial lexical database of the corpus tool. For new signs found in the corpus, the actual corpus tokens will serve as preliminary video representations. To facilitate the sign search when lemmatising corpus tokens, we assign several Danish equivalents to each sign, including all equivalents in the dictionary data. Furthermore, we include synonyms found through linking these equivalents to the Danish wordnet (DanNet), although equivalents added in this way cannot be regarded as valid senses of the sign.
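The lookup described in the abstract can be illustrated with a minimal sketch. It is not the project's implementation: the class `SignLemma`, the functions `expand_with_synonyms` and `search`, the identifiers, the file name, and the toy synonym table are all hypothetical, and the synonym table merely stands in for a link to DanNet. The sketch only shows the idea of keeping dictionary equivalents separate from wordnet-derived synonyms, so that the latter help an annotator find a sign without being treated as valid senses.

```python
from dataclasses import dataclass, field


@dataclass
class SignLemma:
    """One entry in a hypothetical lexical database for the corpus tool."""
    lemma_id: str  # hypothetical identifier for the sign lemma
    video: str     # dictionary video, or a corpus token clip for a new sign
    # Danish equivalents taken from the dictionary data (valid senses).
    equivalents: set[str] = field(default_factory=set)
    # Synonyms derived from a wordnet link (search aid only, not valid senses).
    search_synonyms: set[str] = field(default_factory=set)


def expand_with_synonyms(lemma, synonym_lookup):
    """Add synonyms of each equivalent, skipping words already listed as equivalents."""
    for word in lemma.equivalents:
        lemma.search_synonyms |= set(synonym_lookup.get(word, ())) - lemma.equivalents


def search(lexicon, danish_word):
    """Return (lemma_id, is_valid_sense) pairs for every lemma matching the word."""
    hits = []
    for lemma in lexicon:
        if danish_word in lemma.equivalents:
            hits.append((lemma.lemma_id, True))
        elif danish_word in lemma.search_synonyms:
            hits.append((lemma.lemma_id, False))
    return hits


# Toy data, invented for illustration only (not taken from the dictionary or DanNet).
lexicon = [SignLemma("HUS", "hus.mp4", {"hus"})]
toy_synonyms = {"hus": ["bolig", "hus"]}
expand_with_synonyms(lexicon[0], toy_synonyms)

print(search(lexicon, "hus"))
print(search(lexicon, "bolig"))
```

Returning a flag alongside each hit lets a search interface show synonym-based matches while still marking them as unverified, which mirrors the caveat in the abstract.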
Translated title of the contribution | Forbedring af lemmatiseringskonsistens uden en fonologisk beskrivelse: Projektet ordbog og korpus over dansk tegnsprog |
---|---|
Original language | English |
Publication date | 12 May 2018 |
Number of pages | 4 |
Publication status | Published - 12 May 2018 |
Event | Language Resources and Evaluation Conference - Phoenix Seagaia Conference Center, Miyazaki, Japan. Duration: 7 May 2018 → 12 May 2018. Conference number: 11. http://lrec2018.lrec-conf.org/en/ |
Conference
Conference | Language Resources and Evaluation Conference |
---|---|
Number | 11 |
Location | Phoenix Seagaia Conference Center |
Country/Territory | Japan |
City | Miyazaki |
Period | 07/05/18 → 12/05/18 |
Internet address | http://lrec2018.lrec-conf.org/en/ |
Keywords
- sign language
- Danish Sign Language (DTS)
- corpus linguistics
- language documentation