Improving Lemmatisation Consistency without a Phonological Description: The Danish Sign Language Corpus and Dictionary Project

Thomas Troelsgård, Jette Hedegaard Kristoffersen

    Research output: Contribution to conference without a publisher/journal, Paper, Research, peer-review

    Abstract

    The Danish Sign Language Corpus and Dictionary project at the Centre for Sign Language, UCC, has a dual aim: to build a Danish Sign Language corpus, and to use this corpus to expand and improve the Danish Sign Language Dictionary. Our goal is a one-to-one correspondence between sign lemmas in the corpus and the dictionary, but due to limited resources we cannot include an accurate phonological description of each sign form. To secure consistent lemmatisation within the corpus as well as across the two resources, we therefore rely exclusively on sign videos and Danish equivalents. In this paper, we describe how we use the lemmas of the Danish Sign Language Dictionary, together with additional signs found in connection with the dictionary work, as the initial lexical database of the corpus tool. For new signs found in the corpus, the actual corpus tokens will serve as preliminary video representations. To facilitate sign search when lemmatising corpus tokens, we assign several Danish equivalents to each sign, including all equivalents in the dictionary data. Furthermore, we include synonyms found by linking these equivalents to the Danish wordnet (DanNet), although equivalents added in this way cannot be regarded as valid senses of the sign.
    Translated title of the contribution: Forbedring af lemmatiseringskonsistens uden en fonologisk beskrivelse: Projektet ordbog og korpus over dansk tegnsprog
    Original language: English
    Publication date: 12 May 2018
    Number of pages: 4
    Publication status: Published - 12 May 2018
    Event: Language Resources and Evaluation Conference - Phoenix Seagaia Conference Center, Miyazaki, Japan
    Duration: 7 May 2018 to 12 May 2018
    Conference number: 11
    http://lrec2018.lrec-conf.org/en/

    Conference

    Conference: Language Resources and Evaluation Conference
    Number: 11
    Location: Phoenix Seagaia Conference Center
    Country/Territory: Japan
    City: Miyazaki
    Period: 07/05/18 to 12/05/18

    Keywords

    • sign language
    • Danish Sign Language (DTS)
    • corpus linguistics
    • language documentation
