CFP
RESOURCEFUL-2020: RESOURCEs and representations For Under-resourced Languages and domains
Nya Humanisten, Gothenburg, Sweden, November 25, 2020
Conference website | https://gu-clasp.github.io/resourceful-2020/ |
Abstract registration deadline | October 13, 2020 |
Submission deadline | October 13, 2020 |
RESOURCEs and representations For Under-resourced Languages and domains
Submission Guidelines
All papers must be original and not simultaneously submitted to another journal or conference. The following paper categories are welcome:
- Full papers
All areas of natural language processing have achieved visible breakthroughs from the use of data-driven models. Contemporary machine learning is strongly influenced by techniques that rely on large datasets, which demand substantial computational resources, to solve practical problems (e.g. transformer-based models such as BERT, ViLBERT, ALBERT, and GPT-2 that are pre-trained on large corpora of unlabelled data). However, many of the world’s languages lack linguistic descriptions as well as sufficiently large computer-readable corpora of linguistic material. Even languages that are considered well-resourced have domains where resources are scarce, for example corpora of dialogue and situated interaction. These domains resemble under-resourced languages in another respect: because they focus on spoken or spoken-like interaction (in either written or audio form), they show high variability in the input data. Applying state-of-the-art methods based on deep neural networks to develop data-driven systems in such resource-constrained environments is a non-trivial task. For this workshop, we encourage contributions on resource creation and representation learning in limited or low-resource environments that tackle the above-mentioned problems. In particular, we would like to open a forum that brings together students, researchers, and experts to address and discuss the following questions:
- How can new resources be constructed or extended for languages and domains that lack standardised representations of linguistic units?
- What experience from building resources for languages that have good coverage today (for example Scandinavian languages) can be ported to building resources for under-resourced languages and domains?
- How can we deal with the variability of data and its standardisation in machine learning approaches?
- What algorithms and methods can we employ to transfer learning from related domains/languages that have good coverage?
- What is the role of multi-task learning in this domain?
- What representations can be learned and how effective are they in different low-resource scenarios?
- How can newly created resources and learned representations be evaluated?
- What ethical considerations are involved?
Intended participants are researchers, PhD students, and practitioners from diverse backgrounds (linguistics, computational linguistics, speech, machine learning, etc.). We foresee an interactive workshop with plenty of time for discussion, complemented by invited talks and short presentations of ongoing or completed research.
List of Topics
- Under-resourced Languages and Domains
- Computational Linguistics
- Machine Learning
Committees
Organizing committee
- Tewodros Gebreselassie
- Simon Dobnik
- Barbara Plank
- Lars Borin
Venue
The conference will be held virtually.
Contact
All questions about submissions should be emailed to Tewodros Abebe <tewodros.gebreselassie@gu.se> and Simon Dobnik <simon.dobnik@gu.se>.