CFP

RESOURCEFUL-2020: RESOURCEs and representations For Under-resourced Languages and domains

Nya Humanisten

Gothenburg, Sweden, November 25, 2020

Conference website	https://gu-clasp.github.io/resourceful-2020/
Abstract registration deadline	October 13, 2020
Submission deadline	October 13, 2020

Topics: computational linguistics, machine learning, deep neural network

Submission Guidelines

All papers must be original and not simultaneously submitted to another journal or conference. The following paper categories are welcome:

Full papers
- All areas of natural language processing have achieved visible breakthroughs through the use of data-driven models. Contemporary machine learning is strongly influenced by techniques that rely on large datasets and demand substantial computational resources to solve practical problems (e.g. transformer-based models such as BERT, ViLBERT, ALBERT, and GPT-2 that are pre-trained on large corpora of unlabelled data). However, many of the world’s languages lack linguistic descriptions as well as sufficiently large computer-readable corpora of linguistic material. Even languages that are considered well-resourced have domains where resources are scarce, for example corpora of dialogue and situated interaction. These domains resemble under-resourced languages in another respect: because they focus on spoken or spoken-like interaction (in either written or audio form), they show high variability in the input data. Applying state-of-the-art methods based on deep neural networks to develop data-driven systems in such resource-constrained environments is a non-trivial task. For this workshop, we encourage contributions on resource creation and representation learning in limited or low-resource environments that tackle the above-mentioned problems. In particular, we would like to open a forum that brings together students, researchers, and experts to address and discuss the following questions:
  - How can new resources be constructed or extended for languages and domains that lack standardised representations of linguistic units?
  - What experience from building resources for languages that have good coverage today (for example, Scandinavian languages) can be ported to building resources for under-resourced languages and domains?
  - How can the variability of data and its standardisation be dealt with in machine learning approaches?
  - What algorithms and methods can we employ to transfer learning from related domains/languages that have good coverage?
  - What is the role of multi-task learning in this domain?
  - What representations can be learned and how effective are they in different low-resource scenarios?
  - How can newly created resources and learned representations be evaluated?
  - What ethical considerations are involved?
- Intended participants are researchers, PhD students, and practitioners from diverse backgrounds (linguistics, computational linguistics, speech, machine learning, etc.). We foresee an interactive workshop with plenty of time for discussion, complemented by invited talks and short presentations of ongoing or completed research.

List of Topics

Topic 1: Under-resourced Languages and Domains
Topic 2: Computational Linguistics
Topic 3: Machine Learning

Committees

Organizing committee

Person 1: Tewodros Gebreselassie
Person 2: Simon Dobnik
Person 3: Barbara Plank
Person 4: Lars Borin

Venue

The conference will be held virtually.

Contact

All questions about submissions should be emailed to Tewodros Abebe <tewodros.gebreselassie@gu.se> and Simon Dobnik <simon.dobnik@gu.se>