The documentation process will draw on the collective expertise of the project’s contributors, each of whom is an expert in their respective field:
Project oversight will be performed by I. Wayan Arka, an Associate Professor at the School of Culture, History and Language (ANU College of Asia and the Pacific), and an Associate Investigator with the ARC Centre of Excellence for the Dynamics of Language. He has a deep working knowledge of documentation theory and practice, informed by his previous research and workshop teaching.
Prof. B. Waluyo, an experienced ethnobotanist from LIPI (Jakarta), is responsible for the ethnobiological strand of the project. He will be working with a highly capable team of local researchers, including biologist Norce Mote (Musamus University, Merauke), anthropologist Ngurah Suryawan (Universitas Papua, Manokwari), and La Hisa (Forestry Officer, Wasur National Park).
The methodology will follow widely agreed-upon conceptions and principles of modern language documentation (Himmelmann 1998, 2006; Dwyer 2006; Woodbury 2003, among others). In terms of corpus development (Woodbury 2013), we aim for the following qualities:
- diversity of corpus;
- size of corpus;
- production of corpus that is ongoing, distributed, and opportunistic;
- transparent and properly annotated materials;
- preservable and portable material;
- ethical corpus.
To maximize the success of the documentation research through community involvement (Dwyer 2006), we will value, develop, and maintain the following qualities:
- a good relationship between the researchers and indigenous partners, and
- ethical work with shared and mutually negotiated common goals.
We will involve native speakers as much as possible in all steps of the documentation process. Some speakers will be trained to participate in data collection, transcription, translation, and review. This brings native speakers’ perspectives and input to bear on which data are appropriate to collect.
Such involvement will also raise community awareness of the significance of language preservation. The special focus on ethnobiology addresses the elders’ concern to preserve and transmit this knowledge; in the case of Marori, this collection is only possible while the few remaining elders are still alive.
The project will also involve collaboration with a variety of institutions and stakeholders, including the local government in Merauke, Balai Taman Nasional Wasur, the World Wildlife Fund (WWF), the Indonesian Institute of Sciences (LIPI), and the newly established ARC Centre of Excellence for the Dynamics of Language (COEDL). We will use the infrastructure and solid network of community contacts of the Merauke branch of the WWF to aid the conservation and documentation of vanishing traditional ecological knowledge.
The proposed parallel corpus outcomes fall into two strands. The first strand involves parallelism at the broadest topic level for typological-linguistic purposes. For this, we will use controlled picture-based elicitation tasks. We will start with the Frog Story and progress to pictures with local content on botanical topics, in the form of either hand-drawn pictures or still photographs taken with a digital camera.
The second strand, associated with a separate Australian Research Council project, involves finer-grained parallelism, at the clausal or even phrasal level, for computational testing of the theoretical analysis within the LFG-based ParGram framework. For this, we will use the edited version of the results from the first-strand activity, in conjunction with translation techniques. The translation into Marori and Smärky Kanum will be done and checked by a group of native speakers to ensure the naturalness of these texts.
For the relevant tagging, we will follow the standard procedures and use the XLE parser (Crouch et al. 2008) to create the required functional annotations. The incorporation of endangered language documentation into the computational ParGram project is novel, and we expect to gain new insights from the process.
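As an illustration only, the clause-level parallelism of the second strand could be tracked with a record like the one sketched below. This is a minimal sketch under stated assumptions, not the project's actual data format: the class `AlignedClause`, the function `missing_renderings`, and the identifiers such as `frog-001` are hypothetical, and the clause texts are placeholders rather than real language data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AlignedClause:
    """Hypothetical record: one clause-level unit with a rendering per language."""
    clause_id: str
    renderings: Dict[str, str] = field(default_factory=dict)  # language -> text

    def is_complete(self, languages: List[str]) -> bool:
        # A unit counts as parallel only if every language has a non-empty rendering.
        return all(self.renderings.get(lang, "").strip() for lang in languages)


def missing_renderings(
    clauses: List[AlignedClause], languages: List[str]
) -> List[Tuple[str, str]]:
    """Return (clause_id, language) pairs that still lack a rendering."""
    return [
        (clause.clause_id, lang)
        for clause in clauses
        for lang in languages
        if not clause.renderings.get(lang, "").strip()
    ]


LANGUAGES = ["Marori", "Smärky Kanum"]

# Placeholder strings only; no real language data is shown here.
corpus = [
    AlignedClause("frog-001", {"Marori": "<clause text>", "Smärky Kanum": "<clause text>"}),
    AlignedClause("frog-002", {"Marori": "<clause text>"}),
]

# Lists the units that native-speaker translators would still need to supply.
print(missing_renderings(corpus, LANGUAGES))
```

A check of this kind would flag units that are not yet parallel across both languages before the texts are passed on for functional annotation.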