To visit our new website, go to www.corcencc.org
A corpus is a collection of language data drawn from real-life contexts. It allows users to identify and explore language as it is actually used, rather than relying on intuition or prescriptive accounts of how it ‘should’ be used. This concrete evidence benefits academic researchers, lexicographers, teachers, language learners, assessors, resource developers, policy makers, publishers, translators and others. CorCenCC is the first general corpus to represent modern Welsh, and it is revolutionary in being community-driven: it uses mobile and digital technologies to enable public collaboration.
The project breaks new ground as both a language resource and a model of corpus construction, and it provides societal, economic and academic benefits. These include facilitating the use of Welsh in public, commercial, educational and governmental settings, while redeveloping the scope, relevance and design infrastructure of corpus development methodology. CorCenCC also aids the development of technologies such as predictive text production, word processing tools, machine translation, voice recognition and web search tools. Until now, the Welsh language has not had a comprehensive corpus facility to enable these developments.