CorCenCC (Corpws Cenedlaethol Cymraeg Cyfoes – The National Corpus of Contemporary Welsh): A community-driven approach to linguistic corpus construction is an interdisciplinary, collaborative project led by the School of English, Communication and Philosophy at Cardiff University. The £1.8m project began on 1 March 2016 and is funded by the Economic and Social Research Council (ESRC) and the Arts and Humanities Research Council (AHRC).

A corpus is a collection of language data from real-life contexts. It lets users identify and explore language as it is actually used, rather than relying on intuition or on prescriptive accounts of how it ‘should’ be used. This concrete evidence benefits academic researchers, lexicographers, teachers, language learners, assessors, resource developers, policy makers, publishers, translators and others. CorCenCC is the first general corpus to represent modern Welsh, and it is innovative in being community-driven: it uses mobile and digital technologies to enable public collaboration.

The project breaks new ground both as a language resource and as a model of corpus construction, and it offers societal, economic and academic benefits. These include facilitating the use of Welsh in public, commercial, educational and governmental settings, and rethinking the scope, relevance and design infrastructure of corpus development methodology. CorCenCC also supports the development of technologies such as predictive text, word processing tools, machine translation, voice recognition and web search tools. Until now, Welsh has lacked a comprehensive corpus facility to support these developments.