Corpus of Western Armenian beta

Overview

The Corpus of Western Armenian (COWA) is a partially annotated text corpus of originally digital as well as older, digitized texts in the Western Armenian language.

The Need

A text corpus is the main tool for Linguistic Research, used by scholars and linguists to study a language. Text corpora are custom-built for each language, and COWA is the first one for Western Armenian.

A text corpus is used in Lexicography to create and update dictionaries; in Syntax and Grammar Studies to analyze sentence structure and grammatical rules; and in Semantics to study the meaning and usage of words and phrases.

A text corpus has wide-ranging applications for Education and Language Learning. Western Armenian lacks an explanatory dictionary with example sentences, and COWA tries to fill this gap by showing how searched words are used in context across a large sample of Western Armenian text. This makes COWA a convenient companion to the corpus of dictionaries on Nayiri.com.

Furthermore, having a general Search Engine with the vocabulary and inflectional rules of the Western Armenian language is essential for information search and retrieval from Western Armenian texts, making COWA an indispensable tool for researchers and everyday users alike.

Design

COWA is inspired by the Corpus of Contemporary American English (COCA), a state-of-the-art corpus developed at Brigham Young University in the U.S.

Current Release

Launched in May 2024, COWA is the first text corpus and the first search engine for the Western Armenian language. It indexes words according to the language's complex, highly inflected morphology.

COWA is currently in beta release:

  • The size of the Corpus is constantly growing.
  • Search is currently possible by individual Lexeme or Word Form only; more advanced, multivariate search is under development.
  • The vocabulary of recognized words is actively being expanded.
  • Support for pronouns will be added shortly.

What is a text corpus?

You can think of a text corpus as an advanced search engine for a large selection of text (also known as a corpus) with a focus on language and linguistic analysis.

For example, when you search for the Lexeme ազատութիւն, the user interface first returns a summary of how frequently each word form of ազատութիւն occurs in texts throughout the Corpus. You can then choose to see the citations of either all word forms of ազատութիւն or of a particular word form (e.g. ազատութեան, ազատութիւնը, ազատութիւն, ազատութիւնս, ազատութենէ, ազատութենէս) as they occur in specific sentences in the Corpus.
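The frequency summary described above can be pictured with a minimal sketch. It is not COWA's implementation: the `FORM_TO_LEXEME` table, the `word_form_frequencies` function, and the toy token list are all hypothetical, and a real corpus would derive the form-to-Lexeme mapping from morphological analysis rather than a hand-written dictionary.

```python
from collections import Counter

# Hypothetical mini-index mapping word forms to their Lexeme (illustrative only).
FORM_TO_LEXEME = {
    "ազատութիւն": "ազատութիւն",
    "ազատութեան": "ազատութիւն",
    "ազատութիւնը": "ազատութիւն",
    "ազատութենէ": "ազատութիւն",
}


def word_form_frequencies(tokens, lexeme, form_to_lexeme=FORM_TO_LEXEME):
    """Count how often each word form of `lexeme` occurs in `tokens`."""
    return Counter(t for t in tokens if form_to_lexeme.get(t) == lexeme)


# Toy token stream standing in for the Corpus (not real citations).
tokens = ["ազատութեան", "եւ", "ազատութիւնը", "ազատութեան", "ազատութիւն"]

summary = word_form_frequencies(tokens, "ազատութիւն")
for form, count in summary.most_common():
    print(form, count)
```

Tokens that do not belong to the requested Lexeme (here, the conjunction եւ) are simply left out of the summary.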

The search engine of COWA can differentiate between homographic word forms (i.e. word forms with the same spelling but different meanings) that belong to different Lexemes or parts of speech. For example, for a search of the Word Form հայերէն, it can differentiate between citations that use հայերէն as the adjective (“of the Armenian language”), as the adverb (“in the Armenian language”), as the noun (“the Armenian language”), and as the inflected form of the noun հայ (“from the Armenians”).
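One way to picture homograph handling is a lookup that returns several candidate analyses for a single spelling. The sketch below is an assumption about how such data could be modeled, not a description of COWA's internals: the `Analysis` type, the `ANALYSES` table, and `analyses_for` are hypothetical names, and only the four readings of հայերէն come from the text above.

```python
from typing import NamedTuple, Optional


class Analysis(NamedTuple):
    lexeme: str
    part_of_speech: str
    gloss: str


# Hypothetical analysis table; the readings are the ones listed in the text.
ANALYSES = {
    "հայերէն": [
        Analysis("հայերէն", "adjective", "of the Armenian language"),
        Analysis("հայերէն", "adverb", "in the Armenian language"),
        Analysis("հայերէն", "noun", "the Armenian language"),
        Analysis("հայ", "noun", "from the Armenians"),
    ],
}


def analyses_for(word_form: str, part_of_speech: Optional[str] = None):
    """Return candidate analyses of a word form, optionally filtered by part of speech."""
    candidates = ANALYSES.get(word_form, [])
    if part_of_speech is None:
        return list(candidates)
    return [a for a in candidates if a.part_of_speech == part_of_speech]


for analysis in analyses_for("հայերէն", "noun"):
    print(analysis.lexeme, "-", analysis.gloss)
```

Filtering by part of speech alone still leaves two noun readings with different Lexemes, which is why a citation has to be tied to a specific Lexeme and not just to a spelling.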

What can I use a text corpus for?

The practical uses of a text corpus are numerous and wide-ranging. These include:

  • Discovering the usage patterns of a word by viewing examples of its actual use in text.

    Broadly speaking, human language consists of vocabulary, grammar, usage, and phonology. Whereas dictionaries are great at documenting a language's vocabulary, grammar textbooks at documenting grammar, and audio recordings at documenting phonology, a text corpus can be used to document and discover word usage.

    This is particularly significant for the Armenian language, in which verbs often dictate the grammatical case of their objects or complements; this is known as case government («խնդրառութիւն» in Armenian).

    For example, while transitive verbs usually take a direct object in the accusative case, some transitive verbs like յաղթել ("to conquer"), զարնել ("to strike"), and խնայել ("to spare") take a direct object in the dative case (the latter two may also use the accusative, depending on the context). Using the Corpus, you can search for a Verb (e.g. յաղթել) and discover which grammatical case(s) its direct objects may take by browsing the citations in the Context section (in this example, all relevant citations have the direct object in the dative case: e.g. «բարութեամբ յաղթել չարին» [not չարը], «վախին յաղթելու կամք ունին» [not վախը], etc.).

    More generally, the Corpus can help you quickly identify common usage patterns of a given word such as collocations, phrasal verbs and idioms.

    For example, a search for նախապատուութիւն produces several citations with the Verb-Noun Collocation «նախապատուութիւն տալ». That is to say, the noun նախապատուութիւն ("preference") and the verb տալ ("to give") frequently occur together, forming a meaningful unit ("to give preference to, to prefer").
  • Comparing the usage frequencies of different Lexemes.

    For example, you can compare how frequently each of the synonymous Lexemes սպիտակ and ճերմակ is used in Western Armenian.

    Similarly, for a given Lexeme, you can view the usage frequency of each of its Word Forms (for example, for the nouns գիրք, երկիր, զգացումն, ժողովուրդ, or պատմութիւն; for the verbs ապրիլ, գալ, գտնուիլ, երթալ, ըլլալ, ըսել, կարդալ, ներկայացնել, տալ or տեսնել).
  • Discovering new words and new senses of words.

    Languages are continually evolving. A text corpus can help you understand the meaning of a new word (for example գործնապաշտ, meaning "pragmatic" or "pragmatist") or a new sense of an existing word by looking at the context in which it is used. You can also discover its usage paradigms (for example, the inflections it takes or the inflections taken by related words in its context).
  • Using the Corpus as a general search engine.

    Finally, a text corpus can be used as a general search engine when researching an individual (such as Կոմիտաս, Վրացեան, Օշական), a place (such as Կիլիկիա, Ժիպէյլ, Երուսաղէմ), an event (such as եղեռն), and so on.
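The collocation example above («նախապատուութիւն տալ») can be approximated by counting which Lexemes occur near a target Lexeme. This is a rough sketch under stated assumptions: the `collocates` function is hypothetical, the sentences are made-up placeholders that are assumed to be already lemmatized (each token replaced by its Lexeme), and a raw co-occurrence count stands in for whatever measure a real corpus tool would use.

```python
from collections import Counter


def collocates(lemmatized_sentences, target, window=2):
    """Count Lexemes that appear within `window` tokens of `target`."""
    counts = Counter()
    for sentence in lemmatized_sentences:
        for i, lexeme in enumerate(sentence):
            if lexeme != target:
                continue
            start = max(0, i - window)
            # Neighbours on both sides of the target, excluding the target itself.
            neighbours = sentence[start:i] + sentence[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


# Toy, already-lemmatized sentences (placeholders, not Corpus citations).
sentences = [
    ["ան", "նախապատուութիւն", "տալ", "գիրք"],
    ["նախապատուութիւն", "տալ", "խաղաղութիւն"],
    ["մեծ", "նախապատուութիւն", "ունենալ"],
]

top = collocates(sentences, "նախապատուութիւն").most_common(1)
print(top)
```

Counting over Lexemes instead of surface forms matters for a highly inflected language, since the verb տալ would otherwise be split across its many inflected forms.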

Who is COWA made for?

COWA is a widely applicable linguistic tool and search engine that's useful to anyone with an interest in the Western Armenian language:

  • Students and Teachers of the Western Armenian language - to learn and to teach word usage, idioms, case government, and how to put words together
  • Writers and Translators - to identify the proper use of a given word and the context in which to use it
  • Lexicographers (dictionary authors) - to discover idioms, new words and new senses of existing words, and the contexts in which they are used
  • Linguists - to research word usage, sentence structure, case government, and Armenian grammatical rules using empirical data
  • Researchers - to discover word frequency patterns and as a general search engine for researching a particular topic, such as a person, place, event or thing

How can I help?

COWA is a work in progress. If you have digitized works written in the Western Armenian language (either your own or by others) that you'd like to share for possible inclusion in COWA, please email us at corpus@nayiri.com.

You can also send us suggestions for works and authors to add to COWA.

Thank You


The Corpus of Western Armenian has been sponsored by the Calouste Gulbenkian Foundation