Nayiri Developers: Nayiri Armenian Lexicon

Overview

The Nayiri Armenian Lexicon aims to be a comprehensive lexicon of Armenian, covering the full set of meaningful linguistic units in the language. All data is released under a permissive open license (CC BY 4.0) to encourage broad use and collaboration.

The Nayiri Lexicon Management System is designed to support all major written forms of Armenian: Western Armenian in traditional orthography, Eastern Armenian in traditional orthography, Eastern Armenian in reformed orthography, and Classical Armenian.

We’ve taken care of the heavy lifting, enabling you, as a software developer, to integrate the Armenian language—complete with its rich and complex morphology—directly into the applications you’re building.

The Data

The Lexicon seeks to provide a complete set of valid Armenian word forms, together with their full morphological inflections. Each word form is hierarchically organized under its corresponding lemma, and each lemma is in turn grouped under its corresponding lexeme.

This hierarchical structure allows the data to be used effectively for both morphological analysis and stemming.
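To make the hierarchy concrete, here is a minimal Python sketch of how a lexeme, lemma, and word form tree can serve both tasks. The class and field names (`Lexeme`, `Lemma`, `WordForm`, `inflection_id`, and so on) are illustrative assumptions and not the dataset's actual schema; the Data Schema Reference lists the real attribute names. The sample words are English placeholders, not entries from the Lexicon.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class WordForm:
    text: str           # surface form as written
    inflection_id: str  # hypothetical key into a table of inflections


@dataclass
class Lemma:
    text: str
    word_forms: list = field(default_factory=list)  # list of WordForm


@dataclass
class Lexeme:
    lexeme_id: str
    lemmas: list = field(default_factory=list)  # list of Lemma


def build_index(lexemes):
    """Map each surface form to every (lexeme, lemma, inflection) that yields it."""
    index = defaultdict(list)
    for lexeme in lexemes:
        for lemma in lexeme.lemmas:
            for form in lemma.word_forms:
                index[form.text].append(
                    (lexeme.lexeme_id, lemma.text, form.inflection_id)
                )
    return index


def analyze(index, token):
    """Morphological analysis: all known readings of a token."""
    return list(index.get(token, []))


def stem(index, token):
    """Stemming by lookup: the sorted set of lemmas a token may belong to."""
    return sorted({lemma for _, lemma, _ in index.get(token, [])})


# Placeholder data standing in for real Lexicon entries.
lexemes = [
    Lexeme("lx1", [Lemma("walk", [WordForm("walk", "inf"),
                                  WordForm("walked", "past")])]),
]
index = build_index(lexemes)
print(analyze(index, "walked"))
print(stem(index, "walked"))
```

Walking the tree once to build a reverse index trades memory for lookup speed: analysis and stemming both become a single dictionary lookup on the surface form.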

Current Release

The current release (2026-04-25-v3) supports Western Armenian in traditional orthography only. While the morphological analyzer available on the Nayiri website includes some support for Eastern Armenian, that functionality is not included in the present open-source release.

The lexicon currently contains:

7,500 Lexemes
8,547 Lemmas
1,597,297 Word Forms
709 Inflections

Development is ongoing, and the dataset will continue to expand over time.
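For a rough sense of scale, the published counts imply the averages computed below. This is plain arithmetic on the figures listed above; the variable names are ours, and an average says nothing about how word forms are actually distributed across individual lemmas.

```python
# Counts as published for release 2026-04-25-v3.
lexemes = 7_500
lemmas = 8_547
word_forms = 1_597_297

lemmas_per_lexeme = lemmas / lexemes
forms_per_lemma = word_forms / lemmas
forms_per_lexeme = word_forms / lexemes

print(f"lemmas per lexeme:     {lemmas_per_lexeme:.2f}")
print(f"word forms per lemma:  {forms_per_lemma:.1f}")
print(f"word forms per lexeme: {forms_per_lexeme:.1f}")
```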

Limitations and future directions

Pronouns are not yet supported. In addition, certain irregular verbs (for example, եմ), defective verbs (such as ունիմ), adpositions (such as վրայ), and some irregular nouns (for example, հայր) are not yet fully implemented.

Documentation

Before downloading and working with the dataset, we recommend reviewing the documentation in the following order:

Data Model – Start here to understand the high-level organization and overall structure of the data.
File Format – Next, review a simplified example that illustrates how the JSON file is organized.
Data Schema Reference – Finally, explore the complete attribute reference for every JSON object in the dataset.

To help you get started, we also provide a sample Lexicon JSON file, in indented and minified versions, containing 20 Lexemes with around 3,600 Word Forms and all 700+ Inflection objects. The indented version is small enough to open and explore in a text editor; both can be downloaded from the Download section below.

Licensing and Attribution

Licensing

This dataset is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

You are free to:

Use the data for commercial and non-commercial purposes
Modify, adapt, and build upon the data
Redistribute the original or modified versions

Provided that you:

Give appropriate credit to the original source
Indicate if changes were made
Do not imply endorsement by the Nayiri Institute or Serouj Ourishian

A copy of the full license text is available at:
https://creativecommons.org/licenses/by/4.0/

Attribution

When using or redistributing this dataset, please include attribution in a reasonable and visible manner. The attribution should include the name of the dataset, the author or authoring organization, and the license.

Recommended attribution format:

Nayiri Armenian Lexicon © Serouj Ourishian. Licensed under CC BY 4.0.

If you modify the data, please indicate that changes were made, for example:

Nayiri Armenian Lexicon © Serouj Ourishian. Modified by <Your Name or Organization>. Licensed under CC BY 4.0.
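If your project generates its credits or NOTICE text programmatically, the two recommended formats above can be produced by a small helper. This is only a convenience sketch; the function name `attribution` and its `modified_by` parameter are our own and not part of any Nayiri tooling.

```python
from typing import Optional

BASE = "Nayiri Armenian Lexicon © Serouj Ourishian."
LICENSE = "Licensed under CC BY 4.0."


def attribution(modified_by: Optional[str] = None) -> str:
    """Build the recommended attribution line, noting modifications if any."""
    parts = [BASE]
    if modified_by:
        # Only added when the data has been changed by the named party.
        parts.append(f"Modified by {modified_by}.")
    parts.append(LICENSE)
    return " ".join(parts)


print(attribution())
print(attribution("Example Organization"))
```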

Attribution in Software and Derived Works

For software applications, attribution may be included in:

Project documentation or README files
An “About” or “Credits” screen
License or NOTICE files

For academic or research use, attribution should appear in:

Papers, footnotes, or bibliographies
Dataset citations

Rationale

The CC BY 4.0 license is intended to encourage broad adoption and reuse of the Nayiri Armenian Lexicon while ensuring proper attribution to the original work.

Download

Alongside the full Lexicon dataset, we provide two compact sample files to help you quickly understand the data structure and begin integrating the Lexicon into your project.

Each sample file contains 20 Lexemes with around 3,600 Word Forms, all 700+ Inflections, and the Metadata, giving you a realistic subset of the full dataset without the large file size.

Sample Lexicon JSON (Indented)

This version is formatted with line breaks, indentation, and additional whitespace to maximize readability. At under 700 KB, it can be easily opened in any text editor and is ideal for exploring the schema and becoming familiar with the structure of the data.

Download: nayiri-armenian-lexicon-2026-04-25-v3-sample-indented.json (660 KB)

Sample Lexicon JSON (Minified)

This file contains the exact same data as the indented version, but without extra whitespace or line breaks. The single-line format is convenient for parser development and rapid prototyping, allowing you to test your integration without loading the full dataset.

Download: nayiri-armenian-lexicon-2026-04-25-v3-sample.json (513 KB)
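A quick first step with either sample file is to load it and list what sits at the top level. The sketch below deliberately assumes nothing about the schema: it reports whatever keys it finds, so it should keep working if the structure differs from what you expect. The function name `summarize_top_level` is ours, and the script assumes the sample file has been saved in the current directory.

```python
import json
from pathlib import Path


def summarize_top_level(path):
    """Return {key: short description} for the top-level value of a JSON file."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        # The root is not an object; describe it as a whole.
        return {"<root>": type(data).__name__}
    summary = {}
    for key, value in data.items():
        if isinstance(value, (list, dict)):
            summary[key] = f"{type(value).__name__} with {len(value)} entries"
        else:
            summary[key] = type(value).__name__
    return summary


if __name__ == "__main__":
    sample = Path("nayiri-armenian-lexicon-2026-04-25-v3-sample.json")
    if sample.exists():
        for key, description in summarize_top_level(sample).items():
            print(f"{key}: {description}")
    else:
        print(f"{sample} not found; download it first.")
```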

Full Lexicon JSON

This is the complete dataset intended for production use.

Download: nayiri-armenian-lexicon-2026-04-25-v3.json.zip (7.2 MB)

(7.2 MB ZIP archive containing a ~73.4 MB JSON file)
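The full dataset can be parsed straight from the ZIP archive without unpacking it to disk first. The following is a sketch using only the Python standard library; the helper name `load_lexicon_from_zip` is ours, and because the name of the JSON member inside the archive is not stated here, the code simply picks the first `.json` entry it finds.

```python
import json
import zipfile


def load_lexicon_from_zip(zip_path):
    """Parse the first .json member of a ZIP archive without extracting it."""
    with zipfile.ZipFile(zip_path) as archive:
        json_names = [n for n in archive.namelist() if n.lower().endswith(".json")]
        if not json_names:
            raise ValueError(f"no .json member found in {zip_path}")
        # archive.open yields a binary stream; json.load decodes it as UTF-8.
        with archive.open(json_names[0]) as member:
            return json.load(member)


# Example (assumes the archive has been downloaded to the current directory):
# lexicon = load_lexicon_from_zip("nayiri-armenian-lexicon-2026-04-25-v3.json.zip")
```

Note that a JSON file of roughly 73 MB typically occupies several times that much memory once parsed into Python objects, so a streaming parser may be worth considering on constrained systems.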

Sponsorship

The design, creation, and open-source release of the Nayiri Armenian Lexicon have been supported by the Calouste Gulbenkian Foundation.