Nayiri Armenian Lexicon — Data Schema Reference
The lexicon data is provided in JSON format. The following sections describe the attributes of each JSON object.
Root-level
The root-level JSON Object has the following attributes:
| Attribute Key | Type | Description |
|---|---|---|
| lexemes | JSON Array | An Array of Lexeme objects (described below) |
| inflections | JSON Array | An Array of globally-defined Inflection objects (described below) shared by Word Forms |
| metadata | JSON Object | Provides an overview of the data set, including versioning, licensing, and some basic statistics. |
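
As a rough illustration of the root-level layout, the following Python sketch checks that a parsed document has the three attributes listed above with the expected JSON types. The `SAMPLE` document, its version value, and the `check_root` helper are hypothetical and exist only for this example; real data files are far larger.

```python
import json

# Hypothetical miniature document, for illustration only.
SAMPLE = """
{
  "lexemes": [],
  "inflections": [],
  "metadata": {"version": "2024-01-01-v1"}
}
"""

# Root-level attribute keys and the Python types json.loads produces for them.
REQUIRED_ROOT_KEYS = {"lexemes": list, "inflections": list, "metadata": dict}


def check_root(doc):
    """Return a list of problems found in the root-level object."""
    problems = []
    for key, expected in REQUIRED_ROOT_KEYS.items():
        if key not in doc:
            problems.append(f"missing key: {key}")
        elif not isinstance(doc[key], expected):
            problems.append(f"{key} should be a {expected.__name__}")
    return problems


doc = json.loads(SAMPLE)
print(check_root(doc))
```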
Lexeme
| Attribute Key | Type | Description |
|---|---|---|
| lexemeId | String | The 4-character identifier that uniquely identifies this Lexeme in the Nayiri Lexicon. It is the base64url encoding of the underlying 24-bit unique identifier. |
| description | String | A human-readable description of this Lexeme. By convention, it consists of a comma-separated list of lemmas followed by a short English definition in parentheses, and is meant to disambiguate distinct Lexemes that share the same Lemma. For example, the Lexeme representing the postposition համար has the description “համար (for, on account of)”, whereas the Lexeme representing the noun համար has the description “համար (account, number, count, calculation, enumeration)”. |
| lemmaType | String | The type of Lemmas this Lexeme can contain. One of: { NOMINAL, VERBAL, UNINFLECTED }. Lexemes with lemmaType of NOMINAL can store Lemmas that represent Nouns, Adjectives, Adverbs, and Adpositions. Lexemes with lemmaType of VERBAL are meant to store Lemmas of Verbs only. Lexemes with lemmaType of UNINFLECTED are meant to store Lemmas that are exclusively uninflected, such as adverbs (e.g. անմիջապէս), adpositions, conjunctions, interjections, articles, and determiners. |
| lemmas | JSON Array | The Lemma objects belonging to this Lexeme. |
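
Because a base64url character carries 6 bits, a 4-character identifier covers exactly 24 bits. The sketch below converts between the string form and an integer. It assumes the standard base64url alphabet from RFC 4648 and that the most significant 6-bit group comes first; the bit order is not stated in this reference, so treat it as an assumption. `decode_id` and `encode_id` are hypothetical helper names.

```python
import string

# Standard base64url alphabet (RFC 4648, section 5): A-Z, a-z, 0-9, '-', '_'.
B64URL_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"
)


def decode_id(identifier):
    """Convert a base64url identifier to an integer, 6 bits per character.

    Assumption: the first character holds the most significant bits.
    Raises ValueError if a character is outside the alphabet.
    """
    value = 0
    for ch in identifier:
        value = (value << 6) | B64URL_ALPHABET.index(ch)
    return value


def encode_id(value, length):
    """Convert an integer back to a fixed-length base64url identifier."""
    chars = []
    for _ in range(length):
        chars.append(B64URL_ALPHABET[value & 0x3F])
        value >>= 6
    return "".join(reversed(chars))


print(decode_id("AAAB"))   # a 4-character (24-bit) identifier
print(encode_id(1, 4))
```

The same helpers work for identifiers of other lengths, since each additional character simply adds 6 bits.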
Lemma
| Attribute Key | Type | Description |
|---|---|---|
| lemmaId | String | The 5-character identifier that uniquely identifies this Lemma in the Nayiri Lexicon. It is the base64url encoding of the underlying 30-bit unique identifier. |
| lemmaString | String | The canonical word form of this Lemma (for example, ճշդել). There may be more than one Lemma with the same lemmaString in a given Lexeme. For example, in the Uninflected Lexeme with the description “որ (that; when, whenever; if; so that, in order to)”, the two contained Uninflected Lemmas for the conjunction and adverb both have the same lemmaString (որ). |
| partOfSpeech | String | The part of speech of this Lemma, which is one of: { NOUN, PRONOUN, VERB, ADJECTIVE, ADVERB, CONJUNCTION, INTERJECTION, ARTICLE, DETERMINER, ADPOSITION } |
| lemmaDisplayString | String | A human-readable description of this Lemma. By convention, it is the lemmaString followed by an English definition in parentheses. It is meant to provide a way to disambiguate Lemmas with the same lemmaString within the same Lexeme. In the preceding example, the Uninflected Lemma for the conjunction որ has the lemmaDisplayString “որ (that; if; so that, In order to)”, whereas the adverb has “որ (when, whenever)”. |
| numWordForms | Integer | A convenience attribute showing the total number of Word Forms in this Lemma. |
| wordForms | JSON Array | The WordForm objects attributed to this Lemma. |
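
The nesting of Lexemes, Lemmas, and Word Forms can be walked with a few loops. The following sketch uses a hand-written sample Lexeme; all identifiers and the placeholder definition text are hypothetical, and only the attribute keys and the Armenian example forms come from this reference. It also checks that numWordForms agrees with the length of wordForms.

```python
# Hypothetical sample Lexeme; IDs and definition text are placeholders.
lexemes = [
    {
        "lexemeId": "AAAB",
        "description": "ճշդել (placeholder definition)",
        "lemmaType": "VERBAL",
        "lemmas": [
            {
                "lemmaId": "AAAAB",
                "lemmaString": "ճշդել",
                "partOfSpeech": "VERB",
                "lemmaDisplayString": "ճշդել (placeholder definition)",
                "numWordForms": 3,
                "wordForms": [
                    {"s": "ճշդեմ", "i": "AAAB"},
                    {"s": "ճշդես", "i": "AAAC"},
                    {"s": "ճշդէ", "i": "AAAD"},
                ],
            }
        ],
    }
]


def iter_word_forms(lexemes):
    """Yield (lemmaString, word form string) for every Word Form."""
    for lexeme in lexemes:
        for lemma in lexeme["lemmas"]:
            for word_form in lemma["wordForms"]:
                yield lemma["lemmaString"], word_form["s"]


def mismatched_lemmas(lexemes):
    """Return lemmaIds whose numWordForms differs from len(wordForms)."""
    return [
        lemma["lemmaId"]
        for lexeme in lexemes
        for lemma in lexeme["lemmas"]
        if lemma["numWordForms"] != len(lemma["wordForms"])
    ]


for lemma_string, form in iter_word_forms(lexemes):
    print(lemma_string, form)
print(mismatched_lemmas(lexemes))
```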
Word Form
| Attribute Key | Type | Description |
|---|---|---|
| s | String | An inflected word form (e.g. ճշդեմ, ճշդես, ճշդէ) of the containing Lemma (e.g. ճշդել) |
| i | String | The unique identifier of the Inflection object representing the morphological analysis of this Word Form. |
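
Since each Word Form stores only the identifier of a globally defined Inflection, a consumer would typically build a lookup table once and resolve the `i` attribute through it. The sketch below shows one way to do that; the sample Inflection, its identifier, and its display names are hypothetical, and the helper names are not part of the data set.

```python
# Hypothetical globally defined Inflection objects (root-level "inflections").
inflections = [
    {
        "inflectionId": "AAAB",
        "lemmaType": "VERBAL",
        "displayName": {"en": "placeholder English name", "hy": "placeholder"},
    }
]


def build_inflection_index(inflections):
    """Map each inflectionId to its Inflection object."""
    return {inflection["inflectionId"]: inflection for inflection in inflections}


def display_name_for(word_form, index, locale="en"):
    """Return the localized display name for a Word Form, or None if unknown."""
    inflection = index.get(word_form["i"])
    if inflection is None:
        return None
    return inflection["displayName"].get(locale)


index = build_inflection_index(inflections)
print(display_name_for({"s": "ճշդեմ", "i": "AAAB"}, index))
```

Building the index once keeps each lookup at constant cost, which matters when iterating over every Word Form in the data set.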
Inflection
| Attribute Key | Type | Description |
|---|---|---|
| inflectionId | String | The 4-character unique identifier of this Inflection object. It is the base64url encoding of the underlying 24-bit unique identifier. |
| lemmaType | String | One of: { NOMINAL, VERBAL, UNINFLECTED }. Note that no attributes besides inflectionId and displayName apply to the special Inflection object with lemmaType == UNINFLECTED. |
| displayName | JSON Object | Provides an internationalized human-readable display name for this Inflection. The keys are the locales, and the values are the localized display names. Both the keys and values are Strings. At present, only the hy (Armenian) and en (English) locale Strings are supported. |
| verbalInflectionClass | String | Signifies the broad category of Verbal Inflections represented by this Inflection object. Applicable only when lemmaType == VERBAL. One of: { REGULAR_VERB, INFINITIVE, PRESENT_PARTICIPLE, PAST_PARTICIPLE, FUTURE_PARTICIPLE, PRESENT_PARTICIPLE_SUBSTANTIVE, PAST_PARTICIPLE_SUBSTANTIVE, FUTURE_PARTICIPLE_SUBSTANTIVE } |
| verbPolarity | String | Signifies the polarity of the verb for this Inflection. Applicable only when lemmaType == VERBAL. One of: { POSITIVE, NEGATIVE } |
| verbTense | String | Signifies the grammatical tense of the verb for this Inflection. Applicable only when verbalInflectionClass == REGULAR_VERB. One of: { SIMPLE_PRESENT, PRESENT_CONTINUOUS, PRESENT_PERFECT, SIMPLE_PAST, PAST_PERFECT, PAST_IMPERFECT, PAST_CONTINUOUS, SIMPLE_FUTURE, FUTURE_PERFECT, NONE } |
| verbMood | String | Signifies the grammatical mood of the verb for this Inflection. Applicable only when verbalInflectionClass == REGULAR_VERB. One of: { INDICATIVE, IMPERATIVE, PROHIBITIVE, SUBJUNCTIVE, CONDITIONAL } |
| grammaticalPerson | String | Signifies the grammatical person of the verb for this Inflection. Applicable only when verbalInflectionClass == REGULAR_VERB. One of: { FIRST, SECOND, THIRD, NONE } |
| grammaticalNumber | String | Signifies the grammatical number of the noun, verb, or substantive participle for this Inflection. Applicable only when lemmaType == NOMINAL, or when verbalInflectionClass is any of { REGULAR_VERB, PRESENT_PARTICIPLE_SUBSTANTIVE, PAST_PARTICIPLE_SUBSTANTIVE, FUTURE_PARTICIPLE_SUBSTANTIVE }. One of: { SINGULAR, PLURAL } |
| grammaticalCase | String | Signifies the grammatical case of the noun, infinitive, or substantive participle for this Inflection. Applicable only when lemmaType == NOMINAL, or when verbalInflectionClass is any of { INFINITIVE, PRESENT_PARTICIPLE_SUBSTANTIVE, PAST_PARTICIPLE_SUBSTANTIVE, FUTURE_PARTICIPLE_SUBSTANTIVE }. One of: { NOMINATIVE, ACCUSATIVE, GENITIVE, DATIVE, ABLATIVE, INSTRUMENTAL, LOCATIVE } |
| grammaticalArticle | String | Signifies any grammatical article appended to the noun, infinitive, or substantive participle for this Inflection. Applicable only when lemmaType == NOMINAL, or when verbalInflectionClass is any of { INFINITIVE, PRESENT_PARTICIPLE_SUBSTANTIVE, PAST_PARTICIPLE_SUBSTANTIVE, FUTURE_PARTICIPLE_SUBSTANTIVE }. One of: { NONE, DEFINITE_ARTICLE_UHT, DEFINITE_ARTICLE_NOO, POSSESSIVE_ARTICLE_SINGULAR_FIRST_PERSON, POSSESSIVE_ARTICLE_SINGULAR_SECOND_PERSON, POSSESSIVE_ARTICLE_UHT, POSSESSIVE_ARTICLE_NOO, DEFINITE_ARTICLE_NOO_WITH_FIRST_PERSON_POSSESSIVE_ARTICLE, DEFINITE_ARTICLE_NOO_WITH_SECOND_PERSON_POSSESSIVE_ARTICLE, DEFINITE_ARTICLE_NOO_WITH_THIRD_PERSON_POSSESSIVE_ARTICLE_UHT, DEFINITE_ARTICLE_NOO_WITH_THIRD_PERSON_POSSESSIVE_ARTICLE_NOO } |
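
The applicability conditions in the table above can be collected into a single function. The sketch below returns the set of attribute keys that apply to a given Inflection, based only on the rules stated in this reference. The `applicable_attributes` helper is hypothetical, and the sketch makes no assumption about how inapplicable attributes are represented in the data (omitted, null, or NONE).

```python
SUBSTANTIVE_CLASSES = {
    "PRESENT_PARTICIPLE_SUBSTANTIVE",
    "PAST_PARTICIPLE_SUBSTANTIVE",
    "FUTURE_PARTICIPLE_SUBSTANTIVE",
}


def applicable_attributes(inflection):
    """Return the attribute keys that apply to this Inflection object."""
    # These keys are present on every Inflection, including UNINFLECTED.
    attrs = {"inflectionId", "lemmaType", "displayName"}
    lemma_type = inflection["lemmaType"]

    if lemma_type == "NOMINAL":
        attrs |= {"grammaticalNumber", "grammaticalCase", "grammaticalArticle"}
    elif lemma_type == "VERBAL":
        attrs |= {"verbalInflectionClass", "verbPolarity"}
        inflection_class = inflection.get("verbalInflectionClass")
        if inflection_class == "REGULAR_VERB":
            attrs |= {
                "verbTense",
                "verbMood",
                "grammaticalPerson",
                "grammaticalNumber",
            }
        elif inflection_class == "INFINITIVE":
            attrs |= {"grammaticalCase", "grammaticalArticle"}
        elif inflection_class in SUBSTANTIVE_CLASSES:
            attrs |= {"grammaticalNumber", "grammaticalCase", "grammaticalArticle"}
    return attrs


print(sorted(applicable_attributes(
    {"lemmaType": "VERBAL", "verbalInflectionClass": "INFINITIVE"}
)))
```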
Metadata
The Metadata object provides version information of the lexicon data, some statistics about the data, and human-readable descriptions of its authorship, licensing, and attribution requirements.
| Attribute Key | Type | Description |
|---|---|---|
| version | String | A version String that uniquely identifies this release. It is formatted as YYYY-MM-DD-vN, where YYYY is the year, MM is the month (01-12), DD is the day of the month (01-31), and N is the revision number for that day. |
| license | String | The license under which the data is released |
| attribution | String | The attribution text that consumers of the data should display in their application or derivative work when using the data |
| publisher | String | |
| sponsorship | String | |
| author | String | |
| contactEmail | String | A contact email address for support |
| website | String | URL to the Nayiri website |
| numLexemes | Integer | The number of Lexemes in the data set |
| numLemmas | Integer | The total number of Lemmas across all Lexemes in the data set |
| numWordForms | Integer | The total number of Word Forms across all Lemmas of all Lexemes in the data set |
| numInflections | Integer | The number of Inflection objects defined globally |
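
The version format lends itself to simple parsing, for example to compare two releases. The sketch below assumes that the "v" is a literal lowercase character and that N is a plain decimal integer; the sample version strings are hypothetical and `parse_version` is not part of the data set.

```python
import re
from datetime import date

# YYYY-MM-DD-vN, with N as one or more decimal digits (assumed).
VERSION_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-v(\d+)$")


def parse_version(version):
    """Split a version String into (release date, revision number)."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unexpected version format: {version!r}")
    year, month, day, revision = (int(group) for group in match.groups())
    # date() raises ValueError for impossible dates such as month 13.
    return date(year, month, day), revision


# Hypothetical version strings; tuples compare by date first, then revision.
older = parse_version("2024-01-15-v1")
newer = parse_version("2024-01-15-v2")
print(older < newer)
```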