Allele

The Allele class represents contiguous changes on a reference sequence. It covers the most commonly described forms of variation, including all “small” variants, such as SNVs and indels, that are also representable in other contemporary genomic variant formats such as SPDI, HGVS, and VCF.

Definition and Information Model

Note

This data class is at a trial use maturity level and may change in future releases. Maturity levels are described in the GKS Maturity Model.

Computational Definition

The state of a molecule at a Location.

GA4GH Digest

| Prefix | Inherent |
|--------|----------|
| VA | [‘location’, ‘state’, ‘type’] |
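The prefix and inherent properties above feed the computed identifier. As a minimal sketch, the following shows only the sha512t24u step (SHA-512, truncated to 24 bytes, base64url-encoded) and how a prefix is joined to a digest. The canonical serialization of the inherent properties is deliberately not shown, and `make_identifier` is a hypothetical helper name used for illustration.

```python
import base64
import hashlib


def sha512t24u(blob: bytes) -> str:
    """SHA-512 digest, truncated to the first 24 bytes, base64url-encoded."""
    digest = hashlib.sha512(blob).digest()
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


def make_identifier(prefix: str, digest: str) -> str:
    """Join a type prefix (e.g. "VA") and a digest (hypothetical helper)."""
    return f"ga4gh:{prefix}.{digest}"


# 24 bytes always encode to 32 base64url characters, with no padding.
print(sha512t24u(b""))  # z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXc
print(make_identifier("VA", "Oop4kjdTtKcg1kiZjIJAAR3bp7qi4aNT"))
```

The input to `sha512t24u` in a real implementation would be the serialized Allele produced by the VRS Computed Identifier algorithm, not an arbitrary byte string.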

Information Model

Some Allele attributes are inherited from Variation.

| Field | Flags | Type | Limits | Description |
|-------|-------|------|--------|-------------|
| id | | string | 0..1 | The ‘logical’ identifier of the Entity in the system of record, e.g. a UUID. This ‘id’ is unique within a given system, but may or may not be globally unique outside the system. It is used within a system to reference an object from another. |
| name | | string | 0..1 | A primary name for the entity. |
| description | | string | 0..1 | A free-text description of the Entity. |
| aliases | | string | 0..m | Alternative name(s) for the Entity. |
| extensions | | Extension | 0..m | A list of extensions to the Entity, that allow for capture of information not directly supported by elements defined in the model. |
| digest | | string | 0..1 | A sha512t24u digest created using the VRS Computed Identifier algorithm. |
| expressions | | Expression | 0..m | |
| type | | string | 1..1 | MUST be “Allele” |
| location | | iriReference \| Location | 1..1 | The location of the Allele |
| state | | Sequence Expression | 1..1 | An expression of the sequence state |

Example

{
    "id": "ga4gh:VA.Oop4kjdTtKcg1kiZjIJAAR3bp7qi4aNT",
    "type": "Allele",
    "expressions": [
        {
            "syntax": "spdi",
            "value": "NC_000001.11:40819438:CTCCTCCT:CTCCTCCTCCT"
        }
    ],
    "location": {
        "type": "SequenceLocation",
        "sequenceReference": {
            "refgetAccession": "SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO",
            "residueAlphabet": "na",
            "id": "NC_000001.11"
        },
        "start": 40819438,
        "end": 40819446
    },
    "state": {
        "type": "ReferenceLengthExpression",
        "length": 11,
        "repeatSubunitLength": 3
    }
}
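The cardinalities in the information model can be spot-checked in code. The sketch below is a hypothetical helper (`check_allele` is not part of any VRS library) that tests only the `type` constraint, the two 1..1 fields, and the list-valued 0..m fields. It is not a substitute for validation against the published schema.

```python
def check_allele(allele: dict) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if allele.get("type") != "Allele":
        problems.append('type MUST be "Allele"')
    # location and state have cardinality 1..1, so both must be present.
    for required in ("location", "state"):
        if required not in allele:
            problems.append(f"missing required field: {required}")
    # Fields with cardinality 0..m are lists when present.
    for multi in ("aliases", "extensions", "expressions"):
        if multi in allele and not isinstance(allele[multi], list):
            problems.append(f"{multi} must be a list")
    return problems


allele = {
    "type": "Allele",
    "location": {"type": "SequenceLocation", "start": 40819438, "end": 40819446},
    "state": {"type": "ReferenceLengthExpression", "length": 11, "repeatSubunitLength": 3},
}
print(check_allele(allele))  # []
```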

Implementation Guidance

Sequence Location Coordinates

The location property of an Allele will almost always have start and end coordinates that are specified as integers (not Range). In some situations, such as the detection of deleted sequence by microarray, it may be appropriate to represent the variant as an Allele; however, other classes for representing such findings should also be considered (e.g. Copy Number Count).
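As an illustration of the integer case, the sketch below computes the length of the located reference span as `end - start`, which matches the example above (8 residues, `CTCCTCCT`). Both `reference_span` and the representation of a Range as a two-item list of bounds are assumptions made for this sketch.

```python
from typing import Optional


def reference_span(location: dict) -> Optional[int]:
    """Length of the located span, or None if a coordinate is not a plain integer."""
    start, end = location.get("start"), location.get("end")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(c, int) and not isinstance(c, bool) for c in (start, end)):
        return end - start
    return None


precise = {"type": "SequenceLocation", "start": 40819438, "end": 40819446}
# Assumption: a Range is serialized as a two-item list of bounds.
imprecise = {"type": "SequenceLocation", "start": [100, 200], "end": [500, 600]}

print(reference_span(precise))    # 8
print(reference_span(imprecise))  # None
```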

Normalization

The Allele class also includes conventions for variant normalization (see Allele Normalization) that allow for compact and uniform representation of variants.

New in v2

In VRS v1.x, normalization included methods for full justification of variants, as derived from the NCBI VOCA algorithm. In v2, this has been extended to include reference length encoding (see Reference Length Expression), to accommodate compressed representation of variants that occur in large repetitive regions.

For alleles in small repeating regions, it may be convenient to also use the ReferenceLengthExpression.sequence attribute to represent the sequence state explicitly alongside the reference encoding.
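To make reference length encoding concrete, the sketch below expands a ReferenceLengthExpression back into an explicit sequence. It assumes the state is obtained by cycling the leading repeat subunit of the reference span until `length` residues are produced. The function name and signature are hypothetical. Applied to the example above, it reproduces the inserted sequence given in the SPDI expression.

```python
from typing import Optional


def expand_reference_length(ref_seq: str, length: int,
                            repeat_subunit_length: Optional[int] = None) -> str:
    """Cycle the leading repeat subunit of ref_seq until `length` residues are produced."""
    # Without a subunit length, treat the whole reference span as the unit.
    unit_len = repeat_subunit_length or len(ref_seq)
    if unit_len <= 0:
        raise ValueError("the reference span must not be empty")
    unit = ref_seq[:unit_len]
    full_repeats, remainder = divmod(length, unit_len)
    return unit * full_repeats + unit[:remainder]


print(expand_reference_length("CTCCTCCT", length=11, repeat_subunit_length=3))
# CTCCTCCTCCT
```

A value computed this way is what an implementation could place in ReferenceLengthExpression.sequence when the explicit state is wanted alongside the compact encoding.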

Expressions

New in v2

The v2 Variation classes now support Expression. This is a convenient mechanism for annotating Alleles with string syntaxes that follow the conventions of other variant standards (e.g. HGVS, SPDI) and resources (e.g. ClinVar, gnomAD).
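As a rough illustration of how an expression relates to the Allele it annotates, the sketch below splits the SPDI value from the example into its four parts and derives the location coordinates from it. `Spdi` and `parse_spdi` are hypothetical names, and the sketch assumes the deletion is spelled out as a sequence rather than given as a length.

```python
from typing import NamedTuple


class Spdi(NamedTuple):
    sequence: str
    position: int
    deletion: str
    insertion: str


def parse_spdi(value: str) -> Spdi:
    """Split an SPDI string into its four colon-separated parts."""
    sequence, position, deletion, insertion = value.split(":")
    return Spdi(sequence, int(position), deletion, insertion)


spdi = parse_spdi("NC_000001.11:40819438:CTCCTCCT:CTCCTCCTCCT")
start = spdi.position
end = spdi.position + len(spdi.deletion)
print(start, end)  # 40819438 40819446
```

The derived start and end agree with the SequenceLocation in the example, and the insertion has the 11 residues recorded as the state length.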