Allele
The Allele class represents contiguous changes on a reference sequence. It covers the most commonly described forms of variation, including all “small” variants such as SNVs and indels that are also representable in other contemporary genomic variant formats, such as SPDI, HGVS, and VCF.
Definition and Information Model
Note
This data class is at a trial use maturity level and may change in future releases. Maturity levels are described in the GKS Maturity Model.
Computational Definition
The state of a molecule at a Location.
GA4GH Digest
Prefix | Inherent |
---|---|
VA | [‘location’, ‘state’, ‘type’] |
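The digest part of a computed identifier is a sha512t24u value: the SHA-512 hash of a serialized object, truncated to its first 24 bytes and encoded as base64url. The sketch below illustrates only that hashing step and how the VA prefix from the table is attached. The function names are illustrative, and the serialization of the inherent fields (defined by the VRS Computed Identifier algorithm) is assumed to happen elsewhere.

```python
import base64
import hashlib


def sha512t24u(blob: bytes) -> str:
    """Truncated SHA-512 digest: first 24 bytes, base64url-encoded (32 characters)."""
    digest = hashlib.sha512(blob).digest()
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


def allele_identifier(serialized_allele: bytes) -> str:
    """Hypothetical helper: attach the Allele type prefix to a digest.

    `serialized_allele` is assumed to be the canonical serialization of the
    inherent fields (location, state, type); producing it is out of scope here.
    """
    return f"ga4gh:VA.{sha512t24u(serialized_allele)}"


print(sha512t24u(b""))  # z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXc
```

Because 24 bytes encode to exactly 32 base64url characters, the digest never carries padding, which is consistent with the 32-character digest in the example identifier on this page.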
Information Model
Some Allele attributes are inherited from Variation.
Field | Flags | Type | Limits | Description |
---|---|---|---|---|
id | | string | 0..1 | The ‘logical’ identifier of the Entity in the system of record, e.g. a UUID. This ‘id’ is unique within a given system, but may or may not be globally unique outside the system. It is used within a system to reference an object from another. |
name | | string | 0..1 | A primary name for the entity. |
description | | string | 0..1 | A free-text description of the Entity. |
aliases | ⋮ | string | 0..m | Alternative name(s) for the Entity. |
extensions | ⋮ | | 0..m | A list of extensions to the Entity, that allow for capture of information not directly supported by elements defined in the model. |
digest | | string | 0..1 | A sha512t24u digest created using the VRS Computed Identifier algorithm. |
expressions | ⋮ | | 0..m | |
type | | string | 1..1 | MUST be “Allele” |
location | | | 1..1 | The location of the Allele |
state | | | 1..1 | An expression of the sequence state |
Example
{
  "id": "ga4gh:VA.Oop4kjdTtKcg1kiZjIJAAR3bp7qi4aNT",
  "type": "Allele",
  "expressions": [
    {
      "syntax": "spdi",
      "value": "NC_000001.11:40819438:CTCCTCCT:CTCCTCCTCCT"
    }
  ],
  "location": {
    "type": "SequenceLocation",
    "sequenceReference": {
      "refgetAccession": "SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO",
      "residueAlphabet": "na",
      "id": "NC_000001.11"
    },
    "start": 40819438,
    "end": 40819446
  },
  "state": {
    "type": "ReferenceLengthExpression",
    "length": 11,
    "repeatSubunitLength": 3
  }
}
Implementation Guidance
Sequence Location Coordinates
The location property of an Allele will almost always have start and end coordinates specified as integers (not Range). In some situations, such as the detection of deleted sequence by microarray, the boundaries are known only approximately and it may still be appropriate to represent the variant as an Allele; however, other classes for representing such findings should also be considered (e.g. Copy Number Count).
Normalization
The Allele class also includes conventions for variant normalization (see Allele Normalization) that allow for compact and uniform representation of variants.
New in v2
In VRS v1.x, normalization included methods for full justification of variants, as derived from the NCBI VOCA algorithm. In v2, this has been extended to include reference length encoding (see Reference Length Expression), to accommodate compressed representation of variants that occur in large repetitive regions.
For alleles in small repeating regions, it may be convenient to also use the ReferenceLengthExpression.sequence attribute to represent the sequence state explicitly alongside the reference encoding.
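To illustrate how a reference length encoding relates to an explicit sequence, the sketch below expands a ReferenceLengthExpression into a literal string. It assumes the reference sequence over the Allele's location is made of a repeating subunit whose length is repeatSubunitLength, and that the derived sequence is that subunit cycled out to length residues. The function name is hypothetical and is not part of any VRS library.

```python
def expand_reference_length_expression(
    reference_sequence: str, length: int, repeat_subunit_length: int
) -> str:
    """Cycle the repeat subunit of the reference out to `length` residues.

    Assumption: `reference_sequence` is the reference at the Allele's location
    and begins with one full copy of the repeat subunit.
    """
    if repeat_subunit_length <= 0:
        raise ValueError("repeat_subunit_length must be positive")
    if repeat_subunit_length > len(reference_sequence):
        raise ValueError("repeat subunit is longer than the reference sequence")
    if length < 0:
        raise ValueError("length must be non-negative")

    subunit = reference_sequence[:repeat_subunit_length]
    full_repeats, remainder = divmod(length, repeat_subunit_length)
    return subunit * full_repeats + subunit[:remainder]


# Values taken from the example Allele on this page: the 8-residue reference
# CTCCTCCT with length 11 and repeatSubunitLength 3.
print(expand_reference_length_expression("CTCCTCCT", 11, 3))  # CTCCTCCTCCT
```

A length smaller than the reference span yields a truncated repeat (a deletion), and a larger one yields an expansion (an insertion), which is why a single compact expression can describe either change in a repetitive region.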
Expressions
New in v2
The v2 Variation classes now support Expression. This is a convenient mechanism for annotating Alleles using string syntaxes that follow the conventions of other variant standards (e.g. HGVS, SPDI) and resources (e.g. ClinVar, gnomAD).
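As a rough illustration of how an expression annotates an Allele, the sketch below parses the SPDI string from the Example section and checks that it agrees with the Allele's location and state. The helper names are hypothetical, the dictionary is a trimmed copy of the example, and the check only covers the simple case of integer coordinates with a ReferenceLengthExpression state.

```python
# Trimmed copy of the example Allele from this page.
allele = {
    "location": {"start": 40819438, "end": 40819446},
    "state": {
        "type": "ReferenceLengthExpression",
        "length": 11,
        "repeatSubunitLength": 3,
    },
    "expressions": [
        {"syntax": "spdi", "value": "NC_000001.11:40819438:CTCCTCCT:CTCCTCCTCCT"}
    ],
}


def parse_spdi(value: str) -> dict:
    """Split an SPDI string into sequence, 0-based position, deletion, insertion."""
    sequence, position, deletion, insertion = value.split(":")
    return {
        "sequence": sequence,
        "position": int(position),
        "deletion": deletion,
        "insertion": insertion,
    }


def spdi_matches_allele(allele: dict) -> bool:
    """Check every SPDI expression against the Allele's coordinates and state length."""
    location, state = allele["location"], allele["state"]
    for expression in allele.get("expressions", []):
        if expression["syntax"] != "spdi":
            continue
        spdi = parse_spdi(expression["value"])
        # SPDI allows the deletion to be given as a length instead of a sequence.
        deletion = spdi["deletion"]
        deleted_length = int(deletion) if deletion.isdigit() else len(deletion)
        if spdi["position"] != location["start"]:
            return False
        if deleted_length != location["end"] - location["start"]:
            return False
        if len(spdi["insertion"]) != state["length"]:
            return False
    return True


print(spdi_matches_allele(allele))  # True
```

A check like this treats expressions as annotations that can be validated against the Allele, rather than as the authoritative representation of the variant.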