Sequence Reference

New in v2

In VRS v1.x, sequence references were limited to the refget sequence accession within Sequence Location objects. This made it difficult to indicate in a message that the referenced sequence was, for example, “GRCh38 chr11”. The SequenceReference class was created to enable the addition of such metadata.

The SequenceReference class refers to a sequence by its refget accession. Implementations may also optionally specify additional characteristics of the sequence, such as its residue alphabet (nucleic acid or amino acid), its molecule type, whether it represents a circular molecule, and names or aliases used to describe it.
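To illustrate how a refget accession relates to a literal sequence, the sketch below builds a SequenceReference from a sequence string. It assumes the accession is "SQ." followed by the sha512t24u digest (SHA-512 truncated to 24 bytes, base64url-encoded). The refget specification describes normalizing sequences before digesting; this sketch only uppercases the input, which is a simplifying assumption. The helper `make_sequence_reference` is hypothetical and not part of any VRS library.

```python
import base64
import hashlib
from typing import Optional


def sha512t24u(blob: bytes) -> str:
    """SHA-512 digest truncated to 24 bytes, base64url-encoded (32 characters)."""
    digest = hashlib.sha512(blob).digest()[:24]
    return base64.urlsafe_b64encode(digest).decode("ascii")


def make_sequence_reference(
    sequence: str,
    residue_alphabet: Optional[str] = None,
    molecule_type: Optional[str] = None,
    circular: Optional[bool] = None,
) -> dict:
    """Hypothetical helper: build a SequenceReference dict from a literal sequence."""
    normalized = sequence.upper()  # assumption: uppercase is the only normalization applied
    obj = {
        "type": "SequenceReference",
        "refgetAccession": "SQ." + sha512t24u(normalized.encode("ascii")),
    }
    # Optional attributes are included only when the caller supplies them.
    if residue_alphabet is not None:
        obj["residueAlphabet"] = residue_alphabet
    if molecule_type is not None:
        obj["moleculeType"] = molecule_type
    if circular is not None:
        obj["circular"] = circular
    return obj


ref = make_sequence_reference("acgt", residue_alphabet="na", molecule_type="genomic")
print(ref)
```

Because 24 bytes encode to exactly 32 base64 characters, the digest never carries padding, which keeps accessions a fixed length.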

Definition and Information Model

Note

This data class is at a trial use maturity level and may change in future releases. Maturity levels are described in the GKS Maturity Model.

Computational Definition

A sequence of nucleic or amino acid character codes.

GA4GH Digest

Prefix: None

Inherent: ['refgetAccession', 'type']
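The Inherent entry lists the only properties that contribute to digest computation, so attributes such as name or circular are ignored. The sketch below keeps only those two properties and serializes them as compact JSON with sorted keys. This is a minimal sketch, assuming that key-sorting and whitespace-free separators are the relevant parts of the GA4GH digest serialization; the full convention has further rules not shown here. Because the Prefix is None, SequenceReference is, as we understand it, not assigned its own digest-based identifier; this serialization matters when a SequenceReference is embedded in an identifiable object. The function name `serialize_inherent` is hypothetical.

```python
import json
from typing import Any, Dict

# Inherent properties for SequenceReference, per the GA4GH Digest entry.
INHERENT = ("refgetAccession", "type")


def serialize_inherent(obj: Dict[str, Any]) -> bytes:
    """Compact JSON of the inherent properties only (sorted keys, no whitespace)."""
    kept = {k: obj[k] for k in INHERENT if k in obj}
    return json.dumps(kept, sort_keys=True, separators=(",", ":")).encode("utf-8")


ref = {
    "type": "SequenceReference",
    "refgetAccession": "SQ.F-LrLMe1SRpfUZHkQmvkVKFEGaoDeHul",
    "name": "NC_000007.14",  # non-inherent: dropped from the serialization
    "circular": False,       # non-inherent: dropped from the serialization
}
print(serialize_inherent(ref).decode())
# {"refgetAccession":"SQ.F-LrLMe1SRpfUZHkQmvkVKFEGaoDeHul","type":"SequenceReference"}
```

Two references that differ only in names or other non-inherent metadata therefore serialize identically.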

Information Model

Some SequenceReference attributes are inherited from Entity.

Field (Type, Limits): Description

id (string, 0..1): The ‘logical’ identifier of the Entity in the system of record, e.g. a UUID. This ‘id’ is unique within a given system, but may or may not be globally unique outside the system. It is used within a system to reference an object from another.

name (string, 0..1): A primary name for the entity.

description (string, 0..1): A free-text description of the Entity.

aliases (string, 0..m): Alternative name(s) for the Entity.

extensions (Extension, 0..m): A list of extensions to the Entity that allow capture of information not directly supported by elements defined in the model.

type (string, 1..1): MUST be “SequenceReference”.

refgetAccession (string, 1..1): A GA4GH RefGet identifier for the referenced sequence, using the sha512t24u digest.

residueAlphabet (string, 0..1): The interpretation of the character codes referred to by the refget accession, where “aa” specifies an amino acid character set, and “na” specifies a nucleic acid character set.

sequence (sequenceString, 0..1): A sequenceString that is a literal representation of the referenced sequence.

moleculeType (string, 0..1): Molecule types as defined by RefSeq (see Table 1). MUST be one of “genomic”, “RNA”, “mRNA”, or “protein”.

circular (boolean, 0..1): A boolean indicating whether the molecule represented by the sequence is circular (true) or linear (false).
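The constraints in the field list above (required type and refgetAccession, and the enumerated values for residueAlphabet and moleculeType) can be checked mechanically. The sketch below returns a list of violations for a SequenceReference given as a Python dict. It is a minimal sketch and not an official validator: the function name is hypothetical, it ignores the inherited Entity fields other than aliases, and the "SQ." prefix check is an assumption based on the example accession.

```python
from typing import Any, Dict, List

# Allowed values, taken from the field descriptions.
RESIDUE_ALPHABETS = {"aa", "na"}
MOLECULE_TYPES = {"genomic", "RNA", "mRNA", "protein"}


def validate_sequence_reference(obj: Dict[str, Any]) -> List[str]:
    """Hypothetical check: return constraint violations (empty list if none found)."""
    errors: List[str] = []
    if obj.get("type") != "SequenceReference":
        errors.append('type MUST be "SequenceReference"')

    accession = obj.get("refgetAccession")
    if not isinstance(accession, str) or not accession:
        errors.append("refgetAccession is required (1..1) and must be a string")
    elif not accession.startswith("SQ."):  # assumption: sequence accessions use "SQ."
        errors.append('refgetAccession should start with "SQ."')

    alphabet = obj.get("residueAlphabet")
    if alphabet is not None and alphabet not in RESIDUE_ALPHABETS:
        errors.append('residueAlphabet must be "aa" or "na"')

    molecule = obj.get("moleculeType")
    if molecule is not None and molecule not in MOLECULE_TYPES:
        errors.append("moleculeType must be one of " + ", ".join(sorted(MOLECULE_TYPES)))

    circular = obj.get("circular")
    if circular is not None and not isinstance(circular, bool):
        errors.append("circular must be a boolean")

    aliases = obj.get("aliases")
    if aliases is not None and not (
        isinstance(aliases, list) and all(isinstance(a, str) for a in aliases)
    ):
        errors.append("aliases must be a list of strings (0..m)")
    return errors


example = {
    "type": "SequenceReference",
    "refgetAccession": "SQ.F-LrLMe1SRpfUZHkQmvkVKFEGaoDeHul",
}
print(validate_sequence_reference(example))
print(validate_sequence_reference({**example, "residueAlphabet": "dna"}))
```

Collecting all violations instead of raising on the first one makes it easier to report every problem in a message at once.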

Example

{
  "type": "SequenceReference",
  "refgetAccession": "SQ.F-LrLMe1SRpfUZHkQmvkVKFEGaoDeHul",
  "name": "NC_000007.14"
}