SequenceLocation
The SequenceLocation class is a fundamental concept in VRS. It is used to describe every form of Variation, and it has stand-alone utility for describing sequence locations in other (non-variation) contexts. This class represents a subsequence of a specified SequenceReference. The reference is typically a chromosome, transcript, or protein sequence.
Definition and Information Model
Computational Definition
A Location defined by an interval on a referenced Sequence.
Information Model
Some SequenceLocation attributes are inherited from Ga4ghIdentifiableObject.
| Field | Type | Limits | Description |
|---|---|---|---|
| id | string | 0..1 | The ‘logical’ identifier of the entity in the system of record, e.g. a UUID. This ‘id’ is unique within a given system. The identified entity may have a different ‘id’ in a different system, or may refer to an ‘id’ for the shared concept in another system (e.g. a CURIE). |
| label | string | 0..1 | A primary label for the entity. |
| description | string | 0..1 | A free-text description of the entity. |
| extensions | Extension | 0..m | |
| type | string | 0..1 | MUST be “SequenceLocation” |
| digest | string | 0..1 | A sha512t24u digest created using the VRS Computed Identifier algorithm. |
| sequenceReference | IRI \| SequenceReference | 0..1 | A reference to a Sequence on which the location is defined. |
| start | integer \| Range | 0..1 | The start coordinate or range of the SequenceLocation. The minimum value of this coordinate or range is 0. MUST represent a coordinate or range less than or equal to the value of end. |
| end | integer \| Range | 0..1 | The end coordinate or range of the SequenceLocation. The minimum value of this coordinate or range is 0. MUST represent a coordinate or range greater than or equal to the value of start. |
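To make the fields above concrete, the following is a minimal sketch of a SequenceLocation written as a plain Python mapping and serialized to JSON. The IRI value `example:sequence/1` and the coordinates are hypothetical placeholders chosen for illustration; they do not identify a real sequence.

```python
import json

# Hypothetical example instance; the IRI and coordinates are placeholders.
sequence_location = {
    "type": "SequenceLocation",                 # MUST be "SequenceLocation"
    "sequenceReference": "example:sequence/1",  # IRI form of the reference (assumed value)
    "start": 100,                               # integer coordinate, minimum 0
    "end": 150,                                 # integer coordinate, >= start here
}

# Serialize with sorted keys so the output is stable across runs.
serialized = json.dumps(sequence_location, sort_keys=True)
print(serialized)
```

Optional fields such as id, label, description, extensions, and digest are simply omitted in this sketch.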
Implementation Guidance
Start, End, and Ranges
At least one of the start and end properties MUST be specified in any SequenceLocation instance.
When only one of these properties is specified, this represents an open interval beginning at the specified coordinate and extending left (when start is null) or right (when end is null).
When there is ambiguity at a coordinate (e.g., when using a SequenceLocation to describe the confidence boundary of a copy number segment), this is specified using the Range class for that coordinate.
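The rules in this section could be checked along the following lines. This is a sketch under stated assumptions: a Range is modeled as a `(lower, upper)` pair whose bounds may be `None`, which may not match the actual Range class, and the helper names `check_start_end` and `_lowest_known` are hypothetical. Ordering of start relative to end is deliberately not checked here.

```python
from typing import Optional, Tuple, Union

# Assumption: a Range is modeled as a (lower, upper) pair whose bounds may be
# None when unknown. The real Range class may be defined differently.
Range = Tuple[Optional[int], Optional[int]]
Coordinate = Union[int, Range, None]


def _lowest_known(coord: Coordinate) -> Optional[int]:
    """Return the smallest known value of a coordinate, or None if unknown."""
    if coord is None:
        return None
    if isinstance(coord, int):
        return coord
    known = [value for value in coord if value is not None]
    return min(known) if known else None


def check_start_end(start: Coordinate, end: Coordinate) -> None:
    """Raise ValueError if the start/end rules described above are violated."""
    # At least one of start and end MUST be specified.
    if start is None and end is None:
        raise ValueError("at least one of start and end MUST be specified")
    # The minimum value of either coordinate or range is 0.
    for name, coord in (("start", start), ("end", end)):
        lowest = _lowest_known(coord)
        if lowest is not None and lowest < 0:
            raise ValueError(f"{name} must not be less than 0")


check_start_end(10, None)            # only start given: end is null
check_start_end((5, 10), (20, None)) # ambiguous boundaries expressed as ranges
```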
New in v2
In VRS v1.x, the SequenceLocation class had an interval property which contained start and end attributes. This intermediate object layer has been removed in v2.0, making start and end top-level properties of the SequenceLocation.
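The structural change can be illustrated by hoisting start and end out of a nested interval object. This is only a sketch of that one change: the function name `flatten_interval` is hypothetical, the input is a simplified stand-in for a v1.x object, and any other differences between v1.x and v2.0 are not handled.

```python
def flatten_interval(v1_location: dict) -> dict:
    """Move start/end from a nested `interval` object to the top level.

    Sketch only: other differences between v1.x and v2.0 are ignored.
    """
    # Copy every property except the intermediate interval layer.
    v2_location = {k: v for k, v in v1_location.items() if k != "interval"}
    interval = v1_location.get("interval", {})
    # Promote whichever of start/end the interval carried.
    for key in ("start", "end"):
        if key in interval:
            v2_location[key] = interval[key]
    return v2_location


# Simplified, hypothetical v1.x-style input.
old_style = {"type": "SequenceLocation", "interval": {"start": 10, "end": 20}}
new_style = flatten_interval(old_style)
print(new_style)
```

The input mapping is left unmodified; a new mapping is returned.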
Linear and Circular Sequence Coordinates
When representing a linear sequence, it is expected that for a Sequence of length n, 0 ≤ start ≤ end ≤ n. For a circular sequence, 0 ≤ end ≤ start ≤ n is also allowed. In cases where end < start, this represents a location that spans the circular sequence origin coordinate.
New in v2
The v2 SequenceLocation now also supports circular sequences. The optional circular property of the SequenceReference class may be set to True or False to explicitly indicate if a reference is circular, and therefore if 0 ≤ end ≤ start ≤ n is also allowed.
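The coordinate constraints for linear and circular sequences can be sketched as a small check. The function names `coordinates_allowed` and `spans_origin` are hypothetical, the sketch handles integer coordinates only (not Range values), and treating an unset circular property as linear is an assumption made here rather than something the text above states.

```python
def coordinates_allowed(start: int, end: int, n: int, circular: bool = False) -> bool:
    """Return True if (start, end) is permitted on a sequence of length n.

    Assumption: an unset `circular` property is treated as a linear sequence.
    """
    # 0 <= start <= end <= n is allowed for linear and circular sequences alike.
    if 0 <= start <= end <= n:
        return True
    # 0 <= end <= start <= n is additionally allowed on circular sequences.
    return bool(circular) and 0 <= end <= start <= n


def spans_origin(start: int, end: int) -> bool:
    """Return True when the location crosses the circular origin (end < start)."""
    return end < start


print(coordinates_allowed(10, 20, 100))                 # linear, in order
print(coordinates_allowed(90, 5, 100))                  # rejected on a linear sequence
print(coordinates_allowed(90, 5, 100, circular=True))   # allowed on a circular sequence
```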
Implied Sequence Coordinates
The SequenceLocation class refers to coordinates on a SequenceReference. If that sequence represents a coding transcript, then the coordinates refer to the coding transcript, and not to a chromosome sequence to which it aligns. VRS intentionally does not allow start or end values that use an offset system to represent sequence not found on the SequenceReference.
Todo
Describe and add a ref to an intronic variant profile