Splice donor — upstream intronic variant

This example demonstrates how a splice-adjacent intronic HGVS variant is represented using RelativeAllele. The variant lies at a splice donor site, in the first base of the intron, and is expressed using HGVS c. notation.

HGVS expression

We will use the following HGVS expression from ClinVar: NM_001034954.3(SORBS1):c.2865+1G>T.

This expression specifies a single-nucleotide substitution (G to T) at the first intronic base following coding position c.2865, the last base of the exon. The variant therefore sits at the upstream (5′) end of the intron, at the splice donor site.
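As a minimal sketch, the components of this expression can be pulled apart with a regular expression. The pattern and the `parse_intronic_snv` helper are hypothetical and written only for this example; they handle intronic single-nucleotide substitutions in c. notation and are not a general HGVS parser.

```python
import re

# Hypothetical pattern for illustration: accession, optional gene symbol,
# nearest exonic c. position, signed intronic offset, and a substitution.
HGVS_INTRONIC_SNV = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+\.\d+)"   # e.g. NM_001034954.3
    r"(?:\((?P<gene>[^)]+)\))?"            # optional gene symbol, e.g. (SORBS1)
    r":c\.(?P<position>\d+)"               # nearest exonic coding position
    r"(?P<offset>[+-]\d+)"                 # signed offset into the intron
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"    # reference base > alternate base
)


def parse_intronic_snv(expression):
    """Split an intronic c. substitution into its named parts."""
    match = HGVS_INTRONIC_SNV.match(expression)
    if match is None:
        raise ValueError(f"Unsupported HGVS expression: {expression!r}")
    parts = match.groupdict()
    parts["position"] = int(parts["position"])
    parts["offset"] = int(parts["offset"])  # int() accepts a leading sign
    return parts


parsed = parse_intronic_snv("NM_001034954.3(SORBS1):c.2865+1G>T")
print(parsed)
```

The positive offset (+1) indicates that the position is counted forward from the last base of the preceding exon, which is what identifies this as a donor-side variant.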

Transcript context

NCBI Reference Sequence: NM_001034954.3

  • Minus strand

  • Splice donor (exon → intron boundary)

  • Upstream intronic (exon side of the exon boundary)

Transcript features at the splice junction

The exon boundaries below provide the local transcript context used to interpret the splice-adjacent HGVS position and to identify the exon boundary referenced by the HGVS expression.

CDS            184..4062
exon 29        2854..3048
                    /gene="SORBS1"
exon 30        3049..3822
                    /gene="SORBS1"

NCBI graphical sequence view

The figure below shows the same region in the NCBI Sequence Viewer, which displays transcript structure and exon boundaries aligned to the reference sequence.

NCBI Sequence Viewer showing the splice donor region in NM_001034954.3

Anchor selection

To represent this variant, an anchor is chosen at the inter-residue position corresponding to the exon boundary referenced by the HGVS expression. For splice-donor variants, this anchor is placed at the end of the exon preceding the intron.

Anchor selection is determined by splice context and does not depend on transcript orientation. In this example, the exon end occurs at inter-residue position 3048, which is used as the anchor.

The diagram below illustrates the selected anchor position relative to the exon boundary.

Calculating interbase coordinates for anchor
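The anchor value can be cross-checked against the transcript features listed earlier. The sketch below converts c.2865 to a 1-based transcript position using the CDS start (184) and compares it with the end of exon 29 (3048). The helper names are hypothetical; the only convention assumed is that a 1-based inclusive end N corresponds to inter-residue position N, the boundary immediately after residue N.

```python
# Values taken from the transcript feature table in this example.
CDS_START = 184          # 1-based start of the CDS (184..4062)
EXON_29 = (2854, 3048)   # 1-based inclusive exon coordinates


def coding_to_transcript_position(c_position, cds_start):
    """1-based transcript position of a positive c. coordinate."""
    return cds_start + c_position - 1


def exon_end_anchor(exon):
    """Inter-residue position of the boundary just after the exon's last base.

    Residue N occupies the inter-residue interval [N - 1, N], so the
    1-based inclusive end N is also the inter-residue end coordinate.
    """
    return exon[1]


last_exonic_base = coding_to_transcript_position(2865, CDS_START)
anchor = exon_end_anchor(EXON_29)

# c.2865 should be the last base of exon 29, so the two values agree.
assert last_exonic_base == EXON_29[1]
print(last_exonic_base, anchor)
```

Both values come out to 3048, confirming that c.2865 is the last base of exon 29 and that the exon end is the boundary referenced by c.2865+1.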

Mapping relative to the anchor

Offsets are applied relative to the anchor to identify the transcript-relative inter-residue interval corresponding to the HGVS position.

Because the anchor marks the point at which an alignment gap occurs (e.g. an exon junction mapped to the two sides of an intronic sequence), anchorOrientation selects which side of the anchor serves as the reference point. For this splice-donor variant, the anchor is oriented to the left, selecting the side of the anchor immediately preceding the exon-intron boundary.

Offsets are expressed in inter-residue coordinates. In this example, offsetStart = 0 and offsetEnd = 1 select the single inter-residue interval immediately upstream of the exon boundary, corresponding to the nucleotide referenced by the HGVS expression.

Transcript orientation is then used to map this inter-residue interval to the genomic reference.
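The strand-dependent step can be sketched as follows. This is an illustration under stated assumptions, not the specification's algorithm: `resolve_genomic_interval` is a hypothetical helper, the genomic inter-residue position of the exon boundary (95339132) is inferred from the baseSequenceLocation in the RelativeAllele object shown later in this example rather than independently verified, and the effect of anchorOrientation is taken as already resolved into that boundary position.

```python
def resolve_genomic_interval(genomic_boundary, offset_start, offset_end, strand):
    """Map anchor-relative offsets to a genomic inter-residue interval.

    Assumption: on the plus strand offsets grow with genomic coordinates;
    on the minus strand they grow in the opposite direction, so the
    interval is mirrored around the boundary.
    """
    if strand == "+":
        return (genomic_boundary + offset_start, genomic_boundary + offset_end)
    if strand == "-":
        return (genomic_boundary - offset_end, genomic_boundary - offset_start)
    raise ValueError(f"Unknown strand: {strand!r}")


# Assumed genomic position of the exon 29 boundary (inferred, see above).
GENOMIC_BOUNDARY = 95339132

interval = resolve_genomic_interval(
    GENOMIC_BOUNDARY, offset_start=0, offset_end=1, strand="-"
)
print(interval)
```

Under these assumptions the result is the interval 95339131 to 95339132, matching the start and end of the baseSequenceLocation in this example.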

In this example, the variant is described relative to the transcript reference sequence (NM_001034954.3), where the reference base at the variant position is G.

The variant state is therefore shown on the transcript as the mapped state (T, from G>T), and on the corresponding genomic reference (NC_000010.11) as the base state (A, the complement of T). Because this transcript is on the minus strand, the mapped and base states differ.
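The relationship between the two states is a reverse complement. The short sketch below uses hypothetical helper names and assumes the transcript aligns to the genome without mismatches at this position.

```python
# Translation table for DNA complementation.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def base_state_for(mapped_state, strand):
    """Express a transcript-oriented state on the genomic (plus) strand."""
    if strand == "-":
        return reverse_complement(mapped_state)
    return mapped_state


print(base_state_for("T", "-"))  # alternate state on the genome
print(base_state_for("G", "-"))  # transcript reference base on the genome
```

For a minus-strand transcript, the mapped state T becomes the base state A, and the transcript reference base G corresponds to C on the genomic strand.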

The diagram below illustrates the application of offsets relative to the anchor, the resolved inter-residue interval, and the relationship between the mapped and base states.

mapping

The corresponding exon boundaries on the genomic reference are shown below for visual confirmation.

Exon 30 boundary in the UCSC Genome Browser

Exon 29 boundary in the UCSC Genome Browser

Relative Allele representation

Together, the anchor position and offsets resolve the location of the variant, which can then be represented as a VRS RelativeAllele. The resulting object captures the transcript-relative mapping, the resolved genomic location, and the allele state expressed on both sequences.

{
  "type": "RelativeAllele",
  "relativeLocation": {
    "baseSequenceLocation": {
      "type": "SequenceLocation",
      "sequenceReference": {
        "type": "SequenceReference",
        "id": "NC_000010.11",
        "refgetAccession": "SQ.ss8r_wB0-b9r44TQTMmVTI92884QvBiB"
      },
      "start": 95339131,
      "end": 95339132
    },
    "mappedSequenceLocation": {
      "sequenceReference": {
        "type": "SequenceReference",
        "id": "NM_001034954.3",
        "refgetAccession": "SQ.SDt4gIJa8ChOmuI3te-3gpbJExmt1dHX"
      },
      "anchor": 3048,
      "anchorOrientation": "left",
      "offsetStart": 0,
      "offsetEnd": 1
    }
  },
  "mappedState": {
    "type": "LiteralSequenceExpression",
    "sequence": "T"
  },
  "baseState": {
    "type": "LiteralSequenceExpression",
    "sequence": "A"
  }
}
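A few internal consistency checks on an object like the one above can be sketched in Python. The `check_relative_allele` function is hypothetical and is not part of any VRS library. It assumes that, for a simple substitution, the genomic span equals the offset span, and that refget accessions follow the form `SQ.` plus 32 URL-safe characters.

```python
import re

# Assumed refget accession shape: "SQ." followed by 32 base64url characters.
REFGET_PATTERN = re.compile(r"^SQ\.[0-9A-Za-z_\-]{32}$")

# Only the fields needed for the checks are reproduced here.
relative_allele = {
    "relativeLocation": {
        "baseSequenceLocation": {
            "sequenceReference": {
                "refgetAccession": "SQ.ss8r_wB0-b9r44TQTMmVTI92884QvBiB"
            },
            "start": 95339131,
            "end": 95339132,
        },
        "mappedSequenceLocation": {
            "sequenceReference": {
                "refgetAccession": "SQ.SDt4gIJa8ChOmuI3te-3gpbJExmt1dHX"
            },
            "anchor": 3048,
            "offsetStart": 0,
            "offsetEnd": 1,
        },
    },
    "mappedState": {"sequence": "T"},
    "baseState": {"sequence": "A"},
}


def check_relative_allele(obj):
    """Return a list of human-readable problems; empty if none are found."""
    problems = []
    location = obj["relativeLocation"]
    base = location["baseSequenceLocation"]
    mapped = location["mappedSequenceLocation"]

    # Both locations should describe an interval of the same width.
    if base["end"] - base["start"] != mapped["offsetEnd"] - mapped["offsetStart"]:
        problems.append("genomic span and offset span differ")

    # Both sequence references should carry a well-formed accession.
    for name, part in (("base", base), ("mapped", mapped)):
        accession = part["sequenceReference"]["refgetAccession"]
        if not REFGET_PATTERN.match(accession):
            problems.append(f"{name} refgetAccession is malformed")

    # For a substitution, both states should have the same length.
    if len(obj["mappedState"]["sequence"]) != len(obj["baseState"]["sequence"]):
        problems.append("mapped and base states differ in length")

    return problems


print(check_relative_allele(relative_allele))
```

An empty list means the object passed these particular checks; it does not establish that the object is valid against the VRS schema.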