Splice acceptor — downstream intronic variant

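This example demonstrates how an intronic HGVS variant positioned relative to a splice acceptor is represented using RelativeAllele. The variant lies in the downstream (3′) part of the intron, 2,836 nucleotides before the first base of the exon that follows the splice acceptor site, and is expressed using HGVS c. notation.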

HGVS expression

We will use the following HGVS expression from ClinVar: NM_001034954.3(SORBS1):c.1361-2836A>G.

This expression specifies a single-nucleotide substitution (A>G) located 2,836 nucleotides upstream, in transcript orientation, of c.1361, the first nucleotide of the exon that follows the intron. The variant therefore lies deep within the intron, on the splice acceptor side.

Transcript context

NCBI Reference Sequence: NM_001034954.3

  • Minus strand

  • Splice acceptor (intron → exon boundary)

  • Downstream intronic (intron side of the exon boundary, toward the 3′ end of the intron)

Transcript features at the splice junction

The exon boundaries below provide the local transcript context used to interpret the splice-adjacent HGVS position and to identify the exon boundary referenced by the HGVS expression.

CDS            184..4062
exon 15        1481..1543
                    /gene="SORBS1"
exon 16        1544..1591
                    /gene="SORBS1"

NCBI graphical sequence view

The figure below shows the same region in the NCBI Sequence Viewer, which displays transcript structure and exon boundaries aligned to the reference sequence.

NCBI Sequence Viewer showing the splice acceptor region in NM_001034954.3.

Anchor selection

To represent this variant, an anchor is chosen at the inter-residue position corresponding to the exon boundary referenced by the HGVS expression. For splice-acceptor variants, this anchor is placed at the start of the exon following the intron.

Anchor selection is determined by splice context and does not depend on transcript orientation. In this example, the exon following the intron begins at transcript position 1544 (1-based), so the exon start falls at inter-residue position 1543, which is used as the anchor.

The diagram below illustrates the selected anchor position relative to the exon boundary.

Calculating inter-residue coordinates for the anchor
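The anchor value can be cross-checked against the CDS feature listed under "Transcript features at the splice junction". The sketch below is illustrative only: the function name anchor_before_coding_position is hypothetical, and it assumes that c.1 sits at the 1-based CDS start (184) and that the c. position passed in is a plain exonic position with no intronic offset.

```python
def anchor_before_coding_position(cds_start: int, c_position: int) -> int:
    """Return the inter-residue coordinate immediately before a c. position.

    cds_start  -- 1-based transcript position of c.1 (start of the CDS feature)
    c_position -- positive exonic c. position (here, the exon's first base)
    """
    if cds_start < 1 or c_position < 1:
        raise ValueError("cds_start and c_position must be positive")
    # 1-based transcript position of the coding nucleotide.
    transcript_position = cds_start + c_position - 1
    # The inter-residue boundary just before a 1-based position p is p - 1.
    return transcript_position - 1


# CDS 184..4062 and c.1361 come from the example above.
anchor = anchor_before_coding_position(cds_start=184, c_position=1361)
print(anchor)  # 1543
```

Under these assumptions, c.1361 is transcript position 1544, which matches the start of the exon spanning 1544..1591, and the boundary before it is inter-residue position 1543.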

Mapping relative to the anchor

Offsets are applied relative to the anchor to identify the transcript-relative inter-residue interval corresponding to the HGVS position.

Because the anchor represents the point at which an alignment gap occurs (e.g. an exon junction mapped to two sides of an intronic sequence), anchorOrientation is used to select which side of the anchor is used as the reference point. For this splice-acceptor variant, the anchor is oriented to the right, selecting the side of the anchor immediately following the intron-exon boundary.

Offsets are expressed in inter-residue coordinates. In this example, offsetStart = -2837 and offsetEnd = -2836 select a single-residue interval within the intron, upstream of the exon boundary in transcript orientation, corresponding to the nucleotide referenced by the HGVS expression.

Transcript orientation is then used to map this inter-residue interval to the genomic reference.

In this example, the variant is described relative to the transcript reference sequence (NM_001034954.3), where the reference base at the variant position is A.

The variant is therefore shown on the transcript as the mapped state (A>G) and on the corresponding genomic reference (NC_000010.11) as the base state (T>C). Because this transcript is on the minus strand, the base state is the complement of the mapped state, so the two differ.
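The relationship between the two states can be sketched with a reverse complement. This is a minimal illustration, not part of any VRS tooling: the helper names reverse_complement and to_base_state are hypothetical, and only unambiguous DNA letters are handled.

```python
# Translation table for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def to_base_state(mapped_state: str, minus_strand: bool) -> str:
    """Express a transcript-strand sequence on the genomic (base) strand."""
    return reverse_complement(mapped_state) if minus_strand else mapped_state


# Transcript reference A and alternate G, for a minus-strand transcript.
print(to_base_state("A", minus_strand=True))  # T
print(to_base_state("G", minus_strand=True))  # C
```

For a single nucleotide the reversal has no effect, so the mapped state G simply becomes the base state C.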

The diagram below illustrates the application of offsets relative to the anchor, the resolved inter-residue interval, and the relationship between the mapped and base states.

Mapping offsets relative to the anchor

The corresponding exon boundaries on the genomic reference are shown below for visual confirmation.

Exon 16 boundary in the UCSC Genome Browser

Exon 15 boundary in the UCSC Genome Browser

Relative Allele representation

Together, the anchor position and offsets resolve the location of the variant, which can then be represented as a VRS RelativeAllele. The resulting object captures the transcript-relative mapping, the resolved genomic location, and the allele state expressed on both sequences.

{
  "type": "RelativeAllele",
  "relativeLocation": {
    "baseSequenceLocation": {
      "type": "SequenceLocation",
      "sequenceReference": {
        "type": "SequenceReference",
        "id": "NC_000010.11",
        "refgetAccession": "SQ.ss8r_wB0-b9r44TQTMmVTI92884QvBiB"
      },
      "start": 95387120,
      "end": 95387121
    },
    "mappedSequenceLocation": {
      "sequenceReference": {
        "type": "SequenceReference",
        "id": "NM_001034954.3",
        "refgetAccession": "SQ.SDt4gIJa8ChOmuI3te-3gpbJExmt1dHX"
      },
      "anchor": 1543,
      "anchorOrientation": "right",
      "offsetStart": -2837,
      "offsetEnd": -2836
    }
  },
  "mappedState": {
    "type": "LiteralSequenceExpression",
    "sequence": "G"
  },
  "baseState": {
    "type": "LiteralSequenceExpression",
    "sequence": "C"
  }
}
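As a rough sanity check on an object like the one above, the following sketch compares the length of the genomic interval with the length of the offset interval and confirms that the two states are complementary for a minus-strand transcript. The function check_relative_allele and its rules are assumptions made for illustration; they are not validation rules defined by VRS, and the dictionary is trimmed to the fields the check reads.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Trimmed copy of the example object: only the fields inspected below.
allele = {
    "relativeLocation": {
        "baseSequenceLocation": {"start": 95387120, "end": 95387121},
        "mappedSequenceLocation": {
            "anchor": 1543,
            "anchorOrientation": "right",
            "offsetStart": -2837,
            "offsetEnd": -2836,
        },
    },
    "mappedState": {"sequence": "G"},
    "baseState": {"sequence": "C"},
}


def check_relative_allele(obj: dict, minus_strand: bool) -> list[str]:
    """Return a list of inconsistencies found (empty if none)."""
    problems = []
    location = obj["relativeLocation"]
    base = location["baseSequenceLocation"]
    mapped = location["mappedSequenceLocation"]

    # Both intervals should cover the same number of residues.
    base_length = base["end"] - base["start"]
    mapped_length = mapped["offsetEnd"] - mapped["offsetStart"]
    if base_length != mapped_length:
        problems.append(f"interval lengths differ: {base_length} vs {mapped_length}")

    # On the minus strand the base state is the reverse complement.
    mapped_seq = obj["mappedState"]["sequence"]
    expected = mapped_seq.translate(_COMPLEMENT)[::-1] if minus_strand else mapped_seq
    if obj["baseState"]["sequence"] != expected:
        problems.append(f"baseState should be {expected!r} for this strand")
    return problems


print(check_relative_allele(allele, minus_strand=True))  # []
```

Passing minus_strand=False for the same object reports a state mismatch, which reflects that the mapped and base states differ only because of the transcript's orientation.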