Ambiguity characters are easily derived and complemented using Ambiscript

Ambiscript symbols can be simply overlaid encode the range of nucleotides found at a genetic locus. As with their constituent parts, it is possible to complement Ambiscript ambiguity characters by rotating them 180°.

In addition to defining characters for the four common DNA bases1, the International Union of Pure and Applied Chemistry (IUPAC) formalized an eleven-character alphabet for representing the full range of possible nucleotide combinations that could be found at a specific genetic locus2. As illustrated in the figure on the right, the IUPAC “ambiguity” characters were derived from the Roman alphabet in a largely arbitrary fashion, making them difficult to remember and complement.

Ambiscript overcomes difficulties associated with IUPAC ambiguity characters by merging the symbols for all possible bases to convey the full range of positional variability. This approach, which is also illustrated in the accompanying figure, has two important consequences:

  1. Ambiguity characters are readily constructed and deciphered as the sum of their parts.
  2. As with other Ambiscript symbols, ambiguity characters can be complemented by rotating the text 180°.

1 1970. IUPAC-IUB Commission on Biochemical Nomenclature (CBN). Abbreviations and symbols for nucleic acids, polynucleotides and their constituents. Recommendations 1970.Biochem. J. 120(3):449-454. (PMID: 5499957)

2 1986. Nomenclature for incompletely specified bases in nucleic acid sequences. Recomendation 1984. Nomenclature Committee of the International Union of Biochemistry (NC-IU). Proc. Natl. Acad. Sci. USA 83:4-8. (PMID: 2417239)