Nomenclature and Symbolism for Amino Acids and Peptides

3AA-18 and 3AA-19

Continued from 3AA-17

Contents of 3AA-18 and 3AA-19

3AA-18 Symbols for Substituents

3AA-19 Peptide Symbolism

References for 3AA-18 to 3AA-19

Continued in 3AA-20 and 3AA-21


3AA-18. SYMBOLS FOR SUBSTITUENTS

3AA-18.1. Use of Symbols

Groups substituted for hydrogen or hydroxyl may be indicated by their formulas, by symbols, or by a combination of both, e.g.

Benzoylglycine (hippuric acid) PhCO-Gly or C6H5CO-Gly

Note: the symbol Bz is often used for benzoyl in organic chemistry, and Bzl for benzyl, but because these symbols are so similar, the alternatives PhCO and PhCH2 are preferable.

Glycine methyl ester Gly-OCH3 or Gly-OMe

Trifluoroacetylglycine CF3CO-Gly (Table 3, Note ii)

Suggestions for symbols to designate substituent (or protecting) groups common in peptide and protein chemistry are given in Tables 2, 3 & 4.


Table 2. Nitrogen substituents (protecting groups) of the urethane type

Benzyloxycarbonyl- Z- or Cbz-
2-(p-Biphenylyl)isopropyloxycarbonyl- [strictly 1-(biphenyl-4-yl)-1-methylethoxycarbonyl-] Bpoc-
p-Bromobenzyloxycarbonyl- Z(Br)-
t-Butoxycarbonyl- Boc- or ButOCO- or t-BuOCO- or Me3C-OCO-
α,α-Dimethyl-3,5-dimethoxybenzyloxycarbonyl- Ddz-
Fluoren-9-ylmethoxycarbonyl- Fmoc-
p-Methoxybenzyloxycarbonyl- Z(OMe)-
p-Nitrobenzyloxycarbonyl- Z(NO2)-
p-Phenylazobenzyloxycarbonyl- Pz-

Table 3. Non-urethane substituents for nitrogen, oxygen or sulfur

Acetamidomethyl- Acm-
Acetyl- Ac-
Benzoyl- (C6H5-CO-) PhCO- (or Bz-; see note in 3AA-18.1)
Benzyl- (C6H5-CH2-) PhCH2- (or Bzl-; see note in 3AA-18.1)
Carbamoyl- NH2CO- (preferred to Cbm-)
(3-Carboxy-4-nitrophenyl)thio- Nbs- (see 3AA-18.2)
3-Carboxypropanoyl- (HOOC-CH2-CH2-CO-) Suc- (see Note i)
Dansyl-, 5-(dimethylamino)naphth-1-ylsulfonyl- Dns-
2,4-Dinitrophenyl- Dnp- or N2ph (see Note ii)
Formyl- HCO- or For- (see Note iii)
4-Iodophenylsulfonyl- (pipsyl-) Ips-
Maleoyl- (-OC-CH=CH-CO-) -Mal- or Mal< (C-404.1 of [14])
Maleyl- (HOOC-CH=CH-CO-) Mal-
2-Nitrophenylthio- NpS (Nps- often used)
Phenyl(thiocarbamoyl)- PhNHCS- or Ptc-
Phthaloyl- -Pht- or Pht<
Phthalyl- (o-carboxybenzoyl-) Pht-
Succinyl- (-OC-CH2-CH2-CO-) -Suc- or Suc< (see Note i)
Tosyl- Tos-
Trifluoroacetyl- CF3CO-
Trityl- (triphenylmethyl-) Ph3C- or Trt-

Notes

(i) In organic nomenclature (C-404.1 of [14]), 'succinyl' signifies the bivalent group formed from succinic acid by removal of both hydroxyl groups, but in biochemical usage it usually signifies the 3-carboxypropanoyl group, e.g. succinyl-CoA.

(ii) The use of D for 'di' and T for 'tri' and 'tetra' is discouraged if these apply to atoms or groups for which simple symbols exist, e.g. in CF3CO-, Me3Si and H4folate. We feel less strongly when their avoidance involves giving unusual meanings to symbols, e.g. N for nitro, so Dnp and N2ph are offered as alternative symbols for dinitrophenyl. See also Note ii of 3AA-15.2.5.

(iii) The symbol HCO- is preferred to CHO- for the formyl group, because CHO- has sometimes been used to indicate the attachment of carbohydrate.

Table 4. Substituents at the carboxyl group

Group                     Symbol                                    Name of glycine derivative (see note)
Benzotriazol-1-yloxy      -OBt                                      1-(Glycyloxy)benzotriazole
Benzyloxy                 -OCH2Ph (or -OBzl, see note in 3AA-18.1)  Glycine benzyl ester
tert-Butoxy               -OCMe3 or -OBut                           Glycine t-butyl ester
Diphenylmethoxy           -OCHPh2 or -OBzh                          Glycine diphenylmethyl ester (or benzhydryl ester)
Ethoxy                    -OEt                                      Glycine ethyl ester
Methoxy                   -OMe                                      Glycine methyl ester
4-Nitrobenzyloxy          -ONb                                      Glycine 4-nitrobenzyl ester
4-Nitrophenoxy            -ONp                                      Glycine 4-nitrophenyl ester
4-Nitrophenylthio         -SNp                                      Thioglycine S-(4-nitrophenyl ester)
Pentachlorophenoxy        -OPcp                                     Glycine pentachlorophenyl ester
Phenylthio                -SPh                                      Thioglycine S-(phenyl ester)
Quinolin-8-yloxy          -OQu                                      Glycine quinolin-8-yl ester
Succinimido-oxy           -ONSu or -OSu                             N-(Glycyloxy)succinimide
2,4,5-Trichlorophenyloxy  -OTcp                                     Glycine 2,4,5-trichlorophenyl ester

Note. Carboxyl substituents will not normally appear as prefixes in the names of derivatives of amino acids or peptides, so the name of the group, its prefix name, given in column 1, is little used in naming compounds. Column 3 is therefore given to show how derivatives containing the group are named (by one of the alternative methods of 3AA-9.1).

See Addendum for substituents on a terminal amide group.
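
As an informal illustration of how the symbols in Tables 2 to 4 combine with residue symbols, the following Python sketch splits a one-line formula such as Boc-Gly-OMe into its nitrogen substituent, residues, and carboxyl substituent. The dictionaries hold only a small subset of the tables, and the names N_GROUPS, C_GROUPS and split_formula are hypothetical, introduced here for illustration only. The sketch assumes formulas without configurational prefixes or side-chain substituents.

```python
# Small illustrative subsets of Tables 2-4; all names here are hypothetical.
N_GROUPS = {
    "Z": "benzyloxycarbonyl",
    "Boc": "t-butoxycarbonyl",
    "Fmoc": "fluoren-9-ylmethoxycarbonyl",
    "Ac": "acetyl",
    "Tos": "tosyl",
}
C_GROUPS = {
    "OMe": "methyl ester",
    "OEt": "ethyl ester",
    "ONp": "4-nitrophenyl ester",
}


def split_formula(formula):
    """Return (nitrogen substituent, residue symbols, carboxyl substituent)."""
    parts = formula.split("-")
    # A leading symbol found in N_GROUPS is taken as the N-substituent.
    n_group = N_GROUPS.get(parts[0])
    if n_group is not None:
        parts = parts[1:]
    # A trailing symbol found in C_GROUPS is taken as the carboxyl substituent.
    c_group = C_GROUPS.get(parts[-1]) if parts else None
    if c_group is not None:
        parts = parts[:-1]
    return n_group, parts, c_group


if __name__ == "__main__":
    print(split_formula("Boc-Gly-OMe"))
    print(split_formula("Z-Gly-Gly-ONp"))
```

Symbols absent from the two dictionaries are simply treated as residues, so the subset can be extended without changing the function.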

3AA-18.2. Principles of Symbolizing Substituent Groups and Reagents

Many reagents used in peptide and protein chemistry for modifying (often protecting) amino, carboxyl and side-chain groups in amino-acid residues have been designated by a variety of acronymic abbreviations, too numerous to list here. Extensive and indiscriminate use of such abbreviations is discouraged, especially when the accepted trivial name of the reagent is short, e.g. tosyl chloride, trityl chloride, etc.

It can be useful to symbolize a reagent in such a way that the group transferred retains its identity in a reaction, e.g.

Dns-Cl + Gly → Dns-Gly + H⁺ + Cl⁻
Dnp-F + NH2-R → Dnp-NH-R + H⁺ + F⁻

For this reason Dns-Cl is usually preferred to DNS for dansyl chloride (although the full name is short enough for most textual use), and Dnp-F to the original FDNB for 1-fluoro-2,4-dinitrobenzene, and similarly Nbs2 in place of DTNB for 3,3'-dithiobis(6-nitrobenzoic acid) (Ellman's reagent) and (PriO)2PO-F or Dip-F for diisopropyl fluorophosphate.

Symbols constructed from known elements are more readily understood than arbitrary abbreviations, e.g. Tos-Arg-OMe rather than TAME for tosylarginine methyl ester, and Tos-Phe-CH2Cl rather than TPCK for 'tosylphenylalanine chloromethyl ketone', a name incorrectly used for tosylphenylalanylchloromethane (3AA-10.2), but misleading because it erroneously specifies the carbonyl group twice.

3AA-19. PEPTIDE SYMBOLISM

3AA-19.1. Peptide Chains

The amino-acid symbols were developed for representing peptide sequences (3AA-16). Peptides containing bonds other than between C-1 and N-2 of adjacent residues are also easily represented (3AA-16 to 3AA-18). Examples:


Glycylglycine Gly-Gly
N-α-Glutamylglycine Glu-Gly
N-γ-Glutamylglycine
Thyroliberin Glp-His-Pro-NH2
Angiotensin II Asp-Arg-Val-Tyr-Ile-His-Pro-Phe
Glutathione
Note. would represent the corresponding thiol ester with a bond between the γ-carboxyl of glutamic acid and the thiol group of cysteine.
N2-α-Glutamyllysine Glu-Lys
N6-α-Glutamyllysine
N2-γ-Glutamyllysine
N6-γ-Glutamyllysine

Symbols for modified residues or names of compounds may be used in such formulas. Thus a peptide with a C-terminal aldehyde may be shown using either a name or a symbol constructed according to 3AA-16.3. Example:

Ac-Leu-Leu-argininal or Ac-Leu-Leu-Arg-H

(If the second method is used, the symbol should be explained to avoid confusion.)

If part of a sequence is unknown, but its composition can be specified, this may be indicated by parentheses, with commas between the residues listed as present, e.g. Ala-Lys-(Ala,Gly3,Val2)-Glu-Val.
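
The parenthesized-composition convention lends itself to a simple tally. The sketch below, which assumes three-letter residue symbols and an optional trailing count (as in Gly3), counts the residues in such a formula; the function name residue_counts is hypothetical.

```python
import re
from collections import Counter


def residue_counts(formula):
    """Count residues in a formula that may contain composition blocks
    such as (Ala,Gly3,Val2). Assumes three-letter residue symbols."""
    counts = Counter()
    # Each match is either a whole parenthesized block or one plain symbol.
    for block, symbol in re.findall(r"\(([^)]*)\)|([A-Z][a-z]{2})", formula):
        if symbol:
            counts[symbol] += 1
            continue
        for item in block.split(","):
            match = re.fullmatch(r"([A-Z][a-z]{2})(\d*)", item.strip())
            if match is None:
                raise ValueError(f"unrecognized composition item: {item!r}")
            # A missing number means a single residue.
            counts[match.group(1)] += int(match.group(2) or 1)
    return counts


if __name__ == "__main__":
    print(residue_counts("Ala-Lys-(Ala,Gly3,Val2)-Glu-Val"))
```

For the example in the text this gives two Ala, one Lys, three Gly, three Val and one Glu, ten residues in all.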

If a peptide must be written on more than one line, we advise placing a hyphen at the end of each line to be continued (where it has its usual meaning of a continuation symbol), and also at the start of the next line (where it represents the peptide bond), e.g.

Ala-Ser-Tyr-Phe-Ser-
-Gly-Pro-Gly-Trp-Arg

In diagrams the two lines can usually be joined, but such a break may also be needed in textual material where this is not possible.
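
The line-breaking advice can be sketched as a small helper. The function name wrap_peptide and its per_line parameter are hypothetical, and the sketch assumes a formula made only of hyphen-separated residue symbols (no D- prefixes or substituents containing hyphens).

```python
def wrap_peptide(formula, per_line):
    """Split a one-line peptide formula into lines of at most per_line
    residues, with a hyphen closing each continued line and another
    opening each continuation line."""
    if per_line < 1:
        raise ValueError("per_line must be at least 1")
    residues = formula.split("-")
    chunks = [residues[i:i + per_line] for i in range(0, len(residues), per_line)]
    lines = []
    for index, chunk in enumerate(chunks):
        line = "-".join(chunk)
        if index > 0:
            line = "-" + line   # leading hyphen: the peptide bond
        if index < len(chunks) - 1:
            line = line + "-"   # trailing hyphen: continuation mark
        lines.append(line)
    return lines


if __name__ == "__main__":
    for line in wrap_peptide("Ala-Ser-Tyr-Phe-Ser-Gly-Pro-Gly-Trp-Arg", 5):
        print(line)
```

With five residues per line this reproduces the two-line example given above.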

3AA-19.2. Use of Configurational Prefixes

Residue symbols written in a sequence denote the L configuration for chiral amino acids, unless otherwise indicated (3AA-14.5). A D residue is shown by inserting a D before the symbol, separated from it by a hyphen (which may be omitted to make the number of residues appear more clearly).

The symbol DL signifies a racemic mixture, so should not occur in the designation of peptides with more than one chiral residue; coupling of a DL-amino acid with a chiral peptide leads to a mixture of diastereoisomeric products whose ratio may depend on the conditions of the reaction and will not in general be unity. To indicate that both are present, ambo may be used (3AA-13.2), and thus the mixture of products formed by acylating L-leucine with DL-alanine may be represented as ambo-Ala-Leu, and a mixture of Phe-Ala-Leu and Phe-D-Ala-Leu may be represented as Phe-ambo-Ala-Leu.

A residue of unknown configuration may be indicated by the prefix ξ (Greek xi), e.g. ξ-Ala.
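
To make the meaning of ambo concrete, the following sketch expands a formula containing ambo into the individual diastereoisomers it stands for. The function name expand_ambo is hypothetical; the sketch assumes that ambo always directly precedes the residue it qualifies and says nothing about the ratio of the products, which, as noted above, need not be unity.

```python
from itertools import product


def expand_ambo(formula):
    """List the formulas covered by a formula containing 'ambo' prefixes.
    Each 'ambo-Xaa' stands for both Xaa (L) and D-Xaa."""
    tokens = formula.split("-")
    slots = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "ambo":
            if i + 1 >= len(tokens):
                raise ValueError("'ambo' must be followed by a residue symbol")
            residue = tokens[i + 1]
            slots.append([residue, "D-" + residue])
            i += 2
        else:
            slots.append([tokens[i]])
            i += 1
    # One output formula per combination of configurations.
    return ["-".join(choice) for choice in product(*slots)]


if __name__ == "__main__":
    print(expand_ambo("Phe-ambo-Ala-Leu"))
```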

3AA-19.3. Representation of Charges on Peptides

It is usually convenient to use the same abbreviated formula for a peptide regardless of its state of ionization. To indicate or stress the charges on a peptide, plus and minus signs may be placed over residues with charged side chains and on either side of the formula to represent charged termini, e.g.

Such signs may be circled for clarity.

If, however, it is desired to indicate charge by formal modification of the symbols for residues, this may be done as follows.

(i) Protonation of the N-terminus. The sign ⁺H is placed beside the symbol for the N-terminal residue without a hyphen between (since a hyphen would signify removal of H). This gives, for example, ⁺HGly-. We prefer this to the alternative recommendation [10] of adding ⁺H2-, to give, for example, ⁺H2-Gly-, because it seems artificial to remove one hydrogen before adding two, and because the hyphen here fails to represent a single bond.

(ii) Deprotonation of the C-terminus. The symbol -O⁻ is placed on the right of the C-terminal residue. Its hyphen signifies removal of -OH from the carboxyl group, so that the OH is replaced by O⁻.

(iii) Protonation of Side-Chain Basic Groups. 'H⁺' is placed above the amino-acid symbol in the two-line representation, or after it, e.g. LysH⁺, in the one-line system. No lines or parentheses are used, since they would imply removal of H. In earlier recommendations [10] 'H2⁺' was added with a vertical line or parentheses, but again (cf. i) the line represented no single bond.

(iv) Deprotonation of Side-Chain Acidic Groups. The symbols Asp and Glu may have O⁻ placed at the end of a vertical line above or below them, or in parentheses after them (cf. ii), since O⁻ replaces the OH removed. Other acidic residues, e.g. Cys, have the charge alone at the end of the vertical line or in parentheses, since the group removed here is H.

Hence the two ionic forms shown above for a peptide could be drawn as

An isoelectric form of Gly-Lys-Gly could be drawn as

whereas its dihydrochloride could be drawn as
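
The one-line conventions (i) to (iv) can be sketched in code. This is only an illustration under stated assumptions: the function name charged_formula and its parameters are hypothetical, the set of basic residues (Lys, Arg, His) is our assumption rather than a list given in the text, and Unicode superscript signs are used for the charges so that they cannot be mistaken for hyphens. It is not claimed to reproduce the drawn formulas of the original.

```python
BASIC = {"Lys", "Arg", "His"}   # assumption: side chains that can carry H+
ACIDIC_OH = {"Asp", "Glu"}      # OH is removed, so O- is written (rule iv)
ACIDIC_H = {"Cys"}              # H is removed, so the charge alone is written


def charged_formula(residues, protonated=frozenset(), deprotonated=frozenset(),
                    n_protonated=False, c_deprotonated=False):
    """Build a one-line formula with charges marked per rules (i)-(iv).
    protonated / deprotonated are sets of zero-based residue positions."""
    parts = []
    for index, symbol in enumerate(residues):
        if index in protonated:
            if symbol not in BASIC:
                raise ValueError(f"{symbol} is not treated as basic here")
            symbol += "H⁺"                 # rule (iii): no line or parentheses
        elif index in deprotonated:
            if symbol in ACIDIC_OH:
                symbol += "(O⁻)"           # rule (iv): O- replaces OH
            elif symbol in ACIDIC_H:
                symbol += "(⁻)"            # rule (iv): charge alone
            else:
                raise ValueError(f"{symbol} is not treated as acidic here")
        parts.append(symbol)
    formula = "-".join(parts)
    if n_protonated:
        formula = "⁺H" + formula           # rule (i): no hyphen after +H
    if c_deprotonated:
        formula += "-O⁻"                   # rule (ii)
    return formula


if __name__ == "__main__":
    # One net-neutral form of Gly-Lys-Gly: side chain protonated, C-terminus ionized.
    print(charged_formula(["Gly", "Lys", "Gly"], protonated={1}, c_deprotonated=True))
```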

3AA-19.4. Peptides Substituted at N-2 (see 3AA-16.2 and 3AA-17.1)


Glycylnitrosoglycine
Glycylsarcosine (see Appendix)
Glycyl-N-acetylglycine
N,N-Diglycylglycine

3AA-19.5. Cyclic Peptides

3AA-19.5.1. Homodetic Cyclic Peptides

Cyclic peptides in which the ring consists solely of amino-acid residues in eupeptide linkage may be called homodetic cyclic peptides. Three representations are possible:

(i) The sequence is formulated in the usual manner but placed in parentheses and preceded by 'cyclo'. Example: gramicidin S

cyclo(-Val-Orn-Leu-D-Phe-Pro-Val-Orn-Leu-D-Phe-Pro-)

or (see 3AA-19.2, sentence 2)

cyclo(-Val-Orn-Leu-DPhe-Pro-Val-Orn-Leu-DPhe-Pro-)
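
Representation (i) is easy to generate, and because a ring has no fixed starting residue, two cyclo formulas that differ only by rotation describe the same peptide. The sketch below illustrates both points; the function names cyclo and same_ring are hypothetical, and residues are passed as a list so that symbols such as D-Phe are not split at their hyphen.

```python
def cyclo(residues):
    """Write a homodetic cyclic peptide in the one-line cyclo(...) form."""
    return "cyclo(-" + "-".join(residues) + "-)"


def same_ring(first, second):
    """True if the two residue lists are rotations of each other, i.e.
    the same ring read in the same direction from a different start."""
    if len(first) != len(second):
        return False
    if not first:
        return True
    doubled = first + first
    return any(doubled[i:i + len(second)] == second for i in range(len(first)))


if __name__ == "__main__":
    gramicidin_s = ["Val", "Orn", "Leu", "D-Phe", "Pro"] * 2
    print(cyclo(gramicidin_s))
    print(same_ring(gramicidin_s, ["Leu", "D-Phe", "Pro", "Val", "Orn"] * 2))
```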

(ii) The sequence is again written in one line, but the residues at each end of the line are joined by a lengthened bond, e.g.

or (3AA-19.2, sentence 2)

(iii) The residues are written on two lines, so that the sequence is reversed on one of them. The CO to NH direction within the peptide bond must therefore be indicated by arrows (3AA-16.2 and 3AA-16.3). Gramicidin S may thus be written (using the option of 3AA-19.2, sentence 2):

3AA-19.5.2. Heterodetic Cyclic Peptides

Heterodetic cyclic peptides are peptides consisting only of amino-acid residues, but the linkages forming the ring are not solely eupeptide bonds; one or more is an isopeptide, disulfide, ester, or other bond.

Their symbolic representation follows logically from that of substituted amino acids (3AA-16.4). Examples:

Cyclic ester of threonylglycylglycylglycine or (3AA-17.6)

3AA-19.6. Depsipeptides

Depsipeptides are oligomers formed from amino acids and other bifunctional acids, usually hydroxy acids. They are often cyclic. In symbolic representation, any special symbols used for the hydroxy acids should be defined.

3AA-19.7. Peptide Analogues

Analogues of peptides in which the -CO-NH- group that joins residues is replaced by another grouping may be indicated [25] by placing a Greek psi (ψ), followed by the replacing group in parentheses, between the residue symbols where the change occurs. Examples:

Ala-ψ(NH-CO)-Ala for NH3⁺-CHMe-NH-CO-CHMe-COO⁻
Ala-ψ(CH=CH,trans)-Ala for NH3⁺-CHMe-CH=CH-CHMe-COO⁻

3AA-19.8. Alignment of Peptide and Nucleic-Acid Sequences

Although hyphens between residues are important in representing peptide sequences (3AA-16), they may be omitted (I) if it is necessary to align sequences with those of nucleic acids; this is an alternative to separating triplets (II):

            MetSerIleGlnHis                 Met-Ser-Ile-Gln-His

   (I)   AGTATGAGTATTCAACAT      (II)   AGT ATG AGT ATT CAA CAT
         TCATACTCATAAGTTGTA             TCA TAC TCA TAA GTT GTA
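
The two alignment styles can be generated mechanically, since each three-letter residue symbol occupies the same width as a codon. The following sketch reproduces forms (I) and (II) for the example above. The codon table is limited to the five codons in that example, the function names are hypothetical, and the reading frame is assumed to start at a multiple of three so that the triplet grouping in form (II) coincides with the codons.

```python
# Only the codons needed for the example; a full table is not attempted.
CODONS = {"ATG": "Met", "AGT": "Ser", "ATT": "Ile", "CAA": "Gln", "CAT": "His"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna, start):
    """Residue symbols for the codons read from position start."""
    return [CODONS[dna[i:i + 3]] for i in range(start, len(dna) - 2, 3)]


def form_one(dna, start):
    """Form (I): hyphens omitted, bases unspaced."""
    peptide = " " * start + "".join(translate(dna, start))
    return [peptide, dna, dna.translate(COMPLEMENT)]


def form_two(dna, start):
    """Form (II): hyphens kept, triplets separated by spaces.
    Assumes start is a multiple of 3."""
    def triplets(strand):
        return " ".join(strand[i:i + 3] for i in range(0, len(strand), 3))
    peptide = " " * (start // 3 * 4) + "-".join(translate(dna, start))
    return [peptide, triplets(dna), triplets(dna.translate(COMPLEMENT))]


if __name__ == "__main__":
    for line in form_one("AGTATGAGTATTCAACAT", 3) + form_two("AGTATGAGTATTCAACAT", 3):
        print(line)
```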

References

7. International Union of Biochemistry (1978) Biochemical Nomenclature and Related Documents, The Biochemical Society, London.

10. IUPAC-IUB Commission on Biochemical Nomenclature (CBN), Symbols for Amino-Acid Derivatives and Peptides, Recommendations 1971, Arch. Biochem. Biophys. 150, 1-8 (1972); Biochem. J. 126, 773-780 (1972), corrected 135, 9 (1973); Biochemistry 11, 1726-1732 (1972); Biochim. Biophys. Acta, 263, 205-212 (1972); Eur. J. Biochem. 27, 201-207 (1972), corrected 45, 2 (1974); J. Biol. Chem. 247, 977-983 (1972); Pure Appl. Chem. 40, 315-331 (1974); also pp. 78-84 in [7].

14. International Union of Pure and Applied Chemistry (1979) Nomenclature of Organic Chemistry, Sections A, B, C, D, E, F and H, Pergamon Press, Oxford.

25. Morley, J. S. (1981) Neuropeptides, 1, 231-235.

