Molecule : Molecules

Define Molecular Information in a CCPN Project

This popup window is used to setup all of the molecular information that will be used in a CCPN project. Naturally, the objective is to define the molecular entities that were present in the sample when an NMR experiment was run so that the resonances that appear in the resulting spectra can be ascribed to particular atoms, residues and polymer chains.

Chemical Elements & Compounds

In general this system is concerned with several different concepts which combine to give the final molecular description. The first concept is that of the reference information that relates to chemical elements and chemical compounds. As far as the user is concerned this kind of information is mostly fixed; there should be only one invariant definition of of the element carbon or the amino acid histidine for example. A chemical compound in this respect contains a description of a group of atoms and how they are bound together, and also lists the different forms that the compound will take, like protonation states. For a given type of molecule (protein, DNA, RNA etc) a chemical compound will be identified by a residue code name, e.g. histidine is “His”. In CCPN this name is usually referred to as the “Ccp code”, and is often, but not always three letters. The common biological residues will also have a one-letter code, e.g. histidine is “H”, that may be used in some circumstances. Defining the sequence of a biopolymer with residue code names, or letters, is really selecting a list of connected chemical compound descriptions. Most of the chemical compounds that are in common use will be pre-defined and available to any CCPN project; either directly or via automated download. It is only bespoke or unusual compounds that will need to be defined by the user and imported. For example the user can import PDB and mol2 format compound descriptions and import them into CCPN via the Format Converter.

Molecule Templates

The second major molecular concept in CCPN is the molecule template, which records a sequence of chemical compounds and how they are linked together. A template will also record which version of a chemical compound is used, for example whether a Cys residue is disulphide linked and how acidic and basic residues are protonated. A molecule template does not directly contain any atom information, but this is available indirectly by virtue of the template residues linking to a given chemical compound. As the name suggests, a molecule template is not the entity that is directly used during the NMR assignment process; it is just a sequence template that is used to make one or more molecular systems. If a single chemical compound is going to be used in NMR assignment, e.g. a ligand or cofactor, then that compound will still be used within a molecule template, it is just that the template will only contain one sequence item/residue.

Molecular Systems

Molecular systems are groups of atoms, residues and chains (protein, DNA etc..) that are involved in the assignment process. They are constructed using sequence information from molecule templates and atom information from the chemical compound descriptions. Molecular systems can be thought of as representing all of the atoms that are present within an NMR sample or biolmolecular assembly. If a molecular system is a homo-oligomeric protein complex, where there are several chains of the same sequence, then the sample molecule template will be used to generate each of the sub-units, but the result will be three separate polypeptide chains, and each can be assigned separately to NMR data. For example the bacteriorhodposin complex is a homo-trimer, and so to make this complex available in CCPN the user will enter the sequence of the protein chain once (to make the a template) and then use the sequence three times to make three chains A, B and C.

Chains Tab

The first tab in the popup window lists all on the assignable molecular chains that are available in the project. For example if the CCPN project was setup for a dimer then you might have chain A and chain B listed in a given molecular system. Here the user can delete chains, if they are not assigned, and create new chains. There are three basic ways of making chains, the first is to copy an existing chain, although naturally the user is asked to select a different identifying chain letter/code to distinguish it from the original. The second way is to make a chain from an existing molecule template ([Make Chain From Template]) and the third way is to make a new chain and a new molecular template at the same time (via [Add Sequence]).

Most commonly, for simple monomeric protein systems, the user will click [Add Sequence]. After entering a sequence of residue codes, and making sure that a new molecular system and a new template molecule are to be made, a new molecular chain is created and ready for assignment. If the molecular system is more complicated and involves several different chains, then the user may use [Add Chain] several times for each of the chains within the complex, making sure that the same destination molecular system is chosen each time. Alternatively, the user can add all of the template molecules first, and then make the chains from the template molecules afterwards. This method gives most flexibility and is recommended if any of the molecules have a non-standard connectivity or pronation state; this cannot be changed in a template once a chain has bee created. For example to add insulin, which has two discrete regions of polypeptide that are disulphide linked, the user would make a molecule first by adding two polypeptide sections. change the Cys residues to the disulphide state and make the two disulphide links. Only after the molecule template definition was complete would the user [Make Chain From Template].

The lower table lists the “Chain Fragments” that are present in the chain entity selected in the upper table. Each fragment represents a discrete segment of biopolymer sequence, and although fragments are bound together in the same chain they are not connected by the usual biopolymer links. For example insulin will consist of two polypeptide chain fragments that are connected together by two disulphide bonds. Most of the time however a biopolymer will consist of only one chain fragment with a continuous backbone. The starting sequence number of a chain fragment can be edited by the user, as long as fragments do not overlap numerically. In this way the user can control the numbering that is present in a chain and thus used in assignment. This sequence numbering is reversible and can be changed at any time with immediate effect (although Analysis can be slow to update if there are many assignments).

Mol Systems Tab

This table list all of the molecular systems within the CCPN projects. These are used to contain the chains that go together to form a multi-component complex or NMR sample. The user can add new, empty molecular systems with a view to adding assignable chains to them at a later time. The user can also delete a molecular system, and thus all of the chains that it contains, but only if these chains do not have any assignments.

Sequences Tab

This table is used to list the sequences that are defined within the CCPN project. The user selects the molecular template in the upper pulldown menu and the sequence is listed in the table. If a sequence has not been used to make any chains (in a molecular system) then the sequence can be changed; the user can adjust the kind of residue at a given position, the compound version or (descriptor) and how the residue links to others. For example the user could setup protonation states, disulphide links and join/break sequence fragments. For a new molecule the user can [Add Polymer] and [Add Compound] multiple times, thus creating discrete bits of sequence that can be combined together into a larger molecule. In this way the user could add a polymer sequence and then a compound representing a cofactor. Alternatively, to construct insulin [Add Polymer] twice and then [Edit Links] after setting Cys residue “descriptors” to “link:SG”.

The [Copy Molecule] function is used to make a new molecule based on the sequence of an existing one, but this is intended to help add similar sequences with small differences in residues, protonation states and links. It is not intended to make identical copies of a molecule; identical copies are never needed since a single chain can be used multiple times in as many chains and molecular systems as is required.

The [Lock Molecule] function can be used to prevent any further alteration to a sequence. Any molecule that has been used to make any chains (for assignment) will be automatically be locked and cannot be unlocked until its chains have been deleted. This locking mechanism means that chain any assignments then may carry cannot be ruined by editing sequences. If the user genuinely made a mistake in entering a sequence after assignments have already been made to a chain then the best course of action is to make a new molecule template, with the correct sequence, a new molecular system, i.e. building chains using the new template, and copy the assignments from the old molecular system to the new one via Copy Assignments; selecting “Between Molecule Chains”.

Template Molecules Tab

This section lists all of the available sequence templates in the upper table and a list of the selected molecule’s attributes in the lower table. The user can copy, add, extend and delete molecules. Any fine scale editing of the molecule template should be done via the adjacent “Sequences” tab. In keeping with the above description [Delete Molecule], [Add Polymer Residues] and [Add Compound] may only be used if a molecule is unlocked and has not been used to make any chain entities, i.e. in a molecular system. Otherwise the [Add ...] functions operate in the same way as the equivalent in the “Sequences” tab.

Links

This tab is used to specify how the sequence elements of a molecule template are connected together. It should be noted that these links can only be edited if the template in question is not used in any chains and is unlocked (via the Sequences tab). The general idea is that in the sequences table you select a template residue and click [Edit Links] or you select the required template molecule and residue in the pulldown menus in the “Intra Molecule Links” section.

With the required template residue selected the upper table is filled with rows that represent the covalent links that are, or could be, made from the selected residue to other “Destination” template residues. If the molecule is unlocked then the user can change or set the residue that is being linked to by double clicking in the “Destination Residue” column and selecting the required residue option. Note that only template residues that have an unsatisfied link will be shown as possible targets. If the desired destination is not free you can go to that residue and remove an existing link (setting destination to “<None>”) before making the required a new one. Most biopolymer residues will show the “prev” and “next” connections that represent the regular backbone connectivity; peptide for proteins, phosphodiester for nucleic acids. These connections are rarely changed, but can be adjusted to beak or join a biopolymer. More typically potential side chain and branching links are adjusted. For example if a Cys amino acid has had its descriptor set to “Link:SG” then an extra type of link will appear in the table. The user can then connect the SG of the Cys to any other free Cys SG, thus specifying a disulphide bridge.

The “Inter Molecule Links” section is not currently used, but will be filled in due course.

Add Sequence

This section is used to add new biopolymer sequences to a CCPN project. The sequence can be used to make a new molecule template, extend an existing molecule template or make a molecule template and an assignable chain at the same time. Initially the user should set the required polymer type to say what kind of molecule is being made; “GCAT” in DNA is different to “GCAT” in protein. Then the user sets whether the sequence will be entered as one-letter codes or as Ccp codes (often three-letter residue codes). Note that three-letter codes could be interpreted as one-letter codes without error, i.e. “ALA” could be interpreted as “Ala, Leu, Ala” if the one-letter option was on. Once the sequence is entered correctly the user can switch between the one-letter and Ccp codes without a problem. The start number sets what the number of the first residue in the sequence will be, although this can be changed at any time after. The “Cyclic” option means that a biopolymer link will be made between the first and last residue, e.g. for a protein the amide which would otherwise be the N-terminus is peptide linked to the last carbonyl group.

The “Destination Molecule” and “Destination Mol System” settings are important because they govern how a sequence will be used in relation to existing molecular descriptions. If the destination molecule is set to anything other than “<New>” then the system will attempt to add the entered sequence on to the end of the selected molecule template. This may fail if the selected template has chains or is otherwise locked. If a new section of sequence is added to an existing molecule template then the user can adjust how those template residues are connected to any others via the “Sequences” and “Links” tabs. The setting for the destination molecular system dictates whether the molecule template that the sequence will make (or will extend) is used immediately to make an assignable chain entity. For adding simple, single protein chains this is usually the case and the “Destination Mol System” is set to “<New>”, although by selecting an existing molecular system the user can add a new subunit to the specification of a complex. Setting the molecular system to “<None>” allows the user to add a sequence template without making any chains at all. This is important if the molecule template needs further adjustment, for example to define non-backbone links or set protonation states. If the setting was not “<None>” and a chain was erroneously made, then the user should delete the chain in the “Chains” tab and [Unlock Molecule] in the “Sequences” tab before making alterations.

The required biopolymer sequence is entered in the “Sequence Input” text panel. Usually this is done by using [Read File] to get the sequence data from disk, but it is also possible to place sequences into the text area on Linux systems via the mouse middle button, select-and-paste mechanism. Once sequence appears in the text area it may be further edited by typing. At any stage the user may click the [Tidy] function which will number the sequence and arrange it in neat rows. This also has the helpful effect of checking whether the sequence is valid and will be interpreted by the CCPN system without error. Once the sequence is correct and the other settings, including the destination molecular system, have been checked the clicking the [Add Sequence!] button commits the sequence by making the specified molecule template and any new chains.

Small Compounds

Here the user can add small compounds to molecule templates. These may be any ligands, co-factors, solvents, metabolites etc. that are assignable. Once a compound had been added to a new or existing molecule template, the compound can be made accessible for assignment by using its template to create a chain.

Generally the user selects which kind of biopolymer the compound relates to, or “other” of is not associated with any polymer type. In the table on the right the user scrolls and selects the required chemical compound row. If idealised atomic coordinates are known for the selected compound these are displayed in the 3D coordinate display on the left. An absence of such coordinates does not matter with regard to selecting a compound for assignment. If a compound row is coloured blue, then the compound is available locally and may be used immediately. If the row is grey, then the compound will be downloaded from a remote database before use. Any downloading of this kind is automatic, but naturally requires an active Internet connection.

Once the desired small molecule is selected the user next selects the required destination molecule template in the pulldown menu below the table. The compound can either be put into a new template, i.e. on its own, or added to an existing molecule, as would normally be done for a covalently linked cofactor. Finally [Add Compound] actually added the selected compound to the molecule template. To actually assign to this molecule the molecule template must be used to make a chain (i.e. in a given molecular system), which is readily achieved by using the [Make Chain From Template] function from the “Chains” tab.

Caveats & Tips

In order to delete a chain that carries assignments, the chain must be cleared of assignments first. The assignments could be moved to a different chain via the Copy Assignments system or removed completely. Completely removing a chain’s assignments can be done via the Spin Systems or Resonances tables by selecting the rows that relate to the relevant chain (sorting rows by residue may help) and then deassigning the spin systems or resonances.

The sequence numbering of a chain can be altered at any stage by editing the “Start Seq Number” in the “Chain Fragments” table of the “Chains” tab. Although the start number can be adjusted, currently the residue numbering will always be sequential within a given fragment. The residue template numbering in the “Sequences” tab cannot be changed, although this may change in the future. However, the template numbering system is separate from the chain numbering system.

Main Panel

button Clone: Clone popup window

button Help: Show popup help document

button Close: Close popup

Chains

A table of all the molecular chains used in assignment, possibly from various samples and complexes

Table 1
Mol System The short name of the molecular system to which the chain belongs; a grouping of chains, e.g. in a complex
Chain Code The short textual identifier for the chain, for graphical displays and assignment annotations (where required)
Molecule Template The name of the molecular sequence template on which the chain is based
Residues The number of residue monomers (or other separate chemical compounds) that make up the chain
Chain Fragments The number of discontinuous polymer sequence fragments into which the chain is broken
Molecular Types The kind of bio-polymer the chain represents; usually just a single kind, but hybrids like DNA/RNA are possible
Details A user-editable textual comment for the molecular chain, which defaults to the initial snippet of one-letter sequence (Editable)

button Show Seq: Show a table containing the residue sequence for the chain, together with associated linking and protonation states

button Delete Chain: Delete the selected chain specification, if it is not assigned to NMR resonances (via its atoms)

button Copy Chain: Make a duplicate of the selected chain within its molecular system, using the next available chain code letter; convenient for making homo-oligomers

button Add Sequence: Add a new bio-polymer sequence, to make a molecule template, from which a new chain may be built

button Edit Molecule Templates: Show a table of all the molecule templates available in the project, from which chains may be built

pulldown Mol System for new chain: Selects which molecular system group to place new chains in (when constructed with a sequence template)

pulldown Template for new chain: Selects which molecular sequence template is used to make a new chain (which can then be assigned)

button Make Chain From Template: Using the selected molecule template make a new chain withing the stated molecular system group; the chain made can be used for NMR assignments

Chain Fragments

Table 2
# The serial number of the fragment within a chain
Mol Type The type of bio-polymer represented in the fragment
Residues The number of residues connected within the fragment of the chain
Start Seq Number The number of the first residue in the fragment, for graphical display; change this to set the residue numbering used in assignment (Editable)
Linear Polymer? Whether the fragment represents a standard linear bio-polymer
Sequence A short snippet of the fragments sequence, for human appreciation

Mol Systems

A table of the molecular system groups: collections of polymer chains and small molecules that are in the same sample or complex

Table 3
Code A sort textual code for the molecular system grouping; set when the molecular system is first created
Name An editable, descriptive name for the molecular system, e.g. the name of a multimeric complex (Editable)
Chains The identifying codes of the molecular chains that are present in the molecular system group
Molecules The names of the molecule templates which have been used to make the chains of the molecular system
Mol Types The bio-polymer or other molecule types present in the molecular system
Structures The number of structure ensembles (coordinate sets) associated with the molecular system
Details A user-edited verbose description or comment about the molecular system (Editable)

button Add Mol System: Add a new, blank molecular system to the CCPN project; which will then go on to contain chains of residues and their atoms

button Delete Mol System: Delete the selected molecular system definition, but only if it carries no NMR assignments

Sequences

A table of the residue sequences for the molecules defined in the project, including bio-polymer links and protonation state

pulldown Molecule: The name of the molecular template to display the sequence for

Table 4
Seq Code The number of the residue in the sequence; separate to the chain (fragment) numbering which can be edited by the user
Ccp Code The identifying code for the kind of residue; usually three letters, but not always so
Descriptor & Stereochemistry A description of the residue’s protonation and stereochemical state; can only be set for unlocked templates, which have not yet been used to make chains
Mol Type The bio-polymer or other type of the residue
Polymer Linking The position of the residue relative to the bio-polymer backbone; a description of which kinds of links are made
Linked Residues The identities of residues that are covalently linked, including both regular bio-polymer links and other non-standard kinds

button Add Polymer: Add a bio-polymer sequence to construct a new molecular template, from which chains may be constructed

button Copy Sequence: Copy the currently visible residue sequence into the “Add Sequence” tool; as a basis for making a similar sequence

button Add Compound: Add an isolated compound (small molecule) to the current template; which may include cofactors or sugar residues

button Edit Links: Display and edit the covalent links for the selected residue; only possible for unlocked molecules (without assignments)

button Lock Molecule: Unlock the molecular sequence template so that its links and protonation states etc make be modified. Unlocking is only possible if the template has not been used to make chains

Template Molecules

A table of all of the molecules in the project; used as sequence templates to build assignable “chain” entities

Table 5
Name The short name for the molecule template, for use in graphical displays; set when the template is first created
Long Name A verbose name describing the molecular sequence; typically the protein or gene name (Editable)
Mol Type The types of bio-polymer or other molecule represented by the sequence template
Chains Which chains the sequence template has been used to construct (inside molecular systems)
Details User-supplied textual comment for the molecule template (Editable)
Seq Details User-supplied comments specifically regarding the sequence of the molecule (Editable)

button Show Seq: Show a table containing the residue sequence of the selected molecule template

button Add New Molecule: Add a new, blank molecular template, to which bio-polymer sequence or small molecules may be added

button Copy Molecule: Make another molecular template with the same sequence as the selected; allows identical sequences but different protonation states

button Delete Molecule: Delete the selected molecular sequence template; only allowed if it is not used by any molecular system chains

button Add Polymer Residues: Add a sequence of bio-polymer residues to the selected template; only allowed if the template is unlocked

button Add Other Compound: Add a non-bio-polymer or small compound to the selected template; only allowed if the template is unlocked

Molecule Data

Table 6
Attribute Various characteristics and attributes of the selected molecule
Value The value of the parameter described on each row

Add Sequence

Load or paste bio-polymer (protein, DNA, RNA, carbohydrate) sequences to build molecule templates

pulldown Polymer Type: Selects which kind of bio-polymer sequence is being added

int 1: Sets the number of the first residue in the sequence; currently not subsequently editable, but can be adjusted when making chains

check Cyclic: Whether the polymer is circular, with the end of the sated sequence connecting to the beginning in the normal bio-polymer manner

radio 1-Letter: The sequence will be specified in one-lette code form like “GCAT” or “TEAANDCAKE”

radio 3-Letter/Ccp: The sequence will be specified in three-letter (or full CCPN) code form like “ALA GLY MET SER”

pulldown Destination Molecule: Selects which molecule template the sequence will be added to; when making a new molecule you will be prompted for the name

pulldown Destination Mol System: If set, specifies that the sequence should be used to immediately make an assignable chain; this disallows further editing (e.g. of protonation state)

pulldown Ccp Codes: Selects a residue type from a list of all of the CCPN residue codes available for the selected kind of bio-polymer molecule

button Append: Add a residue of the kind selected in the adjacent pulldown menu to the end of the sequence

Sequence Input

*None*: Cut, paste and edit bio-polymer residue sequences in this text box

entry Documentation missing

button Add Sequence!: Add the states sequence to the selected, or new, molecule making a new chain if a molecular system is specified

button Tidy: Tidy the text window containing the sequence; lines things up and checks if the residue codes will be interpreted properly

button Read File: Read residue sequences from a file; places the result in the above text window

button Clear: Clear the text window containing the sequence

button Reverse Complement: For DNA and RNA make a sequence that is the reverse complement of the visible sequence

Small Compounds

A system to choose smaller chemical components for an assignable system; includes ligands, cofactors and unusual residues

pulldown Mol Type: Selects which bio-polymer type to show compounds for; defaults to “other”, representing things that do not fit into other categories

button Import XML File: Import a non-standard compound definition from a CCPN ChemComp XML file, e.g. something made by CcpNmr ChemBuild

Table 8
Ccp Code The CCPN code of the compound
3-Letter Code The PDB three-letter code of the compound
Long Name The long chemical name of the compound. Note this is not standardised

pulldown Destination Molecule: Selects which molecul template the selected chemical compound may be added

button Add Compound: Add the selected chemical compound to the above selected, or new, molecule template; when done the template can be used to make an assignable chain