Replica: Commands which deal with replication of the molecular system: Replica. # <caves>-Aug-18-1993 (Leo Caves) Initial release. # REPLICA/PATH method added by B. Brooks March 1994. The commands described in this node are associated with the replication of regions of the PSF, see *note gener:(struct.doc)Generate. A facility for replication of regions of the PSF has been implemented to support a class of methods which seek to improve the sampling of a (usually small) region of the molecular system, by selective replication. Such methods include LES (Locally Enhanced Sampling [Elber and Karplus 1990, J. Amer. Chem. Soc. 112, 9161-9175]) and MCSS (Multiple Copy Simultaneous Search [Miranker and Karplus 1991, Proteins 11, 29-34]). * Menu: * Syntax:: Syntax of the replication commands * Usage:: Description of command usage * Implementation:: A brief description of the anatomy of replication * Restrictions:: Restrictions on usage * Examples:: Supplementary examples of the use of REPLica * Path:: Replica Path Method
Syntax of PSF Replication commands [SYNTAX: REPLication commands] REPLica { [segid] [NREPlica integer] [SETUP] [atom-selection] [COMP] } { RESEt } segid:== Basename for replica segment identifiers. atom-selection:== (see *note select:(select.doc).) RPATh { [ KRMS real ] [ KANGle real ] [ COSMax real ] [MASS] [WEIGht] } { OFF }
Description of REPLica command usage 1) (The implicit GENERate subcommand) This command performs the essential act of replication. Its action is to replicate (to a degree specified by NREPlica, default: 2) (a subset of) the molecular system, as specified in the (primary) atom-selection (default: all). All atomic properties and topological attributes of the region are replicated (for a full list, see *note implem:(replica.doc)Implementation). Each replica of the primary atom selection constitutes a new segment in (and appended to) the PSF, however the atom and residue names and the residue identifiers of the primary atom selection are carried over. The implicit generation subcommand optionally accepts a segment identifier (segid). The length of segid must be such that when concatenated with the (integer representing the) maximum number of replicas specified for generation, it does not exceed 4 characters. If omitted, then replica segment identifiers will simply be set to the replica number. At present no check is made for duplicate segment identifiers, so choose with care. The command is designed to operate in a manner similar to the GENErate command from the main parser. The effect of the replication command may be classified into two areas: structure and interactions. Structurally, as mentioned above, the command performs the necessary book-keeping work for CHARMM, in order that each individual replica is functionally equivalent to the region of the structure specified in the atom selection. ie. in the case where the atomic positions of an individual replica are the same as the primary atom selection (as they will be immediately after issuing the REPLica command), the energy and forces of the individual replica and the appropriate region of the primary system are identical (there is an important corollary to this statement which is now discussed). In the area of discussing the interactions of replicas it is useful to introduce the concept of a subsystem. Before issuing a REPLica command, there is considered to be one subsystem, the primary subsystem, to which all atoms belong. Upon issuing the REPLIca command a new subsystem is generated, which consists of replicas of a subset of the primary subsystem (as specified in the atom selection). In this case there are now two subsystems. The simple cases specifying interactions of subsystems and replicas may now be stated: * Replicas within a subsystem do NOT interact. * Replicas belonging to different subsystems do interact. In CHARMM, the interaction rules of replicas are applied in the non-bonded list generation routines, through appropriate group/atom exclusions. You will notice some diagnostic messages from the list generation routines indicating the number of group/atom interactions excluded on the basis of replication. In following the rules of interaction of replicas it is important to note that a given replication of a subset of the primary subsystem, results in a new subsystem. Thus the subset of the primary subsystem and its individual replicas are now in different subsystems and are thus will interact. For this reason, the replication action is usually followed by an immediate removal of the atoms of the subset of the primary subsystem, through a call to DELEte *note dele:(struct.doc)Delete). This leaves all replicas of the specified region in a single subsystem, arranged as contiguous segments appended to the current PSF. A note on renormalization of energy and forces: In the original implementation of REPLica in a developmental version of CHARMM at Harvard, there exists a close coupling of the REPLica command and the energy/force evaluation routines. In the current REPLica implementation in the standard CHARMM distribution, appropriate energy/force scaling for the system in question may be achieved through the use of the BLOCK facility of CHARMM see *note Block(block.doc). The combination of REPLica and BLOCK provides for very flexible method of handling replica interactions. Note that if the primary system is FIXed and that only one replicated subsystem is present (the case in many MCSS applications) then normalization of energy/forces is NOT required. Example: In the following section of CHARMM command script, a segment named PROT is generated from a sequence read from a coordinate file. A couple of selection definitions are made which together identify the sidechain atoms of residue 12. In the REPLIca command, 4 copies of the sidechain are generated and placed in three new segments A1 to A4 at the end of the PSF. Next the selections are redefined (as REPLica has altered the PSF and this corrupts existing selections made with the DEFIne command). These (newly redefined) selections are made to remove the sidechain atoms in the primary system that were selected for replication. Next BLOCK is used to setup the scaling of energy and forces in this system with a primary and a single replicated subsystem. In the call to BLOCK, 2 blocks are requested. By default BLOCK places all atoms of the system in block 1, so the first action is to redefine the replicated subsystem to block 2. Next we simply set up the desired interaction matrix. Primary subsystem self interactions are simply set to unity (no scaling). Interactions within each replica are set to 0.25 (the reciprocal of the number of replicas). Primary <--> replicated subsystem interactions are similarly scaled by 0.25. (Note that the REPLica interface to the non-bonded list generation routines removes all inter-replica (intra-subsystem) interactions.) Finally, the masses of the replicated atoms are scaled by 0.25, by using the SCALar commands. (Note that mass-scaling may not be desirable as it has been demonstrated that in the original LES framework, the thermal properties of the replicas are such that at thermal equilibrium, the mapping of replicas back to the "physical" system (with a single copy) results in too high a temperature. The overestimation of the temperature in the physical system is a factor of N in the simplest case of a uniform "weighting" of all replicas by a factor of 1/N, where N is the number of replicas employed in the simulation. This effect is an active field of research, though a solution for systems where only equilibrium properties are desired is to either scale up the masses of the replicas by a factor of N, or to selectively rescale the velocities of the replicas.) ... ! { read sequence and generate segment } READ SEQU COOR UNIT 11 GENErate PROT ! { define some useful atom selections } DEFIne backbone SELEct TYPE N .OR. TYPE CA .OR. TYPE C .OR. - TYPE HN .OR. TYPE HA .OR. TYPE CB END DEFIne disorder SELEct (SEGID PROT .AND. RESId 12) .AND. .NOT. backbone END ! { replicate the selected sidechain four times } REPLIcate A NREPlica 4 SELEct ( disorder ) END ! { redefine as REPLIca has changed PSF and this trashes SELEction definitions } DEFIne backbone SELEct TYPE N .OR. TYPE CA .OR. TYPE C .OR. - TYPE HN .OR. TYPE HA .OR. TYPE CB END DEFIne disorder SELEct (SEGID PROT .AND. RESId 12) .AND. .NOT. backbone END DELEte ATOM SELEct ( disorder ) END DEFIne replicas SELEct SEGId A* END ! { set up an appropriate interaction matrix } BLOCK 2 CALL 2 SELEct ( replicas ) END COEF 1 1 1.0 COEF 2 2 0.25 COEF 2 1 0.25 END ! { note masses can be modified if desired through the SCALar commands } ! { note that this may not always be desirable --- see comments above } SCALar MASS MULt 0.25 SELEct replicas END ... load/generate some coordinates and proceed.. 2) The RESEt subcommand. The RESEt subcommand has the effect of reducing all current subsystems to a single primary subsystem. This is accomplished by simply switching off the CHARMM's REPLica handling mechanism. This change is currently irreversible, in as much as the REPLica state must be restored through appropriate calls to the REPLica command. This command is there to support the use of REPLica for simple replication of PSF elements for which subsequent REPLIca handling is not required. Example: The following example begins by building a PSF containing a single CO molecule. An immediate call to REPLica requests the generation of 256 replicas (with SEGId's of R1 to R256) of the primary subsystem (the CO molecule with the SEGId CO). Next the original CO molecule is removed. The final command, switches off the CHARMM's replica handling, leaving a PSF with 256 CO's which interact with each other. This may seem like a redundant command given the ability to generate a long sequence with commands like READ SEQU COOR or a little copy and paste with your favorite editor, but remember that REPLica can handle replication of ANY subset of the PSF, reducing the need for tampering with RTF definitions and creating new PATCh residues (PRES's). READ SEQUence CARDS * a single carbon monoxide molecule * 1 CO GENErate CO ! generate the primary system REPLica R NREP 256 SELEct SEGId CO END ! replicate DELEte ATOM SELEct SEGId CO END ! remove primary system REPLica RESEt ! reduce replicates to primary
Notes on Implementation of REPLica in CHARMM. This node is of primary directed at CHARMM developers, but may be of interest to the curious user. Structurally, the call to REPLica handles all atomic and topological properties of atoms in the primary atom selection. Properties that are replicated include group/residue membership, atom-code, IUPAC name, partial charge, parameter type code, fixed atom flag, X,Y,Z and W for main and comparison and reference coordinates, the forces DX,DY and DZ, the friction coefficient FBETa, and the harmonic constraint. Topological entries include bond, angle, dihedral, improper terms, explicit non-bonded exclusion flags and H-bond donor and acceptor arrays. Optionally, IC table entries for the primary selection are replicated. For interactions, the handling of replicas in CHARMM has been implemented using a very simple data structure which allows for a simple and efficient interface to the central CHARMM routines. Essentially, subsystem and replica identities are maintained through the use of linked lists. On the first call to REPLica, the primary system (the existing PSF) is initialized to be subsystem 1 (repID), consisting of 1 replica. Each call to REPLica, establishes a new subsystem, and each replica requested is distinguished by a separate replica number. The replica number is maintained at both the group (repNoG) and atom (repNoA) level for efficiency in the non-bonded list generation routines. In the following schematic we see the state of the data structure in which there is a primary system consisting of 4 atoms. The threefold replication of atoms 2 and 3 (which form a distinct group in the primary system) is shown. The replication forms a new subsystem (repID). Each replicated group gets a distinct flag representing the individual replica, as do the replicated atoms. These flags index into the repID array which contains the subsystem membership flags. In this way the subsystem/replica membership is easily established through knowledge of the group or atom number. Atom Name repID repNoG repNoA Comments # 1 N 1 1 1 | Primary subsystem 2 CA 1 | 3 C 1 | 4 O 1 | 5 CA 2 2 2 & Replicated substem (NREP=3) 6 C 2 & 7 CA 3 3 & 8 C 3 & 9 CA 4 4 & 10 C 4 & An schematic of the replica exclusion code in the non-bond list generation is now given for an atom pair i and j. IF ( ( repNoA(i) .NE. repNoA(j) ) .AND. ( repID(repNoA(i)).EQ.repID(repNoA(j)) ) ) THEN EXCLUDE PAIR (i,j) in list ELSE INCLUDE PAIR (i,j) in list ENDIF There is another component of the REPLica data structure which is a array (byatom) of "weights". These weights in general reflect the degree of replication of the subsystem to which the atom belongs, but may be changed through SCALar commands (SCALar WEIGht SET..). This array was used in the developmental version of CHARMM with REPLicas as the interface to the energy/force routines for correct normalization. In the current standard CHARMM release, this array exists, but is redundant. Currently it will be filled by a value of the reciprocal of the number of replicas requested for any subsystem. It has been retained for some degree of flexibility in future releases. At present it may be used as an additional array for book-keeping.
The only absolute requirement for this command is that a PSF of the molecular system be present prior to the call to REPLIca. All non-bonded list generation options are currently supported, however IMAGES and EXTENDED electrostatics are currently not supported. Please note that the replica group flags follow the group membership of the primary atom selection, therefore take care not to split groups in a selection if group-based energy evaluations are to be subsequently used. Run-time attributes of the system such as SHAKE constraints, BLOCK membership and SBOUND flags will not be replicated. (Re)Issue such commands after replication has been performed. It must be noted that currently the replica handling mechanisms of CHARMM are generated through the run-time use of the REPLica command. i.e. the REPLica data structure is not currently incorporated in the standard system PSF or able to be saved to an external file for restoring its state. The philosophy is that all necessary attributes of the replicas are contained in the primary system PSF and that it is therefore only necessary to keep that explicitly. Of course, the coordinates of the individual replicas must be saved.
Supplementary examples. Replication of PHE 22 and 33 and TYR 35 of BPTI These examples illustrate two ways of setting up replicated subsystems. In both cases replicas of the sidechains are created from CG outwards. In the first example three calls to REPLica are made, one for each sidechain, which create 5 replicas for each subsystem. In the second example, one call to REPLica is made, which replicates all three of the sidechains, to create one replicated subsystem containing five 5 replicas of the triad. In each case an appropriate interaction matrix for the subsystems is created with the use of the BLOCK command. Example 1: 3 replicated subsystems: 5 copies of each individual sidechain in each. REPLicate A NREPlica 5 SETUP - SELEct (SEGId 4PTI .AND. RESId 22) .AND. .NOT. - (type N .or. type CA .or. type C .or. - type O .or. type HN .or. type HA .or. type CB) END REPLicate B NREPlica 5 SETUP - SELEct (SEGId 4PTI .AND. RESId 33) .AND. .NOT. - (type N .or. type CA .or. type C .or. - type O .or. type HN .or. type HA .or. type CB) END REPLicate C NREPlica 5 SETUP - SELEct (SEGId 4PTI .AND. RESId 35) .AND. .NOT. - (type N .or. type CA .or. type C .or. - type O .or. type HN .or. type HA .or. type CB) END ! DELETE the necessary regions of the primary sub-system DELEte ATOM - SELEct (SEGId 4PTI .AND. (RESI 22 .OR. RESI 33 .OR. RESI 35)) .AND. .NOT. - (type N .or. type CA .or. type C .or. - type O .or. type HN .or. type HA .or. type CB) END DEFIne phe22 SELEct SEGId A* END DEFIne phe33 SELEct SEGId B* END DEFIne tyr35 SELEct SEGId C* END ! set up the correct energy/force scaling. ! the default coefficient is one. BLOCK 4 CALL 2 SELEct phe22 END ! assign replicated subsystems to blocks CALL 3 SELEct phe33 END CALL 4 SELEct tyr35 END COEF 2 1 0.2 ! primary <-> replicated subsystems COEF 3 1 0.2 COEF 4 1 0.2 COEF 2 2 0.2 ! replicated subsystem self-terms COEF 3 3 0.2 COEF 4 4 0.2 COEF 3 2 0.04 ! replicated <-> replicated subsystems COEF 4 2 0.04 COEF 4 3 0.04 END Example 2: 1 replicated subsystem: 5 replicas consisting of the 3 different sidechains REPLicate A NREPlica 5 SETUP - SELEct (SEGId 4PTI .AND. (RESId 22 .OR. RESID 33 .OR. RESID 35) ) - .AND. .NOT. (type N .or. type CA .or. type C .or. - type O .or. type HN .or. type HA .or. type CB) END ! DELETE the necessary regions of the primary sub-system DELEte ATOM - SELEct (SEGId 4PTI .AND. (RESI 22 .OR. RESI 33 .OR. RESI 35)) .AND. .NOT. - (type N .or. type CA .or. type C .or. - type O .or. type HN .or. type HA .or. type CB) END ! set up the correct energy/force scaling. BLOCK 2 CALL 2 SELEct SEGId A* END COEF 1 2 0.2 COEF 2 2 0.2 END
Replica Path Method The replica/path method allows the postions between sequential replicas to be restrained. This allows minimization and simulated annealing methods to be used to search for transition states. B. Brooks, NIH, March 1994 This code currently requires exactly one replicated subsystem with at least three replicas. Syntax: RPATh { [ KRMS real ] [ KANGle real ] [ COSMax real ] [MASS] [WEIGht] } [ KMAXrms real ] [RMAXrms real ] { OFF } MASS - Use mass weighting in rms determination. WEIGht - Use the main weighting array for weighting the rms vector. KRMS - The rms deviation force constant. KANGle - The COS(angle) deviation force constant. PCOSmx - The value of COS(theta) below which the vectors are restrained. Replica/Path energy functions: Erms = sum { 0.5* Krms * ( rms - <rms> )**2 } I=1,NREP-1 I I where rms is the weighted rms deviation between replica I and I+1 I Erms = sum { 0.5* Kmaxrms * ( rms - Rmaxrms )**2 } I=1,NREP-1 I I Eang = sum { 0.5* Kang * ( cosmax - cos(theta) )**2 } (cos(theta)<cosmax) I = 0.0 when cos(theta) >= cosmax I=1,NREP-2 where cos(theta) is determined from the dotproduct of weighted deviation vectors between replicas I and I+1 and between I+1 and I+2. An example of use: ! { read sequence and generate segment } READ SEQU COOR UNIT 11 GENErate PROT ! { define atom region in which to search for transition state } DEFINE active sele segid prot .and. resid 15 : 19 end ! { replicate the selected residues 20 times } REPLIcate A NREPlica 20 SELEct ( active ) END ! { redefine is necessary } DEFINE active sele segid prot .and. resid 15 : 19 end ! { read product coordinates } OPEN read card unit 12 name products.crd READ coor card unit 12 COMP ! { read reactant coordinates } OPEN read card unit 12 name reactants.crd READ coor card unit 12 COOR orient rms mass sele .not. ( active .or segid A* ) end ! { setup initial guess coordinates for all intermediates } set 1 1 set a 0.0 label loop coor duplicate sele active end sele segid A@1 end coor duplicate sele active end sele segid A@1 end comp coor average fact @a sele segid A@1 end incr a by 0.05 incr 1 by 1 if @1 .lt. 20.5 goto loop DELEte ATOM SELEct active END DEFIne replicas SELEct SEGId A* END ! { average the non active reactant and product atoms } COOR average sele .not. replicas end COOR copy comp ! { set up an appropriate interaction matrix } BLOCK 2 CALL 2 SELEct replicas END COEF 1 1 1.0 COEF 2 2 0.05 COEF 2 1 0.05 END ! { specify residue 17 as more inportant in the weighting } SCALAR wmain set 1.0 SCALAR wmain set 4.0 sele replicas .and. resid 17 end ! invoke the path code RPATH KRMS 100.0 KANGle 100.0 COSMax 0.5 MASS WEIGHT ! { fix the endpoints } cons fix sele segid a1 .or. segid a20 end minimize abnr nstep 100 ! {.... perhaps simulated annealing using MD ...} ! { plot energy as a function of the path } open write card unit 20 name energy.dat set 1 1 label eloop BLOCK 2 CALL 1 sele all end CALL 2 sele replicas .and. .not. segid A@1 end COEF 1 1 1.0 COEF 2 1 0.0 COEF 2 2 0.0 END ENERGY write title unit 20 * @1 ?energy * incr 1 by 1 if @1 .lt. 20.5 goto loop .... more analysis ... STOP
NIH/DCRT/Laboratory for Structural Biology
FDA/CBER/OVRR Biophysics Laboratory