The commands described in this section are used to partition a molecular system into "blocks" and allow for the use of coefficients that scale the interaction energies (and forces) between these blocks. This has a number of applications, and specific commands to carry out free energy simulations with a component analysis scheme have been implemented. BLOCK was recently modified so that it works with the IMAGE module of CHARMM. As some changes to the documentation were necessary anyway, it was decided to also improve the existing documentation. The Syntax and Function sections below are relatively unchanged; the added documentation is in the Hints section (READ IT if you are using BLOCK for the first time!). Comments/suggestions to boresch@tammy.harvard.edu.

* Menu:

* Syntax::       Syntax of the block commands
* Function::     Purpose of each of the commands
* Hints::        Some further explanations/hints
* Limitations::  Some warnings...
Syntax of BLOCK commands

BLOCk [int]

Subcommands:

miscellaneous-command-spec        ! see *note miscom:(miscom.doc).

CALL int atom-selection

LAMBda real

COEFficient int int real

NOFOrce

FORCe

FREE_energy_evaluation [OLDLambda real] [NEWLambda real] -
     FIRSt int [NUNIT int] [BEGIn int] [STOP int] [SKIP int] -
     [TEMPerature real] [CONTinuous int] [IHBF int] [INBF int] [IMGF int]

INITialize

CLEAr

Energy_AVeraGe [OLDLambda real] [NEWLambda real] -
     FIRSt int [NUNIT int] [BEGIn int] [STOP int] [SKIP int] -
     [CONTinuous int] [IHBF int] [INBF int] [IMGF int]

COMPonent_analysis DELL real NDEL int [TEMPerature real] -
     FIRSt int [NUNIT int] [BEGIn int] [STOP int] [SKIP int] -
     [IHBF int] [INBF int] [IMGF int]

AVERage {DISTance int int} {STRUcture} [PERT] [TEMPerature real] -
     [OLDLambda real] [NEWLambda real] -
     FIRSt int [NUNIT int] [BEGIn int] [STOP int] [SKIP int]

END
1) BLOCk [int]

enters the block facility. The optional integer is read only when the block structure is initialized (usually the first call to BLOCK in a run) and specifies the number of blocks for space allocation. If not specified, a default of three is assumed.

2) END

exits the block facility. The assignment of blocks, the coefficient weighting of the energy function, the force/noforce option, etc. remain in place. For the terms of the energy function that are supported, each call to ENERGY (either directly or through the MINIMIZE, DYNAMICS, etc. commands) results in an energy and force weighted as specified. The matrix of interaction coefficients is printed upon exiting.

3) CALL

removes the atoms specified by "atom-selection" from their current block and assigns them to the block number given by the integer. Initially all atoms are assigned to block 1. If atoms are removed from any block other than block 1, a warning message is issued. If blocks are assigned such that some energy terms (theta, phi, or imphi) involve interactions between more than two blocks, a warning is issued when the END command is encountered. (Take such warnings seriously; this is a severe error and indicates that something is wrong. However, the problem may not be the CALL statement (or the atom selection) itself; quite possibly your hybrid molecule was generated improperly.)

4) LAMBda

sets the value of lambda to "real". This command is only valid when three blocks are active. Otherwise, multiple COEF commands may be used to set the interaction coefficients manually. LAMBda x is equivalent to (let y = 1.0 - x)

     COEF 1 1 1.0
     COEF 1 2 y
     COEF 1 3 x
     COEF 2 2 y
     COEF 2 3 0.0
     COEF 3 3 x

5) COEF

sets the interaction coefficient between two blocks (represented by the two integers) to a value (the real number). When the block facility is invoked, all atoms are initially assigned to block 1 and all interaction coefficients are set to one.
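The matrix that LAMBda sets up can be sketched numerically. This is a minimal Python illustration, not CHARMM code; the function name `lambda_matrix` is invented for this example:

```python
def lambda_matrix(lam):
    """Return the symmetric 3x3 block interaction matrix that
    'LAMBda lam' sets up (blocks: 1=environment, 2=reactant, 3=product)."""
    y = 1.0 - lam
    return [
        [1.0, y,   lam],   # env-env, env-reactant, env-product
        [y,   y,   0.0],   # reactant-reactant; reactant-product is never seen
        [lam, 0.0, lam],   # product-product
    ]

m = lambda_matrix(0.1)
```

Note that the reactant and product blocks never interact with each other (coefficient 0.0), and each half sees the environment scaled by its own coupling factor.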
6) NOFOrce

specifies that forces are not required in subsequent energy calculations. This is economical when using the post-processing commands (FREE, EAVG, COMP). Forces may be turned back on with the FORCe command; this is necessary before running minimizations or dynamics if a NOFO command was issued earlier.

7) FREE

calculates a free energy change using simple exponential averaging, i.e. the "exponential formula". If the old and new lambdas (OLDL, NEWL) are specified (possible only when three blocks are active), the perturbation energy is calculated for these values; i.e., FREE gives you the free energy difference between NEWLambda and OLDLambda via perturbation from the lambda value at which your trajectory was calculated. If they are not specified, the current coefficient matrix is used. (FREE should be used with three blocks, and the use of OLDL and NEWL is recommended.) FIRSt_unit, NUNIt, BEGIn, STOP, and SKIP specify the trajectory/ies to be read (for a further description see the TRAJ command elsewhere in the CHARMM documentation). TEMPerature gives the temperature to be used in k_B*T and defaults to 300 K. CONTinuous specifies the interval for writing cumulative free energies; a negative value causes binned (rather than cumulative average) values to be written. Be careful to make sure that you use correct non-bonded lists (see the hints section!).

8) INITialize

is called automatically when the BLOCK facility is first entered and may also be called manually at some other point. All atoms are assigned to block one and all interaction coefficients are set to their initial value.

9) CLEAr

removes all traces of the use of the BLOCK facility. The next command should generally be END; CHARMM will then operate as if BLOCK had never been called.

10) EAVG

The average value of the potential energy during a simulation can be calculated with the EAVG (Energy_AVeraGe) command. The parsing is very much like that of the FREE command above.
The most frequent use of this command is to calculate the average value of dV/dlambda during the course of a simulation for use in thermodynamic integration. CONTinuous specifies the interval for writing cumulative free energies; a negative value causes binned (rather than cumulative average) values to be written. Be careful to make sure that you use correct non-bonded lists (see the hints section!). The command accepts the OLDL/NEWL options, similarly to FREE, but for EAVG it is recommended to set up the interaction matrix yourself (using COEF commands) -- see the hints section.

11) COMP

The COMP module is essentially a modified version of the EAVG module which, aside from calculating <dU/dl> = <U_1 - U_0> at a given value of lambda l(i), will also give you expectation values of this quantity at l(i+-1), l(i+-2), etc., based on perturbation theory. COMP requires 4 blocks. Put the usual WT (reactant) in block 2 and MUT (product) in block 3. Put the portion of the environment whose contribution to the free energy change is desired into block 4 (this can be everything else, or just a subset). (Note that the same can be achieved easily with the EAVG command.) You have to set up your own coefficient matrix. Much of the parsing is like the EAVG command; CONT is not supported. Two special (required) subcommands are DELL and NDEL. The normal output of COMP is <U_1 - U_0> evaluated at the lambda of the simulation. However, COMP also evaluates the same ensemble averages perturbed to lambda = lambda +/- {0,1,2,...,NDEL}*DELL. This (sometimes) helps the quadrature in thermodynamic integration. Note that NDEL must be at least 1, and DELL should not be zero. (You have to specify these values; the default values will lead to an invalid input, i.e. you bomb...) Be careful to make sure that you use correct non-bonded lists (see the hints section!).
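The perturbed averages that COMP produces can be sketched as a reweighting of the sampled frames. This is a hedged Python illustration of the idea only (the function name, the Boltzmann constant value, and the fake per-frame data are assumptions, not CHARMM source); it uses the fact that with linear coupling U(l') - U(l) = (l'-l)*(U_1 - U_0):

```python
import math

KB = 0.0019872  # Boltzmann constant in kcal/(mol K), approximate

def perturbed_dudl(dudl_frames, lam, lam_new, temp=300.0):
    """Estimate <dU/dl> in the ensemble at lam_new from frames sampled
    at lam, by exponential reweighting (perturbation theory)."""
    beta = 1.0 / (KB * temp)
    dl = lam_new - lam
    # Weight of each frame: exp(-beta * [U(lam_new) - U(lam)])
    weights = [math.exp(-beta * dl * x) for x in dudl_frames]
    wsum = sum(weights)
    return sum(w * x for w, x in zip(weights, dudl_frames)) / wsum

frames = [10.2, 9.8, 10.5, 9.9, 10.1]    # fake per-frame U_1 - U_0 values
unperturbed = perturbed_dudl(frames, 0.1, 0.1)      # plain average
shifted = perturbed_dudl(frames, 0.1, 0.16667)      # perturbed estimate
```

With lam_new == lam all weights are 1 and the plain ensemble average is recovered; this is why poorly converged input data (see the warning below) directly contaminates the perturbed estimates.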
A word of warning: if your initial ensemble average (at the lambda of the simulation) is not well converged, then your perturbed values are most likely random numbers. The approach taken by COMP is theoretically sound, but it should only be applied if convergence has been established!

The output format of COMP is somewhat messy: COMP first prints <dU/dl> = <U_1 - U_0> at

     lambda = lambda - NDEL*DELL
              lambda - (NDEL-1)*DELL
              ...
              lambda
              lambda + DELL
              ...
              lambda + NDEL*DELL;

then it prints an average (integral) value over these results. The meaning of this last value is unclear to me. In earlier versions of this documentation, COMP was recommended over EAVG; in my experience the opposite is true. There is little COMP can do that you can't do with EAVG (aside from obtaining the perturbed expectation values of dU/dl). (Maybe the unclear output of the COMP module is the main reason why I don't like it.)

12) AVER

The AVERage command is used to extract ensemble-average structural properties from a dynamics simulation. Features in this implementation allow averages to be taken over ensembles that are perturbed from the one the simulation corresponds to. This is particularly useful, for example, for calculating the average structure expected at lambda=0.0 from a simulation run at lambda=0.1. Currently, one may calculate average structures [STRUcture] and average distances [DISTance int int; the two integers are the atom numbers between which the average distance is requested].
The PERT keyword indicates that an ensemble perturbed from the dynamics trajectory is desired. TEMPerature gives the temperature to use in the exponential for the perturbation (defaults to 300 K); OLDLambda and NEWLambda are the lambdas at which the simulation was run and for which the ensemble is requested, respectively (only valid if three blocks are active; if these are not specified, the perturbation energy is calculated with the current coefficient matrix); the remaining keywords specify the trajectory. NOTE: TO THE BEST OF MY KNOWLEDGE THIS COMMAND HAS NOT BEEN MAINTAINED (so you are on your own if you use it!)
A warning is in order: the BLOCK module is quite user-unfriendly, AND the user (= you) has to know what he/she is doing, otherwise you won't get anywhere! (Of course, this could be a blessing in disguise.)

There are two applications for BLOCK: (i) mere use as an energy partitioning facility, which may, e.g., be very helpful as an alternative to the INTEraction energy command, and (ii) use in free energy simulations. The focus here is on free energy applications. The following paragraphs assume that you are familiar with the theory of free energy difference simulations (e.g. Brooks et al., Advances in Chem. Physics, Vol. LXXI, 1988, chapter V); the emphasis here is to show how a rough tool like BLOCK can be used to implement the theory in a program and (of course) how to use it.

Using BLOCK to calculate a free energy difference consists of two rather dissimilar parts (as far as practical problems are concerned): (i) run your system at various values of lambda and save trajectories; (ii) post-process the trajectories with the FREE or the EAVG command (possibly COMP), and use the quantities these modules give you to calculate the free energy difference.

(i) The actual simulations
==========================

It's probably easiest to use a concrete example, and the free energy difference between ethane and methanol in aqueous solution is used for that purpose. BLOCK is a so-called dual topology method (D. Pearlman, JPC 1994, 98, 1487), i.e. one has to duplicate any atom that differs with respect to any of its parameters. In the ethane/methanol case this means that you have to run with a solute which looks something like

     H1
       \               /H4
        \   C1E ---- C2-H5
    H2 = {       }      \H6
        /   C1M --- OG
       /              \HG1
     H3

(and there is water.)
Conceptually, this system is divided into three regions:

     environment:  water, H1, H2, H3    (the region where nothing changes)
     reactant:     C1E, C2, H4, H5, H6  (ethane half)
     product:      C1M, OG, HG1         (methanol half)

where, of course, the roles of reactant and product are interchangeable. The steps involved to start running dynamics are as follows:

(1) Set up the hybrid (generate the psf). In principle straightforward, but there is a practical pitfall: the autogenerate angles and dihedrals option(s) may produce artificial dihedrals/angles between the two/three parts of the system; e.g., you don't want angles H1-C1E-OG etc. or dihedrals H3-C1M-C2-H4 etc. Also, make sure to specify non-bonded exclusions between the reactant and product parts, otherwise you'll get endless distance warnings and may even bomb if two atom positions coincide.

(2) Place the hybrid into water (stochastic or periodic boundary conditions -- yes, IMAGE is now supported) as usual.

(3) Partition the system, i.e. enter BLOCK. The following script fragment will do the trick:

     block 3
     call 2 sele <reactant> end
     call 3 sele <product> end
     end

(reactant and product have to be defined according to your system). BLOCK 3 initializes the block module with 3 blocks; all atoms are in block 1. The two CALL commands move the reactant and product parts of the system into blocks 2 and 3, respectively.

(4) Run the necessary MD simulations. Let's assume that you decide to use the following values of lambda: lambda = 0.1, 0.3, 0.5, 0.7, 0.9. You want to start your simulation at lambda = 0.1 and you have already partitioned your system as shown in (3). (This information is kept within the same script between calls to BLOCK, but it is not saved in restart files or the psf, i.e. you have to repeat this step (as well as step (3)) in every input file.) Enter BLOCK again, e.g.
     block
     lamb 0.1
     end

From now on, interactions between the 3 blocks will be scaled according to the following matrix (lambda = l = 0.1 ==> 1-l = 0.9):

     block |  1     2     3
     ------|------------------
       1   |  1.0   1-l   l
       2   |  1-l   1-l   0.
       3   |  l     0.    l

Please note that BLOCK first calculates an interaction, then checks to which blocks the two atoms belong and scales the energy (and forces) appropriately. Therefore, if the distance between 2 atoms is zero (e.g. in the ethane/methanol example I would define C1M and C1E on top of each other!) you need non-bonded exclusions, otherwise you encounter a division-by-zero error!

The LAMB command is a shortcut for the following sequence of COEF commands; the code fragment should be self-explanatory:

     block
     coef 1 1 1.0
     coef 1 2 0.9
     coef 1 3 0.1
     coef 2 2 0.9
     coef 2 3 0.0
     coef 3 3 0.1
     end

BLOCK only accepts and uses symmetric matrices, i.e. it doesn't matter whether you specify COEF 1 2 or COEF 2 1. Whenever you now call the energy routines, the energies/forces returned from them will be scaled according to the matrix you have set up. Minimizers and dynamics can be used as always.

So you are ready to run dynamics; for argument's sake, say that at every value of lambda you run 10,000 steps of equilibration and 20,000 steps of production (i.e. you save coordinates to trajectories). You don't need to save every step; every 5th to 20th step is probably more than enough. (If you saved every step you'd obtain highly correlated data, i.e. you'd have larger trajectories, but you wouldn't gain anything in terms of convergence.)
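The compute-then-scale behavior described above can be sketched in a few lines. This is an illustrative Python sketch, not the actual CHARMM implementation; all names and the toy numbers are invented for the example:

```python
def scaled_energy(pair_energies, blocks, coef):
    """Sum pairwise energies, each scaled by the coefficient for the
    blocks of its two atoms.
    pair_energies: {(i, j): e_ij}, blocks: atom index -> block (1-based),
    coef: symmetric matrix indexed by block-1."""
    total = 0.0
    for (i, j), e in pair_energies.items():
        bi, bj = blocks[i] - 1, blocks[j] - 1
        total += coef[bi][bj] * e
    return total

coef = [[1.0, 0.9, 0.1],
        [0.9, 0.9, 0.0],
        [0.1, 0.0, 0.1]]        # the lambda = 0.1 matrix from above
blocks = {0: 1, 1: 2, 2: 3}     # atom 0: environment, 1: reactant, 2: product
pairs = {(0, 1): -2.0, (0, 2): -3.0, (1, 2): 50.0}
e = scaled_energy(pairs, blocks, coef)  # 0.9*(-2.0) + 0.1*(-3.0) + 0.0*50.0
```

Note how the huge (fictitious) reactant-product clash of 50.0 is silenced by the zero coefficient; the energy of a coinciding pair would still be computed first, which is why the exclusions mentioned above are needed.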
(ii) Post-processing -- how to obtain a free energy difference
==============================================================

At this point in our example, you would have five trajectories corresponding to lambda = 0.1, 0.3, ..., 0.9. The BLOCK module now has to be used to obtain the average quantities you need for either the exponential formula (FREE) or thermodynamic integration (EAVG, COMP) from the trajectories you generated in step (i).

(1) At this point, some issues regarding the non-bonded list have to be considered. No special considerations were necessary while running dynamics (aside from having some non-bonded exclusions where necessary); you just set up list updates as usual. During post-processing there are two considerations: (a) efficiency -- you want to calculate only the necessary subset of interactions (otherwise your post-processing run will take about as much time as the simulation itself), and (b) proper list updating.

(a) Efficiency: None of the post-processing routines need the interactions between particles that belong to the environment; therefore you should avoid calculating them. This can be done easily by specifying

     cons fix sele <environment> end

Note that this is not necessary, but it will reduce the CPU time needed from hours to minutes (and the results are identical!). However, if you had atoms belonging to reactant or product (or both) FIXed during the simulations in step (i), you MUST NOT FIX them now; otherwise you'll omit contributions.

(b) List updating: While the efficiency considerations are in principle optional, you have to follow one of the two strategies below, otherwise you'll get erroneous results. If you used IMAGE, you have to use the second protocol! Originally, the BLOCK post-processing commands would not do any list updating.
This meant that you had to have a non-bonded list including all possible interactions before starting post-processing -- don't forget that you post-process over, e.g., 20 ps, and particles will move quite far. You can easily create such a non-bonded list by specifying a CUTNB value of, e.g., 99. or 999. Ang (surely, all possible interactions will be included). A CHARMM script looks approximately as follows:

     ! set up system (psf, initial coordinates)
     block          ! partition system
     end
     cons fix sele <environment> end
     ==> energy cutnb 99. <all other options as during dynamics>
     ! open trajectories
     block          ! postprocessing
     end

In this case, do not use the INBF, IHBF and IMGF options of the post-processing commands; they default to 0, i.e. no update.

This approach, however, CANNOT work with IMAGES! Proper use of IMAGEs requires that the minimum image convention be checked periodically, i.e. particles have to be repartitioned between the primary and image regions. As the BLOCK post-processing commands now understand INBF, IHBF and IMGF, this doesn't pose a problem. However, the automated update is not supported (if you specify a negative value, you'll get a mild warning and the system will default to +1), and I recommend that you use 1 for all frequencies (don't forget, the frames in your trajectory are several steps apart, i.e. in general an update may be necessary). The above scheme now looks like:

     ! set up system (psf, initial coordinates)
     block          ! partition system
     end
     cons fix sele <environment> end
     ! set up images if needed
     ==> energy <all options, incl. CUTNB, as during dynamics>
     ! open trajectories
     block
     eavg <other options> inbf 1 ihbf ? (imgf 1)
     end

Unless you have explicit hbond terms, IHBF can of course be 0! (Please note that there may or may not be problems with CRYSTAL; see the Limitations section.)

(2) The actual post-processing commands. In the following I'll explain how to set things up for FREE, EAVG and COMP (as well as why).
To speed things up further, you'll also want to specify the NOFOrce option at some point.

FREE: This module allows you to calculate the necessary ensemble average for the exponential formula. Using our example, you can, for instance, estimate the free energy difference between l=0.1 (a value at which you ran a trajectory) and l=0.0, or, based on your l=0.1 trajectory, the free energy difference between l=0.0 and l=0.2 (double-wide sampling), i.e.

     A(0.0)-A(0.1) = -k_B*T*ln <exp[-(U(l=0.0)-U(l=0.1))/kT]>_(l=0.1)

or

     A(0.2)-A(0.0) = -k_B*T*ln <exp[-(U(l=0.2)-U(l=0.0))/kT]>_(l=0.1)

You should set up your system with 3 blocks and the usual environment, reactant and product partitions. Before entering BLOCK to issue the FREE command, you have to open the trajectory/ies:

     ! all the stuff shown above for non-bond lists
     open file unit 10 read name dat01.trj
     block
     free oldl 0.1 newl 0.0 first 10 nunit 1 [temp 300. -
          inbf 1 imgf 1]
     end

or, for double-wide sampling, the free line would be replaced by

     free oldl 0.0 newl 0.2 first 10 nunit 1 [temp 300. -
          inbf 1 imgf 1]

Here dat01.trj is the trajectory which contains your 20 ps of dynamics at lambda = 0.1. Based on the OLDL/NEWL values (which correspond to A(newl) - A(oldl)), FREE generates the appropriate interaction matrix, which it prints; I recommend that you try to understand why it generates this matrix! FIRST is the unit number of the first trajectory file (10 in our example); NUNIT is the number of trajectories (1 in our example). (These and the other options regarding the trajectories work as in any other post-processing command in CHARMM; see, e.g., the TRAJ command.) The update frequencies are optional, depending on how you decided to handle your non-bonded updates. TEMP defaults to T=300 K, cf. the equations above.
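The exponential formula used by FREE can be sketched in a few lines of Python. This is an illustration only (the function name, the Boltzmann constant value, and the fake perturbation energies are assumptions, not CHARMM source); du_frames holds U(newl) - U(oldl) for each trajectory frame:

```python
import math

KB = 0.0019872  # Boltzmann constant in kcal/(mol K), approximate

def free_energy_exp(du_frames, temp=300.0):
    """Exponential formula: A(newl) - A(oldl) = -kT * ln <exp(-dU/kT)>."""
    kt = KB * temp
    avg = sum(math.exp(-du / kt) for du in du_frames) / len(du_frames)
    return -kt * math.log(avg)

du = [0.5, 0.7, 0.4, 0.6]   # fake per-frame perturbation energies (kcal/mol)
dA = free_energy_exp(du)
```

By Jensen's inequality the result is always at or below the plain average of the perturbation energies; frames with low perturbation energy dominate the exponential average, which is also why poor overlap between the two lambda states makes this estimator converge slowly.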
If you specify CONT +n, you'll get a cumulative average every n steps; in this case the last value equals the final result. If you specify CONT -n, you'll get the average over every n frames, plus of course the final result at the end. Note that trajectories are not rewound after use; i.e., before any subsequent FREE (or EAVG, COMP) command you have to rewind (or reopen) them! Once you have all the free energy pieces you need, you simply add them up to obtain the free energy difference (beware of sign errors, depending on how you defined OLDL/NEWL).

EAVG: The main use of this module lies in obtaining the required ensemble averages for thermodynamic integration. The most significant difference from FREE is that you have to specify your own interaction matrix. BLOCK uses linear coupling in lambda in the potential energy function, i.e.

     V(l) = V0 + (1-l)*V_reac + l*V_prod,

where V0 contains all the intra-environment terms, V_reac the intra-reactant and reactant-environment interactions, and V_prod the intra-product and product-environment interactions, respectively. The quantity of interest in TI is dV/dl; for the above potential energy function we have

     dV/dl = V_prod - V_reac

It's very easy to obtain this quantity from EAVG. Use 3 blocks, and partition the system as before:

     ! all the stuff shown above for non-bond lists
     open file unit 10 read name dat01.trj
     block
     coef 1 1 0.
     coef 1 2 -1.
     coef 2 2 -1.
     coef 1 3 1.
     coef 2 3 0.
     coef 3 3 1.
     eavg first 10 nunit 1 [inbf 1 imgf 1 cont +-n]
     end

You will calculate the average interaction energy over all the frames in the trajectory according to the following (symmetric) matrix:

      0.0  -1.0   1.0
     -1.0  -1.0   0.0
      1.0   0.0   1.0

i.e. it's easy to see that the above script will give you <V_prod - V_reac>_(l=0.1). If you post-process the other trajectories (l=0.3, 0.5, ..., 0.9) in an analogous fashion, you just have to approximate the TI integral by the trapezoidal formula (for basic Newton-Cotes formulae (open and closed) see, e.g., Numerical Recipes), i.e.
in this case you would have

     dA = 0.2 * (dV(0.1) + dV(0.3) + ... + dV(0.9)),

where dV(0.1) = <V_prod - V_reac>_(l=0.1), etc.

The above is an example of the basic use of EAVG. You automatically get the formal components according to interaction type. CONT +-n works as in the FREE case. If you wanted to exclude the intramolecular contributions from ethane and methanol, you could set up a slightly different coefficient matrix, i.e.

     coef 1 1 0.
     coef 1 2 -1.
     coef 2 2 0.
     coef 1 3 1.
     coef 2 3 0.
     coef 3 3 0.

and you'll get only the solute-solvent contributions. You can use more blocks (m > 3) to extract only a subset of interactions, e.g.

     block 1: environment - x
     block 2: reactant
     block 3: product
     block 4: x,

where x is the region of interest, e.g. a specific sidechain in a protein (but not the one that is mutated!). Using EAVG with an appropriate coefficient matrix, e.g.

     coef 1 1 0.
     coef 1 2 0.
     coef 1 3 0.
     coef 1 4 0.
     coef 2 2 0.
     coef 2 3 0.
     coef 2 4 -1.
     coef 3 3 0.
     coef 3 4 1.
     coef 4 4 0.

will give you (after integration over lambda) the free energy contribution of the interaction of sidechain x with the mutation site. Note that such formal free energy components may be (strongly) path-dependent. These last two examples have hopefully provided a flavor of what can be done with the EAVG module.

COMP: This module is also used for thermodynamic integration. It always operates with four (and only four) blocks, just as in the advanced example last given for EAVG, so it facilitates COMPonent analysis. Here I want to focus on the second unique aspect of COMP, its capability to extrapolate additional datapoints; so I consider, in the framework of our ethane/methanol example, the "special" case where I want the total free energy difference (as before with EAVG).
In order to do this, the system needs to be partitioned as follows:

     block 1: --
     block 2: reactant
     block 3: product
     block 4: environment

Whereas EAVG gave us <V_prod - V_reac>_l only for those lambda values at which we had actually done the simulations, COMP gives us additional values via perturbation (see Bruce Tidor's thesis). Using

     ! all the stuff shown above for non-bond lists
     open file unit 10 read name dat01.trj
     block
     coef 1 1 0.
     coef 1 2 0.
     coef 1 3 0.
     coef 1 4 0.
     coef 2 2 -1.
     coef 2 3 0.
     coef 2 4 -1.
     coef 3 3 1.
     coef 3 4 1.
     coef 4 4 0.
     comp first 10 nunit 1 [inbf 1 imgf 1] dell 0.06667 ndel 1
     end

will now give us <V_prod - V_reac>_l at l=0.03333, l=0.1 and l=0.16667. If we use the same script on the other trajectories, we have 15 instead of 5 datapoints for the integration, i.e. we can obtain dA as

     dA = 0.06667 * (dV'(0.03333) + dV(0.1) + ... + dV'(0.96667)),

where dV(0.1) = <V_prod - V_reac>_(l=0.1), etc., and the prime indicates that this is a perturbed quantity. In principle, this should give a better numerical integration; in practice, however, everything depends on how well your actual data (l=0.1, 0.3, ..., 0.9) are converged. There is no check whether your NDEL/DELL combination is meaningful, and you cannot run COMP without using the perturbation feature, i.e. NDEL must be set to at least 1 (valid values are 1 through 5). The defaults (if you don't specify NDEL/DELL) lead to an invalid input. (This should be fixed...)
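The quadrature in the dA formula above can be sketched numerically. A minimal Python sketch (the function name and the fake <dV/dl> data are assumptions); with NDEL 1 and DELL 0.06667 the 15 points are evenly spaced across (0, 1) and the integral is approximated by an equal-weight sum:

```python
def ti_integral(dvdl_values, spacing):
    """Approximate dA = integral of <dV/dl> over lambda by an
    equal-weight sum over evenly spaced datapoints."""
    return spacing * sum(dvdl_values)

# Fake <dV/dl> data at the 15 lambda points l = 0.03333, 0.1, ..., 0.96667
points = [float(k) for k in range(15)]
dA = ti_integral(points, 0.06667)
```

Note that the quadrature weight per point shrinks from 0.2 to 0.06667 as the number of points triples; the perturbed points refine the grid but, as stressed above, add no new information if the underlying averages are not converged.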
(1) Please be advised (again) that the AVERage command is unsupported, and I would not be surprised if it does not work (anymore). Unless someone who understands this module better than I do maintains it, I recommend that it be removed.

(2) BLOCK now coexists with IMAGE "peacefully" and essentially transparently to the user. It works correctly for the case of a periodic water box (cf. the block3.inp testcase). I would, however, check carefully whether things really work before using it on something fancier like infinite alpha helices. Similarly, it is not clear to me whether things work with the CRYSTAL facility. If one modifies block3 to use CRYSTAL instead of IMAGE, things (seem to) work. On the other hand, I know that I didn't support XTLFRQ in the post-processing routines, as I don't understand its meaning. I'll fix things if someone is willing to help me with the bits and pieces I don't understand.

(3) Bond and bond angle terms (including Urey-Bradleys). Be advised that if you run a simulation at lambda = 0 or lambda = 1, you may effectively remove bond (and bond angle) terms as they get scaled by zero. In other words, you would have ghost particles that can move freely through your system, and this leads to all sorts of nasty side effects. Furthermore, this approach is not theoretically sound (S. Boresch & M. Karplus, unpublished). So, in general, avoid running at lambda = 0 and 1. If your bonds are constrained you're safe, as the constraints will keep things together (that won't take care of angles, however!). To avoid artifacts from noisy, diverging bond and bond angle contributions, throw them out during post-processing, e.g. by using the SKIP BOND ANGL UREY command before starting BLOCK post-processing. If you want to see what can go wrong, look at the block2 test case...
NIH/DCRT/Laboratory for Structural Biology
FDA/CBER/OVRR Biophysics Laboratory