Biology (Basel), ISSN 2079-7737, MDPI

PMID: 25811640; PMCID: PMC4498300; DOI: 10.3390/biology4020282; Article ID: biology-04-00282

Article

NPPD: A Protein-Protein Docking Scoring Function Based on Dyadic Differences in Networks of Hydrophobic and Hydrophilic Amino Acid Residues

Edward S. C. Shih and Ming-Jing Hwang *

Academic Editor: Thorsten Berg

Institute of Biomedical Sciences, Academia Sinica, Nankang, Taipei 115, Taiwan; E-Mail: shihds@gate.sinica.edu.tw

*Author to whom correspondence should be addressed; E-Mail: mjhwang@ibms.sinica.edu.tw; Tel.: +886-2-27899033; Fax: +886-2-27887641.

Received: 27 November 2014 / Accepted: 16 March 2015 / Published: 24 March 2015

Biology 2015, 4(2), 282–297

© 2015

This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

Protein-protein docking (PPD) predictions usually rely on the use of a scoring function to rank docking models generated by exhaustive sampling. To rank good models higher than bad ones, a large number of scoring functions have been developed and evaluated, but the methods used for the computation of PPD predictions remain largely unsatisfactory. Here, we report a network-based PPD scoring function, the NPPD, in which the network consists of two types of network nodes, one for hydrophobic and the other for hydrophilic amino acid residues, and the nodes are connected when the residues they represent are within a certain contact distance. We showed that network parameters that compute dyadic interactions and those that compute heterophilic interactions of the amino acid networks thus constructed allowed NPPD to perform well in a benchmark evaluation of 115 PPD scoring functions, most of which, unlike NPPD, are based on some sort of protein-protein interaction energy. We also showed that NPPD was highly complementary to these energy-based scoring functions, suggesting that the combined use of conventional scoring functions and NPPD might significantly improve the accuracy of current PPD predictions.

Keywords: protein-protein docking; scoring function; dyadicity; amino acid network

1. Introduction

Living cells are a crowded environment in which most proteins interact with other proteins to exert cellular functions. To understand how protein-protein interactions mediate cellular processes, scientists often need to describe the structures of protein complexes at the atomic level. However, due to the difficulty in determining the atomic structures of protein complexes using experimental methods, protein-protein docking (PPD), a computational approach, is often used to complement results from experimental studies [1].

Most methods for PPD predictions involve a two-step strategy, sampling and scoring. For sampling, numerous docking models, also referred to as docking poses or decoys, are often generated from a global search of all possible relative orientations of, and separations between, two proteins that are brought together to form a complex. These docking poses are then ranked by a scoring function. To evaluate the performance of a given scoring function on a set of protein complexes, the TopN success rate is usually employed, in which a “success” hit for a complex is defined as at least one of its top N docking poses, as ranked by the scoring function, satisfying a specified criterion for being a good (i.e., near-native) model. It follows that, for a given scoring function, a higher success rate (i.e., a higher number of correctly predicted complexes) can be obtained by computing the success rate at a larger N, since, for a given complex, more poses are considered and the probability that at least one of them is good is therefore higher. The objective when developing a PPD scoring function is, therefore, to rank good poses as high, and bad poses as low, as possible. However, despite significant progress in recent years, this is still an active area of research [2,3], as success rates remain low when small values of N are used (e.g., using a stringent criterion, Top1 and Top10 success rates are generally below 10% and 20%, respectively), unless dockings are guided by experimentally-derived data or information [4,5].
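The TopN success rate defined above can be illustrated with a minimal Python sketch. The function name, the data layout (one list of good/bad flags per complex, ordered from best-ranked to worst-ranked pose), and the toy data are assumptions made for illustration only; they are not taken from the benchmark.

```python
from typing import Dict, Sequence


def top_n_success_rate(ranked_labels: Dict[str, Sequence[bool]], n: int) -> float:
    """Fraction of complexes with at least one good pose among the top n ranked poses.

    ranked_labels maps a complex identifier to its poses' good/bad flags,
    ordered by the scoring function from best-ranked to worst-ranked.
    """
    if not ranked_labels:
        raise ValueError("at least one complex is required")
    hits = sum(1 for labels in ranked_labels.values() if any(labels[:n]))
    return hits / len(ranked_labels)


# Hypothetical toy data: True marks a near-native (good) pose.
toy = {
    "complex_A": [False, True, False, False],
    "complex_B": [False, False, False, True],
    "complex_C": [False, False, False, False],
}
rates = {n: top_n_success_rate(toy, n) for n in (1, 2, 4)}
```

In this toy case the rate can only stay the same or grow as N increases, which is the monotonic behaviour described in the paragraph above.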

Most PPD scoring functions use a set of mathematical equations to compute the energy resulting from the formation of the protein complex. To do so, many use molecular mechanics functions [6,7,8,9,10,11,12,13,14,15,16], while others use statistical mechanics methods to derive potentials from various sources, including experimentally-determined protein structures [8,10,17,18,19], docking decoys [6,20,21,22,23], homology models [24,25,26], or binding energy funnels [27,28]. Many non-energy-based PPD scoring functions have also been developed, including those that utilize bioinformatics-predicted information [29,30], shape complementarity [31,32], machine learning [33,34,35], coevolution [36], and amino acid networks (AANs) [37,38].

As described in the Experimental Section below, NPPD, the network-based PPD scoring function developed in this work, is based on AANs, which have also been referred to as residue contact networks [39], protein contact networks [40], protein structure networks [41], or residue interaction networks [42], although these networks may not be completely identical in terms of their construction (for reviews, see [39,40,41,43,44]). Owing to the appeal of network analysis in the era of post-genomics research, there has been an increase in the number of studies utilizing AANs to predict a protein’s functional sites [45,46,47], protein-protein [48,49,50,51] and protein-nucleic acid interaction [52,53], and to probe protein dynamics [42,54,55], folding [56,57,58] and structure [59,60,61,62,63]. Of these studies using AANs, two reports by Pons et al. [37] and Chang et al. [38] on PPD are directly relevant to the present work.

In AANs, the protein structure is modeled by a three-dimensional geometric network, with the amino acid residues (usually the Cα or Cβ atoms) being represented as network nodes and their contacts as network edges to capture the interactions between amino acids within the same protein structure and/or between two interacting proteins. Pons et al. [37] showed that network parameters, such as closeness and betweenness, can be used to suggest protein-protein interaction regions, and that an energy term that models this information can be added to an energy-based scoring function to improve PPD predictions. Chang et al. [38] used two networks for a single protein structure, one formed by hydrophobic residues and the other by hydrophilic residues, and analyzed the two networks from the same complex (docking pose) separately; their results again demonstrated that network properties can be used to assist conventional scoring functions to distinguish between good and bad PPD decoys.

Unlike Chang et al., in developing NPPD, we constructed only a single network for a single protein structure, allowing both the hydrophobic (H) and hydrophilic (i.e., polar, P) residue nodes to coexist in the same network. We were then able to investigate not only the effects of dyadicity calculated from the hydrophobic-hydrophobic (HH) and polar-polar (PP) interactions, but also the effects of heterophilicity calculated from the hydrophobic-polar (HP) interactions on the scoring of PPD poses. Benchmark evaluations showed that, using network parameters alone in all three methods, NPPD performed better than the network-assisted PPD predictions reported by Pons et al. [37] and Chang et al. [38], and that NPPD also performed well compared to most energy-based scoring functions. In addition, further analysis revealed significant complementarity between NPPD and the other scoring functions evaluated, demonstrating the merit of using a combination of NPPD and other types of scoring functions to further improve PPD predictions.

2. Experimental Section

Figure 1 outlines the procedures used to develop NPPD. Briefly, the interface residues of a given complex (i.e., docking pose) of protein A and protein B were determined, yielding the H and P nodes for the construction of the AANs for A and B. Eight parameters for each of the two networks were computed and served as attributes for training and testing a Bayesian network model using a PPD benchmark dataset. Note that, during the training of the Bayesian model, the complex context of all the poses was removed and each AAN was treated independently, although, during the machine learning, those that came from a good pose were used as positive incidences and those from a bad pose as negative incidences. Using the Bayesian model thus derived, NPPD can then score any given pose by multiplying together the Bayesian probabilities of the two AANs. This has the advantage of quickly eliminating most of the bad poses since it takes just one bad AAN (i.e., a low Bayesian probability) to produce a bad product (pose) of two AANs. Note that, as illustrated in Figure 1, our AAN was constructed on one side of the interface and did not extend to include contacts from the other side, because including inter-protein contacts did not improve the results [64], possibly owing to the fact that the connections of an inter-protein network can change significantly even by minor changes in the configuration of the docking pose. Still, it may be warranted for future studies to find a way to use inter-protein contacts productively in the Bayesian model.
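The pose-scoring step, in which the Bayesian probabilities of the two AANs are multiplied, can be sketched as follows. This is a minimal illustration with hypothetical probabilities and function names, not the authors' implementation; the probabilities would come from the trained Bayesian model.

```python
from typing import Dict, List, Tuple


def pose_score(prob_a: float, prob_b: float) -> float:
    """Score a pose as the product of the two per-network probabilities."""
    for prob in (prob_a, prob_b):
        if not 0.0 <= prob <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return prob_a * prob_b


def rank_poses(pose_probs: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return pose identifiers ordered from highest to lowest product score."""
    return sorted(pose_probs, key=lambda pid: pose_score(*pose_probs[pid]), reverse=True)


# Hypothetical probabilities for the AANs of protein A and protein B in three poses.
poses = {
    "pose_1": (0.90, 0.80),
    "pose_2": (0.95, 0.10),  # one poor AAN pulls the product down
    "pose_3": (0.60, 0.70),
}
ranking = rank_poses(poses)
```

Here pose_2 falls to the bottom of the ranking despite having the single highest probability, because one low factor is enough to produce a low product.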

Figure 1

Procedures used to develop NPPD. (a) An example of an amino acid network and the network parameters used in this study for a docking pose; (b) Flowchart of the training and testing of a Bayesian network model of NPPD.

2.1. Docking Datasets, Poses, and Quality Measures

The 176 protein complexes used in this study were retrieved from a PPD benchmark dataset of known atomic structures of complex component proteins in both the bound (complex) and unbound (free) form [65]. For each of the 176 complexes, two sets of docking poses from the unbound form were used to evaluate the performance of NPPD and compare it with those of several other PPD scoring functions. One set contained the top 54,000 poses for each of the 176 complexes generated by ZDOCK [66] and was downloaded from its website (http://zlab.umassmed.edu/zdock/decoys.shtml). The other set, kindly provided by the authors of a large-scale evaluation of 115 scoring functions [67], consisted of ~500 poses generated using SwarmDock [68] for each of a subset containing 118 complexes. The two sets came with their own quality measures for near-native poses, i.e., the so-called good poses. The measure used for the ZDOCK-generated set was an interface RMSD (IRMSD) < 2.5 Å, where IRMSD is the root mean square deviation of the interface residues’ Cα atoms from the experimentally determined structure of the bound complex, and an interface residue is defined as one having at least one heavy (non-hydrogen) atom within 5 Å of any heavy atom in the second protein of the complex. The measures used for the SwarmDock-generated set were the three quality levels of the CAPRI criteria [2]: acceptable, medium, and high.

2.2. Amino Acid Networks and Network Parameters

As described above, two AANs were constructed from the interface residues of two interacting proteins locked in a docking pose. In this work, the 20 amino acids were divided into two classes according to Eisenberg et al. [69], the H class consisting of Gly, Ile, Leu, Val, Phe, Met, Trp, Cys, Tyr, and Ala, and the P class consisting of Lys, Thr, Ser, Gln, Asn, Glu, Asp, Arg, His, and Pro. Our AANs, thus, contained two types of nodes, H and P, and a network edge was established to connect any two nodes (residues) if any heavy atom in one of the residues was within 5.0 Å of any heavy atom in the other (Figure 1a).
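The network construction described in this paragraph can be sketched in a few lines of Python. The input layout (a residue name plus a list of heavy-atom coordinates), the function names, and the toy coordinates are illustrative assumptions; whether a distance of exactly 5.0 Å counts as a contact is also an assumption.

```python
import math
from itertools import combinations
from typing import List, Sequence, Set, Tuple

Atom = Tuple[float, float, float]
Residue = Tuple[str, Sequence[Atom]]  # (three-letter residue name, heavy-atom coordinates)

# H and P classes as listed in the text.
H_CLASS = {"GLY", "ILE", "LEU", "VAL", "PHE", "MET", "TRP", "CYS", "TYR", "ALA"}
P_CLASS = {"LYS", "THR", "SER", "GLN", "ASN", "GLU", "ASP", "ARG", "HIS", "PRO"}


def node_type(resname: str) -> str:
    """Map a three-letter residue name to its node type, 'H' or 'P'."""
    name = resname.upper()
    if name in H_CLASS:
        return "H"
    if name in P_CLASS:
        return "P"
    raise ValueError(f"unknown residue name: {resname}")


def build_aan(residues: List[Residue], cutoff: float = 5.0) -> Tuple[List[str], Set[Tuple[int, int]]]:
    """Return node types and edges; two residues are linked if any heavy-atom pair is within cutoff."""
    types = [node_type(name) for name, _ in residues]
    edges = set()
    for i, j in combinations(range(len(residues)), 2):
        if any(math.dist(a, b) <= cutoff for a in residues[i][1] for b in residues[j][1]):
            edges.add((i, j))
    return types, edges


# Hypothetical toy interface with three residues placed along the x-axis (coordinates in Å).
toy_residues = [
    ("LEU", [(0.0, 0.0, 0.0)]),
    ("LYS", [(4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]),
    ("ASP", [(12.0, 0.0, 0.0)]),
]
toy_types, toy_edges = build_aan(toy_residues)
```

The all-pairs loop is quadratic in the number of atoms, which is acceptable for an interface of a few dozen residues; a spatial index would be a natural replacement for larger inputs.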

For each AAN, we computed two dyadicity parameters, D_p-p and D_h-h, and one heterophilicity parameter, H_p-h, which, following the work of Park and Barabasi [70], are defined as:

$$D_{pp} \equiv \frac{m_{pp}}{E(m_{pp})}, \qquad D_{hh} \equiv \frac{m_{hh}}{E(m_{hh})}, \qquad \text{and} \qquad H_{ph} \equiv \frac{m_{ph}}{E(m_{ph})} \tag{1}$$

where m_pp, m_hh, and m_ph are, respectively, the number of P-P, H-H, and P-H edges in the AAN, and the three denominators are the respective expected values of m_pp, m_hh, and m_ph, which can be computed as:

$$E(m_{pp}) = \frac{n_p (n_p - 1)}{2}\, p, \qquad E(m_{hh}) = \frac{n_h (n_h - 1)}{2}\, p, \qquad \text{and} \qquad E(m_{ph}) = n_p n_h\, p \tag{2}$$

where n_p is the number of P nodes, n_h the number of H nodes, and p = 2M/[N(N−1)] (M and N are the total number of edges and nodes, respectively) is the connectance, which represents the average probability that two nodes in a dyadic network are connected [71].
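As a worked illustration of Equations (1) and (2), the following Python sketch computes the counts and ratios for a small, hypothetical four-node network. The function name and the choice to return NaN when an expected value is zero are assumptions, since the text does not say how such degenerate networks were handled.

```python
from typing import Dict, Iterable, Sequence, Tuple


def dyadic_parameters(types: Sequence[str], edges: Iterable[Tuple[int, int]]) -> Dict[str, float]:
    """Compute n_p, n_h, m_pp, m_hh, m_ph and the ratios D_pp, D_hh, H_ph of Equations (1)-(2)."""
    if any(t not in ("H", "P") for t in types):
        raise ValueError("node types must be 'H' or 'P'")
    n = len(types)
    n_p = sum(1 for t in types if t == "P")
    n_h = n - n_p
    m_pp = m_hh = m_ph = 0
    for i, j in edges:
        pair = {types[i], types[j]}
        if pair == {"P"}:
            m_pp += 1
        elif pair == {"H"}:
            m_hh += 1
        else:
            m_ph += 1
    m = m_pp + m_hh + m_ph
    # Connectance p = 2M / [N(N - 1)].
    p = 2 * m / (n * (n - 1)) if n > 1 else 0.0

    def ratio(observed: int, expected: float) -> float:
        # Assumption: report NaN when the expected edge count is zero.
        return observed / expected if expected > 0 else float("nan")

    return {
        "n_p": n_p, "n_h": n_h, "m_pp": m_pp, "m_hh": m_hh, "m_ph": m_ph,
        "D_pp": ratio(m_pp, n_p * (n_p - 1) / 2 * p),
        "D_hh": ratio(m_hh, n_h * (n_h - 1) / 2 * p),
        "H_ph": ratio(m_ph, n_p * n_h * p),
    }


# Hypothetical network: nodes 0 and 1 are H, nodes 2 and 3 are P;
# one H-H edge, one P-P edge, and one H-P edge, so M = 3, N = 4, and p = 0.5.
params = dyadic_parameters(["H", "H", "P", "P"], [(0, 1), (2, 3), (1, 2)])
```

For this network E(m_pp) = E(m_hh) = 0.5 and E(m_ph) = 2, giving D_pp = D_hh = 2 (like nodes are linked more often than expected at random) and H_ph = 0.5 (unlike nodes less often).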

2.3. Bayesian Network

To infer whether two AANs would generate a near-correct docking pose, we employed the machine learning algorithm implemented in the Weka platform [72] to derive a Bayesian network model [73], which we then used to compute the probability for every AAN of being at the interface of a protein complex. We then computed the probability product of two AANs to give an estimate of the likelihood of the resulting docking pose being a good one (Figure 1b). The aforementioned 176 benchmark complexes and their 54,000 poses per complex generated by ZDOCK were used in a leave-one-out training and testing of the Bayesian model, i.e., each of the 176 complexes was, in turn, left out during training of the model on AANs randomly selected from poses of the remaining 175 complexes and was then used as a test case. As shown in Figure 1b, we randomly selected 27,000 AANs from good poses, irrespective of whether they came from the same complex or not, as positive incidences and an equal number of AANs from bad poses as negative incidences, and used the values of the 8 parameters of D_p-p, D_h-h, H_p-h, m_pp, m_hh, m_ph, n_p and n_h of the AANs as attributes for training. The training set-derived Bayesian model was then used to score poses of the left-out complex as a test of the model.
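The leave-one-out sampling of training AANs can be sketched as below. This is only an outline of the data split under stated assumptions: the data layout, the function name, and the fixed random seed are hypothetical, and the Bayesian network training itself (performed with Weka in this work) is not reproduced.

```python
import random
from typing import Dict, List, Sequence, Tuple

Features = Tuple[float, ...]
LabeledAAN = Tuple[Features, bool]  # (attribute vector, True if the AAN came from a good pose)


def balanced_training_set(
    aans_by_complex: Dict[str, Sequence[LabeledAAN]],
    held_out: str,
    n_per_class: int,
    seed: int = 0,
) -> Tuple[List[Features], List[Features]]:
    """Sample equal numbers of positive and negative AANs from all complexes except held_out."""
    rng = random.Random(seed)
    positives: List[Features] = []
    negatives: List[Features] = []
    for complex_id, aans in aans_by_complex.items():
        if complex_id == held_out:
            continue  # the left-out complex is reserved for testing
        for features, is_good in aans:
            (positives if is_good else negatives).append(features)
    # Assumption: if fewer AANs are available than requested, shrink both classes equally.
    k = min(n_per_class, len(positives), len(negatives))
    return rng.sample(positives, k), rng.sample(negatives, k)


# Hypothetical toy data with one attribute per AAN.
toy_aans = {
    "c1": [((1.0,), True), ((1.1,), False), ((1.2,), False)],
    "c2": [((2.0,), True), ((2.1,), False)],
    "c3": [((3.0,), True), ((3.1,), False)],
}
pos, neg = balanced_training_set(toy_aans, held_out="c1", n_per_class=2)
```

In the setting described above, n_per_class would be 27,000 and each attribute vector would hold the eight parameters of an AAN.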

3. Results and Discussion

3.1. Performance of NPPD and IRAD

The TopN success rates obtained using poses created and ranked by ZDOCK [66] and IRAD [74], a state-of-the-art PPD scoring function, have often been used as yardsticks to evaluate PPD scoring functions [3,4,5]. Both ZDOCK and IRAD use a multitude of scoring terms, such as shape complementarity, interface atomic contact energy, and electrostatics, and IRAD also uses both atom-based and residue-based potentials [66,74]. As can be seen in Figure 2, using the 54,000 poses created by ZDOCK for each of the 176 benchmark complexes, the Bayesian probabilities of NPPD produced worse Top1 and Top10 success rates than either ZDOCK or IRAD, but, as N increased, the success rates increased faster for NPPD than for ZDOCK or IRAD, with NPPD outperforming the other two when N > 100.

Figure 2

TopN success rates for NPPD, ZDOCK, and IRAD on the benchmark dataset of the unbound docking poses of 176 protein complexes. IRMSD < 2.5 Å was used to determine good (near-correct) poses. The success rates of ZDOCK and IRAD were obtained from the ZDOCK website (http://zlab.umassmed.edu/zdock/perf_decoys.shtml).

Despite the low success rates of NPPD at a low N, it is interesting that, as shown in Table 1, many of the complexes that NPPD succeeded at predicting were different from those predicted by IRAD and vice versa. The complementarity between the two methods, measured as the ratio of the method-unique successes divided by all successes and expressed as a percentage, was especially significant at low N, being as high as 86% for the Top1 success rate (only 3 out of 22 complexes were successfully predicted by both methods).

Table 1

Number of benchmark complexes successfully predicted by NPPD and/or IRAD at different TopN success rates.

Set	Top1	Top10	Top100	Top1000	Top2000
NPPD (A)	9	28	65	102	110
IRAD (B)	16	43	64	92	102
Intersection (A∩B)	3	15	44	80	95
Union (A∪B) = a	22	56	85	114	117
Unique to NPPD or IRAD (A⊖B) = b	19	41	41	34	22
Complementarity = b/a	86%	73%	48%	30%	19%

⊖ (Symmetric difference): the set of elements in either of the sets and not in their intersection.
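The complementarity values in Table 1 can be recomputed from the first three rows. The short Python sketch below does so; the function name is an illustrative assumption and the counts are copied from the table.

```python
def complementarity(n_a: int, n_b: int, n_both: int) -> float:
    """Ratio of method-unique successes (symmetric difference) to all successes (union)."""
    union = n_a + n_b - n_both
    if union == 0:
        raise ValueError("neither method succeeded on any complex")
    unique = union - n_both
    return unique / union


# (NPPD, IRAD, intersection) counts from Table 1.
table1 = {
    "Top1": (9, 16, 3),
    "Top10": (28, 43, 15),
    "Top100": (65, 64, 44),
    "Top1000": (102, 92, 80),
    "Top2000": (110, 102, 95),
}
percent = {name: round(100 * complementarity(*counts)) for name, counts in table1.items()}
```

The resulting percentages reproduce the last row of Table 1 (86%, 73%, 48%, 30%, and 19%).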

3.2. Comparison with Other Network-Based Methods

As mentioned in the Introduction, two other groups have used AANs to help score docking poses [37,38]. Table 2 compares our results with their reported success rates and shows that, using the same benchmark dataset and the same criterion for success hits, when the scoring was based on network parameters alone, NPPD produced better Top1 and Top10 success rates: e.g., the values for the Top10 success rate were 18.5% using NPPD versus 10.6% in Pons et al. [37] for the 176 complexes of the benchmark and 25.6% using NPPD versus 23.2% in Chang et al. [38] for a subset of 43 complexes. However, it should be noted that different sampling algorithms (FTDock [16], RosettaDock [75], and ZDOCK [66]) were used to generate the same number of poses for evaluation, which may have contributed to the differences in success rates obtained. Several aspects of the use of AANs were also different: (i) as mentioned earlier, our AAN was different from that of Pons et al. [37], which represents all amino acids by just one type of network node, and from that of Chang et al. [38], which, although, like ours, has both H and P nodes, creates two separate AANs for the two different types of nodes; (ii) as also mentioned earlier, unlike these two other networks, our AAN did not include inter-protein contacts; (iii) whereas we used dyadicity and heterophilicity parameters for scoring, the other two studies used more conventional network parameters, such as degree and clustering coefficient [38] and closeness and betweenness [37]; (iv) NPPD was used to score docking poses by itself, whereas the network-based scoring functions of the other two studies are additional terms that can be added to an existing scoring function to give a better result [37,38] (Table 2), and, if these results also apply to our method, incorporating NPPD into existing scoring functions should achieve significantly higher success rates.

Table 2

Conditions and Top1/Top10 success rates for NPPD and two other network-based scoring functions.

Conditions of docking poses	Pons et al. [37]	NPPD	Chang et al. [38]	NPPD
Benchmark complexes	176	176	43	43
Generation of docking poses	FTDock [16]	ZDOCK	RosettaDock 1.0 [75]	ZDOCK
Number of poses generated	10,000	10,000	1000	1000
Criterion for a success hit	L-RMSD < 10 Å	L-RMSD < 10 Å	L-RMSD < 5 Å	L-RMSD < 5 Å
Top1 success rate *	5.0% (7.0%)	8.0%	2.3% (25.6%)	11.6%
Top10 success rate *	10.6% (29.8%)	18.5%	23.2% (53.4%)	25.6%

* The values in parenthesis are success rates produced by combining the network parameters and the energy terms of the sampling method.

3.3. Performance of NPPD in a Comprehensive Evaluation of a Number of PPD Scoring Functions

Since many factors can affect the performance of PPD scoring functions, one example being the evaluation of docking poses produced by different sampling methods as mentioned above, it was important to evaluate NPPD further. Recently, a large-scale evaluation of 115 PPD scoring functions was reported [67], in which the authors ranked these scoring functions by comparing their Top1, Top10, and Top100 success rates on a set of docking poses produced by SwarmDock [68]. As shown in Figure 3a, using the same set of docking poses, the leave-one-out Bayesian model of NPPD produced TopN success rates comparable to those produced by the best performers of the 115 scoring functions evaluated (ranked 7th by Top10 success rate). Note that, with the exception of the first-ranked ZRANK2 method [12], an earlier version of IRAD, which perhaps stands out slightly from the others, the 20 top performers shown in Figure 3a were more or less equally good, as the absolute ranking depended on which success rate (Top1, Top10, or Top100) and which quality measure (acceptable, medium, or high) were used as the basis for ranking. Note also that, of these top performers, NPPD was the only one using network parameters (the scoring functions of Pons et al. [37] and Chang et al. [38] were not included in the 115 PPD scoring functions previously evaluated [67]).

Using the complementarity between two PPD scoring functions as defined in Table 1, i.e., the ratio of the number of complexes successfully predicted by either, but not both, of the two functions divided by the total number of successfully predicted complexes, the results, presented in Figure 3b, showed that the complementarity of NPPD with each of 16 other best performers was generally higher than the averaged complementarity exhibited by the other methods, especially in the case of the Top1 and Top10 success rates. Interestingly, although SPIDER [76], another AAN-based PPD scoring function, ranked only 38th of the 115 scoring functions evaluated [67], it is especially good at predicting complexes not detected by conventional scoring functions [67]. Unlike NPPD and the methods used by Pons et al. [37] and Chang et al. [38], SPIDER uses motifs of network structures, rather than network parameters, for scoring.

3.4. Some Limitations and Prospects

Without the ability to handle large conformational change induced by complex formation, PPD methods would perform badly for such complexes [2]. Indeed, both NPPD and IRAD failed to produce a Top100 success hit for those in the benchmark set with the largest unbound/bound IRMSDs, indicative of a significant change in conformation between the unbound and bound form of the complex (Figure 4). However, conformational change is not the only culprit for failures in PPD predictions. Figure 4 shows that if sampling could not produce a sufficient number, say 300, of positive (good) poses as defined by IRMSD < 2.5 Å (see Figure 1b) to score upon, the likelihood for either NPPD or IRAD to succeed was drastically decreased, even for complexes considered as “rigid” [65]. Further analysis indicated that some of these “rigid” complexes had a particularly small interface and hence might be difficult to sample and predict [77]. Since the best current scoring functions all performed similarly (Figure 3), we speculate that the same two factors, conformational change and insufficient sampling of good poses, also limit the success of other PPD methods. Note that while the sampling of good poses among different complexes was unbalanced, the distribution of the attributes used by NPPD was not (Figure 4), suggesting that sampling bias would not significantly affect training of the Bayesian model. While it is not entirely clear to us what gave rise to the apparently poor correlation between the number of good poses sampled and unbound/bound IRMSD as observed in Figure 4, it is notable that NPPD was better than IRAD for a few of those with the smallest unbound/bound IRMSDs and poor sampling, whereas IRAD did much better than NPPD for those ranked next in unbound/bound IRMSD (roughly between complex 1PPE and 2QFW in Figure 4), thereby contributing partly to the high complementarity between the two methods (Table 1). 
Taking all these results together, we conclude that, while combining different scoring functions is still likely to improve PPD performance significantly, the main barriers to overcome remain those arising from sampling and conformational change.

Figure 3

Benchmark results for NPPD and complementarity of NPPD and several best performing PPD scoring functions. (a) The 20 best performing PPD scoring functions ordered, from left to right, by increasing Top10 success rate. All data except those for NPPD were taken from [67]. Note that the Top1, Top10, and Top100 success rates for each method, shown, respectively, as the left, center, and right bar in each group, were computed using a set of unbound docking poses (~500 for each of 118 complexes) generated by SwarmDock [68], which was different from the set generated by ZDOCK used in Figure 2 and Table 1. The leave-one-out Bayesian model of NPPD was therefore derived using these SwarmDock poses, but otherwise using the same procedures described in Figure 1. The portions of success rates for high, medium, and acceptable quality poses are shown, respectively, in red, orange, and yellow, the criteria for the three quality measures being those used by CAPRI [2]; (b) Complementarity between NPPD and each of another 16 best performing PPD scoring functions. The blue, purple, and green bars indicate the complementarity, as defined in Table 1, computed based on, respectively, the Top1, Top10, or Top100 success rates. The horizontal blue, purple, and green lines are the averaged complementarity for, respectively, the Top1, Top10, or Top100 success rates for all pairs of the 16 scoring functions. Three of the 19 scoring functions compared with NPPD in (a), namely SIPPER, PYDOCK_TOT, and PROPNSTS, were not included because the data were not made available to us. References for these 19 PPD scoring functions can be found in Reference [67] and references therein.

In this work, instead of using the two-fold validation employed by Chang et al. [38], we opted for leave-one-out validation so that every complex of the benchmark set could serve as a test case and the performance of NPPD could be fully compared with that of other scoring functions. Technical differences aside, machine learning techniques are known to be unreliable for extrapolation, and only methods based on first-principles physics can truly predict and would not fail badly when encountering complexes with an unusual interface [78]. However, as such an ideal method is not yet in sight, there is room and merit to further develop empirical methods, such as NPPD, since a new method, particularly a nonconventional one, can often reveal shortfalls of existing methods.

Figure 4

Number of positive poses and D_p-p plotted against unbound/bound IRMSD. The 176 benchmark complexes of ZDOCK are ordered in increasing unbound/bound IRMSD, the best RMSD of interface residues superimposed between the unbound form and the bound form of the complex, with the PDB ID of every 5th complex indicated on the X-axis. The dashed line denotes a number of 300 positive poses. In the top half of the figure are the averages and standard deviations of the parameter D_p-p computed from the positive poses of each complex; all other attributes used by NPPD, and for negative poses, showed a similar random distribution [64].

4. Conclusions

In this work, we showed that a Bayesian model based on the dyadic parameters of AANs of docking poses performed well compared to the best scoring functions currently used for PPD predictions. Furthermore, the results showed that our method can complement other methods by finding good poses for a significant number of complexes missed by these methods. Taken together with the findings in a recent large-scale evaluation of 115 PPD scoring functions [67], these results suggest that non-conventional scoring functions, such as that developed in the present study, are worthy of further investigation in the effort to improve the prediction of protein complex structures.

Acknowledgments

We thank Fernández-Recio for providing the SwarmDock models. This work was supported by the Ministry of Science and Technology, Taiwan (grant nos. NSC97-2311-B-001-011-MY3 and NSC-97-2627-P-001-004). We thank Tom Barkas for English editing.

Author Contributions

Edward S.C. Shih and Ming-Jing Hwang conceived and designed the experiments, analyzed the data, and wrote the paper, while Edward S.C. Shih performed the experiments.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Mosca; Pons; Ceol; Valencia; Aloy. Towards a detailed atlas of protein-protein interactions. Curr. Opin. Struct. Biol. 2013, 23, 929–940. DOI: 10.1016/j.sbi.2013.07.005. PMID: 23896349.

2. Lensink, M.F.; Wodak, S.J. Docking, scoring, and affinity prediction in CAPRI. Proteins 2013, 81, 2082–2095. DOI: 10.1002/prot.24428. PMID: 24115211.

3. Moal, I.H.; Moretti; Baker; Fernandez-Recio. Scoring functions for protein-protein interactions. Curr. Opin. Struct. Biol. 2013, 23, 862–867. DOI: 10.1016/j.sbi.2013.06.017. PMID: 23871100.

4. Shih, E.S.C.; Hwang, M.J. A critical assessment of information-guided protein-protein docking predictions. Mol. Cell Proteomics 2013, 12, 679–686. DOI: 10.1074/mcp.M112.020198. PMID: 23242549.

5. Shih, E.S.C.; Hwang, M.J. On the use of distance constraints in protein-protein docking computations. Proteins Struct. Funct. Bioinform. 2012, 80, 194–205. DOI: 10.1002/prot.23179.

6. Viswanath; Ravikant, D.V.; Elber. Improving ranking of models for protein complexes with side chain modeling and atomic potentials. Proteins 2013, 81, 592–606. DOI: 10.1002/prot.24214. PMID: 23180599.

7. Pallara; Jimenez-Garcia; Perez-Cano; Romero-Durana; Solernou; Grosdidier; Pons; Moal, I.H.; Fernandez-Recio. Expanding the frontiers of protein-protein modeling: From docking and scoring to binding affinity predictions and other challenges. Proteins 2013, 81, 2192–2200. DOI: 10.1002/prot.24387. PMID: 23934865.

8. Pons; Talavera; de la Cruz; Orozco; Fernandez-Recio. Scoring by intermolecular pairwise propensities of exposed residues (SIPPER): A new efficient potential for protein-protein docking. J. Chem. Inf. Model. 2011, 51, 370–377. DOI: 10.1021/ci100353e. PMID: 21214199.

9. Mitra; Pal. Using correlated parameters for improved ranking of protein-protein docking decoys. J. Comput. Chem. 2011, 32, 787–796. DOI: 10.1002/jcc.21657. PMID: 20941737.

10. Tobi. Designing coarse grained-and atom based-potentials for protein-protein docking. BMC Struct. Biol. 2010, 10, 40. DOI: 10.1186/1472-6807-10-40. PMID: 21078143.

11. Demir-Kavuk; Krull; Chae, M.H.; Knapp, E.W. Predicting protein complex geometries with linear scoring functions. Genome Inform. 2010, 24, 21–30. PMID: 22081586.

12. Pierce; Weng. A combination of rescoring and refinement significantly improves protein docking performance. Proteins 2008, 72, 270–279. DOI: 10.1002/prot.21920. PMID: 18214977.

13. Andrusier; Nussinov; Wolfson, H.J. FireDock: Fast interaction refinement in molecular docking. Proteins 2007, 69, 139–159. DOI: 10.1002/prot.21495. PMID: 17598144.

14. Cheng, T.M.; Blundell, T.L.; Fernandez-Recio. pyDock: Electrostatics and desolvation for effective scoring of rigid-body protein-protein docking. Proteins 2007, 68, 503–515. DOI: 10.1002/prot.21419. PMID: 17444519.

15. Murphy; Gatchell, D.W.; Prasad, J.C.; Vajda. Combination of scoring functions improves discrimination in protein-protein docking. Proteins 2003, 53, 840–854. DOI: 10.1002/prot.10473. PMID: 14635126.

16. Gabb, H.A.; Jackson, R.M.; Sternberg, M.J. Modelling protein docking using shape complementarity, electrostatics and biochemical information. J. Mol. Biol. 1997, 272, 106–120. DOI: 10.1006/jmbi.1997.1203. PMID: 9299341.

17. Liu; Vakser, I.A. DECK: Distance and environment-dependent, coarse-grained, knowledge-based potentials for protein-protein docking. BMC Bioinform. 2011, 12, 280. DOI: 10.1186/1471-2105-12-280.

18. Skolnick. Development of unified statistical potentials describing protein-protein interactions. Biophys. J. 2003, 84, 1895–1901. DOI: 10.1016/S0006-3495(03)74997-2. PMID: 12609891.

19. Miyazawa; Jernigan, R.L. Self-consistent estimation of inter-residue protein contact energies based on an equilibrium mixture approximation of residues. Proteins 1999, 34, 49–68. DOI: 10.1002/(SICI)1097-0134(19990101)34:1<49::AID-PROT5>3.0.CO;2-L. PMID: 10336383.

20. Omori; Kitao. CyClus: A fast, comprehensive cylindrical interface approximation clustering/reranking method for rigid-body protein-protein docking decoys. Proteins 2013, 81, 1005–1016. DOI: 10.1002/prot.24252. PMID: 23344972.

21. Chuang, G.Y.; Kozakov; Brenke; Comeau, S.R.; Vajda. DARS (Decoys as the Reference State) potentials for protein-protein docking. Biophys. J. 2008, 95, 4217–4227. DOI: 10.1529/biophysj.108.135814. PMID: 18676649.

22. Muller; Sticht. A protein-specifically adapted scoring function for the reranking of docking solutions. Proteins 2007, 67, 98–111. DOI: 10.1002/prot.21310. PMID: 17243180.

23. Esmaielbeiki; Nebel, J.C. Scoring docking conformations using predicted protein interfaces. BMC Bioinform. 2014, 15, 171. DOI: 10.1186/1471-2105-15-171.

24. Anishchenko; Kundrotas, P.J.; Tuzikov, A.V.; Vakser, I.A. Protein models: The grand challenge of protein docking. Proteins 2014, 82, 278–287. DOI: 10.1002/prot.24385. PMID: 23934791.

25. Kundrotas, P.J.; Vakser, I.A. Global and local structural similarity in protein-protein complexes: Implications for template-based docking. Proteins 2013, 81, 2137–2142. DOI: 10.1002/prot.24392. PMID: 23946125.

26. Torchala; Moal, I.H.; Chaleil, R.A.; Agius; Bates, P.A. A Markov-chain model description of binding funnels to enhance the ranking of docked solutions. Proteins 2013, 81, 2143–2149. DOI: 10.1002/prot.24369. PMID: 23900714.

27. London; Schueler-Furman. Funnel hunting in a rough terrain: Learning and discriminating native energy funnels. Structure 2008, 16, 269–279. DOI: 10.1016/j.str.2007.11.013. PMID: 18275818.

28. Kozakov; Schueler-Furman; Vajda. Discrimination of near-native structures in protein-protein docking by testing the stability of local minima. Proteins 2008, 72, 993–1004. DOI: 10.1002/prot.21997. PMID: 18300245.

29. Schneidman-Duhovny; Rossi; Avila-Sakar; Kim, S.J.; Velazquez-Muriel; Strop; Liang; Krukenberg, K.A.; Liao; Kim, H.M. A method for integrative structure determination of protein-protein complexes. Bioinformatics 2012, 28, 3282–3289. DOI: 10.1093/bioinformatics/bts628. PMID: 23093611.

30. De Vries, S.J.; Bonvin, A.M. CPORT: A consensus interface predictor and its performance in prediction-driven docking with HADDOCK. PLOS ONE 2011, 6, e17695. DOI: 10.1371/journal.pone.0017695. PMID: 21464987.

31. Koehl; Hass; Amenta. Surface-histogram: A new shape descriptor for protein-protein docking. Proteins 2012, 80, 221–238. DOI: 10.1002/prot.23192. PMID: 22072544.

32. Shentu; al Hasan; Bystroff; Zaki, M.J. Context shapes: Efficient complementary shape matching for protein-protein docking. Proteins 2008, 70, 1056–1073. DOI: 10.1002/prot.21600. PMID: 17847098.

33. Fink; Hochrein; Wolowski; Merkl; Gronwald. PROCOS: Computational analysis of protein-protein complexes. J. Comput. Chem. 2011, 32, 2575–2586. DOI: 10.1002/jcc.21837. PMID: 21630291.

34. Bourquard; Bernauer; Aze; Poupon. A collaborative filtering approach for protein-protein docking scoring functions

PLOS ONE 2011 6 e18541

10.1371/journal.pone.0018541

21526112

35.

Chae

M.H.

Krull

Lorenzen

Knapp

E.W.

Predicting protein complex geometries with a neural network

Proteins 2010 78 1026 1039

10.1002/prot.22626

19938153

36.

Andreani

Faure

Guerois

InterEvScore: A novel coarse-grained interface scoring function using a multi-body statistical potential coupled to evolution

Bioinformatics 2013 29 1742 1749

10.1093/bioinformatics/btt260

23652426

37.

Pons

Glaser

Fernandez-Recio

Prediction of protein-binding areas by small-world residue networks and application to docking

BMC Bioinform. 2011 12 378

10.1186/1471-2105-12-378

38.

Chang

Jiao

C.H.

Gong

X.Q.

Chen

W.Z.

Wang

C.X.

Amino acid network and its scoring application in protein-protein docking

Biophys. Chem. 2008 134 111 118

10.1016/j.bpc.2007.12.005

18329160

39.

Zhang

Perica

Teichmann

S.A.

Evolution of protein structures and interactions from the perspective of residue contact networks

Curr. Opin. Struct. Biol. 2013 23 954 963

10.1016/j.sbi.2013.07.004

23890840

40.

Di Paola

de Ruvo

Paci

Santoni

Giuliani

Protein contact networks: An emerging paradigm in chemistry

Chem. Rev. 2013 113 1598 1613

10.1021/cr3002356

23186336

41.

Greene

L.H.

Protein structure networks

Brief. Funct. Genomics 2012 11 469 478

10.1093/bfgp/els039

23042823

42.

Giollo

Martin

A.J.

Walsh

Ferrari

Tosatto

S.C.

NeEMO: A method using residue interaction networks to improve prediction of protein stability upon mutation

BMC Genomics 2014 15 S7

10.1186/1471-2164-15-S4-S7

25057121

43.

Krishnan

Zbilut

J.P.

Tomita

Giuliani

Proteins as networks: Usefulness of graph theory in protein science

Curr. Protein Pept. Sci. 2008 9 28 38

10.2174/138920308783565705

18336321

44.

Yan

Zhou

Sun

Chen

Shen

The construction of an amino acid network for understanding protein structure and function

Amino Acids 2014 46 1419 1439

10.1007/s00726-014-1710-6

24623120

45.

Peng

Wang

Chen

Zhong

Zhang

Pan

Predicting protein functions by using unbalanced bi-random walk algorithm on protein-protein interaction network and functional interrelationship network

Curr. Protein Pept. Sci. 2014 15 529 539

10.2174/1389203715666140724085224

25059324

46.

Axe

J.M.

Yezdimer

E.M.

O’Rourke

K.F.

Kerstetter

N.E.

You

Chang

C.E.

Boehr

D.D.

Amino acid networks in a (β/α)₈ barrel enzyme change during catalytic turnover

J. Am. Chem. Soc. 2014 136 6818 6821

10.1021/ja501602t

24766576

47.

Lee

B.C.

Park

Kim

Analysis of the residue-residue coevolution network and the functionally important residues in proteins

Proteins 2008 72 863 872

10.1002/prot.21972

18275083

48.

Luo

Hamer

Reinert

Deane

C.M.

Local network patterns in protein-protein interfaces

PLOS ONE 2013 8 e57031

10.1371/journal.pone.0057031

23520460

49.

Johnson

M.E.

Hummer

Interface-resolved network of protein-protein interactions

PLOS Comput. Biol. 2013 9 e1003065

10.1371/journal.pcbi.1003065

23696724

50.

Goebels

Frishman

Prediction of protein interaction types based on sequence and network features

BMC Syst. Biol. 2013 7 S5

10.1186/1752-0509-7-S6-S5

24564924

51.

Del Sol

O’Meara

Small-world network approach to identify key residues in protein-protein interaction

Proteins 2005 58 672 682

10.1002/prot.20348

15617065

52.

Maetschke

S.R.

Yuan

Exploiting structural and topological information to improve prediction of RNA-protein binding sites

BMC Bioinform. 2009 10 341

10.1186/1471-2105-10-341

53.

Sathyapriya

Vijayabaskar

M.S.

Vishveshwara

Insights into protein-DNA interactions through structure network analysis

PLOS Comput. Biol. 2008 4 e1000170

10.1371/journal.pcbi.1000170

18773096

54.

Montiel Molina

H.M.

Millan-Pacheco

Pastor

del Rio

Computer-based screening of functional conformers of proteins

PLOS Comput. Biol. 2008 4 e1000009

10.1371/journal.pcbi.1000009

18463705

55.

Bode

Kovacs

I.A.

Szalay

M.S.

Palotai

Korcsmaros

Csermely

Network analysis of protein dynamics

FEBS Lett. 2007 581 2776 2782

10.1016/j.febslet.2007.05.021

17531981

56.

Wang

Identifying folding nucleus based on residue contact networks of proteins

Proteins 2008 71 1899 1907

10.1002/prot.21891

18175318

57.

Bagler

Sinha

Assortative mixing in protein contact networks and protein folding kinetics

Bioinformatics 2007 23 1760 1767

10.1093/bioinformatics/btm257

17519248

58.

Vendruscolo

Dokholyan

N.V.

Paci

Karplus

Small-world view of the amino acids that play a key role in protein folding

Phys. Rev. E 2002 65 061910

10.1103/PhysRevE.65.061910

59.

Bhattacharyya

Bhat

C.R.

Vishveshwara

An automated approach to network features of protein structure ensembles

Protein Sci. 2013 22 1399 1416

23934896

60.

Khor

Towards an integrated understanding of the structural characteristics of protein residue networks

Theory Biosci. 2012 131 61 75

10.1007/s12064-011-0135-y

21948188

61.

Estrada

Universality in protein residue networks

Biophys. J. 2010 98 890 900

10.1016/j.bpj.2009.11.017

20197043

62.

Brinda

K.V.

Vishveshwara

A network representation of protein structures: Implications for protein stability

Biophys. J. 2005 89 4159 4170

10.1529/biophysj.105.064485

16150969

63.

Bagler

Sinha

Network properties of protein structures

Phys. A 2005 346 27 33

10.1016/j.physa.2004.08.046

64.

Shih

E.S.C.

Hwang

M.-J.

Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan

Unpublished data 2015

65.

Hwang

Vreven

Janin

Weng

Protein-protein docking benchmark version 4.0

Proteins 2010 78 3111 3114

10.1002/prot.22830

20806234

66.

Pierce

B.G.

Hourai

Weng

Z.P.

Accelerating protein docking in ZDOCK using an advanced 3D convolution library

PLOS ONE 2011 6 e24657

10.1371/journal.pone.0024657

21949741

67.

Moal

I.H.

Torchala

Bates

P.A.

Fernandez-Recio

The scoring of poses in protein-protein docking: Current capabilities and future directions

BMC Bioinform. 2013 14 286

10.1186/1471-2105-14-286

68.

Torchala

Bates

P.A.

Predicting the structure of protein-protein complexes using the SwarmDock Web Server

Methods Mol. Biol. 2014 1137 181 197

24573482

69.

Eisenberg

Weiss

R.M.

Terwilliger

T.C.

Wilcox

Hydrophobic moments and protein structure

Faraday Symp. Chem. Soc. 1982 17 109 120

10.1039/fs9821700109

70.

Park

Barabasi

A.L.

Distribution of node characteristics in complex networks

Proc. Natl. Acad. Sci. USA 2007 104 17916 17920

10.1073/pnas.0705081104

17989231

71.

Fienberg

S.E.

Meyer

M.M.

Wasserman

S.S.

Statistical analysis of multiple sociometric relations

J. Am. Stat. Assoc. 1985 80 51 67

10.1080/01621459.1985.10477129

72.

Hall

Frank

Holmes

Pfahringer

Reutemann

Witten

I.H.

The WEKA data mining software: An update

SIGKDD Explor. Newsl. 2009 11 10 18

10.1145/1656274.1656278

73.

Needham

C.J.

Bradford

J.R.

Bulpitt

A.J.

Westhead

D.R.

Inference in Bayesian networks

Nat. Biotechnol. 2006 24 51 53

10.1038/nbt0106-51

16404397

74.

Vreven

Hwang

Weng

Integrating atom-based and residue-based scoring functions for protein-protein docking

Protein Sci. 2011 20 1576 1586

10.1002/pro.687

21739500

75.

Gray

J.J.

Moughon

Wang

Schueler-Furman

Kuhlman

Rohl

C.A.

Baker

Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations

J. Mol. Biol. 2003 331 281 299

10.1016/S0022-2836(03)00670-3

12875852

76.

Khashan

Zheng

Tropsha

Scoring protein interaction decoys using exposed residues (SPIDER): A novel multibody interaction scoring function based on frequent geometric patterns of interfacial residues

Proteins 2012 80 2207 2217

10.1002/prot.24110

22581643

77.

Ritchie

D.W.

Kozakov

Vajda

Accelerating and focusing protein-protein docking correlations using multi-dimensional rotational FFT generating functions

Bioinformatics 2008 24 1865 1873

10.1093/bioinformatics/btn334

18591193

78.

Moreira

I.S.

Martins

J.M.

Coimbra

J.T.

Ramos

M.J.

Fernandes

P.A.

A new scoring function for protein-protein docking that identifies native structures with unprecedented accuracy

Phys. Chem. Chem. Phys. 2015 17 2378 2387

10.1039/C4CP04688A

25490550