Tool from metals' design could aid protein engineering
A tool normally used to improve stainless steel and other metal alloys has now found application to a decidedly non-metallic substance: protein.
Just as they do with inorganic materials, scientists design proteins for enhanced properties, such as survival at high temperatures or tighter binding to surfaces. But doing so involves sorting through the nearly endless possible ways to rearrange a protein's components, called amino acids, making the task extremely time- and computer-intensive.
By applying a computational technique for alloy design called cluster expansion, a team of University of Wisconsin-Madison and Massachusetts Institute of Technology researchers has now been able to search through potential amino acid configurations up to 100 million times faster than with conventional techniques. The result, reported in the September 30 issue of Physical Review Letters, could potentially benefit many fields where revamped proteins with superior properties could make an impact, including medicine and biotechnology.
"Cluster expansion breaks the design problem into meaningful pieces that you can get your brain and your computer around," says UW-Madison professor of materials science and engineering Dane Morgan, who initiated and led the project as a postdoctoral researcher at MIT. "If you can cast a problem into this framework, you can really accelerate how quickly you can look through the possibilities."
Funded by the National Institutes of Health and the Dupont-MIT Alliance, the team includes MIT professor of materials science and engineering Gerbrand Ceder, MIT assistant professor of biology Amy Keating, MIT graduate students Fei Zhou (physics) and Gevorg Grigoryan (biology), and Dupont scientist Steve Lustig.
The similarities between alloy and protein design first struck Morgan as he attended an MIT computational biology course taught by Keating and others.
"In an alloyed material, there are a number of different elements, such as nickel, chromium and iron in stainless steel, which are arranged on a lattice of sites," says Morgan. "It's the same thing with a protein: You have different amino acids occupying various sites in the protein backbone." The trick is determining exactly how to reshuffle these components to enhance the properties of the larger structure, whether metal or biological molecule.
Each protein's function depends on its unique three-dimensional structure, which, in turn, rests on the molecule's specific linear chain, or sequence, of 20 different amino acids. In protein design, scientists start with a protein of known sequence, structure, and function — such as an industrial enzyme that breaks up grime or one that binds and collects certain particles. They then hunt for modified amino acid arrangements that augment the molecule's natural function.
Beginning with an existing protein structure does reduce the possible ways to rearrange the amino acids, but the numbers are still mind-boggling. "For a sequence of just 100 different amino acids, you have 20^100 possible combinations," says Morgan. "My guess is this is more than the number of atoms in the universe. The numbers get really big, really fast."
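To put that figure in perspective, the short Python sketch below computes 20 raised to the 100th power exactly and compares it with a rough estimate of the number of atoms in the observable universe. The estimate of 10^80 atoms is a commonly quoted order of magnitude and is an assumption here, not a number from the study.

```python
# Size of the sequence space for a chain of 100 positions,
# each of which can hold any of the 20 amino acids.
NUM_AMINO_ACIDS = 20
SEQUENCE_LENGTH = 100

# Python integers have arbitrary precision, so this value is exact.
num_sequences = NUM_AMINO_ACIDS ** SEQUENCE_LENGTH
num_digits = len(str(num_sequences))

# Assumption: a commonly quoted rough estimate, not a figure from the article.
ATOMS_IN_UNIVERSE_ESTIMATE = 10 ** 80

exceeds_atom_estimate = num_sequences > ATOMS_IN_UNIVERSE_ESTIMATE

print(f"20^100 has {num_digits} decimal digits")
print(f"Larger than the assumed atom count: {exceeds_atom_estimate}")
```

Under that assumption, the sequence space is larger than the atom count by roughly 50 orders of magnitude, which is consistent with Morgan's guess.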
To demonstrate the ability of cluster expansion to manage this complexity, the team focused on protein stability. Highly stable proteins fold into tight three-dimensional structures that stand up well to heat, acidity, and other harsh conditions; less stable ones tend to fall apart. Because a protein's stability relates to its energy in the folded, 3-D state, scientists can calculate an energy term to predict whether a particular amino acid sequence will adopt a robust structure.
The key is to reduce the time required for each computation. "Even if it only takes a second to calculate the energy of a given amino acid sequence, if you're trying to look through billions of sequences, a second is too long," says Morgan.
Cluster expansion breaks an amino acid sequence into small sub-clusters, consisting of two, three, or more amino acids. An energy term is then determined for every possible amino acid sequence within each sub-cluster. For example, says Morgan, the method might compute an energy for two alanines sitting next to one another, then the energy of an alanine and a lysine, two alanines separated by a lysine, and so on. Once the energies of each sub-cluster are known, they can be quickly added to give the energy of the entire sequence.
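The idea of adding up precomputed sub-cluster energies can be illustrated with a toy Python sketch. Everything in it is hypothetical: the three-site "protein," the two-letter alphabet (A for alanine, K for lysine), the energy values, and the names CLUSTER_ENERGIES and sequence_energy are invented for illustration and are not taken from the team's work. In a real application the table entries would be fitted to energies from a slower, atom-level calculation.

```python
from itertools import product
from typing import Dict, Tuple

# Hypothetical effective energies (arbitrary units) for a 3-site toy protein.
# Each key is a cluster of site indices; each value maps the residues found
# at those sites to that cluster's energy contribution.
ClusterTable = Dict[Tuple[int, ...], Dict[str, float]]

CLUSTER_ENERGIES: ClusterTable = {
    # Single-site terms
    (0,): {"A": -1.0, "K": -0.5},
    (1,): {"A": -0.8, "K": -1.2},
    (2,): {"A": -0.6, "K": -0.9},
    # Pair terms for neighboring sites
    (0, 1): {"AA": -0.3, "AK": 0.2, "KA": 0.1, "KK": 0.4},
    (1, 2): {"AA": -0.2, "AK": 0.0, "KA": 0.3, "KK": 0.5},
    # Triplet term: two alanines separated by a lysine.
    # Residue combinations missing from a table contribute zero.
    (0, 1, 2): {"AKA": -0.4},
}


def sequence_energy(sequence: str, table: ClusterTable = CLUSTER_ENERGIES) -> float:
    """Sum the tabulated contribution of every cluster for this sequence."""
    total = 0.0
    for sites, energies in table.items():
        residues = "".join(sequence[i] for i in sites)
        total += energies.get(residues, 0.0)
    return total


# Scoring a sequence is just a handful of table lookups, so scanning
# every candidate (8 of them in this toy case) is cheap.
candidates = ["".join(p) for p in product("AK", repeat=3)]
best = min(candidates, key=sequence_energy)

print(f"Energy of AKA: {sequence_energy('AKA'):.2f}")
print(f"Lowest-energy sequence: {best} ({sequence_energy(best):.2f})")
```

Because each evaluation involves only lookups and additions, the cost per sequence does not depend on modeling individual atoms, which is where the speed of the approach comes from.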
When the team applied the method to two well-known proteins of distinct structure, they found it calculated amino acid sequence energies that matched well with those computed by an established technique. The difference was speed: the traditional technique needed more than three minutes for each energy calculation, while cluster expansion took just a microsecond, an improvement that could allow scientists to search through a much wider universe of sequences.
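The scale of that difference is easy to check with back-of-the-envelope arithmetic, sketched below in Python. The per-calculation times are the ones quoted above, with three minutes treated as a lower bound. The search size of one billion sequences is an illustrative assumption, not a figure from the study.

```python
# Per-sequence timings quoted in the article.
CONVENTIONAL_SECONDS = 3 * 60       # "more than three minutes", so a lower bound
CLUSTER_EXPANSION_SECONDS = 1e-6    # one microsecond

# Assumption: an illustrative search size, not a number from the study.
NUM_SEQUENCES = 1_000_000_000

SECONDS_PER_YEAR = 365.25 * 24 * 3600

speedup = CONVENTIONAL_SECONDS / CLUSTER_EXPANSION_SECONDS
conventional_years = NUM_SEQUENCES * CONVENTIONAL_SECONDS / SECONDS_PER_YEAR
cluster_minutes = NUM_SEQUENCES * CLUSTER_EXPANSION_SECONDS / 60

print(f"Speedup factor: about {speedup:.1e}")
print(f"Conventional search: about {conventional_years:,.0f} years")
print(f"Cluster expansion search: about {cluster_minutes:.0f} minutes")
```

With these inputs the speedup comes out at roughly 180 million, the same order of magnitude as the 100-million-fold figure reported for the method, and a billion-sequence search shrinks from thousands of years to under 20 minutes.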
Cluster expansion is so much faster, says Morgan, because instead of factoring in all the interactions taking place between the atoms that make up amino acids, as conventional methods do, it calculates energy strictly in terms of the amino acid sequence itself.
"Essentially, we've clumped all the atomic interactions together into an effective interaction between the amino acids and we use that to think about the energies," says Morgan. "You give me the amino acid sequence, I give you the energy directly, with no reference to atoms in between."
As he thinks about the potential contribution of cluster expansion to biology, Morgan is optimistic but also a bit daunted.
"Sequences play a fundamental roll in biology, and sequence-related computational tools are already incredibly ingenious and elegant. However, because the cluster expansion has been such a powerful tool for thinking about sequences (of elements) in alloys, I'm hoping it will also become a powerful tool for thinking about sequences in biology."