How is a molecule built? Rather, what exactly is required to predict the electronic structure of a molecule using modern tools?
I will use this post to talk about my time spent learning how to apply the classic quantum mechanical calculation of Hartree-Fock (note, this is plain old Hartree-Fock rather than multi-configuration Hartree-Fock or something newer that gives more accurate results). I’ve spoken some about my learning of this theory in a previous post. Since writing that other post, I’ve passed through numerous travails and learned quite a lot more about the process of ab initio molecular calculation.
My original goal was several-fold. I decided that I wanted a structural tool that, at the very least, would allow me access to some new ways of looking at things in my own research. I chose it as a project to help me acquire some skill in a computer programming language. Finally, I also chose to pursue it because it turned out to be a very interesting question.
With several months of effort behind me, I know now several things. First, I do think it’s an interesting tool which will give new insight into my line of research, provided I access the tool correctly. Second, I think I was incredibly naive in my approach: the art and science of ab initio calculation is a much bigger project than can bear high quality fruit in the hands of one overly ambitious individual. It was a labor of years for a lot of people and the time spent getting around my deficits in programming are doubly penalized by the sheer scope of the project. My little program will never produce a calculation at a modern level! Third, I chose Python for my programming language for its ease of availability and ubiquity, but I think a better version of the self-consistent field theory would be written in C or Fortran. Without having this be my full-time job, which it isn’t, I doubt there’s any hope of migrating my efforts to a language better suited to the task. For any other intrepid explorers seeking to tread this ground in the future, I would recommend asking yourself how pressing your needs are: you will never catch up with Gaussian or GAMESS or any of the legion of other professionally designed programs intended to perform ab initio quantum mechanics.
Still, I did get somewhere.
The study of Hartree-Fock is a parallel examination of Quantum Mechanics and the general history of how computers and science have become entangled. You cannot perform Hartree-Fock by hand; it is so huge and so involved that a computer is needed to hold it together. I talked about the scope of the calculation previously and what I said before still holds. It cannot be done by hand. That said, the physics were still worked out mostly by hand.
I would say that part of the story started almost 90 years ago. Linus Pauling wrote a series of papers connecting the then newly devised quantum mechanics of Schrodinger and his ilk to the puzzle of molecular structure. Pauling took hydrogenic atomic orbitals and used linear combinations of these assemblies to come up with geometric arrangements for molecules like water and methane and benzene. A sigma-orbital is the product of two atomic orbitals placed side-by-side with an overlap and then adjusted energy optimization to pick the right distance. A pi-orbital is the same, but with two p-orbitals placed side-by-side and turned so that they lie parallel to one another.
Much of Pauling’s insights now form the backbone of what you learn in Organic Chemistry. The geometry of molecules as taught in that class came out of these years of development and Pauling’s spell of ground-breaking papers from that time will have you doing a double-take regarding exactly how much impact his work had on chemistry. Still, for the work of the 1930s by Pauling and his peers, they only had approximations, with limited accuracy for the geometry and no real ability to calculate spectra.
Hartree-Fock came together gradually. C.S.S. Roothaan published what are now called the Roothaan equations, which constitute the core of more modern closed-shell Hartree-Fock, in 1951. Nearly simultaneously, Frank Boys published a treatment of gaussian functions, showing that all needed integrals for molecular overlap could be calculated in closed form with the gaussian function family, something not possible with the Slater functions that were to that point being used in place of the hydrogenic functions. Hydrogenic functions do show one exact case of what these wave functions actually look like, but they are basically impossible to calculate for any atom except hydrogen and pretty much impossible to adapt to broader use. Slater functions took over in place of the exact hydrogenic functions because they were easier to use as approximations. Gaussian functions then took over from Slater functions because they are easier to use still and much easier to put into computers, a development largely kicked off by Boys. There is a whole host of names that stick out in the literature after that, including John Pople who duly won a Nobel prize in 1998 for his work leading to the creation of Gaussian, which to this day is a dominant force in molecular ab initio calculation (and will do everything you could imagine needing do as a chemist if you’ve got like $1,000 to afford the academic program license… or $30,000 if you’re commercial).
The depth of this field set me to thinking. Sitting here in the modern day, I am reminded slightly of Walder Frey and the Frey brethren in the Game of Thrones. This may seem an unsightly and perhaps unflattering comparison, but stick with me for a moment. In the Game of Thrones, the Freys own a castle which doubles as a bridge to span the waters of the Green Fork in the lands of Riverrun. The Frey castle is the only ford for miles and if you want to cut time on your trade (or marching your army), you have no choice but to deal with the Freys. They can charge whatever price they like for the service of providing a means of commerce –or, as the case may be, war– and if you don’t go with them, you have to go the long way around. Programs like Gaussian (and GAMESS, though it is basically protected freeware), are a bridge across a nearly uncrossable river. They have such a depth of provenance in the scientific service that they provide that you are literally up a creek if you try to go the long way around. This is something I’ve been learning the hard way. In truth, there are many more programs out there which can do these calculations, but they are not necessarily cheap, or -conversely- stable.
I think this feature is interesting on its own. There is a big gap between the Quantum Mechanics which everybody knows about, which began in the 1920s, and what can be done now. The people writing the textbooks now in many cases came to their own in an environment where the deepest parts of ab initio was mainly already solved. Two of the textbooks I delved into, the one by Szabo and Ostlund, and work by Helgaker, clearly show experts who are deeply knowledgeable of the field, but have characteristics suggesting that these authors themselves have never actually been able to cross the river between classical quantum mechanics and modern quantum chemistry fully unaided (Szabo and Ostlund never give theory that can handle more than gaussian S-orbitals, where what they give is merely a nod to Boys, while Helgaker is given to quoting as recently as 2010 from a paper that, to the best of my ability, actually gives faulty theory pending some deep epistimologic insight guarded by the cloistered brotherhood of Quantum Chemists). The workings hidden within the bridge of the Freys is rather impenetrable. The effort of going from doing toy calculations as seen in Linus Pauling’s already difficult work to doing modern real calculations is genuinely a herculean effort. Some of the modern textbooks cost hundreds of dollars and are still incomplete stories on how to get from here to there. Note, this only for gaining the underpinnings of Hartree-Fock, a flawed technique itself without including Configuration Interaction or other more modern adjustments and those even short shrift if you don’t have ways of dealing with the complexities of the boundary conditions.
Several times in the past couple months, I’ve been wishing for Aria Stark’s help.
I will break this story up into sections.
The core of Hartree-Fock is perhaps as good a place to start as any. Everybody knows about the Schrodinger equation. If you’ve gone through physical chemistry, you may have cursed at it a couple times as you struggled to learn how to do the Particle-in-a-Box toy problem. You may be a physicist and have solved the hydrogen atom, or seen Heisenberg’s way of deriving spherical harmonics and might be aware that more than just Schrodinger was responsible for quantum mechanics.
Sadly, I would say you basically haven’t seen anything.
As the Egyptian Book of the Dead claims that Death is only a beginning, Schrodinger’s equation is but the surface of Quantum Mechanics. I will pick up our story in this unlikely place by pointing out that Schrodinger’s equation was put through the grinder in the 1930s and 1940s in order to spit out a ton of insight involving molecular symmetry and a lot of other thoughts about group symmetry and representation. Hartree and Fock had already spent time on their variational methods and systematic techniques for a combined method called Hartree-Fock began to emerge by the 1950s. Heck, an atomic bomb came out of that era. The variant of Schrodinger’s equation where I pick up my story is a little ditty now called the Roothaan equation.
It definitely doesn’t look like the Schrodinger equation. In fact, it looks almost as small and simple as E=mc^2 or F = ma, but that’s actually somewhat superficial. I won’t go terribly deeply into the derivation of this math because it would balloon what will already be a long post into a nightmare post. My initial brush with this form of the Roothaan equation came from Szabo and Ostlund, but I’ve since gone and tracked down Roothaan’s original paper… only to find that Szabo and Ostlund’s notation, which I found to be quite elegant, is actually almost directly Roothaan’s notation. Roothaan’s purpose seems to have been collecting prior insight regarding Hartree-Fock into a systematic method.
This equation emerges from taking the Schrodinger equation and expanding it into a many-body system where the Hamiltonian has been applied onto a wave equation that preserves electron fermion exchange anti-symmetry –literally a Slater determinant wave function, where you may have like 200! terms or more. ‘F’, ‘S’ and ‘C’ are all usually square matrices.
‘F’ is called the Fock matrix and it contains all of the terms of the Hamiltonian. Generally speaking, once the equation is in this form, you’re actually out in numerical calculation land and no mathematical dongles remain in the matrix. The matrix contains only numbers. The Fock matrix is a square matrix which is symmetric, meaning that the terms above and below the diagonal equal each other, which is an outcome of quantum mechanics using hermitian operators. To construct the Fock matrix, you’ve already done a ton of integrals and a huge amount of addition and subtraction on terms that look sort of like pieces of the Schrodinger equation. You can think of the Fock matrix as being a version of the Hamiltonian. Within the Fock matrix are terms referred to as the Core Hamiltonian, which looks like the Schrodinger Hamiltonian, and the ‘G’ matrix, which is a sum of electron repulsion and electron exchange terms, which only occur when you’ve expanded the Schrodinger equation to a many body system. The Fock matrix is usually symmetric rather than just hermitian because the Roothaan equations assume that every molecular orbital is closed… that is, every orbital has one spin up and one spin down electron which are degenerate and indistinguishable. The eigenstates are therefore spatial equations instead of spin-orbitals where spin was integrated out.
‘C’ is a way to represent the eigenstates of the Hamiltonian. Note, I did not say that ‘C’ is a wave function because these wave functions are actually impossible to write down (how many is ~200! or 400! or more terms?) ‘C’ is a representation of a way to write down the eigenstates that you might use to construct a wave function in the space of the Fock matrix. It actually isn’t even the eigenstates directly, but the coefficients for a basis set that could be used to represent the eigenstates you desire. ‘C’ is square unitary matrix, meaning that multiplying it by the transpose of itself produces identity. The eigenstates contained by ‘C’ are orbitals that are associated with the Hamiltonian in the form of the Fock matrix.
‘S’ is called the “overlap matrix.” The overlap matrix is a symmetric matrix that is constructed by use of the basis set. As you may have read in my other post on this subject, the basis set may be a bunch of gaussian functions or a bunch of slater functions or a bunch of some other miscellaneous basis set that you would use to represent the system at hand. The overlap matrix is introduced because, mathematically, whatever basis you chose may be composed of functions that are not orthogonal to one another. Gaussian basis functions are useful, but they are not orthogonal. The purpose of the overlap matrix then is to work through the calculus necessary to construct orthogonal combinations of the basis functions. For a basis set that is not orthogonal you need some way to account for the non-orthogonality.
The form of the Roothaan equation written above is an adapted form for an eigenvalue equation where ε is the eigenvalue. In the case of molecular orbitals, this eigenvalue is an energy that is called the orbital energy. These eigenstates are non-orthogonal as accommodated by the ‘S’ matrix, where the eigenvalues are distributed across combinations of basis functions, as expressed in ‘C’, that are orthogonal to each other.
What makes this equation truly a monstrosity is that the Fock matrix is dependent itself on the ‘C’ matrix. The way this dependence appears is that the integrals which are used to construct the Fock matrix are calculated from the values of the ‘C’ matrix. The Roothaan equation is a sort of feedback loop: the ‘C’ matrix is calculated from working an eigenvalue equation involving ‘F’ and ‘S’ to find ε, where ‘C’ is then used to calculate ‘F’. In practice, this operates as an iteration: you guess at a starting Fock matrix and calculate a ‘C’ matrix, which is then used to calculate a new Fock matrix, from which you calculate a new ‘C’ matrix. The hope is that eventually the new ‘C’ matrix you calculate during each cycle of calculation converges to a constant value.
Oroboros eating its own tail.
This is the spine of Hartree-Fock: you’re looking for a convergence to give constant output values of ‘C’ and ‘F’. As I articulated poorly in my previous attempt at this topic, this is the self-consistent electron field. Electrons occupy some combination of the molecular orbitals expressed by ‘C’ at the energies of ε, forming a electrostatic force field that governs the form of ‘F’, from which ‘C’ is the only acceptable solution. ‘C’ is used to calculated the values of the hamiltonian inside ‘F’, where the integrals are the repulsions of electrons in the ‘C’ orbitals against each other or attractions of those electrons toward nuclei, giving the kinetic and potential energies that you usually expect in a hamiltonian.
Here is the scale of the calculation: a minimal basis set for methane (four hydrogens and a carbon) is 9 basis functions. The simplest, most basic basis set in common use is Pople’s STO-3G basis, which creates orbitals from sums of gaussian functions (called a contraction)… “3G” meaning three gaussians to a single orbital. One overlap integral between two functions therefore involves nine integrals. Generation of the 9 x 9 S-matrix mentioned above then involves 9*9*9 integrals, 729 integrals. Kinetic energy and nuclear attraction terms would each involve another 729 (3*729 = 2187 integrals) which can be shortened by the fact that the matrices are symmetric, so that only a few more than half actually need to be calculated. The electron-electron interactions, including repulsions and exchanges are a larger number still: one quarter of 9^4*9 or ~14,700 integrals (symmetry allows you to avoid the full 9^4 where basically the whole matrix must influence each matrix element, giving a square of a square). Roughly 17,000 integration operations for a molecule of only 5 atoms using the least expensive form of basis set.
The only way to do this calculation is by computer. Literally thousands of calculations go into making ‘F’ and then hundreds more to create ‘C’ and this needs to be done repeatedly. It’s all an intractably huge amount of busy work that begs for automation.
Solving Big Eigenvalue Problems
One big problem I faced in dealing with the Roothaan equations was trying to understand how to solve big eigenvalue problems.
Most of my experience with calculating eigenvalues has been while working analytically by hand. You may remember this sort of problem from your linear algebra class: you basically set up a characteristic equation by setting the determinant of the matrix equal to zero after having subtracted a dummy variable for the eigenvalue from the diagonal and then solving that characteristic equation. It’s kind of complicated by itself and depends on the eigenvalues being separable. A 3 x 3 matrix produces a cubic equation –which you hope to God is separable because nobody ever wants to do more than the quadratic equation. If it isn’t separable, you are up a creek without a paddle even at just 3 x 3.
For the example of the methane minimal basis set, the resulting matrices of the Roothaan equation are 9 x 9.
This is past where you can go by hand. Ideally, one would prefer to not be confined to molecules as small as molecular hydrogen, so you need some method of calculating eigenvalues that can be scaled and –preferably– automated.
This was actually where I started trying to write my program. Since I didn’t know at the time whether I would be able to build the matrix tools necessary to approach the math, I used solving the eigenvalue problem as my barometer for whether or not I should continue. If I couldn’t do even this, there would be no way to approach the Roothaan equations.
The first technique I figured out was a technique called Power Iteration. At the time, this seemed like a fairly straight-forward, accessible method to pull eigenvalues from a big matrix.
To perform power iteration, all you do is operate a square matrix onto a vector, normalize the resulting vector, then act the matrix again on that new vector. If you do this 10,000 times, you will eventually find a point where the resulting vector is just the initial vector times some constant factor. The constant ends up being the biggest eigenvalue in the matrix and the normalized vector is the associated eigenvector. This gives only the biggest eigenvalue in the matrix; you access the next smaller eigenvalue by “deflating” the matrix. This is accomplished by forming the outer product of the eigenvector and multiplying it by eigenvalue and subtracting the result from the initial matrix, which produces a new matrix where the already determined eigenvalue has been “deactivated.” Performing this set of actions repeatedly allows you to work your way through each eigenvalue in the matrix.
There are some difficulties with Power Iteration. In particular, you’re kind of screwed if an eigenvalue happens to be zero since you no longer have the ability to find the eigenvector.
Much of my initial work on self-consistent field theory used Power Iteration as the core solving technique. When I started to run into significant problems later in my efforts, and couldn’t tell whether my problems were due to the way I was finding eigenvalues or some other darker crisis, I ended up switching to a different solving technique.
The second solving technique that I learned was Jacobi Diagonalization. For Power Iteration, I stumbled over the technique from a list of computational eigenvalue calculation methods discovered on-line. Jacobi, on the other hand, was recommended by the Szabo and Ostlund Quantum Chemistry book. Power Iteration is an iterative method while Jacobi is a direct calculation method.
To my somewhat naive eye, the Jacobi method seems ready-made for quantum mechanics problems. A necessary precondition for this method is that the matrix of choice be at least a symmetric matrix, if not actually a hermitian matrix. And, since quantum chemistry seems to mostly reduce its basis sets to non-complex symmetric forms, the Fock matrix is assured to be symmetric as a result of the hermiticity of ground-level quantum mechanics.
Jacobi operates on the observation that the off-diagonal elements of a symmetric matrix can be reduced to zeros by a sequence of unitary rotations. The rotation matrix (called a Givens matrix) can be directly calculated to convert one particular off-diagonal element to a zero. If you do this repeatedly, you can work your way through each off-diagonal element, zeroing each in turn, until the matrix is diagonal. This works only if diagonalization proceeds in a particular order, where you pick the Givens matrix that zeros the largest off-diagonal element present on any particular turn. This largest element is referred to as “the pivot” since it’s the crux of a mathematical rotation. As the pivot is never assured to be in any particular spot during the process, the program must work its way through off-diagonal elements in an almost random order, picking only the largest present at that time.
Once the matrix is diagonalized, all the eigenvalues lie along the diagonal… easy peezy. Further, the product of all the Givens matrices is a unitary matrix containing the eigenvectors for each eigenvalue encoded in order by column.
In the process of learning these numerical methods, I discovered a wonderful diagnostic tool for quantum mechanics programs. As a check for whether or not Jacobi can be applied to a particular matrix, one should look at whether or not the matrix is symmetric. This isn’t an explicit precondition of Power Iteration and I didn’t write any tools to look for it while I was relying on that technique. After I started using Jacobi, I wrote a tool for checking whether or not an input matrix is symmetric and discovered that other routines in my program were failing in their calculations to produce the required symmetric matrices. This diagnostic helped me root out some very deep programming issues elsewhere in what turned out to be a very complex program (for an idiot like me, anyway).
Dealing with the Basis
I made an early set of design choices in order to construct the basis. As mentioned in detail in my previous post, the preferred basis sets are Gaussian functions.
It may seem trivial to most people who may read this post, but I was particularly proud of myself for learning how to import a comma delineated .csv file into a python list while converting select character strings into floating points. In the final version, I figured out how to exploit an exception call as a choice for whether an input was intended to be text or floating point.
In the earliest forms of the program, most of my work was done in lists, but I eventually discovered python libraries. If you’re working in python, I caution you to not use lists for tasks that require searchability: libraries are easier! For any astute computer scientists out there, I have no doubt they can chime up with plenty of advice about where I’m wrong, but boy oh boy, I fell in love with libraries really quick when I discovered the functionality they promote.
For dealing with the basis sets, python has a fair number of tools in its math and cmath libraries. This is limited to basic level operations, however. It may seem a no-brainer, but teaching a program how to do six-dimensional integrals is really not as easy as discovering the right programming library. This intended task defined my choices for how I stored my basis sets.
Within the academic papers, most of the gaussian basis sets can be found in tables stripped of everything but the vital constants. The orbitals are fabricated from “primitive” contractions, where a “contraction” is simply a sum of several bare-bones “primitive” gaussian functions with each identified uniquely by a weighting coefficient and an exponential coefficient. There is frequently also a standard exponent to scale a particular contraction to fit the orbitals for a desired atom. The weighting coefficient tells the magnitude of the gaussian function (often chosen so that the sequence of primitives has an overlap integral that is normalized to equal 1) while the exponential coefficient tells how wide the bell-curve of a particular gaussian primitive spreads. The standard exponent is then applied uniformly across an entire contraction to make it bigger or smaller for a particular atom.
In the earliest papers, these gaussian contractions are frequently intended to mimic atomic orbitals. The texts often refer to “linear combinations of atomic orbitals” when calculating molecular orbital functions. In later years, it seems pretty clear that these gaussian contractions are not necessarily specified atomic orbitals so much as an easy basis set which has the correct densities to give good approximations to atoms with relatively few functions. It’s simply an economical basis for the task at hand.
Since python doesn’t automatically know how to deal specifically with gaussian functions, my programming choice was to create a gaussian primitive class. Each primitive object automatically carried around all the numbers needed to identify a particular gaussian primitive. Within the class there were a few class methods necessary to establish the object and identify the associated constants and position. The orbitals were then lists of these primitive class objects. Later in my programming, I even learned how to make the class object callable so that the primitive could spit out the value of the gaussian for a particular position in space.
This is certainly trivial to the overall story, but it was no small amount of work. That I learned how to do class objects is a point of pride.
Constructing generalized basis functions turns out to be a distinct set of actions from fabricating the particular basis to attack a select problem. After a generalized basis is available to the program, I decided to build The Basis as a list identified by atoms at positions in space. This gave a list of lists, where each entry into the top list was a itself a list of primitive objects constituting an orbital, where each orbital was associated with a particular center in space as connected to an atom. It’s not really necessary that these basis functions be centered at the locations of the atoms, but their distributions are ideally suited if they are.
With a basis in hand, what do you do with it?
The Nightmare of Integration
I’ve been debating how much math I actually want to add to this post. I think probably the less the better. If any human eyes ever actually read this post and are curious about the deeper mathematical machinations, I will happily give whatever information is in my possession. In the end, I want merely to tell the story of what I did.
For calculating a quantum mechanical self-consistent electron field, there is a nightmare of integrals that need to be used. These integrals are a cutting rain that forms the deluge which will warn any normal human being away from trying to do this calculation. It isn’t small and the numbers do not scale linearly with the size of the problem. They get big very fast.
The basic paper which sits at the bottom of all contracted gaussian basis sets is the paper by Frank Boys in 1950. I ended up in this paper after I realized that Szabo and Ostlund had no intention of telling me how to do anything deeper than s-orbital gaussians. Being poor, I can’t just randomly buy textbooks to search around for a particular author who will tell how to do all the integrals I desired. So, I took advantage of being in an academic position and I turned over the scientific literature to find how this problem was actually addressed historically. This got me to Boys.
Boys lays out the case as to why one would ever use gaussian functions in these quantum mechanical calculations. As it turns out there are basically just four integral forms that are needed to perform Hartree-Fock; you need an overlap integral, a kinetic energy integral with several derivatives of the overlap, a one center nuclear attraction integral and a two center electron repulsion integral (which covers both repulsion and exchange interaction). For many basis function types that you might use, including the vanilla hydrogenic orbitals, each of these types of integrals is unique to the basis functions you put into them. This means that no specific method is common to the integration for whatever family you’re talking about and some may not even have analytic solutions. This makes the problem very computationally expensive in many cases, and likely impossible. With the Gaussian functions, you can perform this clever trick where a p-type gaussian can be accessed from a derivative on an s-type gaussian… if the derivative is on a free variable somewhere in the function, the operations of integration and differentiation can be reversed since they aren’t on the same variable. Instead of doing the integral on the p-type gaussian, you do the integral on the s-type gaussian and then perform the derivative to find the associated result for the p-type. Derivatives are always easier!
Boys showed that all the needed integrals can be analytically solved for the s-type gaussian function, meaning that any gaussian class function can be integrated just by integrating the s-type function. In the process he introduced a crazy special function related to the Error Function (erf) which is now often called the “Boys Function.” The Boys function is an intriguing machine because it’s a completely new way of dealing with the 1/distance propagator that makes electromagnetism so hard (for one example, refer to this post).
Boys Function family with member ‘n’:
While this is a simplification, I’ll tell you that it’s still not just easy.
I did troll around in the literature for a time looking for more explicit ways of dealing with implementing this whole gaussian integral problem and grew depressed that most of what I found seemed too complicated for what I had in mind. Several sources gave useful insight into how to systematize some of these integrals to higher angular momentum guassians, but not all. Admittedly, in my earliest pass, I think I didn’t understand the needs of the math well enough. Dealing with this whole problem ended up being an evolving process.
My earliest effort was an attempt to simply, methodically teach the computer how to do symbolic derivatives using chain rule and product rule. This was not a small thing and I figured that I would probably have a slow computer program, but –by god– it would be mine. I had a semi-working system which achieved this.
A parallel problem that cropped up here was learning how to deal with the Boys function. I started out with but one form of the Boys function and knew that I needed erf to use it. Unfortunately, I rapidly discovered that I needed to make derivatives of the Boys function (each derivative is related to the ‘n’ in the function above). I trolled around in the literature for a long time trying to figure out how to perform these derivatives and eventually analytically worked out a series expansion built around erf that successfully formed derivatives for the Boys function based on the derivatives of erf. Probably the smartest and least useful thing I did this entire project.
In its initial form, using these methods, I was able to successfully calculate the minimal basis hydrogen molecule and the HeH+ molecule, both of which have a minimal basis that contains only s-type gaussian functions.
This image is electron density of the hydrogen molecule using the python maya vi package. Literally just two sigma-bonded hydrogen atoms, where the scales are in Bohr radii (1 = 0.53 Angstrom). This is the most simple quantum chemistry system and the hardest system I could perform using strictly my own methods. Sadly, this system can be done by hand in Hartree-Fock.
When I tried to use more complex basis functions for atoms above Lithium, trying things like Carbon monoxide (CO), my system failed rather spectacularly. Many of the difficulties appeared as non-symmetric matrices (mentioned above) and inexplicably ballooning integral values. The Hartree-Fock loop would oscillate rather than converge. Most of this tracked to issues occurring among my integrals.
One of the problems I discovered was that error function in the python math library couldn’t handle input values that approached zero; I supplied the zero limit input originally since the function is non-inclusive of zero, but I also found that the python math library erf function starts to freak out and generate crazy numbers when you get really close to zero, say within 10^-5. So, I got the appropriate values directly at zero due to my patch and mostly good values larger than 10^-4, but crazy weird values in this little block near zero. In one version of correction, I simply introduced a cut-off to send my Boys function to its zero value when I got sufficiently close, but this felt like a very inelegant fix to me. I searched the literature and around on-line and had myself 6 different representations of the Boys function before I was finished. The most stable version was a formulation that pulled the Boys function and all its derivatives out of the confluent hypergeometric function 1F1, which I was able find implemented in the scipy special functions library (I felt fortunate; I had added Scipy to my python environment for a completely unrelated reason that turned out to be a dead-end and ended up needing this special function… lucky!)
I write the formulation here because somebody may someday benefit from it. In this, Fn differs from the nth derivative of the Boys function by -1^n (the ‘n’ functions are all positive while the derivatives of the Boys function alternate positive and negative. The official “Boys Function” is where n = 0.)
Initially, I could not distinguish whether the errors in my program were in the Boys function or in the method being used to form the gaussian derivatives. I knew I was having problems with my initial implementation of the Boys function from a modified erf, but problems persisted no matter how I altered my Boys function. The best conclusion was that the Boys function was not my only stumbling block. Eventually I waved the white flag and surrendered to the fact that my derivative routine was not going to cut it.
This lead to a period where I was back in the literature learning how gaussian derivatives were being made by the professional quantum chemists. I found a number of different strategies: Pople and Hehre produced a paper in 1978 outlining a method used in their Gaussian software which apparently performs a cartesian rotation to cause most of the busy work to go away, which is supposedly really fast. There was a method by Dupuis, Rys and King in the 1970s which generates higher angular momentum integrals by some method of quadrature. A paper by McMurchie and Davidson in 1978 detailed a method of recursion which generates the higher angular momentum gaussians from an s-type seed using hermite polynomials. Another by Obara and Saika in 1986 broke the system down into a three center gaussian integral and used recurrence relations to generate the integrals by another method of recursion. And still a further paper involving Pople elaborated something similar (I found the paper, but never read this one).
Because I had found a secondary source from a more modern quantum chemist detailing the method, I focused on McMurchie and Davidson. This method gave several fairly interesting techniques that I was able to successfully program. I learned here how to program recursive functions since McMurchie calculates their integrals by a recurrence relation.
About this time, also, I was butting my head against the difficulty that I had no way of really checking whether my integrals were functioning properly or not. For Szabo and Ostlund, values for the integrals are only given involving s-type orbitals. I had nothing in the way of numbers for p-type orbitals. I could tell that my Hartree-Fock routine was not converging, but I couldn’t tell why. I tried making a comparison calculation in Mathematica, but the integration failed miserably. I might’ve been able to go further with Mathematica, if only the time step of figuring out how to do that programming wasn’t steeper –when the functions aren’t directly in their colossal library, implementation can become pretty hard in Mathematica. Having hammered on the Boys function until I could basically build nothing more stable and then after I found a way to diagnose whether or not my routines were producing symmetric matrices, I had no other way of telling where faults existed. These integrals are not easy by hand.
I dug a paper out of Taketa, Huzinaga and O-hata in 1966 that was about the only paper I could find which actually reported values for these integrals given a particular set-up. Apparently, after 1966, it stopped being a novelty to show that you were able to calculate these integrals, so nobody has actually published values in the last fifty years! Another paper by Shavitt and Karplus a few years earlier references values calculated at MIT still earlier, but aside from these, I struggled to find reference values. This experience is a formative one because it shows how hard you have to work to be competitive if you’re not actually in the in-club of the field –for modern workers, the problem is solved and you refer to a program built during that time which can do these operations.
Comparing to Taketa, using the McMurchie-Davidson method, I was able to confirm that my overlap and kinetic energy integrals were functioning properly. The nuclear attraction integral was a bust, no dice and no better in the electron repulsion integrals. They worked for s-orbitals, but not for p-type and higher. Unfortunately, Taketa and company had mistransliterated one of their basis functions in a previous paper, leading me to worry that maybe the paper was actually lying about the values they reported. I eventually decided that Shavitt was probably not also lying, meaning that there was still something wrong with my integration, even as I had hammered on McMurchie until smoke was coming out of my ears and I was sure I implemented it correctly.
This was rock bottom for me. You can sort of see the VH1 special: and here he hit rock bottom. I didn’t know what else to do; I was failing at finding alternative ways to generate appropriate reference checks and simply could not see what was wrong in my programming. I had no small amount of doubt about my fitness to perform computer programming.
My selected path forward was to learn how to implement Obara-Saika and to turn that against McMurchie. This method is also a recursive method to perform exactly the same derivatives without actually doing a long-form derivative. Initially, Obara-Saika also gave a value different from Taketa and also Shavitt, but I was able to track down a -1 that changed everything. Suddenly, Obara-Saika was giving values right on with Taketa.
When I tried a comparison by hand of the outcomes of McMurchie-Davidson against those of Obara-Saika on the very simplest case of p-type primitives, I found that these two methods produce different values. For the same problem, Obara-Saika does not produce the same result as McMurchie-Davidson. I conclude that I either misunderstood McMurchie-Davidson, which is possible, or that the technique is either willfully mangled in the paper to protect a possibly valuable piece of work from a competitor or it’s simply wrong (somebody tell Helgaker: he actually teaches this method in classes where he is reproducing it faithfully, part of why I originally trusted it). I do not know why McMurchie-Davidson fails in my hands because everything they do looks right!
After a huge amount of work, I broke through. My little python program was able to reproduce some historical Hartree-Fock calculations.
This image is a maya vi plot depicting the quantum mechanical electron density of water (H2O). The oxygen is centered while the two hydrogens are at the points of the stems. This calculation used the minimal basis set of STO-3G, which places three primitive gaussians for each orbital. Each hydrogen contains only a 1s orbital while the carbon contains a 1s, a 2s and three 2p orbitals in the x, y and z directions. The image above is threshold filtered to show the low density parts of the distribution, where the intensities are near zero, which was necessary because the oxygen has so much higher density than the hydrogens that you cannot see anything but the inner shell of the oxygen when plotting on absolute density. This system has ten electrons in five closed molecular orbitals where the electron density represents the superposition of those orbitals. The reported energies were right on the values expected for an STO-3G calculation.
With water, I worried initially that the calculation wouldn’t converge if I didn’t make certain symmetry considerations about the molecule, but that turned out to be unnecessary. After I solved my integral problems, the calculation converged immediately.
I also spent some time on methane using the same tools and basis…
With methane (CH4) you can see quite clearly the tetrahedral shape of the molecule with the carbon at center and the hydrogens arranged in a halo around it. This image was also filtered to show the low density regions.
I had some really stunning revelations about organic chemistry when I examined the molecular orbital structure of methane somewhat more closely. Turns out that what you learn in organic chemistry class is a lie! I’ll talk about this in an independent blog post because it deserves to be highlighted, front and center.
As kind of a conclusion here, I will note that STO-3G is not the most perfect modern basis set for doing quantum mechanical calculations… and even with the most modern basis, you can’t quite get there. Hartree-Fock does not include cross-correlation between electrons with differing spin and therefore converges to a limit called the Hartree-Fock limit. The difference seen between molecular energies calculated at the Hartree-Fock limit and those actually observed in experiment is referred to as the correlation energy, which can be calculated with greater accuracy using more modern Post-Hartree-Fock techniques using higher quality basis sets than STO-3G. With a basis of infinite size and with calculations that include the correlation energy, you get close to the truth. What is seen here is still just another approximation… better than the last, but still just short of reality.
My little python program probably can’t go there without a serious redesign that would take more time than I currently have available (and probably involve me learning FORTRAN). The methane calculation took 12 seconds –as compared to molecular hydrogen, which took 5% of a second. Given the scaling of the problem, benzene (6 carbons and 6 hydrogens) would take something close to 2 hours to calculate and maybe all night to plot. This using only STO-3G, three gaussians per orbital, which is dinky and archaic compared to a more modern basis set, which might have 50 or 60 functions for the basis of a single atom. Compared to what modern programs can do, benzene itself is but a toy.