Chemical Orbitals from Eigenstates

A small puzzle I recently set for myself was finding out how the hydrogenic orbital eigenstates give rise to the S- P- D- and F- orbitals in chemistry (and where s, p, d and f came from).

The reason this puzzle is important to me is that many of my interests sort of straddle how to go from the angstrom scale to the nanometer scale. There is a cross-over where physics becomes chemistry, but chemists and physicists often look at things very differently. I was not directly trained as a P-chemist; I was trained separately as a Biochemist and a Physicist. Remarkably, the Venn diagrams describing the education for these pursuits only overlap slightly. When Biochemists and Molecular Biologists talk, the basic structures below that are frequently just assumed (the scale here is >1nm), while Physicists frequently tend to focus their efforts toward going more and more basic (the scale here is <1 Angstrom). This leads to a clear non-overlap in the scale where chemistry and P-chem are relevant (~1 angstrom). Quite ironically, the whole periodic table of the elements lies there. I have been through P-chem and I’ve gotten hit with it as a Chemist, but this is something of an inconvenient scale gap for me. So, a cat’s paw of mine has been understanding, and I mean really understanding, where quantum mechanics transitions to chemistry.

One place is understanding how to get from the eigenstates I know how to solve to the orbitals structuring the periodic table.


This assemblage is pure quantum mechanics. You learn a huge amount about this in your quantum class. But, there are some fine details which can be left on the counter.

One of those details for me was the discrepancy between the hydrogenic wave functions and the orbitals on the periodic table. If you aren’t paying attention, you may not even know that the s-, p-, d- orbitals are not all directly the hydrogenic eigenstates (or perhaps you were paying a bit closer attention in class than I was and didn’t miss when this detail was brought up). The discrepancy is a very subtle one because often times when you start looking for images of the orbitals, the sources tend to freely mix superpositions of eigenstates with direct eigenstates without telling why the mixtures were chosen…

For example, here are the S, P and D orbitals for the periodic table:


This image is from [source not preserved]. Focusing on the P row, how is it that these functions relate to the pure eigenstates? Recall the images that I posted previously of the P eigenstates:

[Images: P-orbital probability density for P210 (left) and for P21-1 (right)]

In the image for the S, P and D orbitals, of the Px, Py and Pz orbitals, all three look like some variant of P210, which is the pure state on the left, rather than P21-1, which is the state on the right. In chemistry, you get the orbitals directly without really being told where they came from, while in physics, you get the eigenstates and are told somewhat abstractly that the s-, p-, d- orbitals are all superpositions of these eigenstates. I recall seeing a professor during an undergraduate quantum class briefly derive Px and Py, but I really didn’t understand why he selected the combinations he did! Rationally, it makes sense that Pz is identical to P210 and that Px and Py are superpositions that have the same probability distribution as Pz, but are rotated into the X-Y plane ninety degrees from one another. How do Px and Py arise from superpositions of P21-1 and P211? P21-1 and P211 have identical probability distributions despite having opposite angular momentum!

Admittedly, the intuitive rotations that produce Px and Py from Pz make sense at a qualitative level, but if you try to extend that qualitative understanding to the D-row, you’re going to fail. Four of the D orbitals look like rotations of one another, but one doesn’t. Why? And why are there four that look identical? I mean, there are only three spatial dimensions to fill, presumably. How do these five fit together three dimensionally?

Except for the Dz^2, none of the D-orbitals are pure eigenstates: they’re all superpositions. But what logic produces them? What is the common construction algorithm which unites the logic of the D-orbitals with that of the P-orbitals (which are all intuitive rotations)?

I’ll actually hold back on the math in this case because it turns out that there is a simple revelation which can give you the jump.

As it turns out, all of chemistry is dependent on angular momentum. When I say all, I really do mean it. The stability of chemical structures is dependent on cases where angular momentum has tended in some way to cancel out. Chemical reactivity in organic chemistry arises from valence choices that form bonds between atoms in order to “complete an octet,” which is short-hand for saying that species combine with each other in such a way that enough electrons are present to fill in or empty out the eight electron slots of a valence shell (one s-orbital and three p-orbitals, each holding two electrons), roughly pushing the number of electrons orbiting one type of atom across the periodic table in its appropriate row to match the noble gases column. For example, in forming the salt crystal sodium chloride, sodium possesses only one electron in its valence shell while chlorine contains seven: if sodium gives up one electron, it goes to a state with no need to complete the octet (with the equivalent electronic completion of neon), while chlorine gaining an electron pushes it into a state that is electronically equal to argon, with eight electrons. From a physicist’s standpoint, this is called “angular momentum closure,” where the filled orbitals are sufficient to completely cancel out all angular momentum in that valence level. As another example, one highly reactive chemical structure you might have heard about is a “radical” or maybe a “free radical,” which is simply chemist shorthand for a situation that a physicist would recognize as containing an electron with uncancelled spin and orbital angular momentum. Radical-driven chemical reactions are about passing around this angular momentum! Overall, reactions tend to be driven to occur by the need to cancel out angular momentum. Atomic stoichiometry of a molecular species always revolves around angular momentum closure. You may not see it in basic chemistry, but this determines how many of each atom can be connected, in most cases.

From the physics, what can be known about an orbital is essentially the total angular momentum present and what amount of that angular momentum is in a particular direction, namely along the Z-axis. Angular momentum lost in the X-Y plane is, by definition, not in either the X or Y direction, but in some superposition of both. Without preparing a packet of angular momentum, the distribution ends up having to be uniform, meaning that it is in no particular direction except not in the Z-direction. For the P-orbitals, the eigenstates are purely either all angular momentum in the Z-direction, or none in that direction. For the D-orbitals, the states (of which there are five) can be combinations, two with angular momentum all along Z, two with half in the X-Y plane and half along Z and one with all in the X-Y plane.

What I’ve learned is that, for chemically relevant orbitals, the general rule is “minimal definite angular momentum.” What I mean by this is that you want to minimize situations where the orbital angular momentum is in a particular direction. The orbitals present on the periodic table are states which have canceled out angular momentum located along the Z-axis. This is somewhat obvious for the homology between P210 and Pz. P210 points all of its angular momentum perpendicular to the z-axis. It locates the electron on average somewhere along the Z-axis in a pair of lobes shaped like a peanut, but the orbital direction is undefined. You can’t tell how the electron goes around.

As it turns out, Px and Py can both be obtained by making simple superpositions of P21-1 and P211 that cancel out z-axis angular momentum… literally adding together these two states so that their angular momentum along the z-axis goes away. Px is the symmetric superposition while Py is the antisymmetric version (which one is called symmetric depends on the sign and phase convention chosen for the eigenstates). For the two states obtained by this method, if you look for the expectation value of the z-axis angular momentum, you’ll find it missing! It cancels to zero.
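To make the cancellation concrete, here is a sketch of the expectation value for the symmetric combination. It assumes the standard eigenvalue relation L_z|n,l,m> = m hbar |n,l,m> and orthonormal eigenstates; the ket labels |2,1,m> are just my shorthand for the P21m states named above.

```latex
\langle P_x | L_z | P_x \rangle
  = \tfrac{1}{2}\bigl( \langle 2,1,1| + \langle 2,1,-1| \bigr)\, L_z \,\bigl( |2,1,1\rangle + |2,1,-1\rangle \bigr)
  = \tfrac{1}{2}\bigl( (+1)\hbar + (-1)\hbar \bigr)
  = 0
```

The cross terms drop out because states with different m are orthogonal. Flipping the relative sign to get the antisymmetric combination changes only those cross terms, so the same zero comes out for Py.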

It’s as simple as that.

The D-orbitals all follow. D320 already has no angular momentum on the z-axis, so it is directly Dz^2. You then find the four additional combinations by simply adding states that cancel the z-axis angular momentum: the symmetric and antisymmetric combinations of D321 and D32-1, and then the symmetric and antisymmetric combinations of D322 and D32-2.

Notice, all I’m doing to make any of these states is looking at the last index (the m-index) of the eigenstates and making a linear combination where the m-index of the first state plus that of the second gives zero. 1-1=0, 2-2=0. That’s it. Admittedly, the symmetric combination sums these with a (+) sign and a 1/sqrt(2) weighting constant so that Px = (1/sqrt(2))(P211 + P21-1) is normalized and the antisymmetric combination sums with a (-) sign as in Py = (1/sqrt(2))(P211 - P21-1), but nothing more complicated than that! The D-orbitals can be generated in exactly the same manner. I found one easy reference online that loosely corroborated this observation, but said it instead as that the periodic table orbitals are all written such that the wave functions have no complex parts… which is also kind of true, but somewhat misleading because you sometimes have to multiply by a complex phase to put it genuinely in the form of sines for the azimuthal coordinate (and as the azimuthal coordinate is integrated over 360 degrees, expectation values on this coordinate, as z-axis momentum would contain, cancel themselves out; sines and cosines integrated over a full period, or multiples of a full period, integrate to zero.)
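The pairing rule can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each state is represented as a dictionary from m-index to a real amplitude, the radial and angular functions are ignored entirely, and the function names (`real_combinations`, `expect_lz`, `norm`) are mine rather than anything standard.

```python
import math

def real_combinations(l):
    """Return the 2l+1 superpositions built by pairing +m with -m.

    Each state is a dict mapping the m-index to its amplitude.
    """
    states = [{0: 1.0}]  # m = 0 already carries no z-axis angular momentum
    w = 1.0 / math.sqrt(2.0)  # weighting constant that keeps each state normalized
    for m in range(1, l + 1):
        states.append({m: w, -m: w})    # symmetric combination
        states.append({m: w, -m: -w})   # antisymmetric combination
    return states

def expect_lz(state):
    """<Lz> in units of hbar: sum over m of |amplitude|^2 times m."""
    return sum(abs(a) ** 2 * m for m, a in state.items())

def norm(state):
    """Total probability of the state; should come out as 1."""
    return sum(abs(a) ** 2 for a in state.values())

for l, name in ((1, "P"), (2, "D")):
    for s in real_combinations(l):
        print(name, s, "<Lz> =", expect_lz(s), "norm =", round(norm(s), 12))
```

For l = 1 this gives three states and for l = 2 it gives five, and every one of them has a z-axis expectation value of zero, which is the whole point of the construction.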

Before I wrap up, I had a quick intent to touch on where S-, P-, D- and F- came from. “Why did they pick those damn letters?” I wondered one day. Why not A-, B-, C- and D-? The nomenclature emerged from how spectral lines appeared visually and groups were named: (S)harp, (P)rincipal, (D)iffuse and (F)undamental. (A second interesting bit of “why the hell???” nomenclature is the X-ray lines… you may hate this notation as much as me: K, L, M, N, O… “stupid machine uses the K-line… what does that mean?” These letters simply match the n quantum number –the energy level– as n=1,2,3,4,5… Carbon K-edge, for instance, is the amount of energy between the n=1 orbital level and the ionized continuum for a carbon atom.) The sharpness tends to reflect the complexity of the structure in these groups.

As a quick summary about structuring of the periodic table, S-, P-, D-, and F- group the vertical columns (while the horizontal rows are the associated relative energy, but not necessarily the n-number). The element is determined by the number of protons present in the nucleus, which creates the chemical character of the atom by requiring an equal number of electrons present to cancel out the total positive charge of the nucleus. Electrons, as fermions, are forced to occupy distinct orbital states, meaning that each electron has a distinct orbit from every other (fudging for the antisymmetry of the wave function containing them all). As electrons are added to cancel protons, they fall into the available orbitals depicted in the order on the periodic table going from left to right, which can be a little confusing because they don’t necessarily purely close one level of n before starting to fill S-orbitals of the next level of n; for example at n=3, l can equal 0, 1 and 2… but, the S-orbitals for n=4 will fill before D-orbitals for n=3 (which are found in row 4). This has purely to do with the S-orbitals having lower energy than P-orbitals which have lower energy than D-orbitals, but that the energy of an S-orbital for a higher n may have lower energy than the D-orbital for n-1, meaning that the levels fill by order of energy and not necessarily by order of angular momentum closure, even though angular momentum closure influences the chemistry. S-, P-, D-, and F- all have double degeneracy to contain up and down spin of each orbital, so that S- contains 2 instead of 1, P- contains 6 instead of 3, and D- contains 10 instead of 5. If you start to count, you’ll see that this produces the numerics of the periodic table.
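The counting can be sketched as follows. One assumption is mine and not stated above: the filling order is taken from the textbook n + l ordering (the Madelung rule, with ties broken by lower n), which is a rule of thumb with known exceptions among real elements. The helper names are hypothetical.

```python
LETTERS = "spdf"

def capacity(l):
    """Electrons a subshell holds: (2l + 1) orbitals times 2 spin states."""
    return 2 * (2 * l + 1)

def filling_order(max_n=7):
    """Subshells (n, l) sorted by n + l, ties broken by lower n (assumed rule)."""
    shells = [(n, l)
              for n in range(1, max_n + 1)
              for l in range(0, min(n, 4))      # l runs from 0 to n-1, capped at f
              if n + l <= max_n + 1]            # stop once the last p subshell is in
    return sorted(shells, key=lambda nl: (nl[0] + nl[1], nl[0]))

def period_lengths(max_n=7):
    """Start a new row of the table each time an s subshell begins filling."""
    rows = []
    for n, l in filling_order(max_n):
        if l == 0:
            rows.append(0)
        rows[-1] += capacity(l)
    return rows

print(" ".join(f"{n}{LETTERS[l]}" for n, l in filling_order()))
print(period_lengths())  # [2, 8, 8, 18, 18, 32, 32]
```

The row lengths 2, 8, 8, 18, 18, 32, 32 are the widths of the seven rows of the table, and 4s lands ahead of 3d in the printed order, matching the row 4 behavior described above.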

The periodic table is a fascinating construct: it contains a huge amount of quantum mechanical information which really doesn’t look much like quantum mechanics. And, everybody has seen the thing! An interesting test to see the depth of a conversation about the periodic table is to ask those conversing if they understand why the word “periodic” is used in the name “Periodic table of the elements.” The choice of that word is pure quantum mechanics.


Parity symmetry in Quantum Mechanics

I haven’t written about my problem play for a while. Since last I wrote about rotational problems, I’ve gone through the entire Sakurai chapter 4, which is an introduction to symmetry. At the moment, I’m reading Chapter 5 while still thinking about some of the last few problems in Chapter 4.

I admit that I had a great deal of trouble getting motivated to attack the Chapter 4 problems. When I saw the first aspects of symmetry in class, I just did not particularly understand it. Coming back to it on my own was not much better. Abstract symmetry is not easy to understand.

In Sakurai chapter 4, the text delves into a few different symmetries that are important to quantum mechanics and pretty much all of them are difficult to see at first. As it turns out, some of these symmetries are very powerful tools. For example, use of the reflection symmetry operation in a chiral molecule (like the C-alpha carbon of proteins or the hydrated carbons of sugars) can reveal neighboring degenerate ground states which can be accessed by racemization, where an atomic substituent of the molecule tunnels through the plane of the molecule and reverses the chirality of the state at some infrequent rate. Another example is the translation symmetry operation, where a lattice of identical attractive potentials serves to hide a near infinite number of identical states where a bound particle can hop from one minimum to the next and traverse the lattice… this behavior is essentially a specific model describing the passage of electrons through a crystalline semiconductor.

One of the harder symmetries was time reversal symmetry. I shouldn’t say “one of the harder;” for me time reversal was the hardest to understand and I would be hesitant to say that I completely understand it yet. The time reversal operator causes time to run backward, making momenta and angular momenta reverse. Time reversal is really hard because the operator is anti-unitary, meaning that the operation takes the complex conjugate of the complex quantities that it operates on. Nevertheless, time reversal has some interesting outcomes. For instance, if a spinless particle is bound to a fixed center where the state in question is not degenerate (only one state at the given energy), time reversal says that the state can have no average angular momentum (it can’t be rotating or orbiting). On the other hand, if the particle has half-integer spin, the bound state must be degenerate because the particle can’t have no angular momentum!

A quick digression here for the laymen: in quantum mechanics, the word “degenerate” is used to refer to situations where multiple states lie on top of one another and are indistinguishable. Degeneracy is very important in quantum mechanics because certain situations contain only enough information to know an incomplete picture of the model where more information is needed to distinguish alternative answers… coexisting alternatives subsist in superposition, meaning that a wave function is in a superposition of its degenerate alternative outcomes if there is no way to distinguish among them. This is part of how entanglement arises: you can generate entanglement by creating a situation where discrete parts of the system simultaneously occupy degenerate states encompassing the whole system. The discrete parts become entangled.

Symmetry is important because it provides a powerful tool by which to break apart degeneracy. A set of degenerate states can often be distinguished from one another by exploiting the symmetries present in the system. L- and R- enantiomers in a molecule are related by a reflection symmetry at a stereo center, meaning that there are two states of indistinguishable energy that are reflections of one another. People don’t often notice it, but chemists are masters of quantum mechanics even though they typically don’t know as much of the math: how you build molecules is totally governed by quantum mechanics and chemists must understand the qualitative results of the physical models. I’ve seen chemists speak competently of symmetry transformations in places where the physicists sometimes have problems.

Another place where symmetry is important is in the search for new physics. The way to discover new physical phenomena is to look for observational results that break the expected symmetries of a given mathematical model. The LHC was built to explore symmetries. Currently known models are said to hold CPT symmetry, referring to Charge, Parity and Time Reversal symmetry… I admit that I don’t understand all the implications of this, but simply put, if you make an observation that violates CPT, you have discovered physics not accounted for by current models.

I held back talking about Parity in all this because I wanted to speak of it in greater detail. Of the symmetries covered in Sakurai chapter 4, I feel that I made the greatest jump in understanding on Parity.

Parity is symmetry under space inversion.


Just saying that sounds diabolical. Space inversion. It sounds like that situation in Harry Potter where somebody screws up trying to disapparate and manages to get splinched… like they space invert themselves and can’t undo it.

The parity operation carries all the cartesian variables in a function to their negative values.

$$\Phi\, f(x, y, z) = f(-x, -y, -z)$$

Here Phi just stands in for the parity operator. By performing the parity operation, all the variables in the function which denote spatial position are turned inside out and sent to their negative value. Things get splinched.

You might note here that applying parity twice gets you back to where you started, unsplinching the splinched. This shows that the parity operator has the special property that it is its own inverse operation. You might understand how special this is by noting that we can’t all literally be our own brother, but the parity operator basically is.


Applying parity twice is like multiplying by 1… which is how you know parity is its own inverse. This also makes parity a unitary operator since it doesn’t affect the absolute value of the function. Parity operation times inverse parity is one, so unitary.

$$\Phi\,\Phi^{-1} = 1$$ or $$\Phi\,\Phi^{\dagger} = 1$$

Here, the daggered superscript means “Hermitian conjugate” (the complex conjugate, transposed), which is automatically the inverse operation if you’re a unitary operator. Hello linear algebra. Be assured I’m not about to break out the matrices, so have no fear. We will stay in a representation free zone. In this regard, the parity operation is very much like a rotation: the inverse operation is the Hermitian conjugate of the operation, never mind the detail that the inverse operation is the operation.

Parity symmetry is “symmetry under the parity operation.” There are many states that are not symmetric under parity, but we would be interested in searching particularly for parity operation eigenstates, which are states that parity operator will transform to give back that state times some constant eigenvalue. As it turns out, the parity operator can only ever have two eigenvalues, which are +1 and -1. A parity eigenstate is a state that only changes its sign (or not) when acted on by the parity operator. The parity eigenvalue equations are therefore:
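Written out in ket notation, and keeping Phi as the symbol for the parity operator, the pair of equations would read roughly as follows. The labels psi-plus and psi-minus are placeholder names of mine for an even and an odd eigenstate.

```latex
\Phi\,|\psi_{+}\rangle = +\,|\psi_{+}\rangle \quad \text{(even)},
\qquad
\Phi\,|\psi_{-}\rangle = -\,|\psi_{-}\rangle \quad \text{(odd)}
```

The restriction to +1 and -1 follows from parity being its own inverse: applying it twice multiplies an eigenstate by the eigenvalue squared, and that square has to equal 1.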


All this says is that under space inversion, the parity eigenstates will either not be affected by the transformation, or will be negative of their original value. If the sign doesn’t change, the state is symmetric under space inversion (called even). But, if the sign does change, the state is antisymmetric under space inversion (called odd). As an example, in a space of one dimension (defined by ‘x’), the function sine is antisymmetric (odd) while the function cosine is symmetric (even).


In this image, taken from a graphing app on my smartphone, the white curve is plain old sine while the blue curve is the parity transformed sine. As mentioned, cosine does not change under parity.
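The same check can be sketched numerically. This is a minimal illustration in one dimension that only samples a handful of points, so it demonstrates the idea rather than proving anything; the function names are made up for the example.

```python
import math

def parity(f):
    """Return the parity-transformed function x -> f(-x) in one dimension."""
    return lambda x: f(-x)

def parity_eigenvalue(f, samples, tol=1e-12):
    """Return +1 (even), -1 (odd), or None if f is neither on the sampled points."""
    pf = parity(f)
    if all(abs(pf(x) - f(x)) < tol for x in samples):
        return 1
    if all(abs(pf(x) + f(x)) < tol for x in samples):
        return -1
    return None

samples = [0.1 * k for k in range(1, 40)]
print("sine:  ", parity_eigenvalue(math.sin, samples))
print("cosine:", parity_eigenvalue(math.cos, samples))
print("exp:   ", parity_eigenvalue(math.exp, samples))
```

Sine comes back odd, cosine comes back even, and the exponential is neither, which is the situation of a function that is not a parity eigenstate at all.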

As you may be aware, sines and cosines are energy eigenstates for the particle-in-the-box problem and so would constitute one example of legit parity eigenstates with physical significance.

Operators can also be transformed by parity. In order to see the significance, you just note that the definition of parity is that the position operator is reversed. So, a parity transformation of the position operator is this:
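In operator form, with Phi again standing for parity and a hat marking the position operator (the hat is my notation), a form consistent with the description would be:

```latex
\Phi^{\dagger}\, \hat{x}\, \Phi = -\hat{x}
```

Since parity is its own inverse and unitary, sandwiching with the dagger on the left or on the right gives the same result.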


Kind of what should be expected. Position under parity turns negative.

As expressed, all of this is really academic. What’s the point?

Parity can give some insights that have deep significance. The deepest result that I understood is that matrix elements and expectation values are conserved under the parity transformation. Matrix elements are a generalization of the expectation value where the bra and ket do not necessarily belong to the same eigenfunction. The proof of the statement here is one line:
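A sketch of that line, assuming the usual trick of inserting the identity in the form of parity times its own inverse on either side of the operator (tildes mark parity-transformed quantities):

```latex
\langle m | V | n \rangle
  = \langle m |\, \Phi^{\dagger}\Phi \; V \; \Phi^{\dagger}\Phi \,| n \rangle
  = \langle \tilde{m} |\, \tilde{V} \,| \tilde{n} \rangle,
\qquad
|\tilde{n}\rangle = \Phi\,|n\rangle, \quad \tilde{V} = \Phi\, V\, \Phi^{\dagger}
```

Regrouping the middle expression as (bra times Phi-dagger)(Phi V Phi-dagger)(Phi times ket) is all that is needed to read off the right-hand side.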


At the end, the squiggles all denote parity transformed values, ‘m’ and ‘n’ are blanket eigenstates with arbitrary parity eigenvalues and V is some miscellaneous operator. First, the complex conjugation that turns a ket into a bra does not affect the parity eigenvalue equation, since parity is its own inverse operation and since the eigenvalues of 1 and -1 are not complex, so the bra above has just the same eigenvalue as if it were a ket. So, the matrix element does not change with the parity transformation: the combined parity transformation of all these parts is as if you just multiplied by identity a couple times, which should do nothing but return the original value.

What makes this important is that it sets a requirement on how many -1 eigenvalues can appear within the parity transformed matrix element (which is equal to the original matrix element): the count must always be an even number (either zero or two). For the element to exist (that is, for it to have a non-zero value), if the initial and final states connected by the potential are both parity odd or both parity even, the potential connecting them must be symmetric. Conversely, if the potential is parity odd, either the initial or final state must be odd, while the other is even. To sum up, a parity odd operator has non-zero matrix elements only when connecting states of differing parity while a parity even operator must connect states of the same parity. This restriction is observed simply by noting that the sign can’t change between a matrix element and the parity transformed matrix element.

Now, since an expectation value (average position, for example) is always a matrix element connecting an eigenket to itself, expectation values can only be non-zero for operators of even parity. For example, in a system defined across all space, average position ends up being zero because the position operator is odd, while both eigenbra and eigenket are of the same function, and therefore have the same parity. For average position to be non-zero, the wavefunction would need to be a superposition of eigenkets of opposite parity (and therefore not an eigenstate of parity at all!)
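Here is a rough numerical sketch of the selection rule, using the particle-in-the-box states mentioned earlier. The assumptions are mine: the box is centered on the origin and spans x = -1 to 1 so that parity is a symmetry of the problem, and the integrals are estimated with a plain midpoint rule rather than anything sophisticated.

```python
import math

def psi(n):
    """Energy eigenfunction n = 1, 2, 3... of a box spanning -1 <= x <= 1.

    Odd n gives a cosine (parity even); even n gives a sine (parity odd).
    With a box of width 2 these are already normalized.
    """
    k = n * math.pi / 2.0
    if n % 2:
        return lambda x: math.cos(k * x)
    return lambda x: math.sin(k * x)

def matrix_element(m, n, op=lambda x: x, steps=20000):
    """Midpoint-rule estimate of the integral of psi_m(x) * op(x) * psi_n(x)."""
    f, g = psi(m), psi(n)
    h = 2.0 / steps
    total = 0.0
    for i in range(steps):
        x = -1.0 + (i + 0.5) * h
        total += f(x) * op(x) * g(x)
    return total * h

print("<1|x|1>   =", matrix_element(1, 1))   # same parity, odd operator
print("<1|x|3>   =", matrix_element(1, 3))   # same parity, odd operator
print("<1|x|2>   =", matrix_element(1, 2))   # opposite parity, odd operator
print("<1|x^2|1> =", matrix_element(1, 1, op=lambda x: x * x))  # even operator
```

The first two numbers come out as zero to within rounding, while the last two do not: position only connects states of opposite parity, and an even operator like x squared is free to have an expectation value.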

A tangible, far reaching result of this symmetry, related particularly to the position operator, is that no pure parity eigenstate can have an electric dipole moment. The dipole moment operator is built around the position operator, so a situation where the position expectation value goes to zero will require the dipole moment to be zero also. Any observed electric dipole moment must be from a mixture of states of opposite parity.

If you stop and think about that, that’s really pretty amazing. It tells you whether an observable is zero or not depending on which eigenkets are present and whether the operator for that observable can be inverted or not.

Hopefully I got that all correct. If anybody more sophisticated than me sees holes in my statement, please speak up!

Welcome to symmetry.

(For the few people who may have noticed, I still have it in mind to write more about the magnets puzzle, but I really haven’t had time recently. Magnets are difficult.)

What is a qubit?

I was trolling around in the comments of a news article presented on Yahoo the other day. What I saw there has sort of stuck with me and I’ve decided I should write about it. The article in question, which may have been by an outfit other than Yahoo itself, was about the recent decision by IBM to direct a division of people toward the task of learning how to program a quantum computer.

Using the word ‘quantum’ in the title of a news article is a sure fire way to incite click-bait. People flock in awe to quantum-ness even if they don’t understand what the hell they’re reading. This article was a prime example. All the article really talked about was that IBM has decided that quantum computers are now a promising enough technology that they’re going to start devoting themselves to the task of figuring out how to compute with them. Note, the article spent a lot of time kind of masturbating over how marvelous quantum computers will be, but it really actually didn’t say anything new. Another tech company deciding to pretend to be in quantum computing by figuring out how to program an imaginary computer is not an advance in our technology… digital quantum computers are generally agreed to be at least a few years off yet and they’ve been a few years off for a while now. There’s no guarantee that the technology will suddenly emerge into the mainstream, and I’m neglecting the D-Wave quantum computer because it is generally agreed among experts that D-Wave hasn’t even managed to prove that their qubits remain coherent through a calculation to actually be a useful quantum computer, let alone that they achieved anything at all by scaling it up.

The title of this article was a prime example of media quantum click-bait. The title boldly declared that “IBM is planning to build a quantum computer millions of times faster than a normal computer.” Now, that title was based on an extrapolation in the midst of the article where a quantum computer containing a mere 1000 qubits suddenly becomes the fastest computing machine imaginable. We’re very used to computers that contain gigabytes of RAM now, which is actually several billion on-off switches on the chip, so a mere 1,000 qubits seems like a really tiny number. This should be underwritten with the general concerns of the physics community that an array of 100 entangled qubits may exceed what’s physically possible… and it neglects that the difficulty of dealing with entangled systems increases exponentially with the number of qubits to be entangled. Scaling up normal bits doesn’t bump into the same difficulty. I don’t know if it’s physically possible or not, but I am aware that IBM’s declaration isn’t a major break-through so much as splashing around a bit of tech gism to keep the stockholders happy. All the article really said was that IBM has happily decided to hop on the quantum train because that seems to be the thing to do right now.

I really should understand that trolling around in the comments on such articles is a lost cause. There are so many misconceptions about quantum mechanics running around in popular culture that there’s almost no hope of finding the truth in such threads.

All this background gets us to what I was hoping to talk about. One big misconception that seemed to be somewhat common among commenters on this article is that two identical things in two places actually constitute only one thing magically in two places. This may stem from a conflation of what a wave function is versus what a qubit is and it may also be a big misunderstanding of the information that can be encoded in a qubit.

In a normal computer we all know that pretty much every calculation is built around representing numbers using binary. As everybody knows, a digital computer switch has two positions: we say that one position is 0 and the other is 1. An array of two digital on-off switches then can produce four distinct states: in binary, to represent the on-off settings of these states, we have 00, 01, 10 and 11. You could easily map those four settings to mean 1, 2, 3 and 4.

Suppose we switch now to talk about a quantum computer where the array is not bits anymore, but qubits. A very common qubit to talk about is the spin of an atom or an electron. This atom can be in two spin states: spin-up and spin-down. We could easily map the state spin-up to be 1, and call it ‘on,’ while spin-down is 0, or ‘off.’ For two qubits, we then get the states 00, 01, 10 and 11 that we had before, where we know about what states the bits are in, but we also can turn around and invoke entanglement. Entanglement is a situation where we create a wave function that contains multiple distinct particles at the same time such that the states those particles are in are interdependent on one another based upon what we can’t know about the system as a whole. Note, these two particles are separate objects, but they are both present in the wave function as separate objects. For two spin-up/spin-down type particles, this can give access to the so-called singlet and triplet states in addition to the normal binary states that the usual digital register can explore.

The quantum mechanics works like this. For the system of spin-up and spin-down, the usual way to look at this is in increments of spinning angular momentum: spin-up is a 1/2 unit of angular momentum pointed up while spin-down is a -1/2 unit of angular momentum, the same size but pointed the opposite direction because of the negative sign. For the entangled system of two such particles, you can get three different values of total z-axis angular momentum: 1, 0 and -1. Spin 1 has both spins pointing up, but not ‘observed,’ meaning that it is completely degenerate with the 11 state of the digital register since it can’t fall into anything but 11 when the wave function collapses. Spin -1 is the same way: both spins are down, meaning that they have 100% probability of dropping into 00. The spin 0 state, on the other hand, is kind of screwy, and this is where the extra information encoding space of quantum computing emerges. The 0 states could be the symmetric combination of spin-up with spin-down or the anti-symmetric combination of the same thing. Now, these are distinct states, meaning that the size of your register just expanded from (00, 01, 10 and 11) to (00, 01, 10, 11 plus anti-symmetric 10-01 and symmetric 10+01). So, at first count, the two qubit register appears to encode 6 possible values instead of just 4, although both spin 0 combinations are built entirely out of 01 and 10, so they may be better viewed as a different way of slicing the same four-state space than as genuinely extra states. I’m still trying to decide if the spin 1 and -1 states could be considered different from 11 and 00, but I don’t think they can since they lack the indeterminacy present in the different spin 0 states. I’m also somewhat uncertain whether you have two extra states to give a capacity in the register of 6 or just 5 since I’m not certain what the field has to say about the practicality of determining the phase constant between the two mixed spin-up/spin-down eigenstates, since this is the only way to determine the difference between the symmetric and anti-symmetric combinations of spin.
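One way to poke at the counting question is to write the states out as lists of amplitudes. This sketch assumes the standard picture in which a two-qubit state is a vector of four amplitudes, ordered here as (00, 01, 10, 11); the variable and function names are mine.

```python
import math

S = 1.0 / math.sqrt(2.0)

# Amplitudes listed in the order (00, 01, 10, 11).
b00 = [1.0, 0.0, 0.0, 0.0]
b01 = [0.0, 1.0, 0.0, 0.0]
b10 = [0.0, 0.0, 1.0, 0.0]
b11 = [0.0, 0.0, 0.0, 1.0]
sym = [0.0, S, S, 0.0]     # symmetric combination (10 + 01)/sqrt(2)
anti = [0.0, -S, S, 0.0]   # anti-symmetric combination (10 - 01)/sqrt(2)

def inner(u, v):
    """Inner product of two real amplitude vectors."""
    return sum(a * b for a, b in zip(u, v))

def combine(c1, u, c2, v):
    """Linear combination c1*u + c2*v."""
    return [c1 * a + c2 * b for a, b in zip(u, v)]

# The two spin 0 combinations are orthogonal to each other...
print("overlap of sym and anti:", inner(sym, anti))

# ...and adding or subtracting them hands back plain 10 and 01.
print("rebuilt 10:", combine(S, sym, S, anti))
print("rebuilt 01:", combine(S, sym, -S, anti))
```

In this picture (00, sym, anti, 11) is just another set of four mutually orthogonal states spanning the same space as (00, 01, 10, 11). That suggests the six named states are not six independent register values, though it says nothing about how practical it is to tell the combinations apart in a real device.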

As I was writing here, I realized also that I made a mistake myself in the interpretation of the qubit as I was writing my comment last night. At the very unentangled minimum, an array of two qubits contains the same number of states as an array of two normal bits. If I consider only the states possible by entangled qubits, without considering the phasing constant between 10+01 and 10-01, this gives only three states, or at most four states with the phase constant. I wrote my comment without including the four purely unentangled cases, giving fewer total states accessible to the device, or at most the same number.

Now, the thing that makes this incredibly special is that the number of extra states available to a register of qubits grows exponentially with the number of qubits present in the register. This means that a register of 10 qubits can encode many more numbers than a register of ten bits! Further, this means that fewer bits can be used to make much bigger calculations, which ultimately translates to a much faster computer if the speed of turning over the register is comparable to that of a more conventional computer –which is actually somewhat doubtful since a quantum computer would need to repeat calculations potentially many times in order to build up quantum statistics.
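To put rough numbers on that growth, here is a small sketch. It assumes the usual description of an n-qubit register by 2^n complex amplitudes, and it assumes 16 bytes per amplitude (two 64-bit floats) purely for illustration; the function names are hypothetical.

```python
BYTES_PER_AMPLITUDE = 16  # assumption: one complex number stored as two 64-bit floats

def amplitude_count(n_qubits):
    """Number of complex amplitudes needed to describe a general n-qubit state."""
    return 2 ** n_qubits

def state_vector_bytes(n_qubits):
    """Memory a classical computer would need just to store that description."""
    return amplitude_count(n_qubits) * BYTES_PER_AMPLITUDE

for n in (2, 10, 30, 50):
    print(n, amplitude_count(n), state_vector_bytes(n))
```

A conventional register of the same width holds just one of those 2^n values at any moment, which is where the difference in scale comes from.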

One of the big things that is limiting the size of quantum computers at this point is maintaining coherence. Maintaining coherence is very difficult and proving that the computer maintains all the entanglements that you create 100% of the time is exceptionally non-trivial. This comes back to the old cat-in-the-box difficulty of truly isolating the quantum system from the rest of the universe. And, it becomes more non-trivial the more qubits you include. I saw a seminar recently where the presenting professor was expressing optimism about creating a register of 100 Josephson junction type qubits, but was forced to admit that he didn’t know for sure whether it would work because of the difficulties that emerge in trying to maintain coherence across a register of that size.

I personally think it likely that we’ll have real digital quantum computers in the relatively near future, but I think the jury is still out as to exactly how powerful they’ll be when compared to conventional computers. There are simply too many variables yet which could influence the power and speed of a quantum computer in meaningful ways.

Coming back to my outrage at reading comments in that thread, I’m still at ‘dear god.’ Quantum computers do not work by teleportation: they do not have any way of magically putting a single object in multiple places. The structure of a wave function is defined simply by what you consider to be a collection of objects that are simultaneously isolated from the rest of the universe at a given time. A wave function quite easily spans many objects all at once since it is merely a statistical description of the disposition of that system as seen from the outside, and nothing more. It is not exactly a ‘thing’ in and of itself insomuch as collections of indescribably simple objects tend to behave in absolutely consistent ways among themselves. Where it becomes wave-like and weird is that we have definable limits to how precisely we can understand what’s going on at this basic level and that our inability to directly ‘interact’ with that level more or less assures that we can’t ever know everything about that level or how it behaves. Quantum mechanics follows from there. It really is all about what’s knowable; building a situation where certain things are selectively knowable is what it means to build a quantum computer.

That’s admittedly pretty weird if you stop and think about it, but not crazy or magical in that wide-eyed new agey smack-babbling way.

Beyond F=ma

Every college student taking that requisite physics class sees Newton’s second law. I saw it once even in a textbook for a martial art: Force equals mass times acceleration… the faster you go, the harder you hit! At least, that’s what they were saying, never mind that the usage wasn’t accurate. F=ma is one of those crazy simple equations that is so bite-sized that all of popular culture is able to comprehend it. Kind of.

Newton’s second law is, of course, one of three fundamental laws. You may even already know all of Newton’s laws without realizing that you do. The first law is “An object in motion remains in motion while an object at rest remains at rest,” which is really actually just a specialization of Newton’s second law where F = 0. Newton’s third law is the ever famous “For every action there is an equal and opposite reaction.” The three laws together are pretty much everything you need to get started on physics.

Much is made of Newton’s Laws in engineering. Mostly, you can comprehend how almost everything in the world around you operates based on a first approximation with Newton’s Laws. They are very important.

Now, as a Physicist, freshman physics is basically the last time you see Newton’s Laws. However important they are, physicists prefer to go other directions.

What? Physicists don’t use Newton’s Laws?!! Sacrilege!

You heard me right. Most of modern physics opens out beyond Newton. So, what do we use?

Believe it or not, in the time before computer games, TVs and social media, people needed to keep themselves entertained. While Newton invented his physics in the 1600s, there were a couple hundred years yet between his developments and the era of modern physics… two hundred years even before electrodynamics and thermodynamics became a thing. In that time, physicists were definitely keeping themselves entertained. They did this by reinventing the wheel repeatedly!

As a field, classical mechanics is filled with the arcane formalisms that gird the structure of modern physics. If you want to understand Quantum Mechanics, for instance, it did not emerge from a vacuum; it was birthed from all this development between Newtonian Mechanics and the Golden years of the 20th century. You can’t get away from it, in fact. People lauding Quantum Mechanics as somehow breaking Classical physics generally don’t know jack. Without the Classical physics, there would be no Quantum Mechanics.

For one particular thread, consider this. The Heisenberg Uncertainty Principle depends on operator commutation relations, or commutators. Commutators, then, emerged from an arcanum called Poisson brackets. Poisson brackets emerged from a structure called Hamiltonian formalism. And, Hamiltonian formalism is a modification of Lagrangian formalism. Lagrangian formalism, finally, is a calculus of variations readjustment of D’Alembert’s principle, which is a freaky little break from Newtonian physics. If you’ve done any real quantum, you’ll know that you can’t escape from the Hamiltonians without tripping over Lagrangians.
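For the curious, the top of that thread can be written in three lines. This is just the standard textbook form for a single position x and momentum p, in my own choice of notation, not a derivation.

```latex
% Classical: Poisson bracket of position and momentum
\{x, p\} = \frac{\partial x}{\partial x}\frac{\partial p}{\partial p}
         - \frac{\partial x}{\partial p}\frac{\partial p}{\partial x} = 1
% Quantum: replace the bracket by a commutator, \{A, B\} \to \frac{1}{i\hbar}[\hat{A}, \hat{B}]
[\hat{x}, \hat{p}] = i\hbar
% Consequence: the uncertainty relation
\Delta x \, \Delta p \ge \frac{\hbar}{2}
```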

This brings us to what I was hoping to talk about. Getting past Newton’s Laws into this unbounded realm of the great Beyond is a non-trivial intellectual break. When I called it a freaky little break, I’m not kidding. Everything beyond that point hangs together logically, but the stepping stone at the doorway is a particularly high one.

Perhaps the easiest way to see the depth of the jump is to see the philosophy of how mechanics is described on either side.

With Newton’s laws, the name of the game is to identify interactions between objects. An ‘interaction’ is another name for a force. If you lean back against a wall, there is an interaction between you and the wall, where you and the wall exert forces on one another. Each interaction corresponds to a pair of forces: the wall pushing against you and you pushing against the wall. Newton’s second law then states that if the sum of all forces acting on one object is not equal to zero, the object will undergo an acceleration in some direction, and the instantaneous forces then work together to describe the path the object will travel. The logical strategy is to find the forces and then calculate the accelerations.

On the far side of the jump is the lowest level of non-Newtonian mechanics, Lagrangian mechanics. You no longer work with forces at all and everything is expressed instead using energies. The problem proceeds by generating an energy laden mathematical entity called a ‘Lagrangian’ and then pushing that quantity through a differential equation called Lagrange’s equation. After constructing Lagrange’s equation, you gain expressions for position as a function of time. This tells you ultimately the same information that you gain by working Newton’s laws, which is that some object travels along a path through space.

Reading these two paragraphs side-by-side should give you a sense of the great difference between these two methods. Newtonian mechanics is typically very intuitive since it divides up the problem into objects and interactions while Lagrangian mechanics has an opaque, almost clinical quality that defies explanation. What is a Lagrangian? What is the point of Lagrange’s equation? This is not helped by the fact that Lagrangian formalism usually falls into generalized coordinates, which can hide some facets of coordinate position in favor of expedience. To the beginner, it feels like turning a crank on a gumball machine and hoping answers pop out.

There is a degree of menace to it while you’re learning it the first time. Lagrange’s equation is usually taught as coming out of an opaque branch of mathematics called the “Calculus of Variations.” How very officious! Calculus of variations is a special calculus where the objective of the mathematics is to optimize paths. This math is designed to answer questions like “What is the shortest path between two points?” Intuitively, you could say the shortest path is a line, but how do you know for sure? Well, you compare all the possible paths to each other and pick out the shortest among them. Calculus of variations does this by noting that for small variations from the optimal path, neighboring paths barely differ from each other in length (the difference shrinks faster than the variation itself). So, in the collection of all paths, those that are most alike tend to cluster around the one that is most optimal.
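The clustering claim can be checked numerically. Below is a minimal sketch, assuming a made-up family of paths y(x) = x + eps*sin(pi*x) joining (0, 0) to (1, 1), where eps = 0 is the straight line; the function name and the choice of bump are mine. If the straight line really is optimal, halving eps should cut the extra length by roughly a factor of four rather than two.

```python
import math

def path_length(eps, n=2000):
    """Length of y(x) = x + eps*sin(pi*x) from (0, 0) to (1, 1), summed over short chords."""
    total = 0.0
    x_prev, y_prev = 0.0, 0.0
    for i in range(1, n + 1):
        x = i / n
        y = x + eps * math.sin(math.pi * x)
        total += math.hypot(x - x_prev, y - y_prev)
        x_prev, y_prev = x, y
    return total

straight = path_length(0.0)
for eps in (0.2, 0.1, 0.05):
    # Extra length of the wiggled path compared with the straight line.
    print(eps, path_length(eps) - straight)
```

The extra length scaling with eps squared is the numerical face of "the first variation vanishes at the optimum."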

This is a very weird idea. Why should the density of similar paths matter? You can have an infinite number of possible paths! What is variation from the optimal path? It may seem like a rhetorical question, but this is the differential that you end up working with.

A recasting of the variational problem can express one place where this kind of calculus was extremely successful.


Roller coasters!

Under action of gravity where you have no sliding friction, what is the fastest path traveling from point A to point B where point B does not lie directly beneath point A? This is the Brachistochrone problem. Calculus of variations is built to handle this! The strategy is to optimize a path of undetermined length which gives the shortest time of travel between two points. As it turns out, by happy mathematical contrivance, the appropriate path satisfies Lagrange’s equation… which is why Lagrange’s equation is important. The optimal path here is called the curve of quickest descent.
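A quick comparison makes the point. This sketch assumes a frictionless bead starting from rest, g = 9.81 m/s^2, and a cycloid written as x = R(t - sin t), y = R(1 - cos t) with y measured downward; the function names and the chosen endpoints are mine. Along that cycloid the travel time works out to t_end * sqrt(R/g), which is compared against a straight ramp between the same two points.

```python
import math

G = 9.81  # m/s^2, assumed

def straight_ramp_time(dx, dy):
    """Time to slide from rest down a frictionless straight ramp covering
    horizontal distance dx while dropping a height dy."""
    length = math.hypot(dx, dy)
    accel = G * dy / length  # component of gravity along the ramp
    return math.sqrt(2.0 * length / accel)

def cycloid_time(radius, theta_end):
    """Time to slide from rest along the cycloid x = R(t - sin t), y = R(1 - cos t)
    from parameter t = 0 to t = theta_end."""
    return theta_end * math.sqrt(radius / G)

R = 1.0
theta_end = math.pi                  # lowest point of the cycloid arch
dx = R * (theta_end - math.sin(theta_end))
dy = R * (1.0 - math.cos(theta_end))

print("straight ramp:", straight_ramp_time(dx, dy))
print("cycloid:      ", cycloid_time(R, theta_end))
```

For these endpoints the cycloid should come in at roughly 1.00 s against roughly 1.19 s for the ramp, even though the cycloid is the longer track.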

Now, the jump to Lagrangian mechanics is but a hop! It turns out that if you throw a mathematical golden cow called a “Lagrangian” into Lagrange’s equation, the optimal path that pops out is the physical trajectory that a given system described by the Lagrangian tends to follow in reality –and when I say trajectory in the sense of Lagrange’s equation, the ‘trajectory’ is delineated by position or merely the coordinate state of the system as a function of time. If you can express the system of a satellite over the Earth in terms of a Lagrangian, Lagrange’s equation produces the orbits.

This is the very top of a deep physical idea called the “Principle of Least Action.”

In physics, adding up the Lagrangian at every point along some path in time gives a quantity called, most appropriately, “the Action.” The system could conceivably take any possible path among an infinite number of different paths, but physical systems follow paths that minimize the Action. If you find the path that gives the smallest Action, you find the path the system takes.
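Here is a minimal numerical sketch of that statement for a ball thrown straight up and caught again. The assumptions are mine: mass 1 kg, g = 9.81 m/s^2, a 2 second flight, and trial paths built by adding a bump eps*sin(pi*t/T) to the real trajectory so that the endpoints stay fixed. The Action is approximated by summing (kinetic minus potential energy) times a small time step.

```python
import math

M, G, T = 1.0, 9.81, 2.0  # mass, gravity, flight time (all assumed values)

def height(t, eps):
    """Real trajectory y = (g/2) t (T - t), plus a bump that vanishes at both ends."""
    return 0.5 * G * t * (T - t) + eps * math.sin(math.pi * t / T)

def action(eps, n=4000):
    """Discretized Action: sum of (kinetic - potential) * dt along the trial path."""
    dt = T / n
    total = 0.0
    for i in range(n):
        y0 = height(i * dt, eps)
        y1 = height((i + 1) * dt, eps)
        velocity = (y1 - y0) / dt
        y_mid = 0.5 * (y0 + y1)
        total += (0.5 * M * velocity ** 2 - M * G * y_mid) * dt
    return total

for eps in (-0.5, -0.1, 0.0, 0.1, 0.5):
    print(eps, action(eps))
```

The unperturbed path (eps = 0) should give the smallest number in the list, with the Action rising on either side of it.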

As an aside to see where this reasoning ultimately leads, Quantum Mechanics finds that while objects tend to follow paths that minimize the Action, they actually try to take every conceivable path… but that the paths which don’t tend to minimize the Action rapidly cancel each other out because their phases vary so wildly from one another. In a way, the minimum Action path does not cancel out from a family of nearby paths since their phases are all similar. From this, a quantum mechanical particle can seem to follow two paths of equal Action at the same time. In a very real way, the weirdness of quantum mechanics emerges directly because of path integral formalism.

All of this, all of the ability to know this, starts with the jump to Lagrangian formalism.

In that, it always bothered me: why the Lagrangian? The path optimization itself makes sense, but why specifically does the Lagrangian matter? Take this one quantity out of nowhere and throw it into a differential equation that you’ve rationalized as ‘minimizing action’ and suddenly you have a system of mechanics that is equal to Newtonian mechanics, but somehow completely different from it! Why does the Lagrangian work? Through my schooling, I’ve seen the derivation of Lagrange’s equation from path integral optimization more than once, but the spark of ‘why optimize using the Lagrangian’ always eluded me. Early on, I didn’t even comprehend enough about the physics to appreciate that the choice of the Lagrangian is usually not well motivated.

So, what exactly is the Lagrangian?

The Lagrangian is defined as the difference between kinetic and potential energy. Kinetic energy is the description that an object is moving while potential energy is the expression that by having a particular location in space, the object has the capacity to gain a certain motion (say by falling from the top of a building). The formalism can be modified to work where energy is not conserved, but typically physicists are interested in cases where it is. Energies emerge in Newtonian mechanics as an adaptation which allows descriptions of motion to be detached from progression through time, where the first version of energy the freshman physicist usually encounters is “Work.” Work is the force applied along a displacement times the length of that displacement. It’s just a product of length times force. And, there is no duration over which the displacement is known to take place, meaning no velocity or acceleration. Potential energy and kinetic energy come next, where kinetic energy is simply a way to connect physical velocity of the object to the work that has been done on it and potential energy is a way to connect a physical situation, typically in terms of a conservative field, to how much work that field can enact on a given object.

When I say ‘conservative,’ the best example is usually the gravitational field that you see under everyday circumstances. When you lift your foot to take a step, you do a certain amount of work against gravity to pick it up… when you set your foot back down, gravity does an equal amount of work on your foot pulling it down. Energy was invested into potential energy picking your foot up, which was then released again as you put your foot back down. And, since gravity worked on your foot pulling it down, your foot will have a kinetic energy equal to the potential energy from how high you raised it before it strikes the ground again and stops moving (provided you aren’t using your muscles to slow its descent). It becomes really mind-bending to consider that gravity did work on your foot while you lifted it up, also, but that your muscles did work to counteract gravity’s work so that your foot could rise. As a quantity, you can chase energy around in this way. In a system like a spring or a pendulum, there are minimal dissipative interactions, meaning that after you start the system moving, it can trade energy back and forth from potential to kinetic forms pretty much without limit so that the sum of all energies never changes, which is what we call ‘conservative.’

Energy, as it turns out, is one of the chief tokens of all physics. In fields like thermodynamics, which are considered classical but not necessarily Lagrangian, you only rarely see force directly… usually force is hidden behind pressure. The idea that the quantity of energy can function as a gearbox for attaching interactions to one another conceals Newton’s laws, making it possible to talk about interactions without knowing exactly what they are. ‘Heat of combustion’ is a black-box of energy that tells you a way to connect the burning of a fuel to how much work can be derived from the pressure produced by that fuel’s combustion. On one side, you don’t need to know what combustion is, you can tell that it will deliver a stroke of so much energy when the piston compresses, while on the other side, you don’t need to know about the engine, just that you have a process that will suck away some of the heat of your fire to do… something.

Because of the importance of energy, two quantities that are of obvious potential utility are 1.) the difference between kinetic and potential energy and 2.) the sum of kinetic and potential energy. The first quantity is the Lagrangian, while the second is the so-called Hamiltonian.
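In symbols, with T for kinetic energy and V for potential energy (my notation), the two quantities and Lagrange’s equation look like the following. The last line is a quick check, assuming a single particle of mass m moving along x in a potential V(x) that does not depend on velocity, that pushing this Lagrangian through Lagrange’s equation hands back Newton’s second law.

```latex
L = T - V, \qquad H = T + V
% Lagrange's equation for a coordinate x
\frac{d}{dt}\frac{\partial L}{\partial \dot{x}} - \frac{\partial L}{\partial x} = 0
% One particle: T = \tfrac{1}{2} m \dot{x}^2, \; V = V(x)
\frac{d}{dt}\left(m\dot{x}\right) + \frac{dV}{dx} = 0
\quad\Longrightarrow\quad
m\ddot{x} = -\frac{dV}{dx} = F
```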

There is some clear motivation here why you would want to explore using the quantity of the Lagrangian in some way. Quantities that can conserve, like energy and momentum, are convenient ways of characterizing motion because they can tell you about what to expect from the disposition of your system without huge effort. But for all of these manipulations, the clear connection between F=ma and Lagrange’s equation is still a subtle leap.

The final necessary connection to get from F=ma to the Lagrangian is D’Alembert’s Principle. The principle states simply this: for a system in equilibrium, (rather, while the system isn’t static, it’s not taking in more or less energy than it’s losing) perturbative forces ultimately do no net work. So, all interactions internal to a system in equilibrium can’t shift it away from equilibrium. This statement turns out to be another variational principle.

There is a way to drop F = ma into D’Alembert’s principle and directly produce that the quantity which should be optimized in Lagrange’s equation is the Lagrangian! May not seem like much, but it turns out to be a convoluted mathematical thread… and so, Lagrangian formalism directly follows as a consequence of a special case of Newtonian formalism.
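The skeleton of that thread, as it is usually presented in classical mechanics textbooks, runs roughly as follows. This is only an outline with the hard algebra skipped, and it assumes the applied forces come from a potential V that does not depend on velocities; the symbols (generalized coordinates q_j, generalized forces Q_j) are the conventional ones rather than anything defined earlier in this post.

```latex
% D'Alembert's principle: virtual displacements \delta\mathbf{r}_i do no net work
% (\mathbf{F}_i are the applied forces; constraint forces drop out)
\sum_i \left(\mathbf{F}_i - m_i \ddot{\mathbf{r}}_i\right)\cdot\delta\mathbf{r}_i = 0
% Rewritten in generalized coordinates q_j with generalized forces Q_j
\frac{d}{dt}\frac{\partial T}{\partial \dot{q}_j} - \frac{\partial T}{\partial q_j} = Q_j
% If the forces come from a velocity-independent potential V(q)
Q_j = -\frac{\partial V}{\partial q_j}
\quad\Longrightarrow\quad
\frac{d}{dt}\frac{\partial (T - V)}{\partial \dot{q}_j} - \frac{\partial (T - V)}{\partial q_j} = 0
```

The combination T - V falls out on its own in the last step, which is as close as I can get to an answer for why it is the Lagrangian and not something else.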

As a parting shot, what does all this path integral, variational stuff mean? The Principle of Least Action has really profound implications on the functioning of reality as a whole. In a way, classical physics observes that reality tends to follow the lazy path: a line is the shortest path between two points and reality operates in such a way that at macroscopic scales the world wants to travel in the equivalent of ‘straight lines.’ The world appears to be lazy. At the fundamental quantum mechanical scale, it thinks hard about the peculiar paths and even seems to try them out, but those efforts counteract one another such that only the lazy paths win.

Reality is fundamentally slovenly, and when it tries not to be, it’s self-defeating. Maybe not the best message to end on, but it gives a good reason to spend Sunday afternoon lying in a hammock.

Nonlocality and Simplicity

I just read an article called “How quantum mechanics could be even weirder” in the Atlantic.

The article is actually relatively good in explaining some of how quantum mechanics actually works in terms that are appropriate to laymen.

Neglecting almost everything about ‘super-quantum,’ there is one particular element in this article which I feel somewhat compelled to respond to. It relates to the following passages

But in 1935, Einstein and two younger colleagues unwittingly stumbled upon what looks like the strangest quantum property of all, by showing that, according to quantum mechanics, two particles can be placed in a state in which making an observation on one of them immediately affects the state of the other—even if they’re allowed to travel light years apart before measuring one of them. Two such particles are said to be entangled, and this apparent instantaneous “action at a distance” is an example of quantum nonlocality.

Erwin Schrödinger, who invented the quantum wave function, discerned at once that what later became known as nonlocality is the central feature of quantum mechanics, the thing that makes it so different from classical physics. Yet it didn’t seem to make sense, which is why it vexed Einstein, who had shown conclusively in the theory of special relativity that no signal can travel faster than light. How, then, were entangled particles apparently able to do it?

This is outlining the appearance of entanglement. The way that it’s detailed here, the implication is that there’s a signal being broadcast between the entangled particles and that it breaks the limits of speed imposed by relativity. This is a real argument that is still going on, and not being an expert, I can’t claim that I’m at the level of the discussion. On the other hand, I feel fairly strongly that it can’t be considered a ‘communication.’ I’ll try to rationalize my stance below.

One thing that is very true is that if you think a bit about the scope of the topic and the simultaneous requirements of the physics in order to assure the validity of quantum mechanics, the entanglement phenomenon becomes less metaphysical overall.

Correcting several common misapprehensions of the physics shrinks the loopiness from gaga bat-shit Deepak Chopra down to real quantum size.

The first tripping stone is highlighted by Schrodinger’s Cat, as I’ve mentioned previously. In Schrodinger’s Cat, the way the thought experiment is most frequently constructed, the idea of quantum superposition is imposed on states of “Life” and “Death.” A quantum mechanical event creates a superposition of Life and Death that is not resolved until the box is opened and one state is discovered to dominate. This is flawed because Life and Death are not eigenstates! I’ve said it elsewhere and I’ll repeat it as many times as necessary. There are plenty of brain-dead people whose bodies are still alive. The surface of your skin is all dead, but the basement layer is alive. Many of your blood cells live only days to months, and then die… but you do not! Death and Life in the biological sense are very complicated states of being that require a huge number of parameters to define. This is in contrast with an eigenstate which literally is defined by requiring only one number to describe it, the eigenvalue. If you know the eigenvalue of a nondegenerate eigenstate, you know literally everything there is to know about the eigenstate –end of story! I won’t talk about degeneracy because that muddies the water without actually violating the point.

Quantum mechanical things are objects stripped down to such a degree of nakedness that they are simple in a very profound way. For a single quantum mechanical degree of freedom, if you have an eigenvalue to define it, there is nothing else to know about that state. One number tells you everything! For a half-spin magnetic moment, it can exist in exactly two possible eigenstates, either parallel or antiparallel. Those two states can be used together to describe everything that spin can ever do. By the nature of the object, you can’t find it in any other disposition, except parallel or antiparallel… it won’t wander off into some undefined other state because its entire reality is to be pointing in some direction with respect to an external magnetic field… meaning that it can only ever be found as some combination of the two basic eigenstates. There is not another state of being for it. There is no possible “comatose and brain-dead but still breathing” other state.
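Written out for the half-spin case, that simplicity looks like this. The notation is the standard one for spin measured along the field direction (taken as z), and the coefficients alpha and beta are just labels I am introducing for the two amplitudes.

```latex
% The only two eigenstates, each fixed entirely by its eigenvalue
\hat{S}_z\,|\uparrow\rangle = +\tfrac{\hbar}{2}\,|\uparrow\rangle, \qquad
\hat{S}_z\,|\downarrow\rangle = -\tfrac{\hbar}{2}\,|\downarrow\rangle
% Anything else the spin can do is a combination of those two
|\psi\rangle = \alpha\,|\uparrow\rangle + \beta\,|\downarrow\rangle, \qquad
|\alpha|^2 + |\beta|^2 = 1
```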

This is what it means to be simple. We humans do not live where we can ever witness things that are that simple.

The second great tripping stone people never quite seem to understand about quantum mechanics is exactly what it means to have the system ‘enclosed by a box’ prior to observation. In Schrodinger’s Cat, your intuition is led to think that we’re talking about a paper box closed by packing tape and that the obstruction of our line of vision by the box lid is enough to constitute “closed.” This is not the case… quantum mechanical entities are so infinitesimal or so low in energy that an ‘observation’ usually means nothing more than bouncing a single corpuscle of light off of them. An upshot of this is that as far as the object is concerned, the ‘observer’ is not really different from the rest of the universe. ‘Closed’ in the sense of a quantum mechanical ‘box’ is the state where information is not being exchanged between the rest of the universe and our quantum mechanical system.

Now, that’s closed!

If a simple system which is so simple that it can’t occupy a huge menu of states is allowed to evolve where it is not in contact with the rest of the universe, can you expect to see anything in that system different from what’s already there? One single number is all that’s needed to define what the system is doing behind that closed door!

The third great tripping stone is decoherence. Decoherence is when the universe slips between the observer and the quantum system and talks to it behind our backs. Decoherence is why quantum computers are difficult to build out of entangled quantum states. So the universe fires a photon into or pulls a photon out of our quantum mechanical system, and suddenly the system doesn’t give the entangled answers we thought that it should anymore. Naturally: information moved around. That is what the universe does.

With these several realizations, while it may still not be very intuitive, the magic of entanglement is tempered by the limits of the observation. You will not find a way to argue that ‘people’ are entangled, for instance, because they lack this degree of utter simplicity and identicalness.

One example of an entangled state is a spin singlet state with angular momentum equal to zero. This is simply two spin one-half systems added together in such a way that their spins cancel each other out. Preparing the state gives you two spins that are not merely in superposition but are entangled together by the spin zero singlet. You could take these objects and separate them from one another and then examine them apart. If the universe has not caused the entanglement to decohere, these spins are so simple and identical that they can both only occupy expected eigenstates. They evolve in exactly the same manner since they are identical, but the overarching requirement –if decoherence has not taken place and scrambled things up– is that they must continue to be a net spin-zero state. Whatever else they do, they can’t migrate away from the prepared state behind closed doors simply because entropy here is meaningless. If information is not exchanged externally, any communication by photons between the members of the singlet can only ever still produce the spin singlet.
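For reference, the singlet in the usual notation, where the first arrow is one spin and the second arrow is its partner, both measured along the same axis:

```latex
|S\rangle = \frac{1}{\sqrt{2}}\left(|\uparrow\downarrow\rangle - |\downarrow\uparrow\rangle\right)
% Measurement probabilities along a common axis
P(\uparrow\downarrow) = P(\downarrow\uparrow) = \tfrac{1}{2}, \qquad
P(\uparrow\uparrow) = P(\downarrow\downarrow) = 0
```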

If you then take one of those spins and determine its eigenstate, you find that it is either the parallel or antiparallel state. Entanglement then requires the partner, separated from it no matter how far, to be in the opposite state. They can’t evolve away from that.

What makes this so brain bending is that the Schrodinger equation can tell you exactly how the entangled state evolves as long as the box remains unopened (that is that the universe has not traded information with the quantum mechanical degree of freedom). There is some point in time when you have a high probability of finding one spin ‘up’ while the other is ‘down,’ and the probability switches back and forth over time as the wave function evolves. When you make the observation to find that one spin is up, the probability distribution for the partner ceases to change and it always ends up being down. After you bounce a photon off of it, that’s it, it’s done… the probability distribution for the ‘down’ particle only ever ends up ‘down.’

This is what they mean by ‘non-locality.’ That you can separate the entangled states by a great distance and still see this effect of where one entangled spin ‘knows’ that the other has decided to be in a particular state. ‘Knowledge’ of the collapse of the state moves between the spins faster than light can travel, apparently.

From this arise heady ideas that maybe this can be the basis of a faster-than-light communication system: like you can tap out Morse code by flipping entangled spins like a light switch.

Still, what information are we asking for?

The fundamental problem is that when you make the entangled state, you can’t set a phase which can tell you which partner starts out ‘up’ and which starts out ‘down.’ They are in a superposition of both states and the jig is up if you stop to see which is which. One is up and one is down in order to be the singlet state, but you can’t set which. You make a couplet that you can’t look at, by definition! The wave function evolves without there being any way of knowing. When you stop and look at them, you get one up and one down, but no way of being able to say “that one was supposed to be ‘up’ and the other ‘down.'”

You can argue that they started out exactly as they ended up on only a single trial. As I understand it, the only way to know about entanglement is literally by running the experiment enough times to know about the statistical distributions of the outcome, that ‘up’ and ‘down’ are correlated. If you’re separated by light years, one guy finds that his partner particle is ‘up’… he can’t know that the other guy looked at his particle three days ago to find ‘down’ and was expecting the answer in the other party’s hands to be ‘up.’ So much for flipping a spin like a switch and sending a message! When was it that the identities of ‘up’ and ‘down’ were even picked?

But these things are very simple, uncomplicated things! If neither party does anything to disrupt the closed box you started out with, you can argue that the choice of which particle ends with which spin was decided before they were ever separated from one another and that they have no need after the separation to be anything but very identical and so simple that you can’t find them in anything but two possible states. No ‘communication’ was necessary and the outcome observed was preordained to be observed. You didn’t look and can’t look, so you can’t know if they always would have given the same answer that they ultimately give. If the universe bumps into them before you can look, you scream ‘decoherence’ and any information preserved from the initial entanglement becomes unknowable. Without many trials, how do you ever even know with one glance if the particles decohered before you could look, or if a particle was still in coherence? That’s the issue with simple things that are in a probability distribution. Once you build up statistics, you see evidence that spins are correlated to a degree that requires an answer like quantum entanglement, but it’s hard to look at them beforehand and know what state they’re in –nay: by definition, it’s impossible. The entangled state gives you no way of knowing which is up or down, and that’s the point!

As such, being unable to pick a starting phase and biasing that one guy has ‘up’ and the other ‘down,’ there is no way to transmit information by looking –or not– at set times.
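The statistics behind that statement can be sketched in a few lines. This assumes the standard textbook result for a spin singlet, that two parties measuring along directions separated by an angle theta get outcomes a and b (each +1 or -1) with probability (1 - a*b*cos(theta))/4; the names Alice and Bob and the function names are just the usual illustrative labels, not anything from the article.

```python
import math

def joint_probability(a, b, angle_alice, angle_bob):
    """Singlet-state probability that Alice sees a and Bob sees b (each +1 or -1)
    when measuring along directions angle_alice and angle_bob (radians, same plane)."""
    return 0.25 * (1.0 - a * b * math.cos(angle_alice - angle_bob))

def bob_marginal(b, angle_alice, angle_bob):
    """What Bob sees on his own, summing over Alice's outcomes he cannot know."""
    return sum(joint_probability(a, b, angle_alice, angle_bob) for a in (+1, -1))

# Same axis: the two results are never equal.
print(joint_probability(+1, +1, 0.0, 0.0), joint_probability(+1, -1, 0.0, 0.0))

# Whatever axis Alice picks, Bob's own odds of 'up' do not move.
for angle_alice in (0.0, 0.7, math.pi / 2, 2.5):
    print(angle_alice, bob_marginal(+1, angle_alice, 0.3))
```

Bob’s column stays at one half no matter what Alice does, so nothing she chooses shows up on his side until the two of them compare notes by ordinary means.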

Since I’m not an experimentalist that works with entangled states, there is some chance that I’ve misunderstood something. In the middle of writing this post, I trolled around looking for information about how entanglement is examined in the lab. As far as I could tell, the information about entanglement is based upon statistics for the correlation of entangled states with each other. The statistics ultimately tell the story.

I won’t say that it isn’t magical. But, I feel that once you know the reality, the wide-eyed extravagance of articles like the one that spawned this post seems ignorant. It’s hard not to crawl through the comments section screaming at people “No, no, no! Dear God, no!”

So then, to take the bull by the horns, I made an earlier statement that I should follow up on explicitly. Why doesn’t entanglement violate relativity? The conventional answer is that the information about knowing of the wave function collapse is useless! The guy who looked first can’t tell the guy holding the other particle that he can look now. Even if the particles know that the wavefunction has collapsed, the parties holding those particles can’t be sure whether or not the state collapsed or decohered. Since the collapse can’t carry information from one party to the other, it doesn’t break relativity. That’s the standard physicist party line.
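The no-signalling argument can be checked with a few lines of arithmetic. What follows is a minimal sketch, assuming the textbook joint probabilities for a spin singlet measured along two analyzer angles; the function names are mine and purely illustrative. Whatever angle the far party chooses, the near party's odds of seeing 'up' stay at 50/50, so nothing about the far party's choice can be read off locally.

```python
import math

def singlet_joint_probs(angle_a, angle_b):
    """Textbook joint outcome probabilities for a spin singlet measured
    along analyzer angles angle_a and angle_b (radians).
    Keys are (result_a, result_b) with +1 = 'up' and -1 = 'down'."""
    half = (angle_a - angle_b) / 2.0
    same = 0.5 * math.sin(half) ** 2      # both up, or both down
    opposite = 0.5 * math.cos(half) ** 2  # one up, one down
    return {(+1, +1): same, (-1, -1): same,
            (+1, -1): opposite, (-1, +1): opposite}

def marginal_up_a(angle_a, angle_b):
    """Probability that party A sees 'up', summed over B's outcomes."""
    p = singlet_joint_probs(angle_a, angle_b)
    return p[(+1, +1)] + p[(+1, -1)]

# A's odds do not depend on where B points the analyzer.
for angle_b in (0.0, 0.3, 1.0, math.pi / 2, math.pi):
    print(f"B at {angle_b:.2f} rad -> P(A sees up) = {marginal_up_a(0.0, angle_b):.3f}")
```

The correlation only shows up when the two lists of results are brought together and compared, which requires an ordinary, slower-than-light channel.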

My own personal feeling is that it’s actually a bit stiffer than that. Once the collapse occurs, the particles in hand seem as if they’ve _always_ made the choice you finally learn them to contain. They don’t talk: it’s just the concrete substrate of reality determined before they’re separated. The on-line world talks about this in two ways: either information can be written backward in time (yeah, they do actually say that) or reality is so deterministic as to eliminate all free will: as if the experiment you chose to carry out is foreordained at the time when the spin singlet is created, meaning that the particles know what answer they’ll give before you know that you’ve been predestined to ask.

This is not necessarily a favored interpretation. People don’t like the idea that free will doesn’t exist. I personally am not sure why it matters: life and death aren’t eigenstates, so why must free will exist? Was it necessary that your mind choose to be associated with your anus or tied to a substrate in the form of your brain? How many fundamental things about your existence do you inherit by birth which you don’t control? Would it really matter in your life if someone told you that you weren’t actually choosing any of it when there’s no way at all to tell the difference from if you were? Does this mean that Physics says that it can’t predict for you what direction your life will go, but that your path was inevitable before you were born?

At some level one must simply shrug. What I’m suggesting is not a nihilistic stance or that people should just give up because they have no say… I’m suggesting that, beyond the scope of your own life and existence, you are not in a position to make any claims about your own importance in the grand scheme of the universe. The whirr and tick of reality is not in human hands.

If you wish to know more about entanglement, the EPR paradox and this stuff about non-locality and realism, I would recommend learning something about Bell’s inequality.
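As a taste of what Bell's inequality says, here is a small sketch using the CHSH form of it (my choice of variant; the angles are one standard set, not anything special to this post). It assumes the textbook singlet correlation E(a, b) = -cos(a - b). Any local "decided in advance" model keeps the CHSH combination within 2 in magnitude, while the quantum prediction exceeds that.

```python
import math

def singlet_correlation(angle_a, angle_b):
    """Quantum prediction for the spin-spin correlation of a singlet pair."""
    return -math.cos(angle_a - angle_b)

def chsh(a, a_prime, b, b_prime):
    """CHSH combination of four correlations; local hidden-variable
    models are bounded by |S| <= 2."""
    e = singlet_correlation
    return e(a, b) - e(a, b_prime) + e(a_prime, b) + e(a_prime, b_prime)

# One standard choice of analyzer angles that maximizes the violation.
s_value = chsh(0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)
print(abs(s_value))  # compare against the classical bound of 2
```

The magnitude comes out to 2 times the square root of 2, and this kind of statistical excess is what laboratory tests look for.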

Hydrogen atom radial equation

In between the Sakurai problems, I decided to tackle a small problem I set for myself.

The Sakurai quantum mechanics book is directed at about graduate student level, meaning that it explicitly overlooks problems that it deems too ‘undergraduate.’ When I started into the next problem in the chapter, which deals with the Wigner-Eckart relation, I decided to direct myself at a ‘lower level’ problem that demands practice from time to time. I worked in early January solving the angular component of the hydrogen atom by deriving the spherical harmonics and much of my play time since has been devoted to angular and angular momentum type problems. So, I decided it would be worth switching up a little and solving the radial portion of the hydrogen atom electron central force problem.

One of my teachers once suggested that deriving the hydrogen atom was a task that any devoted physicist should play with every other year or so. Why not, I figured; the radial solution is actually a bit more mind boggling to me than the angular parts because it requires some substitutions that are not very intuitive.

The hydrogen atom problem is a classic problem mainly because it’s one of the last exactly solvable quantum mechanics problems you ever encounter. After the hydrogen atom, the water gets deeper and the field starts to focus on tools that give insight without actually giving exact answers. The only atomic system that is exactly solvable is the hydrogen atom… even helium, with just one more electron, demands perturbation in some way. It isn’t exactly crippling to the field because the solutions to all the other atoms are basically variations of the hydrogen atom and all, with some adjustment, have hydrogenic geometry or are superpositions of hydrogen-like functions that are only modified to the extent necessary to make the energy levels match. Solving the hydrogen atom ends up giving profound insight to the structure of the periodic table of the elements, even if it doesn’t actually solve for all the atoms.

As implied above, I decided to do a simplified version of this problem, focusing only on the radial component. The work I did on the angular momentum eigenstates was not in context of the hydrogen electron wave function, but can be inserted in a neat cassette to avoid much of the brute labor of the hydrogen atom problem. The only additional work needed is solving the radial equation.

A starting point here is understanding spherical geometry as mediated by spherical polar coordinates.

A hydrogen atom, as we all know from the hard work of a legion of physicists coming into the turn of the century, is a combination of a single proton with a single electron. The proton has one indivisible positive charge while the electron has one indivisible negative charge. These two charges attract each other and the proton, being a couple thousand times more massive, pulls the electron to it. The electron falls in until the kinetic energy it gains forces it to have enough momentum to be unlocalized to a certain extent, as required by quantum mechanical uncertainty. The system might then radiate photons as the electron sorts itself into a stable orbiting state. The resting combination of proton and electron has neutral charge with the electron ‘distributed’ around the proton in a sort of cloud as determined by its wave-like properties.

The first approximation of the hydrogen atom is a structure called the Bohr model, proposed by Niels Bohr in 1913. The Bohr model features classical orbits for the electron around the nucleus, much like the moon circles the Earth.


This image, from, is a crude example of a Bohr atom. The Bohr atom is perhaps the most common image of atoms in popular culture, even if it isn’t correct. Note that the creators of this cartoon didn’t have the wherewithal to make a ‘right’ atom, giving the nucleus four plus charges and the shell three minus… this would be a positively charged ion of Beryllium. Further, the electrons are not stacked into a decent representation for the actual structure: cyclic orbitals would be P-orbitals or above, where Beryllium has only S-orbitals for its ground state, which possess either no orbital angular momentum, or angular momentum without any defined direction. But, it’s a popular cartoon. Hard to sweat the small stuff.

The Bohr model grew from the notion of the photon as a discrete particle, where Bohr postulated that the only allowed stable orbits for the electron circling the nucleus are at integer quantities of angular momentum delivered by single photons… as quantized by Planck’s constant. ‘Quantized’ is a word invoked to mean ‘discrete quantities’ and comes back to that pesky little feature Deepak Chopra always ignores: the first thing we ever knew about quantum mechanics was Planck’s constant –and freaking hell is Planck’s constant small! ‘Quantization’ is the act of parsing into discrete ‘quantized’ states and is the word root which loaned the physics field its name: Quantum Mechanics. ‘Quantum Mechanics’ means ‘the mechanics of quantization.’

Quantum mechanics, as it has evolved, approaches problems like the hydrogen atom using descriptions of energy. In the classical sense, an electron orbiting a proton has some energy describing its kinetic motion, its kinetic energy, and some additional energy describing the interaction between the two masses, usually as a potential source of more kinetic energy, called a potential energy. If nothing interacts from the outside, the closed system has a non-varying total energy which is the sum of the kinetic and potential energies. Quantum mechanics evolved these ideas away from their original roots using a version of Hamiltonian formalism. Hamiltonian formalism, as it appears in quantum, is a way to merely sum up kinetic and potential energies as a function of position and momentum –this becomes complicated in Quantum because of the restriction that position and momentum cannot be simultaneously known to arbitrary precision. But, Schrodinger’s equation actually just boils down to a statement of kinetic energy plus potential energy.

Here is a quick demonstration of how to get from a statement of total energy to the Schrodinger equation:

5-12-16 schrodinger

After ‘therefore,’ I’ve simply multiplied in from the right with a wave function to make this an operator equation. The first term on the left is kinetic energy in terms of momentum while the second term is the Gaussian CGS form of potential energy for the electrical central force problem (for Gaussian CGS, the constants of permittivity and permeability are swept under the rug by collecting them into the speed of light and usually a constant of light speed appears with magnetic fields… here, the charge is in statcoulombs, which take coulombs and wrap in a scaling constant of 4*Pi.) When you convert momentum into its position space representation, you get Schrodinger’s time independent equation for an electron under a central force potential. The potential, which depends on the positional expression of ‘radius,’ has a negative sign to make it an attractive force, much like gravity.
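In symbols, the steps just described would typically read as follows. This is a sketch in standard textbook notation (Gaussian CGS, with e the electron charge in statcoulombs and m the mass), and it may differ in detail from the handwritten version.

```latex
\begin{align}
E &= \frac{p^{2}}{2m} - \frac{e^{2}}{r} \\
E\,\psi &= \frac{\hat{p}^{2}}{2m}\,\psi - \frac{e^{2}}{r}\,\psi,
  \qquad \hat{p} \to -i\hbar\nabla \\
-\frac{\hbar^{2}}{2m}\nabla^{2}\psi - \frac{e^{2}}{r}\,\psi &= E\,\psi
\end{align}
```

The last line is the time independent Schrodinger equation for the attractive central force potential.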

Now, the interaction between a proton and an electron is a central force interaction, which means that the radius term could actually be pointed in any direction. Radius would be some complicated combination of x, y and z. But, because the central force problem is spherically symmetric, if we could move out of Cartesian coordinates and into spherical polar, we get a huge simplification of the math. The inverted triangle that I wrote for the representation of momentum is the del operator, and once squared it becomes a three dimensional operator called the Laplace operator, or ‘double del.’ Picking the form of del ends up casting the dimensional symmetry of the differential equation… as written above, it could be Cartesian or spherical polar or cylindrical, or anything else.

A small exercise I sometimes put myself through is defining the structure of del. The easiest way that I know to do this is to pull apart the divergence theorem of vector calculus in spherical polar geometry, which means defining a differential volume and differential surfaces.

5-12-16 central force 2

Well, that turned out a little neater than my usual meandering crud.

This little bit of math is defining the geometry of the coordinate variables in spherical polar coordinates. You can see the spherical polar coordinates in the Cartesian coordinate frame and they consist of a radial distance from the origin and two angles, Phi and Theta, that act at 90 degrees from each other. If you pick a constant radius in spherical polar space, you get a spherical surface where lines of constant Phi and Theta create longitude and latitude lines, respectively, making a globe! You can establish a right handed coordinate system in spherical polar space by picking a point and considering it to be locally Cartesian… the three dimensions at this point are labeled as shown, along the outward radius and in the directions in which each of the angles increases.

If you were to consider an infinitesimal volume of these perpendicular dimensions, at this locally Cartesian point, it would be a volume that ‘approaches’ cubic. But then, that’s the key to calculus: recognizing that 99.999999 effectively approaches 100. So then, this framework allows you to define the calculus occurring in spherical polar space. The integral performed along Theta, Phi and Rho would be adding up tiny cubical elements of volume welded together spherically, while the derivative would be with respect to each dimension of length as locally defined. The scaling values appear because I needed to convert differentials of angle into linear length in order to calculate volume, which can be accomplished by using the definition of the radian angle, which is arc length per radius –a curve is effectively linear when an arc becomes so tiny as to be negligible when considering the edges of an infinitesimal cube, like thinking about the curvature of the Earth affecting the flatness of the sidewalk outside your house.

The divergence operation uses Green’s formulas to say that a volume integral of divergence relates to a surface integral of flux wrapping across the surface of that same volume… and then you simply chase the constants. All that I do to find the divergence differential expression is to take the full integral and remove the infinite sum so that I’m basically doing algebra on the infinitesimal pieces, then literally divide across by the volume element and cancel the appropriate differentials. There are three possible area integrals because the normal vector is in three possible directions, one each for Rho, Theta and Phi.

The structure becomes a derivative if the volume is in the denominator because volume has one greater dimension than any possible area, where the derivative is with respect to the dimension of volume that doesn’t cancel out when you divide against the areas. If a scaling variable used to convert theta or phi into a length is dependent on the dimension of the differential left in the denominator, it can’t pass out of the derivative and remains inside at completion. The form of the divergence operation on a random vector field appears in the last line above. The value produced by divergence is a scalar quantity with no direction which could be said to reflect the ‘poofiness’ of a vector field at any given point in the space where you’re working.
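For reference, the pieces described above take the following standard form. This is a sketch assuming Theta is the polar angle measured down from the z axis and Phi is the azimuthal angle, which matches the longitude and latitude description; the subscripted area elements are just my labels for the three faces of the infinitesimal cube.

```latex
\begin{align}
dV &= \rho^{2}\sin\theta\; d\rho\, d\theta\, d\phi \\
dA_{\rho} &= \rho^{2}\sin\theta\; d\theta\, d\phi, \quad
dA_{\theta} = \rho\sin\theta\; d\rho\, d\phi, \quad
dA_{\phi} = \rho\; d\rho\, d\theta \\
\nabla\cdot\vec{F} &= \frac{1}{\rho^{2}}\frac{\partial}{\partial\rho}\left(\rho^{2}F_{\rho}\right)
 + \frac{1}{\rho\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,F_{\theta}\right)
 + \frac{1}{\rho\sin\theta}\frac{\partial F_{\phi}}{\partial\phi}
\end{align}
```

Note how the scaling factors that depend on the variable being differentiated (the Rho squared and the sine of Theta) stay trapped inside their derivatives, exactly as argued above.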

I then continued by defining a gradient.

5-12-16 central force 1

Gradient is basically an opposite operation from divergence. Divergence creates a scalar from a vector which represents the intensity of ‘divergence’ at some point in a smooth function defined across all of space. Gradient, on the other hand, creates a vector field out of a scalar function, where the vectors point in the dimensional direction where the function tends to be increasing.

This is kind of opaque. One way to think about this is to think of a hill poking out of a two dimensional plane. A scalar function defines the topography of the hill… it says simply that at some pair of coordinates in a plane, the geography has an altitude. The gradient operation would take that topography map and give you a vector field which has a vector at every location that points in the direction toward which the altitude is increasing at that location. Divergence then goes backward from this, after a fashion: it takes a vector map and converts it into a map which says ‘strength of change’ at every location. This last is not ‘altitude’ per se, but more like ‘rate at which altitude is changing’ at a given point.
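The hill picture is easy to play with numerically. Here is a minimal sketch, assuming a made-up Gaussian bump for the hill and plain finite differences for the derivatives; the function names and step sizes are illustrative choices of mine.

```python
import math

def altitude(x, y):
    """Hypothetical hill: a smooth bump peaked at the origin."""
    return math.exp(-(x * x + y * y))

def gradient(f, x, y, h=1e-5):
    """Central-difference estimate of the gradient vector (df/dx, df/dy)."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dfdx, dfdy

def laplacian(f, x, y, h=1e-4):
    """Divergence of the gradient, estimated with a five-point stencil."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h)
            - 4 * f(x, y)) / (h * h)

gx, gy = gradient(altitude, 1.0, 0.0)
print(gx, gy)  # x-component is negative: uphill points back toward the peak
print(laplacian(altitude, 0.0, 0.0))  # negative at the summit
```

Standing east of the summit, the gradient points west, back up the hill. Taking the divergence of that gradient field at the summit gives a negative number, since all the uphill arrows converge there.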

The Laplace operator combines gradient with divergence as literally the divergence of a gradient, denoted as ‘double del,’ the upside-down triangle squared.

In the last line, I’ve simply taken the Laplace operator in spherical polar coordinates and dropped it into its rightful spot in Schrodinger’s equation as shown far above. Here, the wave function, called Psi, is a density function defined in spherical polar space, varying along the radius (Rho) and the angles Theta and Phi (the so-called ‘solid angle’). Welcome to Greek word salad…
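Written out, the divergence of the gradient in spherical polar coordinates and the resulting Schrodinger equation have the standard form below. This is a sketch in textbook notation with Theta as the polar angle, and it may be arranged differently from the handwritten version.

```latex
\begin{align}
\nabla^{2}\psi &= \frac{1}{\rho^{2}}\frac{\partial}{\partial\rho}\left(\rho^{2}\frac{\partial\psi}{\partial\rho}\right)
 + \frac{1}{\rho^{2}\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial\psi}{\partial\theta}\right)
 + \frac{1}{\rho^{2}\sin^{2}\theta}\frac{\partial^{2}\psi}{\partial\phi^{2}} \\
E\,\psi &= -\frac{\hbar^{2}}{2m}\nabla^{2}\psi - \frac{e^{2}}{\rho}\,\psi
\end{align}
```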

What I’ve produced is an explicit form for Schrodinger’s equation with a coordinate set that is conducive to the problem. This differential equation is a multivariate second order partial differential equation. You have to solve this by separation of variables.

Having defined the hydrogen atom Schrodinger equation, I now switch to the simpler ‘radial only’ problem that I originally hinted at. Here’s how you cut out the angular parts:

5-12-16 radial schrodinger equation

You just recognize that the second and third differential terms are collectively the square of the total angular momentum and then use the relevant eigenvalue equation to remove it.

The L^2 operator comes out of the kinetic energy contained in the electron going ‘around.’ For the sake of consistency, it’s worth noting that the Hamiltonian for the full hydrogen atom contains a term for the kinetic energy of the proton and that the variable Rho refers to the distance between the electron and proton… in its right form, the ‘m’ given above is actually the reduced mass of that system and not directly the mass of the electron, which gives us a system where the electron is actually orbiting the center of mass, not the proton.

Starting on this problem, it’s convenient to recognize that the Psi wave function is a product of a Ylm (angular wave function) with a Radial function. I started by dividing out the Ylm and losing it. Psi basically just becomes R.
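For the record, the standard way this reduction looks on paper is sketched below, assuming the usual eigenvalue relation for the spherical harmonics. Here l is the angular momentum quantum number and m in the denominator is the (reduced) mass, not the magnetic quantum number.

```latex
\begin{align}
\hat{L}^{2} &= -\hbar^{2}\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial}{\partial\theta}\right)
 + \frac{1}{\sin^{2}\theta}\frac{\partial^{2}}{\partial\phi^{2}}\right],
 \qquad \hat{L}^{2}\,Y_{l}^{m} = \hbar^{2}\,l(l+1)\,Y_{l}^{m} \\
\psi(\rho,\theta,\phi) &= R(\rho)\,Y_{l}^{m}(\theta,\phi) \\
E\,R &= -\frac{\hbar^{2}}{2m}\,\frac{1}{\rho^{2}}\frac{d}{d\rho}\left(\rho^{2}\frac{dR}{d\rho}\right)
 + \left[\frac{\hbar^{2}\,l(l+1)}{2m\rho^{2}} - \frac{e^{2}}{\rho}\right]R
\end{align}
```

The angular derivatives have been traded for the number l(l+1), leaving an ordinary differential equation in the radius alone.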

5-13-16 radial equation 1

The first thing to do is take out the units. There is a lot of extra crap floating around in this differential equation that will obscure the structure of the problem. First, take the energy ‘E’ down into the denominator to consolidate the units, then make a substitution that hides the length unit by setting it to ‘one’. This makes Rho a multiple of ‘r’ involving energy. The ‘8’ wedged in here is crazily counter intuitive at this point, but makes the quantization work in the method I’ve chosen! I’ll point out the use when I reach it. At the last line, I substitute for Rho and make a bunch of cancellations. Also, in that last line, there’s an “= R” which fell off the side of the picture –I assure you it’s there, it just didn’t get photographed.

After you clean everything up and bring the R over from behind the equals sign, the differential equation is a little simpler…

5-13-16 radial equation 2

The ‘P’ and ‘Q’ are quick substitutions made so that I don’t have to work as hard doing all this math; they are important later, but they just need to be simple to use at the moment. I also make a substitution for R, by saying that R = U/r. This converts the problem from radial probability into probability per unit radius. The advantage is that it lets me break up the complicated differential expression at the beginning of the equation.
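The reason the R = U/r substitution breaks up the leading differential expression is a short identity, which holds regardless of how the units were scaled:

```latex
\begin{align}
R &= \frac{U}{r}, \qquad \frac{dR}{dr} = \frac{1}{r}\frac{dU}{dr} - \frac{U}{r^{2}} \\
r^{2}\frac{dR}{dr} &= r\frac{dU}{dr} - U \\
\frac{1}{r^{2}}\frac{d}{dr}\left(r^{2}\frac{dR}{dr}\right) &= \frac{1}{r^{2}}\left(\frac{dU}{dr} + r\frac{d^{2}U}{dr^{2}} - \frac{dU}{dr}\right)
 = \frac{1}{r}\frac{d^{2}U}{dr^{2}}
\end{align}
```

The first derivative terms cancel, so the nested derivative collapses to a single second derivative of U.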

The next part is to analyze the ‘asymptotic behavior’ of the differential equation. This is simply to look at what terms become important as the radius variable grows very big or very small. In this case, if radius gets very big, certain terms become small before others. If I can consider the solution U to be a separable composition of parts that solve different elements of this equation, I can create a further simplification.

5-13-16 asymptotic correction

If you consider the situation where r is very very big, the two terms in this equation which are 1/r or 1/r^2 tend to shrink essentially to zero, meaning that they have no impact on the solution at big radii. This gives you a very simple differential equation at big radii, as written at right, which is solved by a simple exponential with either positive or negative roots. I discard the positive root solution because I know that the wave function must fall to zero as r goes far away and because the positive exponential will tend to explode, becoming bigger the further you get from the proton –this situation would make no physical sense because we know the proton and electron to be attractive to one another and solutions that have them favor being separated don’t match the boundaries of the problem. Differential equations are frequently like this: they have multiple solutions which fit, but only certain solutions that can be correct for a given situation –doing derivatives loses information, meaning that multiple equations can give the same derivative and in going backward, you have to cope with this loss of information. The modification I made allows me to write U as a portion that’s an unknown function of radius and a second portion that is a decaying exponential. Hidden here is a second route to the same solution of this problem… if I considered the asymptotic behavior at small radii. I did not utilize the second asymptotic condition.
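Schematically, the large radius argument runs as follows. In this sketch kappa is a placeholder for whatever positive constant survives the scaling (its exact value depends on the substitutions chosen), and f(r) is the still unknown part of the solution.

```latex
\begin{align}
\frac{d^{2}U}{dr^{2}} &\approx \kappa^{2}\,U \qquad (r \to \infty) \\
U &\approx A\,e^{-\kappa r} + B\,e^{+\kappa r}, \qquad B = 0 \ \text{so that } U \to 0 \\
U(r) &= f(r)\,e^{-\kappa r}
\end{align}
```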

I just need now to find a way to work out the identity of the rest of this function. I substitute the U back in with its new exponentially augmented form…

5-13-16 Froebenius

With the new version of U, the differential equation rearranges to give a refined set of differentials. I then divide out the exponential so that I don’t have it cluttering things up. All this jiggering about has basically reduced the original differential equation to a skin and bones that still hasn’t quite come apart. The next technique that I apply is the Frobenius method. This technique is to guess that the differential equation can be solved by some infinite power series where the coefficients of each power of radius control how much a particular power shows up in the solution. It’s basically just saying “What if my solution is some polynomial expression Ar^2 -Br +C,” where I can include as many ‘r’s as I want. This can be very convenient because the calculus of polynomials is so easy. In the ‘sum,’ the variable n just identifies where you are in the series, whether at n=0, which just sets r to 1, or n=1000, which has a power of r^1000. In this particular case, I’ve learned that the n=0 term can actually be excluded because of boundary conditions since the probability per unit radius will need to go to zero at the origin (at the proton), and since the radius invariant term can’t do that, you need to leave it out… I didn’t think of that as I was originally working the problem, but it gets excluded anyway for a second reason that I will outline later.

The advantage of Frobenius may not be apparent right away, but it lets you reconstruct the differential equation in terms of the power series. I plug in the sum wherever the ‘A’ appears and work the derivatives. This relates different powers of r to different A coefficients. I also pull the 1/r and 1/r^2 into their respective sums to the same effect. Then, you rewrite two of the sums by advancing the coefficient indices and rewriting the labels, which allows all the powers of r to be the same power, which can be consolidated all under the same sum by omitting coefficients that are known to be zero. This has the effect of saying that the differential equation is now identically repeated in every term of the sum, letting you work with only one.

The result is a recurrence relation. For the power series to be a solution to the given differential equation, each coefficient is related to the one previous by a consistent expression. The existence of the recurrence relation allows you to construct a power series where you need only define one coefficient to immediately set all the rest. After all those turns and twists, this is a solution to the radial differential equation, but not in closed form.

Screwing around with all this math involved a ton of substitutions and a great deal of recasting the problem. That’s part of why solving the radial equation is challenging. Here is a collection of all the important substitutions made…

Collecting solution

As you can see, there is layer on layer on layer of substitution here. Further, you may not realize it yet, but something rather amazing happened with that number Q.

Quantize radial equation

If you set Q/4 = -n, the recurrence relation which generates the power series solution for the radial wave function cuts off the sequence of coefficients with a zero. This gives a choice for cutting off the power series after only a few terms instead of including the infinite number of possible powers, where you can choose how many terms are included! Suddenly, the sum drops into a closed form and reveals an infinite family of solutions that depend on the ‘n’ chosen as to cut off. Further, Q was originally defined as a function of energy… if you substitute in that definition and solve for ‘E,’ you get an energy dependent on ‘n’. These are the allowed orbital energies for the hydrogen atom.
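To put numbers on the quantized energies, here is a minimal sketch that evaluates the standard textbook result E_n = -m e^4 / (2 hbar^2 n^2) in Gaussian CGS units. It assumes rounded CODATA constants and uses the bare electron mass rather than the reduced mass; the function name is mine.

```python
# Gaussian CGS constants (CODATA values, rounded)
ELECTRON_MASS_G = 9.1093837e-28          # grams
ELEMENTARY_CHARGE_STATC = 4.8032047e-10  # statcoulombs
HBAR_ERG_S = 1.0545718e-27               # erg seconds
ERG_PER_EV = 1.6021766e-12               # ergs in one electron volt

def hydrogen_energy_ev(n):
    """Bound-state energy E_n = -m e^4 / (2 hbar^2 n^2), returned in eV.
    Uses the bare electron mass, not the reduced mass (a simplification)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    e_erg = (-ELECTRON_MASS_G * ELEMENTARY_CHARGE_STATC ** 4
             / (2 * HBAR_ERG_S ** 2 * n ** 2))
    return e_erg / ERG_PER_EV

for n in range(1, 5):
    print(n, round(hydrogen_energy_ev(n), 3))
```

The ground state lands near -13.6 eV, and each higher level is the ground state energy divided by n squared.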

This is an example of Quantization!

Having just quantized the radial wave function of the hydrogen atom, you may want to sit back and smoke a cigarette (if you’re into that sort of thing).

It’s opaque and particular to this strategy, but the ‘8’ I chose to add way back in that first substitution that converts Rho into r came into play right here. As it turns out, the 4 which resulted from pulling a 2 out of the square root twice canceled another 2 showing up during a derivative done a few dozen lines later and had the effect of keeping a 2 from showing up with the ‘n’ on top of the recurrence relation… allowing the solutions to be successive integers in the power series instead of every other integer. This is something you cannot see ahead, but has a profound, Rube Goldbergian effect way down the line. I had to crash into the extra two while doing the problem to realize it might be needed.

At this point, I’ve looked at a few books to try to validate my method and I’ve found three different ways to approach this problem, all producing equivalent results. This is only one way.

The recurrence relation also gives a second very important outcome:

n to l relation

The energy quantum number must be bigger than the angular momentum quantum number. ‘n’ must always be bigger than ‘l’ by at least 1. And secondarily, and this is really important, the unprimed n must also always be bigger than ‘l.’ This gives:

n’ = n > l

This constrains which powers of r, as indexed by n, can be added in the series solution. You can’t just start blindly at the zero order power; ‘n’ must be bigger than ‘l’ so that it never equals ‘l’ in the denominator and the primed number is always bigger too. If ‘l’ and ‘n’ are ever equal, you get an undefined term. One might argue that maybe you can include negative values of n, but these will produce terms that go as inverse powers of r, which are asymptotic at the origin and blow up when the radius is small, even though we know from the boundary conditions that the probability must go to zero at the origin. There is therefore a small window of powers that can be included in the sum, going between n = l+1 and n = n’.
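The window of allowed powers can be watched in action with a terminating recurrence. Be aware that the sketch below swaps in the recurrence from the common textbook convention (the form used in Griffiths-style treatments, where u = rho^(l+1) exp(-rho) v(rho) and rho = r/(n a0)), not the Q-based recurrence worked out above; the function names are mine.

```python
from fractions import Fraction

def radial_series_coefficients(n, l, max_terms=50):
    """Coefficients c_j of v(rho) = sum_j c_j rho**j in the textbook form
    u(rho) = rho**(l+1) * exp(-rho) * v(rho), with rho = r / (n * a0).
    Recurrence: c_{j+1} = 2*(j + l + 1 - n) / ((j + 1)*(j + 2*l + 2)) * c_j."""
    if not (n >= 1 and 0 <= l < n):
        raise ValueError("need n >= 1 and 0 <= l < n")
    coeffs = [Fraction(1)]  # c_0 is arbitrary; normalization fixes it later
    for j in range(max_terms):
        factor = Fraction(2 * (j + l + 1 - n), (j + 1) * (j + 2 * l + 2))
        nxt = factor * coeffs[-1]
        if nxt == 0:        # the numerator vanishes at j = n - l - 1
            break
        coeffs.append(nxt)
    return coeffs

def powers_in_u(n, l):
    """Powers of rho that multiply the exponential in u(rho)."""
    return [l + 1 + j for j in range(len(radial_series_coefficients(n, l)))]

print(radial_series_coefficients(3, 0))
print(powers_in_u(3, 0))
print(powers_in_u(3, 2))
```

In this convention the series cuts itself off once the numerator hits zero, and the surviving powers run from l+1 up to the energy quantum number, which is the same window described above.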

I spent some significant effort thinking about this point as I worked the radial problem this time; for whatever reason, it has always been hazy in my head which powers of the sum are allowed and how the energy and angular momentum quantum numbers constrained them. The radial problem can sometimes be an afterthought next to the intricacy of the angular momentum problem, but it is no less important.

For all of this, I’ve more or less just told you the ingredients needed to construct the radial wave functions. There is a big amount of back substitution and then you must work the recurrence relation while obeying the quantization conditions I’ve just detailed.

constructing solution

A general form for the radial wave functions appears at the lower right, fabricated from the back-substitutions. The powers of ‘r’ in the series solution must be replaced with the original form of ‘rho’ which now includes a constant involving mass, charge and Planck’s constant which I’ve dubbed the Bohr radius. The Bohr radius a0 is a relic of the old Bohr atom model that I started off talking about and it’s used as the scale length for the modern version of the atom. The wave function, as you can see, ends up being a polynomial in radius multiplied by an exponential, where the polynomial is further multiplied by a single 1/radius term and includes terms that are powers of radial distance between l+1, where l is the angular momentum quantum number, and n’, the energy quantum number.

Here is how you construct a specific hydrogen atom orbital from all the gobbledygook written above. This is the simplest orbital, the S-orbital, where the energy quantum number is 1 and the angular momentum is 0. This uses the Y00 spherical harmonic, the simplest spherical harmonic, which more or less just says that the wave function does not vary across any angle, making it completely spherically symmetric.

Normalized S orbital

The ‘100’ attached in subscript to the Psi wave function is a physicist shorthand for representing the hydrogen atom wave functions: these subscripts are ‘nlm,’ the three quantum numbers that define the orbital, which are n=1, l=0 and m=0 in this case. All I’ve done to produce the final wave function is take my prescription from before and use it to construct one of an infinite series of possible solutions. I then perform the typical Quantum Mechanics trick of making it a probability distribution by normalizing it. The process of normalization is just to make certain that the value ‘under the curve’ contained by the square of the wave function, counted up across all of space in the integral, is 1. This way, you have a 100% chance of finding the particle somewhere in space as defined by the probability distribution of the wave function.
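The normalization integral for the ground state is short enough to write out. This is a sketch in standard notation, with A the unknown normalization constant and a0 the Bohr radius:

```latex
\begin{align}
\psi_{100} &= A\,e^{-r/a_{0}} \\
1 &= \int |\psi_{100}|^{2}\, dV = 4\pi A^{2}\int_{0}^{\infty} r^{2}\,e^{-2r/a_{0}}\, dr
 = 4\pi A^{2}\cdot\frac{a_{0}^{3}}{4} = \pi a_{0}^{3}A^{2} \\
\psi_{100} &= \frac{1}{\sqrt{\pi a_{0}^{3}}}\;e^{-r/a_{0}}
\end{align}
```

The factor of 4 Pi comes from integrating over the angles, which is trivial here because nothing depends on them.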

You can use the wave function to ask questions about the distribution of the electron in space around the proton –for instance, what’s the average orbital radius of the electron? You just look for the expectation value of the radius using the wave function probability distribution:

Average radius

For the hydrogen atom ground state, which is the lowest energy state for a one-electron, one-proton atom, the electron is distributed, on average, about one and a half Bohr radii from the nucleus. The Bohr radius is about 0.53 angstrom (one angstrom being 1×10^-10 meters), which means that the electron is on average about 0.79 angstroms from the nucleus.
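The average radius is easy to double check by brute force. Below is a minimal sketch, assuming the standard normalized ground state (whose probability per unit radius is 4 r^2 exp(-2r) with r in Bohr radii), rounded CODATA constants in Gaussian CGS, and a simple midpoint-rule integration; the function names and grid settings are mine.

```python
import math

# Gaussian CGS constants (CODATA values, rounded)
ELECTRON_MASS_G = 9.1093837e-28          # grams
ELEMENTARY_CHARGE_STATC = 4.8032047e-10  # statcoulombs
HBAR_ERG_S = 1.0545718e-27               # erg seconds
CM_PER_ANGSTROM = 1e-8

def bohr_radius_angstrom():
    """a0 = hbar^2 / (m e^2), using the bare electron mass."""
    a0_cm = HBAR_ERG_S ** 2 / (ELECTRON_MASS_G * ELEMENTARY_CHARGE_STATC ** 2)
    return a0_cm / CM_PER_ANGSTROM

def ground_state_radial_density(r):
    """Probability per unit radius for psi_100, with r in units of a0."""
    return 4.0 * r * r * math.exp(-2.0 * r)

def expectation(weight, r_max=40.0, steps=200_000):
    """Midpoint-rule integral of weight(r) * density(r) from 0 to r_max."""
    dr = r_max / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        total += weight(r) * ground_state_radial_density(r) * dr
    return total

norm = expectation(lambda r: 1.0)   # should come out very close to 1
mean_r = expectation(lambda r: r)   # average radius in Bohr radii
print(norm, mean_r, mean_r * bohr_radius_angstrom())
```

The first number confirms the normalization, the second is the average radius in Bohr radii, and the third converts it to angstroms.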

(special note 8-2-17: If you’ve read my recent post on parity symmetry, you may be wondering why this situation doesn’t break parity. Average position can never be reported as anything other than zero for a pure eigenstate–and yet I’ve reported a positionally related average value other than zero right here. The reason this doesn’t break parity symmetry is because the radial distance is only fundamentally defined over “half” of space to begin with, from a radius of zero to a radius of infinity and with no respect for a direction from the origin. In asking “What’s average radius?” I’m not asking “What’s the average position?” Another way to look at this is that the radius operator Rho is a parity symmetric operator since it doesn’t reverse under parity transformation and it can connect states that have the same parity, allowing radial expectation values to be non-zero.)

Right now, this is all very abstract and mathematical, so I’ll jump into the more concrete by including some pictures. Here is a 3D density plot of the wave function performed using Mathematica.

S-orbital density

Definitely anticlimactic and a little bit blah, but this is the ground state wave function. We know it doesn’t vary in any angle, so it has to be spherically symmetric. The axes are distance in units of Bohr’s radius. One thing I can do to make it a little more interesting is to take a knife to it and chop it in half.


This is just the same thing bisected. The legend at left just shows the intensity of the wave function as represented in color.

As you can see, this is a far cry from the atomic model depicted in cartoon far above.

For the moment, I’m going to hang up this particular blog post. This took quite a long time to construct. Some of the higher energy, larger angular momentum hydrogenic wave functions start looking somewhat crazy and more beautiful, but I really just had it in mind to show the math which produces them. I may produce another post containing a few of them as I have time to work them out and render images of them. If the savvy reader so desires, the prescriptions given here can generate any hydrogenic wave function you like… just refer back to my Ylm post where I talk some about the spherical harmonics, or refer directly to the Ylm tables in Wikipedia, which is a good, complete online source of them anyway.


Because I couldn’t leave it well enough alone, I decided to do images of one more hydrogen atom wave function. This orbital is 210, the P-orbital. I won’t show the equation form of this, but I did calculate it by hand before turning it over to Mathematica. In Mathematica, I’m not showing directly the wave function this time because the density plot doesn’t make clear intuitive sense, but I’m putting up the probability density (which is the squared magnitude of the wave function).

P-orbital probability density

Mr. Peanut is the P-orbital. Here, angular momentum lies somewhere in the x-y plane since the z-axis angular momentum eigenvalue is zero. You can kind of think of it as a propeller where you don’t quite know which direction the axle is pointed.
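For anyone who wants to poke at the peanut without Mathematica, here is a minimal sketch. It assumes the standard textbook form of the 210 state with lengths in Bohr radii, not anything copied from the notebook, and the function name is made up for illustration.

```python
import math

def density_210(x, y, z):
    """|psi_210|^2 at a point, lengths in Bohr radii (assumed textbook form)."""
    r = math.sqrt(x * x + y * y + z * z)
    # r*cos(theta) is just z, so psi_210 = z * exp(-r/2) / (4*sqrt(2*pi))
    psi = z * math.exp(-r / 2.0) / (4.0 * math.sqrt(2.0 * math.pi))
    return psi * psi

# Scan up the z-axis in steps of 0.01 to find where one lobe is densest.
z_values = [i * 0.01 for i in range(1001)]
peak_z = max(z_values, key=lambda z: density_210(0.0, 0.0, z))

print(density_210(1.0, 1.0, 0.0))   # a point in the x-y plane
print(peak_z)                       # densest point of the upper lobe
```

The density vanishes everywhere in the x-y plane, which is the pinch in the peanut, and the two lobes mirror each other above and below it.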

Here’s a bisection of the same density map, along the long axis.

P-orbital probability density bisect

Edit 5-18-16

I keep finding interesting structures here. Since I was just sitting on all the necessary mathematical structures for hydrogen wave function 21-1 (no work needed, it was all in my notebook already), I simply plugged it into Mathematica to see what the density plot would produce. The first image, where the box size was a little small, was perhaps the most striking of what I’ve seen thus far…

orbital21-1 squared

I knew basically that I was going to find a donut, but it’s oddly beautiful seen with the outsides peeled off. Here’s more of 21-1…


The donut turned out to be way more interesting than I thought. In this case, the angular momentum is pointing down the Z-axis since the Z-axis eigenvalue is -1. This orbital shape is most similar qualitatively to the orbits depicted in the original Bohr atom model with an electron density that is known to be ‘circulating’ clockwise primarily within the donut. This particular state is almost the definition of a magnetic dipole.
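As a sketch of the equation behind the donut, assuming the standard textbook form of the 21-1 state with the Condon-Shortley sign convention (an overall phase would not change the density), and with a_0 standing for the Bohr radius:

```latex
\begin{align*}
\psi_{2,1,-1}(r,\theta,\phi) &= \frac{1}{8\sqrt{\pi}}\, a_0^{-3/2}\, \frac{r}{a_0}\, e^{-r/2a_0}\, \sin\theta\, e^{-i\phi} \\
|\psi_{2,1,-1}|^2 &= \frac{1}{64\pi a_0^3} \left(\frac{r}{a_0}\right)^2 e^{-r/a_0}\, \sin^2\theta \\
L_z\, \psi_{2,1,-1} &= -i\hbar\, \frac{\partial}{\partial\phi}\, \psi_{2,1,-1} = -\hbar\, \psi_{2,1,-1}
\end{align*}
```

The sin^2 factor kills the density along the z-axis and piles it up in the x-y plane, which is the hole and the ring of the donut, and the phase factor in phi is what carries the z-axis eigenvalue of -1.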

A Spherical Tensor Problem

Since last I wrote about it, my continued sojourn through Sakurai has brought me back to spherical tensors, a topic I didn’t well understand when last I saw it. The problem in question is Sakurai 3.21. We will get to this problem shortly…

I’ve been thinking about how best to include math on this blog. The fact of the matter is that it isn’t easy to do very fast. It looks awful if I photograph pages from my notebook, but it takes forever if I use a word processor to make it nice and neat and presentable. I’ve tried a stylus in OneNote before, but I don’t very much like the feeling compared to working on paper.

After my tirade the other day about the Smith siblings, I’ve been thinking again about everything I wanted this blog to be. It isn’t hard to find superficial level explanations of most of physics, but I also don’t want this to read like a textbook. If Willow Smith hosts ‘underground quantum mechanics teachings,’ I actually honestly envisioned this effort on my part as a sort of underground teaching –regardless of the nonexistent audience. What better way to put it. I didn’t want to put in pure math, at least not quite; I wanted to present here what happens in my head while I’m working with the math. How exactly do you do that?

Here’s an image of the mythological notebook where all my practicing and playing takes place:

4-16-16 Notebook image

I’ve never been neat and pretty while working with problems, but all that scratching doesn’t look like scratching to me while I’m working with it. It’s almost indescribable. I could shovel metaphors on top of it or take pictures of beautiful things and call that other thing ‘what I see.’ But there isn’t anything like it. If you’ve spent time on it yourself, maybe you know. It’s addictive. It’s conceptual tourism in the purest form, standing on the edge of the Grand Canyon looking out, then climbing down inside, feeling the crags of stone on my fingertips as I pass down toward where the river flows. It’s tourism in a way, going to a place that isn’t a place, not necessarily pushing back the frontiers since people have been there before, but climbing to the top of a mountain that nobody ever just visits in daily life. You can’t simply read it and you don’t just walk there.

The pages pictured above are of my efforts to derive a formula from Schwinger’s harmonic oscillator representation to produce the rotation matrices for any value of angular momentum. Writing the words will mean nothing to practically anybody who reads this. But what do I do to make it genuine? How do you create a travelogue for a landscape of mathematical ideas?

For the moment, at least, I hope you will forgive me. I’m going to use images of my notebook in all its messy glory.

Where we started in this post was mentioning Spherical Tensors. I hit this topic again while considering Sakurai problem 3.21. ‘Tensor’ is admittedly a very cool word. In Brandon Sanderson’s “Steelheart,” Tensors are a special tool that lets people use magic power to scissor through solid material.

For all the coolness of the word, what are Tensors really?

In the most general sense, a tensor is a sort of container. Here is a very simple tensor:


This construct holds things. Computer programmers call them Arrays sometimes, but here it’s just a very simple container. The subscript ‘i’ could stand for anything. If you make ‘i’ be 1,2 or 3, this tensor can contain three things. I could make it be a vector in 3 dimensions, describing something as simple as position.

In the problem I’m going to present, you have to think twice about what ‘tensor’ means in order to drag out the idea of a ‘spherical’ tensor.

Here is Sakurai 3.21 as written in my notebook:

Sakurai 3.21

Omitting the |j,m> ket at the bottom, Sakurai 3.21 is innocuous enough. You’re just asked to evaluate two sums between parts a.) and b.). No problem right? Just count some stuff and you’re done! Trick is, what the hell are you trying to count?

Contrary to my using the symbol ‘Ai’ above to sneak in the meaning of ‘love,’ the dj here do not play in dance clubs, even if they are spinning like a turntable! These ‘d’s are symbols for a rotation operation which can transform a state as if rotating it by an angle (here angle β). Each ‘d’ transforms a state with a particular z-axis angular momentum, labeled by ‘m’, to a second state with a different label, where the angle between the two states is a rotation of β around the y-axis. Get all that? You’ve got a spinning object and you want to alter the axis of the spin by an angle β. Literally you’re spinning a spin! That’s a headache, I know.

Within quantum mechanics, you can know only certain things about the rotation of an object, but not really know others. This is captured in the label ‘j’. ‘j’ describes the total angular momentum contained in an object; literally how much it’s spinning. This is distinct from ‘m’ which describes the rotation around a particular axis. Together, ‘m’ and ‘j’ encapsulate all of the knowable rotational qualities of our quantum mechanical object, where you can know it’s rotating a certain amount and that some of that rotation is around a particular axis. The rest of the rotation is in some unknowable combination not along the axis of choice. This whole set of statements is good for both an object spinning and for an object revolving around another object, like a planet in orbit.

The weird trick that quantum mechanics plays is that only a certain number of rotational states are allowed for a particular state of total angular momentum; the more total angular momentum you have, the larger the library of rotational states you can select from. In the sum in the problem, you’re including all the possible states of z-axis angular momentum allowable by the particular total angular momentum. Simultaneous rotation around x and y-axis is knowable only to an extent depending on the magnitude of rotation about the z-axis (so says the Heisenberg Uncertainty Principle, in this case–but the problem doesn’t require that…).

Here is an example of how you ‘rotate a state’ in quantum mechanics. I expect that only readers familiar with the math will truly be able to follow, but it’s a straightforward application of an operator to carry out an operation at a symbolic level:

Rotating a state 4-20-16

All this shows is that a rotation operator ‘R’ works on one state to produce another. By the end of the derivation, operator R has been converted into a ‘dj’ like what I mentioned above. Each dj is a function of m and m” in a set of elements which can be written as a 2-dimensional matrix… dj is literally mapping the probability amplitude at m onto m”, which can be considered how you route one element of a 2-dimensional matrix into another based upon the operation of rotating the state. In this case, the example starts out without a representation, but ultimately shifts over to representing in a space of ‘j’ and ‘m.’ The final state can be regarded as a superposition of all the states in the set, as defined by the sum. In all of this, dj can be regarded as a tensor with three indices, j, m and m”, making it a 3-dimensional entity which contains a variable number of elements depending on each level of j: dj is only the face-plate of that tensor, coughing up whatever is stored at the element indexed by a particular j, m and m”.
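In symbols, the end point of that kind of derivation is usually written as below. This is a sketch in the standard textbook convention for a rotation by β about the y-axis, which may differ in index order or sign convention from the notebook page; the label R_y(β) is introduced here only for illustration.

```latex
\begin{align*}
R_y(\beta) &= e^{-iJ_y\beta/\hbar} \\
R_y(\beta)\,|j,m\rangle &= \sum_{m''=-j}^{j} |j,m''\rangle\, d^{(j)}_{m''m}(\beta) \\
d^{(j)}_{m''m}(\beta) &= \langle j,m''|\, e^{-iJ_y\beta/\hbar}\, |j,m\rangle
\end{align*}
```

The sum never leaves the given j, which is why each j carries its own (2j+1) by (2j+1) block of d elements.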

In problem 3.21, what you’re counting up is a series of objects that transform other objects as paired with whatever z-axis angular momentum they represent within the total angular momentum contained by the system. This collection of objects is closed, meaning that you can only transform among the objects in the set. If there were no weighting factor in the sum, the sum of these squared objects actually goes to ‘1’… the ‘d’ symbols are probability amplitudes which become probabilities when they’re squared and, for a closed set, you must have 100% probability of staying within that set. The headache in evaluating this sum, then, is dealing with the weighting factor, which is different for each element in the sum, particularly for whatever state they are ultimately supposed to rotate to.
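The closure statement is easy to check by brute force for any one small j. The sketch below assumes Wigner's closed formula for d as it appears in standard texts, and it passes 2j, 2m' and 2m as integers so that half-integer values stay exact. The function names and argument convention are made up for illustration.

```python
import math

def wigner_d(two_j, two_mp, two_m, beta):
    """Wigner's formula for d^j_{m'm}(beta); arguments are 2j, 2m', 2m as integers."""
    jpm = (two_j + two_m) // 2      # j + m
    jmm = (two_j - two_m) // 2      # j - m
    jpmp = (two_j + two_mp) // 2    # j + m'
    jmmp = (two_j - two_mp) // 2    # j - m'
    diff = (two_mp - two_m) // 2    # m' - m, always an integer
    f = math.factorial
    prefactor = math.sqrt(f(jpm) * f(jmm) * f(jpmp) * f(jmmp))
    c, s = math.cos(beta / 2.0), math.sin(beta / 2.0)
    total = 0.0
    # k runs over every value that keeps all four factorial arguments >= 0
    for k in range(max(0, -diff), min(jpm, jmmp) + 1):
        denom = f(jpm - k) * f(k) * f(jmmp - k) * f(k + diff)
        sign = (-1) ** (k + diff)
        total += sign * c ** (two_j - 2 * k - diff) * s ** (2 * k + diff) / denom
    return prefactor * total

def weighted_sum(two_j, two_mp, beta, power):
    """Sum over m of m**power * |d^j_{m'm}(beta)|^2, one term at a time."""
    total = 0.0
    for two_m in range(-two_j, two_j + 1, 2):
        m = two_m / 2.0
        total += m ** power * wigner_d(two_j, two_mp, two_m, beta) ** 2
    return total

beta = 0.7
two_j, two_mp = 3, 1                                 # j = 3/2, m' = 1/2
closure = weighted_sum(two_j, two_mp, beta, 0)       # no weighting factor
weighted = weighted_sum(two_j, two_mp, beta, 1)      # weighting factor m

print(closure, weighted)
```

With power set to 0 the sum should come back as 1 to within rounding error for any j, m' and β. Powers 1 and 2 give numbers to compare against whatever closed forms you derive for the weighted sums, but only for one j at a time, which is exactly the limitation of working this way.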

My initial idea looking at this problem was that if I can calculate each ‘d,’ then I can just work the sum directly. Just square each ‘d’ and multiply it by the weighting factor and voila! There was no thought in my head about spherical tensors, despite the overwhelming weight of that hint following part b.)

Naively, this approach could work. You just need some way of calculating a generalized ‘d.’ This can be done using Schwinger’s simple harmonic oscillator model. All you need to do is rotate the double harmonic oscillator and then pick out the factor that appears in place of ‘d’ in the appropriate sum –an example of which can be seen in the rotation transformation above. Not hard, right?

A month ago, I would have agreed with you. I had spent only a little bit of time learning how the Schwinger model works and I thought, “Well, solve ‘d’ using the Schwinger method and then boom, we’re golden.” It didn’t seem too bad, except that days eventually converted themselves into weeks before I had a good enough understanding of the method to be able to crank out a ‘d.’ You can see one of my pages of work on this near the top of this post… there were factorials and sums everywhere. By the time I had it completely figured out –which I really don’t regret, by the way– I had actually pretty much forgotten why I went to all that trouble in the first place. My thesis here is that, yes, you can solve for each and every ‘d’ you may ever want using Schwinger’s method. On the other hand, when I came back to look at Sakurai 3.21, I realized that if I were to try to horsewhip a version of ‘d’ from the Schwinger method into that sum, I was probably never going to solve the problem. The formula to derive each ‘d’ is itself a big sum with a large number of working parts, the square of which would turn into a _really_ large number of moving parts. I know I’m not a computer and trying to go that way is begging for trouble.

It was a bit of a letdown when I realized that I was on the wrong track. As a lesson, that happens to everyone: almost nobody gets it first shot. This should be an abstract lesson to many people: what you think is a truth isn’t always a truth, or necessarily the simplest path to a truth. I still expect that if you were a horrific glutton for punishment, you could work the problem the way I started out trying, but you would get old in the attempt.

I spent some introspective time reading Chapter 3 of Sakurai, looking at simple methods for obtaining the necessary ‘d’ matrix elements. Most of these can’t be used in the context of problem 3.21 because they are too specific. With j of 1/2 or j of 1, you can directly calculate rotation matrices, except that this is not a general solution. I had a feeling that you could suck the weighting factor of ‘m’ back into the square of the ‘d’ and use an eigenvalue equation to change the ‘m’ into the Jz operator, but I wasn’t completely sure what to do with it if I did. About a week ago, I started to look a bit more closely at the section outlining operator transformations using spherical tensor formalism. I had a feeling I could make something work in these new ideas, especially following that heavy-handed hint in part b.)

The spherical tensor formalism is very much like the Heisenberg picture; it enables one to rotate an operator using the same sorts of machineries that one might use to rotate a state. This, it turns out, is the necessary logical leap required by the problem. To be honest, I didn’t actually understand this while I was reading the math and trying to work through it. I only really understood very recently. Rotating the state is not the same as rotating operators. The math posted above is the rotation of a state.

As it turns out, with an operator written in a cartesian form, different parts will rotate differently from one another; you can’t just apply one rotation to the whole thing and expect the same operator back.

This becomes challenging because the angular momentum operators are usually written in a cartesian form and because operator transformations in quantum mechanics are usually handled as unitary transformations. Constructing a unitary transformation requires careful analysis of what can rotate and remain intact.

Here is a derivation which shows rotation converted into a unitary operation:

Rotation as a unitary transform 4-21-16

In this case, the rotation matrix ‘d’ has been replaced by a more general form. The script ‘D’ is generally used to represent a transformation involving all three Euler angles, whereas the original ‘d’ was a rotation only around the y-axis. In principle, this transformation can work for any reorientation. In this derivation, you start with a spherical harmonic and show, if you create a representation of something else with that spherical harmonic, that you can rotate that other object within the Ylm. In this derivation, the object being rotated is just a vector used to indicate direction, called ‘n’. The spherical harmonics have this incredible quality in that they are ready-made to describe spherical, angle-space objects and that they rotate naturally without distortion… if you want to rotate anything, writing it as an object which transforms like a spherical harmonic is definitely the best way to go.

In the last line of that derivation, the spherical harmonic containing the direction vector has been replaced with a construct labeled simply as ‘T’. T is a spherical tensor. This object contains whatever you put into it and resides in the description space of the spherical harmonics. It rotates like a spherical harmonic.

The last line of algebra contains another ramification that I think is interesting. In this math, for this particular case, the unitary transform of D*Object*D reduces to a simple linear transform D*Object.

This brings me roughly full circle: I’m back at spherical tensors.

A spherical tensor is a multi-dimensional object which sits in a space which uses the spherical harmonics as a descriptive basis set. Each index of the spherical tensor transforms like the Ylm that resides at that index location. In some ways, this looks very like a state function in spherical harmonic space, but it’s different since the object being represented is an operator native to that space rather than a state function. Operators and state functions must be treated differently in quantum mechanics because they are different. A state function is a nascent form of a probability distribution while an operator is an entity that can be used to manipulate that distribution in eigenvalue equations.

This may seem a non-sequitur, but I’ve just introduced you to a form of trans-dimensional travel. I’ve just shown you the gap for moving between a space involving the dimensions of length, width and depth into a space which replaces those descriptive commodities with angles. A being living in spherical harmonic space is a being constructed directly out of turns and rotations, containing nothing that we can directly witness as a physical volume. You will never find something so patently crazy in the best science fiction! Quantum mechanics is replete with real expressions for moving from one space to another.

The next great challenge of Sakurai 3.21 is learning how to convert a cartesian operator construct into a spherical one. You can put whatever you want into a spherical tensor, but this means figuring out how to transfer the meaning of the cartesian expression into the spherical expression. As far as I currently understand it, the operator can’t be directly applied while residing within the spherical tensor form –I screwed this problem up a number of times before I understood that. To make the problem work, you have to convert from cartesian objects into the spherical object, perform the rotation, then convert backward into the cartesian object in order to come up with the final expression. The spherical tensor forms of the operators end up being linear combinations of the cartesian forms.

Here is the template for using spherical harmonics to guide conversion of cartesian operators into spherical tensor components:

Conversion to spherical tensor 4-21-16

In this case, I’m converting the angular momentum operators into a spherical tensor. This requires only the rank 1 spherical harmonics. The spherical tensor of rank one is a three dimensional object with indices 1, 0 and -1, which relate to the cartesian components of the angular momentum vector Jz, Jx and Jy as shown. For position, cosine = z/radius and the x and y conversions follow from that, given the relations above. Angular momentum needs no spatial component because of normalization in length, so z-axis angular momentum just converts directly into cosine.
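Written out, the template looks roughly like this. It is a sketch using the Condon-Shortley sign convention for the rank 1 spherical harmonics, and it drops the common factor of sqrt(3/4π) from the tensor components, so the normalization may not match the notebook page line for line.

```latex
\begin{align*}
Y_1^{0} &= \sqrt{\frac{3}{4\pi}}\, \cos\theta = \sqrt{\frac{3}{4\pi}}\, \frac{z}{r} \\
Y_1^{\pm 1} &= \mp\sqrt{\frac{3}{8\pi}}\, \sin\theta\, e^{\pm i\phi} = \mp\sqrt{\frac{3}{4\pi}}\, \frac{x \pm iy}{\sqrt{2}\, r} \\
T^{(1)}_{0} &= J_z \\
T^{(1)}_{\pm 1} &= \mp\frac{1}{\sqrt{2}}\,(J_x \pm iJ_y) = \mp\frac{1}{\sqrt{2}}\, J_{\pm}
\end{align*}
```

The pattern is a straight substitution: wherever the spherical harmonic has z/r, x/r or y/r, the tensor component gets Jz, Jx or Jy.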

As you can see, all the tensor does here is store things. In this case, the geometry of conversion between the spaces stores these things in such a way that they can be rotated with no effort.

Since I’ve slogged through the grist of the ideas needed to solve Sakurai 3.21, I can turn now to how I solved it. For all the rotation stuff that I’ve been talking about, there is one important, very easy technique for rotating spherical harmonics which is relevant to this particular problem. If you are rotating an m=0 state, of which there is only one in every rank of total angular momentum, the dj element is a spherical harmonic. No crazy Schwinger formulas, just bang, use the spherical harmonic. Further, both sections of problem 3.21 involve converting m into Jz and Jz converts to the m=0 element of the spherical tensor with nothing but a normalization (to see this, look at the conversion rules that I included above). This means that the unitary transform of Jz can be mediated either by rotating from any state into the m=0 state, or rotating m=0 toward any state, which lets the dj be a spherical harmonic in either direction.
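For the record, the m=0 shortcut in its usual textbook form is sketched below. It holds for integer l only, and the normalization factor in front is the piece that is easy to lose track of.

```latex
\begin{align*}
d^{(l)}_{m0}(\beta) &= \sqrt{\frac{4\pi}{2l+1}}\; Y_l^{m}(\theta=\beta,\ \phi=0) \\
l = 1:\qquad d^{(1)}_{00}(\beta) &= \cos\beta, \qquad d^{(1)}_{\pm 1,0}(\beta) = \mp\frac{\sin\beta}{\sqrt{2}}
\end{align*}
```

The l=1 line is all that a rank 1 operator like Jz ever needs.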

Now, since part a.) is easy, here’s the solution to problem 3.21 part b.)

Sakurai 3.21 b1

I apologize here that the clarity of the images is not the best; the website downgraded the resolution. I included a restatement of problem 3.21 part b.) in the first line here and then began by expanding the absolute value and pulling the eigenvalue of m back into the expression so that I could recast it as operator Jz using an eigenvalue equation to give me Jz^2. Jz must then be manipulated to produce the spherical tensor, the process expanded below.

Sakurai 3.21 b2

Where I say “three meaningful terms,” I’m looking ahead to an outcome further along in the problem in order to avoid writing 6 extra terms from the multiplication that I don’t ultimately need. I do write my math exhaustively, but in this particular case, I know that any term that isn’t J0*J0, J1*J-1 or J-1*J1 will cancel out after the J+ and J- ladder operators have had their way. For anyone versed, J1 is the ladder operator J+ up to a constant factor, and J-1 is likewise J-. If the m value doesn’t end up back where it started, with J+J- or J-J+ combinations, when you take the resulting expectation value, anything like <m|m+1> is zero. Knowing this a page in advance, I simply omitted writing all that math. I then worked out the two unique coefficients that show up in the sum of only three elements…

Sakurai 3.21 b3

In the middle of this last page, I converted the operators Jx and Jy into a combination of J^2 and Jz. The ladder operators composed of Jx and Jy served to strain out 2/3 of the mathematical extra and I more or less omitted writing all of that from the middle of the second page. After you’re back in the cartesian form, once you’ve made the rotation, which occurs once the sum has been expanded, there is no need to stay in terms of Jx and Jy because the system can’t be simultaneously expressed as eigenfunctions of Jx, Jy and Jz. You can have simultaneous eigenfunctions of only total angular momentum and one axis, typically chosen to be the z-axis. By converting to J^2 and Jz only, I get the option to use eigenvalues instead of operators, which is almost always where you want to end up in a quantum problem. This is why I started writing |m> as |j,m>… most of the time in this problem I only care about tracking the m values, but I understand from the very beginning of the problem that I have a j value hiding in there that I can use when I choose.
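The conversion from Jx and Jy into J^2 and Jz rests on a few standard identities, sketched here in the usual notation rather than copied from the notebook page:

```latex
\begin{align*}
J_x^2 + J_y^2 &= \mathbf{J}^2 - J_z^2 \\
J_+J_- + J_-J_+ &= 2\,(J_x^2 + J_y^2) = 2\,(\mathbf{J}^2 - J_z^2) \\
\langle j,m|\,(J_+J_- + J_-J_+)\,|j,m\rangle &= 2\hbar^2\,\bigl[\, j(j+1) - m^2 \,\bigr]
\end{align*}
```

The last line is where the operators finally give way to plain numbers in j and m.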

One thing that eases your burden considerably in this problem is understanding how j compartmentalizes m values. As I mentioned before, each rank of j contains a small collection of m value eigenfunctions which only transform amongst themselves. Even though the problem is asking for a solution that is general to every j, by using transformations of the angular momentum operator, which is a rank 1 operator, I only needed the j=1 spherical harmonics to represent it. This allows me to work in a small space which can be general across all values of j. This is part of what makes the Schwinger approach to this problem so unwieldy; by trying to represent d for every j, I basically swelled the number of terms I was working with to infinity. You can work with situations like this, but it just gets too big too quickly in this case –I’m just not that smart.

It’s also possible to work omitting the normalization coefficients needed in the spherical harmonics, but do this with caution. It can be hard to tell which part of the coefficient is dedicated to flattening multiplicity and which is canceling out of the solid angle. In cases where terms are getting mixed, I hold onto normalization so that I know down the line whether or not all my 2s and -1s are going to turn out. I always screw things like this up, so I do my best to give myself whatever tools I can for figuring out where I’ve messed up arithmetic. I found an answer to this problem on-line which leaves cartesian indices on the transformations through the problem and completely omits the normalization… technically, this sort of solution is wrong and bypasses the mechanics. You can’t transform a cartesian tensor like a spherical tensor; getting yourself screwed up by missing the proper indices misses the math. How the guy hammered out the right answer from doing it so poorly makes no sense to me.

This problem took a considerable amount of work and thought. It may not show in the writing, but I had been thinking about it for weeks. One great difference between doing this for class and doing it on my own is that there is no time limit on completing it except for the admission of defeat. I never made that admission and I gradually became more and more clear on what to do in the problem. I had been thinking about it on and off so hard that it was losing me sleep and leaving me foggy headed on other daily tasks. It takes real work. Eventually, there was a morning while I was in the shower where I just saw it. Clear as day, the solution unfolded to me. I do my best thinking in the morning while taking my shower. Under some circumstances, the stress of this process can be soul-breaking. It can also be profoundly illuminating. Seeing through it can be addictive… but you must not give up when the going gets tough.