Derived in ten minutes while I was on the toilet:

Edit: 11-16-16

There was a slight error in the set-up of the center of mass calculation. Light appears to move an effective mass from -L/2 to L/2-l, the points where the light starts and stops (the far wall has recoiled by a distance l by the time the light arrives). Here’s the reformulation that captures that.
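Here is a minimal sketch of that corrected bookkeeping. It assumes a rigid box of mass M and length L, a pulse of energy E leaving the wall at -L/2 toward +x, and a recoil slow enough to treat with Newtonian mechanics. The symbols M, E, v and t are my labels for this sketch.

```latex
\begin{aligned}
p &= \frac{E}{c}, \qquad v = \frac{p}{M} = \frac{E}{Mc}\\
t &= \frac{L-l}{c}, \qquad l = v\,t = \frac{E\,(L-l)}{Mc^{2}}\\
0 &= -M\,l + m\,(L-l) \;\Longrightarrow\; m = \frac{M\,l}{L-l} = \frac{E}{c^{2}}
\end{aligned}
```

The recoil distance l cancels out of the final ratio. That is why using L/2 - l for the absorption point still leaves the answer at m = E/c^2.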

Time spent on this derivation: ten minutes on the toilet last night and ten minutes before breakfast writing the correction (yeah, that’s what I get for going really fast).

Don’t thank me, this is Einstein’s calculation.

Edit: 11-18-17

As I’m still thinking about this post, I figure it might be beneficial to flesh out the reason it was written. This post was a response to a comment on another blog. If you want to back up a statement you made about something and somebody is accusing you of not providing evidence, most people provide citations, web links, references, etc. In this particular case, the argument was about a piece of math, and I was being accused of lying about said piece of math by someone who clearly likes to believe he knows everything without actually knowing practically anything. Yes, skeptics are guilty of Dunning-Kruger, just like everyone else. (This is an unfair statement and I apologize for it.) What better way to slam the textbook in someone’s face than to actually work the problem? If you want the final word on what Einstein said about something, quote Einstein’s work! And so, a piece of Einstein’s work is posted above.

The argument in question started with a fellow suggesting to me that mass-energy equivalence can be derived but not proven with classical physics. I beg to differ. Energy is a classical concept, from Thermo, E&M and classical mechanics… all three! You don’t need relativity or quantum mechanics to justify statements about how energy works; measurements of kinematics and force are sufficient to show that energy as a concept works. Mass-energy equivalence arose from Einstein’s notion that the newly completed classical field of electromagnetism must be consistent with the older field of classical mechanics. The equation E=pc is not relativistic: it came directly out of electromagnetism (and, believe me, I’ve been through that calculation too because I didn’t believe it at first). Imposing that these two fields must be cross-consistent is the origin of mass-energy equivalence. Light carries momentum (given by Poynting’s vector and well defined in the electromagnetic stress-energy tensor), and light interacts with mass. Consider light interacting with matter when no external forces act on the system made up of the light and the matter. In that case, conservation of momentum (and consequently conservation of center of mass in the absence of external forces) requires that the light carry an equivalent of mass in order for the forces to add up. Mass-energy equivalence is required by this, no ifs, ands, buts or “yeah, but you didn’t prove…”

Einstein’s thought experiment validating this set-up is an exceptionally elegant one. It’s called “Einstein’s Box.” Everybody loves Schrödinger’s cat-in-a-box… well, Einstein had a box too, and this box is older than Schrödinger’s. Einstein’s box is a closed box sitting out in space where it feels no external forces. A flash of light is emitted inside the box from one wall and travels across the box to strike the opposite wall. E&M states that light must carry momentum. If the system has no external forces acting on it, the momentum of the system must be conserved when the light is emitted. That requires the box to recoil with a momentum equal and opposite to that carried by the light, so the box moves at some velocity consistent with the momentum carried by the light (which turns out to be directly proportional to the energy carried by that light, as stated by E=pc). Net momentum of the system must remain zero by conservation of momentum. When the light travels across the box and collides with the opposite wall, the momentum of the light cancels the momentum of the box and the box stops moving. The thing about this is that the center of mass, as a consequence of momentum conservation, could not have moved: no forces act on the outside of the box.

Center of mass is a damn well classical concept, well worked out in the 1700s and 1800s… and since the box moved, the box’s center of mass moved! But no forces acted on the outside of the system, so the overall center of mass of the system could not have moved. This requires the light to have carried with it a value of mass, taken from the location where the light was emitted and deposited again at the location where the light was absorbed. But light is known not to carry mass, since it is a wave-like solution of immaterial fields in the form of Maxwell’s equations. If you set up this situation and work through the calculation (Newtonian mechanics and electromagnetism, nothing more), out pops the classical requirement that energy and mass have an equivalence in the form of E=mc^2. No quantization or probability relations from quantum mechanics, no frame of reference shifting from relativity, not even delineating that light is some package of photons… this is purely classical. Moreover, energy is a tabular result to begin with. It is not something that is by itself ever directly observed; it must always be carried by something else, a field, a heat, a potential, a motion or what have you. The statement that this tabular relationship extends to something else that is technically only indirectly observed, mass, is a proof. And yes, mass is indirect since you can only know mass from weight, which is a force!
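The same bookkeeping can be checked numerically. This is a sketch, not a transcription of the scanned derivation. It assumes E = pc for the pulse and a Newtonian recoil v = p/M, then solves for the mass the pulse would have to carry for the center of mass to stay put. The function name einstein_box_mass is just an illustrative label.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def einstein_box_mass(M, L, E):
    """Classical Einstein's-box sketch (hypothetical helper).

    A box of mass M and length L emits a light pulse of energy E from its wall at -L/2
    toward +x. Returns (m, l): the mass the pulse must carry for the system's center of
    mass to stay fixed, and the distance l the box recoils before the pulse is absorbed.
    """
    p = E / C          # momentum carried by the pulse, from E = pc
    v = p / M          # Newtonian recoil speed of the box (momentum conservation)
    # The far wall moves toward the pulse, so they meet when C*t = L - v*t.
    t = L / (C + v)
    l = v * t          # recoil distance; the pulse lands at L/2 - l
    # Center of mass fixed: box moment M*l balances mass m carried a distance L - l.
    m = M * l / (L - l)
    return m, l

for M, L, E in [(1.0, 1.0, 1.0), (2.0, 3.0, 1e12)]:
    m, l = einstein_box_mass(M, L, E)
    print(f"M={M} kg, L={L} m, E={E} J: m = {m:.6e} kg, E/c^2 = {E / C**2:.6e} kg")
```

Whatever values you feed it, m comes back equal to E/c^2 (up to rounding), and the recoil distance l drops out exactly as it does on paper.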

If you have concepts of weight, momentum and light together in the same model as expressed by classical physics, mass-energy equivalence is required for self-consistency.

Granted, special relativity quite naturally produces this result as well, but special relativity is not required to produce mass-energy equivalence. Had Einstein not discovered it, someone else damned well would’ve, and it would not have required relativity at all!

Now, the thing that doubly made me angry about this conversation is that it was with a fellow who absolutely craved physicist street cred: he name-dropped arXiv and seemed to want to chase around details. Sadly, his whole argument ultimately amounted to insulting someone and not backing up his ability to absolutely know what he was claiming to know. Does it matter that you don’t believe my statement if you aren’t competent to evaluate the field in question? Not at all: such a person has no place at the table to start with. This is why it’s possible for a Nobel Laureate to descend into crankery… just because you have a big prize doesn’t mean you are always equally competent at everything! I’m guessing the guy was a surgeon given the ego and the blog, but if he was a physicist, I’m very disappointed. A physicist who doesn’t know Einstein’s box is a travesty. I’m not the greatest physicist that ever lived, but I work at it and I know what I’m talking about… where I have gaps, I do my best to admit it.

edit: 11-18-17

(Statement redacted. It was an unfairly insulting comment)

edit: 11-20-17

As this is still nagging at me, one further thought. In what I consider the last statement of the conversation before it simply became obvious trolling, the fellow accused me of not including “a variable speed for light” in my calculation. In Einstein’s calculation, the speed of light is given the constant “c.” This constant comes with a caveat: “c” is the speed of light when it is not passing through anything material, the speed of light in a vacuum. This distinction is important because light can travel at lower speeds when it’s passing through a material. This situation is well handled by E&M and is considered a “solved effect” by Relativity, whose postulates include the explicit notion that E&M simply be true everywhere. The constant “c” is the maximum possible speed that light can travel. Light will travel at lower speeds in a medium with an index of refraction greater than 1, where permittivity and permeability might have values other than their vacuum values, which has the wonderful result of making lenses possible in glasses and microscopes. In my lavatory derivation above, a little screw-up on my part is that I didn’t clobber the reader over the head with the constancy of the value “c.” I said “a box in zero gravity” and I said light travels at “c,” but I didn’t say “this is definitely all in a vacuum,” which I probably should have. If the index of refraction is “n,” the velocity of light in a medium with that refractive index is v = c/n. There are other ways to encode refractive index which allow for more sophisticated optical behaviors, but everything along that line is completely outside the scope of the argument in question, and drawing attention to it is simply chaff intended to shift the focus of the argument.
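To make the distinction concrete, here is a small sketch. The helper light_speed and the index value 1.5 for ordinary glass are illustrative assumptions. The vacuum constants fix c = 1/sqrt(μ0 ε0). A simple linear, non-magnetic, non-dispersive medium whose relative permittivity at the light’s frequency is ε_r = n^2 gives v = c/n, and “c” itself never changes.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m (CODATA 2018)
MU0 = 1.25663706212e-6   # vacuum permeability, N/A^2 (CODATA 2018)

def light_speed(eps_r=1.0, mu_r=1.0):
    """Speed 1/sqrt(eps*mu) of light in a simple linear, lossless, non-dispersive medium."""
    return 1.0 / math.sqrt(eps_r * EPS0 * mu_r * MU0)

c = light_speed()             # vacuum: this is the constant "c"
n_glass = 1.5                 # rough refractive index of ordinary glass (illustrative)
v_glass = c / n_glass         # v = c/n
# For a non-magnetic medium (mu_r = 1), n = sqrt(eps_r) with eps_r taken at the light's
# frequency, so eps_r = n**2 gives the same slower speed.
v_from_eps = light_speed(eps_r=n_glass**2)
print(f"c = {c:.6e} m/s, v in glass = {v_glass:.6e} m/s")
```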

Light can travel at speeds lower than “c,” but “c” itself is so far found to be invariant. Moreover, the fact that light can travel at speeds other than “c” does not change Einstein’s box derivation, which is set in explicit conditions where light would travel at “c.” Somebody who doesn’t know this isn’t a physicist (11-20-17: I’ll moderate this; it’s unfair and was written too angrily.)

Also, as an aside, I mention above that Special Relativity can produce E = mc^2. Thinking about it, but not running through the calculations, I think this is actually backward; E=mc^2 is sort of needed first before it shows up in Special Rel. Einstein made some amazing leaps.

Edit: 11-20-17

As an added extra, here is a derivation of E=pc from the stress-energy and electromagnetic power continuity equations. These were written a few years ago, but I had the good sense to scan them:

EMC31 001

EMC32 001

EMC33 001

The E=pc derivation begins on the second page above. The first page is the end of the continuity equation derivation. I’ll neglect that. No relativity here, just pure E&M. There are a couple pieces in here that I don’t remember so well and I need to think about to decide if they’re correct. The first page is included to show clearly the relation between force and the stress-energy tensor divergence.

Edit: 11-21-17

I’ve spent some time thinking about the form Narad put forward in the comments.

Epc approximate

First of all, we have to be really sure of what is meant by “p” on the left side of this equation. My first reading of it was as “momentum,” but I’m realizing that it isn’t, and this may be leading to some misunderstandings about what is meant by E=pc. The thing in the middle is the average Poynting vector divided by the speed of light… the Poynting vector has units of Watts/meter^2 and the speed of light has units of meters/sec, which works out to Newtons/m^2, or force per area, which is pressure, not momentum. The thing on the right is actually in units of energy… permittivity times peak E-field^2 over 2, which is just a form of electromagnetic energy, in units of Joules. For a literal reading of the equation above, unit analysis puts me at momentum = pressure = energy, which is not right (apple can’t equal orange can’t equal pear). If I take “p” as pressure rather than momentum, the left side makes sense, but the right side still doesn’t quite work.

It’s a nice try; all the elements are there. It has energy, and momentum can be massaged out of it. I think the route being taken here is to use the form of a plane wave to figure out the momentum based on the pressure, specifically with the plane-wave form of the Poynting vector, or else the peak E-field intensity wouldn’t be needed.

The approach in the E=pc derivation I posted above is really different. My starting point is a classical structure called the electromagnetic stress-energy tensor, together with a second structure, which is conservation of power given energy flux. (Wikipedia actually kind of pissed me off about this: they want to masturbate over the four-dimensional relativistic version, but wouldn’t provide me a clean on-line citation for the classical version shown above; the form given here is the same as it appears in Jackson E&M.) The first equation is a consequence of the Lorentz force law (F = qE+qvxB) where the system has electromagnetic waves but is sealed so that there is no net force. The equation says that the change in Poynting vector per unit time (divided by c^2) is equal to the divergence of the electromagnetic stress-energy tensor, all of which is in units of force per unit volume, or change of momentum density with time. The second equation is a consequence of Power=current*voltage, believe it or not. It just says that the change in energy density with time is equal to minus the divergence of the Poynting vector (energy flowing out of a region lowers the energy density inside it), all in units of power per unit volume. These structures make no real initial assumptions about the form that the electromagnetic fields are taking. They speak only of change of momentum per time and change of energy density given energy flux, and they are derived directly from application of Maxwell’s equations.
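Written out for a region with no charges or currents, in SI units as in Jackson, the two relations look like this. Here u is the field energy density, g the field momentum density and T the Maxwell stress tensor; those three symbols are my labels for this sketch.

```latex
\begin{aligned}
u &= \tfrac{\epsilon_0}{2}\left(E^{2} + c^{2}B^{2}\right), \qquad
\mathbf{S} = \frac{1}{\mu_0}\,\mathbf{E}\times\mathbf{B}, \qquad
\mathbf{g} = \epsilon_0\,\mathbf{E}\times\mathbf{B} = \frac{\mathbf{S}}{c^{2}}\\[4pt]
\frac{1}{c^{2}}\frac{\partial S_i}{\partial t} &= \sum_{j}\frac{\partial T_{ij}}{\partial x_j},
\qquad T_{ij} = \epsilon_0\left[E_iE_j + c^{2}B_iB_j - \tfrac{1}{2}\delta_{ij}\left(E^{2} + c^{2}B^{2}\right)\right]\\[4pt]
\frac{\partial u}{\partial t} &= -\nabla\cdot\mathbf{S}
\end{aligned}
```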

The first step is to take the stress-energy continuity relation and to hold it as change in Poynting vector with time is equal to change in momentum density with time by direct application of Newtonian force. You end up with an expression that says that Poynting vector is equal to momentum density times speed of light squared.

The second step is to throw this Poynting vector relation into the power equation so that you get a relation that says that the momentum flux out of a volume of space is equal to the change of energy density with time. This gives you a “momentum current vector” equation, which is analogous to the relationship between electrical current “I” and current vector “J.”

I next establish a momentum current, basically just a beam of light with no specific frequency or field configuration. You could write this as white light in a Fourier composition. A set of very simple manipulations gets you to a relation that directly says that energy density is equal to momentum density times speed of light. Integrate out the density and you get E=pc directly. Please note, this set-up is explicitly agnostic on the idea of photons since it depends on a mixture of frequencies to produce a constant envelope of plane waves with constant momentum density distributed everywhere and therefore does not require quantum mechanics to work. I can’t claim this work is Einstein’s because I didn’t follow anyone to make it… this is me using Jacksonian E&M technique to prove E=pc for myself, all using classical physics.
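The three steps above can be compressed into a few lines. This is a sketch under one added assumption: a beam whose profile moves rigidly along x at speed c, so that u = u(x - ct) and g = g(x - ct) x̂, with both vanishing away from the beam.

```latex
\begin{aligned}
\frac{\partial u}{\partial t} = -\nabla\cdot\mathbf{S} = -c^{2}\,\nabla\cdot\mathbf{g}
\;&\Longrightarrow\; -c\,u'(x-ct) = -c^{2}\,g'(x-ct)\\
\;&\Longrightarrow\; u = c\,g \quad(\text{both vanish away from the beam})\\
\int u\,d^{3}r = c\int g\,d^{3}r
\;&\Longrightarrow\; E = pc
\end{aligned}
```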

With E=pc in hand by these means, the classical derivation of E=mc^2 is pretty much a shoo-in. Again, I used no quantum and no relativity. If I could do this, the geniuses at the turn of the century got it faster ;-)

Edit: 11-22-17

I must’ve done something wrong with the LaTeX; it doesn’t seem to want to render in the body of my post. I’m still looking into whether I need to get the plugin…

Further, I figured out what was wrong with the unit analysis I did above… the right side of that equation is energy density (J/m^3) rather than energy (J)… and since J =N*m, J/m^3 is N/m^2…. the equation above is all in units of light pressure. To get to E = pc in approximate form in the plane wave, you just need to sub in the relation for momentum density per Poynting vector S = pc^2, then cancel the density by integrating over volume.
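That corrected unit bookkeeping is easy to check mechanically. Here is a minimal sketch that tracks SI dimensions as exponents of kg, m, s and A. The helpers mul, div and pow_ are hypothetical, written just for this check, and factors like 1/2 are dimensionless and dropped.

```python
# SI dimensions as {base unit: exponent} maps over kg, m, s, A.
def mul(a, b):
    out = dict(a)
    for unit, exp in b.items():
        out[unit] = out.get(unit, 0) + exp
    return {u: e for u, e in out.items() if e != 0}   # drop cancelled units

def div(a, b):
    return mul(a, {u: -e for u, e in b.items()})

def pow_(a, n):
    return {u: e * n for u, e in a.items()}

METER, SECOND, AMPERE = {"m": 1}, {"s": 1}, {"A": 1}
NEWTON = {"kg": 1, "m": 1, "s": -2}
JOULE = mul(NEWTON, METER)
WATT = div(JOULE, SECOND)
VOLT = div(WATT, AMPERE)
EPSILON_0 = div(pow_(mul(AMPERE, SECOND), 2), mul(NEWTON, pow_(METER, 2)))  # C^2/(N m^2)

poynting_over_c = div(div(WATT, pow_(METER, 2)), div(METER, SECOND))  # <S>/c
field_term = mul(EPSILON_0, pow_(div(VOLT, METER), 2))               # eps0 E0^2 / 2
pressure = div(NEWTON, pow_(METER, 2))
energy_density = div(JOULE, pow_(METER, 3))
print(poynting_over_c, field_term, pressure, energy_density, JOULE)
```

All four of the first quantities reduce to kg m^-1 s^-2, while plain energy (J) does not, which is exactly the pressure = energy density observation above.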

One additional thing about the Einstein’s box derivation that is important; it works in a classical framework. What I’ve provided above, then, is E=mc^2 as a classical equation, which is really torturing the point that it was “proven.” I’ve been thinking about whether or not I was doing this right since the whole discussion started and the derivation is only consistent from the standpoint that there are no effects included taking into account the potential relativistic characteristics of the box as it moves. I’m sorry about that, Narad. The derivation above would be insufficient from a modern physics standpoint because the box would undergo length contractions and dilations as it moves. To be perfectly honest, this nagged at me a tiny bit as I wrote the derivation, but maybe not as much as it should have… I drew the box as strictly “before” and “after” so that I ended up looking at the system only when it is located in the inertial frame of reference. That would call into question the nature of the boost pushing it into motion. I was assuming that the completely undisclosed relativistics located between the end-points were sufficient to conspire that the end-points be right! And, that’s an open end since length contraction would place the wall of the box in a different location depending on the frame… throwing off the whole calculation.

(For the people at home, here is something very important about how I chose to write this blog. I leave my edits visible so that the progression of my thinking is clear… one of the hardest, most human aspects of working in the sciences is facing the fact that nobody is always right about everything. I think that being a good scientist is not about being right all the time, but about changing your mind when it’s important to do so. And it’s about admitting when someone else was right, sometimes very publicly! Are you smart if you’re unwilling to abandon a sinking ship? I think not. Smart is being able to turn the steering wheel and to grow when it’s necessary to do so, especially when it affects your pride. I think this is the difference between arguing loudly and arguing productively.)

Here is the derivation converting the light pressure equation Narad offered into E=pc…

Epc approximate 2
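As a numerical sanity check on that conversion, here is a sketch for a linearly polarized vacuum plane wave. The function plane_wave_averages and the sample values of E0 and wavelength are assumptions for illustration. It averages over one wavelength at a single instant and checks two things: that <S>/c = ε0 E0^2/2 is the average energy density, and that energy density divided by momentum density (g = S/c^2) is exactly c.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
C = 299_792_458.0         # speed of light in vacuum, m/s

def plane_wave_averages(E0, wavelength, n=1000):
    """Snapshot (t = 0) of a linearly polarized vacuum plane wave moving along +x:
    E_y = E0 cos(kx), B_z = E_y / c. Returns the averages over one wavelength of the
    energy density u, the Poynting magnitude S and the momentum density g."""
    k = 2 * math.pi / wavelength
    u_sum = S_sum = g_sum = 0.0
    for i in range(n):
        x = (i + 0.5) * wavelength / n
        Ey = E0 * math.cos(k * x)
        Bz = Ey / C
        u_sum += 0.5 * EPS0 * (Ey**2 + C**2 * Bz**2)   # J/m^3
        S_sum += EPS0 * C**2 * Ey * Bz                 # |E x B|/mu0 with 1/mu0 = eps0 c^2, W/m^2
        g_sum += EPS0 * Ey * Bz                        # eps0 |E x B| = S/c^2, kg/(m^2 s)
    return u_sum / n, S_sum / n, g_sum / n

E0 = 100.0  # peak field, V/m (arbitrary)
u_avg, S_avg, g_avg = plane_wave_averages(E0, wavelength=500e-9)
print(S_avg / C, 0.5 * EPS0 * E0**2)   # <S>/c versus eps0 E0^2 / 2
print(u_avg / g_avg)                   # energy density / momentum density
```

Multiplying both averages by the same volume V gives the total energy U = <u>V and momentum P = <g>V, so U = Pc. This is the “cancel the density by integrating over volume” step.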

Hopefully that ties up all the loose ends! (Don’t be surprised to see me back here playing with a relativistic E=mc^2 proof at some point.)


Magnets, how do they work? (part 1)

Subtitle: Basic derivation of Ampere’s Law from the Biot-Savart equation.

Know your meme.

It’s been a while since this became a thing, but I think it’s actually a really good question. Truly, the original meme exploded from an unlikely source who wanted to revel in those things that seem magical without really appreciating how mind-bending and thought-expanding the explanation to this seemingly earnest question actually is.

As I got on in this writing, I realized that the scope of the topic is bigger than can be tackled in a single post. What is presented here will only be the first part (though I haven’t yet had a chance to write later parts!). The succeeding posts may end up being as mathematical as this, but perhaps less so. Moreover, as I got to writing, I realized that I haven’t posted a good bit of math here in a while: what good is the mathematical poetry of physics if nobody sees it?

Magnets do not get less magical when you understand how they work: they get more compelling.


This image, taken from a website that sells quackery, highlights the intriguing properties of magnets. A solid object with apparently no moving parts has this manner of influencing the world around it. How can that not be magical? Lodestones have been magic forever and they do not get less magical with the explanation.

Truthfully, I’ve been thinking about the question of how they work for a couple days now. When I started out, I realized that I couldn’t just answer this out of hand, even though I would like to think that I’ve got a working understanding of magnetic fields –this is actually significant to me because the typical response to the Insane Clown Posse’s somewhat vacuous pondering is not really as simple as “Well, duh, magnetic fields you dope!” Someone really can explain how magnets work, but the explanation is really not trivial. That I got to a level in asking how they work where I said, “Well, um, I don’t really know this,” got my attention. How the details fit together gets deep in a hurry. What makes a bar magnet like the one in the picture above special? You don’t put batteries in it. You don’t flick a switch. It just works.

For most people, that pattern above is the depth of how it works. How does it work? Well, it has a magnetic field. And, everybody has played with magnets at some point, so we sort of all know what they do, if not how they do it.


In this picture from penguin labs, these magnets are exerting sufficient force on one another that many of them apparently defy gravity. Here, the rod simply keeps the magnets confined so that they can’t change orientations with respect to one another and they exert sufficient repulsive force to climb up the rod as if they have no weight.

It’s definitely cool, no denying. There is definitely a quality to this that is magical and awe inspiring.

But, is it better knowing how they work, or just blindly appreciating them because it’s too hard to fill in the blank?

The central feature of how magnets work is quite effortlessly explained by the physics of Electromagnetism. Or, maybe it’s better to say that the details are laboriously and completely explained. People rebel against how hard it is to understand the details, but no true explanation is required to be easily explicable.

The forces which hold those little pieces of metal apart are relatively understandable.

Lorentz force

Here’s the Lorentz force law. It says that the force (F) on an object with a charge (q) is equal to the sum of the electric force on the object (qE) plus the magnetic force (qv×B, where v is the object’s velocity and × is the cross product). Magnets interact solely by magnetic force, the second term.
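Here is a minimal sketch of that law in code (SI units; the helper names cross, dot and lorentz_force are mine). It shows the feature the next picture illustrates: flipping the sign of the charge flips the sideways push. The magnetic force is also always perpendicular to the velocity, so it bends a path without speeding it up.

```python
def cross(a, b):
    """Cross product a x b of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lorentz_force(q, E, v, B):
    """F = qE + q v x B for a point charge q (SI units)."""
    vxB = cross(v, B)
    return tuple(q * (e + m) for e, m in zip(E, vxB))

# A charge moving along +x through a magnetic field along +z, with no electric field.
E = (0.0, 0.0, 0.0)
v = (2.0, 0.0, 0.0)   # m/s
B = (0.0, 0.0, 0.5)   # T
F_pos = lorentz_force(+1e-6, E, v, B)   # pushed toward -y
F_neg = lorentz_force(-1e-6, E, v, B)   # same motion, opposite charge: pushed toward +y
print(F_pos, F_neg, dot(F_pos, v))      # the last value is zero: no work done
```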


In this picture from Wikipedia, if a charge (q) moving with speed (v) passes into a region containing this thing we call a “magnetic field,” it will tend to curve in its trajectory depending on whether the charge is negative or positive. We can ‘see’ this magnetic field thing in the image above with the bar magnet and iron filings. What is it, how is it produced?

The fundamental observation of magnetic fields is tied up into a phenomenological equation called the Biot-Savart law.
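Written out in the standard SI form (as in Jackson), with J the current density at source points r′ and B the field at r, the Biot-Savart law reads:

```latex
\mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi}\int \mathbf{J}(\mathbf{r}')\times\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^{3}}\,d^{3}r'
```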


This equation is immediately intimidating. I’ve written it in all of its horrifying Jacksonian glory. You can read this equation like a sentence. It says that all the magnetic field (B) you can find at a location in space (r) is proportional to a sum of all the electric currents (J) at all possible locations where you can find any current (r’). It is also inversely proportional to the square of the distance between where you’re looking for the magnetic field and where all the electrical currents are. It may say ‘inverse cube’ in the equation, but it’s actually an inverse square, since there’s a full power of length in the numerator. Yikes, what a sentence! Additionally, the equation says that the direction of the magnetic field is at right angles to both the direction that the current is traveling and the direction given by the line between where you’re looking for magnetic field and where the current is located. These directions are all wrapped up in the arrow scripts on every quantity in the equation and are determined by the cross-product as denoted by the ‘x’. The difference between the two ‘r’ vectors in the numerator creates a vector pointing from the location of a particular current element to where you’re looking for magnetic field. Its direction feeds the cross product, and its length is the extra power that turns the inverse cube into an inverse square. The ‘d’ at the end marks the differential volume that confines the electric currents and simply means that you’re adding up locations in 3D space. The scaling constants outside the integral sign are geometrical and control strength. The 4 and Pi relate to the dimensionality of the field source radiated out into a full solid angle (it covers a singularity in the field due to the location of the field source). The ‘μ’ essentially tells how space broadcasts magnetic field, and this constant is closely tied to the speed of light (in vacuum, c^2 = 1/(μ0 ε0), where ε0 is the electric counterpart of μ0). This equation has the structure of a propagator: it takes an electric current located at r’ and propagates it into a field at r.
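Since Biot-Savart is hard to do by hand, it’s worth seeing how directly a computer can just do the sum. Here’s a sketch for one case that has a textbook answer: a thin circular loop of radius R carrying current I, where J d³r′ reduces to I dl′. The helper loop_field is hypothetical. The analytic results it is compared against are the standard ones: B = μ0 I/(2R) at the center and B = μ0 I R²/(2(R² + z²)^(3/2)) on the axis.

```python
import math

MU0 = 4e-7 * math.pi   # vacuum permeability (the classic defined value), N/A^2

def loop_field(I, R, point, n=2000):
    """Biot-Savart field of a thin circular loop (radius R, in the xy-plane, centered on
    the origin, current I counterclockwise seen from +z), summed over n short segments.
    For a thin wire, J(r') d^3r' reduces to I dl'."""
    Bx = By = Bz = 0.0
    dphi = 2 * math.pi / n
    for i in range(n):
        phi = (i + 0.5) * dphi
        # source point r' on the loop and current element dl' (tangent to the loop)
        sx, sy = R * math.cos(phi), R * math.sin(phi)
        dlx, dly = -R * math.sin(phi) * dphi, R * math.cos(phi) * dphi
        # separation vector r - r' from the current element to the field point
        rx, ry, rz = point[0] - sx, point[1] - sy, point[2]
        dist3 = (rx * rx + ry * ry + rz * rz) ** 1.5
        # dl' x (r - r'), using the fact that dl' has no z-component
        Bx += (dly * rz) / dist3
        By += (-dlx * rz) / dist3
        Bz += (dlx * ry - dly * rx) / dist3
    k = MU0 * I / (4 * math.pi)
    return k * Bx, k * By, k * Bz

I, R, z = 1.0, 0.05, 0.1   # amps, meters, meters (arbitrary test values)
B_center = loop_field(I, R, (0.0, 0.0, 0.0))
B_axis = loop_field(I, R, (0.0, 0.0, z))
print(B_center[2], MU0 * I / (2 * R))
print(B_axis[2], MU0 * I * R**2 / (2 * (R**2 + z**2) ** 1.5))
```

The cross product is written out by hand here, which is the whole point: it is the piece that makes the field circle the current instead of pointing at it.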

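It may also be confusing to you that I’m calling current ‘J’ when nearly every basic physics class calls it ‘I’… well, get used to it. ‘J’ is the current density, or ‘current vector.’ It is a vector that points along the flow and whose size is the current per unit of cross-sectional area, so adding it up over the cross-section of a wire gives back the familiar ‘I’.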

I looked for some diagrams to help depict Biot-Savart’s components, but I wasn’t satisfied with what Google coughed up. Here’s a rendering of my own with all the important vectors labeled.

biotsavart diagram

Now, I showed the crazy Biot-Savart equation, but I can tell you right now that it is a pain in the ass to work with. Very few people wake up in the morning and say “Boy oh boy, Biot-Savart for me today!” For most physics students this equation comes with a note of dread. Directly using it to analytically calculate magnetic fields is not easy. That cross product and all the crazy vectors pointing in every which direction make this equation a monster. There are some basic features here which are common to many fields. One is the inverse square, which you can find in the Newtonian gravity formula or Coulomb’s law for electrostatics. Another is the field being proportional to some source: here an electric current, where gravity has mass and electrostatics have charge.

Magnetic field becomes extraordinary because of that flipping (God damned, effing…) cross product, which means that it points in counter-intuitive directions. With electrostatics and gravity, the field usually points toward or away from the source, while in magnetism the field seems to go ‘around’ the source. Moreover, unlike electrostatics and gravity, the source isn’t exactly a something, like a charge or a mass; it’s dynamic… as in a change in state. Electric charges are present in a current, but if you have those charges sitting stationary, even though they are still present, they can’t produce a magnetic field. Further, if you neutralize the charge, a magnetic field can still be present if those now invisible charges are moving to produce a current. Current flowing in a copper wire is electric charges moving along the wire, and this produces a magnetic field around the wire. The positive charges fixed to the metal atoms of the wire neutralize the negative charges of the moving electrons, leaving the wire otherwise net neutral. So, no electrostatic field, even though you have a magnetic field. It might surprise you to know that neutron stars have powerful magnetic fields, even though they are made mostly of neutrons, which carry no net electric charge at all. The requirement for moving charges to produce a magnetic field is not inconsistent with the moving charge required to feel force from a magnetic field as well. Admittedly, there’s more to it than just ‘currents,’ but I’ll get to that in another post.

With a little bit of algebraic shenanigans, Biot-Savart can be twisted around into a slightly more tractable form called Ampere’s Law, which is one of the four Maxwell’s equations that define electromagnetism. I had originally not intended to show this derivation, but I had a change of heart when I realized that I’d forgotten the details myself. So, I worked through them again just to see that I could. Keep in mind that this is really just a speed bump on the road toward learning how magnets work.

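For your viewing pleasure, the derivation of the Maxwell-Ampere law from the Biot-Savart equation (in its steady-current form, without Maxwell’s displacement-current term, which Biot-Savart alone can’t supply).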

In starting to set up for this, there are a couple of fairly useful vector identities.

Useful identities 1

This trio contains several basic differential identities which can be very useful in this particular derivation. Here, the variables r are actually vectors in three dimensions. For those of you who don’t know these things, all it means is this:
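In components, this is the standard Cartesian decomposition, matching the description that follows:

```latex
\mathbf{r} = x\,\hat{x} + y\,\hat{y} + z\,\hat{z}, \qquad
\mathbf{r}' = x'\,\hat{x} + y'\,\hat{y} + z'\,\hat{z}
```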


These can be diagrammed like this:

vector example

This little diagram just treats the origin like the corner of a 3D box and each distance is a length along one of the three edges emanating from the corner.

I’ll try not to get too far afield with this quick vector tutorial, but it helps to understand that this is just a way to wrap up a 3D representation inside a simple symbol. The hatted symbols of x,y and z are all unit vectors that point in the relevant three dimensional directions where the un-hatted symbols just mean a variable distance along x or y or z. The prime (r’) means that the coordinate is used to tell where the electric current is located while the unprime (r) means that this is the coordinate for the magnetic field. The upside down triangle is an operator called ‘del’… you may know it from my hydrogen wave function post. What I’m doing here is quite similar to what I did over there before. For the uninitiated, here are gradient, divergence and curl:
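In Cartesian components, these are the standard definitions. Here f is any scalar function and A any vector field; the primed del, ∇′, is the same thing with x′, y′, z′ in place of x, y, z.

```latex
\begin{aligned}
\nabla f &= \frac{\partial f}{\partial x}\hat{x} + \frac{\partial f}{\partial y}\hat{y} + \frac{\partial f}{\partial z}\hat{z}\\
\nabla\cdot\mathbf{A} &= \frac{\partial A_x}{\partial x} + \frac{\partial A_y}{\partial y} + \frac{\partial A_z}{\partial z}\\
\nabla\times\mathbf{A} &= \left(\frac{\partial A_z}{\partial y} - \frac{\partial A_y}{\partial z}\right)\hat{x}
 + \left(\frac{\partial A_x}{\partial z} - \frac{\partial A_z}{\partial x}\right)\hat{y}
 + \left(\frac{\partial A_y}{\partial x} - \frac{\partial A_x}{\partial y}\right)\hat{z}
\end{aligned}
```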


Gradient works on a scalar function to produce a vector, divergence works on a vector to produce a scalar function and curl works on a vector to produce a vector. I will assume that the reader can take derivatives and not go any further back than this. The operations on the right of the equal sign are wrapped up inside the symbols on the left.

One final useful bit of notation here is the length operation. Length operation just finds the length of a vector and is denoted by flat braces as an absolute value. Everywhere I’ve used it, I’ve been applying it to a vector obtained by finding the distance between where two different vectors point:
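For the separation between the field point r and the source point r′, that length works out to:

```latex
|\mathbf{r}-\mathbf{r}'| = \sqrt{(x-x')^{2} + (y-y')^{2} + (z-z')^{2}}
```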


As you can see, notation is all about compressing operations away until they are very compact. The equations I’ve used to this point all contain a great deal of math lying underneath what is written, but you can muddle through by the examples here.

Getting back to my identity trio:

Useful identities 1
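For reference, the standard forms of I1, I2 and I3 as they appear in Jackson-style treatments are:

```latex
\begin{aligned}
\text{(I1)}\quad & \frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^{3}} = -\nabla\!\left(\frac{1}{|\mathbf{r}-\mathbf{r}'|}\right)\\
\text{(I2)}\quad & \nabla^{2}\!\left(\frac{1}{|\mathbf{r}-\mathbf{r}'|}\right) = -4\pi\,\delta^{3}(\mathbf{r}-\mathbf{r}')\\
\text{(I3)}\quad & \nabla\!\left(\frac{1}{|\mathbf{r}-\mathbf{r}'|}\right) = -\nabla'\!\left(\frac{1}{|\mathbf{r}-\mathbf{r}'|}\right)
\end{aligned}
```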

The first identity here (I1) takes the vector object written on the left and rewrites it as a gradient. The quantity in the denominator of the function being differentiated is the length of the difference between those two vectors, which is simply a scalar number without a direction, as shown in the length operation written above.

The second identity (I2) takes the divergence of the gradient and reveals that it’s the same thing as a Dirac delta, times a factor of -4π (an incredibly easy way to kill an integral!). I’ve not written the operation as a divergence of a gradient, but instead wrapped it up in the ‘square’ on the del. You can know it’s a divergence of a gradient because the function inside the parentheses is a scalar. That means the first operation has to be a gradient, which produces a vector, which in turn requires the second operation to be a divergence, since that only works on vectors to produce scalars.

The third identity (I3) shows that the gradient with respect to the unprimed vector coordinates is equal to minus the gradient with respect to the primed coordinates. This is a very easy way to switch from a derivative with respect to the first r to the same form of derivative with respect to the second r’.

To be clear, these identities are tailor-made to this problem (and similar electrodynamics problems) and you probably will never ever see them anywhere but the *cough cough* Jackson book. The first identity can be proven by working the gradient operation and taking derivatives. The second identity can be proven by using the vector divergence theorem in a spherical polar coordinate system and is the source of the 4*Pi that you see everywhere in electromagnetism. The third identity can also be proven by the same method as the first.
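If you’d rather check these than prove them, finite differences will do it. This is a sketch with arbitrary test points of my choosing. It confirms I1 and I3 directly. It checks I2 in integrated form: the divergence theorem turns the Laplacian over a volume into the flux of the gradient through a unit sphere, which should come out to -4π when the source point is inside the sphere and 0 when it is outside.

```python
import math

def inv_dist(r, rp):
    """1/|r - r'| for 3-vectors given as sequences."""
    return 1.0 / math.dist(r, rp)

def num_grad(f, p, h=1e-5):
    """Central-difference gradient of a scalar function f at point p."""
    g = []
    for i in range(3):
        up, dn = list(p), list(p)
        up[i] += h
        dn[i] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g

r, rp = (1.0, 2.0, -0.5), (0.3, -0.4, 0.2)

# I1: (r - r')/|r - r'|^3 = -grad(1/|r - r'|), gradient taken with respect to r
grad_r = num_grad(lambda x: inv_dist(x, rp), r)
I1_lhs = [(a - b) / math.dist(r, rp) ** 3 for a, b in zip(r, rp)]
# I3: the gradient with respect to r' should be minus the gradient with respect to r
grad_rp = num_grad(lambda x: inv_dist(r, x), rp)

def flux_through_unit_sphere(source, n_theta=100, n_phi=200):
    """Outward flux of grad(1/|r - source|) through the unit sphere at the origin,
    by the midpoint rule in (theta, phi); on the unit sphere, n-hat is the point itself."""
    d_theta, d_phi = math.pi / n_theta, 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            nhat = (math.sin(theta) * math.cos(phi),
                    math.sin(theta) * math.sin(phi),
                    math.cos(theta))
            g = num_grad(lambda x: inv_dist(x, source), nhat)
            total += sum(a * b for a, b in zip(g, nhat)) * math.sin(theta) * d_theta * d_phi
    return total

inside = flux_through_unit_sphere((0.2, -0.1, 0.3))   # expect about -4*pi
outside = flux_through_unit_sphere((2.0, 0.0, 0.0))   # expect about 0
print(I1_lhs, grad_r, grad_rp, inside, outside)
```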

There are two additional helpful vector identities that I used, which I produced in the process of working this derivation. I will create them here because, why not! If the math scares you, you’re on the wrong blog. To produce these identities, I used the component decomposition of the cross product and a useful Levi-Civita/Kronecker delta identity. I’m really bad at remembering vector identities, so I put a great deal of effort into learning how to construct them myself: my Levi-Civita is ghetto, but it works well enough. For those of you who don’t know the ol’ Levi-Civita symbol, ε_ijk, it’s a pretty nice tool for constructing things in a component-wise fashion. To make this work, you just have to remember it as I just wrote it. If any indices are equal, the symbol is zero; if they are all different, it is 1 or -1. If you take it as ijk, with the indices all different as I wrote, it equals 1, and it becomes -1 if you swap any two of the indices: ijk=1, jik=-1, jki=1, kji=-1 and so on and so forth. Here are the useful Levi-Civita identities as they relate to cross product:
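Written with explicit sums, the standard identities of this kind are the component form of the cross product (and of the curl), plus the contraction of two Levi-Civita symbols into Kronecker deltas:

```latex
\begin{aligned}
(\mathbf{A}\times\mathbf{B})_i &= \sum_{j,k}\epsilon_{ijk}\,A_j B_k,
\qquad (\nabla\times\mathbf{A})_i = \sum_{j,k}\epsilon_{ijk}\,\partial_j A_k\\
\sum_{i}\epsilon_{ijk}\,\epsilon_{imn} &= \delta_{jm}\delta_{kn} - \delta_{jn}\delta_{km}
\end{aligned}
```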


Using these small tools, the first vector identity that I need is a curl of a curl. I derive it here:

vector id 1

Let’s see how this works. I’ve used colors to show the major substitutions and tried to draw arrows where they belong. If you follow the math, you’ll note that the Kronecker deltas have the intriguing property of trading out indices in these sums. A Kronecker delta works on a finite sum the same way a Dirac delta works on an integral, which is nothing more than an infinite sum. Also, the Einstein summation convention says that if you see a duplicated index without an explicit sum on that index, you associate a sum with it… this is how I located the divergences in that last step. This identity is a soft stopping point for the double curl: I could have used the derivative product rule to expand it further, but that isn’t needed (if you want to see it get really complex, go ahead and try it! It’s doable.) You will notice that I have double del applied to a vector here, even though I said above that it acts on scalars. In Cartesian components, it acts only on the scalar portion of each vector component, meaning that you end up with a sum of three terms multiplied by unit vectors! In that sense, double del only ever acts on scalars, but you don’t actually need to know that for the derivation below.
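For a brute-force check of the two Levi-Civita facts this construction leans on, here is a short Python sketch. It covers the component form of the cross product and the contraction identity that trades a pair of epsilons for Kronecker deltas. The closed-form expression used for eps is a standard trick for indices 0, 1 and 2, not something taken from the original derivation. The function names are my own.

```python
from itertools import product

def eps(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}: +1 for even permutations of
    (0, 1, 2), -1 for odd ones, 0 if any index repeats."""
    return (i - j) * (j - k) * (k - i) // 2

def delta(a, b):
    """Kronecker delta."""
    return 1 if a == b else 0

def cross(A, B):
    """(A x B)_i = eps_ijk A_j B_k, with the repeated indices j and k summed."""
    return [sum(eps(i, j, k) * A[j] * B[k] for j in range(3) for k in range(3))
            for i in range(3)]

def contraction_identity_holds():
    """Check eps_ijk eps_imn = delta_jm delta_kn - delta_jn delta_km for all 81 (j, k, m, n)."""
    for j, k, m, n in product(range(3), repeat=4):
        lhs = sum(eps(i, j, k) * eps(i, m, n) for i in range(3))
        rhs = delta(j, m) * delta(k, n) - delta(j, n) * delta(k, m)
        if lhs != rhs:
            return False
    return True

print(eps(0, 1, 2), eps(1, 0, 2), eps(1, 2, 0), eps(2, 1, 0))  # the ijk, jik, jki, kji pattern
print(cross([1, 2, 3], [4, 5, 6]))
print(contraction_identity_holds())
```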

This first vector identity I’ve produced I’ll call I4:

useful vector id 1
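Here is a small finite-difference sketch in Python to check I4. It assumes I4 is the standard curl-of-curl result, ∇×(∇×A) = ∇(∇·A) - ∇²A, with double del acting componentwise as in Cartesian coordinates. The test field A = (x²y, yz³, xz + y²) and the evaluation point are arbitrary choices of mine. Both sides are computed numerically, and working this particular field by hand gives (1, 2x - 6yz, 3z² - 2) for either side.

```python
def partial(f, x, i, h=1e-3):
    """Central-difference partial derivative of scalar f along axis i at point x."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp) - f(xm)) / (2 * h)

def curl(F):
    """Return a function giving the numerical curl of the vector field F."""
    def curl_at(x):
        comp = lambda k: (lambda p: F(p)[k])   # k-th component as a scalar field
        return [partial(comp(2), x, 1) - partial(comp(1), x, 2),
                partial(comp(0), x, 2) - partial(comp(2), x, 0),
                partial(comp(1), x, 0) - partial(comp(0), x, 1)]
    return curl_at

def div(F):
    """Return a function giving the numerical divergence of F."""
    return lambda x: sum(partial(lambda p, k=k: F(p)[k], x, k) for k in range(3))

def grad(f):
    """Return a function giving the numerical gradient of the scalar f."""
    return lambda x: [partial(f, x, i) for i in range(3)]

def vector_laplacian(F):
    """Componentwise Laplacian: double del acting on each Cartesian component."""
    def lap_at(x):
        out = []
        for k in range(3):
            fk = lambda p, k=k: F(p)[k]
            out.append(sum(partial(lambda q, i=i: partial(fk, q, i), x, i) for i in range(3)))
        return out
    return lap_at

def A(p):
    """Arbitrary smooth test field (my choice, not from the post)."""
    x, y, z = p
    return [x * x * y, y * z**3, x * z + y * y]

point = [0.7, -1.2, 0.4]
lhs = curl(curl(A))(point)
rhs = [g - l for g, l in zip(grad(div(A))(point), vector_laplacian(A)(point))]
print(lhs)
print(rhs)
```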

Here’s a second useful identity that I’ll need to develop:

useful vector id 2

This identity I’ll call I5:

vector id 2

*Pant Pant* I’ve collected all the identities I need to make this work. If you don’t immediately know something off the top of your head, you can develop the pieces you need. I will use I1, I2, I3, I4 and I5 together to derive the Maxwell-Ampere Law from Biot-Savart. Most of the following derivation comes from Jackson Electrodynamics, with a few small embellishments of my own.

first line amp dev

In this first line of the derivation, I’ve rewritten Biot-Savart with the constants outside the integral and everything variable inside. Inside the integral, I’ve split the meat so that the different vector and scalar elements are clear. In what follows, it’s very important to remember that unprimed del operators are in a different space from the primed del operators: a value (like J) that is dependent on the primed position variable is essentially a constant with respect to the unprimed operator and will render a zero in a derivative by the unprimed del. Moreover, unprimed del can be moved into or out of the integral, which is with respect to the primed position coordinates. This observation is profoundly important to this derivation.

BS to amp 1

The usage of the first two identities here manages to extract the cross product from the midst of the function and puts it into a manipulable position where the del is unprimed while the integral is primed, letting me move it out of the integrand if I want.

BS to amp 2

This intermediate contains another very important magnetic quantity in the form of the vector potential (A) –“A” here not to be confused with the alphabetical placeholder I used while deriving my vector identities. I may come back to vector potential later, but this is simply an interesting stop-over for now. From here, we press on toward the Maxwell-Ampere law by acting in from the left with a curl onto the magnetic field…
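For readers who want the steps so far as text, here are the standard Jackson-style steps in LaTeX, in SI units. The post’s images may use a different unit system, but the constants only ride along anyway. The middle step uses ∇(1/|r - r’|) = -(r - r’)/|r - r’|^3. The last step uses the fact that J(r’) is a constant as far as the unprimed curl is concerned.

```latex
\begin{aligned}
\mathbf{B}(\mathbf{r})
 &= \frac{\mu_0}{4\pi}\int \mathbf{J}(\mathbf{r}')\times\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\,d^3r'
  = -\frac{\mu_0}{4\pi}\int \mathbf{J}(\mathbf{r}')\times\nabla\frac{1}{|\mathbf{r}-\mathbf{r}'|}\,d^3r'\\
 &= \frac{\mu_0}{4\pi}\,\nabla\times\int \frac{\mathbf{J}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d^3r'
  \equiv \nabla\times\mathbf{A}(\mathbf{r})
\end{aligned}
```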

BS to amp 3

The Dirac delta I end with in the final term allows me to collapse r’ into r at the expense of that last integral. At this point, I’ve actually produced the magnetostatic Ampere’s law if I feel like claiming that the current has no divergence, but I will talk about this later…

BS to amp 4

This substitution switches del from being unprimed to primed, putting it in the same terms as the current vector J. I use integration by parts next to switch which element of the first term the primed del is acting on.

BS to amp 5

Were I being really careful about how I depicted the integration by parts, there would be a unit vector dotted into the J in order to turn it into a scalar sum in that first term ahead of the integral… this is a little sloppy on my part, but nobody ever cares about that term anyway because it’s presupposed to vanish at the limits where it’s being evaluated. This is a physicist trick similar to pulling a rug over a mess on the floor –I’ve seen it performed in many contexts.

BS to amp 6

This substitution is not one of the mathematical identities I created above; this is purely physics. In this case, I’ve used conservation of charge (the continuity equation) to connect the divergence of the current vector to the change in charge density over time. If you don’t recognize the epic nature of this particular substitution, take my word for it… I’ve essentially turned magnetostatics into electrodynamics, ensuring that a ‘current’ is actually a form of moving charge.
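Continuing in the same SI notation, the curl of the curl splits into two pieces. The Dirac delta from I2, taken here in its standard form ∇²(1/|r - r’|) = -4πδ(r - r’), eats the second integral. The sign swap, integration by parts and continuity equation handle the first. This is my LaTeX transcription of the standard steps, with the surface term dropped exactly as in the prose.

```latex
\begin{aligned}
\nabla\times\mathbf{B}
 &= \nabla(\nabla\cdot\mathbf{A}) - \nabla^2\mathbf{A}\\
 &= \frac{\mu_0}{4\pi}\,\nabla\!\int \mathbf{J}(\mathbf{r}')\cdot\nabla\frac{1}{|\mathbf{r}-\mathbf{r}'|}\,d^3r'
  - \frac{\mu_0}{4\pi}\int \mathbf{J}(\mathbf{r}')\,\nabla^2\frac{1}{|\mathbf{r}-\mathbf{r}'|}\,d^3r'\\
 &= -\frac{\mu_0}{4\pi}\,\nabla\!\int \mathbf{J}(\mathbf{r}')\cdot\nabla'\frac{1}{|\mathbf{r}-\mathbf{r}'|}\,d^3r'
  + \mu_0\,\mathbf{J}(\mathbf{r})\\
 &= \frac{\mu_0}{4\pi}\,\nabla\!\int \frac{\nabla'\cdot\mathbf{J}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d^3r'
  + \mu_0\,\mathbf{J}(\mathbf{r})
  \qquad\text{(integration by parts, surface term dropped)}\\
 &= -\frac{\mu_0}{4\pi}\,\nabla\!\int \frac{\partial\rho(\mathbf{r}',t)/\partial t}{|\mathbf{r}-\mathbf{r}'|}\,d^3r'
  + \mu_0\,\mathbf{J}(\mathbf{r})
  \qquad\Bigl(\nabla'\cdot\mathbf{J} = -\frac{\partial\rho}{\partial t}\Bigr)
\end{aligned}
```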

BS to amp 75

In this line, I’ve switched the order of the derivatives again. Nothing in the integral is dependent on time except the charge density, so almost everything can pass through the derivative with respect to time. On the other hand, only the distance is dependent on the unprimed r, meaning that the unprimed del can pass inward through everything in the opposite direction.

BS to amp 8

At this point something amazing has emerged from the math. Pardon the pun; I’m feeling punchy. The quantity I’ve highlighted in blue is a form of Coulomb’s law! If that name doesn’t tickle you at the base of your spine, what you’re looking at is the electrostatic version of the Biot-Savart law, which makes electric fields from electric charges. This is one of the reasons I like this derivation and why I decided to go ahead and detail the whole thing. It shows explicitly a connection between magnetism and electrostatics where such a connection was not previously clear.
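In the same SI notation, pulling the time derivative out front and pushing the unprimed gradient inside turns the charge-density term into Coulomb’s law, which closes the derivation. The E that appears is the instantaneous Coulomb field of the charge distribution. Treat this as the standard textbook identification rather than a transcription of the image.

```latex
\begin{aligned}
-\frac{\mu_0}{4\pi}\,\frac{\partial}{\partial t}\,\nabla\!\int \frac{\rho(\mathbf{r}',t)}{|\mathbf{r}-\mathbf{r}'|}\,d^3r'
 &= \mu_0\varepsilon_0\,\frac{\partial}{\partial t}\left[\frac{1}{4\pi\varepsilon_0}\int \rho(\mathbf{r}',t)\,\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\,d^3r'\right]
  = \mu_0\varepsilon_0\,\frac{\partial\mathbf{E}}{\partial t},\\
\nabla\times\mathbf{B} &= \mu_0\varepsilon_0\,\frac{\partial\mathbf{E}}{\partial t} + \mu_0\,\mathbf{J}.
\end{aligned}
```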

BS to amp 9

And thus ends the derivation. In this casting, the curl of the magnetic field depends both on the time variation of the electric field and on currents. If the electric field does not vary in time, that first term vanishes and you get the plain old magnetostatic Ampere’s law:

Ampere's law

This says simply that the curl of the magnetic field is proportional to the current density at the same point. There are some interesting qualities to this equation because of how the derivation leaves only a single positional dependence. As you can see, there is no separate position coordinate to describe magnetic field independently from its source. And, really, it isn’t describing the magnetic field as ‘generated’ by the current, but rather that a deformation to the linearity of the magnetic field is due to the presence of a current at that location… which is an interesting way to relate the two.

This relationship tends to cause magnetic lines to orbit around the current vector.


This image from hyperphysics sums up the whole situation –I realize I’ve been saying something similar from way up, but this equation is proof. If you have current passing along a wire, magnetic field will tend to wrap around the wire in a right-handed sense. For all intents and purposes, this is all Ampere’s law says, neglecting that you can manipulate the geometry of the situation to make the field do some interesting things. But, this is all.
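To see the right-handed wrapping and the size of the field come straight out of Biot-Savart, here is a brute-force Python sketch that chops a long straight wire into short segments and adds up their contributions. It makes several assumptions. Units are SI, with μ0 taken as the classic 4π×10⁻⁷ T·m/A, which agrees with the current SI value to better than one part in a billion. A wire of finite half-length (100 times the distance to the field point) stands in for an infinite one, and the integral uses a simple midpoint rule. The result should sit very close to the familiar long-wire value μ0I/(2πd), which is also what Ampere’s law gives for a circular loop of radius d.

```python
import math

MU0 = 4e-7 * math.pi  # classic SI value of the vacuum permeability, T m / A

def wire_field(point, current=1.0, half_length=100.0, n=200_000):
    """Biot-Savart field of a straight wire on the z axis carrying `current` toward +z.

    The wire runs from z = -half_length to +half_length and is cut into n segments;
    each segment contributes (mu0 I / 4 pi) dl x (r - r') / |r - r'|^3.
    """
    px, py, pz = point
    dz = 2 * half_length / n
    bx = by = 0.0
    for i in range(n):
        z = -half_length + (i + 0.5) * dz      # midpoint of segment i
        rx, ry, rz = px, py, pz - z             # separation vector r - r'
        r3 = (rx * rx + ry * ry + rz * rz) ** 1.5
        # dl x (r - r') with dl = (0, 0, dz) has components (-dz*ry, dz*rx, 0)
        bx += -dz * ry / r3
        by += dz * rx / r3
    pref = MU0 * current / (4 * math.pi)
    return (pref * bx, pref * by, 0.0)

d = 1.0
B_on_x_axis = wire_field((d, 0.0, 0.0))   # field point on the +x axis
B_on_y_axis = wire_field((0.0, d, 0.0))   # field point on the +y axis
print(B_on_x_axis, B_on_y_axis)
print("long-wire formula:", MU0 * 1.0 / (2 * math.pi * d))
print("B * 2 pi d / mu0 =", B_on_x_axis[1] * 2 * math.pi * d / MU0)  # Ampere: close to I = 1
```

With the current along +z, the field points along +y on the x axis and along -x on the y axis. It circles the wire counterclockwise seen from above, which is the right-hand rule.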

Well, so what? I did a lot of math. What, if anything, have I gained from it? How does this help me along the path to understanding magnets?

The Ampere Law is useful in generating very simple magnetic field configurations that can be used in the Lorentz force law, ultimately showing a direct dynamical connection between moving currents and magnetic fields. I have it in mind to show a freshman level example of how this is done in the next part of this series. Given the length of this post, I will do more math in a different post.

This is a big step in the direction of learning how magnets work, but it should leave you feeling a little unsatisfied. How exactly do the forces work? In physics, it is widely known that magnetic fields do no work, so why is it that bar magnets can drag each other across the counter? That sure looks like work to me! And if electric currents are necessary to drive magnets, why is it that bar magnets and horseshoe magnets don’t require batteries? Where are the electric currents that animate a bar magnet and how is it that they seem to be unlimited or unpowered? These questions remain to be addressed.

Until the next post…

Hydrogen atom radial equation

In between the Sakurai problems, I decided to tackle a small problem I set for myself.

The Sakurai quantum mechanics book is aimed at roughly the graduate student level, meaning that it explicitly overlooks problems that it deems too ‘undergraduate.’ When I started into the next problem in the chapter, which deals with the Wigner-Eckart theorem, I decided to direct myself at a ‘lower level’ problem that demands practice from time to time. I worked in early January solving the angular component of the hydrogen atom by deriving the spherical harmonics and much of my play time since has been devoted to angular and angular momentum type problems. So, I decided it would be worth switching up a little and solving the radial portion of the hydrogen atom electron central force problem.

One of my teachers once suggested that deriving the hydrogen atom was a task that any devoted physicist should play with every other year or so. Why not, I figured; the radial solution is actually a bit more mind boggling to me than the angular parts because it requires some substitutions that are not very intuitive.

The hydrogen atom problem is a classic problem mainly because it’s one of the last exactly solvable quantum mechanics problems you ever encounter. After the hydrogen atom, the water gets deeper and the field starts to focus on tools that give insight without actually giving exact answers. The only atomic system that is exactly solvable is the hydrogen atom… even helium, with just one more electron, demands perturbation in some way. It isn’t exactly crippling to the field because the solutions to all the other atoms are basically variations of the hydrogen atom and all, with some adjustment, have hydrogenic geometry or are superpositions of hydrogen-like functions that are only modified to the extent necessary to make the energy levels match. Solving the hydrogen atom ends up giving profound insight to the structure of the periodic table of the elements, even if it doesn’t actually solve for all the atoms.

As implied above, I decided to do a simplified version of this problem, focusing only on the radial component. The work I did on the angular momentum eigenstates was not in context of the hydrogen electron wave function, but can be inserted in a neat cassette to avoid much of the brute labor of the hydrogen atom problem. The only additional work needed is solving the radial equation.

A starting point here is understanding spherical geometry as mediated by spherical polar coordinates.

A hydrogen atom, as we all know from the hard work of a legion of physicists around the turn of the twentieth century, is a combination of a single proton with a single electron. The proton has one indivisible positive charge while the electron has one indivisible negative charge. These two charges attract each other and the proton, being about 1836 times more massive, pulls the electron to it. The electron falls in until the kinetic energy it gains forces it to have enough momentum to be delocalized to a certain extent, as required by quantum mechanical uncertainty. The system might then radiate photons as the electron sorts itself into a stable orbiting state. The resting combination of proton and electron has neutral charge with the electron ‘distributed’ around the proton in a sort of cloud as determined by its wave-like properties.

The first approximation of the hydrogen atom is a structure called the Bohr model, proposed by Niels Bohr in 1913. The Bohr model features classical orbits for the electron around the nucleus, much like the moon circles the Earth.


This image, from duckster.com, is a crude example of a Bohr atom. The Bohr atom is perhaps the most common image of atoms in popular culture, even if it isn’t correct. Note that the creators of this cartoon didn’t have the wherewithal to make a ‘right’ atom, giving the nucleus four plus charges and the shell three minus… this would be a positively charged ion of Beryllium. Further, the electrons are not stacked into a decent representation of the actual structure: circulating orbits would be P-orbitals or above, whereas Beryllium has only S-orbitals in its ground state, and these possess no orbital angular momentum at all (l = 0). But, it’s a popular cartoon. Hard to sweat the small stuff.

The Bohr model grew from the notion of the photon as a discrete particle, where Bohr postulated that the only allowed stable orbits for the electron circling the nucleus are those whose angular momentum is an integer multiple of a basic unit delivered by single photons… the unit being set by Planck’s constant (Planck’s constant divided by 2*Pi). ‘Quantized’ is a word invoked to mean ‘discrete quantities’ and comes back to that pesky little feature Deepak Chopra always ignores: the first thing we ever knew about quantum mechanics was Planck’s constant –and freaking hell is Planck’s constant small! ‘Quantization’ is the act of parsing into discrete ‘quantized’ states and is the word root which loaned the physics field its name: Quantum Mechanics. ‘Quantum Mechanics’ means ‘the mechanics of quantization.’

Quantum mechanics, as it has evolved, approaches problems like the hydrogen atom using descriptions of energy. In the classical sense, an electron orbiting a proton has some energy describing its kinetic motion, its kinetic energy, and some additional energy describing the interaction between the two masses, usually as a potential source of more kinetic energy, called a potential energy. If nothing interacts from the outside, the closed system has a non-varying total energy which is the sum of the kinetic and potential energies. Quantum mechanics evolved these ideas away from their original roots using a version of Hamiltonian formalism. Hamiltonian formalism, as it appears in quantum, is a way to merely sum up kinetic and potential energies as a function of position and momentum –this becomes complicated in Quantum because of the restriction that position and momentum cannot be simultaneously known to arbitrary precision. But, Schrodinger’s equation actually just boils down to a statement of kinetic energy plus potential energy.

Here is a quick demonstration of how to get from a statement of total energy to the Schrodinger equation:

5-12-16 schrodinger

After ‘therefore,’ I’ve simply multiplied in from the right with a wave function to make this an operator equation. The first term on the left is kinetic energy in terms of momentum while the second term is the Gaussian CGS form of potential energy for the electrical central force problem. In Gaussian CGS, the constants of permittivity and permeability are swept under the rug by collecting them into the speed of light, which usually shows up alongside magnetic fields… here, the charge is in statcoulombs, so the e^2 that appears stands for e^2/(4*Pi*ε0) in SI units. When you convert momentum into its position space representation, you get Schrodinger’s time-independent equation for an electron under a central force potential. The potential, which depends on the positional expression of ‘radius,’ has a negative sign to make it an attractive force, much like gravity.
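Since the photographed page may be hard to read, here is the same chain written out in LaTeX. It uses Gaussian units and ρ for the electron-proton distance, as in the rest of this post. This is my transcription of the standard steps, not a copy of the image.

```latex
% Gaussian units: in SI, replace e^2 by e^2/(4\pi\varepsilon_0)
E = \frac{p^2}{2m} - \frac{e^2}{\rho}
\;\;\Longrightarrow\;\;
\left(\frac{\hat{p}^{\,2}}{2m} - \frac{e^2}{\rho}\right)\psi = E\,\psi,
\qquad \hat{p} = -i\hbar\nabla
\;\;\Longrightarrow\;\;
-\frac{\hbar^2}{2m}\nabla^2\psi - \frac{e^2}{\rho}\,\psi = E\,\psi
```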

Now, the interaction between a proton and an electron is a central force interaction, which means that the radius term could actually be pointed in any direction. In Cartesian coordinates, the radius is sqrt(x^2 + y^2 + z^2), a clumsy combination of x, y and z. But, because the central force problem is spherically symmetric, if we move out of Cartesian coordinates and into spherical polar, we get a huge simplification of the math. The squared inverted triangle that appears when momentum is written in its position representation is a three dimensional operator called the Laplace operator, or ‘double del.’ Picking the form of del ends up casting the dimensional symmetry of the differential equation… as written above, it could be Cartesian or spherical polar or cylindrical, or anything else.

A small exercise I sometimes put myself through is defining the structure of del. The easiest way that I know to do this is to pull apart the divergence theorem of vector calculus in spherical polar geometry, which means defining a differential volume and differential surfaces.

5-12-16 central force 2

Well, that turned out a little neater than my usual meandering crud.

This little bit of math is defining the geometry of the coordinate variables in spherical polar coordinates. You can see the spherical polar coordinates in the Cartesian coordinate frame and they consist of a radial distance from the origin and two angles, Phi and Theta, that act at 90 degrees from each other. If you pick a constant radius in spherical polar space, you get a spherical surface where lines of constant Phi and Theta create longitude and latitude lines, respectively, making a globe! You can establish a right handed coordinate system in spherical polar space by picking a point and considering it to be locally Cartesian… the three dimensions at this point are labeled as shown, along the outward radius and in the directions in which each of the angles increases.

If you were to consider an infinitesimal volume of these perpendicular dimensions, at this locally Cartesian point, it would be a volume that ‘approaches’ cubic. But then, that’s the key to calculus: recognizing that 99.999999 effectively approaches 100. So then, this framework allows you to define the calculus occurring in spherical polar space. The integral performed along Theta, Phi and Rho would be adding up tiny cubical elements of volume welded together spherically, while the derivative would be with respect to each dimension of length as locally defined. The scaling values appear because I needed to convert differentials of angle into linear length in order to calculate volume, which can be accomplished by using the definition of the radian angle, which is arc length per radius. A curve is effectively linear when an arc becomes so tiny that its curvature is negligible across the edges of an infinitesimal cube, like thinking about the curvature of the Earth affecting the flatness of the sidewalk outside your house.

The divergence operation uses the divergence theorem (Gauss’s theorem) to say that a volume integral of divergence equals a surface integral of flux wrapping across the surface of that same volume… and then you simply chase the constants. All that I do to find the divergence differential expression is to take the full integral and remove the infinite sum so that I’m basically doing algebra on the infinitesimal pieces, then literally divide across by the volume element and cancel the appropriate differentials. There are three possible area integrals because the normal vector is in three possible directions, one each for Rho, Theta and Phi.

The structure becomes a derivative if the volume is in the denominator because volume has one greater dimension than any possible area, where the derivative is with respect to the dimension of volume that doesn’t cancel out when you divide against the areas. If a scaling variable used to convert theta or phi into a length is dependent on the dimension of the differential left in the denominator, it can’t pass out of the derivative and remains inside at completion. The form of the divergence operation on an arbitrary vector field appears in the last line above. The value produced by divergence is a scalar quantity with no direction which could be said to reflect the ‘poofiness’ of a vector field at any given point in the space where you’re working.
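For reference, here is what that bookkeeping produces, written in LaTeX. It takes θ as the polar angle measured from the z axis and φ as the azimuth, which is the convention implied by the longitude/latitude description. Each term is the flux through a pair of opposite faces divided by the volume element, and the ρ² and sin θ factors are the scaling values that stay trapped inside the derivatives.

```latex
\begin{aligned}
dV &= (d\rho)\,(\rho\,d\theta)\,(\rho\sin\theta\,d\phi) = \rho^2\sin\theta\,d\rho\,d\theta\,d\phi,\\
\nabla\cdot\mathbf{F} &= \frac{1}{\rho^2}\frac{\partial}{\partial\rho}\bigl(\rho^2F_\rho\bigr)
 + \frac{1}{\rho\sin\theta}\frac{\partial}{\partial\theta}\bigl(\sin\theta\,F_\theta\bigr)
 + \frac{1}{\rho\sin\theta}\frac{\partial F_\phi}{\partial\phi}.
\end{aligned}
```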

I then continued by defining a gradient.

5-12-16 central force 1

Gradient is basically an opposite operation from divergence. Divergence creates a scalar from a vector which represents the intensity of ‘divergence’ at some point in a smooth function defined across all of space. Gradient, on the other hand, creates a vector field out of a scalar function, where each vector points in the direction in which the function increases most steeply, with a length equal to that steepest rate of increase.

This is kind of opaque. One way to think about this is to think of a hill poking out of a two dimensional plane. A scalar function defines the topography of the hill… it says simply that at some pair of coordinates in a plane, the geography has an altitude. The gradient operation would take that topography map and give you a vector field which has a vector at every location that points in the direction toward which the altitude increases most steeply at that location. Divergence then goes backward from this, after a fashion: it takes a vector map and converts it into a map which says ‘strength of change’ at every location. This last is not ‘altitude’ per se, but more like ‘rate at which altitude is changing’ at a given point.

The Laplace operator combines gradient with divergence as literally the divergence of a gradient, denoted as ‘double del,’ the upside-down triangle squared.
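Here, in LaTeX, are the gradient and the Laplacian you get by feeding the gradient into the divergence. The angle convention is the same as before (θ polar, φ azimuthal). This is my transcription of the standard forms rather than a copy of the photographed page.

```latex
\begin{aligned}
\nabla f &= \hat{\boldsymbol\rho}\,\frac{\partial f}{\partial\rho}
 + \hat{\boldsymbol\theta}\,\frac{1}{\rho}\frac{\partial f}{\partial\theta}
 + \hat{\boldsymbol\phi}\,\frac{1}{\rho\sin\theta}\frac{\partial f}{\partial\phi},\\
\nabla^2 f = \nabla\cdot\nabla f &= \frac{1}{\rho^2}\frac{\partial}{\partial\rho}\Bigl(\rho^2\frac{\partial f}{\partial\rho}\Bigr)
 + \frac{1}{\rho^2\sin\theta}\frac{\partial}{\partial\theta}\Bigl(\sin\theta\,\frac{\partial f}{\partial\theta}\Bigr)
 + \frac{1}{\rho^2\sin^2\theta}\frac{\partial^2 f}{\partial\phi^2}.
\end{aligned}
```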

In the last line, I’ve simply taken the Laplace operator in spherical polar coordinates and dropped it into its rightful spot in Schrodinger’s equation as shown far above. Here, the wave function, called Psi, is a probability amplitude (its square is the probability density) defined in spherical polar space, varying along the radius (Rho) and the angles Theta and Phi (together, the so-called ‘solid angle’). Welcome to Greek word salad…

What I’ve produced is an explicit form for Schrodinger’s equation with a coordinate set that is conducive to the problem. This differential equation is a multivariate second order partial differential equation. You have to solve this by separation of variables.

Having defined the hydrogen atom Schrodinger equation, I now switch to the simpler ‘radial only’ problem that I originally hinted at. Here’s how you cut out the angular parts:

5-12-16 radial schrodinger equation

You just recognize that the second and third differential terms are collectively equal to -L^2/(ħ^2 ρ^2), where L^2 is the square of the total angular momentum operator, and then use the relevant eigenvalue equation, L^2 Y = ħ^2 l(l+1) Y, to remove it.
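In LaTeX, here are the angular operator being recognized, its eigenvalue equation, and the radial equation left behind once ψ = R(ρ)Y_lm(θ, φ) is substituted and Y_lm is divided out. The notation is Gaussian units with θ polar and φ azimuthal. The m multiplying ħ² is the mass in the Hamiltonian (strictly the reduced mass), not the magnetic quantum number in Y_lm. This is my transcription of the standard forms.

```latex
\begin{gathered}
\hat{L}^2 = -\hbar^2\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\Bigl(\sin\theta\,\frac{\partial}{\partial\theta}\Bigr)
  + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\right],
\qquad
\hat{L}^2 Y_{lm} = \hbar^2\, l(l+1)\, Y_{lm}\\[4pt]
-\frac{\hbar^2}{2m}\left[\frac{1}{\rho^2}\frac{d}{d\rho}\Bigl(\rho^2\frac{dR}{d\rho}\Bigr) - \frac{l(l+1)}{\rho^2}\,R\right] - \frac{e^2}{\rho}\,R = E\,R
\end{gathered}
```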

The L^2 operator comes out of the kinetic energy contained in the electron going ‘around.’ For the sake of consistency, it’s worth noting that the Hamiltonian for the full hydrogen atom contains a term for the kinetic energy of the proton and that the variable Rho refers to the distance between the electron and proton… in its right form, the ‘m’ given above is actually the reduced mass of that system and not directly the mass of the electron, which gives us a system where the electron is actually orbiting the center of mass, not the proton.

Starting on this problem, it’s convenient to recognize that the Psi wave function is a product of a Ylm (angular wave function) with a Radial function. I started by dividing out the Ylm and losing it. Psi basically just becomes R.

5-13-16 radial equation 1

The first thing to do is take out the units. There is a lot of extra crap floating around in this differential equation that will obscure the structure of the problem. First, take the energy ‘E’ down into the denominator to consolidate the units, then make a substitution that hides the length unit by setting it to ‘one’. This makes Rho a multiple of ‘r’ involving energy. The ‘8’ wedged in here is crazily counterintuitive at this point, but makes the quantization work in the method I’ve chosen! I’ll point out the use when I reach it. At the last line, I substitute for Rho and make a bunch of cancellations. Also, in that last line, there’s an “= R” which fell off the side of the picture –I assure you it’s there, it just didn’t get photographed.

After cleaning everything up and bringing the R over from behind the equals sign, the differential equation is a little simpler…

5-13-16 radial equation 2

The ‘P’ and ‘Q’ are quick substitutions made so that I don’t have to work as hard doing all this math; they are important later, but they just need to be simple to use at the moment. I also make a substitution for R, by saying that R = U/r. This converts the problem from radial probability into probability per unit radius. The advantage is that it lets me break up the complicated differential expression at the beginning of the equation.

The next part is to analyze the ‘asymptotic behavior’ of the differential equation. This is simply to look at what terms become important as the radius variable grows very big or very small. In this case, if radius gets very big, certain terms become small before others. If I can consider the solution U to be a separable composition of parts that solve different elements of this equation, I can create a further simplification.

5-13-16 asymptotic correction

If you consider the situation where r is very, very big, the two terms in this equation which go as 1/r or 1/r^2 shrink essentially to zero, meaning that they have no impact on the solution at big radii. This gives you a very simple differential equation at big radii, as written at right, which is solved by a simple exponential with either a positive or a negative exponent. I discard the positive exponent solution because I know that the wave function must fall to zero as r goes far away and because the positive exponential will tend to explode, becoming bigger the further you get from the proton –this situation would make no physical sense because we know the proton and electron to be attractive to one another and solutions that have them favor being separated don’t match the boundaries of the problem. Differential equations are frequently like this: they have multiple solutions which fit, but only certain solutions that can be correct for a given situation –doing derivatives loses information, meaning that multiple equations can give the same derivative and in going backward, you have to cope with this loss of information. The modification I made allows me to write U as a portion that’s an unknown function of radius and a second portion that fits as a negative exponent. Hidden here is a second route to the same solution of this problem… if I considered the asymptotic behavior at small radii. I did not utilize the second asymptotic condition.
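Here is one self-consistent way to write the scaled equation and its large-radius behaviour in LaTeX. It is my reconstruction under stated assumptions, not a transcription of the photographed pages. I take r = ρ√(−8mE)/ħ, which is my reading of the substitution with the 8, for a bound state with E < 0. I also set R = U/r and call the constant in front of the 1/r term q. The post’s own P and Q may carry different factors and signs; the quantization step later in the post sets Q/4 = -n, for example.

```latex
\begin{gathered}
r = \frac{\sqrt{-8mE}}{\hbar}\,\rho, \qquad R = \frac{U}{r}, \qquad
q = \frac{e^2}{\hbar}\sqrt{\frac{m}{-2E}}\\[4pt]
\frac{d^2U}{dr^2} - \frac{l(l+1)}{r^2}\,U + \frac{q}{r}\,U - \frac{U}{4} = 0\\[4pt]
r \to \infty:\quad \frac{d^2U}{dr^2} \approx \frac{U}{4}
\;\Rightarrow\; U \sim e^{\pm r/2}
\;\Rightarrow\; \text{keep } e^{-r/2},\quad U = f(r)\,e^{-r/2}\\[4pt]
\frac{d^2f}{dr^2} - \frac{df}{dr} - \frac{l(l+1)}{r^2}\,f + \frac{q}{r}\,f = 0
\end{gathered}
```

With this scaling, demanding that q be a positive integer n gives E = -me⁴/(2ħ²n²). In that case r becomes 2ρ/(n a0), with a0 = ħ²/(me²).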

I just need now to find a way to work out the identity of the rest of this function. I substitute the U back in with its new exponentially augmented form…

5-13-16 Froebenius

With the new version of U, the differential equation rearranges to give a refined set of differentials. I then divide out the exponential so that I don’t have it cluttering things up. All this jiggering about has basically reduced the original differential equation to skin and bones that still haven’t quite come apart. The next technique that I apply is the Frobenius method. This technique is to guess that the differential equation can be solved by some infinite power series where the coefficients of each power of radius control how much a particular power shows up in the solution. It’s basically just saying “What if my solution is some polynomial expression Ar^2 -Br +C,” where I can include as many ‘r’s as I want. This can be very convenient because the calculus of polynomials is so easy. In the ‘sum,’ the variable n just identifies where you are in the series, whether at n=0, which is just the constant r^0 = 1, or n=1000, which has a power of r^1000. In this particular case, I’ve learned that the n=0 term can actually be excluded because of boundary conditions: the probability per unit radius will need to go to zero at the origin (at the proton), and since the radius-independent term can’t do that, you need to leave it out… I didn’t think of that as I was originally working the problem, but it gets excluded anyway for a second reason that I will outline later.

The advantage of Frobenius may not be apparent right away, but it lets you reconstruct the differential equation in terms of the power series. I plug in the sum wherever the ‘A’ appears and work the derivatives. This relates different powers of r to different A coefficients. I also pull the 1/r and 1/r^2 into their respective sums to the same effect. Then, you rewrite two of the sums by advancing the coefficient indices and relabeling, so that every sum carries the same power of r; they can then all be consolidated under a single sum by omitting coefficients that are known to be zero. This has the effect of saying that the differential equation is now identically repeated in every term of the sum, letting you work with only one.

The result is a recurrence relation. For the power series to be a solution to the given differential equation, each coefficient is related to the one previous by a consistent expression. The existence of the recurrence relation allows you to construct a power series where you need only define one coefficient to immediately set all the rest. After all those turns and twists, this is a solution to the radial differential equation, but not in closed form.
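Here is a short Python sketch of a recurrence doing its job. It uses my own normalization rather than the post’s P and Q. In it, the leftover function f(r) (from U = f(r)e^(-r/2)) obeys f″ - f′ - l(l+1)f/r² + (q/r)f = 0. Substituting f = Σ a_k r^k and matching powers of r gives a_k = (k - 1 - q)/((k - l - 1)(k + l)) · a_(k-1). The lowest allowed power is r^(l+1), and with q set to an integer n the numerator vanishes at k = n + 1, so only the window of powers from l+1 up to the cutoff survives. The code uses exact fractions and checks that every power of r in the differential equation cancels.

```python
from fractions import Fraction

def radial_coefficients(n, l):
    """Power-series coefficients a_k of f(r) = sum_k a_k r^k with q set to the integer n.

    The series starts at k = l + 1 (the lowest power the recurrence allows) and uses
    a_k = (k - 1 - q) / ((k - l - 1)(k + l)) * a_{k-1}, which first vanishes at
    k = n + 1, so the highest surviving power is r^n.
    """
    if not (n > l >= 0):
        raise ValueError("need n > l >= 0")
    q = n
    coeffs = {l + 1: Fraction(1)}      # overall scale is fixed later by normalization
    k = l + 2
    while True:
        a = Fraction(k - 1 - q, (k - l - 1) * (k + l)) * coeffs[k - 1]
        if a == 0:                     # numerator hit zero: the series truncates
            break
        coeffs[k] = a
        k += 1
    return coeffs

def ode_residual(coeffs, n, l):
    """Collect f'' - f' - l(l+1) f / r^2 + q f / r by power of r; every entry should be zero."""
    q = n
    res = {}
    for k, a in coeffs.items():
        res[k - 2] = res.get(k - 2, 0) + a * (k * (k - 1) - l * (l + 1))
        res[k - 1] = res.get(k - 1, 0) + a * (q - k)
    return res

for n, l in [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]:
    c = radial_coefficients(n, l)
    ok = all(v == 0 for v in ode_residual(c, n, l).values())
    print(n, l, {k: str(v) for k, v in c.items()}, ok)
```

Suppose r is read as ρ√(−8mE)/ħ, which is my interpretation of the substitution with the 8. Then r equals 2ρ/(n a0) at the quantized energies, so the (2, 0) coefficients give R ∝ (1 - ρ/(2a0)) e^(-ρ/(2a0)), the familiar 2s radial function.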

Screwing around with all this math involved a ton of substitutions and a great deal of recasting the problem. That’s part of why solving the radial equation is challenging. Here is a collection of all the important substitutions made…

Collecting solution

As you can see, there is layer on layer on layer of substitution here. Further, you may not realize it yet, but something rather amazing happened with that number Q.

Quantize radial equation

If you set Q/4 = -n, the recurrence relation which generates the power series solution for the radial wave function cuts off the sequence of coefficients with a zero. This gives a choice for cutting off the power series after only a few terms instead of including the infinite number of possible powers, where you can choose how many terms are included! Truncating is more than a convenience: if the series never ends, its high-order coefficients fall off like those of an exponential, and the solution grows like the positive exponential I threw out earlier. Suddenly, the sum drops into a closed form and reveals an infinite family of solutions that depend on the ‘n’ chosen as the cutoff. Further, Q was originally defined as a function of energy… if you substitute in that definition and solve for ‘E,’ you get an energy dependent on ‘n’. These are the allowed orbital energies for the hydrogen atom.
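Plugging numbers into the resulting formula is a quick sanity check. Solving the quantization condition for E gives the standard result E_n = -me⁴/(2ħ²n²) in Gaussian units. To evaluate it in SI, swap e² for e²/(4πε0). The Python sketch below does that with CODATA 2018 constants, once with the bare electron mass and once with the reduced mass of the electron-proton system. The constant and function names are mine.

```python
import math

# CODATA 2018 values, SI units
M_E = 9.1093837015e-31       # electron mass, kg
M_P = 1.67262192369e-27      # proton mass, kg
Q_E = 1.602176634e-19        # elementary charge, C (exact)
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
HBAR = 1.054571817e-34       # reduced Planck constant, J s

# Gaussian e^2 corresponds to e^2 / (4 pi eps0) in SI (units of J m)
E2 = Q_E**2 / (4 * math.pi * EPS0)

def energy_ev(n, mass=M_E):
    """E_n = -m e^4 / (2 hbar^2 n^2), returned in electron volts."""
    return -mass * E2**2 / (2 * HBAR**2 * n**2) / Q_E

def bohr_radius(mass=M_E):
    """a0 = hbar^2 / (m e^2), in metres."""
    return HBAR**2 / (mass * E2)

mu = M_E * M_P / (M_E + M_P)   # reduced mass of the electron-proton pair
for n in range(1, 5):
    print(n, energy_ev(n), energy_ev(n, mu))
print("a0 =", bohr_radius(), "m")
```

The ground state lands at about -13.6 eV, and the reduced mass shifts it by roughly 0.05%.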

This is an example of Quantization!

Having just quantized the radial wave function of the hydrogen atom, you may want to sit back and smoke a cigarette (if you’re into that sort of thing).

It’s opaque and particular to this strategy, but the ‘8’ I chose to add way back in that first substitution that converts Rho into r came into play right here. As it turns out, the 4 which resulted from pulling a 2 out of the square root twice canceled another 2 showing up during a derivative done a few dozen lines later and had the effect of keeping a 2 from showing up with the ‘n’ on top of the recurrence relation… allowing the solutions to be successive integers in the power series instead of every other integer. This is something you cannot see ahead, but has a profound, Rube Goldbergian effect way down the line. I had to crash into the extra two while doing the problem to realize it might be needed.

At this point, I’ve looked at a few books to try to validate my method and I’ve found three different ways to approach this problem, all producing equivalent results. This is only one way.

The recurrence relation also gives a second very important outcome:

n to l relation

The energy quantum number must be bigger than the angular momentum quantum number: n’ must always be bigger than ‘l’ by at least 1. And secondarily, and this is really important, the unprimed n (the index that labels the powers in the series) must also always be bigger than ‘l.’ This gives:

n’ ≥ n > l

This constrains which powers r^n can be added in the series solution. You can’t just start blindly at the zero order power; ‘n’ must be bigger than ‘l’ so that it never equals ‘l’ in the denominator and the primed number is always bigger too. If ‘l’ and ‘n’ are ever equal, you get an undefined term. One might argue that maybe you can include negative powers of r, but these produce terms like 1/r, which diverge at the origin and blow up when the radius is small, even though we know from the boundary conditions that the probability must go to zero at the origin. There is therefore a small window of powers that can be included in the sum, going between n = l+1 and n = n’.

I spent some significant effort thinking about this point as I worked the radial problem this time; for whatever reason, it has always been hazy in my head which powers in the sum are allowed and how the energy and angular momentum quantum numbers constrain them. The radial problem can sometimes be an afterthought next to the intricacy of the angular momentum problem, but it is no less important.

With all of this, I’ve more or less just told you the ingredients needed to construct the radial wave functions. There is a good deal of back substitution, and then you must work the recurrence relation while obeying the quantization conditions I’ve just detailed.

constructing solution

A general form for the radial wave functions appears at the lower right, fabricated from the back-substitutions. The powers of ‘r’ in the series solution must be replaced with the original form of ‘rho,’ which now includes a constant involving mass, charge and Planck’s constant known as the Bohr radius. The Bohr radius a0 is a relic of the old Bohr atom model that I started off talking about, and it’s used as the scale length for the modern version of the atom. The wave function, as you can see, ends up being a polynomial in radius multiplied by an exponential, where the polynomial is further multiplied by a single 1/radius factor and includes only powers of radial distance from l+1, where l is the angular momentum quantum number, up to n’, the energy quantum number.
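For a sense of scale, the Bohr radius is a0 = 4πε0ħ²/(m_e e²). A short Python check, assuming CODATA 2018 values typed in by hand and the bare electron mass rather than the electron-proton reduced mass (so this is the textbook a0, slightly smaller than the reduced-mass length that strictly applies to hydrogen):

```python
import math

# CODATA 2018 values in SI units, typed in by hand for this sketch.
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m
HBAR = 1.054571817e-34        # reduced Planck constant, J s
M_E = 9.1093837015e-31        # electron mass, kg
E_CHARGE = 1.602176634e-19    # elementary charge, C


def bohr_radius():
    """a0 = 4 pi eps0 hbar^2 / (m_e e^2), using the electron mass, not the reduced mass."""
    return 4 * math.pi * EPSILON_0 * HBAR**2 / (M_E * E_CHARGE**2)


a0 = bohr_radius()
print(f"a0 = {a0:.4e} m = {a0 / 1e-10:.4f} angstrom")
# prints: a0 = 5.2918e-11 m = 0.5292 angstrom
```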

Here is how you construct a specific hydrogen atom orbital from all the gobbledygook written above. This is the simplest orbital, the S-orbital, where the energy quantum number is 1 and the angular momentum quantum number is 0. This uses the Y00 spherical harmonic, the simplest spherical harmonic, which more or less just says that the wave function does not vary with angle, making it completely spherically symmetric.

Normalized S orbital

The ‘100’ attached in subscript to the Psi wave function is a physicist shorthand for representing the hydrogen atom wave functions: these subscripts are ‘nlm,’ the three quantum numbers that define the orbital, which are n=1, l=0 and m=0 in this case. All I’ve done to produce the final wave function is take my prescription from before and use it to construct one of an infinite series of possible solutions. I then perform the typical Quantum Mechanics trick of making it a probability distribution by normalizing it. The process of normalization is just to make certain that the value ‘under the curve’ contained by the square of the wave function, counted up across all of space in the integral, is 1. This way, you have a 100% chance of finding the particle somewhere in space as defined by the probability distribution of the wave function.
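Written out as a worked equation, the normalization for this state goes as follows. This sketch assumes the exponential has the form e^{-r/(n’ a0)} with n’ = 1, takes Y00 = 1/√(4π), and uses the standard integral ∫ r^k e^{-br} dr (from 0 to ∞) = k!/b^{k+1}; it lands on the standard textbook ψ100, though the constants in my notebook figure may be organized differently.

```latex
% n' = 1, l = 0: the sum has the single term n = 1, so R(r) = c_1 e^{-r/a_0}
1 = \int |\psi_{100}|^2 \, d^3r
  = c_1^2 \int_0^\infty r^2 e^{-2r/a_0}\, dr \int |Y_{00}|^2 \, d\Omega
  = c_1^2 \, \frac{2!}{(2/a_0)^3} \cdot 1
  = \frac{c_1^2 a_0^3}{4}
\quad\Longrightarrow\quad c_1 = \frac{2}{a_0^{3/2}}

\psi_{100}(r) = \frac{2}{a_0^{3/2}}\, e^{-r/a_0} \cdot \frac{1}{\sqrt{4\pi}}
             = \frac{1}{\sqrt{\pi a_0^3}}\, e^{-r/a_0}
```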

You can use the wave function to ask questions about the distribution of the electron in space around the proton; for instance, what’s the average orbital radius of the electron? You just look for the expectation value of the radius using the wave function probability distribution:

Average radius

For the hydrogen atom ground state, which is the lowest energy state for a 1-electron, 1-proton atom, the electron is distributed, on average, about one and a half Bohr radii from the nucleus. The Bohr radius is about 0.53 angstrom (1 angstrom = 1×10^-10 meters), which means that the electron is on average about 0.79 angstrom from the nucleus.
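The same expectation value can be checked numerically. A minimal Python sketch, assuming the normalized ground state ψ100 = e^{-r/a0}/√(π a0³) with r in units of a0, and using a hand-rolled composite Simpson’s rule chosen for this sketch (not how the figures here were made). Integrating |ψ100|² over all angles gives the radial probability density 4πr²|ψ100|²; its integral should be 1, and weighting it by r gives the average radius.

```python
import math


def psi_100(r):
    """Normalized ground state exp(-r) / sqrt(pi), with r in units of a0."""
    return math.exp(-r) / math.sqrt(math.pi)


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def radial_probability(r):
    """|psi_100|^2 integrated over all angles at radius r: 4 pi r^2 |psi|^2."""
    return 4 * math.pi * r * r * psi_100(r) ** 2


R_MAX = 40.0  # the integrand carries exp(-80) here, so cutting off the integral is harmless
norm = simpson(radial_probability, 0.0, R_MAX)
mean_r = simpson(lambda r: r * radial_probability(r), 0.0, R_MAX)
A0_ANGSTROM = 0.529177  # Bohr radius in angstrom
print(f"norm = {norm:.6f}, <r> = {mean_r:.6f} a0 = {mean_r * A0_ANGSTROM:.3f} angstrom")
# prints: norm = 1.000000, <r> = 1.500000 a0 = 0.794 angstrom
```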

(special note 8-2-17: If you’ve read my recent post on parity symmetry, you may be wondering why this situation doesn’t break parity. Average position can never be reported as anything other than zero for a pure eigenstate of definite parity, and yet I’ve reported a positionally related average value other than zero right here. The reason this doesn’t break parity symmetry is that the radial distance is only fundamentally defined over “half” of space to begin with, from a radius of zero to a radius of infinity, with no regard for direction from the origin. In asking “What’s the average radius?” I’m not asking “What’s the average position?” Another way to look at this is that the radius operator Rho is parity symmetric: it doesn’t reverse under a parity transformation, so it can connect states that have the same parity, allowing radial expectation values to be non-zero.)
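The contrast can be made explicit with the normalized ground state ψ100 = e^{-r/a0}/√(π a0³). Position along an axis, z = r cos θ, carries an angular factor that is odd under θ → π − θ, while r carries none; the radial integrals are identical, so only the angular factor decides the answer:

```latex
\langle z \rangle = \int r\cos\theta \, |\psi_{100}|^2 \, r^2 \sin\theta \, dr\, d\theta\, d\phi
  = \frac{1}{\pi a_0^3} \int_0^\infty r^3 e^{-2r/a_0}\, dr
    \cdot 2\pi \int_0^\pi \cos\theta \sin\theta \, d\theta = 0

\langle r \rangle = \frac{1}{\pi a_0^3} \int_0^\infty r^3 e^{-2r/a_0}\, dr \cdot 4\pi
  = \frac{4}{a_0^3} \cdot \frac{3!}{(2/a_0)^4} = \frac{3}{2}\, a_0
```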

Right now, this is all very abstract and mathematical, so I’ll jump into the more concrete by including some pictures. Here is a 3D density plot of the wave function made with Mathematica.

S-orbital density

Definitely anticlimactic and a little bit blah, but this is the ground state wave function. We know it doesn’t vary in any angle, so it has to be spherically symmetric. The axes are distance in units of Bohr’s radius. One thing I can do to make it a little more interesting is to take a knife to it and chop it in half.


This is just the same thing bisected. The legend at left just shows the intensity of the wave function as represented in color.

As you can see, this is a far cry from the atomic model depicted in cartoon far above.

For the moment, I’m going to hang up this particular blog post. This took quite a long time to construct. Some of the higher energy, larger angular momentum hydrogenic wave functions start looking somewhat crazy and more beautiful, but I really just had it in mind to show the math that produces them. I may produce another post containing a few of them as I have time to work them out and render images of them. If the savvy reader so desires, the prescriptions given here can generate any hydrogenic wave function you like… just refer back to my Ylm post, where I talk some about the spherical harmonics, or refer directly to the Ylm tables on Wikipedia, which are a good, complete online source for them anyway.


Because I couldn’t leave well enough alone, I decided to do images of one more hydrogen atom wave function. This orbital is 210, a P-orbital. I won’t show the equation form of this, but I did calculate it by hand before turning it over to Mathematica. In Mathematica, I’m not plotting the wave function directly this time, because its density plot doesn’t make clear intuitive sense; instead I’m putting up the probability density (the squared magnitude of the wave function).
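To see where the peanut shape comes from, here is a minimal Python sketch. It assumes the standard textbook form ψ210 = (1/(4√(2π))) a0^{-3/2} (r/a0) e^{-r/(2a0)} cos θ in units of a0; the function name `density_210` is hypothetical. Since r cos θ = z, the probability density is proportional to z² e^{-r}: it vanishes on the entire x-y plane and piles up in two lobes along z.

```python
import math


def density_210(x, y, z):
    """|psi_210|^2 at a Cartesian point, in units where a0 = 1.

    Assumes psi_210 = (1 / (4 sqrt(2 pi))) * r * exp(-r / 2) * cos(theta),
    and uses r cos(theta) = z.
    """
    r = math.sqrt(x * x + y * y + z * z)
    psi = z * math.exp(-r / 2) / (4 * math.sqrt(2 * math.pi))  # real-valued for m = 0
    return psi * psi
```

Along the z axis the density goes like z² e^{-|z|}, which peaks at |z| = 2 a0, one lobe on each side of the nodal x-y plane.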

P-orbital probability density

Mr. Peanut is the P-orbital. Here, angular momentum lies somewhere in the x-y plane since the z-axis angular momentum eigenvalue (m) is zero. You can kind of think of it as a propeller where you don’t quite know which direction the axle is pointed.

Here’s a bisection of the same density map, along the long axis.

P-orbital probability density bisect

Edit 5-18-16

I keep finding interesting structures here. Since I was already sitting on all the mathematical pieces needed for hydrogen wave function 21-1 (no work needed, it was all in my notebook already), I simply plugged it into Mathematica to see what the density plot would produce. The first image, where the box size was a little small, was perhaps the most striking of what I’ve seen thus far…

orbital21-1 squared

I knew basically that I was going to find a donut, but it’s oddly beautiful seen with the outsides peeled off. Here’s more of 21-1…


The donut turned out to be way more interesting than I thought. In this case, the angular momentum is pointing down the Z-axis since the Z-axis angular momentum eigenvalue (m) is -1. This orbital shape is qualitatively most similar to the orbits depicted in the original Bohr atom model, with an electron density that is known to be ‘circulating’ clockwise primarily within the donut. This particular state is almost the definition of a magnetic dipole.
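The donut can be seen directly in the density. A minimal Python sketch, assuming the standard form ψ21-1 = (1/(8√π)) a0^{-3/2} (r/a0) e^{-r/(2a0)} sin θ e^{-iφ} in units of a0; the function name `density_21m1` is hypothetical. Writing r sin θ e^{-iφ} as x − iy keeps the complex phase visible in the code:

```python
import math


def density_21m1(x, y, z):
    """|psi_{2,1,-1}|^2 at a Cartesian point, in units where a0 = 1.

    Assumes psi = (1 / (8 sqrt(pi))) * r * exp(-r / 2) * sin(theta) * exp(-i phi),
    and uses r sin(theta) exp(-i phi) = x - i y.
    """
    r = math.sqrt(x * x + y * y + z * z)
    psi = (x - 1j * y) * math.exp(-r / 2) / (8 * math.sqrt(math.pi))
    return abs(psi) ** 2  # the phase exp(-i phi) drops out here
```

Because the phase cancels in |ψ|², the density depends only on z and on distance from the z axis: it is zero along the axis itself and largest on a ring of radius 2 a0 in the x-y plane, which is the donut.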