When he hit his chapter explaining Quantum Mechanics and his “Level 3 multiverse” I found that I profoundly disagree with this guy. It’s clear that he’s a grade A cosmologist, but I think he skirts dangerously close to being a quantum crank when it comes to multi-universe theory. I’ve been disagreeing with his take for the last couple of driving sessions and I will do my best to sum up from memory the specific issues that I’ve taken. Since this is a physicist making these claims, it’s important that I be accurate about my disagreement. In fact, I’ll start with just one and see whether I feel like going further from there…

The first place where I disagree is where he seems to show physicist Dunning-Kruger when regarding other fields in which he is not an expert. Physicists are very smart people, but they have a nasty habit of overestimating their competence in neighboring sciences… particularly biology. I am in a unique position in that I’ve been doubly educated; I have a solid background in biochemistry and cell molecular biology in addition to my background in quantum mechanics. I can speak at a fair level on both.

Professor Tegmark uses an anecdote (got to be careful here; anecdotes inflate mathematical imprecision) to illustrate how he feels quantum mechanics connects to events at a macroscopic level in organisms. There are many versions, but essentially he says this: when he is biking, the quantum mechanical behavior of an atom crossing through a gated ion channel in his brain affects whether or not he sees an oncoming car, which then may or may not hit him. By quantum mechanics, whether he gets hit or not by the car should be a superposition of states depending on whether or not the atom passes through the membrane of a neuron and enables him to have the thought to save himself or not. He ultimately elaborates this by asserting that “collapse free” quantum mechanics states that there is one universe where he saved himself and one universe where he didn’t… and he uses this as a thought experiment to justify what he calls a “level 3” multiverse with parallel realities that are coherent to each other but differ by the direction that a quantum mechanical wave function collapse took.

I feel his anecdote is a massive oversimplification that more or less throws the baby out with the bath water. The quantum event in question is “whether or not a calcium ion in his brain passes through a calcium gate,” which is connected to the macroscopic biological phenomenon of “whether he decides to bike through traffic” or alternatively “whether or not he decides to turn his eye in the appropriate direction” or alternatively “whether or not he sees a car coming when he starts to bike.”

You may notice this as a variant of the Schrodinger “Cat in a box” thought experiment. In this experiment, a cat is locked in a perfectly closed box with a sample of radioactive material and a Geiger counter that will dump acid onto the cat if it detects a decay; as long as the box is closed, the cat will remain in some superposition of states, conventionally considered “alive” or “dead” as connected with whether or not the isotope emitted a radioactive decay or not. I’ve made my feelings of this thought experiment known before here.

The fundamental difficulty comes down to what the superposition of states means when you start connecting an object with a very simple spectrum of states, like an atom, to an object with a very complex spectrum of states, like a whole cat. You could suppose that the cat and the radioactive emission become entangled, but I feel that there’s some question whether you could ever actually know whether or not they were entangled simply because you can’t discretely figure out what the superposition should mean: alive and dead for the cat are not a binary on-off difference from one another as “emitted or not” is for the radioactive atom. There are a huge number of states the cat might occupy that are very similar to one another in energy and the spectrum spanning “alive” to “dead” is so complicated that it might as well just be a thermal universe. Whether or not the entanglement actually happened, in this case, classical thermodynamics and statistical mechanics should be enough to tell you in classically “accurate enough” terms what you find when you open the box. If you wait one half-life of a bulk radioactive sample, when you open the box, you’ll find a cat that is burned by acid to some degree or another. At some point, quantum mechanics does give rise to classical reality, but where?

The “but where” is always where these arguments hit their wall.

In the anecdote Tegmark uses, as I’ve written above, the “whether a calcium ion crossed through a channel or not” is the quantum mechanical phenomenon connected to “whether an oncoming car hit me or not while I was biking.”

The problem that I have with this particular argument is that it loses scale. This is where quantum flapdoodle comes from. Does the scale make sense? Is all the cogitation associated with seeing a car and operating a bike on the same scale as where you can actually see quantum mechanical phenomena? No, it isn’t.

First, all the information coming to your brain from your eyes telling you that the car is present originates from many many cells in your retina, involving billions of interactions with light. The muscles that move your eyes and your head to see the car are instructed from thousands of nerves firing simultaneously and these nerves fire from *gradients* of Calcium and other ions… molar scale quantities of atoms! A nerve doesn’t fire or not based on the collapse of possibilities for a single calcium ion. It fires based on thermodynamic quantities of ions flowing through many gated ion channels all at once. The net effect of one particular atom experiencing quantum mechanical ambivalence is swamped under statistically large quantities of atoms picking all of the choices they can pick from the whole range of possibilities available to them, giving rise to the bulk phenomenon of the neuron firing. Let’s put it this way: for the nerve to fire or not based on quantum mechanical superposition of calcium ions would demand that the nerve visit that single thermodynamic state where *all* the ions fail to flow through all the open ion gates in the membrane of the cell *all at once*… and there are statistically few states where this has happened compared to the statistically many states where some ions or many ions have chosen to pass through the gated pore (this is what underpins the chemical potential that drives the functioning of the cell). If you bothered to learn any stat mech at all, you would know that this state is such a rare one that it would probably not be visited even once in the entire age of the universe. Voltage gradients in nerve cells are established and maintained through copious application of chemical energy, which is truthfully constructed from quantum mechanics and mainly expressed in bulk level by plain old classical thermodynamics.
And this is merely the state of whether a single nerve “fired or not,” taken in aggregate with the fact that your capacity for “thought” doesn’t depend so heavily on a single nerve that losing that one nerve would leave you unable to think –if a single nerve in your retina failed to fire, all the sister nerves around it would still deliver an image of the car speeding toward you to your brain.
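
To put a rough number on how rare that “all at once” state is, here is a minimal Python sketch. It assumes each ion independently fails to cross with some probability `p_fail`; both the independence and the value 0.5 are illustrative assumptions of mine, not measured physiology, and the function name is made up for the example. The arithmetic is done in log space because the raw probability underflows ordinary floating point almost immediately.

```python
import math


def log10_prob_all_fail(n_ions: int, p_fail: float) -> float:
    """Return log10 of the probability that every one of n_ions fails to cross.

    Assumes the ions behave independently (an illustrative simplification),
    so the joint probability is p_fail ** n_ions.
    """
    if n_ions < 1:
        raise ValueError("n_ions must be at least 1")
    if not 0.0 < p_fail <= 1.0:
        raise ValueError("p_fail must be in (0, 1]")
    return n_ions * math.log10(p_fail)


if __name__ == "__main__":
    # Hypothetical ion counts; a real nerve impulse involves far more ions.
    for n in (1, 10, 1_000, 1_000_000):
        exponent = log10_prob_all_fail(n, 0.5)
        print(f"{n:>9} ions: probability of all failing ~ 10^{exponent:.1f}")
```

Even with a coin-flip chance per ion, the exponent drops linearly with the number of ions, which is the whole point: the single state where nothing flows is vanishingly unlikely long before you reach physiological quantities.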

Do atoms like a single calcium ion subsist in quantum mechanical ambivalence when left to their own devices? Yes, they do. But, when you put together a large collection of these atoms simultaneously, it is physically improbable that every single atom will make the same choice all at once. At some point you get a bulk thermodynamic behavior and the decisions that your brain makes are based on bulk thermodynamic behaviors, not isolated quantum mechanical events.
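
The way bulk behavior swamps individual choices can be made concrete with the textbook binomial result. The sketch below assumes N independent ions, each crossing with probability `p` (again an illustrative assumption, as is the function name), and computes how large the typical fluctuation in the number of crossings is relative to the average.

```python
import math


def relative_fluctuation(n_ions: int, p_cross: float) -> float:
    """Standard deviation of the crossing count divided by its mean.

    For a binomial count the mean is n*p and the standard deviation is
    sqrt(n*p*(1-p)), so the ratio shrinks like 1/sqrt(n).
    """
    mean = n_ions * p_cross
    std = math.sqrt(n_ions * p_cross * (1.0 - p_cross))
    return std / mean


if __name__ == "__main__":
    # Hypothetical ion counts, from one ion up to a bulk quantity.
    for n in (1, 100, 10**6, 10**20):
        print(f"N = {n:.0e}: relative fluctuation = {relative_fluctuation(n, 0.5):.1e}")
```

For one ion the fluctuation is as large as the average itself; for a bulk quantity it is a tiny fraction of the average, which is why the aggregate looks like plain thermodynamics.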

Pretending that a person made a cognitive choice based on the quantum mechanical outcomes of a single atom is a reductio ad absurdum and it is profoundly disingenuous to start talking about entire parallel universes where you swerved right on your bike instead of left based on that single calcium atom (regardless of how liberally you wave around the butterfly effect). The nature of physiology in a human being at all levels is about biasing fundamentally random behavior into directed, ordered action, so focusing on one potential speck of randomness doesn’t mean that the aggregate should fail to behave as it always does. All the air in the room where you’re standing right now could suddenly pop into the far corner leaving you to suffocate (there is one such state in the statistical ensemble), but that doesn’t mean that it will… Closer to home, you might win a $500 million Powerball Jackpot, but that doesn’t mean you will!

I honestly do not know what I think about the multiverse or about parallel universes. I would say I’m agnostic on the subject. But, if all parallel universe theory is based on such breathtaking Dunning-Kruger as Professor Tegmark exhibits when talking about the connection between quantum mechanics and actualization of biological systems, the only stance I’m motivated to take is that we don’t know nearly enough to be speculating. If Tegmark is supporting multiverse theory based on such thinking, he hasn’t thought about the subject deeply enough. Scale matters here and neglecting the scale means you’re neglecting the math! Is he neglecting the math elsewhere in his other huge, generalizing statements? For the scale of individual atoms, I can see how these ideas are seductive, but stretching it into statistical systems is just wrong when you start claiming that you’re seeing the effects of quantum mechanics at macroscopic biological levels when people actually do not. It’s like Tegmark is trying to give Deepak Chopra ammunition!

Ok, just one gripe there. I figure I probably have room for another.

In another series of statements that Tegmark makes in his discussion of quantum mechanics, I think he probably knows better, but by adopting the framing he has, he risks misinforming the audience. After a short discussion of the origins of Quantum Mechanics, he introduces the Schrodinger Equation as the end-all, be-all of the field (despite speaking briefly of Lagrangian path integral formalism elsewhere). One of the main theses of his book is that “the universe is mathematical” and therefore the whole of reality is deterministic based on the predictions of equations like Schrodinger’s equation. If you can write the wave equation of the whole universe, he says, Schrodinger’s equation governs how all of it works.

This is wrong.

And, I find this to miss most of the point of what physics is and what it actually does. Math is valuable to physics, but one must always be careful that the math not break free of its observational justification. Most of what physics is about is making measurements of the world around us and fitting those measurements to mathematical models, the “theories” (small caps) provided to us by the Einsteins and the Sheldon Coopers… if the fit is close enough, the regularity of a given equation will sometimes make predictions about further observations that have not yet been made. Good theoretical equations have good provenance in that they predict observations that are later made, but the opposite can be said for bad theory, and the field of physics is littered with a thick layer of mathematical theories which failed to account for the observations, in one way or another. The process of physics is a big selection algorithm where smart theorists write every possible theory they can come up with and experimentalists take those theories and see if the data fit to them, and if they do accommodate observation, such a theory is promoted to a Theory (big caps) and is explored to see where its limits exist. On the other hand, small caps “theories” are discarded if they don’t accommodate observation, at which point they are replaced by a wave of new attempts that try to accomplish what the failure didn’t. As a result, new theories fit over old theories and push back predictive limits as time goes on.

For the specific example of Schrodinger’s equation, the mathematical model that it offers fits over the Bohr model by incorporating de Broglie’s matter wave. Bohr’s model itself fit over a previous model and the previous models fit over still earlier ideas had by the ancient Greeks. Each later iteration extends the accuracy of the model, where the development is settled depending on whether or not a new model has validated predictive power –this is literally survival of the fittest applied to mathematical models. Schrodinger’s equation itself has a limit where its predictive power fails: it cannot handle Relativity except as a perturbation… meaning that it can’t exactly predict outcomes that occur at high speeds. The deficiencies of the Schrodinger equation are addressed by the Klein-Gordon equation and by the Dirac equation and the deficiencies of those in turn are addressed by the path integral formalisms of Quantum Field Theory. If you knew the state equation for the whole universe, Schrodinger’s equation would not accurately predict how time unfolds because it fails to work under certain physically relevant conditions. The modern Quantum Field Theories fail at gravity, meaning that even with the modern quantum, there is no assured way of predicting the evolution of the “state equation of the universe” even if you knew it. There are a host of follow-on theories, String Theory, Loop Quantum Gravity and so on and so forth, that vie for being The Theory That Fills The Holes, but, given history, they probably will only extend our understanding without fully answering all the remaining questions. That String Theory has not made a single prediction that we can actually observe right now should be lost on no one –there is a grave risk that it never will. We cannot at the moment pretend that the Schrodinger equation perfectly satisfies what we actually know about the universe from other sources.

It would be most accurate to say that reality seems to be quantum mechanical at its foundation, but that we have yet to derive the true “fully correct” quantum theory. Tegmark makes a big fuss about trying to explain that “wave function collapse” doesn’t fit within the premise of Schrodinger’s equation but that the equation could hold as good quantum regardless if a “level three multiverse” is real. The opposite is also true: we’ve known Schrodinger’s equation is incomplete since the 1930s, so “collapse” may simply be another place where it’s *incomplete* and we don’t yet know why. A multiverse does not necessarily follow from this. Maybe pilot wave theory is correct quantum, for all I know.

It might be possible to masturbate over the incredible mathematical regularity of physics in the universe, but beware of the fact that it wasn’t particularly mathematical or regular until we picked out those theories that fit the universe’s behavior very closely. Those theories have predictive power because that is the nature of the selection criteria we used to find them; if they lacked that power, they would be discarded and replaced until a theory emerged meeting the selection criteria. To be clear, mathematical models can be written to describe anything you want, including the color of your bong haze, but they only have power because of their self-consistency. If the universe does something to deviate from what the math says it should, the math is simply wrong, not the universe. Every time you find neutrino mass, God help your massless neutrino Standard Model!

Wonderful how the math works… until it doesn’t.

Edit 12-19-17:

We’re still listening to this book during our car trips and I wanted to point out that Tegmark uses an argument very similar to my argument above to suggest why the human brain can’t be a quantum computer. He approaches the matter from a slightly different angle. He says instead that a coherent superposition of all the ions either inside or outside the cell membrane is impossible to maintain for more than a very very short period of time because eventually something outside of the superposition would rapidly bump against some component of the superposition and that since so many ions are involved, the frequency of things bumping on the system from the outside and “making a measurement” becomes high. I do like what he says here because it starts to show the scale that is relevant to the argument.

On the other hand, it still fails to necessitate a multiverse. The simple fact is that human choice is decoupled from the scale of quantum coherence.

Edit 1-10-18:

As I’m trying desperately to recover from stress in the process of thesis writing, I thought I would add a small set of thoughts on this subject in an effort to defocus and defrag a little. My wife and I have continued to listen to this book and I think I have another fairly major objection to Tegmark’s views.

Tegmark lives in a version of quantum mechanics that fetishizes the notion of wave function collapse where he views himself as going against the grain by offering an alternative where collapse does not have to happen.

For a bit of context, “collapse” is a side effect of the Copenhagen convention of quantum mechanics. In this way of looking at the subject, the wave function will remain in superposition until something is done to determine what state the wave function is in… at this point, the wave function will cease to be coherent and will drop into some allowed eigenstate, after which it will remain in that eigenstate. This is a big, dominant part of quantum mechanics, but I would suggest that it misses some of the subtlety of what actually happens in quantum mechanics by trying to interpret, perhaps wrongly, what the wave function is.

Fact of the matter is that you can never observe a wave function. When you actually look at what you have, you only ever find eigenstates. But, there is an added subtlety to this. If you make an observation, you find an object somewhere, doing something. That you found the object is indisputable and you can be pretty certain what you know about it at the time slice of the observation. Unfortunately, you only know exactly what you found; from this –directly– you actually have no idea either what the wave function was or even really what the eigenstates are. Location is clearly an eigenstate of the position operator, as quantum mechanics operates, but from finding a particle “here” you really don’t actually know what the spectrum of locations it was potentially capable of occupying actually was. In order to learn this, the experiment which is performed is to set up the situation in a second instance, put time in motion and see that you find the new particle ending up “there,” then to tabulate the results together. This is repeated a number of times until you get “here,” “there” and “everywhere.” Binning each trial together, you start to learn a distribution of how the possibilities could have played out. From this distribution, you can suddenly write a wave function, which tells the probability of making some observation across the continuum of the space you’re looking at… the wave function says that you have “this chance of finding the object ‘here’ or ‘there’.”

The wave function, however you try to pack it, is fundamentally dependent on the numerical weight of a statistically significant number of observations. From one observation, you can never know anything about the wave function.
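
The tabulation procedure described above can be sketched in a few lines of Python. The underlying density here is an assumption chosen purely for illustration: the ground state of a particle in a one-dimensional box of unit length, where the chance of finding the particle near x is 2·sin²(πx). The function names and the choice of ten bins are mine. The sketch draws repeated “observations,” bins them, and compares the binned frequencies to the exact bin probabilities.

```python
import math
import random


def sample_position(rng: random.Random) -> float:
    """Draw one 'observation' from the assumed density 2*sin^2(pi*x) on [0, 1).

    Uses rejection sampling: propose x uniformly, accept it with
    probability proportional to the density (whose maximum is 2).
    """
    while True:
        x = rng.random()
        if rng.random() * 2.0 < 2.0 * math.sin(math.pi * x) ** 2:
            return x


def tabulate(n_trials: int, n_bins: int = 10, seed: int = 0) -> list:
    """Repeat the experiment n_trials times and return the fraction of hits per bin."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    for _ in range(n_trials):
        index = min(int(sample_position(rng) * n_bins), n_bins - 1)
        counts[index] += 1
    return [c / n_trials for c in counts]


def exact_bin_probability(a: float, b: float) -> float:
    """Integral of 2*sin^2(pi*x) from a to b, i.e. x - sin(2*pi*x)/(2*pi) evaluated at the ends."""
    return (b - a) - (math.sin(2 * math.pi * b) - math.sin(2 * math.pi * a)) / (2 * math.pi)


if __name__ == "__main__":
    for n in (1, 100, 20_000):
        freqs = tabulate(n)
        print(f"{n:>6} trials:", " ".join(f"{f:.2f}" for f in freqs))
    exact = [exact_bin_probability(i / 10, (i + 1) / 10) for i in range(10)]
    print(" exact      :", " ".join(f"{e:.2f}" for e in exact))
```

With one trial the table has a single occupied bin and says nothing about the shape of the distribution; only after many trials do the binned frequencies start to resemble the density the “wave function” encodes.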

The same thing holds true for coherence. If you make one observation, you find what you found that one time; you know nothing about the spectrum of possibilities. For that one hit, the particle could have been in coherence, or it could have been collapsed to an eigenstate. You don’t know. You have to build up a battery of observations, which gives you the ability to say “there’s a xx% chance *this* observation and *that* observation were correlated, meaning that coherence was maintained to yy degree.”

This comes back to Feynman’s old double slit experiment anecdote. For one BB passing through the system and striking the screen, you only know that it did, and not anything about how it did. The wave function written for the circumstances of the double slit provides a forecast of what the possible outcomes of the experiment could be. If you start measuring which slit a BB went through, the system becomes fundamentally different based upon how the observation is made and different things are knowable, giving the chance that the wave function will forecast different statistical outcomes. But, you cannot know this unless you make many observations in order to see the difference. If you measure the location of 1 BB at the slit and the location of 1 BB at the screen, that’s all you know.

In this way, the wave function is a bulk phenomenon, a beast of statistical weight. It can tell you observations that you *might* find… if you know the set up of the system. An interference pattern at the screen tells that the history was muddy and that there are multiple possible histories that could explain an observation at the screen. This doesn’t mean that a BB went through both slits, merely that you don’t know what history brought it to the place where it is. “Collapse” can only be known after two situations have been so thoroughly examined that the chances for the different outcomes are well understood. In a way, it is as if the phenomenon of collapse is written into the outcome of the system by the set-up of the experiment and that the types of observations that are possible are ordained before the experiment is carried out. In that way, the wave function really is basically just a forecast of possible outcomes based on what is known about a system… sampling for the BB at the slit or not, different information is present about the system, creating different possible outcomes, requiring the wave function to make a different forecast that includes that something different is known about the system. The wave function is something that never actually exists at all except to tell you the envelope of what you can know at any given time, based upon how the system is *different* from one instance to the next.

This view directly contradicts the notions in Tegmark’s book that individual quantum mechanical observations at “collapse” allow for two universes to be created based upon whether the wave function went one way or another. On a statistical weight of one, it cannot be known whether the observed outcome was from a collection of different possibilities or not. The possible histories or futures are unknown on a data point of one; that one is what it is and it can’t be known that there may have been other choices without a large conspiracy to know what other choices could have happened, and what that gives you is the ability to say “there’s a sixty percent chance this observation matches this eigenstate and a forty percent chance it’s that one.” That is fundamentally not the same as the decisiveness which would be required for a collapse of one data point to claim “we’re definitely in the universe where it went through the right slit.”

I guess I would say this: Tegmark’s level 3 multiverse is strongly contradicted by the Uncertainty Principle. Quantum mechanics is structurally based on indecisiveness, while Tegmark’s multiverse is based on a clockwork decisiveness. Tegmark is saying that the history of every particle is always known.

This is part of the issue with quantum computers: the quantum computer must run its processing experiment repeatedly in order to establish knowledge about coherence in the system. On a sampling of one, the wave function simply does not exist.

Tegmark does this a lot. He routinely puts the cart ahead of the horse, saying that math implies the universe rather than that math describes the universe (Tegmark: Math therefore Universe. Me: Universe, therefore Math). The universe is not math; math is simply so flexible that you can pick out descriptions that accurately tell what’s going on in the universe (until they don’t). For all his cherry picking the “mathematical regularity of the universe,” Tegmark quite completely turns a blind eye to where math fails to work: most problems in quantum mechanics are not exactly solvable and most quantum advancement is based strongly on perturbation… that is, approximations and infinite expansions that are cranked through computers to churn out compact numbers that are *close* to what we see. In this, the math that ‘works’ is so overloaded with bells and whistles to make it approach the actual observational curve that one can only ever say that the math is adopting the form of the universe, not that the universe arises from the math.

Edit 1-17-18:

Still listening to this book. We listened through a section where Tegmark admits that he’s putting the cart ahead of the horse by putting math ahead of reality. He simply refers to it as a “stronger assertion” which I think is code for “where I know everyone will disagree with me.”

Tegmark slipped gently out of reality again when he started into a weird observer-observation duality argument about how time “flows” for a self-aware being. You know he’s lost it when his description fails to even once use the word “entropy.” Tegmark is under the impression that the quantum mechanical choice of every distinct ion in your brain is somehow significant to the functioning of thought. This shows an unbelievable lack of understanding of biology, where mass structures and mass action form behavior. Fact of the matter is that biological thought (the awareness of a thinking being) is not predictable from the quantum mechanical behavior of its discrete underpinning parts. In reality, quantum mechanics supplies the bulk steady state from which a mass effect like biological self-awareness is formed. Because of the difference in scale between the biological level and the quantum mechanical level, biology depends only on the prevailing quantum mechanical average… fluctuations away from that average, the weirdness of quantum, are almost entirely swamped out by simple statistical weight. A series of quantum mechanical arguments designed to connect the macroscale of thought to the quantum scale is fundamentally broken without taking this into account.

Consider this: the engine of your gas fueled car is dependent on a quantum mechanical behavior. Molecules of gasoline are mixed with molecules of oxygen in the cylinder head and are triggered by a pulse of heat to undergo a chemical reaction where the atoms of the gas and oxygen reconfigure the quantum mechanical states of their electrons in order to organize into molecules of CO2 and CO. After the reorganization, the collected atoms in these new molecules of CO2 and CO are at a different average state of quantum mechanical excitation than they were prior to the reconfiguration –you could say that they end up further from their quantum mechanical zero point for their final structure as compared to prior to the reorganization. In ‘human baggage’ we call this differential “heat” or “release of heat.” The quantum mechanics describe everything about how the reorganization would proceed, right down to the direction a CO2 molecule wants to speed off after it has been formed. What the quantum mechanics does not directly tell you is that 10^23 of these reactions happen and for all the different directions that CO2 molecules are moving after they are formed, the average distribution of their expansion is all that is needed to drive the cylinder head… that this molecule speeds right or that one speeds left are immaterial: if it didn’t, another would, and if that one didn’t still another would and so on and so forth until you achieve a bulk behavior of expansion in CO2 atmosphere that can push the piston. The statistics are important here. That the gasoline is 87 octane versus 91 octane, two quantum mechanically different approaches to the same thing, does not change that both drive the piston… you could use ethanol or kerosine or RP-1 to perform the same action and the specifics of the quantum mechanics result in an almost indistinguishable state where an expanding gas pushes back the piston head to produce torque on the crankshaft to drive the wheels around. 
The quantum mechanics are watered out to a simple average where one firing of the piston is indistinguishable from the next. But, to be sure, every firing of the piston is not quantum mechanically exactly the same as the one before it. In reality, that piston moves despite these differences. There is literally an unthinkably huge ensemble of quantum mechanical states that result in the cylinder head moving and you cannot distinguish any of them from any other. There is literally no choice but to group them all together by what they hold in common and to treat them as if they are the same thing, even though at the basement layer of reality, they aren’t. Without what Tegmark refers to as “human baggage” there would be no way to connect the quantum level to the one we can actually observe in this case. Whether or not this particular molecule of fuel failed to react based on fluctuations of the quantum mechanics is pretty much immaterial.
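
A toy version of this ensemble argument, as a Python sketch: each “molecule” contributes a random push along the piston axis, and two firings with completely different microscopic details (different random seeds) are compared by their average push. The uniform distribution, the arbitrary units, the molecule count and the function name are all assumptions made for illustration; nothing here models real combustion.

```python
import random


def mean_push(n_molecules: int, seed: int) -> float:
    """Average push on the piston from n_molecules random contributions.

    Each molecule contributes a value drawn uniformly from [0, 1) in
    arbitrary units; the seed stands in for the microscopic details
    that differ from one firing to the next.
    """
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n_molecules)) / n_molecules


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        first = mean_push(n, seed=1)
        second = mean_push(n, seed=2)
        print(f"{n:>7} molecules: firing A = {first:.4f}, firing B = {second:.4f}, "
              f"difference = {abs(first - second):.4f}")
```

No two firings share a single microscopic detail, yet as the number of molecules grows the averages become practically indistinguishable, which is the sense in which the microstates get grouped together by what they hold in common.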

The brain is not different. If you were to consider “thought” to be a quantum mechanical action, the specific differences between one thought and the next are themselves huge ensembles of different quantum mechanical configurations… even the same thought twice is not the same quantum mechanical configuration twice. The “units” of thought are in this way decoupled from the fundamental level since two versions of the “same thing” are actually so statistically removed from their quantum mechanical foundation as to be completely unpredictable from it.

This is a big part of the problem with Tegmark’s approach; he basically says “Quantum underlies everything, therefore everything should be predictable from quantum.” This is a fool’s errand. The machineries of thought in a biological person are simply at a scale where the quantum mechanics has salted out into Tegmark’s “human baggage”… named conceptual entities, like neuroanatomy, free energy and entropy, that are not mathematically irreducible. He gets to ignore the actual mechanisms of “thought” and “self-awareness” in order to focus on things he’s more interested in, like what he calls the foundation structure of the universe. Unfortunately, he’s trying to attach two levels of reality that are not naturally associated… thought and awareness are by no means associated with fundamental reality –time passage as experienced by a human being, for instance, has much more in common with entropy and statistical mechanics than it does with anything else, and Tegmark *totally* ignored it in favor of a rather ridiculous version of the observer paradox.

One thing that continues to bother me about this book is something that Tegmark says late in it. The man is clearly very skilled and very capable at what he does, but he dedicates the last part of his book to all the things he will not publish on for fear of destroying his career. He feels the ideas deserve to be out (and as an arrogant theorist, he feels that even the dross in his theories is gold), but by publishing a book about them, he gets to circumvent peer review and scientific discussion and bring these ideas straight to an audience that may not be able to sort which parts of what he says are crap from those few trinkets which are good. I don’t mean that he should be muzzled, he has the freedom of speech, but if his objective is to favor dissemination of scientific education, he should be a model of what he professes. If Tegmark truly believes these ideas are useful, he should damned well be publishing them directly into the scientific literature so that they can be subjected to real peer review. Like all people, this one should face his hubris, the first instance of which is his incredible weakness at stat mech and biology.

Further in the background, I think it’s clear he was just after a publicity stunt; his do-it-yourself rocket cost a great deal of money, and his conversion to flat eartherism obviously helped to pay the bill. It really did make me wonder what exactly flat earthers think “research” is given that they were apparently willing to pony up a ton of money for this rocket, which won’t go high enough to resolve anything an airline ticket won’t resolve better.

My general feelings about flat earth nonsense are well recorded here and here.

A part of why I decided to write anything about this is that the guy wants to run for Congress in California. This should be concerning to everyone: someone who is trusted to make decisions for a whole community had better be doing so based on a sound understanding of reality. Higher positions currently filled in the Federal government notwithstanding, a disconnect seems to be forming in our self-governance which is allowing people to unhinge their decision-making processes from what is actually known about the world. I think that’s profoundly dangerous.

In my opinion also, this is not to heap blame on those who actually hold office now, but on everybody who elected to put them there. Our government is both by the people and for the people: anybody in power is at some level representative of the electorate, possessing all the same potentially fatal flaws. If you want to bitch about the government, the place to start is society itself.

Now, Flat Eartherism is one of those pastimes that is truly incredibly past its time. There are two reasons it subsists; the first is people trolling other people for kicks online, while the second is that some people are so distrusting and conspiracy-minded that they’re willing to believe just about anything if it feeds into their biases. There are some people who truly believe it. A part of why people have the ability to believe the conspiracy theories is that what they consider visual evidence of the Earth’s roundness comes through sources that they define as questionable because of their connection to ostensibly corrupt power –NASA, for all its earnest effort to keep space science accessible to the common man, has not been perfect. Further, not just anybody can go to a place where the roundness of the Earth is unambiguously visible given exactly how hard it is to get to very high altitudes over Earth in the first place. For all of SpaceX’s success, space flight still isn’t a commodity that everyone can sample. Travel into space is held under lock and key by the few and powerful.

Knowing and having worked a bit around scientists associated with space flight projects, I understand the mindset of the scientists, and it offends me very deeply to see their trustworthiness questioned when I know that many of them value honesty very highly. Part of why the conspiracy garbage circulates at all is because our society is so big that “these people” never meet “those people” and the two sides have little chance of bumping into one another. It’s easy to malign people who are faceless and it’s really easy to accuse someone of lying if they aren’t present to defend themselves. That doesn’t mean that either is due. This comes back to my old argument about the constitutionally defended right to spout lies in the form of “Freedom of Speech” being a very dangerous social norm.

Now, that said, another of the primary reasons I decided to write this post is because I saw a Youtube video of Eddie Bravo facing down two scientists and more or less humiliating them over their inability to defend “round eartherism.”

You may or may not know of him, but Eddie Bravo is a modern hero to the teenage boy; he’s another of these podcaster/micro-celebrity types who is widely accessible with a few keystrokes in an environment with basically zero editorial content control. He’s a visible face of the UFC (Ultimate Fighting Championship) movement along with Joe Rogan. He’s attained wide acclaim for being a “Gracie Killer,” which is a big thing if you know anything about UFC… the Gracies being the renowned Brazilian Jiu-Jitsu family who dominated the grappling world early in the UFC and brought the art of Jiu-Jitsu in its Brazilian form to the whole world. From this little history, you can easily guess why Bravo is a teenage boy hero: he’s a brash, cocky badass. He’s a world class Jiu-Jitsu fighter, hands down. Unfortunately, as with many celebrities, his Jiu-Jitsu street cred affords him the opportunity to open his mouth about whatever he feels like. Turns out he’s a bit of a crank magnet too, including being a flat earther.

To begin with, I don’t believe Mr. Bravo –or any other crank, for that matter– is stupid. I’ve long since seen that great intelligence can exist in people who for one reason or another don’t know better or choose not to “believe” in something for whatever reason. If he weren’t talented at some level, he wouldn’t be a hard enough worker to develop the acclaim he has attained. But, he conflates being able to shout over whoever he feels like with being able to beat them, which absolutely isn’t true in an intellectual debate.

In the Youtube clip I saw, Mr. Bravo confronts two scientists in a room full of people friendly to him. The first scientist is brought to the forefront where he introduces himself as an “Earth Scientist”… much to the rolling eyes and derision of the audience. Eddie Bravo then demands that he give the one bit of evidence which proves that the “Earth is round.” Put on the spot, this poor fellow then makes the mistake of trying to tell Mr. Bravo that science is a group of people who specialize in many different disciplines, across many different lines of research, and fails to provide Mr. Bravo with a direct answer to his question. It’s true that science is distributed, but by not answering the question, he gives the appearance of not having the answer and Eddie Bravo was completely aware that he’d said nothing to the point! When the second scientist comes forward, Eddie Bravo demands (a poorly worded demand at that, in my opinion) that since most people hold the disappearance of a ship’s mast over the horizon as the “proof” that the world is round, “why was it that people are able to take pictures of ships after they’re supposedly over the horizon?” This second scientist really did step up, I think: he tried to explain that light doesn’t necessarily travel in straight lines (which is true) and that the atmosphere can work like a fiber optic to bring images around the curve of the earth. Mr. Bravo derided this explanation, basically saying “Oh, please, that’s garbage, everybody knows you can’t see around corners.” And, at a superficial level, this will be regarded as a *true* response, despite the fact that the numbers always fall out the bottom of the strainer in a rhetorical confrontation. The second scientist ended up sounding like he was talking over everybody’s head with his too intricate explanation, and Eddie Bravo was able to use that to make him out as “other,” winning the popular argument at that point. 
Combine these incidents with a lot of shouting over the other guy, and Eddie Bravo came off well… the video is listed as a “debate,” never mind that it was anything but.

If you are a science educator, I would recommend watching that video. Scientist #1 comes off as stupid and scientist #2 comes off as pompous.

You’ll love me for saying this, but that was all preface to the purpose of this blog post. Most modern flat earthers are Youtube trolls; they castrate their opposition by relying on the fact that evidence of the Earth’s roundness is provided by a source that is intrinsically tainted and questionable. And, the truth is that many people who believe the Earth is round really only understand this fact based on a line of evidence that people like Eddie Bravo *will not accept*. How do you straighten out a guy who *will not accept* the satellite images?

Well, how is it that we know the earth is round? We *knew* it before there were satellites, computer graphics and photoshop. With globalism and information society, these knowable, observable things are amplified. Flat earthers prove they are incompetent researchers every time they open their mouths and say “Well, have you researched it? I did and the earth is flat!”

Now, suppose I were a flat earth researcher: how would I go about the science of establishing the shape of the earth using a series of modern, readily available, cheap tools?

**Hypothesis: The Earth is flat! It’s the stable, unmoving center of the universe and the sun and sky move over it.**

One thing that we can immediately see about this model is a simple thing. When the sun is in the sky, every point on the plane can see it at the same time since there is nothing to obstruct the line of sight anywhere. In the 1800s, nobody could really travel fast enough to be able to tell whether or not this was the case: for every person in that time, it was enough to suppose that everybody on Earth wakes up from the night at the same time and goes about their day. For this flat earth model, seen from the side, the phenomenon of sunrise (a phenomenon as old as the beginning of the Earth, by the way) would look like this:

We have all seen this: the sun starts below the edge of the Eastern Horizon and pops up above it. For a majority of people on Earth, this is what the sun seems to do in the morning.

There are a number of simple tests of this model, but the simplest question to ask is this: Does everybody on Earth see the sun appear at the same time? Everybody is standing on that flat plane: when the sun comes up from below the horizon, does everybody *on Earth* see it at once?

Notice, this is a requirement: if the Earth is flat, people all across the plane of the Earth will be able to see something big coming over the edge of that plane almost simultaneously, depending on nearby impediments, like mountains for instance.

So, here’s the experiment! If you live in California, grab your smart phone, buy an airplane ticket and fly to New York. The government has no control at all over where you fly in the continental US of A and they really won’t care if you take this trip. New York, New York is actually a kind of fun place to visit, so I recommend going and maybe catching a Broadway show while you’re there. When you get to New York, find someplace along the waterline where you can look east over the ocean and go there in the morning before sunrise. After the sun rises, wait 30 minutes and then place a phone call back to one of your buddies in California and ask him if the sun is up.

This experiment can be repeated with any two east-west related locations on Earth, though the time delays will depend on the separation so that maybe a half hour is long enough for the sun to rise in both places. Any real flat earth “researcher” should be running this experiment.

For the set-up written above, the sun comes up in New York *four hours* before it actually comes up in California! A California view of the sun is blocked below the horizon of the Earth for four hours after it has become visible in New York.

Now, you might argue, New York is on the east side of the US and is much closer to where the sun comes up on our hypothetical plane, so maybe the Rocky Mountains are obstructing some view of the sun in LA.

And that this blocking effect lasts 4 hours.

So, here’s the new experiment. Drive your car from LA to NY and watch the odometer; you can even get a mechanic you trust to assure you that the government hasn’t fiddled with it. You now know the approximate distance from LA to NY by the odometer read-out. Next, you buy a barometer and use the pressure change of the air to measure how high the Rocky Mountains are… or, you could just use a surveying scope to measure the angular height of the mountains and your car to check distances, then work a bit of trig to estimate the height of the mountains.

The Rockies are well understood to be just a bit taller than 14,000 ft.

With these distances available, you do the following experiment with surveying scopes. When the sun appears above the horizon in LA, your friend measures the angle above ground level where it is visible (surveying scopes have bubble levels for leveling the scope). You measure the angle above the horizon at the same time using a survey scope of your own in New York. Remember, you’ve got smartphones, you can talk to each other and coordinate these measurements.

For the flat earth, the position of the sun in the sky should obey the following simple triangular model:

This technique is as old as the hills and is called “triangulation.” Notice, I’ve used three measurements made with cheap modern equipment: angle at LA, angle at NY and the distance from LA to NY (approximate from the odometer). What I have in hand from this is the ability to determine the approximate altitude of the sun using a bit of high school level trig. Use law of sines and it’s easy to forecast the altitude of the sun from these measurements:

I won’t do the derivation this once, but you just plug in the distance and the angles, then voila, the height of the sun over the flat earth. (I’m not being snide here: Flat Earthers don’t even seem to try to use trig.)
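For anybody who wants the trig spelled out anyway, here is a minimal Python sketch of that triangulation. It assumes a perfectly flat plane, with both observers on one straight line and both looking toward the sun in the same direction. The function name and the sanity-check numbers are mine, made up for illustration; they are not measurements.

```python
import math


def sun_height_same_side(baseline, near_angle_deg, far_angle_deg):
    """Height of a target over a flat plane from two elevation angles.

    baseline       -- ground distance between the two observers
    near_angle_deg -- elevation seen by the observer closer to the target
    far_angle_deg  -- elevation seen by the observer farther from the target
    The result is in the same units as the baseline.
    """
    near = math.radians(near_angle_deg)
    far = math.radians(far_angle_deg)
    if near <= far:
        raise ValueError("the nearer observer must see the larger angle")
    # Triangle: far observer, near observer, target.
    # The angle at the target is (near - far) and the baseline is opposite it.
    # Law of sines gives the slant range from the far observer to the target.
    slant_from_far = baseline * math.sin(near) / math.sin(near - far)
    # Drop a perpendicular from the target to the plane.
    return slant_from_far * math.sin(far)


# Sanity check with a made-up target 10 units up:
# 10 units of ground distance from the near observer, 20 from the far one.
near_angle = math.degrees(math.atan2(10, 10))
far_angle = math.degrees(math.atan2(10, 20))
print(sun_height_same_side(10, near_angle, far_angle))  # very close to 10
```

Feed it the odometer distance and the two scope readings and you get a height in miles, for whatever that number turns out to be worth on a flat plane.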

What we know so far is that the sun comes up four hours earlier in New York than LA and that we would expect that the sun should be visible everywhere on the flat earth at the same time as it comes over the horizon. Maybe the Rockies are blocking LA from seeing the sun for four hours. This would give rise to the following situation:

You end up with similar triangles formed by the triangle of LA to the Rocky Mountains and the triangle of LA to the sun. Knowing the height of the mountains and the distance from LA to the mountains, you get the angle that the sun must be at when it appears in LA. This gives us a relation where the angle from LA to the top of the mountains must be the same as the angle from LA to the sun when it appears. We would expect the angle to be very small since the Rockies are really not that high, so finding it nearly zero to within the noise of the instrument would be expected.

Now, LA to New York is about 2,800 miles and the distance from LA to Denver is 1,020 miles. The mountains are 14,000 feet tall. In four hours of morning, from New York, the sun will appear to be at an angle of ~60 degrees over the horizon (neglecting latitude effects… leave that for later). If you start plugging these figures into equations, the altitude of the sun must be 7.3 miles up in the sky, or 38,500 ft.
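Here is a rough sketch of the plugging-in, taking the figures quoted above at face value (2,800 miles, 1,020 miles, 14,000 ft and roughly 60 degrees). The function and variable names are my own, and the geometry assumes a perfectly flat plane with the sun sitting somewhere east of New York.

```python
import math

FEET_PER_MILE = 5280.0


def flat_sun_position(la_to_ny_mi, la_to_peaks_mi, peak_height_ft, ny_angle_deg):
    """Return (height, distance east of New York) of the sun, both in miles.

    Assumes the LA sight line just grazes the mountain tops at sunrise
    while New York sees the sun at ny_angle_deg above the eastern horizon.
    """
    # Slope (tangent of the elevation angle) of the grazing line from LA.
    grazing_slope = (peak_height_ft / FEET_PER_MILE) / la_to_peaks_mi
    ny_slope = math.tan(math.radians(ny_angle_deg))
    # With the sun x miles east of New York at height h:
    #   h = x * ny_slope                        (seen from New York)
    #   h = (la_to_ny_mi + x) * grazing_slope   (seen from LA)
    x = la_to_ny_mi * grazing_slope / (ny_slope - grazing_slope)
    return x * ny_slope, x


height_mi, east_of_ny_mi = flat_sun_position(2800.0, 1020.0, 14000.0, 60.0)
print(f"sun height: {height_mi:.1f} miles ({height_mi * FEET_PER_MILE:,.0f} ft)")
print(f"ground distance east of New York: {east_of_ny_mi:.1f} miles")
```

Because the grazing line from LA is nearly horizontal, the answer barely moves if you swap the 60 degrees for some other decent-sized angle; the height is pinned almost entirely by the mountains and the LA to New York distance.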

Huh.

You can fly at 40,000 ft in an airliner. Easy hypothesis to test. If the sun is only 7.3 miles up and visible at 60 degrees inclination in New York, you could go fly around it with an airplane.

Has anybody ever done that?

A good scientist would keep looking at the sun through the whole day and might notice that the angular difference of the sun’s inclination observed in the spotting scopes at New York and in LA does not change. Both inclinations increase at the same rate. There is always something like 60 degree difference in inclination in the sky from where the sun rose between these two places (again, neglecting latitude effects; this argument will appear a tiny bit janky since New York and Los Angeles are not at the same latitude, but the effect should be very close to what I described).

For this flat earth model to be true, the sun would need to radically and unphysically change altitude from one part of the day to the next in order for the reported angles to be real. We know with pretty good accuracy that the sun does not just pop out of the Atlantic ocean several dozen miles off the coast every morning when it rises over the United States, whatever the flat earthers want to tell you. And, this is pretty much observable without any NASA satellites. Grab yourself a boat and go see! The other possibility is that the sun is much further away than 7 miles and that the physical obstruction between LA and New York is much larger than just the height of the Rocky Mountains over sea level –and also maybe that the angles on the levels of the spotting scopes somehow don’t agree with each other.

For this alone, the vanilla flat earth model must be discarded. You cannot validate any of the predictions in the model above: LA and New York do not see the sunrise at the same time and the sun clearly is not only 7 miles high in New York. To give them some credit, most modern flat earthers, including Eddie Bravo, do not subscribe directly to this model.

As an aside, I would mention that every flat earth model struggles with the observable phenomenon of time zones and jet lag. If any flat earther ever asks you what convinced you of a round Earth, just say “Time Zones” in order to forestall him or her and to not look like you’re avoiding the question. Generally speaking, time zones exist because the curve of the Earth (something that flat earthers claim shouldn’t exist) obstructs the sun from lighting every point on the surface of the Earth at the same time.

So then, now that we’ve made basically two tests of a flat earther hypothesis and seen that it fails rather dramatically in the face of simple modern do-it-yourself measurements, what model do these people actually believe in?

Most modern flat earthers believe in some version of the model above (one of the major purveyors of this is Eric Dubay. I won’t link his site because I won’t give him traffic.) In this model, you can think about the Earth as a big disc centered on an axle that passes through the north pole. The sun, the moon and the night sky spin around this axle over the Earth (or maybe the Earth spins like a record beneath the sky). The southern tips of South America, Africa and Australia are placed at extreme distances from one another and Antarctica is expanded into an ice wall that surrounds the whole disc. The model here is actually not a new one and originated some time in the 1800s.

For the image depicted here, I would point out once again that if the sun is an emissive sphere, projecting light in all directions, the model above gives a clear line of sight for every location on Earth to see the sun at all times. For this reason, the flat earthers usually insist that the sun is more like a flashlight or a street lamp which projects light in a preferred direction so that light from it can’t be seen at locations other than where the light is being projected (never mind that this prospect immediately begins to suffer for trying to generate the appropriate phases of the moon).

To generate this model, the flat earthers have actually cherry-picked a few rather interesting observations about the sky. You can find a Youtube video where Eddie Bravo tries to articulate these observations to Joe Rogan. Central among them is that the North Star, Polaris, seems to not move in the night sky and that all the stars and even the sun seem to pivot around this point. In particular, during the season of white nights above the arctic circle, the sun seems to travel around the horizon without really setting (never mind that during the winter months, the sun disappears below the horizon for weeks on end… again with that pesky horizon thing; on the flat earth, the sun is not allowed to drop below the horizon and still be visible elsewhere on the same longitude since that intrinsically implies that the Earth’s surface must curve to accomplish said feat).

Taken from Scijinks.gov, this image demonstrates the real observation of what the sun does during the season of white nights as viewed at the arctic circle. The flat earth model amplifies this into the depiction given above.

If this is our hypothetical model, we could say that the sun is suspended over the flat Earth so that it sits on a ring at the radius of the equator in its revolution around the pole.

This image shows you right away the first thing to test. As seen at a distance of 3/4 of the disc’s diameter away, the sun cannot ever be seen in the sky at a lower angle of inclination than is allowed by its altitude over the surface. In other words, it can never go down below the horizon or come up over it.

Here, theta is the minimum angle of inclination that the sun will visit in the sky. I’ve heard flat earthers quote ~3,000 miles for the height of the sun and the absolute length of the longitude would be (3/4)*24,000 miles = 18,000 miles, which gives a minimum inclination angle of about 9 degrees over the horizon. And, that’s seen from the maximum possible distance across the width of the disc, where the flat earthers claim the sunlight can’t be seen. As a result, the sun will always have to *appear* in the sky at some inclination greater than 9 degrees –just suddenly start making light– at the time when the sun supposedly rises.
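The arithmetic behind that 9 degrees is one arctangent. This is a small sketch using the numbers quoted above (3,000 miles up, 18,000 miles away); the function names are mine, and the second function is just the same relation turned around.

```python
import math


def min_sun_elevation_deg(sun_height_mi, max_ground_distance_mi):
    """Lowest elevation angle the sun can reach over a flat disc."""
    return math.degrees(math.atan2(sun_height_mi, max_ground_distance_mi))


def height_for_elevation_mi(elevation_deg, ground_distance_mi):
    """Sun height needed to appear at a given elevation from a given distance."""
    return ground_distance_mi * math.tan(math.radians(elevation_deg))


theta = min_sun_elevation_deg(3000.0, 0.75 * 24000.0)
print(f"minimum inclination: {theta:.1f} degrees")

# How low would the sun have to hang to get within half a degree of the
# horizon from 18,000 miles away?
print(f"{height_for_elevation_mi(0.5, 18000.0):.0f} miles")
```

The second number is the uncomfortable one for the model: to even get close to the horizon from across the disc, the sun has to come down to a small fraction of the 3,000 miles usually quoted.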

The truth of that is directly observable: do you ever see the sun just appear in the sky when day breaks? I certainly haven’t.

This failure to ever reach the horizon mixed with the requirement for time zones is enough to kill the flat earth model above: it can’t produce the observations available from the world around us that can be obtained with just the tiniest bit of leg work! The model can’t handle sunrises (period). There’s a reason that the round earth was postulated roughly 2,500 years ago; it’s based on a series of clever but damn easy measurements. And I reiterate, those measurements are easier to make with modern technology.

It is inevitable that this logic won’t satisfy someone. The altitude number for the sun, 3,000 miles, was cribbed from flat earth chatter. Suppose that this number is actually different and that they don’t actually know what it is (surprise, surprise, I don’t think I’ve ever seen evidence of any one of them doing something other than making YouTube videos or staring through big cameras trying to see ships disappear over the horizon and not understanding why they don’t. Time to get to work, guys, you need to measure the altitude of the sun over the flat earth or you’ll all just keep looking like a bunch of dumbasses staring at tea leaves!)

Now, then, in some attempt to justify this model, a measurement needs to be made of the altitude of the sun (again). You can do it basically in the same way you did it before; you mark out a base length along the surface of the Earth and station two guys with surveying scopes at either end: you count “1,2,3” over the smartphone and then both of you report the angle you measure for the inclination of the sun. In this case, I recommend that one guy be stationed south of the equator and the other guy stationed north, both off the equator by the same distance along a longitude line. The measurement should be made on either the Vernal or Autumnal equinox and it should be made at noon during the day when the sun is at its highest point in the sky. This should make calculations easier by producing an isosceles triangle. How do you know you’re on the same longitude line? The sun should rise at the same time for both of you on the equinox. And, I specify equinox because I would rather not get into effects caused by the Earth’s axial tilt, like the significance of the tropics of Cancer and Capricorn (you want to know about those, go learn about them yourself).

From this measurement how do you get the height of the sun? You use the following piece of very easy trig:

And, note, this trig will not work unless both angles measured above are the same… but you can orchestrate this with a couple spotters, an accurate clock and a couple surveying scopes.
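As a minimal sketch, the isosceles version is a single tangent: half the base length times the tangent of the shared elevation angle. The function name and the half-degree tolerance below are my own assumptions, chosen only to enforce the "both angles must match" condition.

```python
import math


def sun_height_isosceles(base_length, north_angle_deg, south_angle_deg,
                         tolerance_deg=0.5):
    """Flat-plane height of the sun from two equal elevation angles.

    base_length is the ground distance between the two spotters; the sun is
    assumed to sit over the midpoint, so the triangle is isosceles.
    """
    if abs(north_angle_deg - south_angle_deg) > tolerance_deg:
        raise ValueError("angles differ too much for the isosceles shortcut")
    # Average the two readings to soften instrument noise.
    elevation = math.radians(0.5 * (north_angle_deg + south_angle_deg))
    return 0.5 * base_length * math.tan(elevation)


# Made-up check: spotters 2 units apart, both reading 45 degrees.
print(sun_height_isosceles(2.0, 45.0, 45.0))  # very close to 1
```

If the two readings disagree by more than the tolerance, the shortcut refuses to answer; fix the spotter positions or the clock first.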

If you do this very close to the equator, where d is small, you will find that the sun appears to sit a few thousand miles up, close to 4,000 miles, which is roughly the radius of the real Earth. You may not be able to pin it down well because of the sizeable angular width of the sun and the tiny angles involved. This by itself will push the minimum allowed angular height of the sun up, not down, because it’s larger than what was taken for the calculation above. To handle the horizon problem where the sun can only appear to be higher than about 9 degrees in the sky and never cross the horizon, the height of the sun must be lower than 3,000 miles, not higher. Humans were unable to do this calculation in prehistory and used a different set of triangles to try to estimate the height of the sun.

If you are a good scientist, you will repeat this measurement a number of times with different base distances between the spotters. If the Earth is flat, every base length you choose between the spotters should produce the *same* height for the sun (this is an example of the scientific concept of Replication).

Here’s what you will actually find:

At a latitude close to the equator, during the first measurement, the sun will appear to be very far away at a really high altitude. With the second measurement, at mid latitudes on either side of the equator, the sun will appear to be at a significantly lower altitude. During the final measurement, at distant latitudes, as far north and south as you can get, the sun will appear to actually sit down on the face of the Earth. If you coordinate this experiment with six people on group chat all at once, this is what they will all see *simultaneously*. Could I coordinate the measurement locations so that the sun appears to be 3,000 miles high? Sure, but who in the hell would ever take that as honest? Flat earthers blame scientists for being dishonest… what if the *flat earthers* are the ones being dishonest? Does it not count for them somehow?

Since the sun suddenly appears to be speeding toward the Earth, does this mean that it’s about to crash down onto the experimenters you have stationed at the equator? No. It just means that your model is completely wrong because it hasn’t produced a self-consistent measurement. A mature scientist would consider the flat earth a dead hypothesis at this point.

Why does the round earth manage to succeed at explaining this series of observations? For one thing, the round earth doesn’t assume that the spotting scopes are stationed at the *same* angular level.

The leveling bubble on the spotting scope can only assume the local level. And, the angle that you end up measuring is the one between the local horizon and the sight line. On the equinox (very important) the sun will only appear to be directly overhead at noon on the equator.
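To see what the round earth predicts for the spotter experiment, here is a sketch that fakes the measurements on a globe and then pushes them through the flat-plane isosceles trig. The assumptions are mine: a spherical Earth with a mean radius of about 3,959 miles, a sun far enough away that its rays arrive parallel, and equinox noon so the elevation at latitude φ is simply 90° minus φ.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # assumed mean radius, not a figure from the post


def apparent_flat_height(latitude_deg):
    """Sun height a flat-earth calculation would report.

    Two spotters stand at +latitude and -latitude on one longitude line at
    equinox noon. Valid for latitudes strictly between 0 and 90 degrees.
    """
    lat = math.radians(latitude_deg)
    # Each scope is leveled locally, so it reads 90 degrees minus latitude.
    elevation = math.pi / 2.0 - lat
    # Ground distance from the equator to one spotter along the surface.
    half_base = EARTH_RADIUS_MILES * lat
    # Flat-plane isosceles trig applied to round-earth readings.
    return half_base * math.tan(elevation)


for latitude in (1, 15, 30, 45, 60, 75, 89):
    print(f"{latitude:2d} degrees: {apparent_flat_height(latitude):8.0f} miles")
```

Under those assumptions the near-equator figure comes out close to the Earth’s own radius and then falls steadily toward zero as the spotters head for the poles. No single height survives a change of base length, which is the replication failure described above.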

If you’re still unconvinced that the flat earth is a dead hypothesis which doesn’t live up to testing and continue to focus on strange mirages seen over the surface of the ocean on warm days as evidence that the round earth can’t be right, consider the following observations.

Flat earthers use Polaris as the pivot around which the sky spins. Why is it that Polaris is not visible in the sky from latitudes south of the equator? Why is it that the Southern Cross constellation is not visible from the northern hemisphere? Eddie Bravo, as a Gracie hunter, surely must have visited Brazil: did he ever go outside and *look* for the north star during a visit? Failing that, did he look for the Southern Cross from Las Vegas?

Flat earthers use the observation that the stars in the sky rotate *counterclockwise* around Polaris as evidence that the sky is rotating around the disc of the Earth. Have they ever gone and observed at night from the tip of Argentina in South America that the sky seems to rotate *clockwise* around some axis to the *south*? How can the sky rotate both clockwise and counterclockwise *at the same time*? In the flat earth model, it can’t, but in reality, it does! As an extension, why in the hell does the sun come straight up from the east and set straight in the west on equinox at the equator? When seen at the North Pole, on equinox day, *simultaneously*, the sun rolls around the horizon at the level of the ground and never quite rises. Use your smartphone and take the trip to see! Send a friend to Panama while you go to Juneau, Alaska and talk on the smartphone to see that it happens this way in both places at once.

Don’t take my word for it, go and make the observations yourself!

How is this all possible?

I’ll tell you why.

It’s because flat earthers never test the models they put forward with the tools that are at their flipping fingertips. “Flat Earth ‘Research'” my ass.

Do I need NASA satellite pictures or rocket launches to know that the Earth is round? Pardon my French, but Fucking hell, no! Give me the combination of time zones with the fact that the sun actually pops up over the horizon when it rises and your ass is grass. Flat earth models can’t explain these observations simultaneously, they can only do one or the other.

Edit 11-28-17

Yeah, I have a tiny bit more to say.

If all of what I’ve said still does not convince you, likely you’re hopeless. But, here’s a comparison between what the sun does in the sky over the disc shaped flat earth and what it actually does.

Here’s how the sun travels across the sky on the disc-shaped earth:

Here’s what the sun really does depending on latitude:

This particular set of sun behaviors in the sky is actually visible year round, but the latitude where the sun travels from East, straight over the apex, to West varies North to South depending on the season when you look. At equinox, the observation is symmetric at the equator, but it shifts north and south of there as the months move on, producing the same general pattern above. In the winter, the axial tilt of the Earth prevents the sun from rising over the north pole –*ever*– while the same is true at the south pole during the summer of the northern hemisphere. Flat earthers seem to never make any observations about what happens in the sky to the sun south of the equator. Do they not go to Australia or South America to take a look?

As an extra, I have made the mistake of rooting through Eric Dubay’s “200 proofs” gallop. I once even thought about writing a blog post about the experience, but decided it was too exhausting. For one thing, quantity does not assure quality. Many of the 200 proofs are taken from accounts of 19th century navigation errors, and one must wonder whether such accounts hold as valid in the 21st century world. Further, some of the proofs are simple, flat out lies: among the proofs is an exhaustive observation of the lack of airline flight routes in the southern hemisphere, twisting route information to show that flights must pass through the northern hemisphere to reach destinations as far separated as the tip of South America and the tip of South Africa, which simply ignores the fact that flight routes exist for these destinations that do not go to the northern hemisphere. Are there more flight routes in the Northern hemisphere than in the southern hemisphere? Yes, most of the human population lives at or north of the equator… most of the places anybody would *want* to go are in the northern hemisphere. If you doubt that such a flight route exists, go to the Southern hemisphere and take an airline flight from Argentina to South Africa and use a stopwatch during the flight to see if it’s a fraction of the length Dubay would claim –commercial airline jets have a known flight profile that would be impossible to hide; the rate at which they cross distance is well-characterized. Did Dubay do this experiment? Nope. What should stun a person about Dubay is that he does not merely make wrong claims, it’s that he repeats the same wrong claims 60 times in a row to an audience that not only fawns over it, but fails to point out the giant logical gaps that are detailed above. How hard is it to see that you not only need to cope with time zones, but with sunrises too?
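To put rough numbers on the stopwatch test, here is a sketch comparing the Buenos Aires to Cape Town distance on a globe with the same trip on a north-pole-centered disc. Several things here are my assumptions and not anything taken from Dubay: the disc is modeled as an azimuthal equidistant map (distance from the pole proportional to colatitude), the city coordinates are approximate, the Earth radius is about 3,959 miles, and the 550 mph cruise speed is a ballpark figure for an airliner.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # assumed mean radius
CRUISE_SPEED_MPH = 550.0     # assumed ballpark airliner cruise speed


def globe_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2.0 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def disc_distance(lat1, lon1, lat2, lon2):
    """Straight-line distance in miles on a pole-centered flat disc."""
    # Radius from the north pole grows linearly with colatitude.
    r1 = EARTH_RADIUS_MILES * math.radians(90.0 - lat1)
    r2 = EARTH_RADIUS_MILES * math.radians(90.0 - lat2)
    dlon = math.radians(lon2 - lon1)
    # Law of cosines in the plane of the disc.
    return math.sqrt(r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * math.cos(dlon))


buenos_aires = (-34.6, -58.4)  # approximate latitude, longitude
cape_town = (-33.9, 18.4)      # approximate latitude, longitude

for label, fn in (("globe", globe_distance), ("disc", disc_distance)):
    miles = fn(*buenos_aires, *cape_town)
    print(f"{label}: {miles:,.0f} miles, about {miles / CRUISE_SPEED_MPH:.1f} hours")
```

With these assumptions the disc route comes out more than twice as long as the globe route, which is a difference no stopwatch could miss.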

Pointing out a tiny detail, like not understanding how mirages work on the surface of the ocean, does not somehow validate a model that can’t handle the big ticket items, like time zones and sunrises. It only shows that you can’t understand how the small details work. I can also sort of understand that people are losing touch with the world around them as they grow more and more entrenched in the online world, but if you fail to understand that the online world does not dictate the physics of the real world, you are in big trouble.

(Edit 3-26-18:)

The steam rocket dude finally shot himself 1,800 ft into the air. Oh yeah, and “flat earth and stuff.” Tell me again how his little stunt was supposed to test anything. His interest was in launching himself in a steam powered rocket, it had nothing to do with finding out the roundness (or lack thereof) of the Earth.

If you vote for him for Governor, you deserve what you get.

For anybody actually interested in a test that did something, check this out. For the record, there are aberrations to the lenses here which do affect exactly what you see along the edges of the image, but ask yourself how the rocket can appear straight while the background appears curved. Further, if you doubt it, that test is something that can be done by someone with the limo driver’s means.

Edit: 11-16-16

There was a slight error in the set-up of the center of mass calculation. Light appears to move an effective mass from -L/2 to L/2-l, the starting and stopping points of light. Here’s the reformulation to capture that.

Time spent on this derivation: ten minutes on the toilet last night and ten minutes before breakfast writing the correction (yeah, that’s what I get for going really fast).

Don’t thank me, this is Einstein’s calculation.

Edit: 11-18-17

As I’m still thinking about this post, I figure it might be beneficial to flesh out the reason that it was written. This post was used as a response to a comment on another blog… if you want to back up a statement you made about something and somebody is accusing you of not providing evidence, most people provide citations, net links, references, etc. In this particular case, the argument was about a piece of math and I was being accused of lying about said piece of math by someone who clearly likes to believe he knows everything without actually knowing practically anything. Yes, skeptics are guilty of Dunning-Kruger, just like everyone else (This is an unfair statement and I apologize for it.) What better way to slam the textbook in someone’s face than to actually work the problem? If you want the final word on what Einstein said about something, quote Einstein’s work! And so, a piece of Einstein’s work is posted above.

The argument in question started with a fellow suggesting to me that mass-energy equivalence can be derived but not proven with classical physics. I beg to differ; energy is a classical concept, from Thermo, E&M and classical mechanics… all three! You don’t need relativity or quantum mechanics to justify statements about how energy works; measurements of kinematics and force are sufficient to show that energy as a concept works. Mass-energy equivalence arose from Einstein’s notion that the newly completed *classical* field of electromagnetism must be consistent with the older field of classical mechanics. The equation E=pc is not relativistic: it came directly out of electromagnetism (and, believe me, I’ve been through that calculation too because I didn’t believe it at first.) Imposing that these two fields must be cross-consistent is the origin of mass-energy equivalence… light carries momentum (by Poynting’s vector and well defined in the electromagnetic stress-energy tensor) and light interacts with mass, therefore conservation of momentum (and consequently conservation of center of mass in absence of external forces) requires that light carry an equivalent of mass in order for forces to add up in a situation where light interacts with matter but no external forces act on the system comprising the light and the matter. Mass-energy equivalence is required by this, no ifs, ands, buts or “yeah, but you didn’t prove…”

Einstein’s thought experiment validating this set-up is an exceptionally elegant one. It’s called “Einstein’s Box.” Everybody loves Schrodinger’s cat-in-a-box… well, Einstein had a box too and this box is older than Schrodinger’s. Einstein’s box is a closed box sitting out in space where it feels no external forces. A flash of light is emitted inside the box from one wall and travels across the box to strike the opposite wall. E&M states that light must carry momentum. If the system has no external forces acting on it, the emission of the light inside the box requires that momentum of the system be conserved, which requires that the box recoils with a momentum equal to that carried by the light, causing the box to move at some velocity consistent with the momentum carried by the light (which turns out to be directly proportional to the energy carried by that light as stated by E=pc). Net momentum of the system must remain zero by conservation of momentum. When the light travels across the box and collides with the opposite wall, the momentum of the light cancels the momentum of the box and the box stops moving. Thing about this is that center of mass, as a consequence of momentum conservation, could not have moved. No forces on the outside of the box.

Center of mass is a damn well classical concept, well worked out in the 1700s and 1800s… and since the box moved, the box’s center of mass moved! But no forces acted on the outside of the system, so the overall center of mass of the system could not have moved. This requires the light to have carried with it a value of mass, taken from the location where the light was emitted and deposited again at the location where the light was absorbed. But light is known not to carry mass since it is a wave-like solution of immaterial fields in the form of Maxwell’s equations. If you set up this situation and work through the calculation using Newtonian mechanics and electromagnetism, nothing more, out comes the *classical* requirement that energy and mass have an equivalence in the form of E=mc^2. No quantization or probability relations from quantum mechanics, no frame of reference shifting from relativity, not even delineating that light is some package of photons… this is purely classical. Moreover, energy is a tabular result to begin with: it is not something that is by itself ever *directly* observed and it must always be carried by something else, a field, a heat, a potential, a motion or what have you. The statement that this tabular relationship extends to something else that is technically only indirectly observed, mass, is a proof. And yes, mass is indirect since you can only know mass from weight, which is a force!
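For anyone who wants the bookkeeping spelled out, here is a minimal sketch of the Einstein's box calculation as described in words above. The symbols L (box length) and l (recoil distance) follow the 11-16-16 edit note; M (mass of the box while the light is in flight), E (energy of the flash) and m (effective mass carried by the light) are labels assumed here for illustration. It assumes a vacuum inside the box and is strictly Newtonian mechanics plus E=pc.

```latex
\begin{align*}
p_{\text{light}} &= \frac{E}{c}
  &&\text{(momentum of the flash, from } E = pc\text{)} \\
v_{\text{box}} &= \frac{E}{Mc}
  &&\text{(recoil speed, total momentum stays zero)} \\
t &= \frac{L - l}{c}
  &&\text{(light travels from } -L/2 \text{ to } L/2 - l\text{)} \\
l &= v_{\text{box}}\, t = \frac{E\,(L - l)}{M c^{2}}
  &&\text{(distance the box recoils in that time)} \\
M\, l &= m\,(L - l)
  &&\text{(center of mass of the whole system does not move)} \\
m &= \frac{M\, l}{L - l} = \frac{E}{c^{2}}
  &&\Rightarrow\quad E = m c^{2}
\end{align*}
```

Under these assumptions the box dimensions drop out entirely, which is the point: the mass equivalent of the light depends only on its energy.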

If you have concepts of weight, momentum and light together in the same model as expressed by classical physics, mass-energy equivalence is required for self-consistency.

Granted, special relativity quite naturally produces this result as well, but special relativity is not required to produce mass-energy equivalence. Had Einstein not discovered it, someone else damned well would’ve and it would not have required relativity to do –at all!

Now, the thing that doubly made me angry about this conversation is that it was with a fellow who absolutely craved physicist street cred: he name dropped Arxiv and seemed to want to chase around details. Sadly, his whole argument ultimately amounted to insulting someone and not backing up his ability to absolutely *know* what he was claiming to know. Does it matter that you don’t believe my statement if you aren’t competent to evaluate the field in question? Not at all: such a person has no place at the table to start with. This is why it’s possible for a Nobel Laureate to descend into crankery… just because you have a big prize doesn’t mean you are always equally competent at everything! I’m guessing the guy was a surgeon given the ego and the blog, but if he was a physicist, I’m very disappointed. A physicist who doesn’t know Einstein’s box is a travesty. I’m not the greatest physicist that ever lived, but I work at it and I know what I’m talking about… where I have gaps, I do my best to admit it.

edit: 11-18-17

(Statement redacted. It was an unfairly insulting comment)

edit: 11-20-17

As this is still nagging at me, one further thought. In what I consider the last statement of the conversation before it simply became obvious trolling, the fellow accused me of not including “a variable speed for light” in my calculation. In Einstein’s calculation, the speed of light is given the constant “c.” This is a constant which comes with a caveat; “c” is the speed of light when it is not passing through anything material, the speed of light in a vacuum. This distinction is important because light can travel at lower speeds when it’s passing through a material. This situation is well-handled by E&M and is considered a “solved effect” by Relativity, whose postulates include the explicit notion that E&M simply be true everywhere. The constant “c” is the maximum possible speed that light can travel, but it will travel at lower speeds in a medium with an index of refraction greater than 1, where permittivity and permeability might have values other than their vacuum values, which has the wonderful result of making lenses possible in glasses and microscopes. In my lavatory derivation above, a little screw-up on my part is that I didn’t clobber the reader over the head with the constancy of the value “c,” I said “a box in zero gravity” and I said light travels at “c,” but I didn’t say “this is definitely all in a vacuum” which I probably should have. If index of refraction is “n”… the velocity of light in a medium with that refractive index is v = c/n. There are other ways to encode refractive index which allow for more sophisticated optical behaviors, but everything in that line is completely beyond the pale for the argument in question, and drawing attention to it is simply chaff intended to shift the focus of the argument.
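To put numbers on the distinction, here is a small worked example of v = c/n. The relative permittivity and permeability symbols are standard; the value n = 1.5 is only an illustrative figure in the range typical of ordinary glass, not something taken from the argument.

```latex
\[
v = \frac{c}{n}, \qquad n = \sqrt{\epsilon_r\,\mu_r}
\]
\[
n = 1 \ (\text{vacuum}) \;\Rightarrow\; v = c \approx 3.00\times10^{8}\ \mathrm{m/s},
\qquad
n = 1.5 \ (\text{assumed, glass-like}) \;\Rightarrow\; v \approx 2.00\times10^{8}\ \mathrm{m/s}
\]
```

In both cases the constant c itself is untouched; only the propagation speed inside the medium changes.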

Light can travel at speeds lower than “c,” but “c” itself is so far found to be invariant. Moreover, the fact that light can travel at speeds other than “c” does not change the Einstein’s box derivation, which is set in explicit conditions where light would travel at “c.” Somebody who doesn’t know this isn’t a physicist (11-20-17: I’ll moderate this; it’s unfair and I was too angry.)

Also, as an aside, I mention above that Special Relativity can produce E = mc^2. Thinking about it, but not running through the calculations, I think this is actually backward; E=mc^2 is sort of needed first before it shows up in Special Rel. Einstein made some amazing leaps.

Edit: 11-20-17

As an added extra, here is a derivation of E=pc from the stress-energy and electromagnetic power continuity equations. These were written a few years ago, but I had the good sense to scan them:

The E=pc derivation begins on the second page above. The first page is the end of the continuity equation derivation. I’ll neglect that. No relativity here, just pure E&M. There are a couple pieces in here that I don’t remember so well and I need to think about to decide if they’re correct. The first page is included to show clearly the relation between force and the stress-energy tensor divergence.

Edit: 11-21-17

I’ve spent some time thinking about the form Narad put forward in the comments.

First of all, we have to be really sure of what is meant by “p” on the left side of this equation. My first reading of it was as “momentum,” but I’m realizing that it isn’t, and this may be leading to some misunderstandings about what is meant by E=pc. The thing in the middle is average Poynting vector divided by speed of light… Poynting vector has units of Watts/meter^2 and speed of light has units of meters/sec, which works out to Newtons/m^2, or force per area, which is pressure, not momentum. The thing on the right is actually in units of energy… permittivity times peak E-field^2 over 2, which is just a form of electromagnetic energy, in units of Joules. For a literal reading of the equation above, unit analysis put me at momentum = pressure = energy, which is not right (apple can’t equal orange can’t equal pear). If I take “p” as pressure rather than momentum, the left side makes sense, but the right side still doesn’t quite work.

It’s a nice try, all the elements are there. It has energy, and momentum can be massaged out of it. I think the route being taken here is to try to use the form of a plane wave to figure out the momentum based on the pressure and specifically for a plane wave form of the Poynting vector, or else the peak E-field intensity wouldn’t be needed.

The approach in the E=pc derivation I posted above is *really* different. My starting point is with a classical structure called the Electromagnetic stress-energy tensor and with a second structure which is conservation of power given energy flux. (Wikipedia actually kind of pissed me off about this: they want to masturbate over the four-dimensional relativistic version, but wouldn’t provide me a clean on-line citation for the classical version shown above; the form given here is the same as it appears in Jackson E&M) The first equation is a consequence of the Lorentz force law (F = qE+qvxB) where the system has electromagnetic waves, but is sealed so that there is no net force… the equation says that the change in Poynting vector per unit time is equal to the divergence of the electromagnetic stress-energy tensor, all of which is in units of force or change of momentum with time. The second equation is a consequence of Power=current*voltage, believe it or not, and just says that the change in energy density in the system is equal to the divergence of the Poynting vector, all in units of power. These structures make no real initial assumptions about the form that the electromagnetic fields are taking, they speak only of change of momentum per time and change of energy density given energy flux and are derived directly from application of Maxwell’s laws.

The first step is to take the stress-energy continuity relation and to hold it as change in Poynting vector with time is equal to change in momentum density with time by direct application of Newtonian force. You end up with an expression that says that Poynting vector is equal to momentum density times speed of light squared.

The second step is to throw this Poynting vector relation into the power equation so that you get a relation that says that the momentum flux out of a volume of space is equal to the change of energy density with time. This gives you a “momentum current vector” equation, which is analogous to the relationship between electrical current “I” and current vector “J.”

I next establish a momentum current, basically just a beam of light with no specific frequency or field configuration. You could write this as white light in a Fourier composition. A set of very simple manipulations gets you to a relation that directly says that energy density is equal to momentum density times speed of light. Integrate out the density and you get E=pc directly. Please note, this set-up is explicitly agnostic on the idea of photons since it depends on a mixture of frequencies to produce a constant envelope of plane waves with constant momentum density distributed everywhere and therefore does not require quantum mechanics to work. I can’t claim this work is Einstein’s because I didn’t follow anyone to make it… this is me using Jacksonian E&M technique to prove E=pc for myself, all using classical physics.
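As a compact summary of the steps just described, here is a hedged sketch of the chain from the two continuity equations to E=pc. It is not a transcription of the scanned pages. The symbols are labels assumed for this sketch: u is the field energy density, g the field momentum density, S the Poynting vector and T the Maxwell stress tensor. The beam is assumed to travel along x in a source-free vacuum, with a profile that depends only on x - ct.

```latex
\begin{align*}
\frac{\partial \mathbf{g}}{\partial t} &= \nabla\cdot\mathbf{T},
  \qquad \mathbf{S} = \mathbf{g}\,c^{2}
  &&\text{(momentum continuity, no charges present)} \\
\frac{\partial u}{\partial t} &= -\nabla\cdot\mathbf{S}
  = -c^{2}\,\nabla\cdot\mathbf{g}
  &&\text{(power continuity with } \mathbf{S} \text{ substituted)} \\
u = u(x - ct),\quad g = g(x - ct)
  &\;\Rightarrow\; -c\,u' = -c^{2}\,g'
  &&\text{(beam moving along } x \text{ at speed } c\text{)} \\
u &= c\,g
  &&\text{(both vanish outside the beam, so no constant)} \\
E = \int u\,dV &= c\int g\,dV = p\,c
  &&\text{(integrate out the density)}
\end{align*}
```

Nothing in this chain refers to a frequency, a photon, or a change of reference frame, which is consistent with the claim that the result is purely classical E&M.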

With E=pc in hand by these means, the classical derivation of E=mc^2 is pretty much a shoo-in. Again, I used no quantum and no relativity. If I could do this, the geniuses at the turn of the century got it faster;-)

Edit: 11-22-17

I must’ve done something wrong with the LaTeX; it doesn’t seem to want to render in the body of my post. I’m still looking into whether I need to get the plugin…

Further, I figured out what was wrong with the unit analysis I did above… the right side of that equation is energy density (J/m^3) rather than energy (J)… and since J =N*m, J/m^3 is N/m^2…. the equation above is all in units of light pressure. To get to E = pc in approximate form in the plane wave, you just need to sub in the relation for momentum *density* per Poynting vector S = pc^2, then cancel the density by integrating over volume.
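The corrected unit analysis can be written out explicitly. This assumes the equation in question has the plane-wave form described in the 11-21-17 edit, with the average Poynting vector over c in the middle and permittivity times peak E-field squared over 2 on the right; the bracket notation for "units of" is my own shorthand.

```latex
\begin{align*}
\left[\frac{\langle S\rangle}{c}\right]
  &= \frac{\mathrm{W/m^{2}}}{\mathrm{m/s}}
   = \frac{\mathrm{J}}{\mathrm{m^{3}}}
   = \frac{\mathrm{N\,m}}{\mathrm{m^{3}}}
   = \frac{\mathrm{N}}{\mathrm{m^{2}}} \\
\left[\tfrac{1}{2}\,\epsilon_0 E_0^{2}\right]
  &= \frac{\mathrm{C^{2}}}{\mathrm{N\,m^{2}}}\cdot\frac{\mathrm{N^{2}}}{\mathrm{C^{2}}}
   = \frac{\mathrm{N}}{\mathrm{m^{2}}}
   = \frac{\mathrm{J}}{\mathrm{m^{3}}}
\end{align*}
```

Both sides come out as pressure, equivalently energy density, so the equation is dimensionally consistent once the right side is read as J/m^3 rather than J.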

One additional thing about the Einstein’s box derivation that is important; it works in a classical framework. What I’ve provided above, then, is E=mc^2 as a classical equation, which is really torturing the point that it was “proven.” I’ve been thinking about whether or not I was doing this right since the whole discussion started and the derivation is only consistent from the standpoint that there are no effects included taking into account the potential relativistic characteristics of the box as it moves. I’m sorry about that, Narad. The derivation above would be insufficient from a modern physics standpoint because the box would undergo length contractions and dilations as it moves. To be perfectly honest, this nagged at me a tiny bit as I wrote the derivation, but maybe not as much as it should have… I drew the box as strictly “before” and “after” so that I ended up looking at the system only when it is located in the inertial frame of reference. That would call into question the nature of the boost pushing it into motion. I was assuming that the completely undisclosed relativistics located between the end-points were sufficient to conspire that the end-points be right! And, that’s an open end since length contraction would place the wall of the box in a different location depending on the frame… throwing off the whole calculation.

(For the people at home, here is something very important about how I decided to write this blog. I leave my edits visible so that the progression of my thinking is clear… one of the hardest, most human aspects of working in sciences is facing the fact that nobody is always right about everything. I think that being a good scientist is not about being right all the time, but about changing your mind when it’s important to do so. And, it’s about admitting when someone else was right, sometimes very publicly! Are you smart if you’re unwilling to abandon a sinking ship? I think not. Smart is being able to turn the steering wheel and to grow when it’s necessary to do so, especially when it affects your pride. I think this is the difference between arguing loudly and arguing productively.)

Here is the derivation converting the light pressure equation Narad offered into E=pc…
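In text form, a minimal sketch of that conversion runs as follows. I am assuming the light pressure equation has the plane-wave form described in the 11-21-17 edit; P (pressure), the averaged energy density and the averaged momentum density are labels chosen for this sketch, and the relation between Poynting vector and momentum density is the one quoted in the 11-22-17 edit.

```latex
\begin{align*}
P &= \frac{\langle S\rangle}{c} = \tfrac{1}{2}\,\epsilon_0 E_0^{2} = \langle u\rangle
  &&\text{(light pressure equals average energy density)} \\
\langle S\rangle &= \langle g\rangle\,c^{2}
  &&\text{(Poynting vector in terms of momentum density)} \\
\langle u\rangle &= \frac{\langle S\rangle}{c} = \langle g\rangle\,c
  &&\text{(substitute the second line into the first)} \\
E = \int \langle u\rangle\,dV &= c\int \langle g\rangle\,dV = p\,c
  &&\text{(cancel the density by integrating over volume)}
\end{align*}
```

This version is tied to the plane-wave form through the peak field, which is why it is the approximate route and the stress-energy route is the more general one.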

Hopefully that ties up all the loose ends! (Don’t be surprised to see me back here playing with a relativistic E=mc^2 proof at some point.)

This post is intended to help recover a tiny fraction of the since-destroyed post I originally entitled “NMR and Spin Flipping part II.” I have every intention to reconstruct that post when I have time, but I decided to do it in fragments because the original loss was 5,000 words. I don’t have time to bust my head against that whole mess for the moment, but I can do it in bits, I think.

One section of that post which stands pretty well as a separate entity from the NMR theme was the fraction of work where I spent time deriving a version of the time dependent Schrodinger equation in the interaction picture.

I thought I would go ahead and expand this a little bit and talk generally about some of the basic structural features of non-relativistic quantum mechanics. Likely, this will mostly not be very mathematical, except for the derivation at the end. I’ll warn you when the real derivation is about to start if you are math averse…

Everybody has heard about Schrodinger’s cat. Poor cat is dragged out and flogged semi-dead, semi-alive pretty much any time anybody wants to speak as if they know something about “quantum physics.” The cat might be the one great mascot of quantum in popular culture. The kitty drags with it a name that you no doubt have heard: Erwin Schrodinger, the guy who first coined the anecdote of feline torture as an abstraction to describe some features of quantum mechanics on a level that laymen can embrace, if not totally understand. This name is immediately synonymous with the spine of quantum mechanics as the Schrodinger equation. This equation is not so simple as E = mc^2 or F = ma, but it is a popular equation…

I’ve included it here in its full-on psi-baiting time-dependent form with Planck’s constant uncompressed from ħ.

You hardly ever see it written this way anymore.

All this equation says is that the sum of kinetic and potential energy is total energy, which is tied implicitly to the evolution of the system with time. This equation is popular enough that I found it scrawled on a wall along with some Special Relativity inside the game “Portal 2” once. Admittedly, the game designers used ħ instead of h for Planck’s constant. It may not look that way, but the statement of this equation is no more complicated than F = ma or E = mc^2. It just says “conservation of energy” and that’s pretty much it.
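For reference, here is a sketch of the one-dimensional, time-dependent form with Planck's constant written out as h rather than ħ, which is my best reading of the description above. The symbols m (particle mass) and V(x) (potential energy) are the usual textbook labels and are assumed here.

```latex
\[
\underbrace{-\frac{h^{2}}{8\pi^{2} m}\,\frac{\partial^{2}\psi(x,t)}{\partial x^{2}}}_{\text{kinetic energy}}
\;+\;
\underbrace{V(x)\,\psi(x,t)}_{\text{potential energy}}
\;=\;
\underbrace{i\,\frac{h}{2\pi}\,\frac{\partial\psi(x,t)}{\partial t}}_{\text{total energy, via time evolution}}
\]
```

Substituting ħ = h/2π turns the prefactors into the more familiar ħ²/2m and iħ.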

Schrodinger’s equation is the source of wave mechanics, where Psi “ψ” is the notorious quantum mechanical wave function. If you care nothing more about Quantum mechanics, I could say that you’ve seen it all and we could stop here.

The structure of basic quantum mechanics has a great deal to it. Schrodinger’s equation tells you how dynamics happens in quantum physics. It says that the way the wave function changes in time is tied to some characteristics related to the momentum of the object in question and to where it’s located. Structurally, this is the foundation of all non-relativistic quantum mechanics (I say “non-relativistic” because the counterpart of the Schrodinger equation that is consistent with special relativistic energy is the Klein-Gordon Equation, which I will not touch anywhere in this post.) Pretty much all of quantum mechanics is about manipulating this basic relation in some manner or another in order to get what you want to know out of it. Here, the connection between position and momentum as well as between energy and time hides the famous “uncertainty relations,” all built directly into the Schrodinger equation and implicit to its solutions.

One thing you may not immediately know about Schrodinger’s equation is that it’s actually a member of a family of similar equations. In this case, the equation written above tells about the motion of an object in some volume of space, where the space in question is literally only one dimensional, along an effective line. Another Schrodinger equation (as the one written in this post) expands space into three dimensions. Still other Schrodinger equation-like forms are needed to understand how an object tumbles or rotates, or even how it might turn itself inside out or how it might play hopscotch on a crystalline lattice or bend and twist in a magnetic field. There are many different ways that the functional form above might be repurposed to express some permutation of the same set of general ideas.

This tremendous diversity is accomplished by a mathematical structure called “operator formalism.” Operators are small parcels of mathematical operation that transform the entity of the wave function in particular ways. An operator is sort of like a box of gears that hides what’s going on. You might fold down the gull-wing door in the equation above and hide the gears in an operator called the “Hamiltonian.”

This just shuffles everything you don’t care about at a given time under the rug and lets you work overarching operations on the outside. Operators can encode most everything you might want. There are a ton of rules that go into the manipulation of operators, which I won’t spend time on here because it distracts from where I’m headed. A hundred types of Schrodinger equation can be written by swapping out the inside of the Hamiltonian.
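As a sketch of what "hiding the gears" looks like, here is the one-dimensional Hamiltonian in the position representation and the Schrodinger equation it compresses, along with the three-dimensional swap. The symbols m and V are the usual mass and potential, assumed here for illustration.

```latex
\[
\hat{H} = -\frac{\hbar^{2}}{2m}\,\frac{\partial^{2}}{\partial x^{2}} + V(x),
\qquad
i\hbar\,\frac{\partial\psi}{\partial t} = \hat{H}\,\psi
\]
\[
\text{three dimensions:}\qquad
\hat{H} = -\frac{\hbar^{2}}{2m}\,\nabla^{2} + V(\mathbf{r})
\]
```

The outer equation is identical in both cases; only the contents of the Hamiltonian change.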

An additional simplification of operators comes from what’s called “representation formalism.” The first Schrodinger equation I wrote above is within a representation of position. Knowing about the structure of the representation places many requirements which help to define the form of the Hamiltonian. I could as easily have written the same Schrodinger equation in a representation of momentum, where the position variable becomes some strange differential operator… momentum is in that equation above, but you would never know it to look at because it’s in a form related to velocity, which is connected back to position, so that position and time are the only variables relevant to the representation. By backing out of a representation, into a representation free, “abstract form,” operators lose their bells and whistles while wave functions are converted to a structure called a “ket.”

“Ket” is the back half of “bra-ket,” which is a representation free notation developed by Paul Dirac, another quantum luminary working in Schrodinger’s time. A “bra” is related to a “ket” by an operation called a “conjugate transpose,” but you need only know that it’s a way to talk about the wave function when you are not saying how the wave function is represented. If you’ve dealt with kets, you’ve probably been in a quantum mechanics class… “wave function” has a place in popular culture, “ket” does not.

Most quantum mechanics is performed with operators and kets. The operators act on kets to transform them.
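In symbols, a minimal sketch of the representation free form and its link back to wave functions looks like this. The position and momentum wave functions are recovered by projecting the ket onto the corresponding basis; the tilde on the momentum-space function is just a label assumed here.

```latex
\[
i\hbar\,\frac{d}{dt}\,|\psi(t)\rangle = \hat{H}\,|\psi(t)\rangle,
\qquad
\langle\psi| = \big(|\psi\rangle\big)^{\dagger}
\]
\[
\psi(x,t) = \langle x|\psi(t)\rangle,
\qquad
\tilde{\psi}(p,t) = \langle p|\psi(t)\rangle
\]
```

The first line never mentions position or momentum at all, which is what "abstract form" means in practice.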

One place where this general structure becomes slightly upset is when you start talking about time. And, of course time is needed if you’re going to talk about how things in the real world interact or behave. The variable of time is very special in quantum mechanics because of how it enters into Schrodinger’s equation… this may not be apparent from what I’ve written above, but time is treated as its own thing. Schrodinger’s equation can be rewritten to form what’s called a time displacement operation.

You might take a breath, derivation begins here….

This is just a way to completely twist around Schrodinger’s time dependent equation into a ket form where the ket now has its time dependence expressed by a time displacement modulated by the Hamiltonian. I’ve even broken up the Hamiltonian into static and time dependent parts (as this will be important to the Interaction Picture, down below). The time displacement operation just acts on the ket to push it forward in time. The thing inside the exponential is a form of quantum phase.
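For orientation, here is a hedged sketch of the time displacement form. The simple exponential is written for the static part of the Hamiltonian alone, where it is exact; how the time varying part is folded in is the business of the Interaction Picture further down. The subscript 0 on the static part and the symbol V for the time dependent part are labels assumed here.

```latex
\[
\hat{H} = \hat{H}_0 + \hat{V}(t)
\]
\[
\hat{V} = 0:\qquad
|\psi(t)\rangle = \hat{U}_0(t)\,|\psi(0)\rangle,
\qquad
\hat{U}_0(t) = e^{-i\hat{H}_0 t/\hbar}
\]
```

Differentiating the second line with respect to time and multiplying by iħ gives back the Schrodinger equation for the static Hamiltonian, and the exponent is the quantum phase mentioned above.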

This ket is an example of a “state ket.” It is the abstract representation of a generalized wave function that solves Schrodinger’s time dependent equation. A second form of ket, called an “eigen ket,” emerges from a series of special solutions to the Schrodinger equation that have no time dependence. An eigen ket (I often write “eigenket”) remains the same at all times and is considered a “stationary solution” to the Schrodinger equation. “Eigen solutions” tend to be very special solutions in many other forms of physics: the notes on your flute or piano are eigen solutions, or stationary wave solutions, for the oscillatory physics in that particular instrument. In quantum mechanics, eigen modes are exceptionally useful because any general time dependent solution to the Schrodinger equation can be fabricated out of a linear sum of eigenkets. This math is connected intimately to Fourier series. The collection of all possible eigenket solutions to a particular Schrodinger equation forms a complete description of a given representation of that Schrodinger equation, which is called a Hilbert space. You can write any general solution for one particular Schrodinger equation using the Hilbert space of that equation. A particular eigenket solves the Hamiltonian of a Schrodinger equation with a constant, called an eigenvalue, which is the same as saying that an eigenket doesn’t change with time (producing Schrodinger’s time-independent wave equation).

This is just the eigenvalue equation for the stationary part of the Hamiltonian written above, which could be expanded into Schrodinger’s time *independent* equation.
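Written out, a sketch of the eigenvalue equation and what it implies for time evolution is below. The label n indexes the eigenkets and E_n is the corresponding eigenvalue; the one-dimensional position form on the last line assumes the usual kinetic plus potential Hamiltonian.

```latex
\[
\hat{H}_0\,|n\rangle = E_n\,|n\rangle
\qquad\Rightarrow\qquad
e^{-i\hat{H}_0 t/\hbar}\,|n\rangle = e^{-iE_n t/\hbar}\,|n\rangle
\]
\[
-\frac{\hbar^{2}}{2m}\,\frac{d^{2}\psi_n(x)}{dx^{2}} + V(x)\,\psi_n(x) = E_n\,\psi_n(x)
\]
```

The time displacement only multiplies an eigenket by a phase, which is the sense in which it is a stationary solution.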

Deep breath now, this dives into Interaction Picture quickly.

How quantum mechanics treats time can be reduced in its extrema to two paradigms which are called “Pictures.” The first picture is called the “Schrodinger Picture,” while the second is called the “Heisenberg Picture” for Werner Heisenberg. Heisenberg and Schrodinger developed the basics of non-relativistic quantum mechanics in parallel from two separate directions; Schrodinger gave us wave mechanics while Heisenberg gave us operator formalism. They are essentially the same thing and work extremely well when used together. Schrodinger and Heisenberg pictures are connected to each other through the time displacement operator. In Schrodinger picture, the time displacement operation acts on the state ket, causing the state to evolve forward in time. In Heisenberg picture, the time displacement is shifted onto the operators and the eigenkets, while the state ket remains constant in time. Schrodinger picture is like sitting on a curbside and watching a car drive past, while Heisenberg picture is like sitting inside the car and watching the world drive past. Both pictures agree that the car is traveling the same speed, but they are looking at the situation from different vantage points. The Schrodinger time dependent equation is balanced by the Heisenberg equation of motion.
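A side-by-side sketch of the two pictures may help. It assumes a Hamiltonian that is constant in time and an observable A with no explicit time dependence; the subscripts S and H are labels for the Schrodinger and Heisenberg pictures.

```latex
\begin{align*}
\text{Schrodinger picture:}\quad
  & |\psi(t)\rangle_S = \hat{U}(t)\,|\psi(0)\rangle,
  \qquad \hat{A}_S \ \text{fixed},
  \qquad \hat{U}(t) = e^{-i\hat{H}t/\hbar} \\
\text{Heisenberg picture:}\quad
  & |\psi\rangle_H = |\psi(0)\rangle,
  \qquad \hat{A}_H(t) = \hat{U}^{\dagger}(t)\,\hat{A}_S\,\hat{U}(t) \\
\text{equation of motion:}\quad
  & \frac{d\hat{A}_H}{dt} = \frac{1}{i\hbar}\,\big[\hat{A}_H,\ \hat{H}\big] \\
\text{same observable:}\quad
  & {}_S\langle\psi(t)|\hat{A}_S|\psi(t)\rangle_S
    = {}_H\langle\psi|\hat{A}_H(t)|\psi\rangle_H
\end{align*}
```

The last line is the car analogy in symbols: both vantage points report the same expectation value.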

Where time dependence starts to become really interesting is if the Hamiltonian is not completely constant. As I wrote above, you might have a part of the Hamiltonian which contains some dependence on time. One way in which quantum mechanics addresses this is by a construction called the “Interaction Picture”… Sakurai also calls it the “Dirac Picture.” The interaction picture is sort of like driving along in your car and wondering at the car you’re passing; the world outside appears to be moving, as is the car you’re looking at, if only at different speeds and maybe in different directions.

I’ve likened this notion to switching frames of reference, but I caution you from pushing that analogy too far. The transformation between one picture and the next is by quantum mechanical phase, not by some sort of transformation of frame of reference. Switching pictures is simply changing where time dependent phase is accumulated. As the Schrodinger picture places all this phase in the ket, Heisenberg picture places it all on the operator. Interaction picture splits the difference: the stationary phase is stuck to the operator while the time dependent phase is accumulated by the ket. In all three pictures, the same observables result (rather, the same expectation values) but the phases are broken up. Here is how the phases can be split inside a state ket.

I’ve written the state ket as a sum of eigenkets |n>. The time dependence from a time varying potential “V” is hidden in the eigenket coefficient while the stationary phase remains behind. The “n” index of the sum allows you to step through the entire Hilbert space of eigenkets without writing any but the one. Often, the coefficient Cn(t) is what we’re ultimately interested in, so it helps to remember that it has the following form when represented in bra-ket notation:

I’ve skipped ahead a little by writing that ket in the Interaction picture (these images were created for the NMR post that died, so they’re not quite in sequence now), but the effect is consistent. The usage of “1” here is just a way to move into a Hilbert space representation of eigenkets… with probability normalized eigenkets, “spanning the space” means that you can construct a linear projection operator that is the same as identity. The 1 = sum is all that says. This is just a way to write the coefficient above in a bra-ket form.
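In text form, a sketch of the pieces described in the last few paragraphs follows. This uses the standard textbook notation (as in Sakurai) rather than a transcription of the images: the label α names the state, the subscripts S and I mark the Schrodinger and Interaction pictures, and E_n is the eigenvalue belonging to eigenket |n>.

```latex
\begin{align*}
|\alpha, t\rangle_S &= \sum_n c_n(t)\,e^{-iE_n t/\hbar}\,|n\rangle
  &&\text{(stationary phase written out explicitly)} \\
|\alpha, t\rangle_I &= \sum_n c_n(t)\,|n\rangle
  &&\text{(stationary phase removed)} \\
1 &= \sum_n |n\rangle\langle n|
  &&\text{(eigenkets span the space)} \\
c_n(t) &= \langle n|\alpha, t\rangle_I
  &&\text{(coefficient in bra-ket form)}
\end{align*}
```

Inserting the identity on the third line in front of the interaction picture ket reproduces the second line, with the coefficient read off as on the fourth.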

The actual transformation to the Interaction picture is accomplished by canceling out the stationary phase…

By multiplying through with the conjugate of the stationary phase, only the time varying phase in the coefficient remains. This extra phase will then show up on operators translated into the interaction picture…

This takes the potential as it appears in the Schrodinger picture and converts it to a form consistent with the Interaction picture.
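As a sketch in the standard textbook notation (not a transcription of the images), the two transformations just described are written below. The subscripts S and I mark the Schrodinger and Interaction pictures, and the last line assumes the kets |n> and |m> are eigenkets of the static Hamiltonian with eigenvalues E_n and E_m.

```latex
\begin{align*}
|\alpha, t\rangle_I &= e^{+i\hat{H}_0 t/\hbar}\,|\alpha, t\rangle_S
  &&\text{(cancel the stationary phase on the ket)} \\
\hat{V}_I(t) &= e^{+i\hat{H}_0 t/\hbar}\;\hat{V}\;e^{-i\hat{H}_0 t/\hbar}
  &&\text{(the same phase reappears on the operator)} \\
\langle n|\hat{V}_I(t)|m\rangle &= e^{i(E_n - E_m)t/\hbar}\,\langle n|\hat{V}|m\rangle
  &&\text{(matrix elements between eigenkets)}
\end{align*}
```

The conjugate phases on either side of V are what guarantee that expectation values come out the same in both pictures.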

You can then start passing these relationships through the time dependent Schrodinger equation. One must only keep in mind that every derivative of time must be accounted for and that there are several…

(edit 5-22-18: The image right here contains a bit of wrong math, see the end of the post for a more comprehensive and correct version. I made a mistake and I won’t try to hide it: see if you can find it!;-)

This little bit of algebra creates a new form for the time dependent Schrodinger equation where the time dependence is only due to the time varying potential “V”. You can then basically just drop into a representation and use all the equalities I’ve justified above…

The last result here has eliminated all the ket notation and created a version of the time dependent Schrodinger equation where the differential equation is for the *coefficients* describing how much of each eigenket shows up in the state ket. The dot over the coefficient is a shorthand to mean “time derivative.”
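In the standard form given in Sakurai, the coefficient equation reads as below. The shorthand V_nm and omega_nm is the textbook's convention, assumed here to match the missing figure:

```latex
i\hbar\, \dot{c}_n(t) = \sum_m V_{nm}(t)\, e^{i\omega_{nm} t}\, c_m(t),
\qquad V_{nm}(t) = \langle n | V(t) | m \rangle ,
\quad \omega_{nm} = \frac{E_n - E_m}{\hbar}
```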

This form of the time dependent Schrodinger equation gives an interesting story. The interaction represented by the time dependent potential “V” scrambles eigenket m into eigenket n. As you might have guessed, this is one in the huge family of different equations related to the Schrodinger equation and this particular version has an apt use in describing interactions. Background quantum mechanical phase accumulated only by the forward passage of time is ignored in order to look at phase accumulated by an interaction.

I will ultimately use this to talk a bit more about the two state problem and NMR, as from the post that died. Much of this particular derivation appears in the Sakurai Quantum Mechanics text.

edit 5-22-18:

There is a quirk in this derivation for the interaction picture that continues to bother me. I didn’t really see it at first, but it bothers me having thought some time about it. The full Hamiltonian is defined to be some basic part plus some separable time-dependent potential. In the derivative that produces the evolution from the time-dependent potential, there is a basic assumption that this time-dependent potential does not contain time explicitly, meaning that no time derivative is taken on the potential. This seems like a self-contradiction to me: the potential is defined as time dependent, but must be the same form as the basic part of the Hamiltonian and not contain explicit time dependence in order for the derivation to work as shown above. I’m still thinking about it.

Here is a better version of the derivative that gives the time dependent Schrodinger form involving only the potential within the interaction picture:
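A sketch of how that derivative usually goes, in textbook notation (assumed, not copied from the original image). In this form the only time derivatives act on the Schrodinger-picture ket and on the exponential of H_0:

```latex
\begin{aligned}
i\hbar \frac{\partial}{\partial t} |\alpha,t\rangle_I
 &= i\hbar \frac{\partial}{\partial t}\left( e^{iH_0 t/\hbar} |\alpha,t\rangle_S \right) \\
 &= -H_0\, e^{iH_0 t/\hbar} |\alpha,t\rangle_S
    + e^{iH_0 t/\hbar} \left( H_0 + V(t) \right) |\alpha,t\rangle_S \\
 &= e^{iH_0 t/\hbar}\, V(t)\, e^{-iH_0 t/\hbar}\; e^{iH_0 t/\hbar} |\alpha,t\rangle_S \\
 &= V_I(t)\, |\alpha,t\rangle_I
\end{aligned}
```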

]]>You may have noticed that I posted an entry entitled “NMR and Spin flipping (part 2)” which has since disappeared. It turns out that wordpress doesn’t synch so well between its mobile app and its main page: I had an incomplete version of the NMR post on a mobile phone which I accidentally pushed to publish and over-wrote the completed post that I had finished several days before. Thank you wordpress for not synching properly! The incomplete version had none of the intended content. As I don’t feel like reconstructing a 5,000 word post right now, I thought I would scale back a bit and bite off a tiny chunk of the big subject of how magnets work. In part, I figure I can use some of what is derived here in the next version of the NMR post, which I intend to rewrite.

So, this will be the continuation of my series about magnets.

Reading through the initial magnets post, you will see that I did a rather spectacular amount of math, some of it unquestionably uncalled for. But, hey, the basic point of a blog is excess. One of the windfalls of all that math is an important theoretical construct which turns out to be one of the major contributors to the explanation of how magnets work.

What this has to do with a loop of wire, I’ll come back to…

When an exact answer is not available to a physics question, one of the go-to strategies used by physicists is series approximation. Often, the low orders of a series tend to contribute to solutions more strongly than the high orders, meaning that the first couple terms in an expansion can be good approximations. One such expansion is used in magnetism.

Recall the relation between the magnetic field and the magnetic vector potential:
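In SI units (an assumption, since the original image is not reproduced here), the standard relation is:

```latex
\mathbf{B}(\mathbf{r}) = \nabla \times \mathbf{A}(\mathbf{r}),
\qquad
\mathbf{A}(\mathbf{r}) = \frac{\mu_0}{4\pi} \int \frac{\mathbf{J}(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}\, d^3r'
```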

This expression is useful because the crazy vector junk is moved outside the integral. The magnetic potential is easier to work with than the magnetic field as a result. The expansion of interest is usually directed at the vector potential and is called the “multipole expansion.” There are many ways to run the multipole expansion, but maybe the easiest (for me) is to come back to our old friends the spherical harmonics Ylm.

In the vector potential of the magnetic field, that r-r’ factor in the denominator is really hard to work with. By itself, it is usually too complicated to integrate over. The multipole expansion lets us replace it with something that can be calculated. In this expansion, r is the location where we’re looking for the field while r’ is where the current which sources the field is located. The expansion is converting the difference in these (the propagator which pushes influence from the location of the current to the location of the field) into an infinite series of terms: in the sum, r< is whichever of the two distances is lesser, while r> is which of these two is greater. If you’re looking at a location inside the current distribution, r’ is bigger than r… but if you’re looking at a location outside of the current distribution, r is bigger than r’. The Ylms appear because space has a spherical polar geometry.
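For reference, the expansion in its usual textbook form (as it appears in Jackson; the notation is assumed to match the missing figure):

```latex
\frac{1}{|\mathbf{r} - \mathbf{r}'|}
 = 4\pi \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \frac{1}{2l+1}\,
   \frac{r_<^{\,l}}{r_>^{\,l+1}}\,
   Y_{lm}^{*}(\theta', \phi')\, Y_{lm}(\theta, \phi)
```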

The substitution changes the form of the vector potential:

The vector potential is now a sum of an infinite number of terms inside the integral. You still can’t just compute that because this sequence converges to 1/|r-r’|, which you can’t calculate by itself anyway. What you can do is introduce a cut-off. This is literally where the multipole terms all come from: instead of calculating the entire series all at once, you only calculate one term (or one level of terms, as the case may be). If you take l=0, you get the monopole term, if you take l=1, you get the dipole term, and so on and so forth for higher orders of l.
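To get a feel for what the cut-off does, here is a minimal numerical sketch. It uses the Legendre polynomial form of the same expansion rather than the Ylms, since summing the Ylms over m for fixed l collapses to a Legendre polynomial of the angle between r and r’. The function names and sample values are hypothetical, chosen only for illustration.

```python
import math


def legendre(l, x):
    """Legendre polynomial P_l(x) via the standard three-term recurrence."""
    p_prev, p = 1.0, x
    if l == 0:
        return p_prev
    for k in range(1, l):
        # (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p


def inverse_distance_truncated(r, r_prime, cos_gamma, l_max):
    """Partial sum of the multipole series for 1/|r - r'| up to l = l_max.

    cos_gamma is the cosine of the angle between the two position vectors.
    """
    r_less, r_greater = min(r, r_prime), max(r, r_prime)
    return sum(
        r_less**l / r_greater ** (l + 1) * legendre(l, cos_gamma)
        for l in range(l_max + 1)
    )


def inverse_distance_exact(r, r_prime, cos_gamma):
    """Direct evaluation of 1/|r - r'| from the law of cosines."""
    return 1.0 / math.sqrt(r * r + r_prime * r_prime - 2.0 * r * r_prime * cos_gamma)


if __name__ == "__main__":
    r, r_prime, cos_gamma = 3.0, 1.0, 0.5
    exact = inverse_distance_exact(r, r_prime, cos_gamma)
    for l_max in (0, 1, 2, 5, 10):
        approx = inverse_distance_truncated(r, r_prime, cos_gamma, l_max)
        print(l_max, approx, approx - exact)
```

Each extra order is suppressed by another factor of r&lt;/r&gt;, which is why a low cut-off already does well when the field point sits far from the source.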

Since I’m interested in magnetic dipoles right now, this is the crux: I’ve simply called the l=1 term “the dipole” by definition. Further, I care only about locations where I’m looking for the magnetic field well outside of the dipole, since I’m not going to look directly inside of the bar magnet to start with, so that r>r’. For the dipole, l=1 and I only care about m=-1,0 and 1 of the Ylms. This collapses the sum to just three terms.

If you’ve spent any time messing around with either E&M or quantum, you may remember those three Ylms off the top of your head. They’re basically just sines and cosines.

I will note, this whole expansion can be done in terms of Legendre polynomials too, but I remember the Ylms better. For some expedience, I will focus on the Ylm part of the integral in order to help bring it into a more manageable form before moving on.

There’s a lot of trig in here, but the final form is actually very much more manageable than where I started. I’ve highlighted the pattern in red and green. If you squint really really hard at this, you’ll realize that it’s a dot product of the cartesian form of the hatted unit vector r. So, it’s just a dot product of cartesian unit vectors…

This dials down to just a dot product of two unit vectors pointing in the directions toward either where the current is located or where the field is. I’ve installed it in the vector potential in the last line. I note explicitly that both of these are functions of the spherical polar angles since this will be important when I start working integrals.
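In symbols, the identity being used is the l=1 case of the spherical harmonic addition theorem. The notation is standard and assumed to match the missing figure; the last line is the dipole piece of the vector potential for r &gt; r’ in SI units:

```latex
\sum_{m=-1}^{1} Y_{1m}^{*}(\theta', \phi')\, Y_{1m}(\theta, \phi)
 = \frac{3}{4\pi} \left[ \cos\theta \cos\theta' + \sin\theta \sin\theta' \cos(\phi - \phi') \right]
 = \frac{3}{4\pi}\, \hat{\mathbf{r}} \cdot \hat{\mathbf{r}}'
\\[1ex]
\mathbf{A}_{l=1}(\mathbf{r}) = \frac{\mu_0}{4\pi}\, \frac{1}{r^2}
 \int r'\, \left( \hat{\mathbf{r}} \cdot \hat{\mathbf{r}}' \right) \mathbf{J}(\mathbf{r}')\, d^3r'
```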

If all things were equal, I could start doing calculus right now. Unfortunately, I don’t know the form of the current vector. That could be any distribution of currents imaginable and not all of them have pure dipole contributions. Working the problem as is, the set-up will respond to the dipole moment of whatever J-current I choose to install. You could imagine a case with a non-zero current where this particular integral goes to zero: if I did a line of current going in some constant direction, that would probably kill this integral. But, I do know of one current distribution in particular that has a very high dipole contribution… you might recognize this as post hoc reasoning, but I’m doing this to try to focus our attention on how one particular term in the multipole expansion behaves. The current distribution which is most interesting here is a loop of wire with an electric current circling it.

I’ve sketched out the current vector here as well as a set of axes showing the relationship between the spherical polar and cartesian coordinates where the unit vectors are all labeled. This vector current is just a current ‘I’ constrained to the X-Y plane, maintaining a loop around the origin at a radius of R. The current runs in a direction phi, which is tangential to the loop in a counterclockwise sense, and presumably has a positive current definition. The delta functions do the constraining to the X-Y plane. The factor of sine and radius in the denominator is a correction for use of the delta function in a spherical polar measure. The factor 2 is included to avoid a double-counting problem with a loop which shows up more explicitly, for example, in Jackson E&M, where the definition of the magnetic dipole moment is directly written with respect to the current vector. You’ll be happy to know that pretty much none of my work here actually follows Jackson, though the set-up is based strongly on the methods used in Jackson (I hated how Jackson set up his delta functions because I found them opaque as hell! But, that’s Jackson for you…)

The measure of integration is the typical spherical polar measure. You may remember my defining this in my post on the radial solution of the hydrogen atom. I’ll just quote it here. If you’ve done any vector calculus, it should be familiar anyway.

I can then put these all together in the vector potential, collect the terms and begin solving it.

In the third line, I pulled everything out front that I don’t need inside the integral. The radial portion of the integral collapses on the delta function. The angular portion is somewhat harder because it involves a couple unit vectors that vary with the angles; one of the unit vectors, the unprimed r, could actually be pulled outside the integral, but I left it in to help display a useful construct that will help me simplify the integral again. I will again focus on the vector portion inside this integral:

This use of the BAC-CAB rule allows me to change the unit vectors around into a cross product and flip the direction slightly. In the next step, by converting the theta unit vector into a cartesian form, the integral becomes trivial.

This solves the integral. Use of the delta function guts the theta coordinate and no remaining dependence exists for phi. After the hatted unit vectors are decoupled from the integration coordinates, the cross product gets pulled out front in an uncomplicated form. You can then collect and cancel in what remains:

Here, I’ve collected a particular quantity dependent on electric current running around in a loop which I have called a “magnetic dipole moment.” I conspired pretty strongly to get all the variable terms to pop out in a form that people will find familiar. A magnetic dipole is simply a loop, which can be of arbitrary shape, it turns out. This current loop is always right-hand defined, as above, to be “current x area” pointed in a direction normal to the area. This object could simply be a wire loop. At this point, you should be having images of stereotypical electromagnets which are many wire loops wrapped around some solid core. This electrical current configuration is very special because of the magnetic field that it tends to produce.
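For the circular loop of radius R in the X-Y plane, the standard result in SI units (assumed to match the missing figure) is:

```latex
\mathbf{m} = I \pi R^2\, \hat{\mathbf{z}},
\qquad
\mathbf{A}_{\mathrm{dip}}(\mathbf{r}) = \frac{\mu_0}{4\pi}\, \frac{\mathbf{m} \times \hat{\mathbf{r}}}{r^2}
 = \frac{\mu_0}{4\pi}\, \frac{I \pi R^2 \sin\theta}{r^2}\, \hat{\boldsymbol{\phi}}
```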

As an aside, I’ve seen dipole moment derived in a much more simplistic fashion than presented here, but my purpose was to be a bit more complete without actually duplicating Jackson… which I’ve mostly avoided, believe it or not… and to produce the form which can generate the whole dipole magnetic field, which can’t be done in the E&M 102 variety derivation. The simple derivation tends to operate on the axis of the magnetic dipole only, and does not calculate the shape of the field elsewhere in space. To get the whole field, you need to be a bit more sophisticated.

Magnetic field is produced by taking the curl of the vector potential, as I wrote far above. The fastest way I’ve found to take this curl is using the spherical polar definition of the curl, found here. You can derive this form of the curl in a manner very similar to what I did in my hydrogen atom radial equation post, but I’m going to hold off deriving it here: I’m somewhat short on time and I had hoped that this post wouldn’t get too very long.

My starting point here is to figure out how much of the curl I actually need. If you massage the terms inside the vector potential, you rapidly discover that only one of the three vector components is present, thus simplifying the curl. And, of course, to get to the magnetic field from here, I just need to take a curl…

The last thing I end up with here is an accepted form for the dipolar magnetic field:
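In SI units, with m the magnitude of the dipole moment pointing along z, the commonly quoted form (assumed here to be the one in the missing figure) is:

```latex
\mathbf{B}_{\mathrm{dip}}(\mathbf{r}) = \nabla \times \mathbf{A}_{\mathrm{dip}}
 = \frac{\mu_0\, m}{4\pi r^3} \left( 2\cos\theta\, \hat{\mathbf{r}} + \sin\theta\, \hat{\boldsymbol{\theta}} \right)
 = \frac{\mu_0}{4\pi r^3} \left[ 3(\mathbf{m} \cdot \hat{\mathbf{r}})\,\hat{\mathbf{r}} - \mathbf{m} \right]
```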

This is the exact field of an ideal, point-like dipole. For a real current loop, the solution depends on the assumption that the distance to the location where you’re examining the field is large compared to the size of the loop; for real physical dipoles of appreciable size, there can be other non-zero terms in the multipole expansion, meaning that the field will be predominantly what’s written here with some small deviations.

Admittedly, this mathematical equation doesn’t have a very intuitive form. Why in the world do I care about deriving *this* particular equation? To understand, we need some choice pictures…

edit 10-25-17: It seemed kind of ridiculous that I worked through all that math to find the dipole field and then stole other people’s diagrams of it. For completeness, here’s a vector plot of mine in Mathematica of the field equation written above:

Another magnetic dipole picture with the location of the dipole explicitly drawn in:

This image, where the dipole is rotated by 90 degrees from how I plotted it, is taken from wikipedia.

My interest in this field becomes more obvious when compared side-by-side with the magnetic field produced by a bar magnet…

This image is taken from how-things-work-science-projects.com. The field produced by a bar magnet is very similar in shape to the field produced by the loop of wire. Going further out, here is a diagram of the magnetic field produced by the Earth:

Notice some similarity? You’ll notice the Earth’s field lines are assigned to point oppositely from my diagram above, but that has to do with how compass needles orient rather than from any actual fundamental difference in the field!

Physical ferromagnets tend frequently to have dipolar magnetic fields. As such, the quantity of the magnetic dipole moment has huge physical importance. Granted, the field of the Earth isn’t perfectly dipolar, but it has an overwhelming dipole contribution. Other planets also have fields that are dipolar in shape.

Understanding how magnets work, compass needles, bar magnets and most sorts of permanent magnets, requires dipolar behavior as the underlying structure. Even the NMR post that got ruined was about a quantum mechanical phenomenon which revolves around magnetic dipoles.

This is a large step forward. I haven’t explained much, but I will write another post later showing why it is that magnets, particularly dipoles, respond to magnetic fields, as well as what the source of magnetism is in ferromagnets (no off-switch on the current for God’s sake!) Stay tuned for part 3!

edit 11-5-17

Playing around with matplotlib, I constructed a streamplot of the magnetic field produced by three dipoles, all flattened into the same plane and oriented facing different directions in that plane. This is all just superpositions using the field determined above. Kind of pretty…
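A minimal sketch of that kind of superposition, using only the standard library. The function names, dipole positions and orientations are all hypothetical, since the original plot’s parameters aren’t given, and the dipoles are assumed to be ideal point dipoles with their moments lying in the plotting plane.

```python
import math

MU0_OVER_4PI = 1e-7  # approximate SI value of mu_0 / (4 pi), in T*m/A


def dipole_field(point, position, moment):
    """In-plane B field at `point` from a point dipole at `position`.

    Uses B = (mu_0 / 4 pi r^3) [3 (m . r_hat) r_hat - m], with all vectors
    given as (x, y) tuples lying in the same plane as the dipole moment.
    """
    dx, dy = point[0] - position[0], point[1] - position[1]
    r = math.hypot(dx, dy)
    if r == 0.0:
        raise ValueError("field is singular at the dipole location")
    rx, ry = dx / r, dy / r
    m_dot_r = moment[0] * rx + moment[1] * ry
    scale = MU0_OVER_4PI / r**3
    return (
        scale * (3.0 * m_dot_r * rx - moment[0]),
        scale * (3.0 * m_dot_r * ry - moment[1]),
    )


def total_field(point, dipoles):
    """Superpose the fields of several (position, moment) dipoles."""
    bx = by = 0.0
    for position, moment in dipoles:
        fx, fy = dipole_field(point, position, moment)
        bx += fx
        by += fy
    return bx, by


if __name__ == "__main__":
    # Three hypothetical dipoles facing different directions in the plane.
    dipoles = [
        ((-1.0, 0.0), (0.0, 1.0)),
        ((1.0, 0.0), (1.0, 0.0)),
        ((0.0, 1.5), (-1.0, -1.0)),
    ]
    print(total_field((0.0, 0.0), dipoles))
```

Evaluating `total_field` over a grid gives the two component arrays that matplotlib’s `streamplot` expects.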

]]>The reason this puzzle is important to me is that many of my interests sort of straddle how to go from the angstrom scale to the nanometer scale. There is a cross-over where physics becomes chemistry, but chemists and physicists often look at things very differently. I was not directly trained as a P-chemist; I was trained separately as a Biochemist and a Physicist. Remarkably, the Venn diagrams describing the education for these pursuits only overlap slightly. When Biochemists and Molecular Biologists talk, the basic structures below that are frequently just assumed (the scale here is >1nm), while Physicists frequently tend to focus their efforts toward going more and more basic (the scale here is <1 Angstrom). This leads to a clear non-overlap in the scale where chemistry and P-chem are relevant (~1 angstrom). Quite ironically, the whole periodic table of the elements lies there. I have been through P-chem and I’ve gotten hit with it as a Chemist, but this is something of an inconvenient scale gap for me. So, a cat’s paw of mine has been understanding, and I mean *really* understanding, where quantum mechanics transitions to chemistry.

One place is understanding how to get from the eigenstates I know how to solve to the orbitals structuring the periodic table.

This assemblage is pure quantum mechanics. You learn a huge amount about this in your quantum class. But, there are some fine details which can be left on the counter.

One of those details for me was the discrepancy between the hydrogenic wave functions and the orbitals on the periodic table. If you aren’t paying attention, you may not even know that the s-, p-, d- orbitals are not all directly the hydrogenic eigenstates (or perhaps you were paying a bit closer attention in class than I was and didn’t miss when this detail was brought up). The discrepancy is a very subtle one because often times when you start looking for images of the orbitals, the sources tend to freely mix superpositions of eigenstates with direct eigenstates without telling why the mixtures were chosen…

For example, here are the S, P and D orbitals for the periodic table:

This image is from http://www.chemcomp.com. Focusing on the P row, how is it that these functions relate to the pure eigenstates? Recall the images that I posted previously of the P eigenstates:

In the image for the S, P and D orbitals, of the Px, Py and Pz orbitals, all three look like some variant of P210, which is the pure state on the left, rather than P21-1, which is the state on the right. In chemistry, you get the orbitals directly without really being told where they came from, while in physics, you get the eigenstates and are told somewhat abstractly that the s-, p-, d- orbitals are all superpositions of these eigenstates. I recall seeing a professor during an undergraduate quantum class briefly derive Px and Py, but I really didn’t understand why he selected the combinations he did! Rationally, it makes sense that Pz is identical to P210 and that Px and Py are superpositions that have the same probability distribution as Pz, but are rotated into the X-Y plane ninety degrees from one another. How do Px and Py arise from superpositions of P21-1 and P211? P21-1 and P211 have identical probability distributions despite having opposite angular momentum!

Admittedly, the intuitive rotations that produce Px and Py from Pz make sense at a qualitative level, but if you try to extend that qualitative understanding to the D-row, you’re going to fail. Four of the D orbitals look like rotations of one another, but one doesn’t. Why? And why are there four that look identical? I mean, there are only three spatial dimensions to fill, presumably. How do these five fit together three dimensionally?

Except for the Dz^2, none of the D-orbitals are pure eigenstates: they’re all superpositions. But what logic produces them? What is the common construction algorithm which unites the logic of the D-orbitals with that of the P-orbitals (which are all intuitive rotations)?

I’ll actually hold back on the math in this case because it turns out that there is a simple revelation which can give you the jump.

As it turns out, all of chemistry is dependent on angular momentum. When I say all, I really do mean it. The stability of chemical structures is dependent on cases where angular momentum has tended in some way to cancel out. Chemical reactivity in organic chemistry arises from valence choices that form bonds between atoms in order to “complete an octet,” which is short-hand for saying that species combine with each other in such a way that enough electrons are present to fill in or empty out the eight available valence slots, namely one S and three P orbitals holding two electrons each (roughly push the number of electrons orbiting one type of atom across the periodic table in its appropriate row to match the noble gases column). For example, in forming the salt crystal sodium chloride, sodium possesses only one electron in its valence shell while chlorine contains seven: if sodium gives up one electron, it goes to a state with no need to complete the octet (with the equivalent electronic completion of neon), while chlorine gaining an electron pushes it into a state that is electronically equal to argon, with eight valence electrons. From a physicist’s standpoint, this is called “angular momentum closure,” where the filled orbitals are sufficient to completely cancel out all angular momentum in that valence level. As another example, one highly reactive chemical structure you might have heard about is a “radical” or maybe a “free radical,” which is simply chemist shorthand for the situation a physicist would recognize as containing an electron with uncancelled spin and orbital angular momentum. Radical driven chemical reactions are about passing around this angular momentum! Overall, reactions tend to be driven to occur by the need to cancel out angular momentum. Atomic stoichiometry of a molecular species always revolves around angular momentum closure –you may not see it in basic chemistry, but this determines how many of each atom can be connected, in most cases.

From the physics, what can be known about an orbital is essentially the total angular momentum present and what amount of that angular momentum is in a particular direction, namely along the Z-axis. Angular momentum lost in the X-Y plane is, by definition, not in either the X or Y direction, but in some superposition of both. Without preparing a packet of angular momentum, the distribution ends up having to be uniform, meaning that it is in no particular direction except *not* in the Z-direction. For the P-orbitals, the eigenstates are purely either all angular momentum in the Z-direction, or none in that direction. For the D-orbitals, the states (of which there are five) can be combinations, two with angular momentum all along Z, two with half in the X-Y plane and half along Z and one with all in the X-Y plane.

What I’ve learned is that, for chemically relevant orbitals, the general rule is “minimal definite angular momentum.” What I mean by this is that you want to minimize situations where the orbital angular momentum is in a particular direction. The orbitals present on the periodic table are states which have canceled out angular momentum located along the Z-axis. This is somewhat obvious for the homology between P210 and Pz. P210 points all of its angular momentum perpendicular to the z-axis. It locates the electron on average somewhere along the Z-axis in a pair of lobes shaped like a peanut, but the orbital direction is undefined. You can’t tell how the electron goes around.

As it turns out, Px and Py can both be obtained by making simple superpositions of P21-1 and P211 that cancel out z-axis angular momentum… literally adding together these two states so that their angular momentum along the z-axis goes away. Px is the symmetric superposition while Py is the antisymmetric version. For the two states obtained by this method, if you look for the expectation value of the z-axis angular momentum, you’ll find it missing! It cancels to zero.

It’s as simple as that.

The D-orbitals all follow. D320 already has no angular momentum on the z-axis, so it is directly Dz^2. You therefore find four additional combinations by simply adding states that cancel the z-axis angular momentum: the symmetric and antisymmetric combinations of D321 and D32-1, and then the symmetric and antisymmetric combinations of D322 and D32-2.

Notice, all I’m doing to make any of these states is looking at the last index (the m-index) of the eigenstates and making a linear combination where the m-index of the first state plus the m-index of the second gives zero. 1-1=0, 2-2=0. That’s it. Admittedly, the symmetric combination sums these with a (+) sign and a 1/sqrt(2) weighting constant so that Px = (1/sqrt(2))(P211 + P21-1) is normalized and the antisymmetric combination sums with a (-) sign as in Py = (1/sqrt(2))(P211 – P21-1), but nothing more complicated than that! The D-orbitals can be generated in exactly the same manner. I found one easy reference online that loosely corroborated this observation, but phrased it instead as the periodic table orbitals all being written such that the wave functions have no complex parts… which is also kind of true, but somewhat misleading because you sometimes have to multiply by a complex phase to put it genuinely in the form of sines for the polar coordinate (and as the polar coordinate is integrated over 360 degrees, expectation values on this coordinate, as z-axis momentum would contain, cancel themselves out; sines and cosines integrated over a full period, or multiples of a full period, integrate to zero.)
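The cancellation is easy to check numerically. The sketch below represents a state only by its amplitudes on the m-index, with the shared n and l left implicit, and computes the expectation value of the z-axis angular momentum in units of hbar. The function names are hypothetical, and no attempt is made to model the overall phase convention that decides which combination gets labeled Px and which Py.

```python
import math


def lz_expectation(state):
    """Expectation value of L_z, in units of hbar.

    `state` maps each m-index to its (possibly complex) amplitude; since the
    eigenstates are orthonormal, <L_z> is the weighted average of m.
    """
    norm = sum(abs(c) ** 2 for c in state.values())
    return sum(m * abs(c) ** 2 for m, c in state.items()) / norm


def paired_combinations(m):
    """Symmetric and antisymmetric combinations of the +m and -m eigenstates."""
    a = 1.0 / math.sqrt(2.0)
    symmetric = {m: a, -m: a}
    antisymmetric = {m: a, -m: -a}
    return symmetric, antisymmetric


if __name__ == "__main__":
    for m in (1, 2):
        sym, anti = paired_combinations(m)
        print(m, lz_expectation(sym), lz_expectation(anti))
    print("pure m=1 eigenstate:", lz_expectation({1: 1.0}))
```

The equal weights on +m and -m are what make the average vanish; the relative sign between them changes the shape of the orbital but not this expectation value.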

Before I wrap up, I had a quick intent to touch on where S-, P-, D- and F- came from. “Why did they pick those damn letters?” I wondered one day. Why not A-, B-, C- and D-? The nomenclature emerged from how spectral lines appeared visually and groups were named: (S)harp, (P)rincipal, (D)iffuse and (F)undamental. (A second interesting bit of “why the hell???” nomenclature is the X-ray lines… you may hate this notation as much as me: K, L, M, N, O… “stupid machine uses the K-line… what does that mean?” These letters simply match the n quantum number –the energy level– as n=1,2,3,4,5… Carbon K-edge, for instance, is the amount of energy between the n=1 orbital level and the ionized continuum for a carbon atom.) The sharpness tends to reflect the complexity of the structure in these groups.

As a quick summary about structuring of the periodic table, S-, P-, D-, and F- group the vertical columns (while the horizontal rows are the associated relative energy, but not necessarily the n-number). The element is determined by the number of protons present in the nucleus, which creates the chemical character of the atom by requiring an equal number of electrons present to cancel out the total positive charge of the nucleus. Electrons, as fermions, are forced to occupy distinct orbital states, meaning that each electron has a distinct orbit from every other (fudging for the antisymmetry of the wave function containing them all). As electrons are added to cancel protons, they fall into the available orbitals depicted in the order on the periodic table going from left to right, which can be a little confusing because they don’t necessarily purely close one level of n before starting to fill S-orbitals of the next level of n; for example at n=3, l can equal 0, 1 and 2… but, the S-orbitals for n=4 will fill before D-orbitals for n=3 (which are found in row 4). This has purely to do with the S-orbitals having lower energy than P-orbitals which have lower energy than D-orbitals, but that the energy of an S-orbital for a higher n may have lower energy than the D-orbital for n-1, meaning that the levels fill by order of energy and not necessarily by order of angular momentum closure, even though angular momentum closure influences the chemistry. S-, P-, D-, and F- all have double degeneracy to contain up and down spin of each orbital, so that S- contains 2 instead of 1, P- contains 6 instead of 3, and D- contains 10 instead of 5. If you start to count, you’ll see that this produces the numerics of the periodic table.
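The counting can be sketched in a few lines. To order the subshells I’m borrowing the empirical Madelung (n+l) rule, which is not something derived above: it is a rule of thumb for the energy ordering with known exceptions (chromium and copper, for instance). The function names are hypothetical.

```python
LETTERS = "spdf"  # l = 0, 1, 2, 3


def capacity(l):
    """Electrons a subshell holds: (2l + 1) orbitals times 2 for spin."""
    return 2 * (2 * l + 1)


def filling_order(max_n):
    """Subshell labels ordered by the Madelung rule (an assumed rule of thumb).

    Lower n + l fills first; ties are broken by lower n. Only s, p, d and f
    are generated. Near the end of the list the order is incomplete, because
    subshells with n > max_n that would interleave are not included.
    """
    subshells = [
        (n, l)
        for n in range(1, max_n + 1)
        for l in range(min(n, len(LETTERS)))
    ]
    subshells.sort(key=lambda nl: (nl[0] + nl[1], nl[0]))
    return [f"{n}{LETTERS[l]}" for n, l in subshells]


if __name__ == "__main__":
    print(filling_order(7)[:8])
    print({LETTERS[l]: capacity(l) for l in range(4)})
```

The first eight entries show 4s landing ahead of 3d, and the capacities 2, 6, 10 and 14 give the widths of the S, P, D and F blocks.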

The periodic table is a fascinating construct: it contains a huge amount of quantum mechanical information which really doesn’t look much like quantum mechanics. And, everybody has seen the thing! An interesting test to see the depth of a conversation about the periodic table is to ask those conversing if they understand why the word “periodic” is used in the name “Periodic table of the elements.” The choice of that word is pure quantum mechanics.

]]>Powerball is actually quite intriguing to me. They have a website here which details by level all the winners across the whole country who have won a Powerball prize in any given drawing. You may have looked at this chart at some point while trying to figure out if your ticket won something useful. A part of what intrigues me about this chart is that it tells you in a given drawing exactly how much money was spent on Powerball and how many people bought tickets. How does it tell you this? Because probability is an incredibly reliable gauge of behavior with big sample sizes. And, Powerball quite willingly lays all the numbers out for you to do their bookkeeping for them by telling you exactly how many people won… particularly at the high-probability-to-win levels which push into the regime of Gaussian statistics. For big samples, like millions of people buying powerball tickets, where N=big, the errors on average values become relatively insignificant: the absolute error goes as sqrt(N), so the relative error shrinks as 1/sqrt(N). And, the probabilities reveal what those average values are.

The game is doubly intriguing to me because of the psychological component that drives it. As the pot becomes big, people’s willingness to play becomes big even though the probabilities never change. It suddenly leaps into the national consciousness every time the size of the pot becomes big and people play more aggressively as if they had a greater chance of winning said money. It is true that somebody ultimately walks away with the big pot, but what’s the likelihood that somebody is you?

But, as a starter, what are the probabilities that you win anything when you buy a ticket? To understand this, it helps to know how the game is set up.

As everybody knows, powerball is one of these games where they draw a bunch of little balls printed with numbers out of a machine with a spinning basket and you, as the player, simply match the numbers on your ticket to the numbers on the balls. If your ticket matches all the numbers, you win big! And, as an incentive to make people feel like they’re getting something out of playing, the powerball company awards various combinations of matching numbers and adds in multipliers which increase the size of the award if you do get any sort of match. You might only match a number or two, but they reward you a couple bucks for your effort. If you really want, you can pick the numbers yourself, but most people simply grab random numbers spat out of a computer… not like I’m telling you anything you don’t already know at this point.

One of the interesting qualities of the game is that the probabilities of prizes are very easy to adjust. The whole apparatus stays the same; they just add or subtract balls from the basket. In powerball, as currently run, there are two baskets: the first basket contains 69 balls while the second contains 26. Five balls are drawn from the first basket while only one, the Powerball, is drawn from the second. There is actually an entire record available of how the game has been run in the past, how many balls were in either the first or second baskets and when balls were added or subtracted from each. As the game has crossed state lines and the number of players has grown, the number of balls has also steadily swelled. I think the choice in numbering has been pretty careful to make the smallest prize attainably easy to get while pushing the chances for the grand prize to grow enticingly larger and larger. Prizes are mainly regulated by the presence of the Powerball: if your ticket manages to match the Powerball and nothing else, you win a small prize, no matter what. Prizes get bigger as a larger number of the other five balls are matched on your ticket.

The probabilities at a low level work almost exactly as you would expect: if there are 26 balls in the powerball basket, at any given drawing, you have 1 chance in 26 of matching the powerball. This means that you have 1 chance in 26 of winning some prize as determined by the presence of the powerball. There are also prizes for matching three or more balls drawn from the main basket without the powerball, which pushes the probability of winning *anything* to a slightly higher frequency than 1 in 26.

For the number savvy this begins to reveal the economics of powerball: an assured win by these means requires you to spend, on average, $52. That’s 26 tickets at $2 each, where you are likely to have *one* that matches the powerball. Note, the prize for matching that number is $4. Spending $52 to win back only $4 is a net loss of $48. But, this 26 ticket buy-in is actually hiding the fact that you have a small chance of matching some sequence of other numbers and obtaining a bigger prize… and it would certainly not be an economic loss if you matched the powerball and then the 5 other balls, yielding you a profit in the hundreds of millions of dollars (and this is usually what people tell themselves as they spend $2 for each number).

The probability to win the matched powerball prize *only*, that is to match just the powerball number, is actually somewhat worse than 1 in 26. The probability is attenuated by the requirement that you hit no matches on any other of the five possible numbers drawn.

Finding the actual probability is as follows: (1/26)*(64/69)*(63/68)*(62/67)*(61/66)*(60/65). If you multiply that out and invert it, you get 1 hit in 38.32 tries. The first number is, of course, the chances of hitting the powerball, while the other five are the chance of hitting numbers that aren’t picked… most of these probabilities are naturally quite close to 1, so you are likely to hit them, but they are probabilities that count toward hitting the powerball *only*.
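If you would rather not multiply that out by hand, here is a minimal sketch of the same product in Python. The function name and its arguments are made up for illustration; the ball counts are the ones quoted above, and exact fractions are used so no rounding creeps in.

```python
from fractions import Fraction

def powerball_only_probability(main_balls=69, drawn=5, power_balls=26):
    """Chance of matching the Powerball while missing all main-basket balls."""
    p = Fraction(1, power_balls)  # hit the Powerball
    for i in range(drawn):
        # each of your five numbers must be one of the balls that is NOT drawn:
        # 64/69, then 63/68, and so on
        p *= Fraction(main_balls - drawn - i, main_balls - i)
    return p

p_only = powerball_only_probability()
tries = float(1 / p_only)
print(f"1 hit in {tries:.2f} tries")
```

Inverting the exact fraction gives the same "1 in 38.32" figure used throughout.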

This number may not be that interesting to you, but lots of people play the game, and that means the number of tickets hitting just the powerball in any drawing is large enough to be very nearly Gaussian distributed about its expected value. This is useful to a physicist because it reveals something about the structure of the Powerball playing audience on any given week: that site I gave tells you how many people won with only the powerball, meaning that by multiplying that number by 38.32, you know how many tickets were purchased prior to the drawing in question. For example, as of the August 12, 2017 drawing, 1,176,672 numbers won the powerball-only prize, meaning that very nearly 38.32*1,176,672 numbers were purchased: ~45,090,071 numbers, give or take about 41,000 (the counting error on the winners, roughly the square root of 1,176,672, scaled up by 38.32; notice that the error here is well below 1%).

How many people are playing? If people mostly purchase maybe two or three numbers, around 15-20 million people played. Of course, I’m not accounting for the slavering masses who went whole hog and dropped $20 on numbers; if everybody did this, 4.5 million people played… truly, I can’t really know people’s purchasing habits for certain, but I can with certainty say that only a couple tens of millions of people played.

The number there reveals quite clearly the economics of the game for the period between the 8/12 drawing and the one a couple days prior: $90 million was spent on tickets! This is really quite easy arithmetic since it’s all in factors of 2 over the number of ticket numbers sold. If you look at the total prize pay-out, also on that page I provided, $19.4 million was won. This means that the Powerball company kept ~$70 million made over about three days, of which some got dumped into the grand prize and some went to whatever overhead they keep (I hear at least some of that extra is supposed to go into public works and maybe some also ends up in the Godfather’s pocket). Lucrative business.
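The bookkeeping in the last few paragraphs can be sketched in a few lines of Python. The function and variable names here are invented for illustration; the 38.32 odds, the $2 price, and the winner and payout figures are the ones quoted above.

```python
ODDS_POWERBALL_ONLY = 38.32  # tries per powerball-only hit, from the product above
TICKET_PRICE = 2.00          # dollars per number, multipliers ignored

def estimate_sales(powerball_only_winners, total_payout):
    """Back out tickets sold, dollars spent and dollars kept from winner counts."""
    tickets = powerball_only_winners * ODDS_POWERBALL_ONLY
    revenue = tickets * TICKET_PRICE
    retained = revenue - total_payout
    return tickets, revenue, retained

tickets, revenue, retained = estimate_sales(1_176_672, 19.4e6)
print(f"tickets sold ~ {tickets:,.0f}")
print(f"revenue      ~ ${revenue:,.0f}")
print(f"retained     ~ ${retained:,.0f}")
```

This is only as good as the assumption that the powerball-only winners land close to their expected count, which is reasonable when over a million of them turn up.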

If you look at the prize payouts for the game, most of the lower level prizes pay off between $4 and $7. You can’t get a prize that reaches $100 until you match at least 4 balls. Note, here, that the probability of matching 4 balls (including the powerball) is about 1 in 14,494. This means that, to assure yourself a prize of $100, you have to spend ~$29,000. You might argue that in 14,494 tickets, you’ll win a couple smaller prizes ($4 prizes are 1 in 38 and 1 in 91, and $7 prizes are 1 in 700 and 1 in 580) and maybe break even. Here’s the calculation for how much you’ll likely make for that buy-in: $4*(14,494*(1/38 + 1/91)) + $7*(14,494*(1/700 + 1/580))… I’ve rounded the probabilities a bit… =$2482.65. For $29,000 spent to assure a single $100 win, you can expect about $2,500 from lesser winnings, for a total loss of about $26,500. Notice, $4 back on $52 spent is about 8%, while $2,500 back on $29,000 is about 9%… the payoff does not improve at attainable levels! Granted, there’s a chance at a couple hundred million, but the probability of the bigger prize is still pretty well against you.
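The lesser-winnings sum above is easy to check. This is a small sketch using the same rounded odds quoted in the text; the variable names are just for illustration.

```python
TICKETS = 14_494  # numbers bought to expect one $100 prize

# (prize in dollars, rounded "1 in N" odds) for the four smallest prizes
SMALL_PRIZES = [(4, 38), (4, 91), (7, 700), (7, 580)]

# expected dollars back = prize * (expected number of wins at that tier)
expected_small_winnings = sum(prize * TICKETS / odds for prize, odds in SMALL_PRIZES)
print(f"expected lesser winnings: ${expected_small_winnings:,.2f}")
```

With the rounded odds this lands on the $2482.65 figure; using the unrounded odds would shift it by a few dollars at most.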

Suppose you are a big spender and you managed to rake up $29,000 *in cash* to dump into tickets, how likely is it that you will win just the $1 million prize? That’s five matched balls excluding the powerball. The probability is 1 in 11,688,053. By pushing the numbers, your odds of this prize have become 14,500/11,688,053, or about *1 chance in 800*. Your odds are substantially improved here, but 1 in 800 is still not a wonderful bet despite the fact that you assured yourself a fourth tier prize of $100! The grand prize is still a much harder bet with odds running at about 1 in 20,000, despite the amount you just dropped on it. Do you just happen to have $30,000 burning a hole in your pocket? Lucky you! Lots of people live on that salary for a year.

Most of this is simple arithmetic and I’ve been bandying about probabilities gleaned from the Powerball website. If you’re as curious about it as me, you might be wondering exactly how all those probabilities were calculated. I gave an example above of the mechanical calculation of the lowest level probability, but I also went and figured out a pair of formulae that calculate any of the powerball prize probabilities. It reminded me a bit of stat mech…
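Written out in LaTeX, a reconstruction of the two formulae and the tries relation from the description that follows would look something like this. The subscript labels and the exact ordering of the factors are assumptions; the symbols K, N, M, X, Y and Z are the ones defined in the next paragraph.

```latex
% matched Y main-basket balls, missed the Powerball
P_{\text{no PB}}(Y) = \frac{K-1}{K} \cdot \frac{X!}{Y!\,Z!}
    \cdot \prod_{n=0}^{Y-1} \frac{X-n}{N-n}
    \cdot \prod_{m=0}^{Z-1} \frac{M-m}{N-Y-m}

% matched Y main-basket balls and hit the Powerball
P_{\text{PB}}(Y) = \frac{1}{K} \cdot \frac{X!}{Y!\,Z!}
    \cdot \prod_{n=0}^{Y-1} \frac{X-n}{N-n}
    \cdot \prod_{m=0}^{Z-1} \frac{M-m}{N-Y-m}

% number of tries needed, on average, for one success
\text{tries} = \frac{1}{P}
```

As a sanity check, setting Y = 0 and Z = 5 in the second formula reproduces the product (1/26)*(64/69)*(63/68)*(62/67)*(61/66)*(60/65) worked out earlier.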

I’ve colored the main equations and annotated the parts to make them a little clearer. The final relation just shows how you can see the number of tries needed in order to hit one success, given a probability as calculated with the other two equations. The first equation differs from the second in that it refers to probabilities where you have matched numbers without managing to match the powerball, while the second is the complement, where you match numbers having hit the powerball. Between these two equations, you can calculate all the probabilities for the powerball prizes.

Since probabilities were always hard for me, I’ll try to explain the parts of these equations. If you’re not familiar with the factorial operation, this is what is denoted by the exclamation point “!” and it denotes a product string counting up from one to the number of the factorial… for example 5! means 1x2x3x4x5. The special case 0! should be read as 1.

The first part, in blue, is the probability relating to either hitting or missing the powerball, where K = 26, the number of balls in the powerball basket. The second part (purple) is the multiplicity and tells you how many ways that you can draw a certain number of matches (Y) to fill a number of open slots (X), while drawing a number of mismatches (Z) in the process, where X=Y+Z. In powerball, you draw five balls, so X=5 and Y is the number of matches (anywhere from 0 to 5), while Z is the number of misses. Multiplicity shows up in stat mech and is intimately related to entropy. The totals drawn (green) is perhaps mislabeled… here I’m referring to the number of possible choices in the main basket, N=69, and the number of those that will not be drawn, M = N - X, or 64. I should probably have called it “Main basket balls” or something. The last two parts determine the probabilities related to the given number of hits (Y) (orange) and the given number of misses (Z) (red) and I have applied the product operator to spiffy up the notation.

The product operator is another iterated operation much like the summation operator and means that you repeatedly multiply successive values, much like a factorial, but where the value you are multiplying is produced from a particular range and given a set form. In these, the small script m and n start at zero (my bad, this should be under the Pi) and iterate until they are just less than the number up top (stopping at Y - 1 or Z - 1, without ever equaling Y or Z). At the extreme cases of either all hits or all misses, the relevant product operator (either Miss or Hit respectively) must be set equal to one in order to not count it.
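For anyone who would rather let a computer grind through the products, here is a minimal Python sketch of the recipe just described. It is one reading of the verbal description, not a transcription of the colored equations, and the function names are made up. K, N, X, M, hits (Y) and misses (Z) follow the definitions above.

```python
from fractions import Fraction
from math import factorial

K = 26     # balls in the Powerball basket
N = 69     # balls in the main basket
X = 5      # balls drawn from the main basket
M = N - X  # main-basket balls that are not drawn

def main_basket_probability(hits):
    """Probability of exactly `hits` matches (Y) among the five main-basket balls."""
    misses = X - hits  # Z
    multiplicity = Fraction(factorial(X), factorial(hits) * factorial(misses))
    p = Fraction(1)
    for n in range(hits):    # "Hit" product; empty (equal to 1) when hits == 0
        p *= Fraction(X - n, N - n)
    for m in range(misses):  # "Miss" product; empty (equal to 1) when misses == 0
        p *= Fraction(M - m, N - hits - m)
    return multiplicity * p

def prize_probability(hits, powerball):
    """Combine the main-basket result with hitting or missing the Powerball."""
    pb = Fraction(1, K) if powerball else Fraction(K - 1, K)
    return pb * main_basket_probability(hits)

def tries_per_win(hits, powerball):
    """Average number of tries for one success: the inverse of the probability."""
    return float(1 / prize_probability(hits, powerball))

for hits, powerball in [(0, True), (3, True), (5, False), (5, True)]:
    print(hits, powerball, f"1 in {tries_per_win(hits, powerball):,.2f}")
```

The empty Python loops play the role of setting the Hit or Miss product to one at the extremes, and the six main-basket probabilities (0 through 5 hits) add up to exactly one, which is a handy check that nothing was double counted.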

This is one of those rare situations where the American public does a probability experiment with the values all well recorded where it’s possible to see the outcomes. How hard is it to win the grand prize? Well, the odds are one in 292 million. Consider that the population of the United States is 323 million. That means that if everybody in the United States bought one powerball number, about one person would win.

Only one.

Thanks to the power of the media, everybody has the opportunity to know that *somebody* won. Or not. That this person exists, nobody wants to doubt, but consider that the odds of winning are so scant that you not only won’t win, but you pretty likely will never meet anyone who did. Sort of surreal… everything is above board, you would think, but the rarity is so rare that there’s no assurance that it ever actually happens. You can suppose that maybe it does happen because people do win those dinky $4 prizes, but maybe this is just a red herring and nobody really actually wins! Those winner testimonials could be from actors!

Yeah, I’m not much of a conspiracy theorist, but it is true that a founding tenet of the idea of a ‘limit’ in math is that 99.99999% is effectively 100%. Going to the limit where the discrepancy is so small as to be infinitesimal is what calculus is all about. It is fair to say that it very nearly never happens! Everybody wants to be the one who beats the odds, which is why Powerball tickets are sold, but the extraordinarily vast majority never will win anything useful… I say “useful” because winning $4 or $7 is always a net loss. You have to win one of the top three prizes for it to be anywhere near worth anything, which you likely never will.

One final fairly interesting feature of the probability is that you can make some rough predictions about how frequently the grand prize is won based on how frequently the first prize is won. First prize is matching all five of the balls, but not the powerball. This frequency is about once per 12 million numbers, which is about 26 times more likely than all 5 plus the Powerball. In the report on winnings, a typical frequency is about 2 to 3 winners per drawing. About 1 time in 26 a person with all five manages to get the powerball too, so, with two drawings per week and about 2.5 first prize winners per drawing, that’s five winners per week… which implies that the grand prize should be won at a frequency of about once every five to six weeks, or every month and a half or so. The average here will have a very large standard deviation because the number of winners is small, meaning that the counting error is an appreciable portion of the measurement, which is why there is a great deal of variation in the period between grand prize wins. The incidence becomes much more Poissonian and stochastic, which allows some prizes to get quite big compared to others and causes their values to disperse across a fairly broad range. Uncertainty tends to dominate, making the game a bit more exciting.
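The back-of-envelope estimate above, plus a feel for how much the gap between jackpots can wander, can be sketched as follows. The function names are invented, and the second function assumes jackpots arrive at a constant average rate (a Poisson process), which is a simplification since ticket sales swell as the prize grows.

```python
import math

def weeks_between_jackpots(match5_per_drawing=2.5, drawings_per_week=2, ratio=26):
    """Rough mean gap between grand prizes, from the first-prize frequency."""
    match5_per_week = match5_per_drawing * drawings_per_week
    return ratio / match5_per_week

def chance_gap_exceeds(weeks, mean_gap):
    """Probability of waiting longer than `weeks`, ASSUMING a constant-rate Poisson process."""
    return math.exp(-weeks / mean_gap)

mean_gap = weeks_between_jackpots()
print(f"mean gap ~ {mean_gap:.1f} weeks")
print(f"chance of a gap longer than 10 weeks ~ {chance_gap_exceeds(10, mean_gap):.0%}")
```

Under that assumption a drought twice as long as the average is not rare at all, which is the large spread described above.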

While the grand prize is small, the number of people winning the first prize in a given week is small (maybe none or one), but this number grows in proportion to the size of the grand prize (maybe 5 or 6 or as high as 9). When the prize grows large enough to catch the public consciousness, the likelihood that somebody will win goes up simply because more people are playing it and this can be witnessed in the fluctuating frequency of the wins of lower level prizes. It breathes around the pulse of maybe 200 million dollars, lubbing at 40 million (maybe 0 to 1 person winning the first prize) and dubbing at 250 million (with 5 people or more winning the first prize).

Quite a story is told if you’re boring and as easily amused as me.

In my opinion, if you do feel inclined to play the game, be aware that when I say you probably won’t win, I mean that the numbers are so strongly against you that you do not appreciably improve your odds by throwing down $100 or even $1,000. The little $4 wins do happen, but they never pay and $1,000 spent will likely not get you more than $100 in total of winnings. It might as well be a voluntary tax. Cherish the dream your $2 buys, but do not stake your well-being on it. There’s nothing wrong with dreaming as long as you understand where to wake up.

(edit 8-24-17)

There was a grand prize winner last night (Wednesday 8-23-17). The outcomes are almost completely as should be expected: the winner is in Massachusetts… the majority of the country’s population is located in states on either the east or west coast, so this is unsurprising. There were 40 match 5 winners, so you would anticipate at least one to be a grand prize winner, which is exactly what happened (1 in 26 difference between 5 with powerball and 5 without). There were about 5.9 million powerball-only winners, so 38.32*5.9 is 226 million total powerball numbers sold in the run-up to last night’s drawing… with grand prize odds of 1 in 292 million, this is approaching parity. This means that more than $452 million was spent since Saturday on powerball lottery numbers (calculation excludes the extra dollar spent on multipliers). About five times as many ticket numbers were sold for this drawing as when I made my original analysis a week ago. With that many tickets sold, the chance of a winner *last night* was better than even (roughly 54%, if the numbers were picked independently at random). This is not to say there shouldn’t have been a winner before this (probability is a fickle mistress), but the numbers are such that it was becoming steadily less likely, though far from impossible, for the prize to keep growing. The last time the powerball was won was on 6-10-17, about two months and thirteen days ago… you can know that this is an unusually large jackpot because this period is longer than the usual period between wins (I had generously estimated 6 weeks based on the guess of 2 match 5 winners per drawing, but I think this might actually be a bit too high).
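To put a rough number on "approaching parity," here is a small sketch. It assumes every number sold is an independent random pick, which is only approximately true since people choose birthdays and quick-picks can repeat. The names are made up for illustration.

```python
import math

JACKPOT_ODDS = 292_201_338  # 26 * C(69, 5), the "1 in 292 million"

def jackpot_outlook(tickets_sold):
    """Expected jackpot winners and chance of at least one (Poisson approximation)."""
    expected_winners = tickets_sold / JACKPOT_ODDS
    p_at_least_one = 1.0 - math.exp(-expected_winners)
    return expected_winners, p_at_least_one

expected_winners, p_any = jackpot_outlook(226e6)
print(f"expected jackpot winners ~ {expected_winners:.2f}")
print(f"chance of at least one   ~ {p_any:.0%}")
```

With 226 million numbers in play the expected count is a bit under one winner, so a single winning number is right in line with the arithmetic.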

There was only one grand prize winning number out of 226 *million* tickets sold (not counting all the drawings that failed to yield a grand prize winner prior to this.) Think on that for a moment.

I admit that I had a great deal of trouble getting motivated to attack the Chapter 4 problems. When I saw the first aspects of symmetry in class, I just did not particularly understand it. Coming back to it on my own was not much better. Abstract symmetry is not easy to understand.

In Sakurai chapter 4, the text delves into a few different symmetries that are important to quantum mechanics and pretty much all of them are difficult to see at first. As it turns out, some of these symmetries are very powerful tools. For example, use of the reflection symmetry operation in a chiral molecule (like the C-alpha carbon of proteins or the hydrated carbons of sugars) can reveal neighboring degenerate ground states which can be accessed by racemization, where an atomic substituent of the molecule tunnels through the plane of the molecule and reverses the chirality of the state at some infrequent rate. Another example is the translation symmetry operation, where a lattice of identical attractive potentials serves to hide a near infinite number of identical states where a bound particle can hop from one minimum to the next and traverse the lattice… this behavior is essentially a specific model describing the passage of electrons through a crystalline semiconductor.

One of the harder symmetries was time reversal symmetry. I shouldn’t say “one of the harder;” for me time reversal was the hardest to understand and I would be hesitant to say that I completely understand it yet. The time reversal operator causes time to translate backward, making momenta and angular momenta reverse. Time reversal is really hard because the operator is anti-unitary, meaning that the operation complex-conjugates the quantities that it operates on (it switches the sign of their imaginary parts). Nevertheless, time reversal has some interesting outcomes. For instance, if a spinless particle is bound to a fixed center where the state in question is not degenerate (only one state at the given energy), time reversal says that the state can have no average angular momentum (it can’t be rotating or orbiting). On the other hand, if the particle has half-integer spin, the bound state must be degenerate because the particle can’t have no angular momentum!

A quick digression here for the laymen: in quantum mechanics, the word “degenerate” is used to refer to situations where multiple states lie on top of one another and are indistinguishable. Degeneracy is very important in quantum mechanics because certain situations contain only enough information to know an incomplete picture of the model where more information is needed to distinguish alternative answers… coexisting alternatives subsist in superposition, meaning that a wave function is in a superposition of its degenerate alternative outcomes if there is no way to distinguish among them. This is part of how entanglement arises: you can generate entanglement by creating a situation where discrete parts of the system simultaneously occupy degenerate states encompassing the *whole system*. The discrete parts become entangled.

Symmetry is important because it provides a powerful tool by which to break apart degeneracy. A set of degenerate states can often be distinguished from one another by exploiting the symmetries present in the system. L- and R- enantiomers in a molecule are related by a reflection symmetry at a stereo center, meaning that there are two states of indistinguishable energy that are reflections of one another. People don’t often notice it, but chemists are masters of quantum mechanics even though they typically don’t know as much of the math: how you build molecules is totally governed by quantum mechanics and chemists must understand the qualitative results of the physical models. I’ve seen chemists speak competently of symmetry transformations in places where the physicists sometimes have problems.

Another place where symmetry is important is in the search for new physics. The way to discover new physical phenomena is to look for observational results that break the expected symmetries of a given mathematical model. The LHC was built to explore symmetries. Currently known models are said to hold CPT symmetry, referring to Charge, Parity and Time Reversal symmetry… I admit that I don’t understand all the implications of this, but simply put, if you make an observation that violates CPT, you have discovered physics not accounted for by current models.

I held back talking about Parity in all this because I wanted to speak of it in greater detail. Of the symmetries covered in Sakurai chapter 4, I feel that I made the greatest jump in understanding on Parity.

Parity is symmetry under space inversion.

What?

Just saying that sounds diabolical. Space inversion. It sounds like that situation in Harry Potter where somebody screws up trying to disapparate and manages to get splinched… like they space invert themselves and can’t undo it.

The parity operation carries all the cartesian variables in a function to their negative values.
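In symbols, writing the parity operator as Φ to match the text that follows, the operation on a function of position would read as below. This is a reconstruction, and f is just a stand-in name for any function of the cartesian coordinates.

```latex
\Phi\, f(x, y, z) = f(-x, -y, -z)
```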

Here Phi just stands in for the parity operator. By performing the parity operation, all the variables in the function which denote spatial position are turned inside out and sent to their negative value. Things get splinched.

You might note here that applying parity twice gets you back to where you started, unsplinching the splinched. This shows that the parity operator has the special property that it is its own inverse operation. You might understand how special this is by noting that we can’t all literally be our own brother, but the parity operator basically is.

Applying parity twice is like multiplying by 1… which is how you know parity is its own inverse. This also makes parity a unitary operator since it doesn’t affect the absolute value of the function. Parity operation times inverse parity is one, so unitary.

or

Here, the daggered superscript means “Hermitian conjugate” (complex conjugate plus transpose), which is automatically the inverse operation if you’re a unitary operator. Hello linear algebra. Be assured I’m not about to break out the matrices, so have no fear. We will stay in a representation free zone. In this regard, the parity operation is very much like a rotation: the inverse operation is the Hermitian conjugate of the operation, never mind the detail that the inverse operation is also the operation itself.
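For reference, the statements above (parity is its own inverse, and parity is unitary) can be written compactly as follows, keeping Φ for the parity operator. The layout is a reconstruction rather than a copy of the original figures.

```latex
\Phi^{2} = 1 \quad\Longrightarrow\quad \Phi^{-1} = \Phi

\Phi\,\Phi^{-1} = 1 \qquad\text{or}\qquad \Phi\,\Phi^{\dagger} = 1
```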

Parity symmetry is “symmetry under the parity operation.” There are many states that are not symmetric under parity, but we would be interested in searching particularly for parity operation eigenstates, which are states that parity operator will transform to give back that state times some constant eigenvalue. As it turns out, the parity operator can only ever have two eigenvalues, which are +1 and -1. A parity eigenstate is a state that only changes its sign (or not) when acted on by the parity operator. The parity eigenvalue equations are therefore:
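A reconstruction in ket notation, with |+⟩ and |−⟩ as assumed labels for the even and odd eigenstates:

```latex
\Phi\,|{+}\rangle = +\,|{+}\rangle,
\qquad
\Phi\,|{-}\rangle = -\,|{-}\rangle
```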

All this says is that under space inversion, the parity eigenstates will either not be affected by the transformation, or will be negative of their original value. If the sign doesn’t change, the state is symmetric under space inversion (called even). But, if the sign does change, the state is antisymmetric under space inversion (called odd). As an example, in a space of one dimension (defined by ‘x’), the function sine is antisymmetric (odd) while the function cosine is symmetric (even).

In this image, taken from a graphing app on my smartphone, the white curve is plain old sine while the blue curve is the parity transformed sine. As mentioned, cosine does not change under parity.

As you may be aware, sines and cosines are energy eigenstates for the particle-in-the-box problem (with the box centered on the origin) and so would constitute one example of legit parity eigenstates with physical significance.

Operators can also be transformed by parity. In order to see the significance, you just note that the definition of parity is that the position operation is reversed. So, a parity transformation of the position operator is this:
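In operator form, and again using Φ for parity, that transformation would be written like this. It is a reconstruction; since Φ is its own Hermitian conjugate, the order of the dagger does not matter here.

```latex
\Phi^{\dagger}\,\hat{x}\,\Phi = -\hat{x}
```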

Kind of what should be expected. Position under parity turns negative.

As expressed, all of this is really academic. What’s the point?

Parity can give some insights that have deep significance. The deepest result that I understood is that matrix elements and expectation values are conserved under the parity transformation. Matrix elements are a generalization of the expectation value where the bra and ket do not necessarily belong to the same eigenfunction. The proof of the statement here is one line:
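A reconstruction of that one-line proof, inserting the identity Φ†Φ = 1 on either side of the operator and using tildes for the parity transformed quantities (the tilde notation and the ε labels for the parity eigenvalues are assumptions):

```latex
\langle m | V | n \rangle
  = \langle m |\, \Phi^{\dagger}\Phi \; V \; \Phi^{\dagger}\Phi \,| n \rangle
  = \big(\langle m |\, \Phi^{\dagger}\big)\,\big(\Phi\, V\, \Phi^{\dagger}\big)\,\big(\Phi\, | n \rangle\big)
  = \langle \tilde{m} | \tilde{V} | \tilde{n} \rangle

% if m, n and V each have definite parity, with eigenvalues \varepsilon = \pm 1:
\langle \tilde{m} | \tilde{V} | \tilde{n} \rangle
  = \varepsilon_m\, \varepsilon_V\, \varepsilon_n\, \langle m | V | n \rangle
```

For the two ends of the chain to agree with a non-zero matrix element, the product of the three eigenvalues has to be +1.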

At the end, the squiggles all denote parity transformed values, ‘m’ and ‘n’ are blanket eigenstates with arbitrary parity eigenvalues and V is some miscellaneous operator. First, the conjugation that turns a ket into a bra does not affect the parity eigenvalue equation, since parity is its own inverse operation and since the eigenvalues of 1 and -1 are not complex, so the bra above has just the same eigenvalue as if it were a ket. So, the matrix element does not change with the parity transformation: the combined parity transformation of all these parts is as if you just multiplied by identity a couple times, which should do nothing but return the original value.

What makes this important is that it sets a requirement on how many -1 eigenvalues can appear within the parity transformed matrix element (which is equal to the original matrix element): it must always be an even number (either zero or two). For the element to exist (that is, for it to have a non-zero value), if the initial and final states connected by the potential are *both* parity odd or parity even, the potential connecting them must be symmetric. Conversely, if the potential is parity odd, either the initial or final state must be odd, while the other is even. To sum up, a parity odd operator has non-zero matrix elements only when connecting states of differing parity while a parity even operator must connect states of the same parity. This restriction is observed simply by noting that the sign can’t change between a matrix element and the parity transformed matrix element.

Now, since an expectation value (average position, for example) is always a matrix element connecting a ket to itself, expectation values taken in a parity eigenstate can only be non-zero for operators of even parity. For example, in a system defined across all space, average position in a parity eigenstate ends up being zero because the position operator is odd, while both eigenbra and eigenket are of the same function, and therefore have the same parity. For average position to be non-zero, the wavefunction would need to be a superposition of eigenkets of opposite parity (and therefore not an eigenstate of parity at all!)
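This selection rule is easy to check numerically. The sketch below assumes a particle in a box of width L centered on the origin, takes the lowest cosine (even) and sine (odd) states, and integrates with a plain midpoint rule. All of the names are made up for illustration.

```python
import math

L = 1.0         # box width; the box is ASSUMED to span -L/2 .. +L/2
STEPS = 20_000  # midpoint-rule resolution

def psi_even(x):
    """Ground state of the centered box: a cosine, even under x -> -x."""
    return math.sqrt(2.0 / L) * math.cos(math.pi * x / L)

def psi_odd(x):
    """First excited state of the centered box: a sine, odd under x -> -x."""
    return math.sqrt(2.0 / L) * math.sin(2.0 * math.pi * x / L)

def psi_mixed(x):
    """Equal superposition of the two; not a parity eigenstate."""
    return (psi_even(x) + psi_odd(x)) / math.sqrt(2.0)

def position_matrix_element(bra, ket):
    """Midpoint-rule estimate of <bra| x |ket> for real wavefunctions."""
    dx = L / STEPS
    total = 0.0
    for i in range(STEPS):
        x = -L / 2.0 + (i + 0.5) * dx
        total += bra(x) * x * ket(x) * dx
    return total

print("<even|x|even>  =", position_matrix_element(psi_even, psi_even))
print("<odd|x|odd>    =", position_matrix_element(psi_odd, psi_odd))
print("<even|x|odd>   =", position_matrix_element(psi_even, psi_odd))
print("<mixed|x|mixed> =", position_matrix_element(psi_mixed, psi_mixed))
```

The two same-parity expectation values come out as zero to within rounding, while the odd position operator happily connects the even state to the odd one, and the superposition picks up a non-zero average position as a result.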

A tangible, far reaching result of this symmetry, related particularly to the position operator, is that no state of definite parity can have an electric dipole moment. The dipole moment operator is built around the position operator, so a situation where the position expectation value goes to zero will require the dipole moment to be zero also. Any observed electric dipole moment must come from a mixture of states of opposite parity.

If you stop and think about that, that’s really pretty amazing. It tells you whether an observable is zero or not depending on which eigenkets are present and whether the operator for that observable can be inverted or not.

Hopefully I got that all correct. If anybody more sophisticated than me sees holes in my statement, please speak up!

Welcome to symmetry.

(For the few people who may have noticed, I still have it in mind to write more about the magnets puzzle, but I really haven’t had time recently. Magnets are difficult.)

Know your meme.

It’s been a while since this became a thing, but I think it’s actually a really good question. Truly, the original meme exploded from an unlikely source who wanted to relish in appreciating those things that seem magical without really appreciating how mind-bending and thought-expanding the explanation to this seemingly earnest question actually is.

As I got on in this writing, I realized that the scope of the topic is bigger than can be tackled in a single post. What is presented here will only be the first part (though I haven’t yet had a chance to write later parts!) The succeeding posts may end up being as mathematical as this, but perhaps less so. Moreover, as I got to writing, I realized that I haven’t posted a good bit of math here in a while: what good is the mathematical poetry of physics if nobody sees it?

Magnets do not get less magical when you understand how they work: they get more compelling.

This image, taken from a website that sells quackery, highlights the intriguing properties of magnets. A solid object with apparently no moving parts has this manner of influencing the world around it. How can that not be magical? Lodestones have been magic forever and they do not get less magical with the explanation.

Truthfully, I’ve been thinking about the question of how they work for a couple days now. When I started out, I realized that I couldn’t just answer this out of hand, even though I would like to think that I’ve got a working understanding of magnetic fields –this is actually significant to me because the typical response to the Insane Clown Posse’s somewhat vacuous pondering is not really as simple as “Well, duh, magnetic fields you dope!” Someone really can explain how magnets work, but the explanation is really not trivial. That I got to a level in asking how they work where I said, “Well, um, I don’t really know this,” got my attention. How the details fit together gets deep in a hurry. What makes a bar magnet like the one in the picture above special? You don’t put batteries in it. You don’t flick a switch. It just works.

For most every person, that pattern above is the depth of how it works. How does it work? Well, it has a magnetic field. And, everybody has played with magnets at some point, so we sort of all know what they do, if not how they do it.

In this picture from penguin labs, these magnets are exerting sufficient force on one another that many of them apparently defy gravity. Here, the rod simply keeps the magnets confined so that they can’t change orientations with respect to one another and they exert sufficient repulsive force to climb up the rod as if they have no weight.

It’s definitely cool, no denying. There is definitely a quality to this that is magical and awe inspiring.

But, is it better knowing how they work, or just blindly appreciating them because it’s too hard to fill in the blank?

The central feature of how magnets work is quite effortlessly explained by the physics of Electromagnetism. Or, maybe it’s better to say that the details are laboriously and completely explained. People rebel against how hard it is to understand the details, but no true explanation is required to be easily explicable.

The forces which hold those little pieces of metal apart are relatively understandable.

Here’s the Lorentz force law. It says that the force (F) on an object with a charge is equal to the sum of the electric force on the object (qE) plus the magnetic force (qv × B). Magnets interact solely by magnetic force, the second term.
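Written out in standard vector form (SI units), the law reads:

```latex
\vec{F} = q\left(\vec{E} + \vec{v} \times \vec{B}\right)
```

The cross product in the second term is what makes the magnetic force point at right angles to both the velocity and the field.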

In this picture from Wikipedia, if a charge (q) moving with speed (v) passes into a region containing this thing we call a “magnetic field,” it will tend to curve in its trajectory depending on whether the charge is negative or positive. We can ‘see’ this magnetic field thing in the image above with the bar magnet and iron filings. What is it, how is it produced?

The fundamental observation of magnetic fields is tied up into a phenomenological equation called the Biot-Savart law.

This equation is immediately intimidating. I’ve written it in all of its horrifying Jacksonian glory. You can read this equation like a sentence. It says that all the magnetic field (B) you can find at a location in space (r) is proportional to a sum of all the electric currents (J) at all possible locations where you can find any current (r’) and inversely proportional to the square of the distance between where you’re looking for the magnetic field and where all the electrical currents are (it may say ‘inverse cube’ in the equation, but it’s actually an inverse square since there’s a full power of length in the numerator). Yikes, what a sentence! Additionally, the equation says that the direction of the magnetic field is at right angles to both the direction that the current is traveling and the direction given by the line between where you’re looking for magnetic field and where the current is located. These directions are all wrapped up in the arrow scripts on every quantity in the equation and are determined by the cross-product as denoted by the ‘x’. The difference between the two ‘r’ vectors in the numerator creates a pure direction between the location of a particular current element and where you’re looking for magnetic field. The ‘d’ at the end is the differential volume that confines the electric currents and simply means that you’re adding up locations in 3D space. The scaling constants outside the integral sign are geometrical and control strength; the 4 and Pi relate to the dimensionality of the field source radiated out into a full solid angle (it covers a singularity in the field due to the location of the field source) and the ‘μ’ essentially tells how space broadcasts magnetic field… where the constant ‘μ’ is closely tied to the speed of light. This equation has the structure of a propagator: it takes an electric current located at r’ and propagates it into a field at r.
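For reference, the Biot-Savart law in the volume-current form found in Jackson (SI units), matching the pieces described above:

```latex
\vec{B}(\vec{r}) = \frac{\mu_0}{4\pi}
    \int \frac{\vec{J}(\vec{r}\,') \times \left(\vec{r} - \vec{r}\,'\right)}
              {\left|\vec{r} - \vec{r}\,'\right|^{3}} \; d^{3}r'
```

The cube in the denominator against the single power of length in the numerator is the disguised inverse square mentioned above.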

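It may also be confusing to you that I’m calling current ‘J’ when nearly every basic physics class calls it ‘I’… well, get used to it. The ‘current vector’ J is a subtle variation of current: it is the current density, meaning current per unit cross-sectional area with a direction attached, so that adding it up over a wire’s cross section gives back the familiar ‘I’.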

I looked for some diagrams to help depict Biot-Savart’s components, but I wasn’t satisfied with what Google coughed up. Here’s a rendering of my own with all the important vectors labeled.

Now, I showed the crazy Biot-Savart equation, but I can tell you right now that it is a pain in the ass to work with. Very few people wake up in the morning and say “Boy oh boy, Biot-Savart for me today!” For most physics students this equation comes with a note of dread. Directly using it to analytically calculate magnetic fields is not easy. That cross product and all the crazy vectors pointing in every which direction make this equation a monster. There are some basic features here which are common to many fields, particularly the inverse square, which you can find in the Newtonian gravity formula or Coulomb’s law for electrostatics, and the field being proportional to some source, in this case an electric current, where gravity has mass and electrostatics has charge.
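If the analytic route is painful, a computer will happily do the sum by brute force. Here is a minimal sketch, not anything from Jackson: it assumes a thin straight wire on the z axis (so that J d³r’ collapses to I dz’ along z), SI units, and a plain midpoint rule. The function name `wire_field` and its parameters are my own inventions for illustration. The result can be compared against the textbook long-wire formula μ0 I / (2π d).

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability in T*m/A (conventional SI value)


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def wire_field(r, current=1.0, half_length=50.0, n=100_000):
    """Midpoint-rule Biot-Savart sum for a thin wire on the z axis.

    Assumption: the wire runs from z = -half_length to z = +half_length
    and carries `current` amperes in the +z direction, so the volume
    element J d^3r' becomes I dz' z_hat.
    """
    dz = 2.0 * half_length / n
    total = [0.0, 0.0, 0.0]
    for k in range(n):
        z_src = -half_length + (k + 0.5) * dz
        sep = (r[0], r[1], r[2] - z_src)  # the vector r - r'
        dist_cubed = (sep[0] ** 2 + sep[1] ** 2 + sep[2] ** 2) ** 1.5
        piece = cross((0.0, 0.0, current * dz), sep)  # I dz' z_hat x (r - r')
        for i in range(3):
            total[i] += piece[i] / dist_cubed
    return tuple(MU0 / (4.0 * math.pi) * t for t in total)


if __name__ == "__main__":
    d = 0.1  # distance from the wire in meters
    print("numerical B (T):", wire_field((d, 0.0, 0.0)))
    print("mu0 I / (2 pi d):", MU0 * 1.0 / (2.0 * math.pi * d))
```

Looking from a point on the x axis, the only surviving component is along y: the field goes around the wire rather than toward or away from it, which is the cross product at work.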

Magnetic field becomes extraordinary because of that flipping (God damned, effing…) cross product, which means that it points in counter-intuitive directions. With electrostatics and gravity, the field usually points toward or away from the source, while in magnetism the field seems to go ‘around’ the source. Moreover, unlike electrostatics and gravity, the source isn’t exactly a something, like a charge or a mass; it’s dynamic… as in a change in state: electric charges are present in a current, but if those charges are sitting stationary, even though they are still present, they can’t produce a magnetic field. Moreover, if you neutralize the charge, a magnetic field can still be present if those now invisible charges are moving to produce a current: current flowing in a copper wire is electric charges moving along the wire, and this produces a magnetic field around the wire, but the positive charges fixed to the metal atoms of the wire neutralize the negative charges of the moving electrons, leaving the wire with no net charge. So, no electrostatic field, even though you have a magnetic field. It might surprise you to know that neutron stars have powerful magnetic fields, even though they are made mostly of electrically neutral neutrons, which would seem to leave little to carry an ordinary electric current. The requirement for moving charges to produce a magnetic field is consistent with the moving charge required to feel force from a magnetic field as well. Admittedly, there’s more to it than just ‘currents’ but I’ll get to that in another post.

With a little bit of algebraic shenanigans, Biot-Savart can be twisted around into a slightly more tractable form called Ampere’s Law, which is one of the four Maxwell’s equations that define electromagnetism. I had originally not intended to show this derivation, but I had a change of heart when I realized that I’d forgotten the details myself. So, I worked through them again just to see that I could. Keep in mind that this is really just a speed bump along the direction toward learning how magnets work.

For your viewing pleasure, the derivation of the Maxwell-Ampere law from the Biot-Savart equation.

In starting to set up for this, there are a couple fairly useful vector identities.

This trio contains several basic differential identities which can be very useful in this particular derivation. Here, the variables r are actually vectors in three dimensions. For those of you who don’t know these things, all it means is this:
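In Cartesian components (which I am assuming, to match the hatted x, y and z notation used in this post), the two position vectors can be written out as:

```latex
\vec{r} = x\,\hat{x} + y\,\hat{y} + z\,\hat{z},
\qquad
\vec{r}\,' = x'\,\hat{x} + y'\,\hat{y} + z'\,\hat{z}
```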

These can be diagrammed like this:

This little diagram just treats the origin like the corner of a 3D box and each distance is a length along one of the three edges emanating from the corner.

I’ll try not to get too far afield with this quick vector tutorial, but it helps to understand that this is just a way to wrap up a 3D representation inside a simple symbol. The hatted symbols of x,y and z are all unit vectors that point in the relevant three dimensional directions where the un-hatted symbols just mean a variable distance along x or y or z. The prime (r’) means that the coordinate is used to tell where the electric current is located while the unprime (r) means that this is the coordinate for the magnetic field. The upside down triangle is an operator called ‘del’… you may know it from my hydrogen wave function post. What I’m doing here is quite similar to what I did over there before. For the uninitiated, here are gradient, divergence and curl:

Gradient works on a scalar function to produce a vector, divergence works on a vector to produce a scalar function and curl works on a vector to produce a vector. I will assume that the reader can take derivatives and not go any further back than this. The operations on the right of the equal sign are wrapped up inside the symbols on the left.
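For reference, these are the three operations in Cartesian coordinates, with f standing in for any scalar function and F for any vector function (both are just placeholder names):

```latex
\begin{align*}
\nabla f &= \hat{x}\,\frac{\partial f}{\partial x}
          + \hat{y}\,\frac{\partial f}{\partial y}
          + \hat{z}\,\frac{\partial f}{\partial z} \\
\nabla \cdot \vec{F} &= \frac{\partial F_x}{\partial x}
          + \frac{\partial F_y}{\partial y}
          + \frac{\partial F_z}{\partial z} \\
\nabla \times \vec{F} &=
   \hat{x}\left(\frac{\partial F_z}{\partial y} - \frac{\partial F_y}{\partial z}\right)
 + \hat{y}\left(\frac{\partial F_x}{\partial z} - \frac{\partial F_z}{\partial x}\right)
 + \hat{z}\left(\frac{\partial F_y}{\partial x} - \frac{\partial F_x}{\partial y}\right)
\end{align*}
```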

One final useful bit of notation here is the length operation. The length operation just finds the length of a vector and is denoted by vertical bars, like an absolute value. Everywhere I’ve used it, I’ve been applying it to a vector obtained by finding the distance between where two different vectors point:
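Spelled out in Cartesian components (again my assumed coordinate system), this is just the Pythagorean theorem in three dimensions:

```latex
|\vec{r} - \vec{r}\,'| = \sqrt{(x - x')^{2} + (y - y')^{2} + (z - z')^{2}}
```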

As you can see, notation is all about compressing operations away until they are very compact. The equations I’ve used to this point all contain a great deal of math lying underneath what is written, but you can muddle through by the examples here.

Getting back to my identity trio:
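Written in LaTeX, my best reading of the trio, going by how each identity is described and used in this post, is the following. The labels I1 to I3 are the ones used throughout; the signs are the standard ones for these identities.

```latex
\begin{align*}
\text{(I1)}\quad & \frac{\vec{r} - \vec{r}\,'}{|\vec{r} - \vec{r}\,'|^{3}}
    = -\nabla\!\left(\frac{1}{|\vec{r} - \vec{r}\,'|}\right) \\
\text{(I2)}\quad & \nabla^{2}\!\left(\frac{1}{|\vec{r} - \vec{r}\,'|}\right)
    = -4\pi\,\delta(\vec{r} - \vec{r}\,') \\
\text{(I3)}\quad & \nabla\!\left(\frac{1}{|\vec{r} - \vec{r}\,'|}\right)
    = -\nabla'\!\left(\frac{1}{|\vec{r} - \vec{r}\,'|}\right)
\end{align*}
```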

The first identity here (I1) takes the vector object written on the left and produces a gradient from it… the thing in the denominator of that function is the length of the difference between those two vectors, which is simply a scalar number without a direction, as shown in the length operation written above.

The second identity (I2) here takes the divergence of the gradient and reveals that it’s the same thing as a Dirac delta, up to a constant factor (incredibly easy way to kill an integral!). I’ve not written the operation as divergence on a gradient, but instead wrapped it up in the ‘square’ on the del… you can know it’s a divergence of a gradient because the function inside the parentheses is a scalar, meaning that the first operation has to be a gradient, which produces a vector, which automatically necessitates the second operation to be a divergence, since that only works on vectors to produce scalars.

The third identity (I3) shows that the gradient with respect to the unprimed vector coordinates is actually equal to a negative sign times the gradient with respect to the primed coordinates… which is a very easy way to switch from a derivative with respect to the first r to the same form of derivative with respect to the second r’.

To be clear, these identities are tailor-made to this problem (and similar electrodynamics problems) and you probably will never ever see them anywhere but the *cough cough* Jackson book. The first identity can be proven by working the gradient operation and taking derivatives. The second identity can be proven by using the vector divergence theorem in a spherical polar coordinate system and is the source of the 4*Pi that you see everywhere in electromagnetism. The third identity can also be proven by the same method as the first.

There are two additional helpful vector identities that I used which I produced in the process of working this derivation. I will create them here because, why not! If the math scares you, you’re on the wrong blog. To produce these identities, I used the component decomposition of the cross product and a useful Levi-Civita Kronecker delta identity (I’m really bad at remembering vector identities, so I put a great deal of effort into learning how to construct them myself: my Levi-Civita is ghetto, but it works well enough). For those of you who don’t know the ol’ Levi-Civita symbol, it’s a pretty nice tool for constructing things in a component-wise fashion: ε_{ijk} . To make this work, you just have to remember it as I just wrote it… if any indices are equal, the symbol is zero; if they are all different, it is 1 or -1. If you take it as ijk, with the indices all different as I wrote, it equals 1 and becomes -1 if you swap two of the indices: ijk=1, jik=-1, jki=1, kji=-1 and so on and so forth. Here are the useful Levi-Civita identities as they relate to cross product:
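If you would rather trust a computer than my index gymnastics, here is a small sketch that checks the two standard facts I lean on: the cross product written as (A×B)_i = ε_ijk A_j B_k, and the contraction ε_ijk ε_ilm = δ_jl δ_km − δ_jm δ_kl. The function names are hypothetical, and the indices run over 0, 1, 2 instead of x, y, z because that is how Python counts.

```python
def levi_civita(i, j, k):
    """+1 for even permutations of (0, 1, 2), -1 for odd ones, 0 on any repeat."""
    return (i - j) * (j - k) * (k - i) // 2


def kronecker(i, j):
    """Kronecker delta: 1 when the indices match, 0 otherwise."""
    return 1 if i == j else 0


def cross_via_epsilon(a, b):
    """Cross product built component-wise from the Levi-Civita symbol."""
    return [sum(levi_civita(i, j, k) * a[j] * b[k]
                for j in range(3) for k in range(3))
            for i in range(3)]


def epsilon_delta_identity_holds():
    """Check eps_ijk eps_ilm == d_jl d_km - d_jm d_kl for every j, k, l, m."""
    for j in range(3):
        for k in range(3):
            for l in range(3):
                for m in range(3):
                    lhs = sum(levi_civita(i, j, k) * levi_civita(i, l, m)
                              for i in range(3))
                    rhs = (kronecker(j, l) * kronecker(k, m)
                           - kronecker(j, m) * kronecker(k, l))
                    if lhs != rhs:
                        return False
    return True


if __name__ == "__main__":
    print(cross_via_epsilon([1, 0, 0], [0, 1, 0]))
    print(epsilon_delta_identity_holds())
```

The brute-force loop covers all 81 index combinations, which is exactly the bookkeeping the Kronecker deltas do for you on paper.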

Using these small tools, the first vector identity that I need is a curl of a curl. I derive it here:

Let’s see how this works. I’ve used colors to show the major substitutions and tried to draw arrows where they belong. If you follow the math, you’ll note that the Kronecker deltas have the intriguing property of trading out indices in these sums. Kronecker delta works on a finite sum the same way a Dirac delta works on an integral, which is nothing more than an infinite sum. Also, the index convention says that if you see duplicated indices, but without an explicit sum on that index, you associate a sum with that index… this is how I located the divergences in that last step. This identity is a soft stopping point for the double curl: I could have used the derivative product rule to expand it further, but that isn’t needed (if you want to see it get really complex, go ahead and try it! It’s do-able.) One will note that I have double del applied on a vector here… I said above that it only applies on scalars… in this form, it acts on the scalar portion of each vector component, meaning that you end up with a sum of three terms multiplied by unit vectors! Double del only ever acts on scalars, but you actually don’t need to know that in the derivation below.

This first vector identity I’ve produced I’ll call I4:
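In LaTeX, the standard double-curl result that this derivation arrives at reads as follows, with a compressed version of the index steps (summation over repeated indices implied). I am using A as the placeholder vector, as in the derivation, and \partial_j as shorthand for the derivative along the j-th coordinate.

```latex
\begin{align*}
\left[\nabla \times (\nabla \times \vec{A})\right]_i
  &= \epsilon_{ijk}\,\partial_j \left(\epsilon_{klm}\,\partial_l A_m\right)
   = \epsilon_{kij}\,\epsilon_{klm}\,\partial_j \partial_l A_m \\
  &= \left(\delta_{il}\,\delta_{jm} - \delta_{im}\,\delta_{jl}\right)\partial_j \partial_l A_m
   = \partial_i \left(\partial_m A_m\right) - \partial_j \partial_j A_i \\[4pt]
\text{(I4)}\quad
\nabla \times (\nabla \times \vec{A})
  &= \nabla\left(\nabla \cdot \vec{A}\right) - \nabla^{2}\vec{A}
\end{align*}
```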

Here’s a second useful identity that I’ll need to develop:

This identity I’ll call I5:

*Pant Pant* I’ve collected all the identities I need to make this work. If you don’t immediately know something off the top of your head, you can develop the pieces you need. I will use I1, I2, I3, I4 and I5 together to derive the Maxwell-Ampere Law from Biot-Savart. Most of the following derivation comes from Jackson’s *Classical Electrodynamics*, with a few small embellishments of my own.

In this first line of the derivation, I’ve rewritten Biot-Savart with the constants outside the integral and everything variable inside. Inside the integral, I’ve split the meat so that the different vector and scalar elements are clear. In what follows, it’s very important to remember that unprimed del operators are in a different space from the primed del operators: a value (like J) that is dependent on the primed position variable is essentially a constant with respect to the unprimed operator and will render a zero in a derivative by the unprimed del. Moreover, unprimed del can be moved into or out of the integral, which is with respect to the primed position coordinates. This observation is profoundly important to this derivation.

The usage of the first two identities here manages to extract the cross product from the midst of the function and puts it into a manipulable position where the del is unprimed while the integral is primed, letting me move it out of the integrand if I want.

This intermediate contains another very important magnetic quantity in the form of the vector potential (A) –“A” here not to be confused with the alphabetical placeholder I used while deriving my vector identities. I may come back to vector potential later, but this is simply an interesting stop-over for now. From here, we press on toward the Maxwell-Ampere law by acting in from the left with a curl onto the magnetic field…
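As a sketch of where things stand at this stop-over (SI units assumed): I1 turns the inverse-square factor into a gradient, and since J depends only on the primed coordinate, the unprimed del can be pulled out front as a curl of the whole integral.

```latex
\begin{align*}
\vec{B}(\vec{r})
  &= \frac{\mu_0}{4\pi} \int \nabla\!\left(\frac{1}{|\vec{r} - \vec{r}\,'|}\right)
      \times \vec{J}(\vec{r}\,') \, d^{3}r'
   = \nabla \times \left[ \frac{\mu_0}{4\pi}
      \int \frac{\vec{J}(\vec{r}\,')}{|\vec{r} - \vec{r}\,'|} \, d^{3}r' \right] \\
\vec{B} &= \nabla \times \vec{A},
\qquad
\vec{A}(\vec{r}) = \frac{\mu_0}{4\pi}
      \int \frac{\vec{J}(\vec{r}\,')}{|\vec{r} - \vec{r}\,'|} \, d^{3}r'
\end{align*}
```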

The Dirac delta I end with in the final term allows me to collapse r’ into r at the expense of that last integral. At this point, I’ve actually produced the magnetostatic Ampere’s law if I feel like claiming that the current has no divergence, but I will talk about this later…
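A hedged reconstruction of this step in LaTeX, assuming SI units: the double curl (I4) splits the curl of B into two pieces, and the Dirac delta from I2 collapses the second piece onto the current at r.

```latex
\begin{align*}
\nabla \times \vec{B}
  &= \frac{\mu_0}{4\pi}\,\nabla \int \vec{J}(\vec{r}\,') \cdot
       \nabla\!\left(\frac{1}{|\vec{r} - \vec{r}\,'|}\right) d^{3}r'
   - \frac{\mu_0}{4\pi} \int \vec{J}(\vec{r}\,')\,
       \nabla^{2}\!\left(\frac{1}{|\vec{r} - \vec{r}\,'|}\right) d^{3}r' \\
  &= \frac{\mu_0}{4\pi}\,\nabla \int \vec{J}(\vec{r}\,') \cdot
       \nabla\!\left(\frac{1}{|\vec{r} - \vec{r}\,'|}\right) d^{3}r'
   + \mu_0 \int \vec{J}(\vec{r}\,')\,\delta(\vec{r} - \vec{r}\,')\, d^{3}r' \\
  &= \frac{\mu_0}{4\pi}\,\nabla \int \vec{J}(\vec{r}\,') \cdot
       \nabla\!\left(\frac{1}{|\vec{r} - \vec{r}\,'|}\right) d^{3}r'
   + \mu_0\,\vec{J}(\vec{r})
\end{align*}
```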

This substitution switches del from being unprimed to primed, putting it in the same terms as the current vector J. I use integration by parts next to switch which element of the first term the primed del is acting on.

Were I being really careful about how I depicted the integration by parts, there would be a unit vector dotted into the J in order to turn it into a scalar sum in that first term ahead of the integral… this is a little sloppy on my part, but nobody ever cares about *that* term anyway because it’s presupposed to vanish at the limits where it’s being evaluated. This is a physicist trick similar to pulling a rug over a mess on the floor –I’ve seen it performed in many contexts.
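For the record, here is how I would write the swap and the integration by parts on the remaining integral, with the surface term shown explicitly before it gets swept under the rug. The symbol \hat{n} for the outward unit normal and dS' for the surface element are my own notation for this sketch.

```latex
\begin{align*}
\int \vec{J}(\vec{r}\,') \cdot \nabla\!\left(\frac{1}{|\vec{r} - \vec{r}\,'|}\right) d^{3}r'
  &= -\int \vec{J}(\vec{r}\,') \cdot \nabla'\!\left(\frac{1}{|\vec{r} - \vec{r}\,'|}\right) d^{3}r' \\
  &= -\oint \frac{\vec{J}(\vec{r}\,') \cdot \hat{n}}{|\vec{r} - \vec{r}\,'|}\, dS'
     + \int \frac{\nabla' \cdot \vec{J}(\vec{r}\,')}{|\vec{r} - \vec{r}\,'|}\, d^{3}r'
\end{align*}
```

The surface integral is the term that is presupposed to vanish, which holds when the currents are confined to a finite region and the surface is pushed out beyond them.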

This substitution is not one of the mathematical identities I created above, this is purely physics. In this case, I’ve used conservation of charge to connect the divergence of the current vector to the change in charge density over time. If you don’t recognize the epic nature of this particular substitution, take my word for it… I’ve essentially inverted magnetostatics into electrodynamics, assuring that a ‘current’ is actually a form of moving charge.
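Conservation of charge, written as the continuity equation in the primed coordinates, looks like this (ρ is the charge density; the explicit time argument is my addition for clarity):

```latex
\nabla' \cdot \vec{J}(\vec{r}\,', t) = -\frac{\partial \rho(\vec{r}\,', t)}{\partial t}
```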

In this line, I’ve switched the order of the derivatives again. Nothing in the integral is dependent on time except the charge density, so almost everything can pass through the derivative with respect to time. On the other hand, only the distance is dependent on the unprimed r, meaning that the unprimed del can pass inward through everything in the opposite direction.

At this point something amazing has emerged from the math. Pardon the pun; I’m feeling punchy. The quantity I’ve highlighted blue is a form of Coulomb’s law! If that name doesn’t tickle you at the base of your spine, what you’re looking at is the electrostatic version of the Biot-Savart law, which makes electric fields from electric charges. This is one of the reasons I like this derivation and why I decided to go ahead and detail the whole thing. This shows explicitly a connection between magnetism and electrostatics where such connection was not previously clear.
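To make the identification explicit, here is a sketch of that step in SI units, where ε0 (the vacuum permittivity) is a constant I am introducing to turn the integral into the standard form of Coulomb’s law. The unprimed gradient is moved inside the integral and I1 is used to bring back the inverse-square factor.

```latex
\begin{align*}
-\frac{\mu_0}{4\pi}\,\frac{\partial}{\partial t}
   \int \rho(\vec{r}\,', t)\,\nabla\!\left(\frac{1}{|\vec{r} - \vec{r}\,'|}\right) d^{3}r'
  &= \mu_0 \epsilon_0\,\frac{\partial}{\partial t}
     \left[ \frac{1}{4\pi\epsilon_0}
       \int \rho(\vec{r}\,', t)\,
       \frac{\vec{r} - \vec{r}\,'}{|\vec{r} - \vec{r}\,'|^{3}}\, d^{3}r' \right] \\
  &= \mu_0 \epsilon_0\,\frac{\partial \vec{E}(\vec{r}, t)}{\partial t}
\end{align*}
```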

And thus ends the derivation. In this casting, the curl of the magnetic field is dependent both on the electric field and on currents. If there is no time varying electric field, that first term vanishes and you get the plain old magnetostatic Ampere’s law:
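In SI units (my assumption for how the constants are written), the full Maxwell-Ampere law and its static limit read:

```latex
\begin{align*}
\nabla \times \vec{B} &= \mu_0 \epsilon_0\,\frac{\partial \vec{E}}{\partial t} + \mu_0\,\vec{J}
  && \text{(Maxwell-Ampere)} \\
\nabla \times \vec{B} &= \mu_0\,\vec{J}
  && \text{(magnetostatic Ampere)}
\end{align*}
```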

This says simply that the curl of the magnetic field is proportional to the current density at the same point. There are some interesting qualities to this equation because of how the derivation leaves only a single positional dependence. As you can see, there is no separate position coordinate to describe magnetic field independently from its source. And, really, it isn’t describing the magnetic field as ‘generated’ by the current, but rather that a deformation to the linearity of the magnetic field is due to the presence of a current at that location… which is an interesting way to relate the two.

This relationship tends to cause magnetic lines to orbit around the current vector.

This image from hyperphysics sums up the whole situation (I realize I’ve been saying something similar from way up, but this equation is proof). If you have current passing along a wire, magnetic field will tend to wrap around the wire in a right handed sense. For all intents and purposes, this is all that Ampere’s law says, neglecting that you can manipulate the geometry of the situation to make the field do some interesting things. But, this is all.

Well, so what? I did a lot of math. What, if anything, have I gained from it? How does this help me along the path to understanding magnets?

The Ampere Law is useful in generating very simple magnetic field configurations that can be used in the Lorentz force law, ultimately showing a direct dynamical connection between moving currents and magnetic fields. I have it in mind to show a freshman level example of how this is done in the next part of this series. Given the length of this post, I will do more math in a different post.

This is a big step in the direction of learning how magnets work, but it should leave you feeling a little unsatisfied. How exactly do the forces work? In physics, it is widely known that magnetic fields do no work, so why is it that bar magnets can drag each other across the counter? That sure looks like work to me! And if electric currents are necessary to drive magnets, why is it that bar magnets and horseshoe magnets don’t require batteries? Where are the electric currents that animate a bar magnet and how is it that they seem to be unlimited or unpowered? These questions remain to be addressed.

Until the next post…

]]>