You are viewing the html version of General Relativity, by Benjamin Crowell. This version is only designed for casual browsing, and may have some formatting problems. For serious reading, you want the Adobe Acrobat version.
General relativity describes gravitation as a curvature of spacetime, with matter acting as the source of the curvature in the same way that electric charge acts as the source of electric fields. Our goal is to arrive at Einstein's field equations, which relate the local intrinsic curvature to the locally ambient matter in the same way that Gauss's law relates the local divergence of the electric field to the charge density. The locality of the equations is necessary because relativity has no action at a distance; cause and effect propagate at a maximum velocity of \(c(=1)\).
The hard part is arriving at the right way of defining curvature. We've already seen that it can be tricky to distinguish intrinsic curvature, which is real, from extrinsic curvature, which can never produce observable effects. E.g., example 4 on page 95 showed that spheres have intrinsic curvature, while cylinders do not. The manifestly intrinsic tensor notation protects us from being misled in this respect. If we can formulate a definition of curvature entirely in terms of tensors written without reference to any preordained coordinate system, then we know it is physically observable, and not just a superficial feature of a particular model.
As an example, drop two rocks side by side, b. Their trajectories are vertical, but on a \((t,x)\) coordinate plot rendered in the Earth's frame of reference, they appear as parallel parabolas. The curvature of these parabolas is extrinsic. The Earth-fixed frame of reference is defined by an observer who is subject to non-gravitational forces, and is therefore not a valid Lorentz frame. In a free-falling Lorentz frame \((t',x')\), the two rocks are either motionless or moving at constant velocity in straight lines. We can therefore see that the curvature of world-lines in a particular coordinate system is not an intrinsic measure of curvature; it can arise simply from the choice of the coordinate system. Intrinsic curvature would be indicated, for example, by geodesics that were initially parallel and later converged or diverged.
Nor is the metric a measure of intrinsic curvature. In example 19 on page 140, we found the metric for an accelerated observer to be

\[ g'_{t't'}=-(1+ax')^2,\qquad g'_{x'x'}=1, \]

where \(a\) is the ship's acceleration and the primes indicate the accelerated observer's frame. The fact that the timelike element is not equal to \(-1\) is not an indication of intrinsic curvature. It arises only from the choice of the coordinates \((t',x')\) defined by a frame tied to the accelerating rocket ship.
The fact that the above metric has nonvanishing derivatives, unlike a constant Lorentz metric, does indicate the presence of a gravitational field. However, a gravitational field is not the same thing as intrinsic curvature. The gravitational field seen by an observer aboard the ship is, by the equivalence principle, indistinguishable from an acceleration, and indeed the Lorentzian observer in the earth's frame does describe it as arising from the ship's acceleration, not from a gravitational field permeating all of space. Both observers must agree that “I got plenty of nothin',” i.e., that the region of the universe to which they have access lacks any stars, neutrinos, or clouds of dust. The observer aboard the ship must describe the gravitational field he detects as arising from some source very far away, perhaps a hypothetical vast sheet of lead lying billions of light-years aft of the ship's deckplates. Such a hypothesis is fine, but it is unrelated to the structure of our hoped-for field equation, which is to be local in nature.
Not only does the metric tensor not represent the gravitational field, but no tensor can represent it. By the equivalence principle, any gravitational field seen by observer A can be eliminated by switching to the frame of a free-falling observer B who is instantaneously at rest with respect to A at a certain time. The structure of the tensor transformation law guarantees that A and B will agree on whether a given tensor is zero at the point in spacetime where they pass by one another. Since they agree on all tensors, and disagree on the gravitational field, the gravitational field cannot be a tensor.
We therefore conclude that a nonzero intrinsic curvature of the type that is to be included in the Einstein field equations is not encoded in any simple way in the metric or its first derivatives. Since neither the metric nor its first derivatives indicate curvature, we can reasonably conjecture that the curvature might be encoded in its second derivatives.
A further complication is the need to distinguish tidal curvature from curvature caused by local sources. Figure a shows Comet Shoemaker-Levy 9, broken up into a string of fragments by Jupiter's tidal forces shortly before its spectacular impact with the planet in 1994. Immediately after each fracture, the newly separated chunks had almost zero velocity relative to one another, so once the comet finished breaking up, the fragments' world-lines were a sheaf of nearly parallel lines separated by spatial distances of only \(\sim1\) km. These initially parallel geodesics then diverged, eventually fanning out to span millions of kilometers.
If initially parallel lines lose their parallelism, that is clearly an indication of intrinsic curvature. We call it a measure of sectional curvature, because the loss of parallelism occurs within a particular plane, in this case the \((t,x)\) plane represented by figure b.
But this curvature was not caused by a local source lurking in among the fragments. It was caused by a distant source: Jupiter. We therefore see that the mere presence of sectional curvature is not enough to demonstrate the existence of local sources. Even the sign of the sectional curvature is not a reliable indication. Although this example showed a divergence of initially parallel geodesics, referred to as a negative curvature, it is also possible for tidal forces exerted by distant masses to create positive curvature. For example, the ocean tides on earth oscillate both above and below mean sea level, c.
As an example that really would indicate the presence of a local source, we could release a cloud of test masses at rest in a spherical shell around the earth, and allow them to drop, d. We would then have positive and equal sectional curvature in the \(t-x\), \(t-y\), and \(t-z\) planes. Such an observation cannot be due to a distant mass. It demonstrates an over-all contraction of the volume of an initially parallel sheaf of geodesics, which can never be induced by tidal forces. The earth's oceans, for example, do not change their total volume due to the tides, and this would be true even if the oceans were a gas rather than an incompressible fluid. It is a unique property of \(1/r^2\) forces such as gravity that they conserve volume in this way; this is essentially a restatement of Gauss's law in a vacuum.
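As a Newtonian sketch of this distinction (an analogy, not the relativistic calculation), we can compute the tidal tensor \(E_{ij}=\partial_i\partial_j\Phi\) by finite differences, in units with \(G=1\). Nearby test masses separated by \(\delta x_j\) have relative acceleration \(-E_{ij}\delta x_j\), so the trace of \(E\) tells us whether a small cloud of them begins to shrink in volume. Outside a point mass the trace comes out zero: stretching along the radial line, squeezing across it, and no net change in volume. Inside a ball of uniform density, used here as an assumed stand-in for a local source, all three diagonal elements are equal and positive (convergence in every direction), and the trace is \(4\pi G\rho\). The helper names are illustrative.

```python
import math

def hessian(phi, p, h=1e-3):
    """3x3 matrix of second partial derivatives of phi at point p (central differences)."""
    def at(*shifts):
        # shifts: (axis, offset) pairs applied to a copy of p
        q = list(p)
        for axis, offset in shifts:
            q[axis] += offset
        return phi(q)

    H = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        H[i][i] = (at((i, h)) - 2 * phi(p) + at((i, -h))) / h**2
        for j in range(i + 1, 3):
            H[i][j] = H[j][i] = (at((i, h), (j, h)) - at((i, h), (j, -h))
                                 - at((i, -h), (j, h)) + at((i, -h), (j, -h))) / (4 * h**2)
    return H

def point_mass_potential(p, M=1.0):
    """Newtonian potential -M/r of a point mass at the origin (G = 1)."""
    return -M / math.sqrt(sum(x * x for x in p))

def uniform_ball_potential(p, M=1.0, R=1.0):
    """Potential inside a uniform ball of mass M and radius R, valid for r < R (G = 1)."""
    r2 = sum(x * x for x in p)
    return -M * (3 * R**2 - r2) / (2 * R**3)

def trace(H):
    return H[0][0] + H[1][1] + H[2][2]

outside = hessian(point_mass_potential, [2.0, 0.0, 0.0])
inside = hessian(uniform_ball_potential, [0.5, 0.0, 0.0])
print("outside diag:", [f"{outside[i][i]:+.4f}" for i in range(3)], "trace:", f"{trace(outside):.1e}")
print("inside diag: ", [f"{inside[i][i]:+.4f}" for i in range(3)], "trace:", f"{trace(inside):.4f}")
```

For the ball with \(M=R=1\), the trace \(3GM/R^3\) is the same as \(4\pi G\rho\), since \(\rho=3M/4\pi R^3\).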
In general, the curvature of spacetime will contain contributions from both tidal forces and local sources, superimposed on one another. To develop the right formulation for the Einstein field equations, we need to eliminate the tidal part. Roughly speaking, we will do this by averaging the sectional curvature over all three of the planes \(t-x\), \(t-y\), and \(t-z\), giving a measure of curvature called the Ricci curvature. The qualifier “roughly speaking” is needed because such a prescription would treat the time and space coordinates in an extremely asymmetric manner, which would violate local Lorentz invariance.
To get an idea of how this would work, let's compare with the Newtonian case, where there really is an asymmetry between the treatment of time and space. In the Cartan curved-spacetime theory of Newtonian gravity (page 41), the field equation has a kind of scalar Ricci curvature on one side, and on the other side is the density of mass, which is also a scalar. In relativity, however, the source term in the equation clearly cannot be the scalar mass density. We know that mass and energy are equivalent in relativity, so for example the curvature of spacetime around the earth depends not just on the mass of its atoms but also on all the other forms of energy it contains, such as thermal energy and electromagnetic and nuclear binding energy. Can the source term in the Einstein field equations therefore be the mass-energy \(E\)? No, because \(E\) is merely the timelike component of a particle's momentum four-vector. To single it out would violate Lorentz invariance just as much as an asymmetric treatment of time and space in constructing a Ricci measure of curvature. To get a properly Lorentz invariant theory, we need to find a way to formulate everything in terms of tensor equations that make no explicit reference to coordinates. The proper generalization of the Newtonian mass density in relativity is the stress-energy tensor \(T^{ij}\), whose 16 elements measure the local density of mass-energy and momentum, and also the rate of transport of these quantities in various directions. If we happen to be able to find a frame of reference in which the local matter is all at rest, then \(T^{tt}\) represents the mass density. The reason for the word “stress” in the name is that, for example, the flux of \(x\)-momentum in the \(x\) direction is a measure of pressure.
For the purposes of the present discussion, it's not necessary to introduce the explicit definition of \(T\); the point is merely that we should expect the Einstein field equations to be tensor equations, which tells us that the definition of curvature we're seeking clearly has to be a rank-2 tensor, not a scalar. The implications in four-dimensional spacetime are fairly complex. We'll end up with a rank-4 tensor that measures the sectional curvature, and a rank-2 Ricci tensor derived from it that averages away the tidal effects. The Einstein field equations then relate the Ricci tensor to the stress-energy tensor in a certain way. The stress-energy tensor is discussed further in section 8.1.2 on page 269.
Since the curvature tensors in 3+1 dimensions are complicated, let's start by considering lower dimensions. In one dimension, a, there is no such thing as intrinsic curvature. This is because curvature describes the failure of parallelism to behave as in E5, but there is no notion of parallelism in one dimension.
The lowest interesting dimension is therefore two, and this case was studied by Carl Friedrich Gauss in the early nineteenth century. Gauss ran a geodetic survey of the state of Hanover, inventing an optical surveying instrument called a heliotrope that in effect was used to cover the Earth's surface with a triangular mesh of light rays. If one of the mesh points lies, for example, at the peak of a mountain, then the sum \(\Sigma\theta\) of the angles meeting at that vertex will be less than \(2\pi\), in contradiction to Euclid. Although the light rays do travel through the air above the dirt, we can think of them as approximations to geodesics painted directly on the dirt, which would be intrinsic rather than extrinsic. For such geodesics, the angular defect around a vertex vanishes, because the space is locally Euclidean, but we now pick up a different kind of angular defect: the interior angles of a triangle no longer add up to the Euclidean value of \(\pi\).
In d/1, the survey is extrinsic, because the lines pass below the surface of the sphere. The curvature is detectable because the angles at each vertex add up to \(120+120+110=350\) degrees, giving an angular defect of 10 degrees.
In d/2, the lines have been projected to form arcs of great circles on the surface of the sphere. Because the space is locally Euclidean, the sum of the angles at a vertex has its Euclidean value of 360 degrees. The curvature can be detected, however, because the sum of the internal angles of a polygon is greater than the Euclidean value. For example, each spherical hexagon gives a sum of \(6\times 124.31\) degrees, rather than the Euclidean \(6\times120\). The angular defect of \(6\times4.31\) degrees is an intrinsic measure of curvature.
This example suggests another way of measuring intrinsic curvature, in terms of the ratio \(C/r\) of the circumference of a circle to its radius. In Euclidean geometry, this ratio equals \(2\pi\). Let \(\rho\) be the radius of the Earth, and consider the equator to be a circle centered on the north pole, so that its radius is the length of one of the sides of the triangle in figure e, \(r=(\pi/2)\rho\). (Don't confuse \(r\), which is intrinsic, with \(\rho\), the radius of the sphere, which is extrinsic and not equal to \(r\).) Then the ratio \(C/r\) is equal to 4, which is smaller than the Euclidean value of \(2\pi\).
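A quick numerical check of these numbers, assuming only that a circle of intrinsic radius \(r\) on a sphere of radius \(\rho\) lies at angular distance \(r/\rho\) from its center, so that \(C=2\pi\rho\sin(r/\rho)\). This treats the full sphere, with no antipodal identification. The ratio \(C/r\) approaches \(2\pi\) for small circles and falls to 4 at the equator. The function name is illustrative.

```python
import math

def circumference_ratio(r, rho):
    """C/r for a circle of intrinsic radius r on a sphere of radius rho (0 < r < pi*rho).
    The circle lies at angular distance r/rho from its centre, so C = 2*pi*rho*sin(r/rho)."""
    return 2 * math.pi * rho * math.sin(r / rho) / r

rho = 6371.0   # the Earth's mean radius in km; any positive value gives the same ratios
for r in (1.0, 1000.0, 5000.0, math.pi / 2 * rho):
    print(f"r = {r:8.1f} km   C/r = {circumference_ratio(r, rho):.6f}")
```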
Let \(\epsilon=\Sigma\theta-\pi\) be the angular defect of a triangle, and for concreteness let the triangle be in a space with an elliptic geometry, so that it has constant curvature and can be modeled as a sphere of radius \(\rho\), with antipodal points identified.
Self-check: In elliptic geometry, what is the minimum possible value of the quantity \(C/r\) discussed in example 2? How does this differ from the case of spherical geometry?
We want a measure of curvature that is local, but if our space is locally flat, we must have \(\epsilon\rightarrow0\) as the size of the triangles approaches zero. This is why Euclidean geometry is a good approximation for small-scale maps of the earth. The discrete nature of the triangular mesh is just an artifact of the definition, so we want a measure of curvature that, unlike \(\epsilon\), approaches some finite limit as the scale of the triangles approaches zero. If \(s\) is the linear size of a triangle, should we expect \(\epsilon\) to scale as \(s\)? \(s^2\)? Let's determine the scaling. First we prove a classic lemma by Gauss, concerning a slightly different version of the angular defect, for a single triangle.
Theorem: In elliptic geometry, the angular defect \(\epsilon=\alpha+\beta+\gamma-\pi\) of a triangle is proportional to its area \(A\).

Proof: By axiom E2, extend each side of the triangle to form a line, figure f/1. Each pair of lines crosses at only one point (E1) and divides the plane into two lunes with their four vertices touching at this point, figure f/2. Of the six lunes, we focus on the three shaded ones, which overlap the triangle. In each of these, the two interior angles at the vertex are the same (Euclid I.15). The area of a lune is proportional to its interior angle, as follows from dissection into narrower lunes; since a lune with an interior angle of \(\pi\) covers the entire area \(P\) of the plane, the constant of proportionality is \(P/\pi\). The sum of the areas of the three lunes is \((P/\pi)(\alpha+\beta+\gamma)\), but these three areas also cover the entire plane, overlapping three times on the given triangle, and therefore their sum also equals \(P+2A\). Equating the two expressions gives \(\epsilon=(2\pi/P)A\), the desired result.
This calculation was purely intrinsic, because it made no use of any model or coordinates. We can therefore construct a measure of curvature that we can be assured is intrinsic, \(K=\epsilon/A\). This is called the Gaussian curvature, and in elliptic geometry it is constant rather than varying from point to point. In the model on a sphere of radius \(\rho\), we have \(K=1/\rho^2\).
Self-check: Verify the equation \(K=1/\rho^2\) by considering a triangle covering one octant of the sphere, as in example 2.
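The theorem can also be checked numerically on a sphere of radius \(\rho\) in a way that does not assume the result. The sketch below computes each vertex angle of a geodesic triangle from its three vertices, and separately computes the area from the Van Oosterom-Strackee solid-angle formula, which never looks at the angles. The ratio \(\epsilon/A\) comes out to \(1/\rho^2\) for large and small triangles alike, and shrinking a triangle by a factor of 2 cuts \(\epsilon\) by a factor of about 4, which answers the scaling question posed before the theorem. Helper names are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def unit(u):
    n = math.sqrt(dot(u, u))
    return tuple(a / n for a in u)

def vertex_angle(a, b, c):
    """Interior angle at vertex a of the geodesic triangle abc (unit vectors):
    project b and c onto the tangent plane at a and measure the angle between them."""
    tb = unit(tuple(bi - dot(a, b) * ai for ai, bi in zip(a, b)))
    tc = unit(tuple(ci - dot(a, c) * ai for ai, ci in zip(a, c)))
    return math.acos(max(-1.0, min(1.0, dot(tb, tc))))

def area_on_unit_sphere(a, b, c):
    """Solid angle of triangle abc (Van Oosterom-Strackee formula), independent of the angles."""
    return 2 * math.atan2(abs(dot(a, cross(b, c))),
                          1 + dot(a, b) + dot(b, c) + dot(c, a))

def defect_and_area(a, b, c, rho):
    a, b, c = unit(a), unit(b), unit(c)
    eps = vertex_angle(a, b, c) + vertex_angle(b, c, a) + vertex_angle(c, a, b) - math.pi
    return eps, rho**2 * area_on_unit_sphere(a, b, c)

rho = 2.0
eps, area = defect_and_area((1, 0, 0), (0, 1, 0), (0, 0, 1), rho)   # one octant
print(f"octant: eps = {eps:.6f}, A = {area:.6f}, eps/A = {eps / area:.6f}")

# Shrinking a triangle near the pole: eps/A stays at 1/rho^2, eps falls like size^2.
for s in (0.1, 0.05, 0.025):
    eps, area = defect_and_area((0, 0, 1), (s, 0, 1), (0, s, 1), rho)
    print(f"s = {s}: eps = {eps:.3e}, eps/A = {eps / area:.6f}")
```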
It is useful to introduce normal or Gaussian normal coordinates, defined as follows. Through point O, construct perpendicular geodesics, and define affine coordinates \(x\) and \(y\) along these. For any point P off the axes, define coordinates by constructing the lines through P that cross the axes perpendicularly. For P in a sufficiently small neighborhood of O, these lines exist and are uniquely determined. Gaussian polar coordinates can be defined in a similar way.
Here are two useful interpretations of \(K\).
1. The Gaussian curvature measures the failure of parallelism in the following sense. Let line \(\ell\) be constructed so that it crosses the normal \(y\) axis at \((0,dy)\) at an angle that differs from perpendicular by the infinitesimal amount \(d\alpha\) (figure h). Construct the line \(x'=dx\), and let \(d\alpha'\) be the angle its perpendicular forms with \(\ell\). Then^{4} the Gaussian curvature at O is

\[ K=\frac{d^2\alpha}{dx\,dy}, \]

where \(d^2\alpha=d\alpha'-d\alpha\).
2. From a point P, emit a fan of rays at angles filling a certain range \(\theta\) of angles in Gaussian polar coordinates (figure i). Let the arc length of this fan at \(r\) be \(L\), which may not be equal to its Euclidean value \(L_E=r\theta\). Then^{5}

\[ K=-3\left.\frac{d^2}{dr^2}\left(\frac{L}{L_E}\right)\right|_{r\to0}. \]
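The fan-of-rays interpretation can be tested on surfaces where the answer is known. The sketch below applies \(K=-3\,d^2(L/L_E)/dr^2\) at \(r\to0\) to a sphere of radius \(\rho\), where \(L=\theta\rho\sin(r/\rho)\), and to a hyperbolic plane of curvature \(-1/\rho^2\), where \(L=\theta\rho\sinh(r/\rho)\); both arc-length formulas are standard results assumed here rather than derived. The second derivative is a central difference at \(r=0\), which simplifies because \(L/L_E\) is an even function of \(r\). Function names are illustrative.

```python
import math

def fan_ratio(r, rho, kind):
    """L / L_E for a fan of geodesics at distance r from its apex (the fan angle cancels).
    Assumed standard results: L = theta*rho*sin(r/rho) on a sphere of radius rho,
    L = theta*rho*sinh(r/rho) on a hyperbolic plane of curvature -1/rho^2."""
    if r == 0:
        return 1.0          # limiting value as r -> 0
    x = r / rho
    if kind == "sphere":
        return math.sin(x) / x
    if kind == "hyperbolic":
        return math.sinh(x) / x
    raise ValueError(f"unknown surface: {kind}")

def curvature_from_fan(rho, kind, h=1e-3):
    """K = -3 d^2/dr^2 (L/L_E) at r -> 0.  L/L_E is even in r, so the central
    difference at r = 0 reduces to 2 (f(h) - f(0)) / h^2."""
    second_derivative = 2 * (fan_ratio(h, rho, kind) - fan_ratio(0.0, rho, kind)) / h**2
    return -3 * second_derivative

for kind in ("sphere", "hyperbolic"):
    print(kind, curvature_from_fan(2.0, kind))   # close to +1/4 and -1/4
```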
Let's now generalize beyond elliptic geometry. Consider a space modeled by a surface embedded in three dimensions, with geodesics defined as curves of extremal length, i.e., the curves made by a piece of string stretched taut across the surface. At a particular point P, we can always pick a coordinate system \((x,y,z)\) such that the quadric \(z=\frac{1}{2}k_1x^2+\frac{1}{2}k_2y^2\) approximates the given surface near P to the level of precision needed in order to discuss curvature. The surface is either paraboloidal or hyperboloidal (a saddle), depending on the signs of \(k_1\) and \(k_2\). We might naively think that \(k_1\) and \(k_2\) could be independently determined by intrinsic measurements, but as we've seen in example 4 on page 95, a cylinder is locally indistinguishable from a Euclidean plane, so if one \(k\) is zero, the other \(k\) clearly cannot be determined. In fact all that can be measured is the Gaussian curvature, which equals the product \(k_1k_2\). To see why this should be true, first consider that any measure of curvature has units of inverse distance squared, and the \(k\)'s have units of inverse distance. The only possible intrinsic measures of curvature based on the \(k\)'s are therefore \(k_1^2+k_2^2\) and \(k_1k_2\). (We can't have, for example, just \(k_1^2\), because that would change under an extrinsic rotation about the \(z\) axis.) Only \(k_1k_2\) vanishes on a cylinder, so it is the only possible intrinsic curvature.
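The claim that only \(k_1k_2\) is intrinsic can be illustrated with the standard formula for the Gaussian curvature of a surface written as a graph \(z=f(x,y)\), \(K=(f_{xx}f_{yy}-f_{xy}^2)/(1+f_x^2+f_y^2)^2\), which we take as given here. In the sketch below the derivatives are central differences. At the apex of \(z=\frac{1}{2}k_1x^2+\frac{1}{2}k_2y^2\) the result is \(k_1k_2\) whether or not the axes are rotated about \(z\). It vanishes when one \(k\) is zero, as for a cylinder, and away from the apex of a paraboloid it no longer equals \(k_1k_2\). Function names are illustrative.

```python
import math

def gaussian_curvature(f, x, y, h=1e-4):
    """Gaussian curvature of the graph z = f(x, y) at (x, y):
    K = (f_xx f_yy - f_xy^2) / (1 + f_x^2 + f_y^2)^2, with central differences."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return (fxx * fyy - fxy**2) / (1 + fx**2 + fy**2) ** 2

def quadric(k1, k2, angle=0.0):
    """z = k1 u^2/2 + k2 v^2/2, where (u, v) are the (x, y) axes rotated by `angle` about z."""
    c, s = math.cos(angle), math.sin(angle)
    def f(x, y):
        u, v = c * x + s * y, -s * x + c * y
        return 0.5 * k1 * u**2 + 0.5 * k2 * v**2
    return f

print(gaussian_curvature(quadric(2.0, 3.0), 0.0, 0.0))             # approx. k1*k2 = 6
print(gaussian_curvature(quadric(2.0, 3.0, angle=0.7), 0.0, 0.0))  # approx. 6 after rotation
print(gaussian_curvature(quadric(2.0, 0.0), 0.0, 0.0))             # cylinder: approx. 0
print(gaussian_curvature(quadric(1.0, 1.0), 1.0, 0.0))             # off the apex: approx. 0.25, not 1
```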
When people eat pizza by folding the slice lengthwise, they are taking advantage of the intrinsic nature of the Gaussian curvature. The flat slice starts with \(K=0\), and bending it without stretching the dough cannot change \(K=k_1k_2\). Once the fold fixes \(k_1\) to a nonzero value, \(k_2\) must remain zero, so the slice can't droop.
We've seen that figures behaving according to the axioms of elliptic geometry can be modeled on part of a sphere, which is a surface of constant \(K>0\). The model can be made into a global one satisfying all the axioms if the appropriate topological properties are ensured by identifying antipodal points. A paraboloidal surface \(z=\frac{1}{2}k_1x^2+\frac{1}{2}k_2y^2\) can be a good local approximation to a sphere, but for points far from its apex, \(K\) varies significantly. Elliptic geometry has no parallels; all lines meet if extended far enough.
A space of constant negative curvature has a geometry called hyperbolic, and is of some interest because it appears to be the one that describes the spatial dimensions of our universe on a cosmological scale. A hyperboloidal surface works locally as a model, but its curvature is only approximately constant; the surface of constant curvature is a horn-shaped one created by revolving a mountain-shaped curve called a tractrix about its axis. The tractrix of revolution is not as satisfactory a model as the sphere is for elliptic geometry, because lines are cut off at the cusp of the horn. Hyperbolic geometry is richer in parallels than Euclidean geometry; given a line \(\ell\) and a point P not on \(\ell\), there are infinitely many lines through P that never intersect \(\ell\).
Without violating reflection symmetry, it is still conceivable that the flea could determine the orientation of the tip-to-tip line running through his position. Surprisingly, even this is impossible. The flea can only measure the single number \(K\), which carries no information about directions in space.
We might not have been able to guess the pattern in advance, but we can verify that some of its features make sense. For example, charge A has more neighbors on the right than on the left, which would tend to make it accelerate off to the left. But when we look at the picture as a whole, it appears reasonable that this is prevented by the larger number of more distant charges on its left than on its right.
There also seems to be a pattern to the nonuniformity: the charges collect more densely in areas like B, where the Gaussian curvature is large, and less densely in areas like C, where \(K\) is nearly zero (slightly negative).
To understand the reason for this pattern, consider l/3. It's straightforward to show that the density of charge \(\sigma\) on each sphere is inversely proportional to its radius, or proportional to \(K^{1/2}\). Lord Kelvin proved that on a conducting ellipsoid, the density of charge is proportional to the distance from the center to the tangent plane, which is equivalent^{1} to \(\sigma\propto K^{1/4}\); this result looks similar except for the different exponent. McAllister showed in 1990^{2} that this \(K^{1/4}\) behavior applies to a certain class of examples, but it clearly can't apply in all cases, since, for example, \(K\) could be negative, or we could have a deep concavity, which would form a Faraday cage. Problem 1 on p. 200 discusses the case of a knife-edge.
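Kelvin's statement can be connected to the curvature through a standard identity for the ellipsoid \(x^2/a^2+y^2/b^2+z^2/c^2=1\): if \(p\) is the distance from the center to the tangent plane, then \(K=p^4/(a^2b^2c^2)\), so \(\sigma\propto p\) is the same statement as \(\sigma\propto K^{1/4}\). The sketch below checks this identity numerically at a few points, computing \(K\) by finite differences on the upper half of the ellipsoid written as a graph; the semi-axes are arbitrary choices. It also covers the two-sphere case, where a shared potential \(V=Q/R\) (Gaussian units) makes \(\sigma\propto1/R=K^{1/2}\). Function names are illustrative.

```python
import math

def graph_curvature(f, x, y, h=1e-4):
    """Gaussian curvature of z = f(x, y), derivatives by central differences."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return (fxx * fyy - fxy**2) / (1 + fx**2 + fy**2) ** 2

def tangent_distance_and_curvature(a, b, c, x, y):
    """At the point of the upper half of x^2/a^2 + y^2/b^2 + z^2/c^2 = 1 lying above (x, y),
    return p (distance from the centre to the tangent plane) and the Gaussian curvature K."""
    def upper(u, v):
        return c * math.sqrt(1 - u**2 / a**2 - v**2 / b**2)
    z = upper(x, y)
    p = 1 / math.sqrt(x**2 / a**4 + y**2 / b**4 + z**2 / c**4)
    return p, graph_curvature(upper, x, y)

a, b, c = 3.0, 2.0, 1.0   # arbitrary semi-axes
for x, y in [(0.0, 0.0), (1.0, 0.5), (2.0, -0.8)]:
    p, K = tangent_distance_and_curvature(a, b, c, x, y)
    # A constant p^4/K (equal to (abc)^2 = 36) means p is proportional to K^(1/4).
    print(f"({x}, {y}):  p^4/K = {p**4 / K:.4f}")

def sphere_charge_density(R, V=1.0):
    """Sphere at potential V = Q/R (Gaussian units): sigma = Q / (4 pi R^2)."""
    Q = V * R
    return Q / (4 * math.pi * R**2)

# Spheres of radius 1 and 4 joined by a wire: K = 1/R^2 differs by 16, sigma by 4 = 16^(1/2).
print(sphere_charge_density(1.0) / sphere_charge_density(4.0))
```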
Similar reasoning shows why Benjamin Franklin used a sharp tip when he invented the lightning rod. The charged stormclouds induce positive and negative charges to move to opposite ends of the rod. At the pointed upper end of the rod, the charge tends to concentrate at the point, and this charge attracts the lightning. The same effect can sometimes be seen when a scrap of aluminum foil is inadvertently put in a microwave oven. Modern experiments^{3} show that although a sharp tip is best at starting a spark, a more moderate curve, like the right-hand tip of the pear in this example, is better at successfully sustaining the spark for long enough to connect a discharge to the clouds.
The example of the flea suggests that if we want to express curvature as a tensor, it should have even rank. Also, in a coordinate system in which the coordinates have units of distance (they are not angles, for instance, as in spherical coordinates), we expect that the units of curvature will always be inverse distance squared. More elegantly, we expect that under a uniform rescaling of coordinates by a factor of \(\mu\), a curvature tensor should scale by a factor of \(\mu^{-2}\).
Combining these two facts, we find that a curvature tensor should have one of the forms \(R_{ab}\), \(R^a_{bcd}\), ..., i.e., the number of lower indices should be two greater than the number of upper indices. The following definition has this property, and is equivalent to the earlier definitions of the Gaussian curvature that were not written in tensor notation.
Definition of the Riemann curvature tensor: Let \(dp^c\) and \(dq^d\) be two infinitesimal vectors, and use them to form a quadrilateral that is a good approximation to a parallelogram.^{6} Parallel-transport vector \(v^b\) all the way around the parallelogram. When it comes back to its starting place, it has a new value \(v^b \rightarrow v^b+dv^b\). Then the Riemann curvature tensor is defined as the tensor that computes \(dv^a\) according to \(dv^a=R^a_{bcd}v^bdp^cdq^d\). (There is no standardization in the literature of the order of the indices.)
If vectors \(dp^c\) and \(dq^d\) lie along the same line, then \(dv^a\) must vanish, and interchanging \(dp^c\) and \(dq^d\) simply reverses the direction of the circuit around the quadrilateral, giving \(dv^a \rightarrow -dv^a\). This shows that \(R^a_{bcd}\) must be antisymmetric under interchange of the indices \(c\) and \(d\), \(R^a_{bcd}=-R^a_{bdc}\).
In local normal coordinates, the interpretation of the Riemann tensor becomes particularly transparent. The constant-coordinate lines are geodesics, so when the vector \(v^b\) is transported along them, it maintains a constant angle with respect to them. Any rotation of the vector after it is brought around the perimeter of the quadrilateral can therefore be attributed to something that happens at the vertices. In other words, it is simply a measure of the angular defect. We can therefore see that the Riemann tensor is really just a tensorial way of writing the Gaussian curvature \(K=d\epsilon/dA\).
In normal coordinates, the local geometry is nearly Cartesian, and when we take the product of two vectors in an antisymmetric manner, we are essentially measuring the area of the parallelogram they span, as in the three-dimensional vector cross product. We can therefore see that the Riemann tensor tells us something about the amount of curvature contained within the infinitesimal area spanned by \(dp^c\) and \(dq^d\). A finite two-dimensional region can be broken down into infinitesimal elements of area, and the Riemann tensor integrated over them. The result is equal to the finite change \(\Delta v^b\) in a vector transported around the whole boundary of the region.
Let's find the curvature tensors on a sphere of radius \(\rho\).
Construct normal coordinates \((x,y)\) with origin O, and let vectors \(dp^c\) and \(dq^d\) represent infinitesimal displacements along \(x\) and \(y\), forming a quadrilateral as described above. Then \(R^x_{yxy}\) represents the change in the \(x\) direction that occurs in a vector that is initially in the \(y\) direction. If the vector has unit magnitude, then \(R^x_{yxy}\) equals the angular defect of the quadrilateral divided by its area. Comparing with the definition of the Gaussian curvature, we find \(R^x_{yxy}=K=1/\rho^2\). Interchanging \(x\) and \(y\), we find the same result for \(R^y_{xyx}\). Thus although the Riemann tensor in two dimensions has sixteen components, the only nonzero ones are these two, which are equal to each other, and the two related to them by antisymmetry in the last pair of indices, \(R^x_{yyx}=R^y_{xxy}=-K\).
This result represents the defect in parallel transport around a closed loop per unit area. Suppose we parallel-transport a vector around an octant, as shown in figure b. The area of the octant is \((\pi/2)\rho^2\), and multiplying it by the Riemann tensor, we find that the defect in parallel transport is \(\pi/2\), i.e., a right angle, as is also evident from the figure.
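Because parallel transport along a great circle is simply a rotation about that circle's axis, the octant calculation can be reproduced exactly. The sketch below carries a vector around a geodesic polygon on the unit sphere using Rodrigues' rotation formula and reports the net rotation. For the octant it returns \(\pi/2\), matching the area \((\pi/2)\rho^2\) times \(R^x_{yxy}=1/\rho^2\) for any \(\rho\), and reversing the direction of the circuit flips the sign, as the antisymmetry \(R^a_{bcd}=-R^a_{bdc}\) requires. Function names are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def norm(u):
    return math.sqrt(dot(u, u))

def rotate(v, axis, angle):
    """Rodrigues' formula: rotate v about the direction of `axis` by `angle`."""
    k = tuple(a / norm(axis) for a in axis)
    kxv, kv = cross(k, v), dot(k, v)
    c, s = math.cos(angle), math.sin(angle)
    return tuple(vi * c + wi * s + ki * kv * (1 - c) for vi, wi, ki in zip(v, kxv, k))

def transport_along_arc(v, p, q):
    """Parallel-transport tangent vector v from p to q along the great circle:
    on a sphere this is exactly the rotation about the circle's axis p x q."""
    axis = cross(p, q)
    return rotate(v, axis, math.atan2(norm(axis), dot(p, q)))

def holonomy(vertices, v):
    """Net rotation of v after transport around a closed geodesic polygon, measured
    counterclockwise about the outward normal at the first vertex."""
    w = v
    loop = list(vertices) + [vertices[0]]
    for p, q in zip(loop, loop[1:]):
        w = transport_along_arc(w, p, q)
    return math.atan2(dot(cross(v, w), vertices[0]), dot(v, w))

rho = 3.0
octant = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
print(holonomy(octant, (0.0, 1.0, 0.0)), (1 / rho**2) * (math.pi / 2) * rho**2)  # both pi/2
print(holonomy(octant[::-1], (0.0, 1.0, 0.0)))   # reversed circuit: -pi/2
```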
The above treatment may be somewhat misleading in that it may lead you to believe that there is a single coordinate system in which the Riemann tensor is always constant. This is not the case, since the calculation of the Riemann tensor was only valid near the origin O of the normal coordinates. The character of these coordinates becomes quite complicated far from O; we end up with all our constant-\(x\) lines converging at the north and south poles of the sphere, and all the constant-\(y\) lines at the east and west poles.
Angular coordinates \((\phi,\theta)\) are more suitable as a large-scale description of the sphere. We can use the tensor transformation law to find the Riemann tensor in these coordinates. If O, the origin of the \((x,y)\) coordinates, is at coordinates \((\phi,\theta)\), then \(dx/d \phi=\rho\sin\theta\) and \(dy/d \theta=\rho\). The result is \(R^\phi_{\theta\phi\theta}=R^x_{yxy}(dy/d \theta)^2=1\) and \(R^\theta_{\phi\theta\phi}=R^y_{xyx}(dx/d \phi)^2=\sin^2\theta\). The variation in \(R^\theta_{\phi\theta\phi}\) is not due to any variation in the sphere's intrinsic curvature; it represents the behavior of the coordinate system.
The Riemann tensor only measures curvature within a particular plane, the one defined by \(dp^c\) and \(dq^d\), so it is a kind of sectional curvature. Since we're currently working in two dimensions, however, there is only one plane, and no real distinction between sectional curvature and Ricci curvature, which is the average of the sectional curvature over all planes that include \(dq^d\): \(R_{cd}=R^a_{cad}\). The Ricci curvature in two spacelike dimensions, expressed in normal coordinates, is simply the diagonal matrix \(\text{diag}(K,K)\).
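The same components can be computed mechanically from the metric. The sketch below assumes the standard coordinate formulas for the Christoffel symbols and for \(R^a_{bcd}\) in terms of them (not developed in this section), with the index convention chosen to match the values found above, and evaluates every derivative by central differences. For the sphere metric \(\rho^2(d\theta^2+\sin^2\theta\,d\phi^2)\), in coordinates ordered \((\theta,\phi)\), it reproduces \(R^\theta_{\phi\theta\phi}=\sin^2\theta\) and \(R^\phi_{\theta\phi\theta}=1\) for any \(\rho\), and the contraction \(R_{cd}=R^a_{cad}\) gives \(\text{diag}(1,\sin^2\theta)\), which is the metric multiplied by \(K=1/\rho^2\). Function names are illustrative.

```python
import math

N = 2  # two-dimensional surface, coordinates x = (theta, phi)

def inverse2(g):
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]

def christoffel(metric, x, h=1e-5):
    """Gamma^a_{bc} = (1/2) g^{ad} (d_b g_{dc} + d_c g_{db} - d_d g_{bc})."""
    ginv = inverse2(metric(x))
    dg = []  # dg[e][a][b] = d_e g_{ab}
    for e in range(N):
        xp, xm = list(x), list(x)
        xp[e] += h
        xm[e] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(N)] for a in range(N)])
    return [[[0.5 * sum(ginv[a][d] * (dg[b][d][c] + dg[c][d][b] - dg[d][b][c])
                        for d in range(N))
              for c in range(N)] for b in range(N)] for a in range(N)]

def riemann(metric, x, h=1e-4):
    """R^a_{bcd} = d_c Gamma^a_{db} - d_d Gamma^a_{cb}
                   + Gamma^a_{ce} Gamma^e_{db} - Gamma^a_{de} Gamma^e_{cb}."""
    G = christoffel(metric, x)
    dG = []  # dG[e][a][b][c] = d_e Gamma^a_{bc}
    for e in range(N):
        xp, xm = list(x), list(x)
        xp[e] += h
        xm[e] -= h
        Gp, Gm = christoffel(metric, xp), christoffel(metric, xm)
        dG.append([[[(Gp[a][b][c] - Gm[a][b][c]) / (2 * h) for c in range(N)]
                    for b in range(N)] for a in range(N)])
    R = [[[[0.0] * N for _ in range(N)] for _ in range(N)] for _ in range(N)]
    for a in range(N):
        for b in range(N):
            for c in range(N):
                for d in range(N):
                    R[a][b][c][d] = (dG[c][a][d][b] - dG[d][a][c][b]
                                     + sum(G[a][c][e] * G[e][d][b] - G[a][d][e] * G[e][c][b]
                                           for e in range(N)))
    return R

def ricci(R):
    """R_{cd} = R^a_{cad}."""
    return [[sum(R[a][c][a][d] for a in range(N)) for d in range(N)] for c in range(N)]

def sphere_metric(rho):
    return lambda x: [[rho**2, 0.0], [0.0, (rho * math.sin(x[0]))**2]]

theta = 1.0
for rho in (1.0, 2.0, 5.0):
    R = riemann(sphere_metric(rho), [theta, 0.3])
    print(f"rho = {rho}: R^th_(ph th ph) = {R[0][1][0][1]:.6f}, R^ph_(th ph th) = {R[1][0][1][0]:.6f}")
print("sin^2(theta) =", math.sin(theta) ** 2)
print("Ricci, rho = 2:", ricci(riemann(sphere_metric(2.0), [theta, 0.3])))
```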
How could we confirm experimentally that parallel transport around a closed path can cause a vector to rotate? The rotation is related to the amount of spacetime curvature contained within the path, so it would make sense to choose a loop going around a gravitating body. The rotation is a purely relativistic effect, so we expect it to be small. To make it easier to detect, we should go around the loop many times, causing the effect to accumulate. This is essentially a description of a body orbiting another body. A gyroscope aboard the orbiting body is expected to precess. This is known as the geodetic effect. In 1916, shortly after Einstein published the general theory of relativity, Willem de Sitter calculated the effect on the earth-moon system. The effect was not directly verified until the 1980s, and the first high-precision measurement was in 2007, from analysis of the results collected by the Gravity Probe B satellite experiment. The probe carried four gyroscopes made of quartz, which were the most perfect spheres ever manufactured, deviating from sphericity by no more than about 40 atoms.
Let's estimate the size of the effect. The first derivative of the metric is, roughly, the gravitational field, whereas the second derivative has to do with curvature. The curvature of spacetime around the earth should therefore vary as \(GMr^{-3}\), where \(M\) is the earth's mass and \(G\) is the gravitational constant. The area enclosed by a circular orbit is proportional to \(r^2\), so we expect the geodetic effect to vary as \(nGM/r\), where \(n\) is the number of orbits. The angle of precession is unitless, and the only way to make this result unitless is to put in a factor of \(1/c^2\). In units with \(c=1\), this factor is unnecessary. In ordinary metric units, the \(1/c^2\) makes sense, because it causes the purely relativistic effect to come out to be small. The result, up to unitless factors that we didn't pretend to find, is

\[ \Delta\theta\sim\frac{nGM}{c^2r}. \]
We might also expect a Thomas precession. Like the spacetime curvature effect, it would be proportional to \(nGM/c^2r\). Since we're not worrying about unitless factors, we can just lump the Thomas precession together with the effect already calculated.
The data for Gravity Probe B are \(r=r_e+(650\ \text{km})\) and \(n \approx 5000\) (orbiting once every 90 minutes for the 353-day duration of the experiment), giving \(\Delta\theta \sim 3\times10^{-6}\) radians. Figure b shows the actual results^{8} from the four gyroscopes aboard the probe. The precession was about 6 arc-seconds, or \(3\times10^{-5}\) radians. Our crude estimate came out about an order of magnitude too small. The missing unitless factor on the right-hand side of the equation above is \(3\pi\), which brings the two results into fairly close quantitative agreement. The full derivation, including the factor of \(3\pi\), is given on page 214.
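The arithmetic of this estimate, as a short sketch. The values of \(GM\) for the earth, the earth's mean radius, and \(c\) are standard constants supplied here; \(n=5000\), the 650 km altitude, and the factor \(3\pi\) are the figures quoted in the text. The crude result is about \(3\times10^{-6}\) radians, and multiplying by \(3\pi\) gives about \(3\times10^{-5}\) radians, roughly 6 arc-seconds. The function name is illustrative.

```python
import math

GM_EARTH = 3.986004418e14   # m^3/s^2, Earth's standard gravitational parameter
R_EARTH = 6.371e6           # m, Earth's mean radius
C = 299_792_458.0           # m/s
RAD_TO_ARCSEC = 180 / math.pi * 3600

def geodetic_estimate(n_orbits, altitude_m, unitless_factor=1.0):
    """Order-of-magnitude geodetic precession n G M / (c^2 r), in radians."""
    r = R_EARTH + altitude_m
    return unitless_factor * n_orbits * GM_EARTH / (C**2 * r)

crude = geodetic_estimate(5000, 650e3)
with_factor = geodetic_estimate(5000, 650e3, unitless_factor=3 * math.pi)
print(f"crude estimate: {crude:.1e} rad")
print(f"times 3*pi:     {with_factor:.1e} rad = {with_factor * RAD_TO_ARCSEC:.1f} arcsec")
```

Taking 353 days of 90-minute orbits literally gives \(n\approx5600\) rather than 5000, a difference of about 13% that doesn't matter for an order-of-magnitude estimate.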
Let's estimate the size of the deflection of starlight passing near the sun. We've already seen that the Riemann tensor is essentially just a tensorial way of writing the Gaussian curvature \(K=d\epsilon/dA\). Suppose, for the sake of this rough estimate, that the sun, earth, and star form a non-Euclidean triangle with a right angle at the sun. Then the angular deflection is the same as the angular defect \(\epsilon\) of this triangle, and equals the integral of the curvature over the interior of the triangle. Ignoring unitless constants, this ends up being exactly the same calculation as in section 5.5.1, and the result is \(\epsilon\sim GM/c^2r\), where \(r\) is the light ray's distance of closest approach to the sun. The value of \(r\) can't be less than the radius of the sun, so the maximum size of the effect is on the order of \(GM/c^2r\), where \(M\) is the sun's mass, and \(r\) is its radius. We find \(\epsilon\sim2\times10^{-6}\) radians, or roughly half a second of arc. To measure a star's position to within an arc second was well within the state of the art in 1919, under good conditions in a comfortable observatory. This observation, however, required that Eddington's team travel to the island of Principe, off the coast of West Africa. The weather was cloudy, and only during the last 10 seconds of the seven-minute eclipse did the sky clear enough to allow photographic plates to be taken of the Hyades star cluster against the background of the eclipse-darkened sky. The observed deflection was 1.6 seconds of arc, in agreement with the relativistic prediction. The relativistic prediction is derived on page 222.
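The same order-of-magnitude arithmetic for a ray grazing the sun, with standard values of \(GM_\odot\), the solar radius, and \(c\) supplied as assumptions. The crude estimate \(GM/c^2r\) comes out to about \(2\times10^{-6}\) radians, a bit under half a second of arc. The sketch also applies a unitless factor of 4, the standard general-relativistic value (assumed here, not derived in this section), which gives about 1.75 seconds of arc, close to the 1.6 reported by Eddington's team. The function name is illustrative.

```python
import math

GM_SUN = 1.32712440018e20   # m^3/s^2, the sun's standard gravitational parameter
R_SUN = 6.957e8             # m, nominal solar radius
C = 299_792_458.0           # m/s
RAD_TO_ARCSEC = 180 / math.pi * 3600

def deflection_estimate(gm, r, unitless_factor=1.0):
    """Order-of-magnitude deflection G M / (c^2 r) of a ray passing at distance r, in radians."""
    return unitless_factor * gm / (C**2 * r)

crude = deflection_estimate(GM_SUN, R_SUN)
# Assumed: the standard general-relativistic result carries a factor of 4.
full = deflection_estimate(GM_SUN, R_SUN, unitless_factor=4)
print(f"crude:    {crude:.1e} rad = {crude * RAD_TO_ARCSEC:.2f} arcsec")
print(f"factor 4: {full:.1e} rad = {full * RAD_TO_ARCSEC:.2f} arcsec")
```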
In the preceding section we were able to estimate a nontrivial general relativistic effect, the geodetic precession of the gyroscopes aboard Gravity Probe B, up to a unitless constant \(3\pi\). Let's think about what additional machinery would be needed in order to carry out the calculation in detail, including the \(3\pi\).
First we would need to know the Einstein field equation, but in a vacuum this is fairly straightforward: \(R_{ab}=0\). Einstein posited this equation based essentially on the considerations laid out in section 5.1.
But just knowing that a certain tensor vanishes identically in the space surrounding the earth clearly doesn't tell us anything explicit about the structure of the spacetime in that region. We want to know the metric. As suggested at the beginning of the chapter, we expect that the first derivatives of the metric will give a quantity analogous to the gravitational field of Newtonian mechanics, but this quantity will not be directly observable, and will not be a tensor. The second derivatives of the metric are the ones that we expect to relate to the Ricci tensor \(R_{ab}\).
To see how this issue arises, let's retreat to the more familiar terrain of electromagnetism. In quantum mechanics, the phase of a charged particle's wavefunction is unobservable, so that for example the transformation \(\Psi \rightarrow -\Psi\) does not change the results of experiments. As a less trivial example, we can redefine the ground of our electrical potential, \(\Phi \rightarrow \Phi+\delta\Phi\), and this will add a constant onto the energy of every electron in the universe, causing their phases to oscillate at a greater rate due to the quantum-mechanical relation \(E=hf\). There are no observable consequences, however, because what is observable is the phase of one electron relative to another, as in a double-slit interference experiment. Since every electron has been made to oscillate faster, the effect is simply like letting the conductor of an orchestra wave her baton more quickly; every musician is still in step with every other musician. The rate of change of the wavefunction, i.e., its derivative, has some built-in ambiguity.
For simplicity, let's now restrict ourselves to spin-zero particles, since details of electrons' polarization clearly won't tell us anything useful when we make the analogy with relativity. For a spin-zero particle, the wavefunction is simply a complex number, and there are no observable consequences arising from the transformation \(\Psi \rightarrow \Psi' = e^{i\alpha} \Psi\), where \(\alpha\) is a constant. The transformation \(\Phi \rightarrow \Phi-\delta\Phi\) is also allowed, and it gives \(\alpha(t)=(q\delta\Phi/\hbar) t\), so that the phase factor \(e^{i\alpha(t)}\) is a function of time \(t\). Now from the point of view of electromagnetism in the age of Maxwell, with the electric and magnetic fields imagined as playing their roles against a background of Euclidean space and absolute time, the form of this time-dependent phase factor is very special and symmetrical; it depends only on the absolute time variable. But to a relativist, there is nothing very nice about this function at all, because there is nothing special about a time coordinate. If we're going to allow a function of this form, then based on the coordinate-invariance of relativity, it seems that we should probably allow \(\alpha\) to be any function at all of the spacetime coordinates. The proper generalization of \(\Phi \rightarrow \Phi-\delta\Phi\) is now \(A_b \rightarrow A_b-\partial_b \alpha\), where \(A_b\) is the electromagnetic potential four-vector (section 4.2.5, page 137).
Self-check: Suppose we said we would allow \(\alpha\) to be a function of \(t\), but forbid it to depend on the spatial coordinates. Prove that this would violate Lorentz invariance.
The transformation has no effect on the electromagnetic fields, which are the direct observables. We can also verify that the change of gauge will have no effect on observable behavior of charged particles. This is because the phase of a wavefunction can only be determined relative to the phase of another particle's wavefunction, when they occupy the same point in space and, for example, interfere. Since the phase shift depends only on the location in spacetime, there is no change in the relative phase.
But bad things will happen if we don't make a corresponding adjustment to the derivatives appearing in the Schrödinger equation. These derivatives are essentially the momentum operators, and they give different results when applied to \(\Psi'\) than when applied to \(\Psi\):
\[\partial_b\Psi'=\partial_b\left(e^{i\alpha}\Psi\right)=e^{i\alpha}\partial_b\Psi+i\left(\partial_b\alpha\right)e^{i\alpha}\Psi.\]
To avoid getting incorrect results, we have to do the substitution \(\partial_b \rightarrow \partial_b+ieA_b\), where the correction term compensates for the change of gauge. We call the operator \(\nabla\) defined as
\[\nabla_b=\partial_b+ieA_b\]
the covariant derivative. It gives the right answer regardless of a change of gauge.
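A numerical sketch of this argument in one dimension, in Python, may make it concrete. The particular \(\Psi\), \(A\), and \(\alpha\) are arbitrary hypothetical choices. The code sets \(e=1\), i.e., it assumes units in which the charge has been absorbed into \(\alpha\), so that the shift \(-\partial_b\alpha\) of the potential exactly cancels the extra \(i\partial_b\alpha\) term produced by the plain derivative. It compares each kind of derivative of the transformed wavefunction with \(e^{i\alpha}\) times the same derivative of the original.

```python
import cmath
import math

H = 1e-5  # finite-difference step

def deriv(func, x):
    """Central-difference approximation to d(func)/dx."""
    return (func(x + H) - func(x - H)) / (2 * H)

# Hypothetical fields along a single coordinate x, chosen only for illustration.
def psi(x):
    return cmath.exp(2j * x) * math.exp(-x * x)

def A(x):
    return math.sin(x)

def alpha(x):
    return 0.7 * x * x

# Gauge transformation: Psi -> e^{i alpha} Psi together with A -> A - d(alpha)/dx.
def psi_new(x):
    return cmath.exp(1j * alpha(x)) * psi(x)

def A_new(x):
    return A(x) - deriv(alpha, x)

def covariant(func, pot, x, e=1.0):
    """nabla Psi = dPsi/dx + i e A Psi (e = 1: charge assumed absorbed into alpha)."""
    return deriv(func, x) + 1j * e * pot(x) * func(x)

x0 = 0.3
phase = cmath.exp(1j * alpha(x0))
naive_error = abs(deriv(psi_new, x0) - phase * deriv(psi, x0))
covariant_error = abs(covariant(psi_new, A_new, x0) - phase * covariant(psi, A, x0))
print(f"plain derivative mismatch:     {naive_error:.3f}")
print(f"covariant derivative mismatch: {covariant_error:.1e}")
```

The plain derivative picks up the extra term \(i(\partial\alpha)e^{i\alpha}\Psi\), while the covariant derivative transforms by the same overall phase as \(\Psi\) itself, up to finite-difference error.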
Now consider how all of this plays out in the context of general relativity. The gauge transformations of general relativity are arbitrary smooth changes of coordinates. One of the most basic properties we could require of a derivative operator is that it must give zero on a constant function. A constant scalar function remains constant when expressed in a new coordinate system, but the same is not true for a constant vector function, or for any tensor of higher rank. This is because the change of coordinates changes the units in which the vector is measured, and if the change of coordinates is nonlinear, the units vary from point to point.
Consider the one-dimensional case, in which a vector \(v^a\) has only one component, and the metric is also a single number, so that we can omit the indices and simply write \(v\) and \(g\). (We just have to remember that \(v\) is really a contravariant vector, even though we're leaving out the upper index.) If \(v\) is constant, its derivative \(dv/dx\), computed in the ordinary way without any correction term, is zero. If we further assume that the coordinate \(x\) is a normal coordinate, so that the metric is simply the constant \(g=1\), then zero is not just the answer but the right answer. (The existence of a preferred, global set of normal coordinates is a special feature of a one-dimensional space, because there is no curvature in one dimension. In more than one dimension, there will typically be no possible set of coordinates in which the metric is constant, and normal coordinates only give a metric that is approximately constant in the neighborhood around a certain point. See figure g on page 164 for an example of normal coordinates on a sphere, which do not have a constant metric.)
Now suppose we transform into a new coordinate system \(X\), which is not normal. The metric \(G\), expressed in this coordinate system, is not constant. Applying the tensor transformation law, we have \(V = v dX/dx\), and differentiation with respect to \(X\) will not give zero, because the factor \(dX/dx\) isn't constant. This is the wrong answer: \(V\) isn't really varying, it just appears to vary because \(G\) does.
We want to add a correction term onto the derivative operator \(d/dX\), forming a covariant derivative operator \(\nabla_X\) that gives the right answer. This correction term is easy to find if we consider what the result ought to be when differentiating the metric itself. In general, if a tensor appears to vary, it could vary either because it really does vary or because the metric varies. If the metric itself varies, it could be either because the metric really does vary or ... because the metric varies. In other words, there is no sensible way to assign a nonzero covariant derivative to the metric itself, so we must have \(\nabla_X G=0\). The required correction therefore consists of replacing \(d/dX\) with
\[\nabla_X=\frac{d}{dX}-G^{-1}\frac{dG}{dX}.\]
Applying this to \(G\) gives zero. \(G\) is a second-rank covariant tensor. If we apply the same correction to the derivatives of other second-rank covariant tensors, we will get nonzero results, and they will be the right nonzero results. For example, the covariant derivative of the stress-energy tensor \(T\) (assuming such a thing could have some physical significance in one dimension!) will be \( \nabla_X T=dT/dX-G^{-1}(dG/dX)T\).
Physically, the correction term is a derivative of the metric, and we've already seen that the derivatives of the metric (1) are the closest thing we get in general relativity to the gravitational field, and (2) are not tensors. In 1+1 dimensions, suppose we observe that a free-falling rock has \(dV/dT=9.8\ \text{m}/\text{s}^2\). This acceleration cannot be a tensor, because we could make it vanish by changing from Earth-fixed coordinates \(X\) to free-falling (normal, locally Lorentzian) coordinates \(x\), and a tensor cannot be made to vanish by a change of coordinates. According to a free-falling observer, the vector \(v\) isn't changing at all; it is only the variation in the Earth-fixed observer's metric \(G\) that makes it appear to change.
Mathematically, the form of the derivative is \((1/y)dy/dx\), which is known as a logarithmic derivative, since it equals \(d(\ln y)/dx\). It measures the multiplicative rate of change of \(y\). For example, if \(y\) scales up by a factor of \(k\) when \(x\) increases by 1 unit, then the logarithmic derivative of \(y\) is \(\ln k\). The logarithmic derivative of \(e^{cx}\) is \(c\). The logarithmic nature of the correction term to \(\nabla_X\) is a good thing, because it lets us take changes of scale, which are multiplicative changes, and convert them to additive corrections to the derivative operator. The additivity of the corrections is necessary if the result of a covariant derivative is to be a tensor, since tensors are additive creatures.
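A short numerical illustration, in Python, of the properties just described: the logarithmic derivative of \(e^{cx}\) is \(c\), a function that grows by a factor \(k\) per unit \(x\) has logarithmic derivative \(\ln k\), and multiplying two functions adds their logarithmic derivatives, which is the conversion of multiplicative changes into additive corrections. The test functions, the point \(x_0\), and the step size are arbitrary choices.

```python
import math

H = 1e-6  # finite-difference step

def log_derivative(y, x):
    """(1/y) dy/dx at x, estimated by a central difference."""
    return (y(x + H) - y(x - H)) / (2 * H) / y(x)

c, k, x0 = 0.8, 3.0, 1.2

def exp_curve(x):
    return math.exp(c * x)            # logarithmic derivative should be c

def geometric(x):
    return k ** x                     # grows by a factor k per unit x; should give ln k

def product(x):
    return exp_curve(x) * geometric(x)

print(log_derivative(exp_curve, x0), c)
print(log_derivative(geometric, x0), math.log(k))
# Multiplying the functions adds their logarithmic derivatives.
print(log_derivative(product, x0), c + math.log(k))
```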
What about quantities that are not second-rank covariant tensors? Under a rescaling of contravariant coordinates by a factor of \(k\), covariant vectors scale by \(k^{-1}\), and second-rank covariant tensors by \(k^{-2}\). The correction term should therefore be half as much for covariant vectors,
\[\nabla_X V_X=\frac{dV_X}{dX}-\frac{1}{2}G^{-1}\frac{dG}{dX}V_X,\]
and should have an opposite sign for contravariant vectors:
\[\nabla_X V^X=\frac{dV^X}{dX}+\frac{1}{2}G^{-1}\frac{dG}{dX}V^X.\]
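The following Python sketch checks these rules numerically in one dimension. It assumes a hypothetical nonlinear coordinate change \(X=x+0.3x^3\) away from a normal coordinate \(x\) in which \(g=1\), so that \(G=(dx/dX)^2\), and it transforms constant vectors with the tensor transformation law. The plain derivative \(d/dX\) of either component is nonzero, but adding \(\frac{1}{2}G^{-1}(dG/dX)V\) for the contravariant component, or subtracting it for the covariant one, brings the result to zero up to finite-difference error.

```python
H = 1e-5  # finite-difference step

# Hypothetical coordinate change from a normal coordinate x (metric g = 1)
# to X = x + 0.3 x^3.  All quantities are written as functions of x.
def dX_dx(x):
    return 1 + 0.9 * x**2

def G(x):
    """Metric in the X coordinate: G = g (dx/dX)^2 with g = 1."""
    return 1 / dX_dx(x) ** 2

def d_dX(func, x):
    """Ordinary derivative with respect to X, via the chain rule (d/dx) / (dX/dx)."""
    return (func(x + H) - func(x - H)) / (2 * H) / dX_dx(x)

v_up, v_down = 2.0, 5.0   # constant components in the normal coordinate x

def V_up(x):
    """Contravariant vector, transformed as V = v dX/dx."""
    return v_up * dX_dx(x)

def V_down(x):
    """Covariant vector, transformed as V = v dx/dX."""
    return v_down / dX_dx(x)

def correction(x):
    """The logarithmic-derivative factor (1/2) G^{-1} dG/dX."""
    return 0.5 * d_dX(G, x) / G(x)

def nabla_up(x):
    return d_dX(V_up, x) + correction(x) * V_up(x)

def nabla_down(x):
    return d_dX(V_down, x) - correction(x) * V_down(x)

x0 = 0.7
print("contravariant: d/dX =", d_dX(V_up, x0), " nabla_X =", nabla_up(x0))
print("covariant:     d/dX =", d_dX(V_down, x0), " nabla_X =", nabla_down(x0))
```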
Generalizing the correction term to derivatives of vectors in more than one dimension, we should have something of this form:
\[\nabla_a v^b=\partial_a v^b+\Gamma^b_{ac}v^c,\]
where \(\Gamma^b_{ac}\), called the Christoffel symbol, does not transform like a tensor, and involves derivatives of the metric. (“Christoffel” is pronounced “Krist-AWful,” with the accent on the middle syllable.) The explicit computation of the Christoffel symbols from the metric is deferred until section 5.9, but the intervening sections 5.7 and 5.8 can be omitted on a first reading without loss of continuity.
An important gotcha is that when we evaluate a particular component of a covariant derivative such as \(\nabla_2 v^3\), it is possible for the result to be nonzero even if the component \(v^3\) vanishes identically. This can be seen in example 5 on p. 279 and example 21 on p. 318.
At P, the plane's velocity vector points directly west. At Q, over New England, its velocity has a large component to the south. Since the path is a geodesic and the plane has constant speed, the velocity vector is simply being parallel-transported; the vector's covariant derivative is zero. Since we have \(v^\theta=0\) at P, the only way to explain the nonzero and positive value of \(\partial_\phi v^\theta\) is that we have a nonzero and negative value of \(\Gamma^\theta_{\phi\phi}\).
By symmetry, we can infer that \(\Gamma^\theta_{\phi\phi}\) must have a positive value in the southern hemisphere, and must vanish at the equator.
\(\Gamma^\theta_{\phi\phi}\) is computed in example 10 on page 188.
Symmetry also requires that this Christoffel symbol be independent of \(\phi\), and it must also be independent of the radius of the sphere.
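These qualitative conclusions can be checked numerically. The Python sketch below assumes the standard expression \(\Gamma^a_{bc}=\frac{1}{2}g^{ad}(\partial_b g_{dc}+\partial_c g_{db}-\partial_d g_{bc})\) for the Christoffel symbols of a metric (the text derives the explicit form later; here it is simply taken as given) and applies it, with finite differences, to the sphere metric \(ds^2=R^2(d\theta^2+\sin^2\theta\,d\phi^2)\), where \(\theta\) is the polar angle. The exact result for this metric is \(\Gamma^\theta_{\phi\phi}=-\sin\theta\cos\theta\).

```python
import math

H = 1e-6  # finite-difference step

def sphere_metric(theta, phi, radius):
    """g_ab for a sphere in (theta, phi); theta is the polar angle, 0 at the north pole."""
    return [[radius**2, 0.0], [0.0, (radius * math.sin(theta)) ** 2]]

def christoffel(a, b, c, theta, phi, radius, metric=sphere_metric):
    """Gamma^a_{bc} = (1/2) g^{ad} (d_b g_dc + d_c g_db - d_d g_bc), by central differences.
    Coordinate index 0 is theta and 1 is phi."""
    def dg(k, i, j):
        dt, dp = (H, 0.0) if k == 0 else (0.0, H)
        plus = metric(theta + dt, phi + dp, radius)[i][j]
        minus = metric(theta - dt, phi - dp, radius)[i][j]
        return (plus - minus) / (2 * H)

    g = metric(theta, phi, radius)
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    g_inv = [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]
    return 0.5 * sum(
        g_inv[a][d] * (dg(b, d, c) + dg(c, d, b) - dg(d, b, c)) for d in range(2)
    )

THETA, PHI = 0, 1
cases = [("northern", math.pi / 4), ("equator", math.pi / 2), ("southern", 3 * math.pi / 4)]
for label, theta in cases:
    gamma = christoffel(THETA, PHI, PHI, theta, 0.0, radius=1.0)
    print(f"{label:9s} Gamma^theta_phiphi = {gamma:+.4f}")
```

The value is negative in the northern hemisphere, zero at the equator, and positive in the southern hemisphere, and changing \(\phi\) or the radius leaves it unchanged.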
Example 9 is in two spatial dimensions. In spacetime, \(\Gamma\) is essentially the gravitational field (see problem 6, p. 200), and early papers in relativity essentially refer to it that way.^{9} This may feel like a joyous reunion with our old friend from freshman mechanics, \(g=9.8\ \text{m}/\text{s}^2\). But our old friend has changed. In Newtonian mechanics, accelerations like \(g\) are frame-invariant (considering only inertial frames, which are the only legitimate ones in that theory). In general relativity they are frame-dependent, and as we saw on page 176, the acceleration of gravity can be made to equal anything we like, based on our choice of a frame of reference.
To compute the covariant derivative of a higher-rank tensor, we just add more correction terms, e.g.,
\[\nabla_a U_{bc}=\partial_a U_{bc}-\Gamma^d_{ab}U_{dc}-\Gamma^d_{ac}U_{bd}.\]
With the partial derivative \(\partial_\mu\), it does not make sense to use the metric to raise the index and form \(\partial^\mu\). It does make sense to do so with covariant derivatives, so \(\nabla^a = g^{ab} \nabla_b\) is a correct identity.
Some authors use subscripts with commas and semicolons to indicate partial and covariant derivatives. The following equations give equivalent notations for the same derivatives:
\[\partial_\mu X_\nu=X_{\nu,\mu}\]
\[\nabla_\mu X_\nu=X_{\nu;\mu}\]
Figure e shows two examples of the corresponding birdtracks notation. Because birdtracks are meant to be manifestly coordinate-independent, they do not have a way of expressing non-covariant derivatives. We no longer want to use the circle as a notation for a non-covariant gradient as we did when we first introduced it on p. 48.
A geodesic can be defined as a world-line that preserves tangency under parallel transport, a. This is essentially a mathematical way of expressing the notion that we have previously expressed more informally in terms of “staying on course” or moving “inertially.”
A curve can be specified by giving functions \(x^i(\lambda)\) for its coordinates, where \(\lambda\) is a real parameter. A vector lying tangent to the curve can then be calculated using partial derivatives, \(T^i=\partial x^i/\partial\lambda\). There are three ways in which a vector function of \(\lambda\) could change: (1) it could change for the trivial reason that the metric is changing, so that its components changed when expressed in the new metric; (2) it could change its components perpendicular to the curve; or (3) it could change its component parallel to the curve. Possibility 1 should not really be considered a change at all, and the definition of the covariant derivative is specifically designed to be insensitive to this kind of thing. 2 cannot apply to \(T^i\), which is tangent by construction. It would therefore be convenient if \(T^i\) happened to be always the same length. If so, then 3 would not happen either, and we could reexpress the definition of a geodesic by saying that the covariant derivative of \(T^i\) was zero. For this reason, we will assume for the remainder of this section that the parametrization of the curve has this property. In a Newtonian context, we could imagine the \(x^i\) to be purely spatial coordinates, and \(\lambda\) to be a universal time coordinate. We would then interpret \(T^i\) as the velocity, and the restriction would be to a parametrization describing motion with constant speed. In relativity, the restriction is that \(\lambda\) must be an affine parameter. For example, it could be the proper time of a particle, if the curve in question is timelike.
The notation of section 5.6 is not quite adapted to our present purposes, since it allows us to express a covariant derivative with respect to one of the coordinates, but not with respect to a parameter such as \(\lambda\). We would like to notate the covariant derivative of \(T^i\) with respect to \(\lambda\) as \(\nabla_\lambda T^i\), even though \(\lambda\) isn't a coordinate. To connect the two types of derivatives, we can use a total derivative. To make the idea clear, here is how we calculate a total derivative for a scalar function \(f(x,y)\), without tensor notation:
\[\frac{df}{d\lambda}=\frac{\partial f}{\partial x}\frac{dx}{d\lambda}+\frac{\partial f}{\partial y}\frac{dy}{d\lambda}.\]
This is just the generalization of the chain rule to a function of two variables. For example, if \(\lambda\) represents time and \(f\) temperature, then this would tell us the rate of change of the temperature as a thermometer was carried through space. Applying this to the present problem, we express the total covariant derivative as
\[\nabla_\lambda T^i=\left(\nabla_b T^i\right)\frac{dx^b}{d\lambda}=\left(\partial_b T^i+\Gamma^i_{bc}T^c\right)\frac{dx^b}{d\lambda}.\]
Recognizing \(\partial_b T^i dx^b/d\lambda\) as a total non-covariant derivative, we find
\[\nabla_\lambda T^i=\frac{dT^i}{d\lambda}+\Gamma^i_{bc}T^c\frac{dx^b}{d\lambda}.\]
Substituting \(\partial x^i/\partial\lambda\) for \(T^i\), and setting the covariant derivative equal to zero, we obtain
\[\frac{d^2x^i}{d\lambda^2}+\Gamma^i_{bc}\frac{dx^b}{d\lambda}\frac{dx^c}{d\lambda}=0.\]
This is known as the geodesic equation.
If this differential equation is satisfied for one affine parameter \(\lambda\), then it is also satisfied for any other affine parameter \(\lambda'=a\lambda+b\), where \(a\) and \(b\) are constants (problem 4). Recall that affine parameters are only defined along geodesics, not along arbitrary curves. We can't start by defining an affine parameter and then use it to find geodesics using this equation, because we can't define an affine parameter without first specifying a geodesic. Likewise, we can't do the geodesic first and then the affine parameter, because if we already had a geodesic in hand, we wouldn't need the differential equation in order to find a geodesic. The solution to this chicken-and-egg conundrum is to write down the differential equations and try to find a solution, without trying to specify either the affine parameter or the geodesic in advance. We will seldom have occasion to resort to this technique, an exception being example 19 on page 316.
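The geodesic equation is useful in establishing one of the necessary theoretical foundations of relativity, which is the uniqueness of geodesics for a given set of initial conditions. This is related to axiom O1 of ordered geometry, that two points determine a line, and is necessary physically for the reasons discussed on page 22; briefly, if the geodesic were not uniquely determined, then particles would have no way of deciding how to move. The form of the geodesic equation guarantees uniqueness. To see this, consider the following algorithm for determining a numerical approximation to a geodesic:

1. Initialize \(\lambda\), \(x^i\), and \(dx^i/d\lambda\).
2. Calculate \(d^2x^i/d\lambda^2\) from the geodesic equation.
3. Increment \(\lambda\) by a small amount \(d\lambda\), and update \(x^i\) and \(dx^i/d\lambda\) according to \(x^i\rightarrow x^i+(dx^i/d\lambda)\,d\lambda\) and \(dx^i/d\lambda\rightarrow dx^i/d\lambda+(d^2x^i/d\lambda^2)\,d\lambda\).
4. Repeat steps 2 and 3 until the geodesic has been extended as far as desired.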
Since the result of the calculation depends only on the inputs at step 1, we find that the geodesic is uniquely determined.
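This stepping procedure can be sketched in Python for a unit sphere. The sketch assumes the sphere's Christoffel symbols \(\Gamma^\theta_{\phi\phi}=-\sin\theta\cos\theta\) and \(\Gamma^\phi_{\theta\phi}=\Gamma^\phi_{\phi\theta}=\cot\theta\) (standard results, quoted rather than derived here), so that the geodesic equation \(d^2x^i/d\lambda^2+\Gamma^i_{bc}(dx^b/d\lambda)(dx^c/d\lambda)=0\) becomes two ordinary differential equations. Starting from initial values of \(x^i\) and \(dx^i/d\lambda\), each pass computes the second derivatives from the geodesic equation and takes a forward-Euler step of size \(d\lambda\). Nothing in the loop involves a choice, which is the point of the uniqueness argument. As a check, the computed path should stay on a great circle.

```python
import math

def second_derivatives(theta, dtheta, dphi):
    """d^2x^i/dlambda^2 from the geodesic equation on a unit sphere, using
    Gamma^theta_{phi phi} = -sin(theta) cos(theta) and
    Gamma^phi_{theta phi} = Gamma^phi_{phi theta} = cos(theta) / sin(theta)."""
    ddtheta = math.sin(theta) * math.cos(theta) * dphi**2
    ddphi = -2 * math.cos(theta) / math.sin(theta) * dtheta * dphi
    return ddtheta, ddphi

def geodesic(theta, phi, dtheta, dphi, lam_max, dlam=1e-4):
    """Start from the initial data and take forward-Euler steps in lambda."""
    for _ in range(round(lam_max / dlam)):
        ddtheta, ddphi = second_derivatives(theta, dtheta, dphi)
        theta, phi = theta + dtheta * dlam, phi + dphi * dlam
        dtheta, dphi = dtheta + ddtheta * dlam, dphi + ddphi * dlam
    return theta, phi, dtheta, dphi

def to_cartesian(theta, phi):
    return (math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi), math.cos(theta))

def tangent_cartesian(theta, phi, dtheta, dphi):
    """d/dlambda of to_cartesian(theta, phi)."""
    return (
        math.cos(theta) * math.cos(phi) * dtheta - math.sin(theta) * math.sin(phi) * dphi,
        math.cos(theta) * math.sin(phi) * dtheta + math.sin(theta) * math.cos(phi) * dphi,
        -math.sin(theta) * dtheta,
    )

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

start = (1.0, 0.0, 0.3, 0.9)          # theta, phi, dtheta/dlambda, dphi/dlambda
end = geodesic(*start, lam_max=1.0)

# A great circle lies in the plane through the center spanned by the
# initial position and velocity, so this distance should be close to zero.
normal = cross(to_cartesian(*start[:2]), tangent_cartesian(*start))
print("distance from great-circle plane:", dot(normal, to_cartesian(*end[:2])))
```

Forward Euler is the simplest possible choice; a higher-order integrator would converge faster, but it would not change the conclusion that the output depends only on the initial data.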
To see that this is really a valid way of proving uniqueness, it may be helpful to consider how the proof could have failed. Omitting some of the details of the tensors and the multidimensionality of the space, the form of the geodesic equation is essentially \(\ddot{x}+f\dot{x}^2=0\), where dots indicate derivatives with respect to \(\lambda\). Suppose that it had instead had the form \(\ddot{x}^2+f\dot{x}=0\). Then at step 2 we would have had to pick either a positive or a negative square root for \(\ddot{x}\). Although continuity would usually suffice to maintain a consistent sign from one iteration to the next, that would not work if we ever came to a point where \(\ddot{x}\) vanished momentarily. An equation of this form therefore would not have a unique solution for a given set of initial conditions.
The practical use of this algorithm to compute geodesics numerically is demonstrated in section 5.9.2 on page 188.
Self-check: Interpret the mathematical meaning of the equation \(\Gamma^a_{[bc]}=0\), which is expressed in the notation introduced on page 102.
It seems clear that something like the covariant derivative is needed for vectors, since they have a direction in spacetime, and thus their measures vary when the measure of spacetime itself varies. Since scalars don't have a direction in spacetime, the same reasoning doesn't apply to them, and this is reflected in our rules for covariant derivatives. The covariant derivative has one \(\Gamma\) term for every index of the tensor being differentiated, so for a scalar there should be no \(\Gamma\) terms at all, i.e., \(\nabla_a\) is the same as \(\partial_a\).
But just because derivatives of scalars don't require special treatment for this particular reason, that doesn't mean they are guaranteed to behave as we intuitively expect, in the strange world of coordinate-invariant relativity.
One possible way for scalars to behave counterintuitively would be by analogy with parallel transport of vectors. If we stick a vector in a box (as with, e.g., the gyroscopes aboard Gravity Probe B) and carry it around a closed loop, it changes. Could the same happen with a scalar? This is extremely counterintuitive, since there is no reason to imagine such an effect in any of the models we've constructed of curved spaces. In fact, it is not just counterintuitive but mathematically impossible, according to the following argument. The only reason we can interpret the vector-in-a-box effect as arising from the geometry of spacetime is that it applies equally to all vectors. If, for example, it only applied to the magnetic polarization vectors of ferromagnetic substances, then we would interpret it as a magnetic field living in spacetime, not a property of spacetime itself. If the value of a scalar-in-a-box was path-dependent, and this path-dependence was a geometric property of spacetime, then it would have to apply to all scalars, including, say, masses and charges of particles. Thus if an electron's mass increased by 1% when transported in a box along a certain path, its charge would have to increase by 1% as well. But then its charge-to-mass ratio would remain invariant, and this is a contradiction, since the charge-to-mass ratio is also a scalar, and should have felt the same 1% effect. Since the varying scalar-in-a-box idea leads to a contradiction, it wasn't a coincidence that we couldn't find a model that produced such an effect; a theory that lacks self-consistency doesn't have any models.
Self-check: Explain why parallel transporting a vector can only rotate it, not change its magnitude.
There is, however, a different way in which scalars could behave counterintuitively, and this one is mathematically self-consistent. Suppose that Helen lives in two spatial dimensions and owns a thermometer. She wants to measure the spatial variation of temperature, in particular its mixed second derivative \(\partial^2 T/\partial x\partial y\). At home in the morning at point A, she prepares by calibrating her gyrocompass to point north and measuring the temperature. Then she travels \(\ell=1\) km east along a geodesic to B, consults her gyrocompass, and turns north. She continues one kilometer north to C, samples the change in temperature \(\Delta T_1\) relative to her home, and then retraces her steps to come home for lunch. In the afternoon, she checks her work by carrying out the same process, but this time she interchanges the roles of north and east, traveling along ADE. If she were living in a flat space, this would form the other two sides of a square, and her afternoon temperature sample \(\Delta T_2\) would be at the same point in space C as her morning sample. She actually doesn't recognize the landscape, so the sample points C and E are different, but this just confirms what she already knew: the space isn't flat.^{10}
None of this seems surprising yet, but there are now two qualitatively different ways that her analysis of her data could turn out, indicating qualitatively different things about the laws of physics in her universe. The definition of the derivative as a limit requires that she repeat the experiment at smaller scales. As \(\ell\rightarrow 0\), the result for \(\partial^2 T/\partial x\partial y\) should approach a definite limit, and the error should diminish in proportion to \(\ell\). In particular the difference between the results inferred from \(\Delta T_1\) and \(\Delta T_2\) indicates an error, and the discrepancy between the second derivatives inferred from them should shrink appropriately as \(\ell\) shrinks. Suppose this doesn't happen. Since partial derivatives commute, we conclude that her measuring procedure is not the same as a partial derivative. Let's call her measuring procedure \(\nabla\), so that she is observing a discrepancy between \(\nabla_x\nabla_y\) and \(\nabla_y\nabla_x\). The fact that the commutator \(\nabla_x\nabla_y-\nabla_y\nabla_x\) doesn't vanish cannot be explained by the Christoffel symbols, because what she's differentiating is a scalar. Since the discrepancy arises entirely from the failure of \(\Delta T_1-\Delta T_2\) to scale down appropriately, the conclusion is that the distance \(\delta\) between the two sampling points is not scaling down as quickly as we expect. In our familiar models of two-dimensional spaces as surfaces embedded in three-space, we always have \(\delta\sim\ell^3\) for small \(\ell\), but she has found that it only shrinks as quickly as \(\ell^2\).
For a clue as to what is going on, note that the commutator \(\nabla_x\nabla_y-\nabla_y\nabla_x\) has a particular handedness to it. For example, it flips its sign under a reflection across the line \(y=x\). When we “parallel”-transport vectors, they aren't actually staying parallel. In this hypothetical universe, a vector in a box transported by a small distance \(\ell\) rotates by an angle proportional to \(\ell\). This effect is called torsion. Although no torsion effect shows up in our familiar models, that is not because torsion lacks self-consistency. Models of spaces with torsion do exist. In particular, we can see that torsion doesn't lead to the same kind of logical contradiction as the varying-scalar-in-a-box idea. Since all vectors twist by the same amount when transported, inner products are preserved, so it is not possible to put two vectors in one box and get the scalar-in-a-box paradox by watching their inner product change when the box is transported.
Note that the elbows ABC and ADE are not right angles. If Helen had brought a pair of gyrocompasses with her, one for \(x\) and one for \(y\), she would have found that the right angle between the gyrocompasses was preserved under parallel transport, but that a gyrocompass initially tangent to a geodesic did not remain so. There are in fact two inequivalent definitions of a geodesic in a space with torsion. The shortest path between two points is not necessarily the same as the straightest possible path, i.e., the one that parallel-transports its own tangent vector.
Since torsion is odd under parity, it must be represented by an odd-rank tensor, which we call \(\tau^c_{ab}\) and define according to
\[\left(\nabla_a\nabla_b-\nabla_b\nabla_a\right)f=-\tau^c_{ab}\nabla_c f,\]
where \(f\) is any scalar field, such as the temperature in the preceding section. There are two different ways in which a space can be non-Euclidean: it can have curvature, or it can have torsion. For a full discussion of how to handle the mathematics of a spacetime with both curvature and torsion, see the article by Steuard Jensen at http://www.slimy.com/~steuard/teaching/tutorials/GRtorsion.pdf. For our present purposes, the main mathematical fact worth noting is that vanishing torsion is equivalent to the symmetry \(\Gamma^a_{bc}=\Gamma^a_{cb}\) of the Christoffel symbols. Using the notation introduced on page 102, \(\Gamma^a_{[bc]}=0\) if \(\tau=0\).
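The connection between torsion and the symmetry of the Christoffel symbols can be illustrated with a short Python sketch. It assumes the definition \((\nabla_a\nabla_b-\nabla_b\nabla_a)f=-\tau^c_{ab}\nabla_c f\) together with the lower-index rule \(\nabla_a w_b=\partial_a w_b-\Gamma^c_{ab}w_c\); with these sign conventions \(\tau^c_{ab}=\Gamma^c_{ab}-\Gamma^c_{ba}\) (other books use other signs). The scalar field and the \(\Gamma\) values at the sample point are hypothetical numbers chosen only for illustration. Because the mixed partial derivatives cancel, the commutator picks out only the antisymmetric part of \(\Gamma\).

```python
import math

H = 1e-4  # finite-difference step

def f(x, y):
    """A hypothetical scalar field, e.g. a temperature map."""
    return math.sin(x) * math.exp(0.5 * y)

def partial(func, a):
    """Return the function d_a func, with coordinate index 0 = x and 1 = y."""
    def df(x, y):
        dx, dy = (H, 0.0) if a == 0 else (0.0, H)
        return (func(x + dx, y + dy) - func(x - dx, y - dy)) / (2 * H)
    return df

def nabla_nabla(gamma, a, b, x, y):
    """nabla_a nabla_b f, using nabla_a w_b = d_a w_b - Gamma^c_{ab} w_c with w_b = d_b f.
    gamma[c][a][b] stores Gamma^c_{ab} at the point (x, y)."""
    result = partial(partial(f, b), a)(x, y)
    for c in range(2):
        result -= gamma[c][a][b] * partial(f, c)(x, y)
    return result

def commutator(gamma, x, y):
    return nabla_nabla(gamma, 0, 1, x, y) - nabla_nabla(gamma, 1, 0, x, y)

def torsion(gamma):
    """tau^c_{ab} = Gamma^c_{ab} - Gamma^c_{ba}, under the sign conventions assumed above."""
    return [[[gamma[c][a][b] - gamma[c][b][a] for b in range(2)] for a in range(2)]
            for c in range(2)]

symmetric = [[[0.2, 0.5], [0.5, -0.1]], [[0.3, 0.0], [0.0, 0.4]]]
twisted = [[[0.2, 0.7], [0.3, -0.1]], [[0.3, -0.2], [0.1, 0.4]]]

x0, y0 = 0.4, -0.3
for name, gamma in [("symmetric", symmetric), ("with torsion", twisted)]:
    tau = torsion(gamma)
    predicted = -sum(tau[c][0][1] * partial(f, c)(x0, y0) for c in range(2))
    print(f"{name}: commutator = {commutator(gamma, x0, y0):+.6f}, "
          f"-tau^c_xy d_c f = {predicted:+.6f}")
```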
Self-check: Use an argument similar to the one in example 5 on page 166 to prove that no model of a two-space embedded in a three-space can have torsion.
Generalizing to more dimensions, the torsion tensor is odd under the full spacetime reflection \(x_a \rightarrow -x_a\), i.e., a parity inversion plus a time-reversal, PT.
In the story above, we had a torsion that didn't preserve tangent vectors. In three or more dimensions, however, it is possible to have torsion that does preserve tangent vectors. For example, transporting a vector along the \(x\) axis could cause only a rotation in the \(y\)-\(z\) plane. This relates to the symmetries of the torsion tensor, which for convenience we'll write in an \(x\)-\(y\)-\(z\) coordinate system and in the fully covariant form \(\tau_{\lambda\mu\nu}\). The definition of the torsion tensor implies \(\tau_{\lambda(\mu\nu)}=0\), i.e., that the torsion tensor is antisymmetric in its two final indices. Torsion that does not preserve tangent vectors will have nonvanishing elements such as \(\tau_{xxy}\), meaning that parallel-transporting a vector along the \(x\) axis can change its \(x\) component. Torsion that preserves tangent vectors will have vanishing \(\tau_{\lambda\mu\nu}\) unless \(\lambda\), \(\mu\), and \(\nu\) are all distinct. This is an example of the type of antisymmetry that is familiar from the vector cross product, in which the cross products of the basis vectors behave as \(\mathbf{x}\times\mathbf{y}=\mathbf{z}\), \(\mathbf{y}\times\mathbf{z}=\mathbf{x}\), \(\mathbf{z}\times\mathbf{x}=\mathbf{y}\). Generalizing the notation for symmetrization and antisymmetrization of tensors from page 102, we have
\[\tau_{(\lambda\mu\nu)}=\frac{1}{3!}\sum\tau_{\lambda\mu\nu}\]
\[\tau_{[\lambda\mu\nu]}=\frac{1}{3!}\sum\epsilon^{\lambda\mu\nu}\tau_{\lambda\mu\nu},\]
where the sums are over all permutations of the indices, and in the second line we have used the Levi-Civita symbol, which supplies the sign of each permutation. In this notation, a totally antisymmetric torsion tensor is one with \(\tau_{\lambda\mu\nu}=\tau_{[\lambda\mu\nu]}\), and torsion of this type preserves tangent vectors under translation.
In two dimensions, there are no totally antisymmetric objects with three indices, because we can't write three indices without repeating one. In three dimensions, an antisymmetric object with three indices is simply a multiple of the Levi-Civita tensor, so a totally antisymmetric torsion, if it exists, is represented by a single number; under translation, vectors rotate like either right-handed or left-handed screws, and this number tells us the rate of rotation. In four dimensions, we have four independently variable quantities, \(\tau_{xyz}\), \(\tau_{tyz}\), \(\tau_{txz}\), and \(\tau_{txy}\). In other words, an antisymmetric torsion of 3+1 spacetime can be represented by a four-vector, \(\tau^a=\epsilon^{abcd}\tau_{bcd}\).
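The dimension counting above can be checked with a short Python sketch. It antisymmetrizes an arbitrary three-index array by averaging over the \(3!\) orderings of its indices, each weighted by the sign of the permutation (the role played by the Levi-Civita symbol), and then counts the independent components that survive. The random input array is hypothetical; any generic array would do.

```python
import random
from itertools import combinations, permutations, product

def parity(perm):
    """+1 for an even permutation of (0, ..., n-1), -1 for an odd one (counts inversions)."""
    sign = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                sign = -sign
    return sign

def antisymmetrize(t, n):
    """tau_[lmn]: average over the 3! orderings of the indices, weighted by each sign."""
    result = {}
    for idx in product(range(n), repeat=3):
        total = sum(parity(p) * t[tuple(idx[k] for k in p)] for p in permutations(range(3)))
        result[idx] = total / 6
    return result

def independent_components(a, n, tol=1e-12):
    """Number of index triples l < m < nu whose antisymmetrized component is nonzero."""
    return sum(1 for idx in combinations(range(n), 3) if abs(a[idx]) > tol)

rng = random.Random(0)
for n in (2, 3, 4):
    t = {idx: rng.uniform(-1, 1) for idx in product(range(n), repeat=3)}
    a = antisymmetrize(t, n)
    print(n, "dimensions:", independent_components(a, n), "independent components")
```

For a generic input the count is 0, 1, and 4 in two, three, and four dimensions, the number of ways of choosing three distinct indices, and in three dimensions every component equals \(\pm\) the single \(xyz\) component, i.e., the result is a multiple of the Levi-Civita tensor.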
One way of stating the equivalence principle (see p. 142) is that it forbids spacetime from coming equipped with a vector field that could be measured by free-falling observers, i.e., observers in local Lorentz frames. A variety of high-precision tests of the equivalence principle have been carried out. From the point of view of an experimenter doing this kind of test, it is important to distinguish between fields that are “built in” to spacetime and those that live in spacetime. For example, the existence of the earth's magnetic field does not violate the equivalence principle, but if an experiment was sensitive to the earth's field, and the experimenter didn't know about it, there would appear to be a violation. Antisymmetric torsion in four dimensions acts like a vector. If it constitutes a universal background effect built into spacetime, then it violates the equivalence principle. If it instead arises from specific material sources, then it may still show up as a measurable effect in experimental tests designed to detect violations of Lorentz invariance. Let's consider the latter possibility.
Since curvature in general relativity comes from mass and energy, as represented by the stress-energy tensor \(T_{ab}\), we could ask what would be the sources of torsion, if it exists in our universe. The source can't be the rank-2 stress-energy tensor. It would have to be an odd-rank tensor, i.e., a quantity that is odd under PT, and in theories that include torsion it is commonly assumed that the source is the quantum-mechanical angular momentum of subatomic particles. If this is the case, then torsion effects are expected to be proportional to \(\hbar G\), the product of Planck's constant and the gravitational constant, and they should therefore be extremely small and hard to measure. String theory, for example, includes torsion, but nobody has found a way to test string theory empirically because it essentially makes predictions about phenomena at the Planck scale, \(\sqrt{\hbar G/c^3} \sim 10^{-35}\ \text{m}\), where both gravity and quantum mechanics are strong effects.
There are, however, some high-precision experiments that have a reasonable chance of detecting whether our universe has torsion. Torsion violates the equivalence principle, and by the turn of the century tests of the equivalence principle had reached a level of precision sufficient to rule out some models that include torsion. Figure d shows a torsion pendulum used in an experiment by the Eöt-Wash group at the University of Washington.^{11} If torsion exists, then the intrinsic spin \(\boldsymbol{\sigma}\) of an electron should have an energy \(\boldsymbol{\sigma}\cdot\boldsymbol{\tau}\), where \(\boldsymbol{\tau}\) is the spacelike part of the torsion vector. The torsion could be generated by the earth, the sun, or some other object at a greater distance. The interaction \(\boldsymbol{\sigma}\cdot\boldsymbol{\tau}\) will modify the behavior of a torsion pendulum if the spins of the electrons in the pendulum are polarized nonrandomly, as in a magnetic material. The pendulum will tend to precess around the axis defined by \(\boldsymbol{\tau}\).
This type of experiment is extremely difficult, because the pendulum tends to act as an ultra-sensitive magnetic compass, resulting in a measurement of the ambient magnetic field rather than the hypothetical torsion field \(\boldsymbol{\tau}\). To eliminate this source of systematic error, the UW group first reduced the ambient magnetic field as much as possible, using mu-metal shielding and Helmholtz coils. They also constructed the pendulum out of a combination of two magnetic materials, Alnico 5 and \(\text{Sm}\text{Co}_5\), in such a way that the magnetic dipole moment vanished, but the spin dipole moment did not; Alnico 5's magnetic field is due almost entirely to electron spin, whereas the magnetic field of \(\text{Sm}\text{Co}_5\) contains significant contributions from orbital motion. The result was a nonmagnetic object whose spins were polarized. After four years of data collection, they found \(|\boldsymbol{\tau}|\lesssim 10^{-21}\ \text{eV}\). Models that include torsion typically predict such effects to be of the order of \(m_e^2/m_P \sim 10^{-17}\ \text{eV}\), where \(m_e\) is the mass of the electron and \(m_P=\sqrt{\hbar c/G}\approx10^{19}\ \text{GeV}\approx 20\ \mu\text{g}\) is the Planck mass. A wide class of these models is therefore ruled out by these experiments.
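The numbers in the preceding paragraph can be reproduced with a few lines of arithmetic. The sketch below assumes CODATA values for the constants and the electron's rest energy; it only checks orders of magnitude, and the comparison uses the rounded bound \(10^{-21}\ \text{eV}\) quoted above.

```python
import math

# CODATA 2018 values (assumed here, not taken from the text)
HBAR = 1.054571817e-34      # reduced Planck constant, J s
G = 6.67430e-11             # gravitational constant, m^3 kg^-1 s^-2
C = 2.99792458e8            # speed of light, m/s
J_PER_EV = 1.602176634e-19  # joules per electron-volt
M_E_EV = 0.51099895e6       # electron rest energy, eV

m_planck_kg = math.sqrt(HBAR * C / G)
m_planck_gev = m_planck_kg * C**2 / J_PER_EV / 1e9
m_planck_ev = m_planck_gev * 1e9

# order-of-magnitude torsion effect m_e^2/m_P expected in typical models
estimate_ev = M_E_EV**2 / m_planck_ev
bound_ev = 1e-21            # rounded experimental bound quoted in the text

print(f"Planck mass: {m_planck_gev:.2e} GeV = {m_planck_kg * 1e9:.1f} micrograms")
print(f"m_e^2/m_P: {estimate_ev:.1e} eV, about {estimate_ev / bound_ev:.0e} times the bound")
```

The predicted effect comes out roughly four orders of magnitude above the experimental limit, which is why the experiment is able to exclude such models.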
Since there appears to be no experimental evidence for the existence of gravitational torsion in our universe, we will assume from now on that it vanishes identically. Einstein made the same assumption when he originally created general relativity, although he and Cartan later tinkered with non-torsion-free theories in a failed attempt to unify gravity with electromagnetism. Some models that include torsion remain viable. For example, it has been argued that the torsion tensor should fall off quickly with distance from the source.^{12}
We've already found the Christoffel symbol in terms of the metric in one dimension. Expressing it in tensor notation, we have
where inversion of the one-component matrix \(G\) has been replaced by matrix inversion, and, more importantly, the question marks indicate that there would be more than one way to place the subscripts so that the result would be a grammatical tensor equation. The most general form for the Christoffel symbol would be
where \(L\), \(M\), and \(N\) are constants. Consistency with the one-dimensional expression requires \(L+M+N=1\), and vanishing torsion gives \(L=M\). The \(L\) and \(M\) terms have a different physical significance than the \(N\) term.
Suppose an observer uses coordinates such that all objects are described as lengthening over time, and the change of scale accumulated over one day is a factor of \(k>1\). This is described by the derivative \(\partial_t g_{xx}\lt0\), which affects the \(M\) term. Since the metric is used to calculate squared distances, the \(g_{xx}\) matrix element scales down by a factor of \(1/k^2\). To compensate for \(\partial_t v^x\lt0\), we need to add a positive correction term, \(M>0\), to the covariant derivative. When the same observer measures the rate of change of a vector \(v^t\) with respect to space, the rate of change comes out to be too small, because the variable she differentiates with respect to is too big. This requires \(N\lt0\), and the correction is of the same size as the \(M\) correction, so \(|M|=|N|\). Combining \(M=-N\) with \(L=M\) and \(L+M+N=1\), we find \(L=M=-N=1\).
Self-check: Does the above argument depend on the use of space for one coordinate and time for the other?
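The resulting general expression for the Christoffel symbol in terms of the metric is
\[\Gamma^c_{ab}=\frac{1}{2}g^{cd}\left(\partial_a g_{bd}+\partial_b g_{ad}-\partial_d g_{ab}\right).\]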
One can readily go back and check that this gives \(\nabla_c g_{ab}=0\). In practice, though, the calculation is a bit tedious. For that matter, tensor calculations in general can be infamously time-consuming and error-prone. Any reasonable person living in the 21st century will therefore resort to a computer algebra system. The most widely used computer algebra system is Mathematica, but it's expensive and proprietary, and it doesn't have extensive built-in facilities for handling tensors. It turns out that there is quite a bit of free and open-source tensor software, and it falls into two classes: coordinate-based and coordinate-independent. The best open-source coordinate-independent facility available appears to be Cadabra, and in fact the verification of \(\nabla_c g_{ab}=0\) is the first example given in Leo Brewin's handy guide to applications of Cadabra to general relativity.^{13}
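For readers without a computer algebra system, a purely numerical spot check is also possible. The sketch below is a hypothetical illustration, not Brewin's Cadabra example: it assumes the standard form \(\Gamma^c_{ab}=\frac{1}{2}g^{cd}(\partial_a g_{bd}+\partial_b g_{ad}-\partial_d g_{ab})\), approximates \(\partial_c g_{ab}\) by central differences for the metric of a sphere of radius \(R\), and evaluates \(\nabla_c g_{ab}=\partial_c g_{ab}-\Gamma^d_{ca}g_{db}-\Gamma^d_{cb}g_{ad}\) at one point. Because the cancellation is algebraic, a correct formula gives zero up to rounding error, whereas a misplaced index or sign would generally leave a visibly nonzero residue.

```python
import math

R = 2.0   # sphere radius (arbitrary choice)
H = 1e-6  # finite-difference step

def metric(x):
    """g_ab on a sphere of radius R, coordinates x = (theta, phi)."""
    theta = x[0]
    return [[R**2, 0.0], [0.0, (R * math.sin(theta))**2]]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def metric_derivs(x):
    """dg[c][a][b] = partial_c g_ab, by central differences."""
    dg = []
    for c in range(2):
        xp, xm = list(x), list(x)
        xp[c] += H
        xm[c] -= H
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[a][b] - gm[a][b]) / (2 * H) for b in range(2)] for a in range(2)])
    return dg

def christoffel(x, dg):
    """gam[c][a][b] = Gamma^c_ab = (1/2) g^cd (d_a g_bd + d_b g_ad - d_d g_ab)."""
    ginv = inverse_2x2(metric(x))
    return [[[0.5 * sum(ginv[c][d] * (dg[a][b][d] + dg[b][a][d] - dg[d][a][b])
                        for d in range(2))
              for b in range(2)] for a in range(2)] for c in range(2)]

def nabla_metric(x):
    """nab[c][a][b] = d_c g_ab - Gamma^d_ca g_db - Gamma^d_cb g_ad."""
    g, dg = metric(x), metric_derivs(x)
    gam = christoffel(x, dg)
    return [[[dg[c][a][b]
              - sum(gam[d][c][a] * g[d][b] for d in range(2))
              - sum(gam[d][c][b] * g[a][d] for d in range(2))
              for b in range(2)] for a in range(2)] for c in range(2)]

worst = max(abs(v) for block in nabla_metric((0.7, 1.3)) for row in block for v in row)
print(f"largest |nabla_c g_ab| = {worst:.1e}")
```

A check at a single point is of course not a proof, but it is a cheap way to catch index errors before attempting the general calculation.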
Self-check: In the case of 1 dimension, show that this reduces to the earlier result of \(-(1/2)dG/dX\).
Since \(\Gamma\) is not a tensor, it is not obvious that the covariant derivative, which is constructed from it, is a tensor. But if it isn't obvious, neither is it surprising -- the goal of the above derivation was to get results that would be coordinate-independent.
The metric on a sphere is \(ds^2=R^2d\theta^2+R^2\sin^2\theta\,d\phi^2\). The only nonvanishing term in the expression for \(\Gamma^\theta_{\phi\phi}\) is the one involving \(\partial_\theta g_{\phi\phi}=2R^2\sin\theta\cos\theta\). The result is \(\Gamma^\theta_{\phi\phi}=-\sin\theta\cos\theta\), which can be verified to have the properties claimed above.
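The same result can be checked numerically. This sketch assumes the standard formula \(\Gamma^c_{ab}=\frac{1}{2}g^{cd}(\partial_a g_{bd}+\partial_b g_{ad}-\partial_d g_{ab})\), specialized to the diagonal sphere metric, whose only nonzero derivative is \(\partial_\theta g_{\phi\phi}\). It also evaluates \(\Gamma^\phi_{\theta\phi}\), which by the same reasoning should equal \(\cot\theta\). The radius and the sample angle are arbitrary choices.

```python
import math

R = 3.0    # sphere radius; the results should not depend on it
H = 1e-6   # finite-difference step

def g(theta):
    """Diagonal components (g_theta_theta, g_phi_phi) of the sphere metric."""
    return R**2, (R * math.sin(theta))**2

def sphere_christoffels(theta):
    """Gamma^theta_{phi phi} and Gamma^phi_{theta phi}, using the fact that
    d_theta g_phi_phi is the only nonzero derivative of the metric."""
    g_thth, g_phph = g(theta)
    dg_phph = (g(theta + H)[1] - g(theta - H)[1]) / (2 * H)
    gamma_th_phph = -0.5 * dg_phph / g_thth   # only the -d_d g_ab term contributes
    gamma_ph_thph = 0.5 * dg_phph / g_phph    # only the d_a g_bd term contributes
    return gamma_th_phph, gamma_ph_thph

theta = 0.9
num_th, num_ph = sphere_christoffels(theta)
print(num_th, -math.sin(theta) * math.cos(theta))   # Gamma^theta_{phi phi}
print(num_ph, 1 / math.tan(theta))                  # Gamma^phi_{theta phi}
```

The factors of \(R^2\) cancel in both symbols, which is why changing `R` leaves the output unchanged.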
```python
import math

l = 0 # affine parameter lambda
dl = .001 # change in l with each iteration
l_max = 100.

# initial position:
r=1
phi=0
# initial derivatives of coordinates w.r.t. lambda
vr = 0
vphi = 1

k = 0 # keep track of how often to print out updates
while l<l_max:
    l = l+dl
    # Christoffel symbols:
    Grphiphi = -r
    Gphirphi = 1/r
    # second derivatives:
    ar = -Grphiphi*vphi*vphi
    aphi = -2.*Gphirphi*vr*vphi
    # ... factor of 2 because G^a_{bc}=G^a_{cb} and b
    #     is not the same as c
    # update velocity:
    vr = vr + dl*ar
    vphi = vphi + dl*aphi
    # update position:
    r = r + vr*dl
    phi = phi + vphi*dl
    if k%10000==0: # k is divisible by 10000
        phi_deg = phi*180./math.pi
        print("lambda=%6.2f r=%6.2f phi=%6.2f deg." % (l,r,phi_deg))
    k = k+1
```
It is not necessary to worry about all the technical details of the language (e.g., line 1, which makes available such conveniences as math.pi for \(\pi\)). Comments are set off by pound signs. Lines 16-34 are indented because they are all to be executed repeatedly, until it is no longer true that \(\lambda\lt\lambda_{max}\) (line 15).
Self-check: By inspecting lines 18-22, find the signs of \(\ddot{r}\) and \(\ddot{\phi}\) at \(\lambda=0\). Convince yourself that these signs are what we expect geometrically.
The output is as follows:
```
lambda=  0.00 r=  1.00 phi=  0.06 deg.
lambda= 10.00 r= 10.06 phi= 84.23 deg.
lambda= 20.00 r= 20.04 phi= 87.07 deg.
lambda= 30.00 r= 30.04 phi= 88.02 deg.
lambda= 40.00 r= 40.04 phi= 88.50 deg.
lambda= 50.00 r= 50.04 phi= 88.78 deg.
lambda= 60.00 r= 60.05 phi= 88.98 deg.
lambda= 70.00 r= 70.05 phi= 89.11 deg.
lambda= 80.00 r= 80.06 phi= 89.21 deg.
lambda= 90.00 r= 90.06 phi= 89.29 deg.
```
We can see that \(\phi\rightarrow 90\ \text{deg.}\) as \(\lambda\rightarrow\infty\), which makes sense, because the geodesic is a straight line parallel to the \(y\) axis.
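For comparison, the exact geodesic is easy to write down: with the program's initial conditions (\(r=1\), \(\phi=0\), \(dr/d\lambda=0\), \(d\phi/d\lambda=1\)), it is the line \(x=1\), \(y=\lambda\), so \(r=\sqrt{1+\lambda^2}\) and \(\phi=\tan^{-1}\lambda\). The function below is our own addition for checking the numerical output. At \(\lambda=10\) it gives \(r\approx10.05\) and \(\phi\approx84.29\) deg., versus 10.06 and 84.23 deg. from the program; the small discrepancies presumably reflect the program's simple first-order stepping.

```python
import math

def exact_geodesic(lam):
    """(r, phi in degrees) on the straight line x = 1, y = lam."""
    r = math.hypot(1.0, lam)
    phi_deg = math.degrees(math.atan2(lam, 1.0))
    return r, phi_deg

for lam in (10.0, 50.0, 90.0):
    r, phi_deg = exact_geodesic(lam)
    print("lambda=%6.2f r=%6.2f phi=%6.2f deg." % (lam, r, phi_deg))
```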
A less trivial use of the technique is demonstrated on page 222, where we calculate the deflection of light rays in a gravitational field, one of the classic observational tests of general relativity.
The covariant derivative of a vector can be interpreted as the rate of change of a vector in a certain direction, relative to the result of parallel-transporting the original vector in the same direction. We can therefore see that the definition of the Riemann curvature tensor on page 168 is a measure of the failure of covariant derivatives to commute:
A tedious calculation now gives \(R\) in terms of the \(\Gamma\)s:
This is given as another example later in Brewin's manual for applying Cadabra to general relativity.^{14} (Brewin writes the upper index in the second slot of \(R\).)
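As a numerical illustration of computing \(R\) from the \(\Gamma\)s, the sketch below assumes the common convention \(R^a{}_{bcd}=\partial_c\Gamma^a_{db}-\partial_d\Gamma^a_{cb}+\Gamma^a_{ce}\Gamma^e_{db}-\Gamma^a_{de}\Gamma^e_{cb}\) (index placement differs between authors, as the remark about Brewin illustrates). It differentiates the metric and then the Christoffel symbols by central differences, and recovers the Gaussian curvature \(K=R^\theta{}_{\phi\theta\phi}/g_{\phi\phi}=1/R^2\) of a sphere. The names and step sizes are our own choices.

```python
import math

RADIUS = 2.0      # sphere radius; expect K = 1/RADIUS**2 = 0.25
H_METRIC = 1e-5   # step for derivatives of the metric
H_GAMMA = 1e-4    # step for derivatives of the Christoffel symbols
N = 2             # coordinates x = (theta, phi)

def metric(x):
    return [[RADIUS**2, 0.0], [0.0, (RADIUS * math.sin(x[0]))**2]]

def shifted(x, i, h):
    """Copy of the point x with coordinate i shifted by h."""
    y = list(x)
    y[i] += h
    return y

def christoffel(x):
    """gam[c][a][b] = Gamma^c_ab = (1/2) g^cd (d_a g_bd + d_b g_ad - d_d g_ab)."""
    g = metric(x)
    ginv = [[1.0 / g[0][0], 0.0], [0.0, 1.0 / g[1][1]]]   # the metric is diagonal
    dg = []  # dg[c][a][b] = d_c g_ab
    for c in range(N):
        gp, gm = metric(shifted(x, c, H_METRIC)), metric(shifted(x, c, -H_METRIC))
        dg.append([[(gp[a][b] - gm[a][b]) / (2 * H_METRIC) for b in range(N)] for a in range(N)])
    return [[[0.5 * sum(ginv[c][d] * (dg[a][b][d] + dg[b][a][d] - dg[d][a][b]) for d in range(N))
              for b in range(N)] for a in range(N)] for c in range(N)]

def riemann(x):
    """Rm[a][b][c][d] = d_c Gamma^a_db - d_d Gamma^a_cb
                        + Gamma^a_ce Gamma^e_db - Gamma^a_de Gamma^e_cb."""
    gam = christoffel(x)
    dgam = []  # dgam[c][a][b][d] = d_c Gamma^a_bd
    for c in range(N):
        gp, gm = christoffel(shifted(x, c, H_GAMMA)), christoffel(shifted(x, c, -H_GAMMA))
        dgam.append([[[(gp[a][b][d] - gm[a][b][d]) / (2 * H_GAMMA) for d in range(N)]
                      for b in range(N)] for a in range(N)])
    return [[[[dgam[c][a][d][b] - dgam[d][a][c][b]
               + sum(gam[a][c][e] * gam[e][d][b] - gam[a][d][e] * gam[e][c][b] for e in range(N))
               for d in range(N)] for c in range(N)] for b in range(N)] for a in range(N)]

x0 = (0.9, 0.4)
Rm = riemann(x0)
K = Rm[0][1][0][1] / metric(x0)[1][1]   # K = R^theta_{phi theta phi} / g_{phi phi}
print(f"Gaussian curvature: {K:.6f} (expected {1 / RADIUS**2})")
```

With a different sign or index convention the individual components would change, but the physically meaningful combination \(K\) should not.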
| | electromagnetism | differential geometry |
|---|---|---|
| global symmetry | A constant phase shift \(\alpha\) has no observable effects. | Adding a constant onto a coordinate has no observable effects. |
| local symmetry | A phase shift \(\alpha\) that varies from point to point has no observable effects. | An arbitrary coordinate transformation has no observable effects. |
| The gauge is described by … | \(\alpha\) | \(g_{\mu\nu}\) |
| …and differentiation of this gives the gauge field … | \(A_b\) | \(\Gamma^c_{ab}\) |
| A second differentiation gives the directly observable field(s) … | \(\mathbf{E}\) and \(\mathbf{B}\) | \(R^c_{dab}\) |
The interesting thing here is that the directly observable fields do not carry all of the necessary information, but the gauge fields are not directly observable. In electromagnetism, we can see this from the Aharonov-Bohm effect, shown in figure a.^{15} The solenoid has \(\mathbf{B}=0\) externally, and the electron beams only ever move through the external region, so they never experience any magnetic field. Experiments show, however, that turning the solenoid on and off does change the interference between the two beams. This is because the vector potential does not vanish outside the solenoid, and as we've seen on page 137, the phase of the beams varies according to the path integral of the \(A_b\). We are therefore left with an uncomfortable, but unavoidable, situation. The concept of a field is supposed to eliminate the need for instantaneous action at a distance, which is forbidden by relativity; that is, (1) we want our fields to have only local effects. On the other hand, (2) we would like our fields to be directly observable quantities. We cannot have both 1 and 2. The gauge field satisfies 1 but not 2, and the electromagnetic fields give 2 but not 1.
Figure b shows an analog of the Aharonov-Bohm experiment in differential geometry. Everywhere but at the tip, the cone has zero curvature, as we can see by cutting it and laying it out flat. But even an observer who never visits the tightly curved region at the tip can detect its existence, because parallel-transporting a vector around a closed loop can change the vector's direction, provided that the loop surrounds the tip.
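The cone's behavior can be simulated directly. Assuming the cone is made by cutting a wedge of angle \(\delta\) out of a flat sheet, its metric away from the tip can be written \(ds^2=dr^2+\alpha^2r^2d\phi^2\) with \(\alpha=1-\delta/2\pi\) (our parametrization, not the book's). The sketch below parallel-transports a vector around a circle \(r=\text{const}\) by integrating \(dv^a/d\phi=-\Gamma^a_{\phi b}v^b\) with a fourth-order Runge-Kutta method, and finds that the vector comes back rotated by \(\delta\), whatever the radius of the loop.

```python
import math

def transport_around_cone(deficit, radius, steps=2000):
    """Parallel-transport a radial unit vector once around the circle r = radius on a
    cone with metric ds^2 = dr^2 + (alpha r)^2 dphi^2, alpha = 1 - deficit/(2 pi).
    Returns the angle, in [0, 2 pi), by which the vector comes back rotated."""
    alpha = 1.0 - deficit / (2 * math.pi)

    def rhs(v):
        # dv^a/dphi = -Gamma^a_{phi b} v^b on the circle, using the nonzero symbols
        # Gamma^r_{phi phi} = -alpha^2 r and Gamma^phi_{phi r} = Gamma^phi_{r phi} = 1/r
        v_r, v_phi = v
        return (alpha**2 * radius * v_phi, -v_r / radius)

    v = (1.0, 0.0)
    h = 2 * math.pi / steps
    for _ in range(steps):   # classical fourth-order Runge-Kutta steps in phi
        k1 = rhs(v)
        k2 = rhs((v[0] + h / 2 * k1[0], v[1] + h / 2 * k1[1]))
        k3 = rhs((v[0] + h / 2 * k2[0], v[1] + h / 2 * k2[1]))
        k4 = rhs((v[0] + h * k3[0], v[1] + h * k3[1]))
        v = (v[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             v[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
    # components along the orthonormal directions e_r and e_phi / (alpha r)
    a, b = v[0], alpha * radius * v[1]
    return math.atan2(b, a) % (2 * math.pi)

for radius in (0.5, 1.0, 10.0):
    angle = transport_around_cone(math.pi / 2, radius)
    print(f"loop radius {radius:5.1f}: rotated by {math.degrees(angle):.4f} deg")
```

For a 90-degree wedge, every loop should report a rotation of very nearly 90 degrees, even though the loops never come near the tip.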
In the electromagnetic example, integrating \(\mathbf{A}\) around a closed loop reveals, via Stokes' theorem, the existence of a magnetic flux through the loop, even though the magnetic field is zero at every location where \(\mathbf{A}\) has to be sampled. In the relativistic example, integrating \(\Gamma\) around a closed loop shows that there is curvature inside the loop, even though the curvature is zero at all the places where \(\Gamma\) has to be sampled.
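The Stokes's-theorem argument can be made concrete with the vector potential outside an idealized, infinitely thin solenoid along the \(z\) axis, \(\mathbf{A}=\Phi\hat{\boldsymbol{\phi}}/2\pi r\) (a standard textbook form, assumed here; units and the flux value are arbitrary). The sketch checks numerically that \(B_z=(\nabla\times\mathbf{A})_z\) vanishes at a sample point outside the solenoid, yet the circulation \(\oint\mathbf{A}\cdot d\mathbf{l}\) equals \(\Phi\) for a loop that encloses the solenoid and zero for one that does not.

```python
import math

FLUX = 1.0   # magnetic flux through the solenoid (arbitrary units)

def vector_potential(x, y):
    """A = FLUX / (2 pi r) in the phi direction, outside a thin solenoid on the z axis."""
    r2 = x * x + y * y
    return (-FLUX * y / (2 * math.pi * r2), FLUX * x / (2 * math.pi * r2))

def b_field(x, y, h=1e-6):
    """B_z = dA_y/dx - dA_x/dy, by central differences."""
    d_ay_dx = (vector_potential(x + h, y)[1] - vector_potential(x - h, y)[1]) / (2 * h)
    d_ax_dy = (vector_potential(x, y + h)[0] - vector_potential(x, y - h)[0]) / (2 * h)
    return d_ay_dx - d_ax_dy

def circulation(center, radius, steps=10000):
    """Line integral of A around a circle, using the midpoint rule in the angle t."""
    cx, cy = center
    dt = 2 * math.pi / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        ax, ay = vector_potential(cx + radius * math.cos(t), cy + radius * math.sin(t))
        # dl = radius * (-sin t, cos t) dt
        total += (-ax * math.sin(t) + ay * math.cos(t)) * radius * dt
    return total

print("B_z at (2, 1):", b_field(2.0, 1.0))
print("loop around the solenoid:", circulation((0.5, 0.0), 2.0))
print("loop beside the solenoid:", circulation((5.0, 0.0), 1.0))
```

Note that the off-center loop around the solenoid still gives \(\Phi\): only whether the loop encloses the flux matters, not its shape or position.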
The fact that \(\Gamma\) is a gauge field, and therefore not locally observable, is simply a fancy way of expressing the ideas introduced on pp. 176 and 177, that due to the equivalence principle, the gravitational field in general relativity is not locally observable. This non-observability is local because the equivalence principle is a statement about local Lorentz frames. The example in figure b is non-local.
\(\triangleright\) In section 5.5.1 on page 170, we estimated the geodetic effect on Gravity Probe B and found a result that was only off by a factor of \(3\pi\). The mathematically pure form of the \(3\pi\) suggests that the geodetic effect is insensitive to the distribution of mass inside the earth. Why should this be so?
\(\triangleright\) The change in a vector upon parallel transporting it around a closed loop can be expressed in terms of either (1) the area integral of the curvature within the loop or (2) the line integral of the Christoffel symbol (essentially the gravitational field) on the loop itself. Although I expressed the estimate as 1, it would have been equally valid to use 2. By Newton's shell theorem, the gravitational field outside the earth is not sensitive to anything about the earth's mass distribution other than its near spherical symmetry. The earth spins, and this does affect the stress-energy tensor, but since the velocity with which it spins is everywhere much smaller than \(c\), the resulting effect, called frame dragging, is much smaller.
This section can be omitted on a first reading.
General relativity doesn't assume a predefined background metric, and this creates a chicken-and-egg problem. We want to define a metric on some space, but how do we even specify the set of points that make up that space? The usual way to define a set of points would be by their coordinates. For example, in two dimensions we could define the space as the set of all ordered pairs of real numbers \((x,y)\). But this doesn't work in general relativity, because space is not guaranteed to have this structure. For example, in the classic 1979 computer game Asteroids, space “wraps around,” so that if your spaceship flies off the right edge of the screen, it reappears on the left, and similarly at the top and bottom. Even before we impose a metric on this space, it has topological properties that differ from those of the Euclidean plane. By “topological” we mean properties that are preserved if the space is thought of as a sheet of rubber that can be stretched in any way, but not cut or glued back together. Topologically, the space in Asteroids is equivalent to a torus (surface of a doughnut), but not to the Euclidean plane.
Another useful example is the surface of a sphere. In example 10 on page 188, we calculated \(\Gamma^\theta_{\phi\phi}\). A similar calculation gives \(\Gamma^\phi_{\theta\phi}=\cot\theta\) (the factors of \(R^2\) cancel). Now consider what happens as we drive our dogsled north along the line of longitude \(\phi=0\), cross the north pole at \(\theta=0\), and continue along the same geodesic. As we cross the pole, our longitude changes discontinuously from 0 to \(\pi\). Consulting the geodesic equation, we see that this happens because \(\Gamma^\phi_{\theta\phi}\) blows up at \(\theta=0\). Of course nothing really special happens at the pole. The bad behavior isn't the fault of the sphere, it's the fault of the \((\theta,\phi)\) coordinates we've chosen, which happen to misbehave at the pole. Unfortunately, it is impossible to define a pair of coordinates on a two-sphere without having them misbehave somewhere. (This follows from Brouwer's famous 1912 “Hairy ball theorem,” which states that it is impossible to comb the hair on a sphere without creating a cowlick somewhere.)
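The jump in longitude can be seen with a few lines of code. The sketch below (names are ours) follows the great circle \((x,y,z)=(\sin s,0,\cos s)\) on the unit sphere, which is the meridian \(\phi=0\) continued over the north pole, and converts each point to \((\theta,\phi)\). The points themselves move smoothly, but \(\phi\) jumps from 0 to 180 deg. at the pole, where \(\cot\theta\) blows up.

```python
import math

def theta_phi(x, y, z):
    """Spherical coordinates of a point on the unit sphere, with 0 <= phi < 2 pi."""
    theta = math.acos(max(-1.0, min(1.0, z)))
    phi = math.atan2(y, x) % (2 * math.pi)
    return theta, phi

# The meridian phi = 0, continued over the north pole, is the great circle
# (sin s, 0, cos s); s is arc length, and it decreases as we head north.
for s in (0.2, 0.1, 0.01, -0.01, -0.1, -0.2):
    theta, phi = theta_phi(math.sin(s), 0.0, math.cos(s))
    print(f"s={s:+.2f}  theta={theta:.3f}  phi={math.degrees(phi):6.1f} deg  "
          f"cot(theta)={1 / math.tan(theta):8.2f}")
```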
There is a general notion of a topological space, which is too general for our purposes. In such a space, the only structure we are guaranteed is that certain sets are defined as “open,” in the same sense that an interval like \(0\lt x \lt 1\) is called “open.” Any point in an open set can be moved around without leaving the set. An open set is essentially a set without a boundary, for in a set like \(0\le x \le 1\), the boundary points 0 and 1 can only be moved in one direction without taking them outside.
A topological space is too general for us because it can include spaces like fractals, infinite-dimensional spaces, and spaces that have different numbers of dimensions in different regions. It is nevertheless useful to recognize certain concepts that can be defined using only the generic apparatus of a topological space, so that we know they do not depend in any way on the presence of a metric. An open set surrounding a point is called a neighborhood of that point. In a topological space we have a notion of getting arbitrarily close to a certain point, which means to take smaller and smaller neighborhoods, each of which is a subset of the last. But since there is no metric, we do not have any concept of comparing distances of distant points, e.g., that P is closer to Q than R is to S. A continuous function is a purely topological idea; a continuous function is one such that for any open subset U of its range, the set V of points in its domain that are mapped to points in U is also open. Although some definitions of continuous functions talk about real numbers like \(\epsilon\) and \(\delta\), the notion of continuity doesn't depend on the existence of any structure such as the real number system. A homeomorphism is a function that is invertible and continuous in both directions. Homeomorphisms formalize the informal notion of “rubber-sheet geometry without cutting or gluing.” If a homeomorphism exists between two topological spaces, we say that they are homeomorphic; they have the same structure and are in some sense the same space.
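Because continuity and homeomorphism are defined purely in terms of open sets, they can be tested mechanically on a finite topological space, where the topology is just a list of which subsets are open. Finite spaces like these are not manifolds; the sketch below, with function names of our own choosing, is only meant to show that no metric or real numbers enter the definitions. It uses the two-point Sierpiński space, whose open sets are \(\emptyset\), \(\{1\}\), and \(\{0,1\}\).

```python
def preimage(f, subset):
    """Points of the domain (the keys of the dict f) that f maps into subset."""
    return frozenset(p for p in f if f[p] in subset)

def is_continuous(f, domain_opens, codomain_opens):
    """Continuous: the preimage of every open set is open."""
    return all(preimage(f, U) in domain_opens for U in codomain_opens)

def is_homeomorphism(f, domain_opens, codomain_opens):
    """Homeomorphism: a bijection that is continuous in both directions."""
    codomain_points = frozenset().union(*codomain_opens)
    values = list(f.values())
    if len(set(values)) != len(values) or set(values) != codomain_points:
        return False   # not invertible
    inverse = {v: p for p, v in f.items()}
    return (is_continuous(f, domain_opens, codomain_opens)
            and is_continuous(inverse, codomain_opens, domain_opens))

# Topologies on the two-point set {0, 1}, each given as its collection of open sets.
EMPTY = frozenset()
discrete = {EMPTY, frozenset({0}), frozenset({1}), frozenset({0, 1})}
sierpinski = {EMPTY, frozenset({1}), frozenset({0, 1})}
identity = {0: 0, 1: 1}

print(is_continuous(identity, discrete, sierpinski))     # True
print(is_continuous(identity, sierpinski, discrete))     # False: {0} pulls back to {0}, not open
print(is_homeomorphism(identity, discrete, sierpinski))  # False
```

The identity map is a bijection in both cases, yet it is a homeomorphism only when the two sets carry the same collection of open sets.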
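The more specific type of topological space we want is called a manifold. Without attempting any high level of mathematical rigor, we define an \(n\)-dimensional manifold M according to the following informal principles:^{16}
M1 Dimension: M's dimensionality is \(n\).
M2 Homogeneity: No point has any property that distinguishes it from any other point.
M3 Completeness: M is complete, in the sense that specifying an arbitrarily small neighborhood gives a unique point.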
The set of all real numbers is a 1-manifold. Similarly, any line with the properties specified in Euclid's Elements is a 1-manifold. All such lines are homeomorphic to one another, and we can therefore speak of “the line.”
A circle (not including its interior) is a 1-manifold, and it is not homeomorphic to the line. To see this, note that deleting a point from a circle leaves it in one connected piece, but deleting a point from a line makes two. Here we use the fact that a homeomorphism is guaranteed to preserve “rubber-sheet” properties like the number of pieces.
A “lollipop” formed by gluing an open 2-circle (i.e., a circle not including its boundary) to an open line segment is not a manifold, because there is no \(n\) for which it satisfies M1.
It also violates M2, because points in this set fall into three distinct classes: those that live in 2-dimensional neighborhoods, those that live in 1-dimensional neighborhoods, and the point where the line segment intersects the boundary of the circle.
The rational numbers are not a manifold, because a sequence of smaller and smaller neighborhoods closing in on \(\sqrt{2}\) does not specify any rational number, violating M3.
Similarly, the rational plane defined by rational-number coordinate pairs \((x,y)\) is not a 2-manifold. It's good that we've excluded this space, because it has the unphysical property that curves can cross without having a point in common. For example, the curve \(y=x^2\) crosses from one side of the line \(y=2\) to the other, but never intersects it. This is physically undesirable because it doesn't match up with what we have in mind when we talk about collisions between particles as intersections of their world-lines, or when we say that electric field lines aren't supposed to intersect.
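The failure of M3 on the rational line can be watched in exact rational arithmetic. The sketch below, a hypothetical illustration using Python's `fractions` module, bisects \([1,2]\), keeping the half on which the curve \(y=x^2\) crosses the line \(y=2\). The nested intervals shrink without limit, but every endpoint is rational and no midpoint ever satisfies \(x^2=2\), so the intervals close in on no rational point.

```python
from fractions import Fraction

def nested_rational_intervals(steps):
    """Bisect [1, 2] in exact rational arithmetic, each time keeping the half
    on which the curve y = x^2 crosses the line y = 2."""
    lo, hi = Fraction(1), Fraction(2)   # 1*1 < 2 < 2*2
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid * mid < 2:
            lo = mid
        else:          # mid * mid > 2, since no rational number squares to 2
            hi = mid
    return lo, hi

lo, hi = nested_rational_intervals(30)
print(lo, hi, float(hi - lo))
```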
The open half-plane \(y>0\) in the Cartesian plane is a 2-manifold. The closed half-plane \(y\ge 0\) is not, because it violates M2; the boundary points have different properties than the ones on the interior.
Two nonintersecting lines are a 1-manifold. Physically, disconnected manifolds of this type would represent a universe in which an observer in one region would never be able to find out about the existence of the other region.
Hold your hands like you're pretending you know karate, and then use one hand to karate-chop the other. Suppose we want to join two open half-planes in this way. As long as they're separate, then we have a perfectly legitimate disconnected manifold. But if we want to join them by adding the point P where their boundaries coincide, then we violate M2, because this point has special properties not possessed by any others. An example of such a property is that there exist points Q and R such that every continuous curve joining them passes through P. (Cf. problem 5, p. 340.)
An alternative way of characterizing an \(n\)-manifold is as an object that can locally be described by \(n\) real coordinates. That is, any sufficiently small neighborhood is homeomorphic to an open set in the space of real-valued \(n\)-tuples of the form \((x_1,x_2,...,x_n)\). For example, a closed half-plane is not a 2-manifold because no neighborhood of a point on its edge is homeomorphic to any open set in the Cartesian plane.
Self-check: Verify that this alternative definition of a manifold gives the same answers as M1-M3 in all the examples above.
Roughly speaking, the equivalence of the two definitions occurs because we're using \(n\) real numbers as coordinates for the dimensions specified by M1, and the real numbers are the unique number system that has the usual arithmetic operations, is ordered, and is complete in the sense of M3.
As usual when we say that something is “local,” a question arises as to how local is local enough. The language in the definition above about “any sufficiently small neighborhood” is logically akin to the Weierstrass \(\epsilon\)-\(\delta\) approach: if Alice gives Bob a manifold and a point on a manifold, Bob can always find some neighborhood around that point that is compatible with coordinates, but it may be an extremely small neighborhood.
If we are to define coordinates on a circle, they should be continuous functions. The angle \(\phi\) about the center therefore doesn't quite work as a global coordinate, because it has a discontinuity where \(\phi=0\) is identified with \(\phi=2\pi\). We can get around this by using different coordinates in different regions, as is guaranteed to be possible by the local-coordinate definition of a manifold. For example, we can cover the circle with two open sets, one on the left and one on the right. The left one, L, is defined by deleting only the \(\phi=0\) point from the circle. The right one, R, is defined by deleting only the one at \(\phi=\pi\). On L, we use coordinates \(0\lt\phi_L\lt2\pi\), which are always a continuous function from L to the real numbers. On R, we use \(-\pi\lt\phi_R\lt\pi\).
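In examples like this one, the sets like L and R are referred to as patches. We require that the coordinate maps on the different patches match up smoothly. In this example, we would like all four of the following functions, known as transition maps, to be continuous:
\(\phi_R=\phi_L\) for \(0\lt\phi_L\lt\pi\)
\(\phi_R=\phi_L-2\pi\) for \(\pi\lt\phi_L\lt2\pi\)
\(\phi_L=\phi_R\) for \(0\lt\phi_R\lt\pi\)
\(\phi_L=\phi_R+2\pi\) for \(-\pi\lt\phi_R\lt0\)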
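A minimal sketch of the two patches L and R and the transition maps between them, assuming points on the unit circle are given by their Cartesian coordinates \((x,y)\); the function names are hypothetical. The overlap of L and R consists of two pieces (the upper and lower semicircles), so each transition map uses a different formula on each piece. On every point of the overlap, composing one chart with the transition map should reproduce the other chart.

```python
import math

TWO_PI = 2 * math.pi

def chart_L(point):
    """Coordinate phi_L in (0, 2 pi) on patch L (the circle minus phi = 0)."""
    x, y = point
    phi = math.atan2(y, x) % TWO_PI
    if phi == 0.0:
        raise ValueError("the point phi = 0 is not in patch L")
    return phi

def chart_R(point):
    """Coordinate phi_R in (-pi, pi) on patch R (the circle minus phi = pi)."""
    x, y = point
    phi = math.atan2(y, x)           # atan2 returns values in [-pi, pi]
    if abs(phi) == math.pi:
        raise ValueError("the point phi = pi is not in patch R")
    return phi

def L_to_R(phi_L):
    """Transition map on the overlap: identity on the upper piece, shift on the lower."""
    return phi_L if phi_L < math.pi else phi_L - TWO_PI

def R_to_L(phi_R):
    """The inverse transition map."""
    return phi_R if phi_R > 0 else phi_R + TWO_PI

for angle in (1.0, 4.0):
    p = (math.cos(angle), math.sin(angle))
    print(f"phi_L={chart_L(p):.4f}  phi_R={chart_R(p):+.4f}  L_to_R(phi_L)={L_to_R(chart_L(p)):+.4f}")
```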
The local-coordinate definition only states that a manifold can be coordinatized. That is, the functions that define the coordinate maps are not part of the definition of the manifold, so, for example, if two people define coordinate patches on the unit circle in different ways, they are still talking about exactly the same manifold.
Let L be an open line segment, such as the open interval \((0,1)\). L is homeomorphic to a line, because we can map \((0,1)\) to the real line through the function \(f(x)=\tan(\pi x-\pi/2)\).
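A quick numerical check that the function \(f\) given above and its inverse \(f^{-1}(y)=\left(\tan^{-1}y+\pi/2\right)/\pi\) undo each other; both are continuous, which is what makes \(f\) a homeomorphism. The inverse formula is our own, obtained by solving \(y=f(x)\) for \(x\).

```python
import math

def f(x):
    """The map from the open interval (0, 1) onto the real line."""
    return math.tan(math.pi * x - math.pi / 2)

def f_inverse(y):
    """Its inverse; for finite y, math.atan lies strictly between -pi/2 and pi/2,
    so the result lies in (0, 1)."""
    return (math.atan(y) + math.pi / 2) / math.pi

for x in (0.001, 0.25, 0.5, 0.75, 0.999):
    print(f"x = {x:5.3f} -> f(x) = {f(x):10.3f} -> back to {f_inverse(f(x)):.12f}")
```

Points near the ends of the interval are sent far out along the line, which is how an open interval of finite length manages to cover the whole line.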
A closed line segment (which is not a manifold) is not homeomorphic to a line. If we map it to a line, then the endpoints have to go to two special points A and B. There is then no way for the mapping to visit the points exterior to the interval \([\text{A},\text{B}]\) without visiting A and B more than once.
A differentiable manifold means a manifold with enough extra structure so you can do calculus on it, but this extra structure doesn't necessarily include anything as fancy as a metric. As a concrete example, suppose that in a \(1+1\)-dimensional Galilean universe, observer Alice constructs a global coordinate system \((t,x)\). Her spacetime is clearly a manifold, based on the local-coordinate definition, and this is true even though Galilean spacetime doesn't have a metric. Meanwhile, observer Bob constructs his own coordinate system \((t',x')\). But something disturbing happens when Alice constructs the transition map from Bob's coordinate grid to hers. As shown in figure e, Bob's grid has a kink in it. “Bob,” says Alice, “something is wrong with your coordinate system. I hypothesize that at a certain time, which we can call \(t=0\), an invisible giant struck your body with an invisible croquet mallet and suddenly changed your state of motion.” “No way, Alice,” Bob answers. “I didn't feel anything happen at \(t=0\). I think you're the one who got whacked.”
By a differentiable manifold we mean one in which this sort of controversy never happens. The manifold comes with a collection of local coordinate systems, called charts, and wherever these charts overlap, the transition map is differentiable. Every coordinate is a differentiable function of every other coordinate. In fact, we will assume for convenience that not just the first derivative but derivatives of all orders are defined. This makes our manifold not just a differentiable manifold but a smooth manifold. This definition sounds coordinate-dependent, but it isn't. Our collection of charts (called an atlas) can contain infinitely many possible coordinate systems; we can even specify that it contains all possible coordinate systems that could be obtained from one another by any diffeomorphism.
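Bob's complaint can be reduced to a one-line transition map. In the hypothetical version below, Bob is at rest relative to Alice before \(t=0\) and moves at velocity 0.5 afterward (both numbers are arbitrary), so his coordinate is \(x'=x-x_\text{Bob}(t)\), with \(t'=t\) as in any Galilean transformation. The map is continuous at \(t=0\), but its one-sided derivatives with respect to \(t\) disagree, which is exactly what the definition of a differentiable manifold rules out.

```python
# Hypothetical numbers: Bob is at rest relative to Alice before t = 0 and
# moves at velocity 0.5 afterward.
V_BEFORE = 0.0
V_AFTER = 0.5

def bob_position(t):
    """Bob's worldline x_Bob(t) in Alice's coordinates: continuous, with a kink at t = 0."""
    return V_BEFORE * t if t < 0 else V_AFTER * t

def bob_x(t, x):
    """Transition map from Alice's (t, x) to Bob's x' (in Galilean spacetime t' = t)."""
    return x - bob_position(t)

h = 1e-6
left = (bob_x(0.0, 1.0) - bob_x(-h, 1.0)) / h    # one-sided derivative dx'/dt from t < 0
right = (bob_x(h, 1.0) - bob_x(0.0, 1.0)) / h    # ... and from t > 0
print(f"left derivative {left:+.3f}, right derivative {right:+.3f}")
```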
Points in the manifold are considered close if the Euclidean distance between them in coordinate space is \(O(\epsilon)\). This definition sounds coordinate-dependent, but isn't, and sounds like it's assuming an actual Euclidean metric, but isn't.^{19} Define a prevector at point P as a pair (P,Q) of points that are close, figure f/1. Define prevectors to be equivalent if the difference between them is infinitesimal even compared to \(\epsilon\).
Definition: A tangent vector at point P is the set of all prevectors at P that are equivalent to a particular prevector at P.
The tangent space \(T_\text{P}\) is the set of all tangent vectors at P. The tangent space has the structure of a vector space over the reals simply by using the coordinate differences to define the vector-space operations, just as we would do if \((P,Q)\) meant an arrow extending from P to Q, as in freshman physics.
In practice, we don't really care about the details of the construction of the tangent space, and different people don't even have to use the same construction. All we care about is that the tangent space has a certain structure. In particular, it has \(n\) dimensions, as we would expect intuitively. Since we're going to forget the details of the construction, it doesn't matter that we've made all tangent vectors infinitesimal by definition. The vector space's internal structure only has to do with how big the vectors are compared to each other. (If we wanted to, we could scale up all the tangent vectors by a factor of \(1/\epsilon\).) This justifies the visualization in figure f/2.
Actually it's not quite true that we only care about the tangent space's internal structure, because then we could have avoided the fancy definition and simply used the ordinary vector space consisting of \(n\)-tuples of real numbers. The fancy definition is needed because it ties the tangent space in a natural way to the structure of the manifold at a particular point. Therefore it will allow us (1) to define parallel transport, which brings a vector from one tangent space to another, and (2) to define components of vectors in a particular coordinate system.
For an alternative definition of the tangent space, see ch. 2 of Carroll.^{20} Briefly, this involves taking a tangent vector to be something that behaves like a directional derivative. In particular, a partial derivative with respect to a coordinate such as \(\partial/\partial x\) qualifies as a tangent vector, which we think of as pointing in the \(x\) direction. The set of such coordinate derivatives forms a basis for the tangent space and gives a convenient way of notating tangent vectors. We will find this notation convenient in section 7.1, p. 243.
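The directional-derivative picture makes coordinate independence easy to test. In the sketch below (the test function and the numbers are arbitrary choices of ours), a tangent vector is applied to a function by finite differences, once using its components in the basis \(\partial/\partial x,\partial/\partial y\) and once using the components in the basis \(\partial/\partial r,\partial/\partial\phi\) obtained from the chain rule. The two results should agree, because they are the same tangent vector acting on the same function.

```python
import math

def f_cartesian(x, y):
    """An arbitrary smooth test function (hypothetical choice)."""
    return x**2 * y + math.sin(y)

def f_polar(r, phi):
    """The same function written in polar coordinates."""
    return f_cartesian(r * math.cos(phi), r * math.sin(phi))

def apply_tangent_vector(func, point, components, h=1e-6):
    """Act with the tangent vector sum_i v^i d/dx^i on func at point (central differences)."""
    total = 0.0
    for i, v_i in enumerate(components):
        plus, minus = list(point), list(point)
        plus[i] += h
        minus[i] -= h
        total += v_i * (func(*plus) - func(*minus)) / (2 * h)
    return total

x, y = 1.0, 2.0
vx, vy = 0.3, -0.5                      # components in the basis d/dx, d/dy
r, phi = math.hypot(x, y), math.atan2(y, x)
vr = (x * vx + y * vy) / r              # components in the basis d/dr, d/dphi,
vphi = (x * vy - y * vx) / r**2         # from the chain rule

cartesian = apply_tangent_vector(f_cartesian, (x, y), (vx, vy))
polar = apply_tangent_vector(f_polar, (r, phi), (vr, vphi))
print(cartesian, polar)
```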
1. Example 6 on p. 167 discussed some examples in electrostatics where the charge density on the surface of a conductor depends on the Gaussian curvature, when the curvature is positive. In the case of a knife-edge formed by two half-planes at an exterior angle \(\beta>\pi\), there is a standard result^{21} that the charge density at the edge blows up to infinity as \(R^{\pi/\beta-1}\). Does this match up with the hypothesis that Gaussian curvature determines the charge density?(solution in the pdf version of the book)
2. Show, as claimed on page 188, that for polar coordinates in a Euclidean plane, \(\Gamma^r_{\phi\phi}=-r\) and \(\Gamma^\phi_{r\phi}=1/r\).
3. Partial derivatives commute with partial derivatives. Covariant derivatives don't commute with covariant derivatives. Do covariant derivatives commute with partial derivatives?
4. Show that if the differential equation for geodesics on page 178 is satisfied for one affine parameter \(\lambda\), then it is also satisfied for any other affine parameter \(\lambda'=a\lambda+b\), where \(a\) and \(b\) are constants.
5. Equation [] on page gives a flat-spacetime metric in rotating polar coordinates. (a) Verify by explicit computation that this metric represents a flat spacetime. (b) Reexpress the metric in rotating Cartesian coordinates, and check your answer by verifying that the Riemann tensor vanishes.
6. The purpose of this problem is to explore the difficulties inherent in finding anything in general relativity that represents a uniform gravitational field \(g\). In example 12 on page 59, we found, based on elementary arguments about the equivalence principle and photons in elevators, that gravitational time dilation must be given by \(e^\Phi\), where \(\Phi=gz\) is the gravitational potential. This results in a metric
\[ds^2=e^{2gz}dt^2-dz^2.\]
On the other hand, example 19 on page 140 derived the metric
\[ds^2=(1+gz)^2dt^2-dz^2\]
by transforming from a Lorentz frame to a frame whose origin moves with constant proper acceleration \(g\).
(These are known as Rindler coordinates.)
Prove the following facts. None of the calculations are so complex as to require symbolic math software, so you might want to perform them by hand first, and then check yourself on a computer.
(a) The two metrics above are approximately consistent with one another for \(z\) near 0.
(b) When a test particle is released from rest in either of these metrics, its initial proper acceleration is \(g\).
(c) The two metrics are not exactly equivalent to one another under any change of coordinates.
(d) Both spacetimes are uniform in the sense that the curvature is constant. (In both cases, this can be proved without an explicit computation of the Riemann tensor.)
(solution in the pdf version of the book)
Some further properties of the metric [] are analyzed in subsection 7.5 on page 260.
7. In a topological space T, the complement of a subset U is defined as the set of all points in T that are not members of U. A set whose complement is open is referred to as closed. On the real line, give (a) one example of a closed set and (b) one example of a set that is neither open nor closed. (c) Give an example of an inequality that defines an open set on the rational number line, but a closed set on the real line.
8. Prove that a double cone (e.g., the surface \(r=z\) in cylindrical coordinates) is not a manifold.(solution in the pdf version of the book)
11. Curvature on a Riemannian space in 2 dimensions is a topic that goes back to Gauss and has a simple interpretation: the only intrinsic measure of curvature is a single number, the Gaussian curvature. What about 1+1 dimensions? The simplest metrics I can think of are of the form \(ds^2=dt^2-f(t)dx^2\). (Something like \(ds^2=f(t)dt^2-dx^2\) is obviously equivalent to Minkowski space under a change of coordinates, while \(ds^2=f(x)dt^2-dx^2\) is the same as the original example except that we've swapped \(x\) and \(t\).) Playing around with simple examples, one stumbles across the seemingly mysterious fact that the metric \(ds^2=dt^2-t^2dx^2\) is flat, while \(ds^2=dt^2-t\,dx^2\) is not. This seems to require some simple explanation. Consider the metric \(ds^2=dt^2-t^p dx^2\).
(a) Calculate the Christoffel symbols by hand.
(b) Use a computer algebra system such as Maxima to show that the Ricci tensor vanishes only when \(p=2\) (apart from the trivial case \(p=0\), which is Minkowski space).(solution in the pdf version of the book)
(c) 1998-2013 Benjamin Crowell, licensed under the Creative Commons Attribution-ShareAlike license. Photo credits are given at the end of the Adobe Acrobat version.