You are viewing the html version of General Relativity, by Benjamin Crowell. This version is only designed for casual browsing, and may have some formatting problems. For serious reading, you want the Adobe Acrobat version. (c) 1998-2011 Benjamin Crowell, licensed under the Creative Commons Attribution-ShareAlike license. Photo credits are given at the end of the Adobe Acrobat version.
General relativity is described mathematically in the language of differential geometry. Let's take those two terms in reverse order.
The geometry of spacetime is non-Euclidean, not just in the sense that the 3+1-dimensional geometry of Lorentz frames is different than that of 4 interchangeable Euclidean dimensions, but also in the sense that parallels do not behave in the way described by E5 or A1-A3. In a Lorentz frame, which describes space without any gravitational fields, particles whose world-lines are initially parallel will continue along their parallel world-lines forever. But in the presence of gravitational fields, initially parallel world-lines of free-falling particles will in general diverge, approach, or even cross. Thus, neither the existence nor the uniqueness of parallels can be assumed. We can't describe this lack of parallelism as arising from the curvature of the world-lines, because we're using the world-lines of free-falling particles as our definition of a “straight” line. Instead, we describe the effect as coming from the curvature of spacetime itself. The Lorentzian geometry is a description of the case in which this curvature is negligible.
What about the word differential? The equivalence principle states that even in the presence of gravitational fields, local Lorentz frames exist. How local is “local?” If we use a microscope to zoom in on smaller and smaller regions of spacetime, the Lorentzian approximation becomes better and better. Suppose we want to do experiments in a laboratory, and we want to ensure that when we compare some physically observable quantity against predictions made based on the Lorentz geometry, the resulting discrepancy will not be too large. If the acceptable error is ε, then we should be able to get the error down that low if we're willing to make the size of our laboratory no bigger than δ. This is clearly very similar to the Weierstrass style of defining limits and derivatives in calculus. In calculus, the idea expressed by differentiation is that every smooth curve can be approximated locally by a line; in general relativity, the equivalence principle tells us that curved spacetime can be approximated locally by flat spacetime. But consider that no practitioner of calculus habitually solves problems by filling sheets of scratch paper with epsilons and deltas. Instead, she uses the Leibniz notation, in which dy and dx are interpreted as infinitesimally small numbers. You may be inclined, based on your previous training, to dismiss infinitesimals as neither rigorous nor necessary. In 1966, Abraham Robinson demonstrated that concerns about rigor had been unfounded; we'll come back to this point in section 3.2. Although it is true that any calculation written using infinitesimals can also be carried out using limits, the following example shows how much more well suited the infinitesimal language is to differential geometry.
The area of a region S in the Cartesian plane can be calculated as $\int_S dA$, where $dA = dx\,dy$
is the area of an infinitesimal rectangle of width $dx$ and height $dy$. A curved surface such as a
sphere does not admit a global Cartesian coordinate system in which the constant coordinate curves are both
uniformly spaced and perpendicular to one another. For example, lines of longitude on the earth's surface
grow closer together as one moves away from the equator. Letting $\theta$ be the angle with respect to the
pole, and $\phi$ the azimuthal angle, the approximately rectangular patch bounded by $\theta$, $\theta+d\theta$,
$\phi$, and $\phi+d\phi$ has width $r\sin\theta\,d\phi$ and height $r\,d\theta$,
giving $dA = r^2\sin\theta\,d\theta\,d\phi$. If you look at the corresponding derivation in an elementary
calculus textbook that strictly eschews infinitesimals, the technique is to start from scratch with Riemann sums.
This is extremely laborious, and moreover must be carried out again for every new case. In differential geometry,
the curvature of the space varies from one point to the next, and clearly we don't want to reinvent the wheel with
Riemann sums an infinite number of times, once at each point in space.
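As a numerical check on the infinitesimal patch area, the following sketch sums $r^2\sin\theta\,d\theta\,d\phi$ over a whole sphere and compares the result with $4\pi r^2$. The function name, the grid sizes, and the choice $r=2$ are illustrative assumptions, not part of the text.

```python
import math

def sphere_area(r, n_theta=400, n_phi=400):
    """Midpoint sum of dA = r^2 sin(theta) dtheta dphi over the full sphere."""
    d_theta = math.pi / n_theta        # step in the polar angle
    d_phi = 2.0 * math.pi / n_phi      # step in the azimuthal angle
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        width = r * math.sin(theta) * d_phi   # east-west extent of one patch
        height = r * d_theta                  # north-south extent of one patch
        total += width * height * n_phi       # all patches in this ring are equal
    return total

r = 2.0                                # hypothetical radius
area = sphere_area(r)
exact = 4.0 * math.pi * r**2
print(area, exact)
```

The patches shrink in width toward the poles, which is exactly the non-uniform spacing of the longitude lines described above.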
An important example of the differential, i.e., local, nature of our geometry is the generalization of the affine parameter to a context broader than affine geometry.
Our construction of the affine parameter with a scaffolding of parallelograms depended on the existence and uniqueness of parallels expressed by A1, so we might imagine that there was no point in trying to generalize the construction to curved spacetime. But the equivalence principle tells us that spacetime is locally affine to some approximation. Concretely, clock-time is one example of an affine parameter, and the curvature of spacetime clearly can't prevent us from building a clock and releasing it on a free-fall trajectory. To generalize the recipe for the construction (figure a), the first obstacle is the ambiguity of the instruction to construct parallelogram $01q_0q_1$, which requires us to draw $1q_1$ parallel to $0q_0$. Suppose we construe this as an instruction to make the two segments initially parallel, i.e., parallel as they depart the line at 0 and 1. By the time they get to $q_0$ and $q_1$, they may be converging or diverging.
Because parallelism is only approximate here,
there will be a certain amount of error in the construction of the affine parameter. One way of detecting such an
error is that lattices constructed with different initial distances will get out of step with one another.
For example, we can define $\frac{1}{2}$
as before by requiring that the lattice constructed with initial segment $0\frac{1}{2}$
line up with the original lattice at 1. We will find, however, that they do not quite line up at
other points, such as 2. Let's use this discrepancy $\epsilon = 2 - 2'$ as a numerical measure of the error.
It will depend on both $\delta_1$, the distance 01, and on $\delta_2$, the
distance between 0 and $q_0$. Since $\epsilon$ vanishes for either $\delta_1=0$ or $\delta_2=0$, and since the equivalence
principle guarantees smooth behavior on small scales, the leading term in the error will in general
be proportional to the product $\delta_1\delta_2$. In the language of infinitesimals, we can replace $\delta_1$ and $\delta_2$
with infinitesimally short distances, which for simplicity we assume to be equal, and which we call $d\lambda$.
Then the affine parameter $\lambda$ is defined as
$\lambda = \int d\lambda$, where the error of order $d\lambda^2$
is, as usual, interpreted as the negligible discrepancy between the integral and its approximation as a Riemann sum.
b / Parallel transport is path-dependent. On the surface of this sphere, parallel-transporting a vector along ABC gives a different answer than transporting it along AC.
c / Bad things happen if we try to construct an affine parameter along a curve that isn't a geodesic. This curve is similar to path ABC in figure b. Parallel transport doesn't preserve the vectors' angle relative to the curve, as it would with a geodesic. The errors in the construction blow up in a way that wouldn't happen if the curve had been a geodesic. The fourth dashed parallel flies off wildly around the back of the sphere, wrapping around and meeting the curve at a point, 4, that is essentially random.
If you were alert, you may have realized that I cheated you at a crucial point in this construction. We were to make $1q_1$ and $0q_0$ “initially parallel” as they left 01. How should we even define this idea of “initially parallel?” We could try to do it by making angles $q_0 01$ and $q_1 12$ equal, but this doesn't quite work, because it doesn't specify whether the angle is to the left or the right on the two-dimensional plane of the page. In three or more dimensions, the issue becomes even more serious. The construction workers building the lattice need to keep it all in one plane, but how do they do that in curved spacetime?
A mathematician's answer would be that our geometry lacks some additional structure called a connection, which is a rule that specifies how one locally flat neighborhood is to be joined seamlessly onto another locally flat neighborhood nearby. If you've ever bought two maps and tried to tape them together to make a big map, you've formed a connection. If the maps were on a large enough scale, you also probably noticed that this was impossible to do perfectly, because of the curvature of the earth.
Physically, the idea is that in flat spacetime, it is possible to construct inertial guidance systems like the ones discussed on page 67. Since they are possible in flat spacetime, they are also possible in locally flat neighborhoods of spacetime, and they can then be carried from one neighborhood to another.
In three space dimensions, a gyroscope's angular momentum vector maintains its direction, and we can orient other vectors, such as $1q_1$, relative to it. Suppose for concreteness that the construction of the affine parameter above is being carried out in three space dimensions. We place a gyroscope at 0, orient its axis along $0q_0$, slide it along the line to 1, and then construct $1q_1$ along that axis.
In 3+1 dimensions, a gyroscope only does part of the job. We now have to maintain the direction of a four-dimensional vector. Four-vectors will not be discussed in detail until section 4.2, but similar devices can be used to maintain their orientations in spacetime. These physical devices are ways of defining a mathematical notion known as parallel transport, which allows us to take a vector from one point to another in space. In general, specifying a notion of parallel transport is equivalent to specifying a connection.
Parallel transport is path-dependent, as shown in figure b.
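The path dependence can be checked numerically on a unit sphere embedded in Euclidean 3-space. The sketch below assumes that transporting a tangent vector along a great-circle arc amounts to rigidly rotating it about the axis perpendicular to that arc's plane, which holds for the usual connection on the sphere. The points A, B, and C (two on the equator, B at the pole) and all function names are hypothetical choices for illustration and are not taken from figure b.

```python
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x*y for x, y in zip(a, b))

def rotate(v, axis, angle):
    """Rodrigues rotation of v about a unit-length axis."""
    c, s = math.cos(angle), math.sin(angle)
    kxv = cross(axis, v)
    kv = dot(axis, v)
    return tuple(v[i]*c + kxv[i]*s + axis[i]*kv*(1.0 - c) for i in range(3))

def transport(v, start, end):
    """Carry tangent vector v along the great-circle arc from start to end.
    start and end are unit vectors that are neither equal nor antipodal."""
    n = cross(start, end)
    norm = math.sqrt(dot(n, n))
    axis = tuple(x / norm for x in n)
    angle = math.atan2(norm, dot(start, end))
    return rotate(v, axis, angle)

A = (1.0, 0.0, 0.0)   # on the equator
B = (0.0, 0.0, 1.0)   # north pole
C = (0.0, 1.0, 0.0)   # on the equator, 90 degrees east of A
v = (0.0, 0.0, 1.0)   # tangent vector at A, pointing north

via_B = transport(transport(v, A, B), B, C)   # path ABC
direct = transport(v, A, C)                   # path AC along the equator
cos_angle = max(-1.0, min(1.0, dot(via_B, direct)))
mismatch_deg = math.degrees(math.acos(cos_angle))
print(mismatch_deg)
```

For this particular triangle the two results differ by a right angle, which equals the area enclosed by the two paths divided by the square of the radius.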
In the context of flat spacetime, the affine parameter was defined only along lines, not arbitrary curves, and could not be compared between lines running in different directions. In curved spacetime, the same limitation is present, but with “along lines” replaced by “along geodesics.” Figure c shows what goes wrong if we try to apply the construction to a world-line that isn't a geodesic. One definition of a geodesic is that it's the course we'll end up following if we navigate by keeping a fixed bearing relative to an inertial guidance device such as a gyroscope; that is, the tangent to a geodesic, when parallel-transported farther along the geodesic, is still tangent. A non-geodesic curve lacks this property, and the effect on the construction of the affine parameter is that the segments $nq_n$ drift more and more out of alignment with the curve.
a / Tullio Levi-Civita (1873-1941) worked on models of number systems possessing infinitesimals and on differential geometry. He invented the tensor notation, which Einstein learned from his textbook. He was appointed to prestigious endowed chairs at Padua and the University of Rome, but was fired in 1938 because he was a Jew and an anti-fascist.
b / An Einstein's ring is formed when there is a chance alignment of a distant source with a closer gravitating body. Here, a quasar, MG1131+0456, is seen as a ring due to focusing of light by an unknown object, possibly a supermassive black hole. Because the entire arrangement lacks perfect axial symmetry, the ring is nonuniform; most of its brightness is concentrated in two lumps on opposite sides. This type of gravitational lensing is direct evidence for the curvature of space predicted by general relativity. The two geodesics form a lune, which is a figure that cannot exist in Euclidean geometry.
A typical first reaction to the phrase “curved spacetime” --- or even “curved space,” for that matter --- is that it sounds like nonsense. How can featureless, empty space itself be curved or distorted? The concept of a distortion would seem to imply taking all the points and shoving them around in various directions as in a Picasso painting, so that distances between points are altered. But if space has no identifiable dents or scratches, it would seem impossible to determine which old points had been sent to which new points, and the distortion would have no observable effect at all. Why should we expect to be able to build differential geometry on such a logically dubious foundation? Indeed, historically, various mathematicians have had strong doubts about the logical self-consistency of both non-Euclidean geometry and infinitesimals. And even if an authoritative source assures you that the resulting system is self-consistent, its mysterious and abstract nature would seem to make it difficult for you to develop any working picture of the theory that could play the role that mental sketches of graphs play in organizing your knowledge of calculus.
Models provide a way of dealing with both the logical issues and the conceptual ones. Figure a on page 81 “pops” off of the page, presenting a strong psychological impression of a curved surface rendered in perspective. This suggests finding an actual mathematical object, such as a curved surface, that satisfies all the axioms of a certain logical system, such as non-Euclidean geometry. Note that the model may contain extrinsic elements, such as the existence of a third dimension, that are not connected to the system being modeled.
Let's focus first on consistency. In general, what can we say about the self-consistency of a mathematical system? To start with, we can never prove anything about the consistency or lack of consistency of something that is not a well-defined formal system, e.g., the Bible. Even Euclid's Elements, which was a model of formal rigor for thousands of years, is loose enough to allow considerable ambiguity. If you're inclined to scoff at the silly Renaissance mathematicians who kept trying to prove the parallel postulate E5 from postulates E1-E4, consider the following argument. Suppose that we replace E5 with E5', which states that parallels don't exist: given a line and a point not on the line, no line can ever be drawn through the point and parallel to the given line. In the new system of plane geometry E' consisting of E1-E4 plus E5', we can prove a variety of theorems, and one of them is that there is an upper limit on the area of any figure. This imposes a limit on the size of circles, and that appears to contradict E3, which says we can construct a circle with any radius. We therefore conclude that E' lacks self-consistency. Oops! As your high school geometry text undoubtedly mentioned in passing, E' is a perfectly respectable system called elliptic geometry. So what's wrong with this supposed proof of its lack of self-consistency? The issue is the exact statement of E3. E3 does not say that we can construct a circle given any real number as its radius. Euclid could not have intended any such interpretation, since he had no notion of real numbers. To Euclid, geometry was primary, and numbers were geometrically constructed objects, being represented as lengths, angles, areas, and volumes. A literal translation of Euclid's statement of the axiom is “To describe a circle with any center and distance.”2 “Distance” means a line segment. There is therefore no contradiction in E', because E' has a limit on the lengths of line segments.
Now suppose that such ambiguities have been eliminated from the system's basic definitions and axioms. In general, we expect it to be easier to prove an inconsistent system's inconsistency than to demonstrate the consistency of a consistent one. In the former case, we can start cranking out theorems, and if we can find a way to prove both proposition $P$ and its negation $\neg P$, then obviously something is wrong with the system. One might wonder whether such a contradiction could remain contained within one corner of the system, like nuclear waste. It can't. Aristotelian logic allows proof by contradiction: if we prove both $P$ and $\neg P$ based on certain assumptions, then our assumptions must have been wrong. If we can prove both $P$ and $\neg P$ without making any assumptions, then proof by contradiction allows us to establish the truth of any randomly chosen proposition. Thus a single contradiction is sufficient, in Aristotelian logic, to invalidate the entire system. This goes by the Latin rubric ex falso quodlibet, meaning “from a falsehood, whatever you please.” Thus any contradiction proves the inconsistency of the entire system.
Proving consistency is harder. If you're mathematically sophisticated, you may be tempted to leap directly to Gödel's theorem, and state that nobody can ever prove the self-consistency of a mathematical system. This would be a misapplication of Gödel. Gödel's theorem only applies to mathematical systems that meet certain technical criteria, and some of the interesting systems we're dealing with don't meet those criteria; in particular, Gödel's theorem doesn't apply to Euclidean geometry, and Euclidean geometry was proved self-consistent by Tarski and his students around 1950. Furthermore, we usually don't require an absolute proof of self-consistency. Usually we're satisfied if we can prove that a certain system, such as elliptic geometry, is at least as self-consistent as another system, such as Euclidean geometry. This is called equiconsistency. The general technique for proving equiconsistency of two theories is to show that a model of one can be constructed within the other.
Suppose, for example, that we construct a geometry in which the space of points is the surface of a sphere, and lines are understood to be the geodesics, i.e., the great circles whose centers coincide at the sphere's center. This geometry, called spherical geometry, is useful in cartography and navigation. It is non-Euclidean, as we can demonstrate by exhibiting at least one proposition that is false in Euclidean geometry. For example, construct a triangle on the earth's surface with one corner at the north pole, and the other two at the equator, separated by 90 degrees of longitude. The sum of its interior angles is 270 degrees, contradicting Euclid, book I, proposition 32. Spherical geometry must therefore violate at least one of the axioms E1-E5, and indeed it violates both E1 (because no unique line is determined by two antipodal points such as the north and south poles) and E5 (because parallels don't exist at all).
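The 270-degree angle sum can be verified directly in the embedding. This sketch places the sphere at the origin with unit radius and measures each interior angle as the angle between the two great-circle tangent directions at a vertex. The function names and the vertex coordinates are illustrative assumptions.

```python
import math

def dot(a, b):
    return sum(x*y for x, y in zip(a, b))

def tangent_toward(x, y):
    """Unit tangent at point x along the great circle heading toward y."""
    d = dot(x, y)
    t = tuple(yi - d*xi for xi, yi in zip(x, y))   # remove the radial part
    n = math.sqrt(dot(t, t))
    return tuple(ti / n for ti in t)

def vertex_angle(x, y, z):
    """Interior angle at vertex x of the geodesic triangle xyz, in degrees."""
    c = dot(tangent_toward(x, y), tangent_toward(x, z))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

def angle_sum(a, b, c):
    return vertex_angle(a, b, c) + vertex_angle(b, c, a) + vertex_angle(c, a, b)

north_pole = (0.0, 0.0, 1.0)
equator_0 = (1.0, 0.0, 0.0)    # longitude 0 on the equator
equator_90 = (0.0, 1.0, 0.0)   # longitude 90 degrees on the equator

total = angle_sum(north_pole, equator_0, equator_90)
print(total)
```

Each of the three angles is a right angle here, so the excess over the Euclidean value of 180 degrees is a full 90 degrees.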
A closely related construction gives a model of elliptic geometry, in which E1 holds, and only E5 is thrown overboard. To accomplish this, we model a point using a diameter of the sphere,3 and a line as the set of all diameters lying in a certain plane. This has the effect of identifying antipodal points, so that there is now no violation of E1. Roughly speaking, this is like lopping off half of the sphere, but making the edges wrap around. Since this model of elliptic geometry is embedded within a Euclidean space, all the axioms of elliptic geometry can now be proved as theorems in Euclidean geometry. If a contradiction arose from them, it would imply a contradiction in the axioms of Euclidean geometry. We conclude that elliptic geometry is equiconsistent with Euclidean geometry. This was known long before Tarski's 1950 proof of Euclidean geometry's self-consistency, but since nobody was losing any sleep over hidden contradictions in Euclidean geometry, mathematicians stopped wasting their time looking for contradictions in elliptic geometry.
A model of this system can be constructed within the real number system by defining $d$ as the identity function
$d(x)=x$ and forming the set of functions of the form $f(d)=P(d)/Q(d)$,
where $P$ and $Q$ are polynomials with real coefficients.
The ordering of functions $f$ and $g$ is defined according to the sign of
$\lim_{x\rightarrow 0^+} f(x)-g(x)$.
Axioms 1-3 can all be proved from the real-number axioms. Therefore this system, which includes infinitesimals, is equiconsistent with the reals. More elaborate
constructions can extend this to systems that have more of the properties of the reals, and
a browser-based calculator that implements such a system is available at lightandmatter.com/calc/inf.
Abraham Robinson extended this in 1966 to all of analysis,
and thus there is nothing intrinsically nonrigorous about doing analysis in the style of
Gauss and Euler, with symbols like dx representing infinitesimally small
quantities.1
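A minimal sketch of this rational-function model follows. It reads the ordering as the sign of $f(x)-g(x)$ for all sufficiently small positive $x$, which is an interpretive assumption on my part, and it represents each polynomial as a list of coefficients in ascending powers of $d$. All names are hypothetical; this is not the implementation behind the calculator mentioned above.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Product of two polynomials given as ascending coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += Fraction(ai) * Fraction(bj)
    return out

def poly_sub(a, b):
    """Difference a - b of two polynomials."""
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return [Fraction(x) - Fraction(y) for x, y in zip(a, b)]

def small_x_sign(p):
    """Sign of p(x) for small x > 0: set by the lowest-order nonzero term."""
    for c in p:
        if c != 0:
            return 1 if c > 0 else -1
    return 0

def compare(f, g):
    """Return +1, 0, or -1 according to whether f > g, f = g, or f < g.
    f and g are (P, Q) pairs standing for P(d)/Q(d), with Q not zero."""
    (pf, qf), (pg, qg) = f, g
    numerator = poly_sub(poly_mul(pf, qg), poly_mul(pg, qf))
    denominator = poly_mul(qf, qg)
    return small_x_sign(numerator) * small_x_sign(denominator)

def real(r):
    """Embed an ordinary number as a constant function."""
    return ([Fraction(r)], [1])

d = ([0, 1], [1])            # the infinitesimal d
seven_d = ([0, 7], [1])      # 7d
d_squared = ([0, 0, 1], [1]) # d^2
one_over_d = ([1], [0, 1])   # 1/d, an infinitely large quantity

print(compare(d, real(0)), compare(d, real(Fraction(1, 1000))))
```

In this model $d$ is positive but smaller than any positive real constant one cares to test, and $1/d$ is correspondingly larger than any real constant.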
Besides proving consistency, these models give us insight into what's going on. The model of elliptic geometry suggests an insight into the reason that there is an upper limit on lengths and areas: it is because the space wraps around on itself. The model of infinitesimals suggests a fact that is not immediately obvious from the axioms: the infinitesimal quantities compose a hierarchy, so that for example $7d$ is in finite proportion to $d$, while $d^2$ is like a “lesser flea” in Swift's doggerel: “Big fleas have little fleas/ On their backs to ride 'em,/ and little fleas have lesser fleas,/And so, ad infinitum.”
Spherical and elliptic geometry are not valid models of a general-relativistic spacetime, since they are locally Euclidean rather than Lorentzian, but they still provide us with enough conceptual guidance to come up with some ideas that might never have occurred to us otherwise:
Self-check: Prove from the axioms E' that elliptic geometry, unlike spherical geometry, cannot have a lune with two distinct vertices. Convince yourself nevertheless, using the spherical model of E', that it is possible in elliptic geometry for two lines to enclose a region of space, in the sense that from any point P in the region, a ray emitted in any direction must intersect one of the two lines. Summarize these observations with a characterization of lunes in elliptic geometry versus lunes in spherical geometry.
Models can be dangerous, because they can tempt us to impute physical reality to features that are purely extrinsic, i.e., that are only present in that particular model. This is as opposed to intrinsic features, which are present in all models, and which are therefore logically implied by the axioms of the system itself. The existence of lunes is clearly an intrinsic feature of non-Euclidean geometries, because intersection of lines was defined before any model has even been proposed.
a / Example 4.
Self-check: How would the ideas of example 4 apply to a cone?
Example 4 shows that it can be difficult to sniff out bogus extrinsic features that seem intrinsic,
and example 3 suggests the desirability of developing methods of calculation that never refer to
any extrinsic quantities, so that we never have to worry whether a symbol like R staring up at us
from a piece of paper is intrinsic. This is why it is unlikely to be helpful to a student of general
relativity to pick up a book on differential geometry that was written without general relativity
specifically in mind. Such books have a tendency to casually mix together intrinsic and extrinsic
notation. For example, a vector cross product $\mathbf{a}\times\mathbf{b}$
refers to a vector poking out of
the plane occupied by $\mathbf{a}$ and $\mathbf{b}$, and the space outside the plane may be extrinsic;
it is not obvious how to generalize this operation to the 3+1 dimensions of relativity (since the
cross product is a three-dimensional beast), and even if it were, we could not be assured that it
would have any intrinsically well defined meaning.
To see how to proceed in creating a manifestly intrinsic notation, consider the two types of intrinsic observations that are available in general relativity:
Incidence measurements, for example detection of gravitational lensing, are global, but they are the only global observations we can do.4 If we were limited entirely to incidence, spacetime would be described by the austere system of projective geometry, a geometry without parallels or measurement. In projective geometry, all propositions are essentially statements about combinatorics, e.g., that it is impossible to plant seven trees so that they form seven lines of three trees each.
But:
This gives us more power, but not as much as we might expect. Suppose we define a coordinate such as $t$ or $x$. In Newtonian mechanics, these coordinates would form a predefined background, a preexisting stage for the actors. In relativity, on the other hand, consider a completely arbitrary change of coordinates of the form $x \rightarrow x'=f(x)$, where $f$ is a smooth one-to-one function. For example, we could have $x \rightarrow x+px^3+q\sin(rx)$ (with $p$ and $q$ chosen small enough so that the mapping is always one-to-one). Since the mapping is one-to-one, the new coordinate system preserves all the incidence relations. Since the mapping is smooth, the new coordinate system is still compatible with the existence of local Lorentz frames. The difference between the two coordinate systems is therefore entirely extrinsic, and we conclude that a manifestly intrinsic notation should avoid any explicit reference to a coordinate system. That is, if we write a calculation in which a symbol such as $x$ appears, we need to make sure that nowhere in the notation is there any hidden assumption that $x$ comes from any particular coordinate system. For example, the equation should still be valid if the generic symbol $x$ is later taken to represent the distance $r$ from some center of symmetry. This coordinate-independence property is also known as general covariance, and this type of smooth change of coordinates is also called a diffeomorphism.
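One way to make “small enough” concrete for the cubic-plus-sine example is a sufficient condition on the derivative. The following is a sketch under two added assumptions that the text does not state: that $p \ge 0$, and that $x$ ranges over the whole real line.

```latex
f(x) = x + p x^3 + q \sin(rx), \qquad
f'(x) = 1 + 3 p x^2 + q r \cos(rx) \;\ge\; 1 - |qr| .
```

Under these assumptions, $|qr| < 1$ guarantees $f'(x) > 0$ everywhere, so $f$ is strictly increasing and therefore one-to-one. The condition is sufficient rather than necessary.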
As an exotic example of a change of coordinates, take a torus and label it with coordinates $(\theta,\phi)$, where $\theta+2\pi$ is taken to be the same as $\theta$, and similarly for $\phi$. Now subject it to the coordinate transformation $T$ defined by $\theta \rightarrow \theta+\phi$, which is like opening the torus, twisting it by a full circle, and then joining the ends back together. $T$ is known as the “Dehn twist,” and it is different from most of the coordinate transformations we do in relativity because it can't be done smoothly, i.e., there is no continuous function $f(x)$ on $0 \le x \le 1$ such that every value of $f$ is a smooth coordinate transformation, $f(0)$ is the identity transformation, and $f(1)=T$.
A good application of these ideas is to the question of what the world would look like in a frame of reference moving at the speed of light. This question has a long and honorable history. As a young student, Einstein tried to imagine what an electromagnetic wave would look like from the point of view of a motorcyclist riding alongside it. We now know, thanks to Einstein himself, that it really doesn't make sense to talk about such observers.
The most straightforward argument is based on the positivist idea that concepts only mean something if you can define how to measure them operationally. If we accept this philosophical stance (which is by no means compatible with every concept we ever discuss in physics), then we need to be able to physically realize this frame in terms of an observer and measuring devices. But we can't. It would take an infinite amount of energy to accelerate Einstein and his motorcycle to the speed of light.
Since arguments from positivism can often kill off perfectly interesting and reasonable concepts, we might ask whether there are other reasons not to allow such frames. There are. Recall that we placed two technical conditions on coordinate transformations: they are supposed to be smooth and one-to-one. The smoothness condition is related to the inability to boost Einstein's motorcycle into the speed-of-light frame by any continuous, classical process. (Relativity is a classical theory.) But independent of that, we have a problem with the one-to-one requirement. Figure b shows what happens if we do a series of Lorentz boosts to higher and higher velocities. It should be clear that if we could do a boost up to a velocity of c, we would have effected a coordinate transformation that was not one-to-one. Every point in the plane would be mapped onto a single lightlike line.
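The collapse onto a lightlike line can be seen numerically. The sketch below uses units with $c=1$ and the sign convention $t'=\gamma(t-vx)$, $x'=\gamma(x-vt)$, both of which are assumptions for illustration, as are the function names and the sample event. With this convention the combination $t'+x'$ shrinks toward zero as $v$ approaches 1, so every event is squeezed toward the line $t'+x'=0$.

```python
import math

def boost(t, x, v):
    """Lorentz boost with c = 1 and |v| < 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (t - v * x), gamma * (x - v * t)

def lightcone_sum(t, x, v):
    """Return t' + x' for the boosted event."""
    t_prime, x_prime = boost(t, x, v)
    return t_prime + x_prime

event = (1.0, 0.5)                       # hypothetical event (t, x)
velocities = [0.0, 0.9, 0.999, 0.999999]
sums = [lightcone_sum(event[0], event[1], v) for v in velocities]

# The interval t^2 - x^2 is unchanged by any boost with |v| < 1.
t9, x9 = boost(event[0], event[1], 0.9)
interval_before = event[0]**2 - event[1]**2
interval_after = t9**2 - x9**2
print(sums)
```

For every velocity below 1 the map is invertible and preserves the interval; only the limiting case fails to be one-to-one.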
a / The tick marks on the line define a coordinate measured along the line. It is not possible to set up such a coordinate system globally so that the coordinate is uniform everywhere. The arrows represent changes in the value of the coordinate; since the changes in the coordinate are all equal, the arrows are all the same length.
b / The vectors $dx^\mu$ and $dx_\mu$ are duals of each other.
Applying these considerations to the creation of a manifestly intrinsic notation, consider a coordinate x defined along a certain curve, which is not necessarily a geodesic. For concreteness, imagine this curve to exist in two spacelike dimensions, which we can visualize as the surface of a sphere embedded in Euclidean 3-space. These concrete features are not strictly necessary, but they drive home the point that we should not expect to be able to define x so that it varies at a steady rate with elapsed distance; it is not possible to define this type of uniform, Cartesian coordinate system on the surface of a sphere. In the figure, the tick marks are therefore not evenly spaced. This is perfectly all right, given the coordinate invariance of general relativity. Since the incremental changes in x are equal, I've represented them below the curve as little vectors of equal length. They are the wrong length to represent distances along the curve, but this wrongness is an inevitable fact of life in relativity.
Now suppose we want to integrate the arc length of a segment of this curve. The little vectors are
infinitesimal. In the integrated length, each little vector should contribute some amount, which is a scalar.
This scalar is not simply the magnitude of the vector,
$ds \ne \sqrt{dx \cdot dx}$, since the vectors are the wrong length. We therefore
need some mathematical rule, some function, that accepts a vector as its input and gives a scalar as its
output. This function is a locally adjustable fudge factor that compensates for the wrong lengths of the
little vectors. Since the space is locally flat and uniform, the function must be linear, and from linear algebra,
we know that the most general function of this kind is an inner product. If the little arrow is a row vector,
then the function would be represented by taking the row vector's inner product with some column vector to give $ds^2$.
Of course the distinction between row and column vectors is pointless in a one-dimensional space, but
it should be clear that this will provide an appropriate foundation for the generalization to more than one
coordinate. The row and column vectors are referred to as one another's duals.
Figure b shows the resulting picture. Anticipating the generalization to four-dimensional
spacetime with coordinates $(x^0,x^1,x^2,x^3)$, we'll start referring to $x$ as $x^\mu$, although in our
present one-dimensional example $\mu=0$ is fixed. The reason for the
use of the odd-looking superscripts, rather than subscripts, will become clear shortly.
The vectors drawn below the curve are called the contravariant vectors, notated $dx^\mu$, and the ones
above it are the covariant vectors, $dx_\mu$.
It's not particularly important to keep track of
which is which, since the relationship between them is symmetric, like the relationship between row
and column vectors. Each is the dual of the other. The arc length is given by
$s = \int \sqrt{dx^\mu dx_\mu}$,
or, equivalently we say $ds^2=dx^\mu dx_\mu$. (Remember, $\mu=0$.)
Given a $dx^\mu$, how do we find its dual $dx_\mu$, and vice versa? In one dimension, we simply need to introduce a real number $g$ as a correction factor. If one of the vectors is shorter than it should be in a certain region, the correction factor serves to compensate by making its dual proportionately longer. The two possible mappings (covariant to contravariant and contravariant to covariant) are accomplished with factors of $g$ and $1/g$. The number $g$ is called the metric, and it encodes all the information about distances. For example, if $\phi$ represents longitude measured at the arctic circle, then the metric is the only source for the datum that a displacement $d\phi$ corresponds to 2540 km per radian.
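The longitude example can be worked through as a small calculation. In this sketch the earth's mean radius (6371 km) and the latitude of the arctic circle (66.56 degrees) are assumed values, and the convention that $g$ multiplies the contravariant displacement to give the covariant one, so that $ds^2 = g\,d\phi^2$, is one of the two possibilities mentioned above. The names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0        # assumed mean radius of the earth
ARCTIC_CIRCLE_LAT_DEG = 66.56   # assumed latitude of the arctic circle

def km_per_radian_of_longitude(lat_deg, radius_km=EARTH_RADIUS_KM):
    """Distance covered per radian of longitude at a given latitude."""
    return radius_km * math.cos(math.radians(lat_deg))

scale = km_per_radian_of_longitude(ARCTIC_CIRCLE_LAT_DEG)
g = scale**2                    # one-dimensional metric, in km^2 per radian^2

d_phi_upper = 0.01              # contravariant displacement, in radians
d_phi_lower = g * d_phi_upper   # its dual, obtained with the factor g
ds = math.sqrt(d_phi_upper * d_phi_lower)   # arc length in km
print(scale, ds)
```

The coordinate displacement alone carries no information about distance; only after it is paired with its dual through $g$ does a length in kilometers appear.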
Now let's generalize to more than one dimension. Because globally Cartesian coordinate systems can't be imposed on a curved space, the constant-coordinate lines will in general be neither evenly spaced nor perpendicular to one another. If we construct a local set of basis vectors lying along the intersections of the constant-coordinate surfaces, they will not form an orthonormal set. We would like to have an expression of the form $ds^2=\sum dx^\mu dx_\mu$ for the squared arc length, and in differential geometry we practice the convenient notational convention, introduced by Einstein, of assuming a summation when an index is repeated, so this becomes
$ds^2 = dx^\mu dx_\mu$.
c / Example 8.
In a Euclidean plane, where the distinction between covariant and contravariant vectors is irrelevant, this expression for $ds^2$ is simply the Pythagorean theorem, summed over two values of $i$ for the two coordinates:
$$ds^2=dx^i\,dx_i=dx^2+dy^2.$$
The symbols $dx$, $dx^0$, and $dx_0$ are all synonyms, and likewise for $dy$, $dx^1$, and $dx_1$.
In the non-Euclidean case, the Pythagorean theorem is false; $dx^\mu$ and $dx_\mu$ are no longer synonyms, so their product is no longer simply the square of a distance. To see this more explicitly, let's write the expression so that only the contravariant (upper-index) quantities occur. By local flatness, the relationship between the covariant and contravariant vectors is linear, and the most general relationship of this kind is given by making the metric a symmetric matrix $g_{\mu\nu}$. Substituting $dx_\mu=g_{\mu\nu}\,dx^\nu$, we have
$$ds^2=g_{\mu\nu}\,dx^\mu\,dx^\nu,$$
where there are now implied sums over both μ and ν. Notice how implied sums occur only when the repeated index occurs once as a superscript and once as a subscript; other combinations are ungrammatical.
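The implied sums can be made concrete by writing them as explicit loops. The following is only a sketch in Python; the function names are hypothetical, and the Euclidean metric and the 3-4-5 displacement are chosen purely for illustration.

```python
def lower_index(g, dx_upper):
    """dx_mu = g_{mu nu} dx^nu, with the sum over nu written out."""
    n = len(dx_upper)
    return [sum(g[mu][nu] * dx_upper[nu] for nu in range(n)) for mu in range(n)]

def interval_squared(g, dx_upper):
    """ds^2 = g_{mu nu} dx^mu dx^nu, with both implied sums written out."""
    n = len(dx_upper)
    return sum(g[mu][nu] * dx_upper[mu] * dx_upper[nu]
               for mu in range(n) for nu in range(n))

g_euclid = [[1.0, 0.0],
            [0.0, 1.0]]
dx = [3.0, 4.0]

ds2 = interval_squared(g_euclid, dx)
# Same quantity written as one upper index contracted with one lower index.
dx_lower = lower_index(g_euclid, dx)
ds2_mixed = sum(up * low for up, low in zip(dx, dx_lower))

print(ds2, ds2_mixed)
```

Each repeated index corresponds to one loop, which is why an index repeated in an "ungrammatical" position has no such translation.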
Self-check: Why does it make sense to demand that the metric be symmetric?
In an introductory course in Newtonian mechanics, one makes a distinction between vectors, which have
a direction in space, and scalars, which do not. These are specific examples of tensors, which
can be expressed as objects with m superscripts and n subscripts. A scalar has m=n=0. A covariant vector has
(m,n)=(0,1), a contravariant vector (1,0), and the metric (0,2). We refer to the number of indices as the
rank of the tensor. Tensors are discussed in more detail, and defined
more rigorously, in chapter 4. For our present purposes, it is important to note that just because
we write a symbol with subscripts or superscripts, that doesn't mean it deserves to be called a tensor. This point
can be understood in the more elementary context of Newtonian scalars and vectors. For example, we can define
a Newtonian “vector”
$\mathbf{u}=(m,T,e)$, where $m$ is the mass of the moon, $T$ is the temperature in
Chicago, and e is the charge of the electron. This creature u doesn't deserve to be called a vector,
because it doesn't behave as a vector under rotation. Similarly, a tensor is required to behave in a certain
way under rotations and Lorentz boosts.
When discussing the symmetry of rank-2 tensors, it is convenient to introduce the following notation:

$$T_{(ab)}=\frac{1}{2}\left(T_{ab}+T_{ba}\right)$$
$$T_{[ab]}=\frac{1}{2}\left(T_{ab}-T_{ba}\right)$$
Any $T_{ab}$ can be split into symmetric and antisymmetric parts. This is similar to writing an arbitrary
function as a sum of an odd function and an even function.
The metric has only a symmetric part: $g_{(ab)}=g_{ab}$, and
$g_{[ab]}=0$. This notation is generalized to ranks greater than 2 on page
166.
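The split into symmetric and antisymmetric parts is easy to try out numerically. This is a minimal sketch in Python, with hypothetical function names and an arbitrary example array standing in for the components of a rank-2 tensor.

```python
def symmetric_part(T):
    """T_(ab) = (T_ab + T_ba) / 2."""
    n = len(T)
    return [[0.5 * (T[a][b] + T[b][a]) for b in range(n)] for a in range(n)]

def antisymmetric_part(T):
    """T_[ab] = (T_ab - T_ba) / 2."""
    n = len(T)
    return [[0.5 * (T[a][b] - T[b][a]) for b in range(n)] for a in range(n)]

T = [[1.0, 2.0],
     [6.0, 4.0]]

S = symmetric_part(T)
A = antisymmetric_part(T)
recombined = [[S[a][b] + A[a][b] for b in range(2)] for a in range(2)]

# A symmetric array, like the metric, has a vanishing antisymmetric part.
g = [[1.0, 0.5],
     [0.5, 1.0]]
g_antisym = antisymmetric_part(g)

print(S, A, recombined, g_antisym)
```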
Self-check: Characterize an antisymmetric rank-2 tensor in two dimensions.
◊ If we change our units of measurement so that $x^\mu \rightarrow \alpha x^\mu$, while demanding that $ds^2$ come out the same, then we need $g_{\mu\nu} \rightarrow \alpha^{-2}g_{\mu\nu}$.
Notice how in example 7 we started from the generally valid relation $ds^2=g_{\mu\nu}\,dx^\mu\,dx^\nu$, but soon began writing down facts like $g_{\theta\theta}=r^2$ that were only valid in this particular coordinate system. To make it clear when this is happening, we adopt a convention introduced by Roger Penrose known as the abstract index notation. In this convention, Latin superscripts and subscripts indicate that an equation is of general validity, without regard to any choice of coordinate system, while Greek ones are used for coordinate-dependent equations. For example, we can write the general expression for squared differential arc length with Latin indices,
$$ds^2=g_{ij}\,dx^i\,dx^j,$$
because it holds regardless of the coordinate system, whereas the vanishing of the off-diagonal elements of the metric in Euclidean polar coordinates has to be written as $g_{\mu\nu}=0$ for $\mu \ne \nu$, since it would in general be false if we used a different coordinate system to describe the same Euclidean plane. The advantages of this notation became widely apparent to relativists starting around 1980, so for example it is used in the text by Wald (1984), but not in Misner, Thorne, and Wheeler (1973). Some of the older literature uses a notation in which the Greek and Latin indices are instead used to distinguish between timelike and spacelike components of a vector, but this usage is dying out, since it inappropriately singles out a distinction between time and space that is not actually preserved under a Lorentz boost.
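◊ Since the coordinates differ from Cartesian coordinates only in the angle between the axes, not in their
scales, a displacement $dx^i$ along either axis, $i=1$ or 2, must give $ds=dx^i$, so for the diagonal
elements we have $g_{11}=g_{22}=1$. The metric is always symmetric, so
$g_{12}=g_{21}$. To fix these off-diagonal elements, consider a displacement by $ds$ in the
direction perpendicular to axis 1. Writing $\phi$ for the angle between the axes, this changes the coordinates by
$dx^1=-ds\cot\phi$ and $dx^2=ds\csc\phi$. We then have
$$ds^2=g_{ij}\,dx^i\,dx^j=ds^2\left(\cot^2\phi+\csc^2\phi-2g_{12}\cos\phi\csc^2\phi\right),$$
$$g_{12}=\cos\phi.$$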
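The oblique-coordinate metric can be checked numerically. The sketch below, in Python, assumes that the axes make an angle φ with one another (the symbol and the 60-degree value are choices made here for illustration), takes the off-diagonal elements to be cos φ, and compares the result with an ordinary Cartesian calculation. All function names are hypothetical.

```python
import math

def oblique_metric(phi):
    """Metric for unit-scale axes at angle phi: g11 = g22 = 1, g12 = g21 = cos(phi)."""
    c = math.cos(phi)
    return [[1.0, c], [c, 1.0]]

def interval_squared(g, dx):
    """ds^2 = g_ij dx^i dx^j with explicit sums."""
    return sum(g[i][j] * dx[i] * dx[j] for i in range(2) for j in range(2))

def to_cartesian(dx, phi):
    """Cartesian components, with axis 1 along X and axis 2 at angle phi to it."""
    return (dx[0] + dx[1] * math.cos(phi), dx[1] * math.sin(phi))

phi = math.radians(60.0)
ds = 1.0
# Displacement perpendicular to axis 1.
dx = [-ds / math.tan(phi), ds / math.sin(phi)]

ds2_metric = interval_squared(oblique_metric(phi), dx)
dX, dY = to_cartesian(dx, phi)
ds2_cartesian = dX ** 2 + dY ** 2

print(ds2_metric, ds2_cartesian, dX)
```

Both calculations return the same squared length, and the vanishing of dX confirms that the chosen displacement really is perpendicular to axis 1.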



In one dimension, $g$ is a single number, and lengths are given by
$ds=\sqrt{g}\,dx$.
The square root can also be understood through example 6 on
page 93, in which we saw that
a uniform rescaling $x \rightarrow \alpha x$ is reflected in $g_{\mu\nu} \rightarrow \alpha^{-2}g_{\mu\nu}$.
In two-dimensional Cartesian coordinates, multiplication of the width and height of a rectangle
gives the element of area
$dA=\sqrt{g_{11}g_{22}}\,dx^1dx^2$. Because the coordinates
are orthogonal, $g$ is diagonal, and the factor of
$\sqrt{g_{11}g_{22}}$ is identified as the
square root of its determinant, so
$dA=\sqrt{|g|}\,dx^1dx^2$. Note that the scales on
the two axes are not necessarily the same, $g_{11}\ne g_{22}$.
The same expression for the element of area holds even if the coordinates are not orthogonal.
In example 8, for instance, we have
$\sqrt{|g|}=\sqrt{1-\cos^2\phi}=\sin\phi$, where $\phi$ is the angle between the axes,
which is the right correction factor corresponding to the fact that $dx^1$ and $dx^2$ form
a parallelogram rather than a rectangle.
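For coordinates $(\theta,\phi)$ on the surface of a sphere of radius $r$, we have, by an argument similar to that of example 7 on page 93, $g_{\theta\theta}=r^2$, $g_{\phi\phi}=r^2\sin^2\theta$, $g_{\theta\phi}=0$. The area of the sphere is
$$A=\int dA=\int\!\!\int\sqrt{|g|}\,d\theta\,d\phi=r^2\int\!\!\int\sin\theta\,d\theta\,d\phi=4\pi r^2.$$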
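The area integral can also be done numerically, straight from the metric components. This is a minimal sketch in Python using a midpoint rule; the function names, the radius, and the number of steps are arbitrary choices made for the illustration.

```python
import math

def sqrt_det_metric(r, theta):
    """Square root of the determinant of the diagonal metric on the sphere."""
    g_theta_theta = r ** 2
    g_phi_phi = (r * math.sin(theta)) ** 2
    return math.sqrt(g_theta_theta * g_phi_phi)

def sphere_area(r, n_theta=2000):
    """Integrate sqrt(|g|) over 0 < theta < pi and 0 < phi < 2 pi (midpoint rule)."""
    d_theta = math.pi / n_theta
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        total += sqrt_det_metric(r, theta) * d_theta
    # The integrand does not depend on phi, so that integral is a factor of 2 pi.
    return 2.0 * math.pi * total

r = 2.0
numeric = sphere_area(r)
exact = 4.0 * math.pi * r ** 2
print(numeric, exact)
```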




◊ Relate $g^{ij}$ to $g_{ij}$.
◊ The notation is intended to treat covariant and contravariant vectors completely symmetrically. The metric with lower indices $g_{ij}$ can be interpreted as a change-of-basis transformation from a contravariant basis to a covariant one, and if the symmetry of the notation is to be maintained, $g^{ij}$ must be the corresponding inverse matrix, which changes from the covariant basis to the contravariant one. The metric must always be invertible.
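Here is a small sketch in Python of the upper-index metric as a matrix inverse. It uses, as an assumed example, a metric with unit diagonal elements and off-diagonal elements cos φ for axes at an angle φ; the function names and the sample vector are hypothetical. Lowering an index and then raising it again should return the original components.

```python
import math

def inverse_2x2(g):
    """Inverse of a 2x2 matrix; fails if the metric is not invertible."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    if det == 0:
        raise ValueError("metric is not invertible")
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def contract(m, v):
    """Sum over the repeated index: result_i = m_ij v_j."""
    return [sum(m[i][j] * v[j] for j in range(2)) for i in range(2)]

phi = math.radians(60.0)           # assumed angle between the axes
c = math.cos(phi)
g_lower = [[1.0, c], [c, 1.0]]     # assumed example metric
g_upper = inverse_2x2(g_lower)

v_upper = [0.3, -1.2]
v_lower = contract(g_lower, v_upper)   # lower the index
v_back = contract(g_upper, v_lower)    # raise it again

print(v_lower, v_back)
```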
In a locally Euclidean space, the Pythagorean theorem allows us to express the metric in local Cartesian coordinates in the simple form $g_{\mu\mu}=+1$, $g_{\mu\nu}=0$ for $\mu\ne\nu$, i.e., $g=\operatorname{diag}(+1,+1,\ldots,+1)$. This is not the appropriate metric for a locally Lorentz space. The axioms of Euclidean geometry E3 (existence of circles) and E4 (equality of right angles) describe the theory's invariance under rotations, and the Pythagorean theorem is consistent with this, because it gives the same answer for the length of a vector even if its components are reexpressed in a new basis that is rotated with respect to the original one. In a Lorentzian geometry, however, we care about invariance under Lorentz boosts, which do not preserve the quantity $t^2+x^2$. It is not circles in the $(t,x)$ plane that are invariant, but light cones, and this is described by giving $g_{tt}$ and $g_{xx}$ opposite signs and equal absolute values. A lightlike vector $(t,x)$, with $t=x$, therefore has a magnitude of exactly zero,
$$s^2=g_{tt}t^2+g_{xx}x^2=0,$$
and this remains true after a Lorentz boost with velocity $v$, which takes $(t,x)$ to $(\gamma(1-v)t,\gamma(1-v)x)$. It is a matter of convention which element of the metric to make positive and which to make negative. In this book, I'll use $g_{tt}=+1$ and $g_{xx}=-1$, so that $g=\operatorname{diag}(+1,-1)$. This has the advantage that any line segment representing the timelike world-line of a physical object has a positive squared magnitude; the forward flow of time is represented as a positive number, in keeping with the philosophy that relativity is basically a theory of how causal relationships work. With this sign convention, timelike vectors have positive squared magnitudes, spacelike ones negative. The same convention is followed, for example, by Penrose. The opposite version, with $g=\operatorname{diag}(-1,+1)$, is used by authors such as Wald and Misner, Thorne, and Wheeler.
Our universe does not have just one spatial dimension, it has three, so the full metric in a Lorentz
frame is given by
g=diag(+1,-1,-1,-1).
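The invariance of the squared magnitude under boosts can be checked numerically in one spatial dimension. This is only a sketch in Python, in units with c=1 and with the signature (+,-) used in this book; the function names, the sample vectors, and the boost velocity are arbitrary.

```python
import math

def boost(t, x, v):
    """Lorentz boost along x with velocity v, in units where c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (t - v * x), gamma * (x - v * t)

def squared_magnitude(t, x):
    """g = diag(+1, -1): timelike vectors come out positive."""
    return t * t - x * x

def euclidean_square(t, x):
    """The quantity t^2 + x^2, which boosts do not preserve."""
    return t * t + x * x

v = 0.6
lightlike = (2.0, 2.0)
timelike = (5.0, 3.0)

lightlike_boosted = boost(*lightlike, v)
timelike_boosted = boost(*timelike, v)

print(squared_magnitude(*lightlike), squared_magnitude(*lightlike_boosted))
print(squared_magnitude(*timelike), squared_magnitude(*timelike_boosted))
print(euclidean_square(*timelike), euclidean_square(*timelike_boosted))
```

The first two lines print matching pairs, while the last line shows that the Euclidean combination changes under the boost.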
In Euclidean geometry, the dot product of vectors $\mathbf{a}$ and $\mathbf{b}$ is given by
$g_{xx}a_xb_x+g_{yy}a_yb_y+g_{zz}a_zb_z=a_xb_x+a_yb_y+a_zb_z$, and in the special case
where $\mathbf{a}=\mathbf{b}$
we have the squared magnitude.
In the tensor notation, $a^\mu b_\mu=a^1b^1+a^2b^2+a^3b^3$.
Like magnitudes, dot products are invariant under rotations. This is because knowing
the dot product of vectors $\mathbf{a}$ and $\mathbf{b}$ entails knowing the value of
$\mathbf{a}\cdot\mathbf{b}=|\mathbf{a}||\mathbf{b}|\cos\theta_{ab}$,
and Euclid's E4 (equality of right angles) implies that the angle
$\theta_{ab}$ is invariant.
The same axioms also entail invariance of dot products under translation; Euclid waits only until the
second proposition of the Elements to prove that line segments can be copied from one location
to another. This seeming triviality is actually false as a description of physical
space, because it amounts to a statement that space has the same properties everywhere.
The set of all transformations that can be built out of successive translations, rotations, and reflections is called the group of isometries. It can also be defined as the group6 that preserves dot products, or the group that preserves congruence of triangles.
In Lorentzian geometry, we usually avoid the Euclidean term dot product and refer to the corresponding operation by the more general term inner product. In a specific coordinate system we have $a^\mu b_\mu=a^0b^0-a^1b^1-a^2b^2-a^3b^3$. The inner product is invariant under Lorentz boosts, and also under the Euclidean isometries. The group found by making all possible combinations of continuous transformations7 from these two sets is called the Poincaré group. The Poincaré group is not the symmetry group of all of spacetime, since curved spacetime has different properties in different locations. The equivalence principle tells us, however, that spacetime can be approximated locally as being flat, so the Poincaré group is locally valid, just as the Euclidean isometries are locally valid as a description of geometry on the Earth's curved surface.
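In Euclidean geometry, the triangle inequality $|\mathbf{b}+\mathbf{c}|\le|\mathbf{b}|+|\mathbf{c}|$
follows from
$$(|\mathbf{b}|+|\mathbf{c}|)^2-(\mathbf{b}+\mathbf{c})\cdot(\mathbf{b}+\mathbf{c})=2(|\mathbf{b}||\mathbf{c}|-\mathbf{b}\cdot\mathbf{c})\ge 0.$$
The reason this quantity never comes out negative is that for two vectors of fixed magnitude, the greatest dot product is always achieved in the case where they lie along the same direction.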
In Lorentzian geometry, the situation is different. Let $\mathbf{b}$ and $\mathbf{c}$ be future-directed timelike vectors, so that
they represent possible world-lines. Then the relation $\mathbf{a}=\mathbf{b}+\mathbf{c}$
suggests the existence of two
observers who take two different paths from one event to another. A goes by a direct route while B takes
a detour. The magnitude of each timelike vector represents the time elapsed on a clock carried by the
observer moving along that vector. The triangle inequality is now reversed, becoming
$|\mathbf{b}+\mathbf{c}|>|\mathbf{b}|+|\mathbf{c}|$.
The difference from the Euclidean case arises because inner products are no longer necessarily maximized if vectors
are in the same direction.
E.g., for two lightlike vectors, $b^ic_i$ vanishes entirely if $\mathbf{b}$ and $\mathbf{c}$
are parallel.
For timelike vectors, parallelism actually minimizes the inner product rather
than maximizing it.5
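The reversed triangle inequality is easy to see with numbers. The following Python sketch uses the signature (+,-,-,-) and units with c=1; the two world-line segments are arbitrary examples, and the function names are hypothetical.

```python
import math

def inner(a, b):
    """Inner product with g = diag(+1, -1, -1, -1)."""
    return a[0] * b[0] - sum(ai * bi for ai, bi in zip(a[1:], b[1:]))

def magnitude(a):
    """Proper time along a timelike vector."""
    return math.sqrt(inner(a, a))

# B's detour: out at 3/5 of the speed of light, then back.
b = (5.0, 3.0, 0.0, 0.0)
c = (5.0, -3.0, 0.0, 0.0)
# A's direct route between the same two events.
a = tuple(bi + ci for bi, ci in zip(b, c))

direct = magnitude(a)                   # time elapsed on A's clock
detour = magnitude(b) + magnitude(c)    # time elapsed on B's clock

print(direct, detour)
print(inner(b, c), magnitude(b) * magnitude(c))
```

The direct route accumulates more proper time than the detour, and the inner product of the two non-parallel segments exceeds the product of their magnitudes, which is the value it would take if they were parallel.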
In his 1872 inaugural address at the University of Erlangen, Felix Klein used the idea of groups of transformations to lay out a general classification scheme, known as the Erlangen program, for all the different types of geometry. Each geometry is described by the group of transformations, called the principal group, that preserves the truth of geometrical statements. Euclidean geometry's principal group consists of the isometries combined with arbitrary changes of scale, since there is nothing in Euclid's axioms that singles out a particular distance as a unit of measurement. In other words, the principal group consists of the transformations that preserve similarity, not just those that preserve congruence. Affine geometry's principal group is the transformations that preserve parallelism; it includes shear transformations, and there is therefore no invariant notion of angular measure or congruence. Unlike Euclidean and affine geometry, elliptic geometry does not have scale invariance. This is because there is a particular unit of distance that has special status; as we saw in example 3 on page 87, a being living in an elliptic plane can determine, by entirely intrinsic methods, a distance scale R, which we can interpret in the hemispherical model as the radius of the sphere. General relativity breaks this symmetry even more severely. Not only is there a scale associated with curvature, but the scale is different from one point in space to another.
d / Observer A, rotating with the carousel, measures an azimuthal distance with a ruler.
e / Einstein and Ehrenfest.
The following example was historically important, because Einstein used it to convince himself that general relativity should be described by non-Euclidean geometry.8 Its interpretation is also fairly subtle, and the early relativists had some trouble with it.
Suppose that observer A is on a spinning carousel while observer B stands on the ground. B says that A is accelerating, but by the equivalence principle A can say that she is at rest in a gravitational field, while B is free-falling out from under her. B measures the radius and circumference of the carousel, and finds that their ratio is $2\pi$. A carries out similar measurements, but when she puts her meter-stick in the azimuthal direction it becomes Lorentz-contracted by the factor $\gamma=(1-\omega^2r^2)^{-1/2}$, so she finds that the ratio is greater than $2\pi$. In A's coordinates, the spatial geometry is non-Euclidean, and the metric differs from the Euclidean one found in example 7 on page 93.
Observer A feels a force that B considers to be fictitious, but that, by the equivalence principle, A can say is a perfectly real gravitational force. According to A, an observer like B is free-falling away from the center of the disk under the influence of this gravitational field. A also observes that the spatial geometry of the carousel is non-Euclidean. Therefore it seems reasonable to conjecture that gravity can be described by non-Euclidean geometry, rather than as a physical force in the Newtonian sense.
At this point, you know as much about this example as Einstein did in 1912, when he began using it as the seed from which general relativity sprouted, collaborating with his old schoolmate, mathematician Marcel Grossmann, who knew about differential geometry. The remainder of subsection 3.4.4, which you may want to skip on a first reading, goes into more detail on the interpretation and mathematical description of the rotating frame of reference. Even more detailed treatments are given by Grøn9 and Dieks.10
Ehrenfest11 described the following paradox. Suppose that observer B, in the lab frame, measures the radius of the disk to be r when the disk is at rest, and r' when the disk is spinning. B can also measure the corresponding circumferences C and C'. Because B is in an inertial frame, the spatial geometry does not appear non-Euclidean according to measurements carried out with his meter sticks, and therefore the Euclidean relations C=2π r and C'=2π r' both hold. The radial lines are perpendicular to their own motion, and they therefore have no length contraction, r=r', implying C=C'. The outer edge of the disk, however, is everywhere tangent to its own direction of motion, so it is Lorentz contracted, and therefore C'<C. The resolution of the paradox is that it rests on the incorrect assumption that a rigid disk can be made to rotate. If a perfectly rigid disk was initially not rotating, one would have to distort it in order to set it into rotation, because once it was rotating its outer edge would no longer have a length equal to 2π times its radius. Therefore if the disk is perfectly rigid, it can never be rotated. As discussed on page 58, relativity does not allow the existence of infinitely rigid or infinitely strong materials. If it did, then one could violate causality. If a perfectly rigid disk existed, vibrations in the disk would propagate at infinite velocity, so tapping the disk with a hammer in one place would result in the transmission of information at v>c to other parts of the disk, and then there would exist frames of reference in which the information was received before it was transmitted. The same applies if the hammer tap is used to impart rotational motion to the disk.
Self-check: What if we build the disk by assembling the building materials so that they are already rotating properly before they are joined together?
What if we try to get around these problems by applying torque uniformly all over the disk, so that the rotation starts smoothly and simultaneously everywhere? We then run into issues identical to the ones raised by Bell's spaceship paradox (p. 59). In fact, Ehrenfest's paradox is nothing more than Bell's paradox wrapped around into a circle. The same question of time synchronization comes up.
To spell this out mathematically, let's find the metric according to observer A by applying the change of coordinates $\theta'=\theta-\omega t$. First we take the Euclidean metric of example 7 on page 93 and rewrite it as a (globally) Lorentzian metric in spacetime for observer B,
$$ds^2=dt^2-dr^2-r^2\,d\theta^2.$$
Applying the transformation into A's coordinates, we find
$$ds^2=(1-\omega^2r^2)\,dt^2-dr^2-r^2\,d\theta'^2-2\omega r^2\,d\theta'\,dt.$$

Recognizing $\omega r$ as the velocity of one frame relative to another, and $(1-\omega^2r^2)^{-1/2}$ as $\gamma$, we see that we do have a relativistic time dilation effect in the $dt^2$ term. But the $dr^2$ and $d\theta'^2$ terms look Euclidean. Why don't we see any Lorentz contraction of the length scale in the azimuthal direction?
The answer is that coordinates in general relativity are arbitrary, and just because we can write down a certain set of coordinates, that doesn't mean they have any special physical interpretation. The coordinates (t,r,θ') do not correspond physically to the quantities that A would measure with clocks and meter-sticks. The tip-off is the dθ'dt cross-term. Suppose that A sends two cars driving around the circumference of the carousel, one clockwise and one counterclockwise, from the same point. If (t,r,θ') coordinates corresponded to clock and meter-stick measurements, then we would expect that when the cars met up again on the far side of the disk, their dashboards would show equal values of the arc length rθ' on their odometers and equal proper times ds on their clocks. But this is not the case, because the sign of the dθ'dt term is opposite for the two world-lines. The same effect occurs if we send beams of light in both directions around the disk, and this is the Sagnac effect (p. 66).
This is a symptom of the fact that the coordinate t is not properly synchronized between different places on the disk. We already know that we should not expect to be able to find a universal time coordinate that will match up with every clock, regardless of the clock's state of motion. Suppose we set ourselves a more modest goal. Can we find a universal time coordinate that will match up with every clock, provided that the clock is at rest relative to the rotating disk?
A trick for improving the situation is to eliminate the $d\theta'dt$ cross-term by completing the square in the metric for A's coordinates. The result is
$$ds^2=(1-\omega^2r^2)\left[dt-\frac{\omega r^2}{1-\omega^2r^2}\,d\theta'\right]^2-dr^2-\frac{r^2}{1-\omega^2r^2}\,d\theta'^2.$$
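Because signs are easy to lose when completing the square, a numerical spot check is reassuring. The Python sketch below starts from the lab-frame metric with dθ = dθ' + ω dt, which is what the change of coordinates θ' = θ - ωt implies, and compares it with the expanded and completed-square forms at randomly chosen values. The function names and the sampling ranges are arbitrary choices for this illustration.

```python
import random

def ds2_lab(omega, r, dt, dr, dth_prime):
    """Lab-frame interval dt^2 - dr^2 - r^2 dtheta^2 with dtheta = dtheta' + omega dt."""
    dth = dth_prime + omega * dt
    return dt ** 2 - dr ** 2 - r ** 2 * dth ** 2

def ds2_expanded(omega, r, dt, dr, dth_prime):
    """The same interval written out with its dtheta' dt cross-term."""
    return ((1 - omega ** 2 * r ** 2) * dt ** 2 - dr ** 2
            - r ** 2 * dth_prime ** 2 - 2 * omega * r ** 2 * dth_prime * dt)

def ds2_completed_square(omega, r, dt, dr, dth_prime):
    """The interval after completing the square in dt."""
    k = 1 - omega ** 2 * r ** 2
    bracket = dt - omega * r ** 2 / k * dth_prime
    return k * bracket ** 2 - dr ** 2 - r ** 2 / k * dth_prime ** 2

rng = random.Random(0)
max_error = 0.0
for _ in range(1000):
    r = rng.uniform(0.1, 2.0)
    omega = rng.uniform(0.0, 0.9) / r          # keeps omega * r below 1
    dt, dr, dth_prime = (rng.uniform(-1.0, 1.0) for _ in range(3))
    ref = ds2_lab(omega, r, dt, dr, dth_prime)
    max_error = max(max_error,
                    abs(ds2_expanded(omega, r, dt, dr, dth_prime) - ref),
                    abs(ds2_completed_square(omega, r, dt, dr, dth_prime) - ref))

print(max_error)
```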
The interpretation of the quantity in square brackets is as follows. Suppose that two observers situate themselves on the edge of the disk, separated by an infinitesimal angle $d\theta'$. They then synchronize their clocks by exchanging light pulses. The time of flight, measured in the lab frame, for each light pulse is the solution of the equation $ds^2=0$, and the only difference between the clockwise result $dt_1$ and the counterclockwise one $dt_2$ arises from the sign of $d\theta'$. The quantity in square brackets is the same in both cases, so the amount by which the clocks must be adjusted is $dt=(dt_2-dt_1)/2$, or
$$dt=\frac{\omega r^2}{1-\omega^2r^2}\,d\theta'.$$

Substituting this into the metric, we are left with the purely spatial metric
$$ds^2=-dr^2-\frac{r^2}{1-\omega^2r^2}\,d\theta'^2.$$

The factor of $(1-\omega^2r^2)^{-1}=\gamma^2$ in the $d\theta'^2$ term is simply the expected Lorentz-contraction factor. In other words, the circumference is, as expected, greater than $2\pi r$ by a factor of $\gamma$.
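To get a feeling for the size of the effect, the short Python sketch below evaluates the ratio of circumference to radius implied by the spatial metric, in units with c=1. The function names and the sample values of ωr are arbitrary.

```python
import math

def gamma(omega, r):
    """Lorentz factor for the rim speed omega * r (units with c = 1)."""
    return 1.0 / math.sqrt(1.0 - (omega * r) ** 2)

def circumference_over_radius(omega, r):
    """Ratio found from the spatial metric dr^2 + r^2/(1 - omega^2 r^2) dtheta'^2."""
    g_theta_theta = r ** 2 / (1.0 - (omega * r) ** 2)
    circumference = 2.0 * math.pi * math.sqrt(g_theta_theta)
    radius = r   # the radial coefficient is 1, so the proper radius is just r
    return circumference / radius

omega = 1.0
ratios = {r: circumference_over_radius(omega, r) for r in (0.1, 0.5, 0.9)}
for r, ratio in ratios.items():
    print(f"omega*r = {omega * r:.1f}: C/r = {ratio:.4f}, 2*pi*gamma = {2 * math.pi * gamma(omega, r):.4f}")
```

The ratio stays close to 2π for slow rotation and grows without bound as ωr approaches 1.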
Does this spatial metric represent the same non-Euclidean spatial geometry that A, rotating with the disk, would determine by meter-stick measurements? Yes and no. It can be interpreted as the one that A would determine by radar measurements. That is, if A measures a round-trip travel time $dt$ for a light signal between points separated by coordinate distances $dr$ and $d\theta'$, then A can say that the spatial separation is $dt/2$, and such measurements will be described correctly by this metric. Physical meter-sticks, however, present some problems. Meter-sticks rotating with the disk are subject to Coriolis and centrifugal forces, and this problem can't be avoided simply by making the meter-sticks infinitely rigid, because infinitely rigid objects are forbidden by relativity. In fact, these forces will inevitably be strong enough to destroy any meter stick that is brought out to $r=1/\omega$, where the speed of the disk becomes equal to the speed of light.
It might appear that we could now define a global coordinate
$$T=t-\frac{\omega r^2}{1-\omega^2r^2}\,\theta',$$
interpreted as a time coordinate that was synchronized in a consistent way for all points on the disk. The trouble with this interpretation becomes evident when we imagine driving a car around the circumference of the disk, at a speed slow enough so that there is negligible time dilation of the car's dashboard clock relative to the clocks tied to the disk. Once the car gets back to its original position, $\theta'$ has increased by $2\pi$, so it is no longer possible for the car's clock to be synchronized with the clocks tied to the disk. We conclude that it is not possible to synchronize clocks in a rotating frame of reference; if we try to do it, we will inevitably have to have a discontinuity somewhere. This problem is present even locally, as demonstrated by the possibility of measuring the Sagnac effect with apparatus that is small compared to the disk. The only reason we were able to get away with time synchronization in order to establish the spatial metric is that all the physical manifestations of the impossibility of synchronization, e.g., the Sagnac effect, are proportional to the area of the region in which synchronization is attempted. Since we were only synchronizing two nearby points, the area enclosed by the light rays was zero.
The GPS system requires synchronization of the atomic clocks carried aboard the satellites, and this synchronization also needs to be extended to the (less accurate) clocks built into the receiver units. It is impossible to carry out such a synchronization globally in the rotating frame in order to create coordinates $(T,r,\theta',\phi)$. If we tried, it would result in discontinuities (see problem 2, p. 109). Instead, the GPS system handles clock synchronization in coordinates $(t,r,\theta',\phi)$, as in the form of the metric that contains the $d\theta'dt$ cross-term. These are known as the Earth-Centered Inertial (ECI) coordinates. The $t$ coordinate in this system is not the one that users at neighboring points on the earth's surface would establish if they carried out clock synchronization using electromagnetic signals. It is simply the time coordinate of the nonrotating frame of reference tied to the earth's center. Conceptually, we can imagine this time coordinate as one that is established by sending out an electromagnetic “tick-tock” signal from the earth's center, with each satellite correcting the phase of the signal based on the propagation time inferred from its own $r$. In reality, this is accomplished by communication with a master control station in Colorado Springs, which communicates with the satellites via relays at Kwajalein, Ascension Island, Diego Garcia, and Cape Canaveral.
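To see how large the synchronization discontinuity would be in practice, here is a rough Python estimate for clocks synchronized step by step around the earth's equator. It applies the rotating-disk result, with factors of c restored, to the rotating earth. The numerical constants (speed of light, the earth's rotation rate and equatorial radius) are assumed standard values that do not appear in the text, and the function name is hypothetical.

```python
import math

# Assumed SI values, not taken from the text.
C = 299_792_458.0            # speed of light, m/s
OMEGA_EARTH = 7.292115e-5    # rotation rate of the earth, rad/s
R_EQUATOR = 6.378137e6       # equatorial radius of the earth, m

def synchronization_gap(omega, r, c=C):
    """Mismatch accumulated after one full circuit (theta' increases by 2 pi).

    With c = 1 the adjustment per unit angle is omega r^2 / (1 - omega^2 r^2);
    restoring factors of c gives omega r^2 / (c^2 - omega^2 r^2), in seconds.
    """
    per_radian = omega * r ** 2 / (c ** 2 - (omega * r) ** 2)
    return 2.0 * math.pi * per_radian

gap_seconds = synchronization_gap(OMEGA_EARTH, R_EQUATOR)
print(f"{gap_seconds * 1e9:.0f} ns")
```

A mismatch of this size, on the order of a couple of hundred nanoseconds, corresponds to tens of meters of light travel, so it is not negligible for a navigation system.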
The determination of the spatial metric with rulers at rest relative to the disk is appealing because of its conceptual simplicity compared to complicated procedures involving radar, and this was presumably why Einstein presented the concept using ruler measurements in his 1916 paper laying out the general theory of relativity.12 In an effort to recover this simplicity, we could propose using external forces to compensate for the centrifugal and Coriolis forces to which the rulers would be subjected, causing them to stay straight and maintain their correct lengths. Something of this kind is carried out with the large mirrors of some telescopes, which have active systems that compensate for gravitational deflections and other effects. The first issue to worry about is that one would need some way to monitor a ruler's length and straightness. The monitoring system would presumably be based on measurements with beams of light, in which case the physical rulers themselves would become superfluous.
In addition, we would need to be able to manipulate the rulers in order to place them where we wanted them, and these manipulations would include angular accelerations. If such a thing was possible, then it would also amount to a loophole in the resolution of the Ehrenfest paradox. Could Ehrenfest's rotating disk be accelerated and decelerated with help from external forces, which would keep it from contorting into a potato chip? The problem we run into with such a strategy is one of clock synchronization. When it was time to impart an angular acceleration to the disk, all of the control systems would have to be activated simultaneously. But we have already seen that global clock synchronization cannot be realized for an object with finite area, and therefore there is a logical contradiction in this proposal. This makes it impossible to apply rigid angular acceleration to the disk, but not necessarily the rulers, which could in theory be one-dimensional.
So far we've considered a variety of examples in which the metric is predetermined. This is not the case in general relativity. For example, Einstein published general relativity in 1915, but it was not until 1916 that Schwarzschild found the metric for a spherical, gravitating body such as the sun or the earth.
When masses are present, finding the metric is analogous to finding the electric field made by charges, but the interpretation is more difficult. In the electromagnetic case, the field is found on a preexisting background of space and time. In general relativity, there is no preexisting geometry of spacetime. The metric tells us how to find distances in terms of our coordinates, but the coordinates themselves are completely arbitrary. So what does the metric even mean? This was an issue that caused Einstein great distress and confusion, and at one point, in 1914, it even led him to publish an incorrect, dead-end theory of gravity in which he abandoned coordinate-independence.
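With the benefit of hindsight, we can consider these issues in terms of the general description of measurements in relativity given on page 88: we can tell whether events and world-lines are incident, and we can do measurements in local Lorentz frames.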
a / Einstein's hole argument.
b / A paradox? Planet A has no equatorial bulge, but B does. What cause produces this effect? Einstein reasoned that the cause couldn't be B's rotation, because each planet rotates relative to the other.
The main factor that led Einstein to his false start is known as the hole argument.
Suppose that we know about the distribution of matter throughout all of spacetime,
including a particular region of finite size (the “hole”) which contains no matter. By analogy with other classical field
theories, such as electromagnetism, we expect that the metric will be a solution to some kind of
differential equation, in which matter acts as the source term. We find
a metric $g(x)$
that solves the field equations for this set of sources, where $x$ is some set of coordinates.
Now if the field equations are coordinate-independent, we can introduce a new set of coordinates
$x'$, which is
identical to $x$ outside the hole, but differs from it on the inside. If we reexpress the metric in terms of
these new coordinates as
$g'(x')$, then we are guaranteed that
$g'(x')$ is also a solution. But furthermore,
we can substitute $x$ for
$x'$, and
$g'(x)$ will still be a solution. For outside the hole there is no
difference between the primed and unprimed quantities, and inside the hole there is no mass distribution that
has to match the metric's behavior on a point-by-point basis.
We conclude that in any coordinate-invariant
theory, it is impossible to uniquely determine the metric inside such a hole. Einstein initially decided that
this was unacceptable, because it showed a lack of determinism; in a classical theory such as general
relativity, we ought to be able to predict the evolution of the fields, and it would seem that there is
no way to predict the metric inside the hole. He eventually realized that this was an incorrect interpretation.
The only type of global
observation that general relativity lets us do is measurements of the incidence of world-lines.
Relabeling all the points inside the hole doesn't change any of the incidence relations. For example,
if two test particles sent into the region collide at a point $x$ inside the hole, then
changing the point's name to
$x'$ doesn't change the observable fact that they collided.
Another type of argument that made Einstein suffer is also resolved by a correct understanding of measurements, this time the use of measurements in local Lorentz frames. The earth is in hydrostatic equilibrium, and its equator bulges due to its rotation. Suppose that the universe was empty except for two planets, each rotating about the line connecting their centers.13 Since there are no stars or other external points of reference, the inhabitants of each planet have no external reference points against which to judge their rotation or lack of rotation. They can only determine their rotation, Einstein said, relative to the other planet. Now suppose that one planet has an equatorial bulge and the other doesn't. This seems to violate determinism, since there is no cause that could produce the differing effect. The people on either planet can consider themselves as rotating and the other planet as stationary, or they can describe the situation the other way around. Einstein believed that this argument proved that there could be no difference between the sizes of the two planets' equatorial bulges.
The flaw in Einstein's argument was that measurements in local Lorentz frames do allow one to make a distinction between rotation and a lack of rotation. For example, suppose that scientists on planet A notice that their world has no equatorial bulge, while planet B has one. They send a space probe with a clock to B, let it stay on B's surface for a few years, and then order it to return. When the clock is back in the lab, they compare it with another clock that stayed in the lab on planet A, and they find that less time has elapsed according to the one that spent time on B's surface. They conclude that planet B is rotating more quickly than planet A, and that the motion of B's surface was the cause of the observed time dilation. This resolution of the apparent paradox depends specifically on the Lorentzian form of the local geometry of spacetime; it is not available in, e.g., Cartan's curved-spacetime description of Newtonian gravity (see page 41).
Einstein's original, incorrect use of this example sprang from his interest in the ideas of the physicist and philosopher Ernst Mach. Mach had a somewhat ill-defined idea that since motion is only a well-defined notion when we speak of one object moving relative to another object, the inertia of an object must be caused by the influence of all the other matter in the universe. Einstein referred to this as Mach's principle. Einstein's false starts in constructing general relativity were frequently related to his attempts to make his theory too “Machian.” Section 8.3 on p. 285 discusses an alternative, more Machian theory of gravity proposed by Brans and Dicke in 1961.
This section discusses some of the issues that arise in the interpretation of coordinate independence. It can be skipped on a first reading.
One often hears statements like the following from relativists: “Coordinate independence isn't really a physical principle. It's merely an obvious statement about the relationship between mathematics and the physical universe. Obviously the universe doesn't come equipped with coordinates. We impose those coordinates on it, and the way in which we do so can never be dictated by nature.” The impressionable reader who is tempted to say, “Ah, yes, that is obvious,” should consider that it was far from obvious to Newton (“Absolute, true and mathematical time, of itself, and from its own nature flows equably without regard to anything external ...”), nor was it obvious to Einstein. Levi-Civita nudged Einstein in the direction of coordinate independence in 1912. Einstein tried hard to make a coordinate-independent theory, but for reasons described in section 3.5.1 (p. 104), he convinced himself that that was a dead end. In 1914-15 he published theories that were not coordinate-independent, which you will hear relativists describe as “obvious” dead ends because they lack any geometrical interpretation. It seems to me that it takes a highly refined intuition to regard as intuitively “obvious” an issue that Einstein struggled with like Jacob wrestling with Elohim.
It has also been alleged that coordinate independence is trivial. To gauge the justice of this complaint, let's distinguish between two reasons for caring about coordinate independence:
Nobody questions the first justification. The second is a little trickier. Laying out the general theory systematically in a 1916 paper,14 Einstein wrote “The general laws of nature are to be expressed by equations which hold good for all the systems of coordinates, that is, are covariant with respect to any substitutions whatever (generally covariant).” In other words, he was explaining why, with hindsight, his 1914-1915 coordinate-dependent theory had to be a dead end.
The only trouble with this is that Einstein's way of posing the criterion didn't quite hit the nail on the head mathematically. As Hilbert famously remarked, “Every boy in the streets of Göttingen understands more about four-dimensional geometry than Einstein. Yet, in spite of that, Einstein did the work and not the mathematicians.” What Einstein had in mind was that a theory like Newtonian mechanics not only lacks coordinate independence, but would also be impossible to put into a coordinate-independent form without making it look hopelessly complicated and ugly, like putting lipstick on a pig. But Kretschmann showed in 1917 that any theory could be put in coordinate-independent form, and Cartan demonstrated in 1923 that this could be done for Newtonian mechanics in a way that didn't come out particularly ugly. Physicists today are more apt to pose the distinction in terms of “background independence” (meaning that a theory should not be phrased in terms of an assumed geometrical background) or lack of a “prior geometry” (meaning that the curvature of spacetime should come from the solution of field equations rather than being imposed by fiat). But these concepts as well have resisted precise mathematical formulation.15 My feeling is that this general idea of coordinate independence or background independence is like the equivalence principle: a crucial conceptual principle that doesn't lose its importance just because we can't put it in a mathematical box with a ribbon and a bow. For example, string theorists take it as a serious criticism of their theory that it is not manifestly background independent, and one of their goals is to show that it has a background independence that just isn't obvious on the surface.
a / Since magnetic field lines can never intersect, a magnetic field pattern contains coordinate-independent information in the form of the knotting of the lines. This figure shows the magnetic field pattern of the star SU Aurigae, as measured by Zeeman-Doppler imaging (Petit et al.). White lines represent magnetic field lines that close upon themselves in the immediate vicinity of the star; blue lines are those that extend out into the interstellar medium.
It is instructive to consider coordinate independence from the point of view of a field theory. Newtonian gravity can be described in three equivalent ways: as a gravitational field g, as a gravitational potential φ, or as a set of gravitational field lines. The field lines are never incident on one another, and locally the field satisfies Poisson's equation.
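In symbols, the three Newtonian descriptions are tied together as follows. This is a sketch in standard textbook notation; the symbols G (Newton's constant) and ρ (mass density) do not appear in the text above and are introduced here as assumptions.

```latex
\mathbf{g} = -\nabla\phi , \qquad
\nabla\cdot\mathbf{g} = -4\pi G\rho , \qquad
\nabla^{2}\phi = 4\pi G\rho .
```

The field lines are the curves everywhere tangent to g, and the last relation is Poisson's equation for the potential.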
The electromagnetic field has polarization properties different from those of the gravitational field, so we describe it using either the two fields (E, B), a pair of potentials,16 or two sets of field lines. There are similar incidence conditions and local field equations (Maxwell's equations).
Gravitational fields in relativity have polarization properties unknown to Newton, but the situation is qualitatively similar to the two foregoing cases. Now consider the analogy between electromagnetism and relativity. In electromagnetism, it is the fields that are directly observable, so we expect the potentials to have some extrinsic properties. We can, for example, redefine our electrical ground, Φ → Φ+C, without any observable consequences. As discussed in more detail in section 5.6.1 on page 156, it is even possible to modify the electromagnetic potentials in an entirely arbitrary and nonlinear way that changes from point to point in spacetime. This is called a gauge transformation. In relativity, the gauge transformations are the smooth coordinate transformations. These gauge transformations distort the field lines without making them cut through one another.
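The claim that a gauge transformation leaves the observable fields untouched can be checked numerically. The sketch below assumes the standard textbook relations E = −∇Φ − ∂A/∂t and B = ∇×A, and the standard form of the transformation, Φ → Φ − ∂χ/∂t and A → A + ∇χ, for an arbitrary smooth function χ. The particular potentials, the function chi, and the sample point are all made up for illustration. The simple shift Φ → Φ+C is the special case χ = −Ct.

```python
import math

H = 1e-3  # finite-difference step (an arbitrary small number)

def partial(f, point, i):
    """Central-difference partial derivative of f with respect to
    coordinate i, where point = (t, x, y, z)."""
    up, down = list(point), list(point)
    up[i] += H
    down[i] -= H
    return (f(*up) - f(*down)) / (2 * H)

def fields(phi, A, point):
    """Return (E, B) at point, using E = -grad(phi) - dA/dt and B = curl(A)."""
    E = tuple(-partial(phi, point, i + 1) - partial(A[i], point, 0)
              for i in range(3))
    B = (
        partial(A[2], point, 2) - partial(A[1], point, 3),  # dAz/dy - dAy/dz
        partial(A[0], point, 3) - partial(A[2], point, 1),  # dAx/dz - dAz/dx
        partial(A[1], point, 1) - partial(A[0], point, 2),  # dAy/dx - dAx/dy
    )
    return E, B

def gauge_transform(phi, A, chi):
    """Return new potentials phi - d(chi)/dt and A + grad(chi)."""
    def new_phi(t, x, y, z):
        return phi(t, x, y, z) - partial(chi, (t, x, y, z), 0)
    def new_component(i):
        return lambda t, x, y, z: (A[i](t, x, y, z)
                                   + partial(chi, (t, x, y, z), i + 1))
    return new_phi, tuple(new_component(i) for i in range(3))

# Made-up potentials and an arbitrary, nonlinear gauge function chi
phi = lambda t, x, y, z: x**2 * y - z * t
A = (lambda t, x, y, z: y * z * t,
     lambda t, x, y, z: math.sin(x) * t,
     lambda t, x, y, z: x * y)
chi = lambda t, x, y, z: math.sin(x * y) + t**2 * z + math.exp(0.3 * t * x)

point = (0.3, 0.5, -0.2, 0.7)
E1, B1 = fields(phi, A, point)
phi2, A2 = gauge_transform(phi, A, chi)
E2, B2 = fields(phi2, A2, point)

max_diff = max(abs(a - b) for a, b in zip(E1 + B1, E2 + B2))
print("largest change in any field component:", max_diff)
```

The potentials themselves change at the sample point, but the field components agree to within floating-point roundoff, which is the sense in which the potentials carry extrinsic, unobservable information.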
1. Consider a spacetime that is locally exactly like the standard Lorentzian spacetime described in ch. 2, but that has a global structure differing in the following way from the one we have implicitly assumed. This spacetime has global property G: Let two material particles have world-lines that coincide at event A, with some nonzero relative velocity; then there may be some event B in the future light-cone of A at which the particles' world-lines coincide again. This sounds like a description of something that we would expect to happen in curved spacetime, but let's see whether that is necessary. We want to know whether this violates the flat-space properties L1-L5 on page 46, if those properties are taken as local. (a) Demonstrate that it does not violate them, by using a model in which space “wraps around” like a cylinder. (b) Now consider the possibility of interpreting L1-L5 as global statements. Do spacetimes with property G always violate L3 if L3 is taken globally? (solution in the pdf version of the book)
2. Example 13 on page 101 discusses the discontinuity that would result if one attempted to define a time coordinate for the GPS system that was synchronized globally according to observers in the rotating frame, in the sense that neighboring observers could verify the synchronization by exchanging electromagnetic signals. Calculate this discontinuity at the equator, and estimate the resulting error in position that would be experienced by GPS users.
3. Resolve the following paradox.
Equation [] on page claims to give the metric obtained by an observer on the surface of a rotating disk. This metric is shown to lead to a non-Euclidean value for the ratio of the circumference of a circle to its radius, so the metric is clearly non-Euclidean. Therefore a local observer should be able to detect violations of the Pythagorean theorem.
And yet this metric was originally derived by a series of changes of coordinates, starting from the Euclidean metric in polar coordinates, as derived in example 7 on page 93. Section 3.3 (p. 87) argued that the intrinsic measurements available in relativity are not capable of detecting an arbitrary smooth, one-to-one change of coordinates. This contradicts our earlier conclusion that there are locally detectable violations of the Pythagorean theorem. (solution in the pdf version of the book)
4. This problem deals with properties of the metric [] on page . (a) A pulse of collimated light is emitted from the center of the disk in a certain direction. Does the spatial track of the pulse form a geodesic of this metric? (b) Characterize the behavior of the geodesics near r=1/ω. (c) An observer at rest with respect to the surface of the disk proposes to verify the non-Euclidean nature of the metric by doing local tests in which right triangles are formed out of laser beams, and violations of the Pythagorean theorem are detected. Will this work? (solution in the pdf version of the book)
5. In the early decades of relativity, many physicists were in the habit of speaking as if the Lorentz transformation described what an observer would actually “see” optically, e.g., with an eye or a camera. This is not the case, because there is an additional effect due to optical aberration: observers in different states of motion disagree about the direction from which a light ray originated. This is analogous to the situation in which a person driving in a convertible observes raindrops falling from the sky at an angle, even if an observer on the sidewalk sees them as falling vertically. In 1959, Terrell and Penrose independently provided correct analyses,17 showing that in reality an object may appear contracted, expanded, or rotated, depending on whether it is approaching the observer, passing by, or receding. The case of a sphere is especially interesting. Consider the following four cases:
Penrose showed that in case A, the outline of the sphere is still seen to be a circle, although regions on the sphere's surface appear distorted.
What can we say about the generalization to cases B, C, and D? (solution in the pdf version of the book)
6. This problem involves a relativistic particle of mass m which is also a wave, as described by quantum mechanics. Let c=1 and ħ=1 throughout. Starting from the de Broglie relations E=ω and p=k, where k is the wavenumber, find the dispersion relation connecting ω to k. Calculate the group velocity, and verify that it is consistent with the usual relations p=mγv and E=mγ for m>0. What goes wrong if you instead try to associate v with the phase velocity? (solution in the pdf version of the book)
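The electric field is given by E = −∇Φ − ∂A/∂t, where Φ is the electric potential and A is the vector potential, while the magnetic field is the curl of A. This is introduced at greater length in section 4.2.5 on page 122.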