
Contents

Section 3.1 - Tangent vectors

Section 3.2 - Affine notions and parallel transport

Section 3.3 - Models

Section 3.4 - Intrinsic quantities

Section 3.5 - The metric

Section 3.6 - The metric in general relativity

Section 3.7 - Interpretation of coordinate independence


General relativity is described mathematically in the language of *differential geometry*. Let's take those two
terms in reverse order.

The *geometry* of spacetime is non-Euclidean, not just in the sense that the 3+1-dimensional geometry of
Lorentz frames is different from that of four interchangeable Euclidean dimensions, but also in the sense that
parallels do not behave in the way described by E5 or A1-A3. In a Lorentz frame, which describes space without
any gravitational fields, particles whose world-lines are initially parallel will continue along their parallel
world-lines forever. But in the presence of gravitational fields, initially parallel world-lines of free-falling particles
will in general diverge, approach,
or even cross. Thus, neither the existence nor the uniqueness of parallels can be assumed. We can't describe this
lack of parallelism as arising from the curvature of the world-lines, because we're using the world-lines of
free-falling particles as our definition of a “straight” line. Instead, we describe the effect as coming
from the curvature of spacetime itself. The Lorentzian geometry is a description of the case in which this
curvature is negligible.

What about the word *differential*? The equivalence principle states that even in the presence of gravitational
fields, local Lorentz frames exist. How local is “local”? If we use a microscope to zoom in on smaller and smaller
regions of spacetime, the Lorentzian approximation becomes better and better. Suppose we want to do experiments in
a laboratory, and we want to ensure that when we compare some physically observable quantity against predictions
made based on the Lorentz geometry, the resulting discrepancy will not be too large. If the acceptable error is
\(\epsilon\), then we should be able to get the error down that low if we're willing to make the size of our
laboratory no bigger than \(\delta\). This is clearly very similar to the Weierstrass style of defining limits
and derivatives in calculus. In calculus, the idea expressed by differentiation is that every smooth curve can be
approximated locally by a line; in general relativity, the equivalence principle tells us that curved spacetime
can be approximated locally by flat spacetime. But consider that no practitioner of calculus habitually solves
problems by filling sheets of scratch paper with epsilons and deltas. Instead, she uses the Leibniz notation,
in which \(dy\) and \(dx\) are interpreted as infinitesimally small numbers.
You may be inclined, based on your previous training, to dismiss infinitesimals as neither rigorous nor necessary.
In 1966, Abraham Robinson demonstrated that concerns about rigor had been unfounded; we'll come back
to this point in section 3.3. Although it is true that any calculation written using infinitesimals can also
be carried out using limits, the following example shows how much more well suited the infinitesimal language is to
differential geometry.

The area of a region S in the Cartesian plane can be calculated as \(\int_S dA\), where \(dA=dx\,dy\) is the area of an infinitesimal rectangle of width \(dx\) and height \(dy\). A curved surface such as a sphere does not admit a global Cartesian coordinate system in which the constant-coordinate curves are both uniformly spaced and perpendicular to one another. For example, lines of longitude on the earth's surface grow closer together as one moves away from the equator. Letting \(\theta\) be the angle with respect to the pole, and \(\phi\) the azimuthal angle, the approximately rectangular patch bounded by \(\theta\), \(\theta+d\theta\), \(\phi\), and \(\phi+d\phi\) has width \(r\sin\theta\,d\phi\) and height \(r\,d\theta\), giving \(dA=r^2\sin\theta\,d\theta\,d\phi\). If you look at the corresponding derivation in an elementary calculus textbook that strictly eschews infinitesimals, the technique is to start from scratch with Riemann sums. This is extremely laborious, and moreover must be carried out again for every new case. In differential geometry, the curvature of the space varies from one point to the next, and clearly we don't want to reinvent the wheel with Riemann sums an infinite number of times, once at each point in space.
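As a quick sanity check on this area element (a numerical sketch, not part of the text), summing \(dA=r^2\sin\theta\,d\theta\,d\phi\) over the whole sphere should reproduce the familiar total area \(4\pi r^2\):

```python
import math

# accumulate dA = r**2 sin(theta) dtheta dphi over the whole sphere
# (midpoint rule); the sum should approach the known area 4*pi*r**2
r = 1.0
n = 500
dtheta = math.pi / n
dphi = 2.0 * math.pi / n
area = sum(r**2 * math.sin((i + 0.5) * dtheta) * dtheta * dphi
           for i in range(n)
           for j in range(n))
print(area)   # close to 4*pi ~ 12.566
```

The point of the infinitesimal notation is exactly that we never need to write this sum out by hand: the error of the Riemann-sum approximation vanishes in the limit.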

A more formal definition of the notion of a tangent vector is given on p. 198.

An important example of the differential, i.e., local, nature of our geometry is the generalization of the affine parameter to a context broader than affine geometry.

Our construction of the affine parameter with a scaffolding of parallelograms depended on the existence and uniqueness of parallels expressed by A1, so we might imagine that there was no point in trying to generalize the construction to curved spacetime. But the equivalence principle tells us that spacetime is locally affine to some approximation. Concretely, clock-time is one example of an affine parameter, and the curvature of spacetime clearly can't prevent us from building a clock and releasing it on a free-fall trajectory. To generalize the recipe for the construction (figure a), the first obstacle is the ambiguity of the instruction to construct parallelogram \(01\text{q}_0\text{q}_1\), which requires us to draw \(1\text{q}_1\) parallel to \(0\text{q}_0\). Suppose we construe this as an instruction to make the two segments initially parallel, i.e., parallel as they depart the line at 0 and 1. By the time they get to \(\text{q}_0\) and \(\text{q}_1\), they may be converging or diverging.

Because parallelism is only approximate here,
there will be a certain amount of error in the construction of the affine parameter. One way of detecting such an
error is that lattices constructed with different initial distances will get out of step with one another.
For example, we can define \(\frac{1}{2}\) as before by requiring that the lattice constructed with initial segment
\(0\frac{1}{2}\) line up with the original lattice at 1. We will find, however, that they do *not* quite line up at
other points, such as 2. Let's use this discrepancy \(\epsilon=2-2'\) as a numerical measure of the error.
It will depend on both \(\delta_1\), the distance 01, and on \(\delta_2\), the
distance between 0 and \(\text{q}_0\). Since \(\epsilon\) vanishes for either \(\delta_1=0\) or \(\delta_2=0\), and since the equivalence
principle guarantees smooth behavior on small scales, the leading term in the error will in general
be proportional to the product \(\delta_1\delta_2\). In the language of infinitesimals, we can replace \(\delta_1\) and \(\delta_2\)
with infinitesimally short distances, which for simplicity we assume to be equal, and which we call \(d\lambda\).
Then the affine parameter \(\lambda\) is defined as \(\lambda=\int d\lambda\), where the error of order \(d\lambda^2\)
is, as usual, interpreted as the negligible discrepancy between the integral and its approximation as a Riemann sum.
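The claim that per-step errors of order \(d\lambda^2\) are negligible can be illustrated with a toy model (my own sketch; the error coefficient \(c\) is an assumed stand-in for the effect of curvature):

```python
# toy model: each step of the lattice construction advances the affine
# parameter by d_lambda but commits an error of order d_lambda**2
def affine_parameter(length, n, c=1.0):
    d_lambda = length / n
    return sum(d_lambda + c * d_lambda**2 for _ in range(n))

for n in (10, 100, 1000):
    # n steps, each with error c*d_lambda**2, accumulate a total error
    # of c*length*d_lambda, which shrinks like 1/n
    print(n, affine_parameter(1.0, n) - 1.0)
```

Doubling the number of steps halves the accumulated error, so the second-order discrepancies disappear in the limit, just as in an ordinary Riemann sum.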

If you were alert, you may have realized that I cheated you at a crucial point in this construction. We were to make \(1\text{q}_1\) and \(0\text{q}_0\) “initially parallel” as they left 01. How should we even define this idea of “initially parallel?” We could try to do it by making angles \(\text{q}_001\) and \(\text{q}_112\) equal, but this doesn't quite work, because it doesn't specify whether the angle is to the left or the right on the two-dimensional plane of the page. In three or more dimensions, the issue becomes even more serious. The construction workers building the lattice need to keep it all in one plane, but how do they do that in curved spacetime?

A mathematician's answer would be that our geometry lacks some additional structure called a *connection*,
which is a rule that specifies how one locally flat neighborhood is to be joined seamlessly onto another
locally flat neighborhood nearby. If you've ever bought two maps and tried to tape them together to make a
big map, you've formed a connection. If the maps were on a large enough scale, you also probably noticed that
this was impossible to do perfectly, because of the curvature of the earth.

Physically, the idea is that in flat spacetime, it is possible to construct inertial guidance systems like the ones discussed on page 74. Since they are possible in flat spacetime, they are also possible in locally flat neighborhoods of spacetime, and they can then be carried from one neighborhood to another.

In three space dimensions, a gyroscope's angular momentum vector maintains its direction, and we can orient other vectors, such as \(1\text{q}_1\), relative to it. Suppose for concreteness that the construction of the affine parameter above is being carried out in three space dimensions. We place a gyroscope at 0, orient its axis along \(0\text{q}_0\), slide it along the line to 1, and then construct \(1\text{q}_1\) along that axis.

In 3+1 dimensions, a gyroscope only does part of the job. We now have to maintain the direction of
a four-dimensional vector. Four-vectors will not be discussed in detail until section 4.2,
but similar devices can be used to maintain their orientations in spacetime.
These physical devices are ways of defining
a mathematical notion known as *parallel transport*,
which allows us to take a vector from one point to another in space.
In general, specifying a notion of parallel transport is equivalent to specifying a connection.

Parallel transport is path-dependent, as shown in figure b.

In the context of flat spacetime, the affine parameter was defined only along lines, not arbitrary curves, and could not be compared between lines running in different directions. In curved spacetime, the same limitation is present, but with “along lines” replaced by “along geodesics.” Figure c shows what goes wrong if we try to apply the construction to a world-line that isn't a geodesic. One definition of a geodesic is that it's the course we'll end up following if we navigate by keeping a fixed bearing relative to an inertial guidance device such as a gyroscope; that is, the tangent to a geodesic, when parallel-transported farther along the geodesic, is still tangent. A non-geodesic curve lacks this property, and the effect on the construction of the affine parameter is that the segments \(n\text{q}_n\) drift more and more out of alignment with the curve.
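The path dependence of parallel transport can be seen concretely on a sphere. The sketch below (my own illustration, not from the text) carries a vector around a closed triangular path with one corner at the pole and two on the equator, approximating parallel transport by repeatedly projecting the vector onto each successive tangent plane. The vector returns rotated by the solid angle enclosed by the loop, \(\pi/2\):

```python
import numpy as np

def great_circle(a, b, n):
    # n points along the great-circle arc from unit vector a to unit
    # vector b (assumes a and b are not parallel)
    ang = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    return [(np.sin((1 - t) * ang) * a + np.sin(t * ang) * b) / np.sin(ang)
            for t in np.linspace(0.0, 1.0, n)]

def transport(v, path):
    # crude parallel transport: project v onto each new tangent plane
    for p in path[1:]:
        v = v - np.dot(v, p) * p
        v = v / np.linalg.norm(v)
    return v

n = 2000
pole = np.array([0.0, 0.0, 1.0])
eq0 = np.array([1.0, 0.0, 0.0])    # on the equator
eq90 = np.array([0.0, 1.0, 0.0])   # on the equator, 90 degrees away
loop = (great_circle(pole, eq0, n) + great_circle(eq0, eq90, n)[1:]
        + great_circle(eq90, pole, n)[1:])
v0 = np.array([0.0, 1.0, 0.0])     # a tangent vector at the pole
v1 = transport(v0, loop)
angle = np.arccos(np.clip(np.dot(v0, v1), -1.0, 1.0))
print(angle)   # close to pi/2, the solid angle enclosed by the loop
```

Transporting the same vector along a single geodesic from the pole and back would return it unchanged, so the rotation is entirely an effect of the path taken, i.e., of the curvature enclosed.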

A typical first reaction to the phrase “curved spacetime” --- or even “curved space,” for that matter --- is that it sounds like nonsense. How can featureless, empty space itself be curved or distorted? The concept of a distortion would seem to imply taking all the points and shoving them around in various directions as in a Picasso painting, so that distances between points are altered. But if space has no identifiable dents or scratches, it would seem impossible to determine which old points had been sent to which new points, and the distortion would have no observable effect at all. Why should we expect to be able to build differential geometry on such a logically dubious foundation? Indeed, historically, various mathematicians have had strong doubts about the logical self-consistency of both non-Euclidean geometry and infinitesimals. And even if an authoritative source assures you that the resulting system is self-consistent, its mysterious and abstract nature would seem to make it difficult for you to develop any working picture of the theory that could play the role that mental sketches of graphs play in organizing your knowledge of calculus.

*Models* provide a way of dealing with both the logical issues and the conceptual ones.
Figure a on page 89 “pops”
off of the page, presenting a strong psychological impression of a curved surface rendered in perspective.
This suggests finding an actual mathematical object, such as a curved surface, that satisfies all
the axioms of a certain logical system, such as non-Euclidean geometry. Note that the model may contain
extrinsic elements, such as the existence of a third dimension, that are not connected to the system
being modeled.

Let's focus first on consistency. In general, what can we say about the self-consistency of a mathematical system?
To start with, we can never prove anything about the consistency or lack of consistency of something that
is not a well-defined formal system, e.g., the Bible. Even Euclid's *Elements*, which was a model of
formal rigor for thousands of years, is loose enough to allow considerable ambiguity. If you're inclined
to scoff at the silly Renaissance mathematicians who kept trying to prove the parallel postulate E5 from
postulates E1-E4, consider the following argument. Suppose that we replace E5 with \(\text{E}5'\), which states that
parallels *don't* exist: given a line and a point not on the line, no line can ever be drawn through the point
and parallel to the given line. In the new system of plane geometry \(\text{E}'\) consisting of E1-E4 plus \(\text{E}5'\), we can prove a
variety of theorems, and one of them is that there is an upper limit on the area of any figure. This imposes
a limit on the size of circles, and that appears to contradict E3, which says we can construct a circle with
any radius. We therefore conclude that \(\text{E}'\) lacks self-consistency. Oops! As your high school geometry text
undoubtedly mentioned in passing, \(\text{E}'\) is a perfectly respectable system called elliptic geometry.
So what's wrong with this supposed proof of its lack of self-consistency? The issue is the exact statement of E3.
E3 does not say that we can construct a circle given any real number as its radius. Euclid could not have intended any
such interpretation, since he had no notion of
real numbers. To Euclid, geometry was primary, and numbers were geometrically constructed objects, being represented
as lengths, angles, areas, and volumes. A literal translation of Euclid's statement of the
axiom is “To describe a circle with any center and distance.”^{2} “Distance” means
a line segment. There is therefore no contradiction in \(\text{E}'\), because \(\text{E}'\) has a limit on the lengths of line segments.

Now suppose that such ambiguities have been eliminated from the system's basic definitions and axioms.
In general, we expect it to be easier to prove an inconsistent system's inconsistency than to demonstrate the
consistency of a consistent one. In the former case, we can start cranking out theorems, and if we can
find a way to prove both proposition P and its negation \(\neg\text{P}\), then obviously something is wrong
with the system. One might wonder whether such a contradiction could remain contained within one corner
of the system, like nuclear waste. It can't. Aristotelian logic allows proof by contradiction: if we prove both P and
\(\neg\text{P}\) based on certain assumptions, then our assumptions must have been wrong. If we can prove
both P and \(\neg\text{P}\) *without* making any assumptions, then proof by contradiction allows us
to establish the truth of *any* randomly chosen proposition. Thus a single contradiction
is sufficient, in Aristotelian logic, to invalidate the entire system. This goes by the Latin
rubric *ex falso quodlibet*, meaning “from a falsehood, whatever you please.” Thus any
contradiction proves the inconsistency of the entire system.

Proving consistency is harder. If you're mathematically sophisticated, you may be tempted to leap directly to Gödel's theorem, and state that nobody can ever prove the self-consistency of a mathematical system. This would be a misapplication of Gödel. Gödel's theorem only applies to mathematical systems that meet certain technical criteria, and some of the interesting systems we're dealing with don't meet those criteria; in particular, Gödel's theorem doesn't apply to Euclidean geometry, and Euclidean geometry was proved self-consistent by Tarski and his students around 1950. Furthermore, we usually don't require an absolute proof of self-consistency. Usually we're satisfied if we can prove that a certain system, such as elliptic geometry, is at least as self-consistent as another system, such as Euclidean geometry. This is called equiconsistency. The general technique for proving equiconsistency of two theories is to show that a model of one can be constructed within the other.

Suppose, for example, that we construct a geometry in which the space of points is the surface of a sphere, and lines are understood to be the geodesics, i.e., the great circles whose centers coincide at the sphere's center. This geometry, called spherical geometry, is useful in cartography and navigation. It is non-Euclidean, as we can demonstrate by exhibiting at least one proposition that is false in Euclidean geometry. For example, construct a triangle on the earth's surface with one corner at the north pole, and the other two at the equator, separated by 90 degrees of longitude. The sum of its interior angles is 270 degrees, contradicting Euclid, book I, proposition 32. Spherical geometry must therefore violate at least one of the axioms E1-E5, and indeed it violates both E1 (because no unique line is determined by two antipodal points such as the north and south poles) and E5 (because parallels don't exist at all).
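The 270-degree figure is easy to verify by direct computation (a sketch; the vertex coordinates are my own choice of the triangle described above): at each vertex, the interior angle is the angle between the tangents of the two great-circle arcs meeting there.

```python
import numpy as np

def vertex_angle(a, b, c):
    # interior angle at vertex a of the spherical triangle abc:
    # angle between the tangents at a pointing toward b and toward c
    tb = b - np.dot(a, b) * a
    tc = c - np.dot(a, c) * a
    tb /= np.linalg.norm(tb)
    tc /= np.linalg.norm(tc)
    return np.arccos(np.clip(np.dot(tb, tc), -1.0, 1.0))

A = np.array([0.0, 0.0, 1.0])   # north pole
B = np.array([1.0, 0.0, 0.0])   # on the equator, longitude 0
C = np.array([0.0, 1.0, 0.0])   # on the equator, longitude 90 degrees
total = sum(np.degrees(vertex_angle(p, q, r))
            for p, q, r in [(A, B, C), (B, C, A), (C, A, B)])
print(total)   # 270 degrees (up to rounding), not 180
```

Each of the three angles is a right angle, so the sum exceeds the Euclidean 180 degrees by 90 degrees, in proportion to the fraction of the sphere's area the triangle covers.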

A closely related construction gives a model of elliptic
geometry, in which E1 holds, and only E5 is thrown overboard. To accomplish this, we model a point using a diameter
of the sphere,^{3} and a line as the set of all diameters lying in a certain plane. This has the effect of identifying
antipodal points, so that there is now no violation of E1. Roughly speaking, this is like lopping off half of the sphere, but
making the edges wrap around.
Since this model of elliptic geometry is embedded within
a Euclidean space, all the axioms of elliptic geometry can now be proved as theorems in Euclidean geometry.
If a contradiction arose from them, it would imply a contradiction in the axioms of Euclidean geometry. We conclude
that elliptic geometry is equiconsistent with Euclidean geometry. This was known long before Tarski's 1950 proof of
Euclidean geometry's self-consistency, but since nobody was losing any sleep over hidden contradictions in Euclidean
geometry, mathematicians stopped wasting their time looking for contradictions in elliptic geometry.

As an example, consider the following axiomatically defined system of numbers:

- 1. It is a field, i.e., it has addition, subtraction, multiplication, and division with the usual properties.
- 2. It is an ordered geometry in the sense of O1-O4 on p. 19, and the ordering relates to addition and multiplication in the usual way.
- 3. Existence of infinitesimals: There exists a positive number \(d\) such that \(d\lt1\), \(d\lt1/2\), \(d\lt1/3\), ...

A model of this system can be constructed within the real number system by defining \(d\) as the identity function
\(d(x)=x\) and forming the set of functions of the form \(f(d)=P(d)/Q(d)\),
where \(P\) and \(Q\) are polynomials with real coefficients.
The ordering of functions \(f\) and \(g\) is defined according to the sign of \(\lim_{x\rightarrow 0^+}\left[f(x)-g(x)\right]\).
Axioms 1-3 can all be proved from the real-number axioms. Therefore this system, which includes infinitesimals, is equiconsistent with the reals. More elaborate
constructions can extend this to systems that have more of the properties of the reals, and
a browser-based calculator that implements such a system is available at lightandmatter.com/calc/inf.
Abraham Robinson extended this in 1966 to all of analysis,
and thus there is nothing intrinsically nonrigorous about doing analysis in the style of
Gauss and Euler, with symbols like \(dx\) representing infinitesimally small
quantities.^{1}

Besides proving consistency, these models give us insight into what's going on. The model of elliptic geometry suggests an insight into the reason that there is an upper limit on lengths and areas: it is because the space wraps around on itself. The model of infinitesimals suggests a fact that is not immediately obvious from the axioms: the infinitesimal quantities compose a hierarchy, so that for example \(7d\) is in finite proportion to \(d\), while \(d^2\) is like a “lesser flea” in Swift's doggerel: “Big fleas have little fleas/ On their backs to ride 'em,/ and little fleas have lesser fleas,/And so, ad infinitum.”
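The ordering in this model can be played with directly. The sketch below (my own construction, not from the text) compares polynomial expressions in \(d\) by sampling at one tiny rational point, which decides the sign near \(0^+\) for these simple cases:

```python
from fractions import Fraction

EPS = Fraction(1, 10**12)   # a rational sample point "just to the right of 0"

def is_less(f, g):
    # in the model, f < g iff f(x) < g(x) for all sufficiently small x > 0;
    # for the low-degree polynomials below, one tiny sample point decides it
    return f(EPS) < g(EPS)

d = lambda x: x   # the infinitesimal d, modeled as the identity function
for n in range(1, 100):
    assert is_less(d, lambda x, n=n: Fraction(1, n))   # axiom 3: d < 1/n
assert is_less(lambda x: x**2, d)   # d**2 is a "lesser flea": d**2 << d
assert is_less(d, lambda x: 7*x)    # while 7d is in finite proportion to d
print("ordering checks pass")
```

The hierarchy is visible in the checks: \(d^2\) lies below every finite multiple of \(d\), while \(7d\) and \(d\) differ only by an ordinary factor.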

Spherical and elliptic geometry are not valid models of a general-relativistic spacetime, since they are locally Euclidean rather than Lorentzian, but they still provide us with enough conceptual guidance to come up with some ideas that might never have occurred to us otherwise:

- In spherical geometry, we can have a two-sided polygon called a lune that encloses a nonzero area. In general relativity, a lune formed by the world-lines of two particles represents motion in which the particles separate but are later reunited, presumably because of some mass between them that created a gravitational field. An example is gravitational lensing.
- Both spherical models wrap around on themselves, so that they are not topologically equivalent to infinite planes. We therefore conjecture that there may be a link between curvature, which is a local property, and topology, which is global. Such a connection is indeed observed in relativity. For example, cosmological solutions of the equations of general relativity come in two flavors. One type has enough matter in it to produce more than a certain critical amount of curvature, and this type is topologically closed. It describes a universe that has finite spatial volume, and that will only exist for a finite time before it recontracts in a Big Crunch. The other type, corresponding to the universe we actually inhabit, has infinite spatial volume, will exist for infinite time, and is topologically open.
- There is a distance scale set by the size of the sphere, with its inverse being a measure of curvature. In general relativity, we expect there to be a similar way to measure curvature numerically, although the curvature may vary from point to point.

Self-check: Prove from the axioms \(\text{E}'\) that elliptic geometry, unlike spherical geometry, cannot have a lune with two distinct vertices. Convince yourself nevertheless, using the spherical model of \(\text{E}'\), that it is possible in elliptic geometry for two lines to enclose a region of space, in the sense that from any point P in the region, a ray emitted in any direction must intersect one of the two lines. Summarize these observations with a characterization of lunes in elliptic geometry versus lunes in spherical geometry.

Models can be dangerous, because they can tempt us to impute physical reality to features that are purely extrinsic, i.e., that are only present in that particular model. This is as opposed to intrinsic features, which are present in all models, and which are therefore logically implied by the axioms of the system itself. The existence of lunes is clearly an intrinsic feature of non-Euclidean geometries, because the intersection of lines was defined before any model had even been proposed.

Self-check: How would the ideas of example 4 apply to a cone?

Example 4 shows that it can be difficult to sniff out bogus extrinsic features that seem intrinsic, and example 3 suggests the desirability of developing methods of calculation that never refer to any extrinsic quantities, so that we never have to worry whether a symbol like \(R\) staring up at us from a piece of paper is intrinsic. This is why it is unlikely to be helpful to a student of general relativity to pick up a book on differential geometry that was written without general relativity specifically in mind. Such books have a tendency to casually mix together intrinsic and extrinsic notation. For example, a vector cross product \(\mathbf{a}\times\mathbf{b}\) refers to a vector poking out of the plane occupied by \(\mathbf{a}\) and \(\mathbf{b}\), and the space outside the plane may be extrinsic; it is not obvious how to generalize this operation to the 3+1 dimensions of relativity (since the cross product is a three-dimensional beast), and even if it were, we could not be assured that it would have any intrinsically well defined meaning.

To see how to proceed in creating a manifestly intrinsic notation, consider the two types of intrinsic observations that are available in general relativity:

- 1. We can tell whether events and world-lines are
*incident*: whether or not two lines intersect, two events coincide, or an event lies on a certain line.

Incidence measurements, for example detection of gravitational lensing, are global, but they are the *only*
global observations we can do.^{4} If we were limited entirely to incidence, spacetime would be described by
the austere system of projective geometry, a geometry without parallels or measurement.
In projective geometry, all propositions are essentially statements about combinatorics, e.g., that it is impossible to
plant seven trees so that they form seven lines of three trees each.

But:

- 2. We can also do measurements in local Lorentz frames.

This gives us more power, but not as much as we might expect. Suppose we define a coordinate such as \(t\) or \(x\). In Newtonian mechanics, these coordinates would form a predefined background, a preexisting stage for the actors. In relativity, on the other hand, consider a completely arbitrary change of coordinates of the form \(x \rightarrow x'=f(x)\), where \(f\) is a smooth one-to-one function. For example, we could have \(x \rightarrow x+px^3+q\sin(rx)\) (with \(p\) and \(q\) chosen small enough so that the mapping is always one-to-one). Since the mapping is one-to-one, the new coordinate system preserves all the incidence relations. Since the mapping is smooth, the new coordinate system is still compatible with the existence of local Lorentz frames. The difference between the two coordinate systems is therefore entirely extrinsic, and we conclude that a manifestly intrinsic notation should avoid any explicit reference to a coordinate system. That is, if we write a calculation in which a symbol such as \(x\) appears, we need to make sure that nowhere in the notation is there any hidden assumption that \(x\) comes from any particular coordinate system. For example, the equation should still be valid if the generic symbol \(x\) is later taken to represent the distance \(r\) from some center of symmetry. This coordinate-independence property is also known as general covariance, and this type of smooth change of coordinates is also called a diffeomorphism.
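For instance, we can check numerically that a map of this form is one-to-one for suitably small \(p\) and \(q\) (the values below are my own, chosen for illustration): its output is strictly increasing, so distinct points stay distinct and all incidence relations survive the change of coordinates.

```python
import numpy as np

p, q, r = 0.01, 0.05, 2.0   # illustrative values, small enough for monotonicity

def f(x):
    # an arbitrary smooth change of coordinates, x -> x' = f(x)
    return x + p * x**3 + q * np.sin(r * x)

xs = np.linspace(-10.0, 10.0, 200001)
# strictly increasing on a fine grid: the map is one-to-one there, so the
# new coordinate preserves which events coincide and which lines intersect
assert np.all(np.diff(f(xs)) > 0)
print("monotonic, hence one-to-one on this interval")
```

Here \(f'(x) = 1 + 3px^2 + qr\cos(rx)\) stays positive because \(qr\) is small, which is exactly the condition for the mapping to be one-to-one.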

As an exotic example of a change of coordinates, take a torus and label it with coordinates \((\theta,\phi)\), where \(\theta+2\pi\) is taken to be the same as \(\theta\), and similarly for \(\phi\). Now subject it to the coordinate transformation T defined by \(\theta\rightarrow\theta+\phi\), which is like opening the torus, twisting it by a full circle, and then joining the ends back together. T is known as the “Dehn twist,” and it is different from most of the coordinate transformations we do in relativity because it can't be done smoothly, i.e., there is no continuous function \(f(x)\) on \(0\le x\le 1\) such that every value of \(f\) is a smooth coordinate transformation, \(f(0)\) is the identity transformation, and \(f(1)=T\).

Why can't we define a frame of reference moving at the speed of light? The most straightforward argument is based on the positivist idea that concepts only mean something if you can define how to measure them operationally. If we accept this philosophical stance (which is by no means compatible with every concept we ever discuss in physics), then we need to be able to physically realize this frame in terms of an observer and measuring devices. But we can't. It would take an infinite amount of energy to accelerate Einstein and his motorcycle to the speed of light.

Since arguments from positivism can often kill off perfectly interesting and reasonable concepts, we might ask whether there are other reasons not to allow such frames. There are. Recall that we placed two technical conditions on coordinate transformations: they are supposed to be smooth and one-to-one. The smoothness condition is related to the inability to boost Einstein's motorcycle into the speed-of-light frame by any continuous, classical process. (Relativity is a classical theory.) But independent of that, we have a problem with the one-to-one requirement. Figure b shows what happens if we do a series of Lorentz boosts to higher and higher velocities. It should be clear that if we could do a boost up to a velocity of \(c\), we would have effected a coordinate transformation that was not one-to-one. Every point in the plane would be mapped onto a single lightlike line.
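The degeneration can be made quantitative (a sketch in units with \(c=1\), not from the text): a boost rescales the two lightlike directions by reciprocal factors, so as \(v \rightarrow c\) one null direction is stretched without bound while the other is crushed toward zero, and in the limit the whole plane collapses onto a single lightlike line.

```python
import numpy as np

def boost(v):
    # Lorentz boost in 1+1 dimensions, coordinates (t, x), units with c = 1
    g = 1.0 / np.sqrt(1.0 - v**2)
    return np.array([[g, -g * v], [-g * v, g]])

for v in (0.9, 0.99, 0.999):
    B = boost(v)
    stretch = (B @ np.array([1.0, -1.0]))[0]   # null direction t = -x
    crush = (B @ np.array([1.0, 1.0]))[0]      # null direction t = +x
    # the two null directions are rescaled by reciprocal Doppler factors,
    # so their product is exactly 1 while one factor blows up
    print(v, stretch, crush, stretch * crush)
```

For any \(v < c\) the map is invertible (the product of the two scale factors is 1), but the crushed factor goes to zero in the limit, which is the failure of one-to-oneness described above.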

Consider a coordinate
\(x\) defined along a certain curve, which is not necessarily a geodesic. For concreteness, imagine this curve
to exist in two spacelike dimensions, which we can visualize as the surface of a sphere embedded in Euclidean
3-space. These concrete features are not strictly necessary, but they drive home the point that we should
not expect to be able to define \(x\) so that it varies at a steady rate with elapsed distance; for example,
we know that it will not be possible to define a two-dimensional Cartesian grid on the surface of
a sphere. In
the figure, the tick marks are therefore not evenly spaced. This is perfectly all right, given the coordinate
invariance of general relativity. Since the incremental changes in \(x\) *are*
equal, I've represented them below the curve as little vectors of equal length. They are the wrong length
to represent distances along the curve, but this wrongness is an inevitable fact of life in relativity.

Now suppose we want to integrate the arc length of a segment of this curve. The little vectors are
infinitesimal. In the integrated length, each little vector should contribute some amount, which is a scalar.
This scalar is not simply the magnitude of the vector,
\(ds \ne \sqrt{d\mathbf{x}\cdot d\mathbf{x}}\), since the vectors are the wrong length.
Figure a
is clearly reminiscent of the geometrical picture of vectors and dual vectors developed
on p. 48.
But the purely affine
notion of vectors and their duals is not enough to define the length of a vector in general; it is only
sufficient to define a length relative to other lengths along the same geodesic.
When vectors lie along different geodesics, we need to be able to specify the additional conversion
factor that allows us to compare one to the other. The piece of machinery that allows us to do
this is called a *metric*.

Fixing a metric allows us to define the proper scaling of the tick marks relative to the arrows at a given point, i.e., in the birdtracks notation it gives us a natural way of taking a displacement vector such as \(\rightarrow s\), with the arrow pointing into the symbol, and making a corresponding dual vector \(s \rightarrow\), with the arrow coming out. This is a little like cloning a person but making the clone be of the opposite sex. Hooking them up like \(s \rightarrow s\) then tells us the squared magnitude of the vector. For example, if \(\rightarrow dx\) is an infinitesimal timelike displacement, then \(dx \rightarrow dx\) is the squared time interval \(dx^2\) measured by a clock traveling along that displacement in spacetime. (Note that in the notation \(dx^2\), it's clear that \(dx\) is a scalar, because unlike \(\rightarrow dx\) and \(dx \rightarrow\) it doesn't have any arrow coming in or out of it.) Figure b shows the resulting picture.

In the abstract index notation introduced on p. 51, the vectors \(\rightarrow dx\) and \(dx \rightarrow\) are written \(dx^a\) and \(dx_a\). When a specific coordinate system has been fixed, we write these with concrete, Greek indices, \(dx^\mu\) and \(dx_\mu\). In an older and conceptually incompatible notation and terminology due to Sylvester (1853), one refers to \(dx^\mu\) as a contravariant vector, and \(dx_\mu\) as covariant. The confusing terminology is summarized on p. .

The assumption that a metric exists is nontrivial. There is no metric in Galilean spacetime, for example, since in the limit \(c\rightarrow\infty\) the units used to measure timelike and spacelike displacements are not comparable. Assuming the existence of a metric is equivalent to assuming that the universe holds at least one physically manipulable clock or ruler that can be moved over long distances and accelerated as desired. In the distant future, large and causally isolated regions of the cosmos may contain only massless particles such as photons, which cannot be used to build clocks (or, equivalently, rulers); the physics of these regions will be fully describable without a metric. If, on the other hand, our world contains not just zero or one but two or more clocks, then the metric hypothesis requires that these clocks maintain a consistent relative rate when accelerated along the same world-line. This consistency is what allows us to think of relativity as a theory of space and time rather than a theory of clocks and rulers. There are other relativistic theories of gravity besides general relativity, and some of these violate this hypothesis.

Given a \(dx^\mu\), how do we find its dual \(dx_\mu\), and vice versa? In one dimension, we simply need to introduce a real number \(g\) as a correction factor. If one of the vectors is shorter than it should be in a certain region, the correction factor serves to compensate by making its dual proportionately longer. The two possible mappings (covariant to contravariant and contravariant to covariant) are accomplished with factors of \(g\) and \(1/g\). The number \(g\) is the metric, and it encodes all the information about distances. For example, if \(\phi\) represents longitude measured at the arctic circle, then the metric is the only source for the datum that a displacement \(d\phi\) corresponds to 2540 km per radian.
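
As a rough numerical sketch of this one-dimensional example (the Earth figures here are assumed round numbers, not data from the text):

```python
import math

# One-dimensional metric for longitude phi along the arctic circle.
# Assumed figures (not from the text): mean Earth radius 6371 km,
# arctic circle at latitude 66.56 degrees.
R_earth = 6371.0                      # km
lat = math.radians(66.56)
r = R_earth * math.cos(lat)           # radius of the circle of latitude, km

g = r**2                              # the metric: ds^2 = g dphi^2
ds = math.sqrt(g) * 1.0               # arc length for dphi = 1 radian, km

print(ds)   # roughly 2.5e3 km per radian, consistent with the 2540 km quoted
```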

Now let's generalize to more than one dimension. Because globally Cartesian coordinate systems can't be imposed on a curved space, the constant-coordinate lines will in general be neither evenly spaced nor perpendicular to one another. If we construct a local set of basis vectors lying along the intersections of the constant-coordinate surfaces, they will not form an orthonormal set. We would like to have an expression of the form \(ds^2=\Sigma\, dx^\mu dx_\mu\) for the squared arc length, and using the Einstein summation notation this becomes

\[\begin{equation*}
ds^2=dx^\mu dx_\mu .
\end{equation*}\]

For Cartesian coordinates in a Euclidean plane, where one doesn't normally bother with the distinction between covariant and contravariant vectors, this expression for \(ds^2\) is simply the Pythagorean theorem, summed over two values of \(\mu\) for the two coordinates:

\[\begin{equation*}
ds^2 = dx^\mu dx_\mu = dx^2 + dy^2
\end{equation*}\]

The symbols \(dx\), \(ds^0\), \(dx^0\), and \(dx_0\) are all synonyms, and likewise for \(dy\), \(ds^1\), \(dx^1\), and \(dx_1\). (Because notations such as \(ds^1\) force the reader to keep track of which digits have been assigned to which letters, it is better practice to use notation such as \(dy\) or \(ds^y\); the latter notation could in principle be confused with one in which \(y\) was a variable taking on values such as 0 or 1, but in reality we understand it from context, just as we understand that the \(d\)'s in \(dy/dx\) are not referring to some variable \(d\) that stands for a number.)

In the non-Euclidean case, the Pythagorean theorem is false; \(dx^\mu\) and \(dx_\mu\) are no longer synonyms, so their product is no longer simply the square of a distance. To see this more explicitly, let's write the expression so that only the covariant quantities occur. By local flatness, the relationship between the covariant and contravariant vectors is linear, and the most general relationship of this kind is given by making the metric a symmetric matrix \(g_{\mu\nu}\). Substituting \(dx_\mu=g_{\mu\nu}dx^\nu\), we have

\[\begin{equation*}
ds^2=g_{\mu\nu} dx^\mu dx^\nu ,
\end{equation*}\]

where there are now implied sums over both \(\mu\) and \(\nu\). Notice how implied sums occur only when the repeated index occurs once as a superscript and once as a subscript; other combinations are ungrammatical.
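
The implied double sum can be made concrete numerically. The sketch below uses the metric \(g=\operatorname{diag}(1,r^2)\) of Euclidean polar coordinates (the metric worked out in example 7, cited later in this section):

```python
import numpy as np

# Squared arc length ds^2 = g_mu_nu dx^mu dx^nu as an explicit double sum.
# Toy example: Euclidean polar coordinates (r, theta), with g = diag(1, r^2).
r = 2.0
g = np.diag([1.0, r**2])

dx = np.array([0.3, 0.1])                 # (dr, dtheta)

ds2 = np.einsum('ij,i,j->', g, dx, dx)    # implied sums over both indices

# check against the hand-expanded form dr^2 + r^2 dtheta^2
assert np.isclose(ds2, 0.3**2 + r**2 * 0.1**2)
print(ds2)
```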

Self-check: Why does it make sense to demand that the metric be symmetric?

On p. 46 we encountered the distinction among scalars, vectors, and dual
vectors. These are specific examples of *tensors*, which
can be expressed in the birdtracks notation as objects with \(m\) arrows coming in and \(n\) coming out,
or, in index notation, as objects with \(m\) superscripts and \(n\) subscripts. A scalar has \(m=n=0\). A dual vector has
\((m,n)=(0,1)\), a vector \((1,0)\), and the metric \((0,2)\). We refer to the number of indices as the
rank of the tensor. Tensors are discussed in more detail, and defined
more rigorously, in chapter 4. For our present purposes, it is important to note that just because
we write a symbol with subscripts or superscripts, that doesn't mean it deserves to be called a tensor. This point
can be understood in the more elementary context of Newtonian scalars and vectors. For example, we can define
a Euclidean “vector” \(\mathbf{u}=(m,T,e)\), where \(m\) is the mass of the moon, \(T\) is the temperature in
Chicago, and \(e\) is the charge of the electron. This creature \(\mathbf{u}\) doesn't deserve to be called a vector,
because it doesn't behave as a vector under rotation.
The general philosophy is that a tensor is something that has certain properties under changes of coordinates.
For example, we've already seen on p. 48
the different scaling behavior of tensors with ranks \((1,0)\), \((0,0)\), and \((0,1)\).

When discussing the symmetry of rank-2 tensors, it is convenient to introduce the following notation:

\[\begin{align*}
T_{(ab)} &= \frac{1}{2}\left(T_{ab}+T_{ba}\right) \\
T_{[ab]} &= \frac{1}{2}\left(T_{ab}-T_{ba}\right)
\end{align*}\]

Any \(T_{ab}\) can be split into symmetric and antisymmetric parts. This is similar to writing an arbitrary function as a sum of an odd function and an even function. The metric has only a symmetric part: \(g_{(ab)}=g_{ab}\), and \(g_{[ab]}=0\). This notation is generalized to ranks greater than 2 on page 184.
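
A minimal numerical illustration of the decomposition, using a generic \(2\times 2\) array of components:

```python
import numpy as np

# Split an arbitrary rank-2 tensor (here just a 2x2 array of components)
# into its symmetric part T_(ab) and antisymmetric part T_[ab].
T = np.array([[1.0, 4.0],
              [2.0, 3.0]])

T_sym  = (T + T.T) / 2    # T_(ab)
T_anti = (T - T.T) / 2    # T_[ab]

assert np.allclose(T_sym + T_anti, T)      # the parts recombine to T
assert np.allclose(T_sym, T_sym.T)         # symmetric part
assert np.allclose(T_anti, -T_anti.T)      # antisymmetric part
```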

Self-check: Characterize an antisymmetric rank-2 tensor in two dimensions.

\(\triangleright\) If we change our units of measurement so that \(x^\mu \rightarrow \alpha x^\mu\), while demanding that \(ds^2\) come out the same, then we need \(g_{\mu\nu} \rightarrow \alpha^{-2}g_{\mu\nu}\).

Comparing with p. 48, we deduce the general rule that a tensor of rank \((m,n)\) transforms under scaling by picking up a factor of \(\alpha^{m-n}\).

Notice how in example 7 we started from the generally valid relation \(ds^2=g_{\mu\nu} dx^\mu dx^\nu\), but soon began writing down facts like \(g_{\theta\theta}=r^2\) that were only valid in this particular coordinate system. To make it clear when this is happening, we maintain the distinction between abstract Latin indices and concrete Greek indices introduced on p. 51. For example, we can write the general expression for squared differential arc length with Latin indices,

\[\begin{equation*}
ds^2=g_{ij} dx^i dx^j ,
\end{equation*}\]

because it holds regardless of the coordinate system, whereas the vanishing of the off-diagonal elements of the metric in Euclidean polar coordinates has to be written as \(g_{\mu\nu}=0\) for \(\mu \ne \nu\), since it would in general be false if we used a different coordinate system to describe the same Euclidean plane.

\(\triangleright\) Since the coordinates differ from Cartesian coordinates only in the angle between the axes, not in their scales, a displacement \(dx^i\) along either axis, \(i=1\) or 2, must give \(ds=dx\), so for the diagonal elements we have \(g_{11}=g_{22}=1\). The metric is always symmetric, so \(g_{12}=g_{21}\). To fix these off-diagonal elements, consider a displacement by \(ds\) in the direction perpendicular to axis 1. This changes the coordinates by \(dx^1=-ds \cot\phi\) and \(dx^2 = ds \csc\phi\). We then have

\[\begin{align*}
ds^2 &= g_{ij} dx^i dx^j \\
&= ds^2 (\cot^2\phi+\csc^2\phi-2g_{12}\cos\phi\csc\phi) \\
g_{12} &= \cos\phi .
\end{align*}\]
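
We can check this result numerically; the sketch below verifies that with \(g_{12}=\cos\phi\) the metric assigns length \(ds\) to the perpendicular displacement, and that \(\sqrt{|g|}=\sin\phi\):

```python
import math
import numpy as np

# The skew-axis metric of example 8: unit-scale axes meeting at angle phi,
# with g_11 = g_22 = 1 and g_12 = g_21 = cos(phi).
phi = math.radians(50.0)
g = np.array([[1.0, math.cos(phi)],
              [math.cos(phi), 1.0]])

# A displacement of length ds perpendicular to axis 1 has coordinate
# components dx^1 = -ds cot(phi), dx^2 = ds csc(phi).
ds = 0.7
dx = np.array([-ds / math.tan(phi), ds / math.sin(phi)])

ds2 = np.einsum('ij,i,j->', g, dx, dx)
assert np.isclose(ds2, ds**2)        # the metric reproduces the length

# the determinant gives the parallelogram area factor sqrt(|g|) = sin(phi)
assert np.isclose(np.sqrt(np.linalg.det(g)), math.sin(phi))
```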

In one dimension, \(g\) is a single number, and lengths are given by \(ds=\sqrt{g}dx\). The square root can also be understood through example 6 on page 102, in which we saw that a uniform rescaling \(x \rightarrow \alpha x\) is reflected in \(g_{\mu\nu} \rightarrow \alpha^{-2}g_{\mu\nu}\).

In two-dimensional Cartesian coordinates, multiplication of the width and height of a rectangle gives the element of area \(dA=\sqrt{g_{11}g_{22}}dx^1dx^2\). Because the coordinates are orthogonal, \(g\) is diagonal, and the factor of \(\sqrt{g_{11}g_{22}}\) is identified as the square root of its determinant, so \(dA=\sqrt{|g|}dx^1dx^2\). Note that the scales on the two axes are not necessarily the same, \(g_{11}\ne g_{22}\).

The same expression for the element of area holds even if the coordinates are not orthogonal. In example 8, for instance, we have \(\sqrt{|g|}=\sqrt{1-\cos^2\phi}=\sin\phi\), which is the right correction factor corresponding to the fact that \(dx^1\) and \(dx^2\) form a parallelepiped rather than a rectangle.

For coordinates \((\theta,\phi)\) on the surface of a sphere of radius \(r\), we have, by an argument similar to that of example 7 on page 102, \(g_{\theta\theta}=r^2\), \(g_{\phi\phi}=r^2\sin^2\theta\), \(g_{\theta\phi}=0\). The area of the sphere is

\[\begin{align*}
A &= \int dA \\
&= \int \int \sqrt{|g|}\,d\theta\,d\phi \\
&= r^2 \int \int \sin\theta \,d\theta\,d\phi \\
&= 4\pi r^2
\end{align*}\]
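
As a numerical cross-check of this integral (grid resolution chosen arbitrarily):

```python
import numpy as np

# Integrate dA = sqrt(|g|) dtheta dphi numerically on a sphere of radius r,
# with sqrt(|g|) = r^2 sin(theta), and compare against 4 pi r^2.
r = 3.0
theta = np.linspace(0.0, np.pi, 2001)
sqrt_g = r**2 * np.sin(theta)

# the integrand is independent of phi, which contributes a factor of 2 pi
dtheta = theta[1] - theta[0]
A = 2.0 * np.pi * np.sum(sqrt_g) * dtheta

assert np.isclose(A, 4.0 * np.pi * r**2, rtol=1e-4)
print(A)
```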

\(\triangleright\) The notation is intended to treat covariant and contravariant vectors completely symmetrically. The metric with lower indices \(g_{ij}\) can be interpreted as a change-of-basis transformation from a contravariant basis to a covariant one, and if the symmetry of the notation is to be maintained, \(g^{ij}\) must be the corresponding inverse matrix, which changes from the covariant basis to the contravariant one. The metric must always be invertible.

In the one-dimensional case, p. 99, the metric at any given point was simply some number \(g\), and we used factors of \(g\) and \(1/g\) to convert back and forth between covariant and contravariant vectors. Example 11 makes it clear how to generalize this to more dimensions:

\[\begin{align*}
x_a &= g_{ab}x^b \\
x^a &= g^{ab}x_b
\end{align*}\]

This is referred to as raising and lowering indices. There is no need to memorize the positions of the indices in these rules; they are the only ones possible based on the grammatical rules, which are that summation only occurs over top-bottom pairs, and upper and lower indices have to match on both sides of the equals sign. This whole system, introduced by Einstein, is called “index-gymnastics” notation.
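
A short numerical sketch of this index gymnastics, using the Lorentzian metric \(\operatorname{diag}(+1,-1,-1,-1)\) discussed later in the section:

```python
import numpy as np

# Raising and lowering indices: x_a = g_ab x^b and x^a = g^ab x_b,
# where g^ab is the matrix inverse of g_ab.
g_lower = np.diag([1.0, -1.0, -1.0, -1.0])   # g_ab, Minkowski metric
g_upper = np.linalg.inv(g_lower)             # g^ab

x_up = np.array([5.0, 1.0, 2.0, 3.0])        # contravariant components x^a

x_down = g_lower @ x_up                      # lower the index
assert np.allclose(x_down, [5.0, -1.0, -2.0, -3.0])

# raising the lowered index recovers the original vector
assert np.allclose(g_upper @ x_down, x_up)
```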

\[\begin{equation*}
A^a_b = g^{ac}A_{cb}
\end{equation*}\]

and
\[\begin{equation*}
A_{ab} = g_{ac}g_{bd}A^{cd} .
\end{equation*}\]

\[\begin{equation*}
q^a = U^{\ldots}_{\ldots}\,p^b ,
\end{equation*}\]

where we want to figure out the correct placement of the indices on \(U\). Grammatically, the only possible
placement is
\[\begin{equation*}
q^a = U^a_bp^b .
\end{equation*}\]

This shows that the natural way to represent a column-vector-to-column-vector linear operator is
as a rank-2 tensor with one upper index and one lower index.
In birdtracks notation, a rank-2 tensor is something that has two arrows connected to it. Our example becomes \(\rightarrow q = \rightarrow U \rightarrow p\). That the result is itself an upper-index vector is shown by the fact that the right-hand-side taken as a whole has a single external arrow coming into it.
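
Numerically, the contraction \(q^a = U^a_bp^b\) is nothing but a matrix acting on a column vector:

```python
import numpy as np

# A linear operator as a (1,1) tensor: the index contraction
# q^a = U^a_b p^b is exactly a matrix-vector product.
U = np.array([[2.0, 1.0],
              [0.0, 3.0]])                  # components U^a_b
p = np.array([1.0, 2.0])                    # components p^b

q = np.einsum('ab,b->a', U, p)              # sum over the repeated index b
assert np.allclose(q, U @ p)                # same as ordinary matrix action
print(q)                                    # [4. 6.]
```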

The distinction between vectors and their duals may seem irrelevant if we can always raise and lower indices at will. We can't always do that, however, because in many perfectly ordinary situations there is no metric. See example 6, p. 49.

In a locally Euclidean space, the Pythagorean theorem allows us to express the metric in local Cartesian coordinates in the simple form \(g_{\mu\mu}=+1\), \(g_{\mu\nu}=0\), i.e., \(g=\operatorname{diag}(+1,+1,...,+1)\). This is not the appropriate metric for a locally Lorentz space. The axioms of Euclidean geometry E3 (existence of circles) and E4 (equality of right angles) describe the theory's invariance under rotations, and the Pythagorean theorem is consistent with this, because it gives the same answer for the length of a vector even if its components are reexpressed in a new basis that is rotated with respect to the original one. In a Lorentzian geometry, however, we care about invariance under Lorentz boosts, which do not preserve the quantity \(t^2+x^2\). It is not circles in the \((t,x)\) plane that are invariant, but light cones, and this is described by giving \(g_{tt}\) and \(g_{xx}\) opposite signs and equal absolute values. A lightlike vector \((t,x)\), with \(t=x\), therefore has a magnitude of exactly zero,

\[\begin{equation*}
s^2 = g_{tt}t^2+g_{xx}x^2 = 0 ,
\end{equation*}\]

and this remains true after the Lorentz boost \((t,x) \rightarrow (\gamma t,\gamma x)\). It is a matter of convention which element of the metric to make positive and which to make negative. In this book, I'll use \(g_{tt}=+1\) and \(g_{xx}=-1\), so that \(g=\operatorname{diag}(+1,-1)\). This has the advantage that any line segment representing the timelike world-line of a physical object has a positive squared magnitude; the forward flow of time is represented as a positive number, in keeping with the philosophy that relativity is basically a theory of how causal relationships work. With this sign convention, spacelike vectors have positive squared magnitudes, timelike ones negative. The same convention is followed, for example, by Penrose. The opposite version, with \(g=\operatorname{diag}(-1,+1)\) is used by authors such as Wald and Misner, Thorne, and Wheeler.
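
The invariance of the lightlike condition can be verified numerically (the boost velocity below is an arbitrary sample value):

```python
import numpy as np

# With g = diag(+1, -1), a lightlike displacement (t, x) with t = x has
# zero squared magnitude, and it stays lightlike under a Lorentz boost.
g = np.diag([1.0, -1.0])

def squared_magnitude(v):
    return v @ g @ v

v = np.array([1.0, 1.0])                   # lightlike: t = x
assert np.isclose(squared_magnitude(v), 0.0)

beta = 0.6                                 # boost velocity (c = 1)
gamma = 1.0 / np.sqrt(1.0 - beta**2)
boost = np.array([[gamma, -gamma * beta],
                  [-gamma * beta, gamma]])

v_boosted = boost @ v
assert np.isclose(squared_magnitude(v_boosted), 0.0)

# a timelike vector's squared magnitude is also preserved by the boost
u = np.array([2.0, 1.0])
assert np.isclose(squared_magnitude(boost @ u), squared_magnitude(u))
```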

Our universe does not have just one spatial dimension, it has three, so the full metric in a Lorentz
frame is given by

\(g=\operatorname{diag}(+1,-1,-1,-1)\).

\[\begin{equation*}
A^a_b = g^{ac}A_{cb} ,
\end{equation*}\]

and substituting \(g\) for \(A\) gives
\[\begin{equation*}
g^a_b = g^{ac}g_{cb} .
\end{equation*}\]

But we already know that \(g^{...}\) is simply the inverse matrix of \(g_{...}\) (example 11,
p. 104), which means that \(g^a_b\) is simply the identity matrix.
That is, whereas a quantity like \(g_{ab}\) or \(g^{ab}\) carries all the information about our
system of measurement at a given point, \(g^a_b\) carries no information at all.
Where \(g_{ab}\) or \(g^{ab}\) can have both positive and negative elements, elements that have units,
and off-diagonal elements, \(g^a_b\) is just a generic symbol carrying no information other than
the dimensionality of the space.
The metric tensor is so commonly used that it is simply left out of birdtrack diagrams. Consistency is maintained because \(g^a_b\) is the identity matrix, so \(\rightarrow g \rightarrow\) is the same as \(\rightarrow \rightarrow\).
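
A one-line numerical confirmation that contracting any invertible metric with its inverse yields the identity, here using the skew-axis metric of example 8:

```python
import numpy as np

# g^ac g_cb = delta^a_b: contract a metric with its matrix inverse.
phi = 1.0
g_lower = np.array([[1.0, np.cos(phi)],
                    [np.cos(phi), 1.0]])       # skew-axis metric, example 8
g_upper = np.linalg.inv(g_lower)

mixed = np.einsum('ac,cb->ab', g_upper, g_lower)   # g^a_b
assert np.allclose(mixed, np.eye(2))               # the identity matrix
```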

In Euclidean geometry, the dot product of vectors \(\mathbf{a}\) and \(\mathbf{b}\) is given by
\(g_{xx}a_xb_x+g_{yy}a_yb_y+g_{zz}a_zb_z=a_xb_x+a_yb_y+a_zb_z\), and in the special case
where \(\mathbf{a}=\mathbf{b}\) we have the squared magnitude.
In the tensor notation, \(a^\mu b_\mu=a^1b_1+a^2b_2+a^3b_3\).
Like magnitudes, dot products are invariant under rotations. This is because knowing
the dot product of vectors \(\mathbf{a}\) and \(\mathbf{b}\) entails knowing the value of \(\mathbf{a}\cdot\mathbf{b}=|\mathbf{a}||\mathbf{b}|\cos\theta_{\mathbf{a}\mathbf{b}}\),
and Euclid's E4 (equality of right angles) implies that the angle \(\theta_{\mathbf{a}\mathbf{b}}\) is invariant.
The same axioms also entail invariance of dot products under translation; Euclid waits only until the
second proposition of the *Elements* to prove that line segments can be copied from one location
to another. This seeming triviality is actually false as a description of physical
space, because it amounts to a statement that space has the same properties everywhere.

The set of all transformations that can be built out of successive translations, rotations, and
reflections is called the group of isometries. It can also be defined as the
group^{6} that preserves dot products, or the group that preserves congruence of triangles.

In Lorentzian geometry, we usually avoid the Euclidean term dot product and
refer to the corresponding operation by the more general term inner product.
In a specific coordinate system
we have \(a^\mu b_\mu=a^0b^0-a^1b^1-a^2b^2-a^3b^3\). The inner product is invariant under Lorentz
boosts, and also under the Euclidean isometries. The group found by
making all possible combinations of continuous transformations^{7}
from these two sets is called the Poincaré group.
The Poincaré group is not the symmetry group of all of spacetime, since curved spacetime has
different properties in different locations. The equivalence principle tells us, however, that
space can be approximated locally as being flat, so the Poincaré group is locally valid, just as
the Euclidean isometries are locally valid as a description of geometry on the Earth's curved surface.

In Euclidean geometry, the triangle inequality \(|\mathbf{b}+\mathbf{c}|\lt|\mathbf{b}|+|\mathbf{c}|\) follows from

\[\begin{equation*}
(|\mathbf{b}|+|\mathbf{c}|)^2-(\mathbf{b}+\mathbf{c})\cdot(\mathbf{b}+\mathbf{c})=2(|\mathbf{b}||\mathbf{c}|-\mathbf{b}\cdot\mathbf{c}) \ge 0 .
\end{equation*}\]

The reason this quantity always comes out positive is that for two vectors of fixed magnitude, the greatest dot product is always achieved in the case where they lie along the same direction.

In Lorentzian geometry, the situation is different. Let \(\mathbf{b}\) and \(\mathbf{c}\) be timelike vectors, so that
they represent possible world-lines. Then the relation \(\mathbf{a}=\mathbf{b}+\mathbf{c}\) suggests the existence of two
observers who take two different paths from one event to another. A goes by a direct route while B takes
a detour. The magnitude of each timelike vector represents the time elapsed on a clock carried by the
observer moving along that vector. The triangle inequality is now reversed, becoming \(|\mathbf{b}+\mathbf{c}|>|\mathbf{b}|+|\mathbf{c}|\).
The difference from the Euclidean case arises because inner products are no longer necessarily maximized if vectors
are in the same direction.
E.g., for two lightlike vectors, \(b^ic_i\) vanishes entirely if \(\mathbf{b}\) and \(\mathbf{c}\)
are parallel.
For timelike vectors, parallelism actually minimizes the inner product rather
than maximizing it.^{5}
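
The reversed triangle inequality is easy to confirm numerically with the metric \(\operatorname{diag}(+1,-1)\) (the sample vectors below are arbitrary timelike displacements):

```python
import numpy as np

# Reversed triangle inequality for timelike vectors, g = diag(+1, -1):
# the direct route b + c accumulates MORE proper time than the detour b, c.
g = np.diag([1.0, -1.0])

def magnitude(v):
    return np.sqrt(v @ g @ v)         # proper time along a timelike vector

b = np.array([2.0, 1.0])              # first leg of the detour
c = np.array([2.0, -1.0])             # second leg of the detour

a = b + c                             # the direct route
assert magnitude(a) > magnitude(b) + magnitude(c)
print(magnitude(a), magnitude(b) + magnitude(c))
```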

In his 1872 inaugural address at the University of Erlangen, Felix Klein used the idea of groups of transformations to lay out a general classification scheme, known as the Erlangen program, for all the different types of geometry. Each geometry is described by the group of transformations, called the principal group, that preserves the truth of geometrical statements. Euclidean geometry's principal group consists of the isometries combined with arbitrary changes of scale, since there is nothing in Euclid's axioms that singles out a particular distance as a unit of measurement. In other words, the principal group consists of the transformations that preserve similarity, not just those that preserve congruence. Affine geometry's principal group is the transformations that preserve parallelism; it includes shear transformations, and there is therefore no invariant notion of angular measure or congruence. Unlike Euclidean and affine geometry, elliptic geometry does not have scale invariance. This is because there is a particular unit of distance that has special status; as we saw in example 3 on page 95, a being living in an elliptic plane can determine, by entirely intrinsic methods, a distance scale \(R\), which we can interpret in the hemispherical model as the radius of the sphere. General relativity breaks this symmetry even more severely. Not only is there a scale associated with curvature, but the scale is different from one point in space to another.

The following example was historically important, because Einstein used it to convince himself that general relativity should be
described by non-Euclidean geometry.^{8} Its interpretation is also fairly subtle, and the early relativists
had some trouble with it.

Suppose that observer A is on a spinning carousel while observer B stands on the ground. B says that A is accelerating, but by the equivalence principle A can say that she is at rest in a gravitational field, while B is free-falling out from under her. B measures the radius and circumference of the carousel, and finds that their ratio is \(2\pi\). A carries out similar measurements, but when she puts her meter-stick in the azimuthal direction it becomes Lorentz-contracted by the factor \(\gamma=(1-\omega^2r^2)^{-1/2}\), so she finds that the ratio is greater than \(2\pi\). In A's coordinates, the spatial geometry is non-Euclidean, and the metric differs from the Euclidean one found in example 7 on page 102.

Observer A feels a force that B considers to be fictitious, but that, by the equivalence principle, A can say is a perfectly real gravitational force. According to A, an observer like B is free-falling away from the center of the disk under the influence of this gravitational field. A also observes that the spatial geometry of the carousel is non-Euclidean. Therefore it seems reasonable to conjecture that gravity can be described by non-Euclidean geometry, rather than as a physical force in the Newtonian sense.

At this point, you know as much about this example as Einstein did in 1912, when he began using it as the seed from which
general relativity sprouted, collaborating with his old schoolmate, mathematician Marcel Grossmann, who knew about
differential geometry.
The remainder of subsection 3.5.4, which you may want to skip on a first reading,
goes into more detail on the interpretation and mathematical description of the rotating frame of reference.
Even more detailed treatments are given by Grøn^{9}
and Dieks.^{10}

Self-check: What if we build the disk by assembling the building materials so that they are already rotating properly before they are joined together?

What if we try to get around these problems by applying torque uniformly all over the disk, so that the rotation starts smoothly and simultaneously everywhere? We then run into issues identical to the ones raised by Bell's spaceship paradox (p. 66). In fact, Ehrenfest's paradox is nothing more than Bell's paradox wrapped around into a circle. The same question of time synchronization comes up.

To spell this out mathematically, let's find the metric according to observer A by applying the change of coordinates \(\theta'=\theta-\omega t\). First we take the Euclidean metric of example 7 on page 102 and rewrite it as a (globally) Lorentzian metric in spacetime for observer B,

\[\begin{equation*}
ds^2=dt^2 - dr^2 - r^2\,d\theta^2 .
\end{equation*}\]

Applying the transformation into A's coordinates, we find

\[\begin{equation*}
ds^2=(1-\omega^2 r^2)dt^2 - dr^2 - r^2\,d\theta'^2 - 2\omega r^2\,d\theta'\,dt .
\end{equation*}\]
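
This transformation can be checked numerically by expanding \(d\theta = d\theta' + \omega\,dt\) for arbitrary sample values (all in units with \(c=1\)):

```python
import numpy as np

# Check the coordinate change theta = theta' + omega t applied to
# ds^2 = dt^2 - dr^2 - r^2 dtheta^2, using arbitrary sample values.
omega, r = 0.3, 1.5
dt, dr, dthp = 0.02, 0.01, 0.005

dtheta = dthp + omega * dt                       # dtheta = dtheta' + omega dt
ds2_B = dt**2 - dr**2 - r**2 * dtheta**2         # observer B's form

ds2_A = ((1 - omega**2 * r**2) * dt**2 - dr**2
         - r**2 * dthp**2
         - 2 * omega * r**2 * dthp * dt)         # observer A's form

assert np.isclose(ds2_B, ds2_A)                  # the two forms agree
```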

Recognizing \(\omega r\) as the velocity of one frame relative to another, and \((1-\omega^2 r^2)^{-1/2}\) as \(\gamma\), we see that we do have a relativistic time dilation effect in the \(dt^2\) term. But the \(dr^2\) and \(d\theta'^2\) terms look Euclidean. Why don't we see any Lorentz contraction of the length scale in the azimuthal direction?

The answer is that coordinates in general relativity are arbitrary, and just because we can write down a certain set of coordinates, that doesn't mean they have any special physical interpretation. The coordinates \((t,r,\theta')\) do not correspond physically to the quantities that A would measure with clocks and meter-sticks. The tip-off is the \(d\theta'dt\) cross-term. Suppose that A sends two cars driving around the circumference of the carousel, one clockwise and one counterclockwise, from the same point. If \((t,r,\theta')\) coordinates corresponded to clock and meter-stick measurements, then we would expect that when the cars met up again on the far side of the disk, their dashboards would show equal values of the arc length \(r\theta'\) on their odometers and equal proper times \(ds\) on their clocks. But this is not the case, because the sign of the \(d\theta'dt\) term is opposite for the two world-lines. The same effect occurs if we send beams of light in both directions around the disk, and this is the Sagnac effect (p. 74).

This is a symptom of the fact that the coordinate \(t\) is not properly synchronized between different places on the disk. We already know that we should not expect to be able to find a universal time coordinate that will match up with every clock, regardless of the clock's state of motion. Suppose we set ourselves a more modest goal. Can we find a universal time coordinate that will match up with every clock, provided that the clock is at rest relative to the rotating disk?

\[\begin{equation*}
ds^2=(1-\omega^2 r^2)\left[dt+\frac{\omega r^2}{1-\omega^2 r^2}d\theta'\right]^2 - dr^2 - \frac{r^2}{1-\omega^2r^2}d\theta'^2 .
\end{equation*}\]

The interpretation of the quantity in square brackets is as follows. Suppose that two observers situate themselves on the
edge of the disk, separated by an infinitesimal angle \(d\theta'\). They then synchronize their clocks by exchanging
light pulses. The time of flight, measured in the lab frame, for each light pulse is the solution of the equation
\(ds^2=0\), and the only difference between the clockwise result \(dt_1\) and the counterclockwise one \(dt_2\) arises from the sign of \(d\theta'\). The quantity in
square brackets is the same in both cases, so the amount by which the clocks must be adjusted is \(dt=(dt_2-dt_1)/2\), or
\[\begin{equation*}
dt = \frac{\omega r^2}{1-\omega^2 r^2}d\theta' .
\end{equation*}\]
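
This synchronization procedure can be verified numerically: solve the quadratic \(ds^2=0\) (with \(dr=0\)) for the two directions of propagation and compare with the formula above (sample values are arbitrary, with \(\omega r<1\)):

```python
import numpy as np

# Light pulses exchanged between nearby points on the rim (dr = 0):
# ds^2 = 0 gives (1 - w^2 r^2) dt^2 - 2 w r^2 dthp dt - r^2 dthp^2 = 0.
# Solve for dt with both signs of dtheta' and verify dt = (dt2 - dt1)/2.
omega, r, dthp = 0.3, 1.5, 1e-3
k = 1.0 - omega**2 * r**2

def time_of_flight(dtheta):
    # positive root of the quadratic in dt
    a, b, c = k, -2.0 * omega * r**2 * dtheta, -r**2 * dtheta**2
    return (-b + np.sqrt(b**2 - 4.0 * a * c)) / (2.0 * a)

dt2 = time_of_flight(+dthp)       # pulse traveling with the rotation
dt1 = time_of_flight(-dthp)       # pulse traveling against the rotation

adjustment = (dt2 - dt1) / 2.0
assert np.isclose(adjustment, omega * r**2 * dthp / k)
```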

Substituting this into the metric, we are left with the purely spatial metric

\[\begin{equation*}
ds^2= - dr^2 - \frac{r^2}{1-\omega^2r^2}d\theta'^2 .
\end{equation*}\]

The factor of \((1-\omega^2r^2)^{-1}=\gamma^2\) in the \(d\theta'^2\) term is simply the expected Lorentz-contraction factor.
In other words, the circumference is, as expected, greater than \(2\pi r\) by a factor of \(\gamma\).
Does the metric [] represent the same
non-Euclidean spatial geometry that A, rotating with the disk, would determine by meter-stick measurements?
Yes and no. It *can* be interpreted as the one that A would determine by radar measurements.
That is, if A measures a round-trip travel time \(dt\) for a light signal between points separated by
coordinate distances \(dr\) and \(d\theta'\), then A can say that the spatial separation is \(dt/2\),
and such measurements will be described correctly by [].
Physical meter-sticks, however, present some problems. Meter-sticks rotating with the disk are subject to
Coriolis and centrifugal forces, and this problem can't be avoided simply by making the meter-sticks
infinitely rigid, because infinitely rigid objects are forbidden by relativity. In fact, these forces
will inevitably be strong enough to destroy any meter stick that is brought out to \(r=1/\omega\), where
the speed of the disk becomes equal to the speed of light.

It might appear that we could now define a global coordinate

\[\begin{equation*}
T = t + \frac{\omega r^2}{1-\omega^2 r^2}\theta' ,
\end{equation*}\]

interpreted as a time coordinate that was synchronized in a consistent way for all points on the disk. The trouble with this interpretation becomes evident when we imagine driving a car around the circumference of the disk, at a speed slow enough so that there is negligible time dilation of the car's dashboard clock relative to the clocks tied to the disk. Once the car gets back to its original position, \(\theta'\) has increased by \(2\pi\), so it is no longer possible for the car's clock to be synchronized with the clocks tied to the disk. We conclude that it is not possible to synchronize clocks in a rotating frame of reference; if we try to do it, we will inevitably have to have a discontinuity somewhere. This problem is present even locally, as demonstrated by the possibility of measuring the Sagnac effect with apparatus that is small compared to the disk. The only reason we were able to get away with time synchronization in order to establish the metric [] is that all the physical manifestations of the impossibility of synchronization, e.g., the Sagnac effect, are proportional to the area of the region in which synchronization is attempted. Since we were only synchronizing two nearby points, the area enclosed by the light rays was zero.

The system requires synchronization of the atomic clocks carried aboard the satellites, and this synchronization also needs to be extended to the (less accurate) clocks built into the receiver units. It is impossible to carry out such a synchronization globally in the rotating frame in order to create coordinates \((T,r,\theta',\phi)\). If we tried, it would result in discontinuities (see problem 8, p. 120). Instead, the GPS system handles clock synchronization in coordinates \((t,r,\theta',\phi)\), as in equation []. These are known as the Earth-Centered Inertial (ECI) coordinates. The \(t\) coordinate in this system is not the one that users at neighboring points on the earth's surface would establish if they carried out clock synchronization using electromagnetic signals. It is simply the time coordinate of the nonrotating frame of reference tied to the earth's center. Conceptually, we can imagine this time coordinate as one that is established by sending out an electromagnetic “tick-tock” signal from the earth's center, with each satellite correcting the phase of the signal based on the propagation time inferred from its own \(r\). In reality, this is accomplished by communication with a master control station in Colorado Springs, which communicates with the satellites via relays at Kwajalein, Ascension Island, Diego Garcia, and Cape Canaveral.

In addition, we would need to be able to manipulate the rulers in order to place them where we wanted them, and these manipulations would include angular accelerations. If such a thing was possible, then it would also amount to a loophole in the resolution of the Ehrenfest paradox. Could Ehrenfest's rotating disk be accelerated and decelerated with help from external forces, which would keep it from contorting into a potato chip? The problem we run into with such a strategy is one of clock synchronization. When it was time to impart an angular acceleration to the disk, all of the control systems would have to be activated simultaneously. But we have already seen that global clock synchronization cannot be realized for an object with finite area, and therefore there is a logical contradiction in this proposal. This makes it impossible to apply rigid angular acceleration to the disk, but not necessarily the rulers, which could in theory be one-dimensional.

So far we've considered a variety of examples in which the metric is predetermined. This is not the case in general relativity. For example, Einstein published general relativity in 1915, but it was not until 1916 that Schwarzschild found the metric for a spherical, gravitating body such as the sun or the earth.

When masses are present, finding the metric is analogous to finding the electric field made by charges, but the interpretation is more difficult. In the electromagnetic case, the field is found on a preexisting background of space and time. In general relativity, there is no preexisting geometry of spacetime. The metric tells us how to find distances in terms of our coordinates, but the coordinates themselves are completely arbitrary. So what does the metric even mean? This was an issue that caused Einstein great distress and confusion, and at one point, in 1914, it even led him to publish an incorrect, dead-end theory of gravity in which he abandoned coordinate-independence.

With the benefit of hindsight, we can consider these issues in terms of the general description of measurements in relativity given on page 97:

- We can tell whether events and world-lines are incident.
- We can do measurements in local Lorentz frames.

These facts are what resolve Einstein's famous “hole argument.” Consider a region of spacetime, a “hole,” that is free of matter. Given one solution of the field equations, a change of coordinates that equals the identity outside the hole but differs from it inside produces a second, superficially different solution, whose metric differs inside the hole. We conclude that in any coordinate-invariant theory, it is impossible to uniquely determine the metric inside such a hole. Einstein initially decided that this was unacceptable, because it showed a lack of determinism; in a classical theory such as general relativity, we ought to be able to predict the evolution of the fields, and it would seem that there is no way to predict the metric inside the hole. He eventually realized that this was an incorrect interpretation. The only type of global observation that general relativity lets us do is measurements of the incidence of world-lines. Relabeling all the points inside the hole doesn't change any of the incidence relations. For example, if two test particles sent into the region collide at a point \(\mathbf{x}\) inside the hole, then changing the point's name to \(\mathbf{x}'\) doesn't change the observable fact that they collided.

Another argument that gave Einstein trouble is also resolved by a correct understanding of measurements,
this time measurements in local Lorentz frames. The earth is in hydrostatic equilibrium, and its
equator bulges due to its rotation. Suppose that the universe were empty except for two planets, each rotating
about the line connecting their centers.^{13} Since there are no stars or other external points of reference,
the inhabitants of each planet have nothing external against which to judge their rotation or lack
of rotation. They can only determine their rotation, Einstein said, relative to the other planet. Now suppose
that one planet has an equatorial bulge and the other doesn't. This seems to violate determinism, since there is
no cause that could produce the differing effect. The people on either planet can consider themselves
as rotating and the other planet as stationary, or they can describe the situation the other way around.
Einstein believed that this argument proved that there could be no difference between the sizes of the
two planets' equatorial bulges.

The flaw in Einstein's argument was that measurements in local Lorentz frames do allow one to make a distinction between rotation and a lack of rotation. For example, suppose that scientists on planet A notice that their world has no equatorial bulge, while planet B has one. They send a space probe with a clock to B, let it stay on B's surface for a few years, and then order it to return. When the clock is back in the lab, they compare it with another clock that stayed in the lab on planet A, and they find that less time has elapsed according to the one that spent time on B's surface. They conclude that planet B is rotating more quickly than planet A, and that the motion of B's surface was the cause of the observed time dilation. This resolution of the apparent paradox depends specifically on the Lorentzian form of the local geometry of spacetime; it is not available in, e.g., Cartan's curved-spacetime description of Newtonian gravity (see page 41).
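The size of the effect in this thought experiment can be estimated with the low-velocity approximation \(\Delta\tau/\tau \approx v^2/2c^2\), ignoring gravitational time dilation (which would also differ between the planets). The sketch below uses invented, earthlike numbers for planet B; none of them come from the text.

```python
# Rough numerical sketch of the clock-comparison experiment described above.
# Planet parameters are invented for illustration.

from math import pi

C = 299_792_458.0       # speed of light, m/s
SECONDS_PER_YEAR = 3.156e7

def equatorial_speed(radius_m, period_s):
    """Speed of a point on the equator of a rotating planet."""
    return 2 * pi * radius_m / period_s

def lag_per_year(v):
    """Kinematic time-dilation deficit per year for a clock moving at v << c."""
    return 0.5 * (v / C) ** 2 * SECONDS_PER_YEAR

# Planet A is taken as nonrotating, so its surface clocks keep coordinate time.
# Planet B is given an earthlike radius and a 24-hour day.
v_b = equatorial_speed(6.4e6, 86_400.0)
print(f"surface speed of B: {v_b:.0f} m/s")
print(f"clock on B's equator loses about {lag_per_year(v_b)*1e6:.0f} microseconds/year")
```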

Einstein's original, incorrect use of this example sprang from his interest in the ideas of the physicist and philosopher Ernst Mach. Mach had a somewhat ill-defined idea that since motion is only a well-defined notion when we speak of one object moving relative to another object, the inertia of an object must be caused by the influence of all the other matter in the universe. Einstein referred to this as Mach's principle. Einstein's false starts in constructing general relativity were frequently related to his attempts to make his theory too “Machian.” Section 8.3 on p. 330 discusses an alternative, more Machian theory of gravity proposed by Brans and Dicke in 1961.

This section discusses some of the issues that arise in the interpretation of coordinate independence. It can be skipped on a first reading.

One often hears statements like the following from relativists: “Coordinate independence isn't really a physical principle.
It's merely an obvious statement about the relationship between mathematics and the physical universe. Obviously the universe
doesn't come equipped with coordinates. We impose those coordinates on it, and the way in which we do so can never be dictated
by nature.” The impressionable reader who is tempted to say, “Ah, yes, that *is* obvious,” should consider that it was
far from obvious to Newton (“Absolute, true and mathematical time, of itself, and
from its own nature flows equably without regard to anything external ...”), nor was it obvious to Einstein.
Levi-Civita nudged Einstein in the direction of coordinate independence in 1912. Einstein
tried hard to make a coordinate-independent theory, but for reasons described in section 3.6.1 (p. 114),
he convinced himself that that was a dead end. In 1914-15 he published theories that were not coordinate-independent, which
you will hear relativists describe as “obvious” dead ends because they lack any geometrical interpretation. It seems to me that it takes a highly
refined intuition to regard as intuitively “obvious” an issue that Einstein struggled with like Jacob
wrestling with Elohim.

- Coordinate independence tells us that when we solve problems, we should avoid writing down any equations in notation that isn't manifestly intrinsic, and avoid interpreting those equations as if the coordinates had intrinsic meaning. Violating this advice doesn't guarantee that you've made a mistake, but it makes it much harder to tell whether or not you have.
- Coordinate independence can be used as a criterion for judging whether a particular theory is likely to be successful.

Nobody questions the first justification. The second is a little trickier. Laying out the general
theory systematically in a 1916 paper,^{14} Einstein wrote
“The general laws of nature are to be expressed by equations which
hold good for all the systems of coordinates, that is, are
covariant with respect to any substitutions whatever (generally covariant).”
In other words, he was explaining why, with hindsight, his 1914-1915 coordinate-dependent theory had to be a dead end.

The only trouble with this is that Einstein's way of posing the criterion didn't quite hit the nail on the head mathematically.
As Hilbert famously remarked, “Every boy in the streets of Göttingen understands more
about four-dimensional geometry than Einstein. Yet, in spite of that, Einstein did the work and not the mathematicians.”
What Einstein had in mind was that a theory like Newtonian mechanics not only lacks coordinate independence,
but would also be impossible to put into a coordinate-independent form without making it look hopelessly
complicated and ugly, like putting lipstick on a pig. But Kretschmann showed in 1917 that any theory could be
put in coordinate-independent form, and Cartan demonstrated in 1923 that this could be done for Newtonian mechanics
in a way that didn't come out particularly ugly. Physicists today are more apt to pose the distinction in terms
of “background independence” (meaning that a theory should not be phrased in terms of an assumed geometrical
background) or lack of a “prior geometry” (meaning that the curvature of spacetime should come from the solution of field
equations rather than being imposed by fiat).
But these concepts as well have resisted precise mathematical
formulation.^{15}
My feeling is that this general idea of coordinate independence or background independence is like the equivalence principle: a crucial conceptual
principle that doesn't lose its importance just because we can't put it in a mathematical box with a ribbon and a bow.
For example, string theorists take it as a serious criticism of their theory that it is not manifestly background independent,
and one of their goals is to show that it has a background independence that just isn't obvious on the surface.

It is instructive to consider coordinate independence from the point of view of a field theory. Newtonian gravity can be described in three equivalent ways: as a gravitational field \(\mathbf{g}\), as a gravitational potential \(\phi\), or as a set of gravitational field lines. The field lines are never incident on one another, and locally the field satisfies Poisson's equation.
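The equivalence of the field and potential descriptions can be checked numerically. The sketch below, in units where \(G\) and the mass are 1, verifies by finite differences that \(\mathbf{g}=-\nabla\phi\) for a point mass, and that \(\phi\) satisfies Poisson's equation with zero density (i.e., Laplace's equation) away from the source.

```python
# Numerical check that the field and potential descriptions of Newtonian
# gravity agree: g = -grad(phi), and the Laplacian of phi vanishes in empty
# space (Poisson's equation with zero mass density). Units with G = M = 1.

from math import sqrt

def phi(x, y, z):
    """Potential of a unit point mass at the origin."""
    return -1.0 / sqrt(x*x + y*y + z*z)

def grad(f, x, y, z, h=1e-5):
    """Central-difference gradient of a scalar function."""
    return ((f(x+h, y, z) - f(x-h, y, z)) / (2*h),
            (f(x, y+h, z) - f(x, y-h, z)) / (2*h),
            (f(x, y, z+h) - f(x, y, z-h)) / (2*h))

def laplacian(f, x, y, z, h=1e-3):
    """Central-difference Laplacian of a scalar function."""
    return (f(x+h,y,z) + f(x-h,y,z) + f(x,y+h,z) + f(x,y-h,z)
            + f(x,y,z+h) + f(x,y,z-h) - 6*f(x,y,z)) / h**2

# At (2, 0, 0) the field should be g = -grad(phi) = (-1/4, 0, 0), pointing
# back toward the mass, and the Laplacian should vanish.
gx, gy, gz = (-c for c in grad(phi, 2.0, 0.0, 0.0))
print(gx, gy, gz)                     # approximately (-0.25, 0, 0)
print(laplacian(phi, 2.0, 0.0, 0.0))  # approximately 0
```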

The electromagnetic field has polarization
properties different from those of the gravitational field,
so we describe it using either the two fields \((\mathbf{E},\mathbf{B})\), a pair of potentials,^{16}
or two sets of field
lines. There are similar incidence conditions and local field equations (Maxwell's equations).

Gravitational fields in relativity have polarization properties unknown to Newton, but the situation is qualitatively similar to the two foregoing cases. Now consider the analogy between electromagnetism and relativity. In electromagnetism, it is the fields that are directly observable, so we expect the potentials to have some extrinsic properties. We can, for example, redefine our electrical ground, \(\Phi \rightarrow \Phi+C\), without any observable consequences. As discussed in more detail in section 5.6.1 on page 173, it is even possible to modify the electromagnetic potentials in an entirely arbitrary and nonlinear way that changes from point to point in spacetime. This is called a gauge transformation. In relativity, the gauge transformations are the smooth coordinate transformations. These gauge transformations distort the field lines without making them cut through one another.
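The simplest of these gauge transformations, redefining the electrical ground, is easy to verify: shifting the potential by a constant leaves the observable field unchanged. A minimal one-dimensional sketch, with a made-up potential chosen only for illustration:

```python
# Check that the gauge transformation Phi -> Phi + C has no observable
# consequences: the field E = -dPhi/dx is unchanged. One dimension,
# arbitrary units; the example potential is invented.

def E_field(potential, x, h=1e-6):
    """E = -dPhi/dx by central differences."""
    return -(potential(x + h) - potential(x - h)) / (2 * h)

phi = lambda x: x**2 - 3*x            # some potential
phi_shifted = lambda x: phi(x) + 42   # same potential, different "ground"

x = 1.7
print(E_field(phi, x))          # -(2x - 3) = -0.4
print(E_field(phi_shifted, x))  # identical
```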

**1**.
Consider a spacetime that is locally exactly like the standard Lorentzian spacetime described in ch. 2, but that
has a global structure differing in the following way from the one we have implicitly assumed. This spacetime has global
property G: Let two material particles
have world-lines that coincide at event A, with some nonzero relative velocity; then there may be some event B in the future
light-cone of A at which the particles' world-lines coincide again. This sounds like a description of something that
we would expect to happen in curved spacetime, but let's see whether that is necessary. We want to know whether this violates
the flat-space properties L1-L5 on page 52, if those properties are taken as local.

(a) Demonstrate that it does not violate them, by using a model in which space “wraps around” like a cylinder.

(b) Now consider the possibility of interpreting L1-L5 as *global* statements.
Do spacetimes with property G always violate L3 if L3 is taken globally?
(solution in the pdf version of the book)

**2**.
Usually in relativity we pick units in which \(c=1\). Suppose, however, that we want to use SI units. The
convention is that coordinates are written with upper indices, so that, fixing the usual Cartesian
coordinates in 1+1 dimensions of spacetime, an infinitesimal displacement
between two events is notated \((ds^t,ds^x)\). In SI units,
the two components of this vector have different units, which may seem strange but is perfectly legal.
Describe the form of the metric, including the units of its elements. Describe the lower-index
vector \(ds_a\).
(solution in the pdf version of the book)

**3**.
(a) Explain why the following expressions ain't got good grammar:
\(U_{aa}\), \(x^a y^a\), \(p^a-q_a\). (Recall our notational convention that Latin indices represent
abstract indices, so that it would not make sense, for example, to interpret \(U_{aa}\) as \(U\)'s
\(a\)th diagonal element rather than as an implied sum.)

(b) Which of these could also be nonsense in terms of units?
(solution in the pdf version of the book)

**4**.
Suppose that a mountaineer describes her location using coordinates \((\theta,\phi,h)\),
representing colatitude, longitude, and altitude. Infer the units of the components of
\(ds^a\) and of the elements of \(g_{ab}\) and \(g^{ab}\). Given that the units of mechanical
work should be newton-meters
(cf 5, p. 48), infer
the components of a force vector \(F_a\)
and its upper-index version \(F^a\).
(solution in the pdf version of the book)

**5**.
Generalize figure h/2
on p. 48 to three dimensions.
(solution in the pdf version of the book)

**6**.
Suppose you have a collection of pencils, some of which have been sharpened more times than others so that
they're shorter. You toss them all on the floor in random orientations, and you're then allowed to slide
them around but not to rotate them. Someone asks you to make up a definition of whether or not a given
set of three pencils “cancels.” If all pencils are treated equally (i.e., order doesn't matter),
and if we respect the rotational invariance of Euclidean geometry, then you will be forced to reinvent
vector addition and define cancellation of pencils \(\mathbf{p}\), \(\mathbf{q}\), and \(\mathbf{r}\) as \(\mathbf{p}+\mathbf{q}+\mathbf{r}=0\).
Do something similar with “pencil” replaced by “an oriented pair of lines,” as in figure h/2 on p. 48.

**7**.
Describe the quantity \(g^a_a\). (Note the repeated index.)
(solution in the pdf version of the book)

**8**.
Example 16 on page 112 discusses the discontinuity that
would result if one attempted to define a time coordinate for the GPS system that was synchronized globally
according to observers in the rotating frame, in the sense that neighboring observers could verify the
synchronization by exchanging electromagnetic signals. Calculate this discontinuity at the equator, and
estimate the resulting error in position that would be experienced by GPS users.
(solution in the pdf version of the book)

**9**.
Resolve the following paradox.

Equation [] on page claims to give the metric obtained by an observer on the surface of a rotating disk. This metric is shown to lead to a non-Euclidean value for the ratio of the circumference of a circle to its radius, so the metric is clearly non-Euclidean. Therefore a local observer should be able to detect violations of the Pythagorean theorem.

And yet this metric was originally derived by a series of changes of coordinates, starting from the Euclidean metric in polar coordinates, as derived in example 7 on page 102. Section 3.4 (p. 95) argued that the intrinsic measurements available in relativity are not capable of detecting an arbitrary smooth, one-to-one change of coordinates. This contradicts our earlier conclusion that there are locally detectable violations of the Pythagorean theorem. (solution in the pdf version of the book)

**10**.
This problem deals with properties of the metric [] on page .
(a) A pulse of collimated light is emitted from the
center of the disk in a certain direction. Does the spatial track of the pulse form a geodesic of this metric?
(b) Characterize the behavior of the geodesics near \(r=1/\omega\).
(c) An observer at rest with respect to the
surface of the disk proposes to verify the non-Euclidean nature of the metric by doing local tests
in which right triangles are formed out of laser beams, and violations of the Pythagorean theorem
are detected. Will this work?
(solution in the pdf version of the book)

**11**.
In the early decades of relativity, many physicists were in the habit of speaking as if the Lorentz
transformation described what an observer would actually “see” optically, e.g., with an eye or a camera.
This is not the case, because there is an additional effect due to optical aberration: observers in different
states of motion disagree about the direction from which a light ray originated. This is analogous to the
situation in which a person driving in a convertible observes raindrops falling from the sky at an angle,
even if an observer on the sidewalk sees them as falling vertically. In 1959, Terrell and Penrose independently
provided correct analyses,^{17}
showing that in reality an object may appear contracted, expanded, or rotated, depending on whether it is approaching the
observer, passing by, or receding.
The case of a sphere is especially interesting. Consider the following four cases:

- **A** The sphere is not rotating. The sphere's center is at rest. The observer is moving in a straight line.
- **B** The sphere is not rotating, but its center is moving in a straight line. The observer is at rest.
- **C** The sphere is at rest and not rotating. The observer moves around it in a circle whose center coincides with that of the sphere.
- **D** The sphere is rotating, with its center at rest. The observer is at rest.

Penrose showed that in case A, the outline of the sphere is still seen to be a circle, although regions on the sphere's surface appear distorted.

What can we say about the generalization to cases B, C, and D? (solution in the pdf version of the book)

**12**.
This problem involves a relativistic particle of mass \(m\) which is also a wave, as described by quantum mechanics.
Let \(c=1\) and \(\hbar=1\) throughout. Starting from the de Broglie
relations \(E=\omega\) and \(p=k\), where \(k\) is the wavenumber, find the dispersion
relation connecting \(\omega\) to \(k\). Calculate the group velocity, and verify that
it is consistent with the usual relations \(p=m\gamma v\) and \(E=m\gamma\) for \(m>0\).
What goes wrong if you instead try to associate \(v\) with the phase velocity?
(solution in the pdf version of the book)

(c) 1998-2013 Benjamin Crowell, licensed under the Creative Commons Attribution-ShareAlike license. Photo credits are given at the end of the Adobe Acrobat version.

[1] More on this topic is available in, for example, Keisler's *Elementary Calculus: An Infinitesimal Approach*,
Stroyan's *A Brief Introduction to Infinitesimal Calculus*, or my own *Calculus*, all of which are
available for free online.

[3] The term “elliptic” may be somewhat misleading here. The model is still constructed from
a sphere, not an ellipsoid.

[4] Einstein referred to incidence measurements as “determinations of space-time
coincidences.” For his presentation of this idea, see p. 376.

[5] Proof: Let \(\mathbf{b}\) and \(\mathbf{c}\) be parallel and timelike, and directed
forward in time. Adopt a frame of reference in which
every spatial component of each vector vanishes.
This entails no loss of generality, since inner products are
invariant under such a transformation.
Since the time-ordering is also preserved under transformations in the Poincaré group, each is still directed forward in time,
not backward.
Now let \(\mathbf{b}\) and \(\mathbf{c}\) be pulled away
from parallelism, like opening a pair of scissors in the \(x\)-\(t\) plane. This increases \(b_tc_t\), while causing
\(b_xc_x\) to become negative. Both effects increase the inner product.
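As a quick numerical sanity check on this argument, one can work in 1+1 dimensions with signature \((+,-)\) and parametrize forward-directed unit timelike vectors by rapidity: the parallel pair has inner product 1, and opening the scissors only makes the inner product larger, never smaller or negative.

```python
# Numerical illustration (signature (+,-), 1+1 dimensions) of the footnote's
# claim: forward-directed timelike vectors have a positive inner product,
# which only grows as the vectors are pulled away from parallelism.

from math import cosh, sinh

def inner(b, c):
    """Minkowski inner product with signature (+, -)."""
    return b[0]*c[0] - b[1]*c[1]

def unit_timelike(eta):
    """Unit forward-directed timelike vector with rapidity eta."""
    return (cosh(eta), sinh(eta))

# Start parallel, then "open the scissors" symmetrically.
b0, c0 = unit_timelike(0.0), unit_timelike(0.0)
b1, c1 = unit_timelike(+1.0), unit_timelike(-1.0)

print(inner(b0, c0))  # 1.0 for the parallel case
print(inner(b1, c1))  # cosh(2), larger and still positive
```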

[6] In mathematics, a group is defined
as a binary operation that has an identity, inverses, and associativity. For example, addition of
integers is a group. In the present context, the members of the group are not numbers but the
transformations applied to the Euclidean plane. The group operation on transformations \(T_1\) and
\(T_2\) consists of finding the transformation that results from doing one and then the other, i.e.,
composition of functions.

[7] The discontinuous
transformations of spatial reflection and time reversal are not included in the definition of the
Poincaré group, although they do preserve inner products. General relativity has symmetry
under spatial reflection (called P for parity), time reversal (T), and charge inversion (C), but the standard
model of particle physics is only invariant under the composition of all three, CPT, not under any
of these symmetries individually.

[8] The example is described in Einstein's paper “The Foundation of the General Theory of Relativity.”
An excerpt, which includes the example, is given on p. 372.

[10] Space, Time, and Coordinates in a Rotating World, http://www.phys.uu.nl/igg/dieks

[11] P. Ehrenfest, Gleichförmige Rotation starrer Körper und Relativitätstheorie, Phys. Z. 10 (1909) 918,
available in English translation at en.wikisource.org.

[13] The example is described in Einstein's paper “The Foundation of the General Theory of Relativity.”
An excerpt, which includes the example, is given on p. 372.

[15] Giulini, “Some remarks on the notions of general covariance and background independence,” arxiv.org/abs/gr-qc/0603087v1

[16] There is the familiar
electrical potential \(\phi\), measured in volts, but also a vector potential \(\mathbf{A}\), which you may or may not have encountered.
Briefly, the electric field is given not by \(-\nabla\phi\) but by \(-\nabla\phi-\partial\mathbf{A}/\partial t\),
while the magnetic field is the curl of \(\mathbf{A}\). This is introduced at greater
length in section 4.2.5 on page 137.