“Gauge theory” is a term which has connotations of being a fearsomely complicated part of mathematics – for instance, playing an important role in quantum field theory, general relativity, geometric PDE, and so forth.  But the underlying concept is really quite simple: a gauge is nothing more than a “coordinate system” that varies depending on one’s “location” with respect to some “base space” or “parameter space”, a gauge transform is a change of coordinates applied to each such location, and a gauge theory is a model for some physical or mathematical system to which gauge transforms can be applied (and is typically gauge invariant, in that all physically meaningful quantities are left unchanged (or transform naturally) under gauge transformations).  By fixing a gauge (thus breaking or spending the gauge symmetry), the model becomes something easier to analyse mathematically, such as a system of partial differential equations (in classical gauge theories) or a perturbative quantum field theory (in quantum gauge theories), though the tractability of the resulting problem can be heavily dependent on the choice of gauge that one fixed.  Deciding exactly how to fix a gauge (or whether one should spend the gauge symmetry at all) is a key question in the analysis of gauge theories, and one that often requires the input of geometric ideas and intuition into that analysis.

I was asked recently to explain what a gauge theory was, and so I will try to do so in this post.  For simplicity, I will focus exclusively on classical gauge theories; quantum gauge theories are the quantization of classical gauge theories and have their own set of conceptual difficulties (coming from quantum field theory) that I will not discuss here. While gauge theories originated from physics, I will not discuss the physical significance of these theories much here, instead focusing just on their mathematical aspects.  My discussion will be informal, as I want to try to convey the geometric intuition rather than the rigorous formalism (which can, of course, be found in any graduate text on differential geometry).

— Coordinate systems —

Before I discuss gauges, I first review the more familiar concept of a coordinate system, which is basically the special case of a gauge when the base space (or parameter space) is trivial.

Classical mathematics, such as practised by the ancient Greeks, could be loosely divided into two disciplines, geometry and number theory, where I use the latter term very broadly, to encompass all sorts of mathematics dealing with any sort of number.  The two disciplines are unified by the concept of a coordinate system, which allows one to convert geometric objects to numeric ones or vice versa.  The most well known example of a coordinate system is the Cartesian coordinate system for the plane (or more generally for a Euclidean space), but this is just one example of many such systems.  For instance:

  1. One can convert a length (of, say, an interval) into an (unsigned) real number, or vice versa, once one fixes a unit of length (e.g. the metre or the foot).  In this case, the coordinate system is specified by the choice of length unit.
  2. One can convert a displacement along a line into a (signed) real number, or vice versa, once one fixes a unit of length and an orientation along that line.  In this case, the coordinate system is specified by the length unit together with the choice of orientation.  Alternatively, one can replace the unit of length and the orientation by a unit displacement vector e along the line.
  3. One can convert a position (i.e. a point) on a line into a real number, or vice versa, once one fixes a unit of length, an orientation along the line, and an origin on that line.  Equivalently, one can pick an origin O and a unit displacement vector e.  This coordinate system essentially identifies the original line with the standard real line {\Bbb R}.
  4. One can generalise these systems to higher dimensions.  For instance, one can convert a displacement along a plane into a vector in {\Bbb R}^2, or vice versa, once one fixes two linearly independent displacement vectors e_1, e_2 (i.e. a basis) to span that plane; the Cartesian coordinate system is just one special case of this general scheme.  Similarly, one can convert a position on a plane to a vector in {\Bbb R}^2 once one picks a basis e_1, e_2 for that plane as well as an origin O, thus identifying that plane with the standard Euclidean plane {\Bbb R}^2.  (To put it another way, units of measurement are nothing more than one-dimensional (i.e. scalar) coordinate systems.)
  5. To convert an angle in a plane to a signed number (modulo multiples of 2\pi), or vice versa, one needs to pick an orientation on the plane (e.g. to decide that anti-clockwise angles are positive).
  6. To convert a direction in a plane to a signed number (again modulo multiples of 2\pi), or vice versa, one needs to pick an orientation on the plane, as well as a reference direction (e.g. true or magnetic north is often used in the case of ocean navigation).
  7. Similarly, to convert a position on a circle to a number (modulo multiples of 2\pi), or vice versa, one needs to pick an orientation on that circle, together with an origin on that circle.  Such a coordinate system then equates the original circle to the standard unit circle S^1 := \{ z \in {\Bbb C}: |z| = 1 \} (with the standard origin +1 and the standard anticlockwise orientation \circlearrowleft).
  8. To convert a position on a two-dimensional sphere (e.g. the surface of the Earth, as a first approximation) to a point on the standard unit sphere S^2 := \{ (x,y,z) \in {\Bbb R}^3: x^2+y^2+z^2 = 1 \}, one can pick an orientation on that sphere, an “origin” (or “north pole”) for that sphere, and a “prime meridian” connecting the north pole to its antipode.  Alternatively, one can view this coordinate system as determining a pair of Euler angles \phi, \lambda (or a latitude and longitude) to be assigned to every point on one’s original sphere.
  9. The above examples were all geometric in nature, but one can also consider “combinatorial” coordinate systems, which allow one to identify combinatorial objects with numerical ones.  An extremely familiar example of this is enumeration: one can identify a set A of (say) five elements with the numbers 1,2,3,4,5 simply by choosing an enumeration a_1, a_2, \ldots, a_5 of the set A.  One can similarly enumerate other combinatorial objects (e.g. graphs, relations, trees, partial orders, etc.), and indeed this is done all the time in combinatorics.  Similarly for algebraic objects, such as cosets of a subgroup H (or more generally, torsors of a group G); one can identify such a coset with H itself by designating an element of that coset to be the “identity” or “origin”.

More generally, a coordinate system \Phi can be viewed as an isomorphism \Phi: A \to G between a given geometric (or combinatorial) object A in some class (e.g. a circle), and a standard object G in that class (e.g. the standard unit circle).  (To be pedantic, this is what a global coordinate system is; a local coordinate system, such as the coordinate charts on a manifold, is an isomorphism between a local piece of a geometric or combinatorial object in a class, and a local piece of a standard object in that class.  I will restrict attention to global coordinate systems for this discussion.)

Coordinate systems identify geometric or combinatorial objects with numerical (or standard) ones, but in many cases, there is no natural (or canonical) choice of this identification; instead, one may be faced with a variety of coordinate systems, all equally valid.  One can of course just fix one such system once and for all, in which case there is no real harm in thinking of the geometric and numeric objects as being equivalent.  If however one plans to change from one system to the next (or to avoid using such systems altogether), then it becomes important to carefully distinguish these two types of objects, to avoid confusion.  For instance, if an interval AB is measured to have a length of 3 yards, then it is OK to write |AB|=3 (identifying the geometric concept of length with the numeric concept of a positive real number) so long as one plans to stick to having the yard as the unit of length for the rest of one’s analysis.  But if one was also planning to use, say, feet as a unit of length, then to avoid confusing statements such as “|AB|=3 and |AB|=9”, one should specify the coordinate systems explicitly, e.g. “|AB| = 3 \hbox{ yards} and |AB| = 9 \hbox{ feet}”.  Similarly, identifying a point P in a plane with its coordinates (e.g. P = (4,3)) is safe as long as one intends to only use a single coordinate system throughout; but if one intends to change coordinates at some point (or to switch to a coordinate-free perspective) then one should be more careful, e.g. writing P = 4 e_1 + 3 e_2, or even P = O + 4 e_1 + 3 e_2, if the origin O and basis vectors e_1, e_2 of one’s coordinate systems might be subject to future change.

As mentioned above, it is possible in many cases to dispense with coordinates altogether.  For instance, one can view the length |AB| of a line segment AB not as a number (which requires one to select a unit of length), but more abstractly as the equivalence class of all line segments CD that are congruent to AB.  With this perspective, |AB| no longer lies in the standard semigroup {\Bbb R}^+, but in a more abstract semigroup {\mathcal L} (the space of line segments quotiented by congruence), with addition now defined geometrically (by concatenation of intervals) rather than numerically.  A unit of length can now be viewed as just one of many different isomorphisms \Phi: {\mathcal L} \to {\Bbb R}^+ between {\mathcal L} and {\Bbb R}^+, but one can abandon the use of such units and just work with {\mathcal L} directly.  Many statements in Euclidean geometry involving length can be phrased in this manner.  For instance, if B lies in AC, then the statement |AC|=|AB|+|BC| can be stated in {\mathcal L}, and does not require any units to convert {\mathcal L} to {\Bbb R}^+; with a bit more work, one can also make sense of such statements as |AC|^2 = |AB|^2 + |BC|^2 for a right-angled triangle ABC (i.e. Pythagoras’ theorem) while avoiding units, by defining a symmetric bilinear product operation \times: {\mathcal L} \times {\mathcal L} \to {\mathcal A} from the abstract semigroup {\mathcal L} of lengths to the abstract semigroup {\mathcal A} of areas.  (Indeed, this is basically how the ancient Greeks, who did not quite possess the modern real number system {\Bbb R}, viewed geometry, though of course without the assistance of such modern terminology as “semigroup” or “bilinear”.)

The above abstract coordinate-free perspective is equivalent to a more concrete coordinate-invariant perspective, in which we do allow the use of coordinates to convert all geometric quantities to numeric ones, but insist that every statement that we write down is invariant under changes of coordinates.  For instance, if we shrink our chosen unit of length by a factor \lambda > 0, then the numerical length of every interval increases by a factor of \lambda, e.g. |AB| \mapsto \lambda |AB|.  The coordinate-invariant approach to length measurement then treats lengths such as |AB| as numbers, but requires all statements involving such lengths to be invariant under the above scaling symmetry.  For instance, a statement such as |AC|^2 = |AB|^2 + |BC|^2 is legitimate under this perspective, but a statement such as |AB| = |BC|^2 or |AB| = 3 is not.  [In other words, coordinate invariance here is the same thing as being dimensionally consistent.  Indeed, dimensional analysis is nothing more than the analysis of the scaling symmetries in one’s coordinate systems.]  One can retain this coordinate-invariance symmetry throughout one’s arguments; or one can, at some point, choose to spend (or break) this coordinate invariance by selecting (or fixing) the coordinate system (which, in this case, means selecting a unit length).  The advantage in spending such a symmetry is that one can often normalise one or more quantities to equal a particularly nice value; for instance, if a length |AB| is appearing everywhere in one’s arguments, and one has carefully retained coordinate-invariance up until some key point, then it can be convenient to spend this invariance to normalise |AB| to equal 1.  (In this case, one only has a one-dimensional family of symmetries, and so can only normalise one quantity at a time; but when one’s symmetry group is larger, one can often normalise many more quantities at once; as a rule of thumb, one can normalise one quantity for each degree of freedom in the symmetry group.)  Conversely, if one has already spent the coordinate invariance, one can often buy it back by converting all the facts, hypotheses, and desired conclusions one currently possesses in the situation back to a coordinate-invariant formulation.  Thus one could imagine performing one normalisation to do one set of calculations, then undoing that normalisation to return to a coordinate-free perspective, doing some coordinate-free manipulations, and then performing a different normalisation to work on another part of the problem, and so forth.  (For instance, in Euclidean geometry problems, it is often convenient to temporarily assign one key point to be the origin (thus spending translation invariance symmetry), then another, then switch back to a translation-invariant perspective, and so forth.  As long as one is correctly accounting for what symmetries are being spent and bought at any given time, this can be a very powerful way of simplifying one’s calculations.)
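
The scaling test for coordinate invariance can be tried out numerically.  What follows is only a rough Python sketch: the helper name is_scale_invariant, the sample lengths, and the handful of test factors \lambda are all illustrative assumptions, and checking a few factors is of course evidence of invariance rather than a proof.

```python
import math

def is_scale_invariant(statement, lengths, factors=(0.5, 2.0, 3.0)):
    """Check whether `statement` keeps its truth value under rescaling.

    Shrinking the unit of length by a factor lam multiplies every
    numerical length by lam; we only test the sample factors given.
    """
    baseline = statement(lengths)
    return all(
        statement([lam * length for length in lengths]) == baseline
        for lam in factors
    )

def pythagoras(lengths):
    # |AC|^2 = |AB|^2 + |BC|^2: both sides scale like lam^2.
    ab, bc, ac = lengths
    return math.isclose(ac ** 2, ab ** 2 + bc ** 2)

def length_equals_square(lengths):
    # |AB| = |BC|^2: a length compared with an area.
    ab, bc = lengths
    return math.isclose(ab, bc ** 2)

def length_equals_three(lengths):
    # |AB| = 3: only meaningful once a unit has been fixed.
    (ab,) = lengths
    return math.isclose(ab, 3.0)

print(is_scale_invariant(pythagoras, [3.0, 4.0, 5.0]))        # True
print(is_scale_invariant(length_equals_square, [4.0, 2.0]))   # False
print(is_scale_invariant(length_equals_three, [3.0]))         # False
```

Each sample statement is true in the initial unit, so a change of truth value after rescaling signals that the statement depended on the choice of unit.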

Given a coordinate system \Phi: A \to G that identifies some geometric object A with a standard object G, and some isomorphism \Psi: G \to G of that standard object, we can obtain a new coordinate system \Psi \circ \Phi: A \to G of A by composing the two isomorphisms.  [I will be vague on what “isomorphism” means; one can formalise the concept using the language of category theory.] Conversely, every other coordinate system \Phi': A \to G of A arises in this manner.  Thus, the space of coordinate systems on A is (non-canonically) identifiable with the isomorphism group \hbox{Isom}(G) of G.  This isomorphism group is called the structure group (or gauge group) of the class of geometric objects.  For example, the structure group for lengths is {\Bbb R}^+; the structure group for angles is {\Bbb Z}/2{\Bbb Z}; the structure group for lines is the affine group \hbox{Aff}({\Bbb R}); the structure group for n-dimensional Euclidean geometry is the Euclidean group E(n); the structure group for (oriented) 2-spheres is the (special) orthogonal group SO(3); and so forth.  (Indeed, one can basically describe each of the classical geometries (Euclidean, affine, projective, spherical, hyperbolic, Minkowski, etc.) as a homogeneous space for its structure group, as per the Erlangen program.)
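
To make the relation \Phi' = \Psi \circ \Phi concrete in the simplest case of a line, here is a small Python sketch.  The class LineCoords, the function transition, and the particular origins and units are hypothetical names and values chosen for illustration; the "geometric" points are stored as floats in some hidden reference parametrisation purely so that the code has something to compute with.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LineCoords:
    """A coordinate system on a line: an origin O and a unit displacement e."""
    origin: float
    unit: float  # non-zero; its sign encodes the orientation

    def to_number(self, point):
        # Phi: A -> R
        return (point - self.origin) / self.unit

    def to_point(self, number):
        # Phi^{-1}: R -> A
        return self.origin + number * self.unit

def transition(phi, phi_new):
    """Return (a, b) such that phi_new(P) = a * phi(P) + b for all points P.

    The map t -> a*t + b is the element Psi of Aff(R) with phi_new = Psi o phi.
    """
    b = phi_new.to_number(phi.to_point(0.0))
    a = phi_new.to_number(phi.to_point(1.0)) - b
    return a, b

metres = LineCoords(origin=0.0, unit=1.0)
feet_from_door = LineCoords(origin=2.0, unit=0.3048)  # origin 2 m along, unit of one foot

a, b = transition(metres, feet_from_door)

point = 5.0
direct = feet_from_door.to_number(point)
via_psi = a * metres.to_number(point) + b
```

Here direct and via_psi agree up to rounding, which is the statement that the two coordinate systems differ by a single affine map of the standard line.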

— Gauges —

In our discussion of coordinate systems, we focused on a single geometric (or combinatorial) object A: a single line, a single circle, a single set, etc.  We then used a single coordinate system to identify that object with a standard representative of such an object.

Now let us consider the more general situation in which one has a family (or fibre bundle) (A_x)_{x \in X} of geometric (or combinatorial) objects (or fibres) A_x: a family of lines (i.e. a line bundle), a family of circles (i.e. a circle bundle), a family of sets, etc.  This family is parameterised by some parameter or base point x, which ranges in some parameter space or base space X.  In many cases one also requires some topological or differentiable compatibility between the various fibres; for instance, continuous (or smooth) variations of the base point should lead to continuous (or smooth) variations in the fibre.  For the sake of discussion, however, let us gloss over these compatibility conditions.

In many cases, each individual fibre A_x in a bundle (A_x)_{x \in X}, being a geometric object of a certain class, can be identified with a standard object G in that class, by means of a separate coordinate system \Phi_x: A_x \to G for each base point x.  The entire collection \Phi = (\Phi_x)_{x \in X} is then referred to as a (global) gauge or trivialisation for this bundle (provided that it is compatible with whatever topological or differentiable structures one has placed on the bundle, but never mind that for now).  Equivalently, a gauge is a bundle isomorphism \Phi from the original bundle (A_x)_{x \in X} to the trivial bundle (G)_{x \in X}, in which every fibre is the standard geometric object G.  (There are also local gauges, which only trivialise a portion of the bundle, but let’s ignore this distinction for now.)

Let’s give three concrete examples of bundles and gauges; one from differential geometry, one from dynamical systems, and one from combinatorics.

Example 1: the circle bundle of the sphere. Recall from the previous section that the space of directions in a plane (which can be viewed as the circle of unit vectors) can be identified with the standard circle S^1 after picking an orientation and a reference direction.  Now let us work not on the plane, but on a sphere, and specifically, on the surface X of the earth.  At each point x on this surface, there is a circle S_x of directions that one can travel along the sphere from x; the collection SX := (S_x)_{x \in X} of all such circles is then a circle bundle with base space X (known as the circle bundle; it could also be viewed as the sphere bundle, cosphere bundle, or orthonormal frame bundle of X). The structure group of this bundle is the circle group U(1) \equiv S^1 if one preserves orientation, or the semi-direct product S^1 \rtimes {\Bbb Z}/2{\Bbb Z} otherwise.

Now suppose, at every point x on the earth X, the wind is blowing in some direction w_x \in S_x.  (This is not actually possible globally, thanks to the hairy ball theorem, but let’s ignore this technicality for now.)  Thus wind direction can be thought of as a collection w = (w_x)_{x \in X} of representatives from the fibres of the fibre bundle (S_x)_{x \in X}; such a collection is known as a section of the fibre bundle (it is to bundles as the concept of a graph \{ (x, f(x)): x \in X \} \subset X \times G of a function f: X \to G is to the trivial bundle (G)_{x \in X}).

At present, this section has not been represented in terms of numbers; instead, the wind direction w = (w_x)_{x \in X} is a collection of points on various different circles in the circle bundle SX.  But one can convert this section w into a collection of numbers (and more specifically, a function u: X \to S^1 from X to S^1) by choosing a gauge for this circle bundle – in other words, by selecting an orientation \epsilon_x and a reference direction N_x for each point x on the surface of the Earth X.  For instance, one can pick the anticlockwise orientation \circlearrowleft and true north for every point x (ignore for now the problem that this is not defined at the north and south poles, and so is merely a local gauge rather than a global one), and then each wind direction w_x can now be identified with a unit complex number u(x) \in S^1 (e.g. e^{i\pi/4} if the wind is blowing in the northwest direction at x).  Now that one has a numerical function u to play with, rather than a geometric object w, one can use analytical tools (e.g. differentiation, integration, Fourier transforms, etc.) to analyse the wind direction if one desires.  But one should be aware that this function reflects the choice of gauge as well as the original object of study.  If one changes the gauge (e.g. by using magnetic north instead of true north), then the function u changes, even though the wind direction w is still the same.  If one does not want to spend the U(1) gauge symmetry, one would have to take care that all operations one performs on these functions are gauge-invariant; unfortunately, this restrictive requirement eliminates wide swathes of analytic tools (in particular, integration and the Fourier transform) and so one is often forced to break the gauge symmetry in order to use analysis.  The challenge is then to select the gauge that maximises the effectiveness of analytic methods.  \diamond
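
As a toy illustration of how the number u(x) depends on the gauge while the wind itself does not, consider the following Python sketch at a single point x.  The function name wind_in_gauge and the 10 degree declination between true and magnetic north are assumptions made up for the example, not data.

```python
import cmath
import math

def wind_in_gauge(wind_angle, reference_angle):
    """Unit complex number u(x) describing the wind relative to a reference direction.

    Both angles are measured anticlockwise, in radians, from true north at x;
    only their difference enters the answer.
    """
    return cmath.exp(1j * (wind_angle - reference_angle))

true_north = 0.0
declination = math.radians(10.0)   # hypothetical: magnetic north lies 10 degrees anticlockwise of true north
magnetic_north = true_north + declination

wind = math.pi / 4                 # wind blowing north-west: 45 degrees anticlockwise of true north

u_true = wind_in_gauge(wind, true_north)           # the number e^{i pi/4}
u_magnetic = wind_in_gauge(wind, magnetic_north)   # same wind, different number
```

The two numbers differ by the phase e^{-i \theta}, where \theta is the angle by which the reference direction was rotated; the geometric direction wind is untouched.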

Example 2: circle extensions of a dynamical system. Recall (see e.g. my lecture notes) that a dynamical system is a pair X = (X,T), where X is a space and T: X \to X is an invertible map.  (One can also place additional topological or measure-theoretic structures on this system, as is done in those notes, but we will ignore these structures for this discussion.)  Given such a system, and given a cocycle \rho: X \to S^1 (which, in this context, is simply a function from X to the unit circle), we can define the skew product X \times_\rho S^1 of X and the unit circle S^1, twisted by the cocycle \rho, to be the Cartesian product X \times S^1 := \{ (x,u): x \in X, u \in S^1 \} with the shift \tilde T: (x,u) \mapsto (Tx, \rho(x) u); this is easily seen to be another dynamical system.  (If one wishes to have a topological or measure-theoretic dynamical system, then \rho will have to be continuous or measurable here, but let us ignore such issues for this discussion.)  Observe that there is a free action (S_v: (x,u) \mapsto (x,vu))_{v \in S^1} of the circle group S^1 on the skew product X \times_\rho S^1 that commutes with the shift \tilde T; the quotient space (X \times_\rho S^1)/S^1 of this action is isomorphic to X, thus leading to a factor map \pi: X \times_\rho S^1 \to X, which is of course just the projection map \pi: (x,u) \mapsto x.  (An example is provided by the skew shift system, described in my lecture notes.)

Conversely, suppose that one had a dynamical system \tilde X = (\tilde X, \tilde T) which had a free S^1 action (S_v: \tilde X \to \tilde X)_{v \in S^1} commuting with the shift \tilde T.  If we set X := \tilde X/S^1 to be the quotient space, we thus have a factor map \pi: \tilde X \to X, whose level sets \pi^{-1}(\{x\}) are all isomorphic to the circle S^1; we call \tilde X a circle extension of the dynamical system X.  We can thus view \tilde X as a circle bundle (\pi^{-1}(\{x\}))_{x \in X} with base space X, thus the level sets \pi^{-1}(\{x\}) are now the fibres of the bundle, and the structure group is S^1.  If one picks a gauge for this bundle, by choosing a reference point p_x \in \pi^{-1}(\{x\}) in the fibre for each base point x (thus in this context a gauge is the same thing as a section p = (p_x)_{x \in X}; this is basically because this bundle is a principal bundle), then one can identify \tilde X with a skew product X \times_\rho S^1 by identifying the point S_v p_x \in \tilde X with the point (x,v) \in X \times_\rho S^1 for all x \in X, v \in S^1, and letting \rho be the cocycle defined by the formula

S_{\rho(x)} p_{Tx} = \tilde T p_x.

One can check that this is indeed an isomorphism of dynamical systems; if all the various objects here are continuous (resp. measurable), then one also has an isomorphism of topological dynamical systems (resp. measure-preserving systems).  Thus we see that gauges allow us to write circle extensions as skew products.  However, more than one gauge is available for any given circle extension; two gauges (p_x)_{x \in X}, (p'_x)_{x \in X} will give rise to two skew products X \times_\rho S^1, X \times_{\rho'} S^1 which are isomorphic but not identical.  Indeed, if we let v: X \to S^1 be a rotation map that sends p_x to p'_{x}, thus p'_{x} = S_{v(x)} p_x, then we see that the two cocycles \rho' and \rho are related by the formula

\rho'(x) = v(Tx)^{-1} \rho(x) v(x).  (1)

Two cocycles that obey the above relation are called cohomologous; their skew products are isomorphic to each other.  An important general question in dynamical systems is to understand when two given cocycles are in fact cohomologous, for instance by introducing non-trivial cohomological invariants for such cocycles.
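
One can test relation (1) on a finite toy model.  In the Python sketch below the base space is taken to be the cyclic group {\Bbb Z}/n{\Bbb Z} with the shift x \mapsto x+1, and the cocycle \rho and rotation map v are random phases; these choices, and all function names, are assumptions for illustration only.  The code checks that (x,u) \mapsto (x, v(x)^{-1} u) intertwines the two skew shifts, and that the product of the cocycle around the single periodic orbit (one simple invariant in this toy setting) is the same for \rho and \rho'.

```python
import cmath
import random

n = 7                      # base space X = Z/nZ (illustrative choice)
rng = random.Random(0)

def T(x):
    """The shift on the base space."""
    return (x + 1) % n

def random_phase():
    return cmath.exp(2j * cmath.pi * rng.random())

rho = [random_phase() for _ in range(n)]   # a cocycle rho: X -> S^1
v = [random_phase() for _ in range(n)]     # a rotation map v: X -> S^1

# Relation (1): rho'(x) = v(Tx)^{-1} rho(x) v(x)
rho_new = [rho[x] * v[x] / v[T(x)] for x in range(n)]

def skew_shift(cocycle, x, u):
    """The shift (x, u) -> (Tx, cocycle(x) u) on the skew product."""
    return T(x), cocycle[x] * u

def change_gauge(x, u):
    """The candidate isomorphism (x, u) -> (x, v(x)^{-1} u)."""
    return x, u / v[x]

def conjugacy_defect():
    """Largest discrepancy between 'shift then change gauge' and the reverse."""
    worst = 0.0
    for x in range(n):
        u = random_phase()
        x1, u1 = change_gauge(*skew_shift(rho, x, u))
        x2, u2 = skew_shift(rho_new, *change_gauge(x, u))
        assert x1 == x2
        worst = max(worst, abs(u1 - u2))
    return worst

def product(values):
    result = 1
    for value in values:
        result *= value
    return result

defect = conjugacy_defect()
orbit_product_gap = abs(product(rho) - product(rho_new))
```

Both defect and orbit_product_gap come out at the level of rounding error, as the algebra predicts.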

As an example of a circle extension, consider the sphere X = S^2 from Example 1, with a rotation shift T given by, say, rotating anti-clockwise by some given angle \alpha around the axis connecting the north and south poles.  This rotation also induces a rotation on the circle bundle \tilde X := SX, thus giving a circle extension of the original system (X,T).  One can then use a gauge to write this system as a skew product.  For instance, if one selects the gauge that chooses p_x to be the true north direction at each point x (ignoring for now the fact that this is not defined at the two poles), then this system becomes the ordinary product X \times_0 S^1 of the original system X with the circle S^1, with the cocycle being the trivial cocycle 0.  If we were however to use a different gauge, e.g. magnetic north instead of true north, one would obtain a different skew-product X \times_{\rho'} S^1, where \rho' is some cocycle which is cohomologous to the trivial cocycle (except at the poles).  (A cocycle which is globally cohomologous to the trivial cocycle is known as a coboundary.  Not every cocycle is a coboundary, especially once one imposes topological or measure-theoretic structure, thanks to the presence of various topological or measure-theoretic invariants, such as degree.)

There was nothing terribly special about circles in this example; one can also define group extensions, or more generally homogeneous space extensions, of dynamical systems, and have a similar theory, although one has to take a little care with the order of operations when the structure group is non-abelian; see e.g. my lecture notes on isometric extensions. \diamond

Example 3: Orienting an undirected graph. The language of gauge theory is not often used in combinatorics, but nevertheless combinatorics does provide some simple discrete examples of bundles and gauges which can be useful in getting an intuitive grasp of the concept.  Consider for instance an undirected graph G = (V,E) of vertices and edges.  I will let X=E denote the space of edges (not the space of vertices!).  Every edge e \in X can be oriented (or directed) in two different ways; let A_e be the pair of directed edges of e arising in this manner.  Then (A_e)_{e \in X} is a fibre bundle with base space X and with each fibre isomorphic (in the category of sets) to the standard two-element set \{-1,+1\}, with structure group {\Bbb Z}/2{\Bbb Z}.

A priori, there is no reason to prefer one orientation of an edge e over another, and so there is no canonical way to identify each fibre A_e with the standard set \{-1,+1\}.  Nevertheless, we can go ahead and arbitrarily select a gauge for X by orienting the graph G.  This orientation assigns an oriented edge \vec e \in A_e to each edge e \in X, thus creating a gauge (or section) (\vec e)_{e \in X} of the bundle (A_e)_{e \in X}.  Once one selects such a gauge, we can now identify the fibre bundle (A_e)_{e \in X} with the trivial bundle X \times \{-1,+1\} by identifying the preferred oriented edge \vec e of each unoriented edge e \in X with (e,+1), and the other oriented edge with (e,-1).  In particular, any other orientation of the graph G can be expressed relative to this reference orientation as a function f: X \to \{-1,+1\}, which measures when the two orientations agree or disagree with each other. \diamond
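
This combinatorial example is small enough to write out in full.  The Python sketch below uses a made-up four-edge graph and hypothetical helper names (fibre, to_sign); the reference orientation is chosen alphabetically, which is exactly the kind of arbitrary choice that a gauge is.

```python
# Undirected edges of a small illustrative graph: a triangle with a pendant edge.
edges = [frozenset({"a", "b"}), frozenset({"b", "c"}),
         frozenset({"a", "c"}), frozenset({"c", "d"})]

def fibre(edge):
    """The two-element fibre A_e: both orientations of the edge e."""
    p, q = sorted(edge)
    return {(p, q), (q, p)}

# A gauge: one preferred orientation per edge (here, alphabetical order).
gauge = {e: tuple(sorted(e)) for e in edges}

def to_sign(reference, edge, oriented_edge):
    """Trivialisation A_e -> {-1, +1} determined by the reference orientation."""
    assert oriented_edge in fibre(edge)
    return +1 if oriented_edge == reference[edge] else -1

# Another orientation of the same graph.
other = {edges[0]: ("b", "a"), edges[1]: ("b", "c"),
         edges[2]: ("c", "a"), edges[3]: ("c", "d")}

# The same orientation expressed as a function f: X -> {-1, +1} relative to the gauge.
f = {e: to_sign(gauge, e, other[e]) for e in edges}
signs = [f[e] for e in edges]   # [-1, 1, -1, 1]

# Reversing the reference orientation on one edge flips the sign of f there and nowhere else.
flipped_gauge = dict(gauge)
flipped_gauge[edges[0]] = ("b", "a")
f_flipped = {e: to_sign(flipped_gauge, e, other[e]) for e in edges}
```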

Recall that every isomorphism \Psi \in \hbox{Isom}(G) of a standard geometric object G allowed one to transform a coordinate system \Phi: A \to G on a geometric object A to another coordinate system \Psi \circ \Phi: A \to G.  We can generalise this observation to gauges: every family \Psi = (\Psi_x)_{x \in X} of isomorphisms on G allows one to transform a gauge (\Phi_x)_{x \in X} to another gauge (\Psi_x \circ \Phi_x)_{x \in X} (again assuming that \Psi respects whatever topological or differentiable structure is present).  Such a collection \Psi is known as a gauge transformation.  For instance, in Example 1, one could rotate the reference direction N_x at each point x \in X anti-clockwise by some angle \theta(x); this would cause the function u(x) to rotate to u(x) e^{-i\theta(x)}.   In Example 2, a gauge transformation is just a map v: X \to S^1 (which may need to be continuous or measurable, depending on the structures one places on X); it rotates a point (x,u) \in X \times_\rho S^1 to (x, v(x)^{-1} u), and it also transforms the cocycle \rho by the formula (1).  In Example 3, a gauge transformation would be a map v: X \to \{-1,+1\}; it rotates a point (x, \epsilon) \in X \times \{-1,+1\} to (x, v(x) \epsilon).

Gauge transformations transform functions on the base X in many ways, but some things remain gauge-invariant.  For instance, in Example 1, the winding number of a function u: X \to S^1 along a closed loop \gamma \subset X would not change under a gauge transformation (as long as no singularities in the gauge are created, moved, or destroyed, and the orientation is not reversed).  But such topological gauge-invariants are not the only gauge invariants of interest; there are important differential gauge-invariants which make gauge theory a crucial component of modern differential geometry and geometric PDE.  But to describe these, one needs an additional gauge-theoretic concept, namely that of a connection on a fibre bundle.
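
The gauge-invariance of the winding number can be seen numerically on a discretised loop.  In the Python sketch below, the loop is parameterised by t \in [0, 2\pi), the function u winds twice around S^1, and the gauge rotation \theta(t) is a smooth, single-valued function along the loop (so that no singularities are created); these specific choices, and the function name winding_number, are illustrative assumptions.

```python
import cmath
import math

def winding_number(values):
    """Winding number of a closed loop of unit complex numbers, sampled in order.

    Assumes the samples are fine enough that consecutive values differ
    by an angle of less than pi.
    """
    total = 0.0
    for current, following in zip(values, values[1:] + values[:1]):
        total += cmath.phase(following / current)   # angle increment in (-pi, pi]
    return round(total / (2 * math.pi))

samples = 400
ts = [2 * math.pi * k / samples for k in range(samples)]

u = [cmath.exp(2j * t) for t in ts]                     # winds twice around S^1
theta = [0.7 * math.sin(3 * t) + 0.2 for t in ts]       # smooth single-valued gauge rotation
u_new = [z * cmath.exp(-1j * th) for z, th in zip(u, theta)]

print(winding_number(u), winding_number(u_new))         # 2 2
```

The individual values of u and u_new differ at every sample point, yet the winding number is the same.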

— Connections —

There are many essentially equivalent ways to introduce the concept of a connection; I will use the formulation based primarily on parallel transport, and on differentiation of sections.  To avoid some technical details I will work (somewhat non-rigorously) with infinitesimals such as dx.  (There are ways to make the use of infinitesimals rigorous, such as non-standard analysis, but this is not the focus of my post today.)

In single variable calculus, we learn that if we want to differentiate a function f: [a,b] \to {\Bbb R} at some point x, then we need to compare the value f(x) of f at x with its value f(x+dx) at some infinitesimally close point x+dx, take the difference f(x+dx)-f(x), and then divide by dx, taking limits as dx \to 0, if one does not like to use infinitesimals:

\displaystyle \nabla f(x) := \lim_{dx \to 0} \frac{f(x+dx) - f(x)}{dx}.

In several variable calculus, we learn several generalisations of this concept in which the domain and range of f are allowed to be multi-dimensional.  For instance, if f: X \to {\Bbb R}^d is now a vector-valued function on some multi-dimensional domain (e.g. a manifold) X, and v is a tangent vector to X at some point x, we can define the directional derivative \nabla_v f(x) of f at x by comparing f(x+v dt) with f(x) for some infinitesimal dt, take the difference f(x+vdt) - f(x), divide by dt, and then take limits as dt \to 0:

\displaystyle \nabla_v f(x) := \lim_{dt \to 0} \frac{f(x+vdt) - f(x)}{dt}.

[Strictly speaking, if X is not flat, then x+vdt is only defined up to an ambiguity of o(dt), but let us ignore this minor issue here, as it is not important in the limit.]  If f is sufficiently smooth (being continuously differentiable will do), the directional derivative is linear in v, thus for instance \nabla_{v+v'} f(x) = \nabla_v f(x) + \nabla_{v'} f(x). One can also generalise the range of f to other multi-dimensional domains than {\Bbb R}^d; the directional derivative then lives in a tangent space of that domain.
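
For readers who like to see the definition in action, here is a finite-difference Python sketch of the directional derivative on the flat domain {\Bbb R}^2, together with a check of linearity in v.  The sample map f, the base point, and the step size dt = 10^{-6} are arbitrary choices for illustration, and the finite difference only approximates the limit.

```python
def directional_derivative(f, x, v, dt=1e-6):
    """Approximate (f(x + v dt) - f(x)) / dt componentwise for a small dt."""
    shifted = [xi + vi * dt for xi, vi in zip(x, v)]
    return [(a - b) / dt for a, b in zip(f(shifted), f(x))]

def f(p):
    # A sample smooth map from R^2 to R^2.
    x, y = p
    return [x * x * y, x + y ** 3]

x0 = [1.0, 2.0]
v1, v2 = [1.0, 0.0], [0.5, -1.0]
v_sum = [a + b for a, b in zip(v1, v2)]

# Linearity: the derivative along v1 + v2 versus the sum of the two derivatives.
lhs = directional_derivative(f, x0, v_sum)
rhs = [a + b for a, b in zip(directional_derivative(f, x0, v1),
                             directional_derivative(f, x0, v2))]
```

Both lhs and rhs are close to (5, -10.5), the value obtained by applying the Jacobian of f at (1,2) to v1 + v2.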

In all of the above examples, though, we were differentiating functions f:X \to Y, thus each element x \in X in the base (or domain) gets mapped to an element f(x) in the same range Y.  However, in many geometrical situations we would like to differentiate sections f = (f_x)_{x \in X} instead of functions, thus f now maps each point x \in X in the base to an element f_x \in A_x of some fibre in a fibre bundle (A_x)_{x \in X}.  For instance, one might want to know how the wind direction w = (w_x)_{x \in X} changes as one moves x in some direction v; thus computing a directional derivative \nabla_v w(x) of w at x in direction v.  One can try to mimic the previous definitions in order to define this directional derivative.  For instance, one can move x along v by some infinitesimal amount dt, creating a nearby point x+v dt, and then evaluate w at this point to obtain w(x+vdt).  But here we hit a snag: we cannot directly compare w(x+vdt) with w(x), because the former lives in the fibre A_{x+vdt} while the latter lives in the fibre A_x.

With a gauge, of course, we can identify all the fibres (and in particular, A_{x+vdt} and A_x) with a common object G, in which case there is no difficulty comparing w(x+vdt) with w(x).  But this would lead to a notion of derivative which is not gauge-invariant, known as the non-covariant or ordinary derivative in physics.

But there is another way to take a derivative, which does not require the full strength of a gauge (which identifies all fibres simultaneously together).  Indeed, in order to compute a derivative \nabla_v w(x), one only needs to identify (or connect) two infinitesimally close fibres together: A_x and A_{x+vdt}.  In practice, these two fibres are already “within O(dt) of each other” in some sense, but suppose in fact that we have some means \Gamma(x \to x+vdt): A_x \to A_{x+vdt} of identifying these two fibres together.  Then, we can pull back w(x+vdt) from A_{x+vdt} to A_x through \Gamma(x \to x+vdt) to define the covariant derivative:

\displaystyle \nabla_v w(x) := \lim_{dt \to 0} \frac{\Gamma(x \to x+vdt)^{-1}( w(x+vdt) ) - w(x) }{dt}.

In order to retain the basic property that \nabla_v w is linear in v, and to allow one to extend the infinitesimal identifications \Gamma(x \to x+dx) to non-infinitesimal identifications, we impose the property that the \Gamma(x \to x+dx) be approximately transitive, in that

\Gamma(x+dx \to x+dx+dx') \circ \Gamma(x \to x + dx ) \approx \Gamma(x \to x+dx+dx') (1)

for all x, dx, dx’, where the \approx symbol indicates that the error between the two sides is o(|dx| + |dx’|).  [The precise nature of this error is actually rather important, being essentially the curvature of the connection \Gamma at x in the directions dx, dx', but let us ignore this for now.]  To oversimplify a little bit, any collection \Gamma of infinitesimal maps \Gamma(x \to x+dx) obeying this property (and some technical regularity properties) is a connection.

[There are many other important ways to view connections, for instance the Christoffel symbol perspective that we will discuss a bit later.  Another approach is to focus on the differentiation operation \nabla_v rather than the identifications \Gamma(x \to x+dx) or \Gamma(\gamma), and in particular on the algebraic properties of this operation, such as linearity in v or derivation-type properties (in particular, obeying various variants of the Leibniz rule).  This approach is particularly important in algebraic geometry, in which the notion of an infinitesimal or of a path may not always be obviously available, but we will not discuss it here.]

The way we have defined it, a connection is a means of identifying two infinitesimally close fibres A_x, A_{x+dx} of a fibre bundle (A_x)_{x \in X}.  But, thanks to (1), we can also identify two distant fibres A_x, A_y, provided that we have a path \gamma: [a,b] \to X from x = \gamma(a) to y = \gamma(b), by concatenating the infinitesimal identifications by a non-commutative variant of a Riemann sum:

\Gamma(\gamma) := \lim_{\sup |t_{i+1}-t_i| \to 0} \Gamma(\gamma(t_{n-1}) \to \gamma(t_n)) \circ \ldots \circ \Gamma(\gamma(t_0) \to \gamma(t_1)), (2)

where a = t_0 < t_1 < \ldots < t_n = b ranges over partitions.  This gives us a parallel transport map \Gamma(\gamma): A_x \to A_y identifying A_x with A_y, which in view of its Riemann sum definition, can be viewed as the “integral” of the connection \Gamma along the curve \gamma.  This map does not depend on how one parametrises the path \gamma, but it can depend on the choice of path used to travel from x to y.
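To make the product (2) concrete, here is a minimal numerical sketch in Python.  It assumes a trivial bundle over the plane whose fibres are copies of the complex numbers, with each infinitesimal identification \Gamma(x \to x+dx) modelled as multiplication by \exp(i a(x) \cdot dx); the coefficient field a(x,y) = (0,x), the two paths, and all function names are hypothetical choices made purely for illustration.

```python
import cmath

def transport(gamma, a, n=1000):
    """Ordered product of the maps Gamma(gamma(t_i) -> gamma(t_{i+1}))
    over a uniform partition of [0, 1].

    Each infinitesimal map is modelled (an assumption of this sketch)
    as multiplication by exp(i * a(x) . dx).
    """
    total = 1 + 0j
    for i in range(n):
        x0, y0 = gamma(i / n)
        x1, y1 = gamma((i + 1) / n)
        ax, ay = a(x0, y0)
        step = cmath.exp(1j * (ax * (x1 - x0) + ay * (y1 - y0)))
        total = step * total  # later steps are composed on the left
    return total

def a(x, y):
    return (0.0, x)  # hypothetical connection coefficients

def via_right_then_up(t):
    # (0,0) -> (1,0) -> (1,1)
    return (2 * t, 0.0) if t <= 0.5 else (1.0, 2 * t - 1)

def via_up_then_right(t):
    # (0,0) -> (0,1) -> (1,1)
    return (0.0, 2 * t) if t <= 0.5 else (2 * t - 1, 1.0)

first = transport(via_right_then_up, a)
second = transport(via_up_then_right, a)
```

Both paths run from (0,0) to (1,1), yet in this sketch the first transports 1 to (approximately) e^i while the second transports 1 to 1, which illustrates the dependence on the choice of path.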

We illustrate these concepts using several examples, including the three examples introduced earlier.

Example 1 continued. (Circle bundle of the sphere) The geometry of the sphere X in Example 1 provides a natural connection on the circle bundle SX, the Levi-Civita connection \Gamma, that lets one transport directions around the sphere in as “parallel” a manner as possible; the precise definition is a little technical (see e.g. my lecture notes for a brief description).  Suppose for instance one starts at some location x on the equator of the earth, and moves to the antipodal point y by a great semi-circle \gamma going through the north pole.  The parallel transport \Gamma(\gamma): S_x \to S_y along this path will map the north direction at x to the south direction at y.  On the other hand, if we went from x to y by a great semi-circle \gamma' going along the equator, then the north direction at x would be transported to the north direction at y.  Given a section u of this circle bundle, the quantity \nabla_v u(x) can be interpreted as the rate at which u rotates as one travels from x with velocity v. \diamond
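One rough way to check these two transport claims numerically is sketched below in Python.  The scheme used (take a short step along the path, project the vector onto the new tangent plane, and restore its length) is a standard discrete approximation to Levi-Civita transport on an embedded surface, rather than the precise definition alluded to above; the coordinates (unit sphere in {\Bbb R}^3 with north pole (0,0,1), x = (1,0,0), y = (-1,0,0)) and the function names are assumptions of this sketch.

```python
import math

def transport_on_sphere(path, u, n=2000):
    """Approximately parallel transport the tangent vector u along path(t), t in [0, 1]."""
    length = math.sqrt(sum(c * c for c in u))
    for i in range(1, n + 1):
        q = path(i / n)                          # next point on the unit sphere
        dot = sum(a * b for a, b in zip(u, q))
        u = [a - dot * b for a, b in zip(u, q)]  # project onto the tangent plane at q
        norm = math.sqrt(sum(c * c for c in u))
        u = [length * c / norm for c in u]       # restore the original length
    return u

def through_pole(t):
    # great semi-circle from x = (1,0,0) to y = (-1,0,0) via the north pole
    return (math.cos(math.pi * t), 0.0, math.sin(math.pi * t))

def along_equator(t):
    # great semi-circle from x to y along the equator
    return (math.cos(math.pi * t), math.sin(math.pi * t), 0.0)

north = [0.0, 0.0, 1.0]
via_pole = transport_on_sphere(through_pole, north)      # close to (0, 0, -1): south at y
via_equator = transport_on_sphere(along_equator, north)  # close to (0, 0, +1): north at y
```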

Example 2 continued. (Circle extensions) In Example 2, we change the notion of “infinitesimally close” by declaring x and Tx to be infinitesimally close for any x in the base space X (and more generally, x and T^n x are non-infinitesimally close for any positive integer n, being connected by the path x \to Tx \to \ldots \to T^n x, and similarly for negative n).  A cocycle \rho: X \to S^1 can then be viewed as defining a connection on the skew product X \times_\rho S^1, by setting \Gamma( x \mapsto Tx ) = \rho(x) (and also \Gamma(x \to x) = 1 and \Gamma(Tx \to x ) = \rho(x)^{-1} to ensure compatibility with (1); to avoid notational ambiguities let us assume for sake of discussion that x, Tx, T^{-1} x are always distinct from each other).  The non-infinitesimal connections \rho_n(x) := \Gamma(x \to Tx \to \ldots \to T^n x) are then given by the formula \rho_n(x) = \rho(x) \rho(Tx) \ldots \rho(T^{n-1} x) for positive n (with a similar formula for negative n).  Note that these iterated cocycles \rho_n also describe the iterations of the shift \tilde T: (x,u) \mapsto (Tx,\rho(x)u), indeed \tilde T^n (x,u) = (T^n x, \rho_n(x) u). \diamond
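The identity \tilde T^n (x,u) = (T^n x, \rho_n(x) u) is easy to test numerically.  The Python sketch below assumes, purely for illustration, that X is the unit circle {\Bbb R}/{\Bbb Z}, that T is rotation by a hypothetical irrational number ALPHA, and that \rho(x) = e^{2\pi i x}; none of these specific choices comes from Example 2 itself.

```python
import cmath
import math

ALPHA = math.sqrt(2) - 1  # hypothetical rotation number for the shift T

def T(x):
    return (x + ALPHA) % 1.0

def rho(x):
    return cmath.exp(2j * math.pi * x)

def rho_n(x, n):
    """The iterated cocycle rho(x) rho(Tx) ... rho(T^{n-1} x), for n >= 0."""
    result = 1 + 0j
    for _ in range(n):
        result *= rho(x)
        x = T(x)
    return result

def T_tilde(x, u):
    """The skew shift (x, u) -> (Tx, rho(x) u)."""
    return T(x), rho(x) * u

def iterate(n, x, u):
    """Apply T_tilde n times to (x, u)."""
    for _ in range(n):
        x, u = T_tilde(x, u)
    return x, u

x_n, u_n = iterate(7, 0.3, 1 + 0j)
# u_n agrees with rho_n(0.3, 7) up to rounding error
```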

Example 3 continued. (Oriented graphs) In Example 3, we declare two edges e, e’ in X to be “infinitesimally close” if they are adjacent.  Then there is a natural notion of parallel transport on the bundle (A_e)_{e \in X}; given two adjacent edges e = \{u,v\}, e'=\{v,w\}, we let \Gamma(e \to e') be the isomorphism from A_e = \{ \vec{uv}, \vec{vu} \} to A_{e'} = \{ \vec{vw}, \vec{wv} \} that maps \vec{uv} to \vec{vw} and \vec{vu} to \vec{wv}.  Any path \gamma = (\{v_1,v_2\}, \{v_2,v_3\}, \ldots, \{v_{n-1},v_n\}) of edges then gives rise to a connection \Gamma(\gamma) identifying A_{\{v_1,v_2\}} with A_{\{v_{n-1},v_n\}}.  For instance, the triangular path (\{u,v\}, \{v,w\}, \{w,u\}, \{u,v\}) induces the identity map on A_{\{u,v\}}, whereas the U-turn path (\{u,v\}, \{v,w\}, \{w,x\}, \{x,v\}, \{v,u\}) induces the anti-identity map on A_{\{u,v\}}.

Given an orientation \vec G = (\vec e)_{e \in X} of the graph G, one can “differentiate” \vec G at an edge \{u,v\} in the direction \{u,v\} \to \{v,w\} to obtain a number \nabla_{\{u,v\} \to \{v,w\}} \vec G(\{u,v\}) \in \{-1,+1\}, defined as +1 if the parallel transport from \{u,v\} to \{v,w\} preserves the orientations given by \vec G, and -1 otherwise.  This number of course depends on the choice of orientation.  But certain combinations of these numbers are independent of such a choice; for instance, given any closed path \gamma = \{e_1,e_2,\ldots,e_n,e_{n+1}=e_1\} of edges in X, the “integral” \prod_{i=1}^n \nabla_{e_i \to e_{i+1}} \vec G(e_i) \in \{-1,+1\} is independent of the choice of orientation \vec G (indeed, it is equal to +1 if \Gamma(\gamma) is the identity, and -1 if \Gamma(\gamma) is the anti-identity).  \diamond
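Since everything in Example 3 is finite, the claims can be checked by brute force.  The Python sketch below represents a directed edge \vec{uv} as the tuple (u, v); the particular five-edge graph and the function names are hypothetical choices, made so that the graph contains both the triangular path and the U-turn path described above.

```python
from itertools import product

def step(d, e_next):
    """Gamma(e -> e_next) applied to the directed edge d = (a, b) lying over e = {a, b}."""
    a, b = d
    (shared,) = set(d) & set(e_next)    # adjacent edges share exactly one vertex
    (other,) = set(e_next) - {shared}
    return (shared, other) if b == shared else (other, shared)

def transport(path, d):
    """Parallel transport of the directed edge d along a path of adjacent edges."""
    for e_next in path[1:]:
        d = step(d, e_next)
    return d

def nabla(orientation, e, e_next):
    """+1 if transport from e to e_next preserves the chosen orientations, else -1."""
    moved = step(orientation[frozenset(e)], e_next)
    return 1 if moved == orientation[frozenset(e_next)] else -1

def loop_integral(orientation, loop):
    """Product of nabla over consecutive edges of a closed path (last edge = first edge)."""
    result = 1
    for e, e_next in zip(loop, loop[1:]):
        result *= nabla(orientation, e, e_next)
    return result

def all_orientations(edges):
    """Enumerate every choice of direction for every edge."""
    for choice in product(*[(e, e[::-1]) for e in edges]):
        yield {frozenset(d): d for d in choice}

edges = [("u", "v"), ("v", "w"), ("w", "x"), ("x", "v"), ("w", "u")]
triangle = [("u", "v"), ("v", "w"), ("w", "u"), ("u", "v")]
u_turn = [("u", "v"), ("v", "w"), ("w", "x"), ("x", "v"), ("v", "u")]

triangle_values = {loop_integral(o, triangle) for o in all_orientations(edges)}
u_turn_values = {loop_integral(o, u_turn) for o in all_orientations(edges)}
```

Here triangle_values comes out as the single value +1 and u_turn_values as the single value -1 across all 32 orientations, matching the identity and anti-identity transports respectively.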

Example 4. (Monodromy)  One can interpret the monodromy maps of a covering space in the language of connections.  Suppose for instance that we have a covering space \pi: \tilde X \to X of a topological space X whose fibres \pi^{-1}(\{x\}) are discrete; thus \tilde X is a discrete fibre bundle over X.  The discreteness induces a natural connection \Gamma on this space, which is given by the lifting map; in particular, if one integrates this connection on a closed loop based at some point x, one obtains the monodromy map of that loop at x. \diamond
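As a small illustration of Example 4, the Python sketch below takes the covering space to be the squaring map z \mapsto z^2 on the punctured complex plane (a hypothetical choice, as are the function names), and lifts a loop by always selecting the square root nearest to the previous lifted point.

```python
import cmath
import math

def lift(path, start, n=1000):
    """Lift path(t), t in [0, 1], through the double cover z -> z**2, starting at `start`.

    At each step we pick whichever of the two square roots is closer to the
    current lifted point; this is the discrete analogue of the lifting map.
    """
    z = start
    for i in range(1, n + 1):
        root = cmath.sqrt(path(i / n))
        z = root if abs(root - z) <= abs(-root - z) else -root
    return z

def loop_around_origin(t):
    return cmath.exp(2j * math.pi * t)        # unit circle, winds once around 0

def loop_avoiding_origin(t):
    return 2 + cmath.exp(2j * math.pi * t)    # circle centred at 2, does not enclose 0

swapped = lift(loop_around_origin, 1)                 # ends near -1: the two sheets are exchanged
fixed = lift(loop_avoiding_origin, cmath.sqrt(3))     # ends near sqrt(3): trivial monodromy
```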

Example 5. (Definite integrals) In view of the definition (2), it should not be surprising that the definite integral \int_a^b f(x)\ dx of a scalar function f: [a,b] \to {\Bbb R} can be interpreted as an integral of a connection.  Indeed, set X := [a,b], and let ({\Bbb R})_{x \in X} be the trivial line bundle over X.  The function f induces a connection \Gamma_f on this bundle by setting

\Gamma_f(x \mapsto x+dx): y \mapsto y + f(x) dx.

The integral \Gamma_f([a,b]) of this connection along [a,b] is then just the operation of translation by \int_a^b f(x)\ dx in the real line. \diamond

Example 6. (Line integrals) One can generalise Example 5 to encompass line integrals in several variable calculus.  Indeed, if X is an n-dimensional domain, then a vector field f = (f_1,\ldots,f_n): X \to {\Bbb R}^n induces a connection \Gamma_f on the trivial line bundle ({\Bbb R})_{x \in X} by setting

\Gamma_f( x \mapsto x+dx ): y \mapsto y + f_1(x) dx_1 + \ldots + f_n(x) dx_n.

The integral \Gamma_f(\gamma) of this connection along a curve \gamma is then just the operation of translation by the line integral \int_\gamma f \cdot dx in the real line.

Note that a gauge transformation in this context is just a vertical translation (x,y) \mapsto (x,y+V(x)) of the bundle ({\Bbb R})_{x \in X} \equiv X \times {\Bbb R} by some potential function V: X \to {\Bbb R}, which we will assume to be smooth for sake of discussion.  This transformation conjugates the connection \Gamma_f to the connection \Gamma_{f - \nabla V}.  Note that this is a conservative transformation: the integral of a connection along a closed loop is unchanged by gauge transformation. \diamond
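The conservative nature of the gauge transformation in Example 6 can be seen numerically.  The Python sketch below compares the integral around the unit circle of a field f with that of f - \nabla V; the field f = (-y, x), the potential V = xy + x^2, and the midpoint rule used for the line integral are all assumptions made for the sake of a concrete test.

```python
import math

def loop_integral(f, n=20_000):
    """Midpoint-rule approximation of the line integral of f around the unit circle."""
    total = 0.0
    for i in range(n):
        t0 = 2 * math.pi * i / n
        t1 = 2 * math.pi * (i + 1) / n
        x0, y0 = math.cos(t0), math.sin(t0)
        x1, y1 = math.cos(t1), math.sin(t1)
        fx, fy = f((x0 + x1) / 2, (y0 + y1) / 2)
        total += fx * (x1 - x0) + fy * (y1 - y0)
    return total

def f(x, y):
    return (-y, x)               # hypothetical field with circulation 2*pi

def grad_V(x, y):
    return (y + 2 * x, x)        # gradient of the hypothetical potential V = x*y + x**2

def f_gauged(x, y):
    fx, fy = f(x, y)
    gx, gy = grad_V(x, y)
    return (fx - gx, fy - gy)    # the field f - grad V

before = loop_integral(f)
after = loop_integral(f_gauged)
# before and after agree up to rounding, and both are close to 2*pi
```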

Example 7. (ODE) A different way to generalise Example 5 can be obtained by using the fundamental theorem of calculus to interpret \int_{[a,b]} f(x)\ dx as the final value u(b) of the solution to the initial value problem

u'(t) = f(t); \quad u(a) = 0

for the ordinary differential equation u'=f.  More generally, the solution u(b) to the initial value problem

u'(t) = F( t, u(t) ); \quad u(a) = u_0

for some u: [a,b] \to {\Bbb R}^n, where F: [a,b] \times {\Bbb R}^n \to {\Bbb R}^n is a function (let us take it to be Lipschitz, to avoid technical issues), can also be interpreted as the integral of a connection \Gamma on the trivial vector space bundle ({\Bbb R}^n)_{t \in [a,b]}, defined by the formula

\Gamma(t \mapsto t+dt): y \mapsto y + F(t,y) dt.

Then \Gamma([a,b]) will map u_0 to u(b); this is nothing more than the Euler method for solving ODE.   Note that the method of integrating factors in solving ODE can be interpreted as an attempt to simplify the connection \Gamma via a gauge transformation.  Indeed, it can be profitable to view the entire theory of connections as a multidimensional “variable-coefficient” generalisation of the theory of ODE.  \diamond
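The remark about the Euler method can be made quite literal: composing the maps \Gamma(t \mapsto t+dt) over a fine partition is exactly the Euler iteration.  Here is a minimal Python sketch, restricted to the scalar case n=1 for simplicity; the function name and the two test equations are hypothetical choices.

```python
import math

def integrate_connection(F, a, b, y, n=100_000):
    """Apply Gamma(t -> t + dt): y -> y + F(t, y) dt successively across [a, b].

    This is the (forward) Euler method with n uniform steps.
    """
    dt = (b - a) / n
    for i in range(n):
        t = a + i * dt
        y = y + F(t, y) * dt
    return y

# u' = u, u(0) = 1: the transported value approximates e
growth = integrate_connection(lambda t, y: y, 0.0, 1.0, 1.0)

# u' = cos(t), u(0) = 0: recovers the definite integral of Example 5, namely sin(1)
area = integrate_connection(lambda t, y: math.cos(t), 0.0, 1.0, 0.0)
```

When F does not depend on y, each step is a translation and the composition reduces to a Riemann sum, as in Example 5; when it does, the steps no longer commute with one another in general, which is why the order of composition in (2) matters.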

Once one selects a gauge, one can express a connection in terms of that gauge.  In the case of vector bundles (in which every fibre is a d-dimensional vector space for some fixed d), the covariant derivative \nabla_v w(x) of a section w of that bundle along some vector v emanating from x can be expressed in any given gauge by the formula

\nabla_v w(x)^i = v^\alpha \partial_\alpha w(x)^i + v^\alpha \Gamma_{\alpha j}^i w(x)^j

where we use the gauge to express w(x) as a vector (w(x)^1,\ldots,w(x)^d), the indices i, j = 1,\ldots,d range over the fibre dimensions and \alpha ranges over the base dimensions (with repeated indices summed as per the usual conventions), and the \Gamma_{\alpha j}^i := (\nabla_{e_\alpha} e_j)^i are the Christoffel symbols of this connection relative to this gauge.
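As a quick numerical sanity check of this formula, the Python sketch below evaluates it for the tangent bundle of the plane in polar coordinates (r, \theta), using the coordinate frame as the gauge.  This specific example, the standard Christoffel symbols \Gamma^r_{\theta\theta} = -r and \Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r of the flat connection, and the function names are assumptions of the sketch; partial derivatives are approximated by central differences.

```python
import math

def covariant_derivative(w, christoffel, x, v, h=1e-5):
    """Components of nabla_v w(x) = v^a d_a w^i + v^a Gamma^i_{a j} w^j in the given gauge."""
    d = len(w(x))
    G = christoffel(x)
    wx = w(x)
    result = []
    for i in range(d):
        total = 0.0
        for alpha in range(len(x)):
            xp, xm = list(x), list(x)
            xp[alpha] += h
            xm[alpha] -= h
            partial = (w(xp)[i] - w(xm)[i]) / (2 * h)        # d_alpha w^i
            coupling = sum(G[alpha][i][j] * wx[j] for j in range(d))
            total += v[alpha] * (partial + coupling)
        result.append(total)
    return result

def polar_christoffel(x):
    """G[alpha][i][j] = Gamma^i_{alpha j}, with index 0 = r and index 1 = theta."""
    r, _ = x
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[1][0][1] = -r         # Gamma^r_{theta theta}
    G[0][1][1] = 1.0 / r    # Gamma^theta_{r theta}
    G[1][1][0] = 1.0 / r    # Gamma^theta_{theta r}
    return G

def constant_field(x):
    """The constant Cartesian vector field d/dx, written in the polar frame."""
    r, theta = x
    return [math.cos(theta), -math.sin(theta) / r]

derivative = covariant_derivative(constant_field, polar_christoffel, [2.0, 0.7], [0.3, -1.2])
```

The components of this field vary from point to point in the polar gauge, so its ordinary derivative is non-zero, but the Christoffel terms cancel that variation and the covariant derivative vanishes up to discretisation error, as it should for a field that is constant in Cartesian coordinates.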

One example of this, which models electromagnetism, is a connection on a complex line bundle V = (V_{t,x})_{(t,x) \in {\Bbb R}^{1+3}} in spacetime {\Bbb R}^{1+3} = \{ (t,x): t \in {\Bbb R}, x \in {\Bbb R}^3 \}.  Such a bundle assigns a complex line V_{t,x} (i.e. a one-dimensional complex vector space, and thus isomorphic to {\Bbb C}) to every point (t,x) in spacetime.  The structure group here is U(1) (strictly speaking, this means that we view the fibres as normed one-dimensional complex vector spaces, otherwise the structure group would be {\Bbb C}^\times). A gauge identifies V with the trivial complex line bundle ({\Bbb C})_{(t,x) \in {\Bbb R}^{1+3}}, thus converting sections (w_{t,x})_{(t,x) \in {\Bbb R}^{1+3}} of this bundle into complex-valued functions \phi: {\Bbb R}^{1+3} \to {\Bbb C}.  A connection on V, when described in this gauge, can be given in terms of fields A_\alpha: {\Bbb R}^{1+3} \to {\Bbb R} for \alpha = 0,1,2,3; the covariant derivative of a section in this gauge is then given by the formula

\nabla_\alpha \phi := \partial_\alpha \phi + i A_\alpha \phi.

In the theory of electromagnetism, A_0 and (A_1,A_2,A_3) are known (up to some normalising constants) as the electric potential and magnetic potential respectively.  Sections of V do not show up directly in Maxwell’s equations of electromagnetism, but appear in more complicated variants of these equations, such as the Maxwell-Klein-Gordon equation.

A gauge transformation of V is given by a map U: {\Bbb R}^{1+3} \to S^1; it transforms sections by the formula \phi \mapsto U^{-1} \phi, and connections by the formula \nabla_\alpha \mapsto U^{-1} \nabla_\alpha U, or equivalently

A_\alpha \mapsto A_\alpha + \frac{1}{i} U^{-1} \partial_\alpha U = A_\alpha + \partial_\alpha \frac{1}{i} \log U.   (2)

In particular, the electromagnetic potential A_\alpha is not gauge invariant (which broadly corresponds to the concept of being nonphysical or nonmeasurable in physics), as gauge symmetry allows one to add an arbitrary gradient function to this potential.  However, the curvature tensor

F_{\alpha \beta} := \frac{1}{i} [\nabla_\alpha, \nabla_\beta] = \partial_\alpha A_\beta - \partial_\beta A_\alpha   (*)

of the connection is gauge-invariant, and physically measurable in electromagnetism; the components F_{0i} = -F_{i0} for i=1,2,3 of this field have a physical interpretation as the electric field, and the components F_{ij} = -F_{ji} for 1 \leq i < j \leq 3 have a physical interpretation as the magnetic field.  (The curvature tensor F can be interpreted as describing the parallel transport of infinitesimal rectangles; it measures how far off the connection is from being flat, which means that it can be (locally) “straightened” via some choice of gauge to be the trivial connection.  In nonabelian gauge theories, in which the structure group is more complicated than just the abelian group U(1), the curvature tensor is non-scalar, but remains gauge-invariant in a tensor sense (gauge transformations will transform the curvature as they would transform a tensor of the same rank).)
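For readers who want to see these claims verified, here is the short computation, written with \chi := \frac{1}{i} \log U (a real-valued function, at least locally; the symbol \chi is introduced only for this sketch) so that the gauge transformation of the potential reads A_\alpha \mapsto A_\alpha + \partial_\alpha \chi.  The first two lines show that the commutator of covariant derivatives acts on a section \phi as multiplication by a function, equal to \partial_\alpha A_\beta - \partial_\beta A_\alpha up to a normalising factor of i; the last two lines show that this function is unchanged by the gauge transformation, assuming \chi is smooth enough for mixed partial derivatives to commute.

```latex
\begin{align*}
[\nabla_\alpha, \nabla_\beta] \phi
 &= (\partial_\alpha + i A_\alpha)(\partial_\beta \phi + i A_\beta \phi)
  - (\partial_\beta + i A_\beta)(\partial_\alpha \phi + i A_\alpha \phi) \\
 &= i (\partial_\alpha A_\beta - \partial_\beta A_\alpha) \phi, \\
\partial_\alpha (A_\beta + \partial_\beta \chi) - \partial_\beta (A_\alpha + \partial_\alpha \chi)
 &= \partial_\alpha A_\beta - \partial_\beta A_\alpha
  + (\partial_\alpha \partial_\beta - \partial_\beta \partial_\alpha) \chi \\
 &= \partial_\alpha A_\beta - \partial_\beta A_\alpha.
\end{align*}
```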

Gauge theories can often be expressed succinctly in terms of a connection and its curvatures.  For instance, Maxwell’s equations in free space, which describe how electromagnetic radiation propagates in the presence of charges and currents (but no media other than vacuum), can be written (after normalising away some physical constants) as

\partial^\alpha F_{\alpha \beta} = J_\beta

where J_\beta is the 4-current.  (Actually, this is only half of Maxwell’s equations, but the other half are a consequence of the interpretation (*) of the electromagnetic field as a curvature of a U(1) connection.  Thus this purely geometric interpretation of electromagnetism has some non-trivial physical implications, for instance ruling out the possibility of (classical) magnetic monopoles.)  If one generalises from complex line bundles to higher-dimensional vector bundles (with a larger structure group), one can then write down the (classical) Yang-Mills equation

\nabla^\alpha F_{\alpha \beta} = 0

which is the classical model for three of the four fundamental forces in physics: the electromagnetic, weak, and strong nuclear forces (with structure groups U(1), SU(2), and SU(3) respectively).  (The classical model for the fourth force, gravitation, is given by a somewhat different geometric equation, namely the Einstein equations G_{\alpha \beta} = 8 \pi T_{\alpha \beta}, though this equation is also “gauge-invariant” in some sense.)

The gauge invariance (or gauge freedom) inherent in these equations complicates their analysis.  For instance, due to the gauge freedom A_\alpha \mapsto A_\alpha + \partial_\alpha \frac{1}{i} \log U in the potential, Maxwell’s equations, when viewed in terms of the electromagnetic potential A_\alpha, are ill-posed: specifying the initial value of this potential at time zero does not uniquely specify the future value of this potential (even if one also specifies any number of additional time derivatives of this potential at time zero), since one can apply this gauge transformation with a gauge function U that is trivial at time zero but non-trivial at some future time to demonstrate the non-uniqueness.  Thus, in order to use standard PDE methods to solve these equations, it is necessary to first fix the gauge to a sufficient extent that it eliminates this sort of ambiguity.  If one were in a one-dimensional situation (as opposed to the four-dimensional situation of spacetime), with a trivial topology (i.e. the domain is a line rather than a circle), then it is possible to gauge transform the connection to be completely trivial, for reasons generalising both the fundamental theorem of calculus and the fundamental theorem of ODEs.  (Indeed, to trivialise a connection \Gamma on a line {\Bbb R}, one can pick an arbitrary origin t_0 \in {\Bbb R} and gauge transform each point t \in {\Bbb R} by \Gamma([t_0,t]).)  However, in higher dimensions, one cannot hope to completely trivialise a connection by gauge transforms (mainly because of the possibility of a non-zero curvature form); in general, one cannot hope to do much better than setting a single component of the connection to equal zero.  For instance, for Maxwell’s equations (or the Yang-Mills equations), one can trivialise the connection A_\alpha in the time direction, leading to the temporal gauge condition

A_0 = 0.

This gauge is indeed useful for providing an easy proof of local existence for these equations, at least for smooth initial data.  But there are many other useful gauges that one can fix; for instance one has the Lorenz gauge

\partial^\alpha A_\alpha = 0

which has the nice property of being Lorentz-invariant, and transforms the Maxwell or Yang-Mills equations into linear or nonlinear wave equations respectively.  Another important gauge is the Coulomb gauge

\partial_i A_i = 0

where i only ranges over spatial indices 1,2,3 rather than over spacetime indices 0,1,2,3.  This gauge has an elliptic variational formulation (Coulomb gauges are critical points of the functional \int_{{\Bbb R}^3} \sum_{i=1}^3 |A_i|^2) and is thus expected to be “smaller” and “smoother” than many other gauges; this intuition can be borne out by standard elliptic theory (or Hodge theory, in the case of Maxwell’s equations).  In some cases, the correct selection of a gauge is crucial in order to establish basic properties of the underlying equation, such as local existence.  For instance, the simplest proof of local existence of the Einstein equations uses a harmonic gauge, which is analogous to the Lorenz gauge mentioned earlier; the simplest proof of local existence of Ricci flow uses a gauge of de Turck that is also related to harmonic maps (see e.g. my lecture notes); and in my own work on wave maps, a certain “caloric gauge” based on harmonic map heat flow is crucial (see e.g. this post of mine).  But in many situations, it is not yet fully understood whether the use of the correct choice of gauge is a mere technical convenience, or is more innate to the equation.  It is definitely conceivable, for instance, that a given gauge field equation is well-posed with one choice of gauge but ill-posed with another.  It would also be desirable to have a more gauge-invariant theory of PDEs that did not rely so heavily on gauge theory at all, but this seems to be rather difficult; many of our most powerful tools in PDE (for instance, the Fourier transform) are highly non-gauge-invariant, which makes it very inconvenient to try to analyse these equations in a purely gauge-invariant setting.