In casual discussions with scientists and engineers, I’m sometimes surprised at how poorly understood the concept of an eigenvalue is. Eigenvalues are sometimes viewed as exotic or suspicious, when in reality they are fundamental to comprehending many basic facts in applied mathematics. For English speakers, part of the problem may be the name. In the context of eigenvalues and eigenvectors, the German word “eigen” is usually translated as “characteristic” or “inherent”. That’s a good way to think about it. They are values and vectors which characterize or describe a linear transformation. And they are not hard to understand. I think a lot of the confusion has to do with the way linear algebra is typically taught. I’d like to make a brief attempt to give a relatively non-technical introduction to eigenvalues and eigenvectors.
I’m going to assume that you have taken a typical undergraduate mathematics sequence for scientists and engineers at some point in your life. So you have passed a course in linear algebra, you have at least seen the definition of an eigenvalue (it’s okay if you can’t remember it), and you have probably even computed a few eigenvalues.
Linearity in a nonlinear world
For some context, let’s zoom out and look at the subject of Applied Mathematics as a whole. It is a huge area, mostly consisting of constructing, analyzing, understanding, and solving a wide variety of mathematical models for phenomena that we observe in the real world. This includes applications to physics, engineering, informatics, statistics, biology, finance, the social sciences, even marketing. The list goes on, underpinning most of our modern technological world.
One of the most basic structures we use to create models is that of a function. A function $f$ is simply a mapping between two sets, let’s call them $U$ and $V$. For each element $x$ in $U$, $f$ assigns a unique element $y = f(x)$ in $V$. Functions are sometimes called transformations, an evocative way of describing the action of a function. Our function $f$ transforms elements in $U$ into elements in $V$.

For a large chunk of applied mathematics, we have to deal with nonlinear transformations. This is not only because there are so many interesting phenomena described by nonlinear models, but because in some sense, most functions are nonlinear. Unfortunately, for all but a few very special cases, nonlinear functions are hard to understand. It’s difficult to solve problems involving nonlinear functions. It’s difficult to make general statements about nonlinear functions. It’s difficult to analyze algorithms involving nonlinear functions. So what do we do? Often, and when possible, we linearize. Linearizing nonlinear functions is the subject of differential calculus, another fun topic. For now, let’s just say that the process of linearization, when successful, produces a linear function which is a good local approximation to the original nonlinear function. Why do we linearize? Because we understand linear functions. We can characterize them, we can make general statements about them, and importantly, we can solve equations involving them, even in very general contexts.
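To make this a bit more concrete, recall the standard picture from calculus: if $f$ is differentiable at a point $x_0$, its linearization is $f(x) \approx f(x_0) + f'(x_0)(x - x_0)$, so near $x_0$ the complicated behavior of $f$ is captured by the single number $f'(x_0)$ acting linearly on the displacement $x - x_0$. For example, near $x_0 = 0$ we have $\sin(x) \approx x$.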
The one-dimensional case
What distinguishes a linear function from a nonlinear function? Let’s start with perhaps the most familiar context, where the sets $U$ and $V$ above are both equal to the space of real numbers $\mathbb{R}$. In this context, a nonlinear function could be nearly anything, say $f(x) = \frac{1}{2} \sin(\pi x) + 7x^3$ or $f(x) = xe^{-2x}$. But all linear functions look exactly the same: every linear function $f$ from $\mathbb{R}$ to $\mathbb{R}$ has the form $f(x) = ax$, where $a$ is just another real number (this is easy to prove from the definition of linearity, which we’ll skip). That’s it. And guess what the eigenvalue of this linear transformation is? That’s right, it’s $a$. So if we know that $f$ is linear and we know its eigenvalue, we know everything about $f$. You might say that we’ve completely characterized $f$.
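In case you’re curious, the argument we just skipped really is one line: linearity includes the scaling property $f(cx) = c f(x)$, so $f(x) = f(x \cdot 1) = x f(1)$, and we can simply take $a = f(1)$.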
Although $f(x) = ax$ is a very simple function, let’s try to visualize what it does, to set up the visualization of higher-dimensional cases. As pictured below, $f$ simply takes a point $x$ on the real line and scales it by the constant $a$. So for example, the unit interval $[-1, 1]$ gets mapped to the interval $[-a, a]$ (the picture shows a case where $0 < a < 1$). Note that similarly, every other interval $[-r, r]$ gets mapped to $[-ar, ar]$.
A more traditional way to view the mapping would be with an $xy$ graph, as shown below (assuming $0 < a < 1$). This picture has the advantage that it clearly shows the linearity of the map (because the graph is a line). But it has the disadvantage that it is difficult to visualize analogous mappings in higher dimensions. So this is the only case for which we’ll use this graph.
That’s it for the one-dimensional case. To summarize, in one dimension, all linear maps look like $f(x) = ax$, $a$ is the eigenvalue of the map, and so if we know the eigenvalue, we know everything about $f$.
Rescaling in two dimensions
In introducing general functions above, we mentioned mappings from any set $U$ to any set $V$. We’ve already discussed that eigenvalues pertain only to linear maps. It’s also important to understand that eigenvalues only make sense in the special case of a linear map where the spaces $U$ and $V$ are identical, although once we understand eigenvalues, it’s easy to generalize to the case where $U$ and $V$ are possibly different, by introducing singular values. Maybe a topic for another post.
So let’s consider now the much more interesting two-dimensional case. How can we describe linear transformations from $\mathbb{R}^2$ to $\mathbb{R}^2$? Again skipping the definition of linearity, and bypassing any discussion of abstract vector spaces, linear transformations are represented by matrices. A matrix is just a shorthand way to represent the linear equations that describe the transformation. For example, we could represent the linear transformation $f(x_0, x_1) = (ax_0 + bx_1, cx_0 + dx_1)$ by the matrix $A = \left( \begin{array}{cc} a & b \\ c & d \end{array} \right)$. The matrix $A$ tells us everything we need to know to compute $f$. Using the shorthand $x = (x_0, x_1)$, we can write $f(x) = Ax$, where $Ax$ indicates matrix-vector multiplication.
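If you like seeing such things in code, here is a minimal NumPy sketch of this shorthand, with arbitrary made-up values for $a, b, c, d$ and $x$:

```python
import numpy as np

# Made-up coefficients for f(x0, x1) = (a*x0 + b*x1, c*x0 + d*x1)
a, b, c, d = 1.0, 2.0, 3.0, 4.0
A = np.array([[a, b],
              [c, d]])

x = np.array([5.0, 6.0])                         # x = (x0, x1)
print(A @ x)                                     # [17. 39.]  -- the matrix-vector product Ax
print(a * x[0] + b * x[1], c * x[0] + d * x[1])  # the same numbers, written out componentwise
```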
Okay, $f$ is now already considerably more complicated than in the one-dimensional case. Let’s begin by considering the very special case where $b = c = 0$. So $f(x_0, x_1) = (ax_0, dx_1)$, and $A = \left( \begin{array}{cc} a & 0 \\ 0 & d \end{array} \right)$. This is equivalent to two independent one-dimensional linear functions, just like the one we discussed previously. If you were to guess that there are two eigenvalues of $A$ and their values are $a$ and $d$, you would be correct. However, unlike in the one-dimensional case, the eigenvalues alone do not completely characterize the transformation. To see why, let’s make a picture similar to the first drawing above.
Notice that just as in the one-dimensional case, the interval $[-1, 1]$ on the $x_0$ axis gets mapped to the interval $[-a, a]$ on the same axis. Similarly, the interval $[-1, 1]$ on the $x_1$ axis gets mapped to $[-d, d]$. The unit circle gets stretched and squeezed into an ellipse. The transformation is simply rescaling the axes. We can express this with eigenvectors. The eigenvalue $a$ has an associated eigenvector $v_0 = (1,0)$ (corresponding to the $x_0$ axis), while $d$ has associated eigenvector $v_1 = (0, 1)$ (corresponding to the $x_1$ axis). The definition of eigenvalues and eigenvectors says that $(\lambda, v)$ is an eigenvalue, eigenvector pair for $A$ if $Av = \lambda v$ with $v \neq 0$. We know that $Av_0 = av_0$ and $Av_1 = dv_1$, so $(a, v_0)$ and $(d, v_1)$ are eigenpairs, and the linear transformation represented by $A$ rescales along the coordinate axes. That’s all it does. We understand it completely.
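A quick numerical sanity check of this diagonal case (with made-up values for $a$ and $d$):

```python
import numpy as np

# Diagonal case: b = c = 0, so A just rescales the coordinate axes
a, d = 0.5, 2.0
A = np.diag([a, d])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)       # [0.5 2. ]  -- the diagonal entries a and d
print(eigenvectors)      # the identity matrix: eigenvectors (1, 0) and (0, 1)
```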
General stretching and shrinking
But is this all that a linear transformation can do? No, luckily things can get more interesting, even in two dimensions. This time, let’s work backwards. Suppose that we already know that the matrix $A$ has eigenvalue $\lambda_0$ with eigenvector $v_0$, and eigenvalue $\lambda_1$ with eigenvector $v_1$. Let’s make two new matrices $V = \left( \begin{array}{cc} v_0 | v_1 \end{array} \right)$ (where $v_0$ and $v_1$ are column vectors), and $\Lambda = \left( \begin{array}{cc} \lambda_0 & 0 \\ 0 & \lambda_1 \end{array} \right)$. Notice that $\Lambda$ is in exactly the same form as the diagonal matrix above, and we’ve already seen that it simply rescales the coordinate axes. We can write the fact that $Av_0 = \lambda_0 v_0$ and $Av_1 = \lambda_1 v_1$ as $AV = V\Lambda$. Let’s also suppose that $V$ turns out to be an invertible matrix (true whenever $v_0$ and $v_1$ are linearly independent). Then, multiplying $AV = V\Lambda$ on the right by $V^{-1}$, we get $A = V \Lambda V^{-1}$. In this way, we can construct a matrix which has any (linearly independent) eigenvectors and any eigenvalues we want. But more importantly, this construction reveals exactly what $A$ does.
Suppose we want a matrix which scales some arbitrary directions by some arbitrary amounts. The arithmetic works out a little cleaner if we choose “unit directions”, that is, direction vectors with length 1. So for example suppose we want a matrix that stretches by $\lambda_0 = 2$ along the direction $v_0 = \frac{\sqrt{2}}{2}(1, 1)$ while shrinking by $\lambda_1 = \frac{1}{2}$ along the direction $v_1 = \frac{\sqrt{2}}{2}(-1, 1)$. Then $\Lambda = \left( \begin{array}{cc} 2 & 0 \\ 0 & \frac{1}{2} \end{array} \right)$, and $V = \left( \begin{array}{cc} v_0 | v_1 \end{array} \right) = \frac{\sqrt{2}}{2}\left( \begin{array}{cc} 1 & -1 \\ 1 & 1 \end{array} \right)$. For this example, it turns out that $V^{-1} = \frac{\sqrt{2}}{2}\left( \begin{array}{cc} 1 & 1 \\ -1 & 1 \end{array} \right)$. Both $V$ and $V^{-1}$ are rotation matrices. $V$ rotates the coordinate axes $45^\circ$ counterclockwise and, as you would expect, the inverse $V^{-1}$ rotates $45^\circ$ clockwise. Reading $A = V \Lambda V^{-1}$ from right to left makes it clear what $A$ does: it first rotates $45^\circ$ clockwise (so $v_0$ aligns with the $x_0$ axis and $v_1$ aligns with the $x_1$ axis), then rescales along the coordinate axes, and then rotates back $45^\circ$ counterclockwise. That’s it!
By the way, multiplying out $V \Lambda V^{-1}$ gives $A = \left( \begin{array}{cc} \frac{5}{4} & \frac{3}{4} \\ \frac{3}{4} & \frac{5}{4} \end{array} \right)$. Without knowing the eigenvalues and eigenvectors, it would not be so obvious what $A$ is really doing. Also notice that $A$ is symmetric, that is $b = c$. This is not a coincidence, more on that later.
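If you want to check the arithmetic, here is a small NumPy sketch of this example:

```python
import numpy as np

s = np.sqrt(2) / 2
V = s * np.array([[1.0, -1.0],
                  [1.0,  1.0]])        # columns are the eigenvectors v0 and v1
Lam = np.diag([2.0, 0.5])              # the chosen eigenvalues

A = V @ Lam @ np.linalg.inv(V)         # A = V Lambda V^{-1}
print(A)                               # [[1.25 0.75]
                                       #  [0.75 1.25]]

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                     # 2.0 and 0.5 (NumPy may list them in either order)
```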
A little complexity
The discussion above got a little ahead of itself. It’s not obvious why an arbitrary matrix $A$ would necessarily have an eigenpair. Consider $A = \left( \begin{array}{cc} 0 & 1 \\ -1 & 0 \end{array} \right)$. The transformation is now $f(x_0, x_1) = (x_1, -x_0)$. You can see that the matrix maps the $x$ axis $(1,0)$ to the negative $y$ axis $(0,-1)$ and the $y$ axis $(0,1)$ to the $x$ axis $(1, 0)$. That is just a clockwise rotation by $90^\circ$. If the matrix simply rotates everything by $90^\circ$, how can there be a direction that the matrix only rescales?
Seems unlikely, but let’s forge ahead anyway and try to find an eigenvalue. Writing $v = (v_0, v_1)$, $Av = \lambda v$ means $v_1 = \lambda v_0$ and $-v_0 = \lambda v_1$. Combining these equations gives $v_0 = -\lambda^2 v_0$. Since $v_0$ can’t be zero without $v_1$ also being zero, we can divide by $-v_0$ to get $\lambda^2 = -1$. So there are two eigenvalues, but they are imaginary: $\lambda = \pm i$ (or $\lambda = \pm j$ if you prefer). Now it is easy to find eigenvectors: for example, $v_0 = (i, 1)$ and $v_1 = (1, i)$ are eigenvectors for $\lambda_0 = -i$ and $\lambda_1 = i$, respectively.
Unfortunately the meaning seems a little more obscure now. A real matrix with imaginary eigenvalues? Seems like cheating. One might argue that the problem is that we started with a real matrix in the first place. Complex eigenvalues would seem more natural if $A$ was complex. The situation is directly analogous to what we see in algebra when trying to solve polynomial equations with real coefficients. The real numbers simply aren’t “big enough” to represent all of the solutions.
But let’s see how these complex eigenpairs work. Take $v_0 = (i, 1)$. Then $Av_0 = (1, -i) = -i(i, 1)$, so we definitely have an eigenvector. The transformation is harder to visualize, since this is now equivalent to an operation in four real dimensions, but it’s clear enough that the definition is satisfied, and similarly for the pair $(i, (1, i))$. For a general matrix $A$, we can’t rule out the existence of complex eigenvalues, but they do have a common interpretation as representing some underlying rotation in their associated eigenspace.
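If you ask NumPy, it reports exactly these complex eigenpairs. A quick sketch:

```python
import numpy as np

# The 90-degree clockwise rotation f(x0, x1) = (x1, -x0)
A = np.array([[ 0.0, 1.0],
              [-1.0, 0.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)      # the complex pair +i and -i (the reported order may vary)
print(eigenvectors)     # columns are, up to scaling, the eigenvectors (i, 1) and (1, i) above
```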
Fun facts
The examples above were in one- and two-dimensional spaces only because computations and visualizations are much easier. But the basic ideas apply in any finite dimension, and they can be extended in some circumstances to infinite-dimensional spaces. Here are some general facts.
- Symmetric real $n \times n$ matrices have only real eigenvalues. Eigenvectors associated with distinct eigenvalues are orthogonal. Every symmetric matrix has a representation $A = V \Lambda V^t$, where the columns of $V$ are orthogonal unit-length eigenvectors for $A$, and $\Lambda$ is a diagonal matrix with (possibly repeated) eigenvalues on the diagonal. This representation is not necessarily unique, but it does describe exactly what $A$ does. In this case, due to the orthogonality of $V$ (similar to the example above), the transpose $V^t = V^{-1}$. A similar result holds for complex Hermitian matrices.
- Every $n \times n$ real matrix has $n$ (possibly repeated, possibly complex) eigenvalues. The same holds for $n \times n$ complex matrices. This is a direct consequence of the Fundamental Theorem of Algebra applied to the characteristic polynomial.
- Although, strictly speaking, eigenvalues only make sense for square matrices, there is a closely related construction called the singular value decomposition (SVD) which exists for all matrices. The construction of the SVD relies on the first fact above. In the SVD, the role of eigenvalues is replaced by singular values, which are always real and non-negative. I will probably make another post on the SVD at some point. It is computable and super useful in practice.
- While you probably learned in your linear algebra course to calculate eigenvalues and eigenvectors for small matrices by hand, the methods you learned are probably not the ones that are used in modern computational algorithms. There are efficient, effective computational methods for finding eigenvalues and eigenvectors of even very large matrices. Standard eigenvalue solvers are available in almost any language or package used for numerical computations (Python, Julia, Matlab, C/C++, etc.); see the short Python sketch after this list.
- For many purposes, it’s not necessary to compute all the eigenvalues/eigenvectors of a matrix. Special purpose, efficient methods exist for computing a few eigenvalues (usually the largest or smallest in magnitude) of a matrix. The underlying matrix can be huge (possibly too big even to store explicitly), with the algorithm only requiring computation of the action of the matrix on a vector.
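To make the last two items concrete, here is a rough sketch using standard NumPy/SciPy solvers. The random symmetric matrix and its size are just placeholders standing in for whatever model you actually care about.

```python
import numpy as np
from scipy.sparse.linalg import eigsh    # iterative solver for a few eigenpairs

# Placeholder problem: a random symmetric matrix
n = 500
rng = np.random.default_rng(0)
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                        # symmetrize, so all eigenvalues are real

# Dense solver: all n eigenvalues and orthonormal eigenvectors
all_vals, all_vecs = np.linalg.eigh(A)

# Iterative solver: only the 5 eigenvalues of largest magnitude,
# which needs nothing more than matrix-vector products internally
few_vals, few_vecs = eigsh(A, k=5, which='LM')

# The largest-magnitude eigenvalue agrees between the two approaches
print(np.max(np.abs(all_vals)), np.max(np.abs(few_vals)))
```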