SMART: The Stochastic Monotone Aggregated Root-Finding Algorithm

Last night, I uploaded a new paper on the Stochastic Monotone Aggregated Root-Finding (SMART) algorithm. The algorithm excites me for a few reasons:

  1. SMART extends SAGA, SVRG, Finito, and SDCA, all of which solve problems like

    \displaystyle \text{minimize}_{x \in \mathbb{R}^d}\; \frac{1}{N} \sum_{i=1}^N f_i(x),

    to allow asynchronous parallel implementations, arbitrary block-coordinate updates, mini-batching, and importance sampling.

  2. SMART replaces function gradients, {\nabla f_i}, with black-boxes, {S_i}, called operators, and arrives at the root-finding problem:

    \displaystyle \text{Find }x^\ast \in \mathbb{R}^d\text{ such that }\frac{1}{N} \sum_{i=1}^N S_i(x^\ast) = 0.

    For SMART to converge, these operators need only satisfy a weak property, which we call the coherence condition.

  3. Because SMART works with operators, it generates some new algorithms for large-scale optimization problems like

    \displaystyle \text{minimize}_{x \in \mathbb{R}^d}\; \frac{1}{M}\sum_{j=1}^M g_j(A_j x) + \frac{1}{N} \sum_{i=1}^N f_i(x),

    where the functions {g_j} are proximable and the maps {A_j} are linear—these problems are hot in machine learning right now.
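The aggregated update behind SMART can be sketched, in the spirit of SAGA, for the root-finding formulation in item 2. Everything concrete below (the scalar operators {S_i(x) = a_i(x - c_i)}, the step size, and the function name) is an illustrative assumption, not taken from the paper:

```python
import random

def smart_saga_sketch(operators, x0, step=0.1, iters=2000, seed=0):
    """SAGA-style stochastic iteration for a root of (1/N) * sum_i S_i(x).

    A table stores the last evaluated value of each operator; the update
    direction S_j(x) - table[j] + mean(table) has the right mean and
    shrinking variance as the table entries settle.
    """
    rng = random.Random(seed)
    n = len(operators)
    x = x0
    table = [S(x0) for S in operators]   # stored operator evaluations
    avg = sum(table) / n                 # running mean of the table
    for _ in range(iters):
        j = rng.randrange(n)
        sj = operators[j](x)
        x = x - step * (sj - table[j] + avg)
        avg += (sj - table[j]) / n       # keep the mean consistent
        table[j] = sj
    return x

# Toy instance: scalar operators S_i(x) = a_i * (x - c_i); the root of the
# average is x* = (sum a_i c_i) / (sum a_i).
a = [1.0, 2.0, 3.0]
c = [1.0, 2.0, 3.0]
ops = [lambda x, ai=ai, ci=ci: ai * (x - ci) for ai, ci in zip(a, c)]
x_star = sum(ai * ci for ai, ci in zip(a, c)) / sum(a)
x_hat = smart_saga_sketch(ops, x0=0.0)
```

Note that once the table holds the values {S_i(x^\ast)} at a root {x^\ast}, the aggregated direction vanishes, so roots are fixed points of the iteration.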

In the coming weeks, I’ll devote some blog posts to implementations of SMART on problems like logistic regression, support vector machines, collaborative filtering, feasibility problems, and more. In the meantime, check out the paper; comments are welcome.

This material is based upon work supported by the National Science Foundation under Award No. 1502405. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

There is no infinite dimensional Lebesgue measure

Imagine that there were a countably additive, translation-invariant measure, {m}, on an infinite-dimensional Hilbert space {\mathcal{H}} such that every ball {B(v, \varepsilon)} has positive, finite measure. Because {\mathcal{H}} is infinite dimensional, there exists an orthonormal sequence {\{e_{i} \mid i = 1, 2, \cdots\}}, i.e., {\langle e_{i}, e_{j}\rangle = \delta_{ij}}. Note that the points {e_{i}} and {e_{j}} are far apart: {\|e_{i} - e_{j}\|^2 = \|e_{i}\|^2 + \|e_{j}\|^2 = 2} if {i \neq j}. Now, the ball {B(0, 2)} contains the countable collection of pairwise disjoint balls {\{B(e_{i}, \frac{1}{2}) \mid i = 1, 2, \cdots\}} (you can see where this is going). Because all of the balls {B(e_{i}, \frac{1}{2})} have the same positive measure (by translation invariance), and {m} is countably additive, it follows that {\infty = \sum_{i=1}^\infty m(B(e_{i}, \frac{1}{2})) \leq m(B(0, 2)) < \infty}. Thus, we’ve reached a contradiction.

In some cases, it’s desirable to have an analogue of Lebesgue measure on a Hilbert space {\mathcal{H}}. The most common application is to make statements such as “Property X is true for a.e. function.” One way to make these statements is through the concept of Prevalence. Another is the definition of a Gaussian-like measure called the Abstract Wiener Space.

Note that I wrote this article before realizing that Wikipedia has a similar article.

My First Two Years

During the one and a half years I’ve spent as a graduate student, I’ve developed a few principles. I’ve decided to post them here and will update them as I learn more.

1. Don’t be afraid to branch out and learn subjects in other disciplines. Broadening your interests is easy in graduate school, and you’re paid to do it. Further, you may discover that you’ve gone to grad school for the wrong subject. You may be studying number theory, topology, etc., because you’re already good at it and don’t know enough about other subjects. It’s easy to get lost in the achievement factor in math, and just because you can solve problems with ease doesn’t mean you’re truly motivated by the subject’s goals. The best way to discover your true motivations is to think about the guiding problems in the field and decide whether they’re interesting to you in their most isolated form (e.g., do you “care” about distinguishing smooth structures on manifolds?). If the subjects no longer seem interesting, then move on to something you really enjoy.

2. Talk to everybody: professors, graduate students, undergrads, etc. The best way to learn is from other people, but don’t depend on them. Everybody has a tool they’re fond of — learn it. Find out what professors are interested in and ask for a research project or a reading course in that topic. Professors generally want to speak with you after you pass your quals; graduating students is part of their job. So don’t be discouraged by professors who turn you down: five no’s and a yes is still a yes.

3. Don’t be afraid to say you don’t understand. Once you realize you don’t understand, you’re in the best position to learn. Further, many students may give the impression that they truly understand a subject, but will be forced to reevaluate if you ask enough questions.

4. Don’t be afraid of the computer. There is a massive amount of data in the world, and you’ll be at a supreme advantage if you know how to compute with it.

5. Go to seminars and colloquiums in your field and other nearby fields. Learning what other people are working on is likely to give you research ideas. Sign up to give talks in front of these groups, even if you don’t know a thing about the topic. Put a lot of effort into explaining concepts clearly and accurately.

6. Think long term. I know that many PhD students are only thinking about the next five years of graduate school. Assuming that everything will “work out” will put you at a severe disadvantage. Before every major choice you make in grad school, ask yourself why you want a PhD and how this choice will benefit your goal. No matter what field you plan to go into, ask yourself the following question: who are the experts, and am I good enough to take their job? There isn’t a lot of room at the top, so you’re likely going to have to push someone out of the way to make room for yourself.

7. Schedule enough down time so that you can be fresh every day. Make sure you attend social functions, go to the gym, or play a sport, etc.

8. Learn to manage your time effectively. How many projects are you working on? If you get sidetracked on a particular problem, move on and come back to it. Try to skim a book before you do the problems. Then, go back and read carefully when you need a deep understanding.

9. Learn how to measure success. Set benchmarks for yourself. Did you achieve more today than you do on average? Did you master a difficult concept? Try to achieve more every day, but don’t be discouraged if you can’t. You can’t learn everything in finite time. Some things are long term projects and even reading a few pages can give you something to be proud of.

10. Apply for fellowships. Teaching eats up a lot of time, but it does give you valuable experience. If you can’t get an RA, having your own funding can make your research a lot more productive. Having your own funding also increases the chances of a professor working with you.

A Haskell implementation of combinatorial Heegaard Floer homology

I’ve uploaded a Haskell implementation of a combinatorial algorithm to compute the “hat” version of Heegaard Floer knot homology. This program should be seen as a stepping stone to implementing the more general HF complexes for links, and ultimately for implementing HF^- for large integer surgeries on links. Generally, our approach is the same as the one found in the C++ implementation of Baldwin and Gillam, located here.

The major difference is how we compute generators in positive Alexander grading. Essentially, we noticed that the number of possible solutions to a particular integer programming problem was very small. We used this fact to narrow the set of possible permutations with positive Alexander grading and used a recursive algorithm to generate the restricted set of permutations.
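The pruning idea can be illustrated with a generic recursion: build permutations position by position, abandoning any prefix whose score can no longer reach a required lower bound. The scoring function below is a made-up stand-in for the Alexander-grading constraint, not the one used in our program:

```python
from itertools import permutations

def restricted_perms(n, score, lower):
    """All permutations p of range(n) with sum(score(i, p[i])) >= lower,
    built by a recursion that abandons any prefix which cannot reach
    `lower` even if every later position got its best unused value."""
    results = []

    def extend(prefix, used, partial):
        i = len(prefix)
        if i == n:
            if partial >= lower:
                results.append(tuple(prefix))
            return
        unused = [v for v in range(n) if v not in used]
        # optimistic bound: give each later position its best value,
        # ignoring that values cannot repeat (still a valid upper bound)
        bound = sum(max(score(j, v) for v in unused) for j in range(i, n))
        if partial + bound < lower:
            return  # prune this entire branch
        for v in unused:
            extend(prefix + [v], used | {v}, partial + score(i, v))

    extend([], set(), 0)
    return results

# Illustrative score (NOT the Alexander grading): weight position i by i + 1.
score = lambda i, v: (i + 1) * v
fast = set(restricted_perms(4, score, 20))
slow = {p for p in permutations(range(4))
        if sum(score(i, v) for i, v in enumerate(p)) >= 20}
```

With the bound set to the maximum achievable score, the recursion visits only a handful of prefixes yet agrees with the brute-force filter over all {4!} permutations.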

In general, Gillam and Baldwin’s program seems to be much faster. However, for some knots, such as KT_{2,1}, or for obviously trivial but huge unknots, our program is comparable in speed, and for the unknot example, much faster.

To compile the code, you must install the Haskell packages vector and repa, which provide the Data.Vector and Data.Array.Repa modules (via cabal install vector and cabal install repa), and then run the terminal command:

ghc --make -O2 HFKhat.hs -XBangPatterns -XTypeOperators -XTypeSynonymInstances

If you’re having trouble installing Repa, see the instructions here.

You can download the gzipped tar file here.

To learn more about combinatorial Heegaard Floer homology, check out the references contained in this link.

Note: If you’re a Haskell beginner, I recommend downloading the Haskell Platform for best results.

Final Note: More code, including a general module to compute homology over Z/2Z is available at my website.

Path connectedness is a homotopy invariant

I haven’t updated this blog in a while, due to qualifying exam preparation and research projects. I’ll write about the latter soon.  While studying for the geometry/topology qual, I asked a basic question: Is path connectedness a homotopy invariant? Turns out the answer is yes, and I’ve written up a quick proof of the fact below.  You can view a pdf of this entry here.

Proposition 1
 Let {f : X \rightarrow Y} be a homotopy equivalence. If {X} is path connected, then so is {Y}.

Proof: It’s clear that the image of {f} is path connected. Thus, it is enough to show that any point of {Y} can be connected to a point of {f(X)}. Let {g : Y \rightarrow X} be a map such that {f\circ g} is homotopic to {\text{id}_Y}, via a homotopy {h : Y \times I \rightarrow Y} with {h(\cdot, 0) = f \circ g} and {h(\cdot, 1) = \text{id}_Y}. Let {y \in Y}. Then, {y' = f(g(y)) \in f(X)} and {\gamma(t) = h(y, t)} is a path from {y'} to {y}. \Box

We can deduce the following result from the proof of proposition 1.

Corollary 2 Let {f : X \rightarrow Y} be a homotopy equivalence. Then {X} is path connected if, and only if, {f(X)} is path connected.

Proof: If {X} is path connected, then so is its continuous image {f(X)}. Conversely, if {f(X)} is path connected, then {Y} is path connected, because every point of {Y} can be joined to a point of {f(X)} by the proof of Proposition 1. Applying Proposition 1 to a homotopy inverse {g : Y \rightarrow X} of {f} then shows that {X} is path connected. \Box

A Note On Sheafification

You can view a pdf of this entry here.

A presheaf {\mathcal{F}} on a topological space {X} with values in a category {\mathcal{D}} (usually with values in Grp, CommRing, or Set) can be associated to a sheaf {\widetilde{\mathcal{F}}} on {X} in a natural way. In fact, this association is left adjoint to the inclusion functor from the category of sheaves on {X} to the category of presheaves on {X}.

The standard construction of {\widetilde{\mathcal{F}}} is fairly straightforward: consider the topology on the disjoint union of stalks {\text{Tot}(\mathcal{F}) = \bigcup_{x \in X} \mathcal{F}_x} generated by the collection of sets

\displaystyle \begin{array}{rcl} \mathfrak{V}(b, U) &=& \{ (b_x, x) : x \in U\}, \end{array}

where {U \subseteq X} is open, {b\in \mathcal{F}(U)} and {b_x} is the image of {b} in {\mathcal{F}_x}. Let {p : \text{Tot}(\mathcal{F}) \rightarrow X} be the natural projection. For every open {U \subseteq X}, define {\widetilde{\mathcal{F}}(U)} to be the set of continuous sections {s : U \rightarrow \text{Tot}(\mathcal{F})} of {p}, i.e. {p\circ s = \text{id}_U}. It follows that {\widetilde{\mathcal{F}}} is a sheaf and there is a natural morphism {i : \mathcal{F} \rightarrow \widetilde{\mathcal{F}}} such that {\mathcal{F}} is a sheaf if, and only if, {i} is an isomorphism (this is a fairly easy exercise).

This shows that every sheaf arises as the sheaf of sections of some continuous map onto {X}. This also justifies the terminology: an element {a \in \mathcal{F}(U)} is called a section. At first glance, it is hard to see how this sheaf relates to {\mathcal{F}}. Indeed, the canonical map {i} fails, in general, to be either surjective or injective. This says that we cannot “complete” {\mathcal{F}} by simply adding more elements, or by glueing together elements; we must do both. Thus, we can’t view {\mathcal{F}} as a subpresheaf of {\widetilde{\mathcal{F}}} or as a cover. However, there is another natural way to think of {\widetilde{\mathcal{F}}}.

Let {\mathcal{G}} be a sheaf on {X} with values in some category {\mathcal{D}}, as above. One of the advantages of having a sheaf on {X} is that a section {s \in \mathcal{G}(U)} is uniquely determined by the collection of elements {\{s_x \in \mathcal{G}_x : x \in U\}}. This can fail for presheaves. Thus, we need {i(U): \mathcal{F}(U) \rightarrow \widetilde{\mathcal{F}}(U)} to glue together elements {a, b \in \mathcal{F}(U)} such that {a_x = b_x} for all {x \in U}. This is evident in the commutative diagram:

\displaystyle \begin{array}{rcl} \begin{array}{ccc} \mathcal{F}(U) & \longrightarrow & \widetilde{\mathcal{F}}(U) \\ \downarrow & \; & \downarrow \\ \mathcal{F}_x & \longrightarrow & \widetilde{\mathcal{F}}_x \end{array} \end{array}

This says that {i(U)(a)} and {i(U)(b)} have the same image in {\widetilde{\mathcal{F}}_x} for all {x \in U}. Thus, because {\widetilde{\mathcal{F}}} is a sheaf, {i(U)(a) = i(U)(b)}.

Aside from glueing sections together that “should” be equal, we also need to add more elements to our presheaf {\mathcal{F}}. Suppose we have an open cover of an open subset {U \subseteq X}: {U = \bigcup_\alpha U_\alpha}. If we can find a collection {\{b_\alpha\}}, {b_\alpha \in \mathcal{F}(U_\alpha)}, such that {b_\alpha|_{U_\alpha \cap U_\beta} = b_{\beta}|_{U_\alpha \cap U_\beta}}, then we should be able to find a “global” section {b} on {U} such that {b|_{U_\alpha} = b_\alpha}. Because our presheaf may be too “small” and may not contain this element, we need to ensure that our sheafification {\widetilde{\mathcal{F}}} does. The set {\{b_\alpha\}} is compatible in the obvious sense, so we can glue the sections together to form a section {\hat{b} \in \widetilde{\mathcal{F}}(U)}, {\hat{b} : x \mapsto (b_{\alpha})_x}. This map is continuous because {\hat{b}^{-1}(\mathfrak{V}(a, W)) = \{x \in W \cap U : \hat{b}(x) = a_x\}} is open. Indeed, if {a_x = \hat{b}(x) = (b_\alpha)_x} for some {x}, then there is an open subset {W' \subseteq W \cap U} such that {x\in W'} and {a|_{W'} = (b_\alpha)|_{W'}} in {\mathcal{F}(W')}, so {W' \subseteq \hat{b}^{-1}(\mathfrak{V}(a, W))}. It is also clear that {p \circ \hat{b} = \text{id}_U}.

To sum up our construction, we note the following: to construct {\widetilde{\mathcal{F}}} we find the most “efficient” sheaf such that

  1. sections that have the same germ at every point are equal
  2. compatible sets of sections have a common extension.

This gives the following alternative description of {\widetilde{\mathcal{F}}}:

\displaystyle \begin{array}{rcl} \widetilde{\mathcal{F}}(U) &=& \{ \{(b_{\alpha}, W_\alpha)\}_{\alpha} : U = \bigcup_{\alpha} W_\alpha, b_\alpha \in \mathcal{F}(W_{\alpha}), (b_\alpha)_x = (b_\beta)_x \text{ for all } x \in W_\alpha \cap W_\beta\} /\sim \end{array}

where {\{(b_\alpha, W_\alpha)\}_\alpha \sim \{(a_\beta, V_{\beta})\}_\beta} if, for all {\alpha, \beta} and {x \in W_\alpha \cap V_{\beta}}, {(b_\alpha)_x = (a_\beta)_x}. Note that condition {2}, together with the assumption that the common extension is unique, is just the sheaf axiom; the uniqueness implies condition {1}.

We claim that this definition of {\widetilde{\mathcal{F}}} is equivalent to the one given above. Indeed, we can view each collection {\{(b_\alpha, W_\alpha)\}_\alpha} as a section {\hat{b}} in the natural way: {\hat{b}(x) = (b_\alpha)_x} if {x \in W_\alpha}. This is independent of the representative of {\{(b_\alpha, W_\alpha)\}_\alpha} under {\sim}. Note that {\hat{b}} is continuous, as we showed earlier. Further, any section {s : U \rightarrow \text{Tot}(\mathcal{F})} arises in this way. Indeed, let {x \in U}; then {s(x) \in \mathfrak{V}(b^x, W_x)} for some neighborhood {W_x} of {x} and some element {b^x \in \mathcal{F}(W_x)}. Thus, because {s} is continuous, there exists an open neighborhood {W_x'} of {x} with {W_x' \subseteq s^{-1}( \mathfrak{V}(b^x, W_x)) = \{ y \in W_x \cap U : s(y) = (b^x)_y\}}. Consider the collection {b = \{(b^x, W_x')\}_{x\in U}}. Because {(b^x)_y = s(y) = (b^z)_y} for all {y \in W_x' \cap W_z'}, it follows that {b} corresponds to {s}. This shows that the correspondence is surjective. We claim that this correspondence is also injective. Indeed, if {\{(b_\alpha, W_\alpha)\}_\alpha} and {\{(a_\beta, V_\beta)\}_\beta} correspond to the same section {s}, then {(b_\alpha)_x = s(x) = (a_\beta)_x} for all {x \in W_\alpha \cap V_\beta}. Thus, {\{(b_\alpha, W_\alpha)\}_\alpha = \{(a_\beta, V_\beta)\}_\beta}. Finally, we note that this correspondence is actually a morphism of presheaves: restricting a collection {\{(b_\alpha, W_\alpha)\}_\alpha \in \widetilde{\mathcal{F}}(U)} to a subset {U'\subseteq U} yields the element {\{(b_\alpha|_{W_\alpha \cap U'}, W_\alpha\cap U')\}_{\alpha}}, and this corresponds to the restricted section {\hat{b}|_{U'}}.
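To make the compatible-families description concrete, here is a toy computation on the two-point discrete space: the presheaf of constant {0,1}-valued functions sheafifies to the larger sheaf of all {0,1}-valued functions, with the new sections glued from compatible families on the cover by singletons. The encoding of sections as dictionaries is just one convenient choice:

```python
from itertools import product

# Toy sheafification on the two-point discrete space X = {1, 2}.
# The presheaf F assigns to each open U the constant {0,1}-valued
# functions on U; a section over U is encoded as a dict {point: value}.

X = frozenset({1, 2})

def F(U):
    """Constant functions U -> {0, 1}; over the empty set, one empty section."""
    if not U:
        return [dict()]
    return [{x: c for x in U} for c in (0, 1)]

def sheafify(U):
    """Sections of the sheafification over U: compatible families on the
    cover of U by singletons, glued into a single dict.  On a discrete
    space the germ at x is just the value at x, so every family of
    one-point sections is automatically compatible."""
    if not U:
        return [dict()]
    choices = [F(frozenset({x})) for x in sorted(U)]
    return [{k: v for section in family for k, v in section.items()}
            for family in product(*choices)]
```

The presheaf has only the two constant sections over {X}, while the sheafification has four, e.g. the locally constant section sending {1 \mapsto 0} and {2 \mapsto 1}, glued from the compatible family ({0} on {\{1\}}, {1} on {\{2\}}).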

This new definition has some utility. It helps to illuminate the following claims:

Claim 1 Let {i : \mathcal{F} \rightarrow \widetilde{\mathcal{F}}} be the natural map: {i(U)(b) = \{(b, U)\}}. Then {\mathcal{F}} is a sheaf if, and only if, {i} is an isomorphism.

Proof: If {\mathcal{F}} is a sheaf then each collection {\{(b_{\alpha}, W_\alpha)\}_{\alpha} \in \widetilde{\mathcal{F}}(U)} can be written uniquely as {\{(b, U)\}} for some {b \in \mathcal{F}(U)}. Conversely, suppose {i} is an isomorphism, let {U \subseteq X} be an open subset, let {U = \bigcup W_\alpha}, and let {\{b_\alpha : b_\alpha \in \mathcal{F}(W_\alpha)\}} be a collection such that {b_{\alpha}|_{W_\alpha \cap W_\beta} = b_{\beta}|_{W_\alpha \cap W_\beta}}. Then {(b_{\alpha})_x = (b_\beta)_x} for all {x \in W_\alpha \cap W_\beta}, so {\{(b_\alpha, W_\alpha)\}_\alpha \in \widetilde{\mathcal{F}}(U)}. Thus, there exists a unique element {b \in \mathcal{F}(U)} such that {i(U)(b)= \{(b_\alpha, W_\alpha)\}_\alpha}. Therefore, for each {\alpha}, {b_x = (b_\alpha)_x} for all {x \in W_\alpha}. In particular, for each fixed {\alpha}, {i(W_\alpha)(b|_{W_\alpha}) = \{(b|_{W_\alpha}, W_\alpha)\} = \{(b_\alpha, W_\alpha)\} = i(W_\alpha)(b_\alpha)}, whence, {b|_{W_\alpha} = b_\alpha}, as desired. \Box

Now, suppose that {\mathcal{G}} is a sheaf on {X}. Let {j : \mathcal{F} \rightarrow \mathcal{G}} be a morphism of presheaves.

Claim 2 There exists a unique map {\widetilde{j} : \widetilde{\mathcal{F}} \rightarrow \mathcal{G}}, such that {\widetilde{j} \circ i = j}.

Proof: Let {U \subseteq X} be an open subset and let {\{(b_\alpha, W_\alpha)\}_\alpha \in \widetilde{\mathcal{F}}(U)}. Observe that the commutative diagrams

\displaystyle \begin{array}{rcl} \begin{array}{ccc} \mathcal{F}(W_\alpha) & \stackrel{j(W_\alpha)}{\longrightarrow} & \mathcal{G}(W_\alpha) \\ \downarrow & \; & \downarrow \\ \mathcal{F}(W_\alpha\cap W_\beta) & \stackrel{j(W_\alpha\cap W_\beta)}{\longrightarrow} & \mathcal{G}(W_\alpha\cap W_\beta) \\ \downarrow & \; & \downarrow \\ \mathcal{F}_x & \longrightarrow & \mathcal{G}_x \end{array} \end{array}

ensure that {j(W_\alpha)(b_\alpha)_x = j(W_\beta)(b_\beta)_x} for all {x \in W_\alpha \cap W_\beta}. Thus, {j(W_\alpha)(b_\alpha)|_{W_\alpha \cap W_\beta} = j(W_\beta)(b_\beta)|_{W_\alpha \cap W_\beta}} and so by the sheaf axiom for {\mathcal{G}}, there is a unique extension {\widetilde{j}(U)(\{(b_\alpha, W_\alpha)\}_\alpha) \in \mathcal{G}(U)}. For convenience, we can view {\mathcal{G}} as {\widetilde{\mathcal{G}}}, with {\widetilde{j}(U)(\{(b_\alpha, W_\alpha)\}_\alpha) =\{( j(U)(b_\alpha), W_\alpha)\}_\alpha}. Now, if {U' \subseteq U} is an open set, then

\displaystyle \begin{array}{rcl} \widetilde{j}(U')( \{(b_\alpha, W_\alpha)\}_\alpha|_{U'}) &=& \widetilde{j}(U')(\{(b_\alpha|_{U'}, W_\alpha\cap U')\}_\alpha) \\ &=& \{(j(U')(b_\alpha|_{U'}), W_\alpha\cap U')\}_\alpha \\ &=& \{(j(U)(b_\alpha)|_{U'}, W_\alpha\cap U')\}_\alpha \\ &=& \{(j(U)(b_\alpha), W_\alpha )\}_\alpha|_{U'} \\ &=& \widetilde{j}(U) (\{(b_\alpha, W_\alpha)\})|_{U'}. \end{array}

Thus, {\widetilde{j}} is a morphism {\widetilde{j} : \widetilde{\mathcal{F}} \rightarrow \mathcal{G}}. From the definition of {\widetilde{j}}, it is clear that {\widetilde{j} \circ i = j} and {\widetilde{j}} is unique. \Box

A (Slightly) Different Proof of Burnside’s Theorem

A pdf of this post is available here.

1. Introduction

Burnside’s theorem, first proved in the early 20th century by William Burnside, shows that a group of order {p^aq^b}, where {p} and {q} are primes and {a, b \in \mathbf{N} \cup \{0\}}, is solvable. The original proof of Burnside’s theorem utilized representation theory in an essential way. A proof using purely group theoretical tools was given by D. Goldschmidt around 1970. The aim of this note is to provide yet another proof of Burnside’s theorem, via representation theory. First we establish a fundamental definition: A group {G}, with identity {1}, is solvable if there exists a chain of subgroups

\displaystyle \begin{array}{rcl} G = G_0 \supseteq G_1 \supseteq \cdots \supseteq G_{k-1} \supseteq G_k = \{1\} \end{array}

such that {G_i} is normal in {G_{i-1}} and {G_{i-1}/G_i} is abelian for {i = 1, \cdots, k}. Such a chain is called an abelian tower. Thus, for instance, all abelian groups are solvable. We also get the easy result: If {P} is a non-trivial {p}-group, then {P} is solvable and has a non-trivial center. Proof: Note that {P} acts on itself by conjugation, so by the class equation:

\displaystyle \begin{array}{rcl} |P| &=& |Z(P)| + \sum (P : \text{Stab}(x)). \end{array}

where the sum is over the distinct conjugacy classes containing more than one element, and {Z(P)} is the center of {P}. Because {p} divides {|P|} and each index {(P:\text{Stab}(x))} appearing in the sum is divisible by {p}, it follows that {p} divides {|Z(P)|}; since {|Z(P)| \geq 1}, we conclude that {|Z(P)| > 1}. Now, note that a group of order {p} is abelian, so it is solvable. Thus, by induction, {P/Z(P)} is solvable. We can lift any abelian tower in {P/Z(P)} to get an abelian tower in {P}, as desired. \Box
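The class-equation argument is easy to check mechanically on a small example. Below, the dihedral group of order {8} is generated as {2\times 2} integer matrices (one convenient realization); the computed center is non-trivial, and every non-central conjugacy class has size divisible by {p = 2}:

```python
def mat_mul(a, b):
    """Multiply two 2x2 matrices given as nested tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

I2 = ((1, 0), (0, 1))
r = ((0, -1), (1, 0))    # rotation by 90 degrees
s = ((1, 0), (0, -1))    # reflection

# close {r, s} under multiplication to obtain the dihedral group of order 8
G = {I2}
frontier = [I2]
while frontier:
    g = frontier.pop()
    for h in (r, s):
        p = mat_mul(g, h)
        if p not in G:
            G.add(p)
            frontier.append(p)

center = {z for z in G if all(mat_mul(z, g) == mat_mul(g, z) for g in G)}

inverse = {h: next(k for k in G if mat_mul(h, k) == I2) for h in G}

def conj_class(g):
    """Orbit of g under conjugation by the whole group."""
    return frozenset(mat_mul(mat_mul(h, g), inverse[h]) for h in G)

classes = {conj_class(g) for g in G}
noncentral_sizes = sorted(len(cl) for cl in classes if len(cl) > 1)
```

Here the class equation reads {8 = 2 + 2 + 2 + 2}: a center of order 2 plus three non-central classes of size 2.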

We fix some notation. If {\rho : G \rightarrow \text{GL}(V)} is a complex representation of {G}, then we omit the homomorphism {\rho} and simply write {V} instead. We denote the character of the representation {V} by {\chi_V}. If {g \in G}, then {\text{Cent}_G(g)} is the centralizer of {g} in {G} and {C(g)} is the conjugacy class of {g} in {G}. Note that {(G : \text{Cent}_G(g)) = |C(g)|}, by the orbit-stabilizer formula. We write {\text{Stab}(g)} for the stabilizer of {g} in {G}.

2. The Proof of Burnside’s Theorem

The proof will proceed by a series of claims. We assume that {G} is non-abelian and that {a, b > 0}; otherwise {G} is abelian or a group of prime-power order, hence solvable by section one. We wish to derive a contradiction.

Claim 1 We can, without loss of generality, assume that {G} is simple.

Proof: If {N} is a proper non-trivial normal subgroup of {G}, then {N} and {G/N} are solvable by induction on the group order; therefore, {G} is solvable. \Box

Assuming this, we note that {Z(G) = 1} because {Z(G)} is normal, and {G} is non-abelian. This also implies that every non-trivial representation of {G} is faithful.

Claim 2 Let {V} be an irreducible representation of {G}. Then, for all {g \in G}

\displaystyle \begin{array}{rcl} \frac{|C(g)|\chi_V(g)}{\chi_V(1)} \end{array}

is an algebraic integer.

Proof: Let {C(g_1), \cdots, C(g_k)} be the conjugacy classes of {G}. Then

\displaystyle \begin{array}{rcl} \varphi_{g_i} &=& \sum_{g \in C(g_i)} g \end{array}

is in the center of {\mathbf{Z}[G]} because it is invariant under conjugation. Further, the center of {\mathbf{Z}[G]} is generated by these elements. Indeed, suppose {x = \sum_{g\in G} a(g)g} is in the center of {\mathbf{Z}[G]}. Write {x = \sum_{i=1}^k \sum_{g \in C(g_i)} a(g)g}. Suppose that {h, h' \in C(g_i)}, and choose {y} such that {h' = yhy^{-1}}. Then conjugation by {y} fixes {\sum_{g \in C(g_i)} a(g)g}, so {a(h) = a(h')}. It follows that {a} is a class function, and so {\sum_{g \in C(g_i)} a(g)g = a(C(g_i)) \varphi_{g_i}}. We have shown that the center of {\mathbf{Z}[G]} is finitely generated as a {\mathbf{Z}}-module, and hence every element of it is integral over {\mathbf{Z}}. Note that {\varphi_{g} : V \rightarrow V} is a {G}-invariant map. Thus, by Schur’s lemma there exists {\lambda \in \mathbf{C}} such that {\varphi_{g} = \lambda\text{id}_V}. Because {\varphi_{g}} is integral over {\mathbf{Z}}, and {\lambda} satisfies the same monic polynomial over {\mathbf{Z}} that {\varphi_{g}} does, it follows that {\lambda} is integral over {\mathbf{Z}}. Thus, {\lambda \text{dim} V = \text{Tr}(\varphi_{g}) = |C(g)|\chi_V(g)}, and so {\lambda = \frac{|C(g)|\chi_{V}(g)}{\chi_V(1)}} is integral over {\mathbf{Z}}. \Box
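As a sanity check on Claim 2, for {S_3} (class sizes 1, 3, 2 and the standard character table, taken here as known) the quantities {\frac{|C(g)|\chi_V(g)}{\chi_V(1)}} are ordinary integers:

```python
from fractions import Fraction

# Character table of S3 (a standard fact): rows are the trivial, sign, and
# standard irreducible characters; columns are the classes of the identity,
# a transposition, and a 3-cycle, with class sizes 1, 3, 2.
class_sizes = [1, 3, 2]
char_table = [
    [1, 1, 1],     # trivial
    [1, -1, 1],    # sign
    [2, 0, -1],    # standard 2-dimensional
]

# |C(g)| * chi(g) / chi(1) for every irreducible character and class
central_values = [
    [Fraction(size * chi_g, row[0]) for size, chi_g in zip(class_sizes, row)]
    for row in char_table
]
```

Here the character values are rational, so “algebraic integer” collapses to “integer”; in general the claim is stronger.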

Claim 3 Let {V} be an irreducible representation of {G}. Then {\chi_V(1) \mid |G|}.

Proof: We maintain the notation from the previous claim. Note that by the first orthogonality relation:

\displaystyle \begin{array}{rcl} \sum_{i=1}^k |C(g_i)|\overline{\chi_{V}(C(g_i))}\chi_V(C(g_i)) &=& |G| \end{array}

If we divide both sides by {\text{dim} V}, we get a sum of algebraic integers on the left (each term is the product of the algebraic integers {\frac{|C(g_i)|\chi_V(C(g_i))}{\chi_V(1)}} and {\overline{\chi_{V}(C(g_i))}}), and the rational number {\frac{|G|}{\text{dim} V}} on the right. A rational number that is an algebraic integer is an integer, so {\text{dim} V \mid |G|}. \Box
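Here is a numerical instance of the relation just used, for the standard 2-dimensional character of {S_3} (character values taken as known): the weighted sum of {|\chi_V|^2} over classes equals {|G| = 6}, and dividing by {\text{dim} V = 2} leaves an integer:

```python
# First orthogonality relation for the standard character of S3:
# sum over classes of |C(g_i)| * |chi(g_i)|^2 equals |G| = 6,
# and dividing by dim V = 2 leaves the integer 3.
class_sizes = [1, 3, 2]    # identity, transpositions, 3-cycles
chi = [2, 0, -1]           # standard 2-dimensional character (real-valued)
order = 6
lhs = sum(size * value * value for size, value in zip(class_sizes, chi))
```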

Claim 4 G has no non-trivial 1-dimensional representations.

Proof: If {V} is a non-trivial 1-dimensional representation, then it embeds {G \hookrightarrow \mathbf{C}^\ast}, so {G} is abelian. This is a contradiction. \Box

Claim 5 For every {g \in G}, {g \neq 1}, there exists an irreducible representation {V} such that {\chi_{V}(g) \neq 0} and {\text{dim} V = p^r} for some {r > 0}.

Proof: Let {V_1, \cdots, V_k} be the irreducible representations of {G}, where {V_1} is the trivial representation. By claim 2.3, {\text{dim}(V_i) \mid |G|} for all {i}, so {\chi_{V_i}(1) = p^{c}q^d} for some {c \leq a} and {d \leq b}. If {r_G} is the character of the regular representation, then {r_G(g) = \sum_{i} \chi_{V_i}(1) \chi_{V_i}(g)} is {0} if {g \neq 1}, and {|G|} otherwise. Now, suppose for contradiction that for each {i = 2, \cdots, k}, either {q} divides {\chi_{V_i}(1)} or {\chi_{V_i}(g) = 0}. Because the trivial representation contributes {\chi_{V_1}(1)\chi_{V_1}(g) = 1} and {r_G(g) = 0},

\displaystyle \begin{array}{rcl} \sum_{i=2}^k \chi_{V_i}(1) \chi_{V_i}(g) &=& -1 \end{array}

This is a contradiction: if we divide both sides by {q}, we get an algebraic integer on the left (every non-zero term has {q \mid \chi_{V_i}(1)}), and the non-integral rational number {-\frac{1}{q}} on the right. It follows that there is an {i} with {\chi_{V_i}(g) \neq 0} and {q \nmid \chi_{V_i}(1)}; because {G} has no non-trivial 1-dimensional representations (claim 2.4), {\chi_{V_i}(1) = p^r} for some {r > 0}. \Box

The following claim holds for any finite group and any faithful representation.

Claim 6 If there exists {g \in G} and a representation {V} such that {|\chi_V(g)| = \chi_V(1)}, then {g \in Z(G)}.

Proof: Because the image of {g} is unitary with respect to a {G}-invariant inner product, it is diagonalizable with eigenvalues of modulus one: {g = \text{diag}(\lambda_1, \cdots, \lambda_{\chi_V(1)})} in a suitable basis. Note that {|\lambda_1 + \lambda_2| \leq |\lambda_1| + |\lambda_2|}, with equality for unit complex numbers if, and only if, {\lambda_1 = \lambda_2}. Thus, because

\displaystyle \begin{array}{rcl} \chi_V(1) = \left|\sum_{i=1}^{\chi_V(1)}\lambda_i \right| \leq \sum_{i=1}^{\chi_{V}(1)} |\lambda_i|= \chi_{V}(1), \end{array}

it follows by induction that {\lambda_i = \lambda_1} for all {i}. Thus, {g = \lambda_{1}\text{id}}, and so {g} is in the center of the image of {G}; the representation is faithful ({G} is simple), so {G} is isomorphic to its image, and hence {g \in Z(G)}. \Box

By the Sylow theorems, {G} has a {p}-Sylow subgroup {P}. By proposition 1.2, {Z(P)} is non-trivial. Thus, we can choose {g \in Z(P)}, and by taking an appropriate power of {g}, we can assume that {\text{ord} (g) = p}. Note that {P \subseteq \text{Cent}_G(g)}, so {(G : \text{Cent}_G(g)) = |C(g)|} is not divisible by {p}, i.e. {|C(g)| = q^k} for some {k \geq 0}. By claim 2.5, there is an irreducible representation {V} such that {\text{dim} V = p^r} for some {r > 0} and {\chi_{V}(g) \neq 0}. Note that the map

\displaystyle \begin{array}{rcl} G \hookrightarrow \text{GL}(V) \stackrel{\det}{\rightarrow} \mathbf{C}^\ast \end{array}

is a 1-dimensional representation of {G}; thus, it is trivial, so {\det(h) = 1} for all {h \in G}. Let {\lambda_1, \cdots, \lambda_{p^r}} be the eigenvalues of {g}, listed with multiplicity. Note that each {\lambda_i} is a {p}-th root of unity, because {g^{p} = 1}. Order the eigenvalues so that {\lambda_1 = \cdots = \lambda_{j_0} = 1} and {\lambda_i \neq 1} for {i > j_0}. By the formula

\displaystyle \begin{array}{rcl} \prod_{i=j_0+1}^{p^r} \lambda_i = \det (g) = 1, \end{array}

each conjugate of {\lambda_i}, {i > j_0}, appears with the same multiplicity, because the product is stable under the action of {\text{Gal}(\overline{\mathbf{Q}}/ \mathbf{Q})}; in other words, if {\lambda_i} has multiplicity {l}, then each {\lambda_{i'}}, {i' > j_0}, has multiplicity {l}. Therefore, because each {\lambda_i}, {i > j_0}, has {p-1} conjugates, we have {l(p-1) = p^r - j_0}. In particular, {l \equiv j_0 \mod p}. Recall that for any primitive {p}-th root of unity {\zeta_{p}}, {\sum_{i=1}^{p-1} \zeta_{p}^i = -1}; thus,

\displaystyle \begin{array}{rcl} \chi_V(g) &=& \sum_{i=1}^{p^r} \lambda_i \\ &=& \sum_{i=1}^{j_0} \lambda_i + \sum_{i=j_0+1}^{p^r} \lambda_{i} \\ &=& j_0 - l \\ &\equiv& 0 \mod p. \end{array}

By claim 2.2, {\frac{|C(g)| \chi_V(g)}{\chi_V(1)}} is an algebraic integer, and since {\chi_V(g) = j_0 - l} is a rational integer, it is in fact an integer. However, {|C(g)| = q^k} is relatively prime to {p}, so {\frac{\chi_{V}(g)}{\chi_V(1)} \in \mathbf{Z}}, and because {\chi_V(g) \neq 0}, {\chi_V(g) = p^rb} for some non-zero {b \in \mathbf{Z}}. Now, because

\displaystyle \begin{array}{rcl} |\chi_V(g)| \leq \sum_{i=1}^{p^r} |\lambda_i| = \chi_V(1) \end{array}

it follows that {b = \pm 1} and {|\chi_V(g)| = \chi_V(1)}. By claim 2.6, it follows that {g \in Z(G) = \{1\}}, which is a contradiction. Thus, {G} is solvable.
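The eigenvalue bookkeeping above can be illustrated numerically: for {p = 3} and a diagonal matrix of order 3 with determinant 1 and eigenvalues {1, \omega, \omega^2}, we get {j_0 = 1}, multiplicity {l = 1}, and {\chi_V(g) = j_0 - l = 0 \equiv 0 \mod p}. (In the proof the trace is additionally non-zero, which is what forces {|\chi_V(g)| = \chi_V(1)}.) The numbers here are a toy choice:

```python
import cmath

p = 3
omega = cmath.exp(2j * cmath.pi / p)      # primitive p-th root of unity
eigenvalues = [1, omega, omega ** 2]      # an order-p matrix with det 1

det = eigenvalues[0] * eigenvalues[1] * eigenvalues[2]
trace = sum(eigenvalues)                  # this is chi_V(g)
j0 = sum(1 for lam in eigenvalues if abs(lam - 1) < 1e-9)
mult = 1                                  # multiplicity of each primitive root
# trace = j0 - mult = 0, which is divisible by p
```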

3. Consequences

There are several key steps in this argument. The most important assumptions are that {G} is simple, and for any {g \in G}, there is a prime power dimensional representation {V} such that {\chi_V(g) \neq 0}. Retracing the steps of the proof yields the following consequence:

Corollary 1 Suppose {G} is a simple group and {V} is an irreducible representation of {G} of dimension {p^r}, for some { r > 0}. Let {P} be a {p}-Sylow subgroup of {G}. If {g \in Z(P)} has order {p} and {\chi_V(g) \neq 0}, then {G} is abelian.