Home page for accesible maths 7 Polynomials

Style control - access keys in brackets

Font (2 3) - + Letter spacing (4 5) - + Word spacing (6 7) - + Line spacing (8 9) - +

7.1 Polynomial arithmetic

Our first definition will be familiar to most, although our notation and terminology may be a little different.

Definition 7.1.1

A (real) polynomial in the indeterminate XX is an expression of the form

a0+a1X+a2X2++anXn,a_{0}+a_{1}X+a_{2}X^{2}+\cdots+a_{n}X^{n},

where n0n\in\mathbb{N}_{0} and a0,a1,a2,,ana_{0},a_{1},a_{2},\dots,a_{n}\in\mathbb{R}. The numbers a0,a1,a2,,ana_{0},a_{1},a_{2},\dots,a_{n} are called the coefficients of the polynomial.

We shall often write f(X)f(X), g(X)g(X), etc., to represent polynomials.

Example 7.1.2

f(X)=1+2X+3X2f(X)=1+2X+3X^{2} and g(X)=32-7Xg(X)=\frac{3}{2}-7X are both polynomials.

Remark 7.1.3
  1. (i)

    In a polynomial the terms may be written in any order; for instance, in Example 7.1.2 we could also write

    f(X)=2X+3X2+1,  or  f(X)=3X2+2X+1.f(X)=2X+3X^{2}+1,\qquad\text{or}\qquad f(X)=3X^{2}+2X+1.
  2. (ii)

    Any term with coefficient zero can be omitted, so that 2+0X+3X22+0X+3X^{2} is the same polynomial as 2+3X22+3X^{2}. The zero polynomial is simply written 00; by removing or inserting zero terms if necessary, any non-zero polynomial can be written in the form a0+a1X+a2X2++anXna_{0}+a_{1}X+a_{2}X^{2}+\cdots+a_{n}X^{n} with an0a_{n}\neq 0.

Definition 7.1.4

If f(X)=a0+a1X+a2X2++anXnf(X)=a_{0}+a_{1}X+a_{2}X^{2}+\cdots+a_{n}X^{n} is a non-zero polynomial with an0a_{n}\neq 0, then nn is the degree of f(X)f(X), written degf(X)\deg f(X), and anXna_{n}X^{n} is its highest (or leading) term. The zero polynomial has no degree and no highest term.

Example 7.1.5

The polynomial 1-5X+2X31-5X+2X^{3} has degree 33 and highest term 2X3.2X^{3}.

Note that polynomials of degree 00 are just non-zero real numbers, such as -6-6 or 73\frac{7}{3} (called constants, or scalars). Polynomials of degree 11, such as X+3X+3 or 4X-74X-7, are called linear polynomials (because the graph of the function f(x)=ax+bf(x)=ax+b is a straight line); those of degree 22, such as X2-3X+7X^{2}-3X+7, are quadratic; those of degree 33, such as 2X3-3X+52X^{3}-3X+5, are cubic.

Remark 7.1.6

We can now state precisely what it means for two polynomials f(X)f(X) and g(X)g(X) to be equal. Indeed, f(X)=g(X)f(X)=g(X) if and only if either f(X)f(X) and g(X)g(X) are both the zero polynomial, or they have the same degree, say n0n\in\mathbb{N}_{0}, and when they are written in the form

f(X)=a0+a1X+a2X2++anXn  and  g(X)=b0+b1X+b2X2++bnXn,f(X)=a_{0}+a_{1}X+a_{2}X^{2}+\cdots+a_{n}X^{n}\quad\ \text{and}\quad\ g(X)=b_{% 0}+b_{1}X+b_{2}X^{2}+\cdots+b_{n}X^{n},

we have aj=bja_{j}=b_{j} for each j{0,1,,n}.j\in\{0,1,\ldots,n\}.

Polynomials have an arithmetic which is rather similar to that of the integers. We can always add or subtract two polynomials to obtain another, by adding or subtracting ‘‘like terms’’ (in which the power of XX is the same). We can also multiply any two polynomials to obtain another; this is more complicated, but just involves multiplying ‘‘term by term’’ according to the rule ajXjbkXk=ajbkXj+ka_{j}X^{j}\cdot b_{k}X^{k}=a_{j}b_{k}X^{j+k} for j,k0j,k\in\mathbb{N}_{0}, and then gathering together like terms.

Example 7.1.7

Let f(X)=2+3Xf(X)=2+3X and g(X)=1+4X-6X2g(X)=1+4X-6X^{2}. Then

f(X)+g(X)\displaystyle f(X)+g(X) =3+7X-6X2,f(X)-g(X)=1-X+6X2,\displaystyle=3+7X-6X^{2},\qquad\qquad f(X)-g(X)=1-X+6X^{2},
f(X)g(X)\displaystyle f(X)g(X) =(2+3X)(1+4X-6X2)\displaystyle=(2+3X)(1+4X-6X^{2})
=2+8X-12X2+3X+12X2-18X3=2+11X-18X3.\displaystyle=2+8X-12X^{2}+3X+12X^{2}-18X^{3}=2+11X-18X^{3}.

Note that if f(X)f(X) and g(X)g(X) are non-zero polynomials with highest terms amXma_{m}X^{m} and bnXnb_{n}X^{n}, respectively, then the highest term of the product f(X)g(X)f(X)g(X) is ambnXm+n.a_{m}b_{n}X^{m+n}. Thus the degree of the product f(X)g(X)f(X)g(X) is the sum of the degrees of f(X)f(X) and g(X)g(X):

deg(f(X)g(X))=degf(X)+degg(X).\deg\bigl(f(X)g(X)\bigr)=\deg f(X)+\deg g(X). (7.1.1)

This easy observation has an important consequence. We saw in the example above that when multiplying two polynomials it may sometimes happen that certain terms ‘‘cancel out’’ (as -12X2-12X^{2} and 12X212X^{2}). However, it is not possible for all terms to do so, because the highest term cannot. Thus the product of two non-zero polynomials is non-zero; or to put it another way,

iff(X)g(X)=0, thenf(X)=0org(X)=0.\hbox{if}\quad f(X)g(X)=0,\quad\hbox{then}\quad f(X)=0\quad\hbox{or}\quad g(X)% =0. (7.1.2)

As in the case of \mathbb{Z}, we refer to this fact by saying that the polynomials have no zero divisors.

So far everything has been fairly straightforward; just as with the integers, however, it is when we come to division that things get difficult. If g(X)g(X) divides f(X)f(X), meaning that f(X)=q(X)g(X)f(X)=q(X)g(X) for some q(X)q(X), we shall write g(X)|f(X)g(X)|f(X); but our observation above on highest terms in products makes it easy to see that this does not always happen. For example, let f(X)=X+1f(X)=X+1 and g(X)=X2+2g(X)=X^{2}+2. If f(X)=q(X)g(X)f(X)=q(X)g(X) for some q(X)q(X), then q(X)q(X) would have to be non-zero (as f(X)f(X) is), so that the degree of the right-hand side would be at least 22 (the degree of g(X)g(X)), while the degree of the left-hand side is 11 — a contradiction.

Essentially the problem here is that we are trying to divide something ‘‘small’’ by something ‘‘large’’; X2+2X^{2}+2 is ‘‘too big’’ to divide X+1X+1. We met a similar situation when we first came across division of integers: 88 is ‘‘too big’’ to divide 33, for example. In that setting we found that there are two possible ways forward. One is to accept that, in dividing by a number dd, we cannot divide anything smaller than dd; thus we look for an answer of the form ‘‘quotient with remainder’’ (as we did in Chapter 4). The other is to enlarge our number system by introducing fractions (as we did when constructing \mathbb{Q} from \mathbb{Z} in Section 6.4), which then allows us to divide any number by any other, provided of course that the second is non-zero.

Each of these two approaches has its counterpart in the world of polynomials. Here we shall take the first. Thus, given two polynomials f(X)f(X) and g(X)g(X) (with g(X)0g(X)\neq 0), we seek to divide f(X)f(X) by g(X)g(X) to obtain a quotient q(X)q(X) and a remainder r(X)r(X) satisfying

f(X)=q(X)g(X)+r(X).f(X)=q(X)g(X)+r(X).

The aim is to do this in such a way that r(X)r(X) is ‘‘smaller’’ than g(X)g(X); we use the degree of a polynomial as a measure of its size, so we want the degree of r(X)r(X) to be less than that of g(X)g(X) (or r(X)=0r(X)=0). It is not immediately obvious that we can always do this, but it is in fact possible. Briefly, the trick is to do the division step by step, reducing the degree of the polynomial being divided at each stage. We call this method of division with remainder the division algorithm for polynomials. Before we formally state and prove this theorem, let us consider an example.

Example 7.1.8

Use the division algorithm to divide 6X3+7X2+4X-16X^{3}+7X^{2}+4X-1 by 3X+23X+2.

Solution.

2X2+X+233X+26X3+7X2+4X-16X3+4X23X2+4X3X2+2X2X-12X+43-73;\begin{array}[]{r|rrrr}\omit&&2X^{2}&\!\!\!\!\!{}+\phantom{4}X&\!\!\!\!\!{}+% \frac{2}{3}\\ \cline{2-5}3X+2&6X^{3}&\!\!\!\!\!{}+7X^{2}&\!\!\!\!\!{}+4X&\!\!\!\!\!{}-1\\ \omit&6X^{3}&\!\!\!\!\!{}+4X^{2}&&\\ \cline{2-3}\omit&&\!\!\!\!\!3X^{2}&\!\!\!\!\!{}+4X&\\ \omit&&\!\!\!\!\!3X^{2}&\!\!\!\!\!{}+2X&\\ \cline{3-4}\omit&&&\!\!\!\!\!2X&\!\!\!\!\!{}-1\\ \omit&&&\!\!\!\!\!2X&\!\!\!\!\!{}+\frac{4}{3}\\ \cline{4-5}\omit&&&&\!\!\!\!\!{}-\frac{7}{3}{\text{\makebox[0.0pt][l]{$;$}}}\\ \end{array}

thus 6X3+7X2+4X-1=(2X2+X+23)(3X+2)+(-73).6X^{3}+7X^{2}+4X-1=(2X^{2}+X+\frac{2}{3})(3X+2)+\bigl(-\frac{7}{3}\bigr).

The following lemma formalizes the ‘‘reduction-of-degree’’ approach that the division algorithm relies on.

Lemma 7.1.9

Let f(X)f(X) and g(X)g(X) be non-zero polynomials with degf(X)deggX),\deg f(X)\geqslant\deg gX), say

f(X)=a0+a1X+a2X2++amXm𝑎𝑛𝑑g(X)=b0+b1X+b2X2++bnXn,f(X)=a_{0}+a_{1}X+a_{2}X^{2}+\cdots+a_{m}X^{m}\quad\text{and}\quad g(X)=b_{0}+% b_{1}X+b_{2}X^{2}+\cdots+b_{n}X^{n},

where am0a_{m}\neq 0 and bn0b_{n}\neq 0. Define h(X)=ambnXm-nh(X)=\frac{a_{m}}{b_{n}}X^{m-n}. Then either f(X)=h(X)g(X)f(X)=h(X)g(X) or

deg(f(X)-h(X)g(X))<degf(X).\deg\big(f(X)-h(X)g(X)\bigr)<\deg f(X).

Proof. We have

h(X)g(X)\displaystyle h(X)g(X) =ambnXm-n(bnXn+bn-1Xn-1++b1X+b0)\displaystyle=\frac{a_{m}}{b_{n}}X^{m-n}(b_{n}X^{n}+b_{n-1}X^{n-1}+\cdots+b_{1% }X+b_{0})
=amXm+ambn-1bnXm-1++amb1bnXm-n+1+amb0bnXm-n,\displaystyle=a_{m}X^{m}+\frac{a_{m}b_{n-1}}{b_{n}}X^{m-1}+\cdots+\frac{a_{m}b% _{1}}{b_{n}}X^{m-n+1}+\frac{a_{m}b_{0}}{b_{n}}X^{m-n},

so that

f(X)-h(X)g(X)\displaystyle f(X)-h(X)g(X) =(amXm+am-1Xm-1++a1X+a0)\displaystyle=(a_{m}X^{m}+a_{m-1}X^{m-1}+\cdots+a_{1}X+a_{0})
-(amXm+ambn-1bnXm-1++amb1bnXm-n+1+amb0bnXm-n)\displaystyle\quad\ \mbox{}-\Bigl(a_{m}X^{m}+\frac{a_{m}b_{n-1}}{b_{n}}X^{m-1}% +\cdots+\frac{a_{m}b_{1}}{b_{n}}X^{m-n+1}+\frac{a_{m}b_{0}}{b_{n}}X^{m-n}\Bigr)
=(am-1-ambn-1bn)Xm-1+.\displaystyle=\Bigl(a_{m-1}-\frac{a_{m}b_{n-1}}{b_{n}}\Bigr)X^{m-1}+\cdots.

This shows that either f(X)-h(X)g(X)=0f(X)-h(X)g(X)=0 (if all the coefficients on the right-hand side vanish), or this polynomial has degree at most m-1.m-1. \Box

Theorem 7.1.10

(Polynomial division with remainder.) Let f(X)f(X) and g(X)g(X) be polynomials, and suppose that g(X)0g(X)\neq 0. Then there are unique polynomials q(X)q(X) and r(X)r(X) such that f(X)=q(X)g(X)+r(X)f(X)=q(X)g(X)+r(X) and either r(X)=0r(X)=0 or degr(X)<degg(X)\deg r(X)<\deg g(X).

Terminology: the polynomial q(X)q(X) is called the quotient, while r(X)r(X) is the remainder.

Proof. We begin with the existence part. In the special case where g(X)g(X) divides f(X)f(X), we have f(X)=q(X)g(X)f(X)=q(X)g(X) for some polynomial q(X)q(X), and we may simply take r(X)=0r(X)=0. Note: this case covers f(X)=0f(X)=0 (because all polynomials divide 00), and also the case where g(X)g(X) has degree 00 (because if g(X)=cg(X)=c for some c{0}c\in\mathbb{R}\setminus\{0\}, then we have f(X)=(c-1f(X))g(X)f(X)=(c^{-1}f(X))g(X)). Hence we may suppose that f(X)0f(X)\neq 0, so that m:=degf(X)0m:=\deg f(X)\in\mathbb{N}_{0} is defined, and that n:=degg(X)1n:=\deg g(X)\geqslant 1. In this case we establish the existence of q(X)q(X) and r(X)r(X) by induction on mm, using the Generalized (or Second) Principle of Induction, where the inductive step consists of showing that if the result holds for each k<mk<m (not just for k=m-1k=m-1), then it also holds for mm.

The base step (m=0m=0) is easy. Since m=0<nm=0<n, q(X)=0q(X)=0 and r(X)=f(X)r(X)=f(X) have the required properties.

Now let m1m\geqslant 1 and assume inductively that, for each non-zero polynomial f0(X)f_{0}(X) of degree less than mm, there are polynomials q0(X)q_{0}(X) and r0(X)r_{0}(X) such that

f0(X)=q0(X)g(X)+r0(X)and eitherr0(X)=0ordegr0(X)<n.f_{0}(X)=q_{0}(X)g(X)+r_{0}(X)\quad\text{and either}\quad r_{0}(X)=0\quad\text% {or}\quad\deg r_{0}(X)<n. (7.1.3)

If m<nm<n, then we can take q(X)=0q(X)=0 and r(X)=f(X)r(X)=f(X). Otherwise we choose h(X)h(X) as in Lemma 7.1.9 and define f0(X)=f(X)-h(X)g(X)f_{0}(X)=f(X)-h(X)g(X), so that either f0(X)=0f_{0}(X)=0 or degf0(X)<m\deg f_{0}(X)<m. In the first case, we can take q(X)=h(X)q(X)=h(X) and r(X)=0r(X)=0. In the second case, the induction hypothesis implies that there are polynomials q0(X)q_{0}(X) and r0(X)r_{0}(X) satisfying (7.1.3), and we have

f(X)=f0(X)+h(X)g(X)\displaystyle f(X)=f_{0}(X)+h(X)g(X) =q0(X)g(X)+r0(X)+h(X)g(X)\displaystyle=q_{0}(X)g(X)+r_{0}(X)+h(X)g(X)
=(q0(X)+h(X))g(X)+r0(X).\displaystyle=\bigl(q_{0}(X)+h(X)\bigr)g(X)+r_{0}(X).

Hence the identity f(X)=q(X)g(X)+r(X)f(X)=q(X)g(X)+r(X) holds if we define q(X)=q0(X)+h(X)q(X)=q_{0}(X)+h(X) and r(X)=r0(X)r(X)=r_{0}(X), and the second half of (7.1.3) ensures that this choice of r(X)r(X) has the required properties. Thus the induction continues. Note that this inductive argument is essentially just a formal version of the division algorithm that we used in Example 7.1.8.

To prove that the polynomials q(X)q(X) and r(X)r(X) are unique, suppose that q1(X)q_{1}(X), q2(X)q_{2}(X), r1(X)r_{1}(X) and r2(X)r_{2}(X) are polynomials such that f(X)=qj(X)g(X)+rj(X)f(X)=q_{j}(X)g(X)+r_{j}(X) and either we have rj(X)=0r_{j}(X)=0 or degrj(X)<degg(X)\deg r_{j}(X)<\deg g(X) for j=1,2j=1,2. Our aim is to show that q1(X)=q2(X)q_{1}(X)=q_{2}(X) and r1(X)=r2(X)r_{1}(X)=r_{2}(X). Rearranging the defining equations, we see that

rj(X)=f(X)-qj(X)g(X)  for  j=1,2,r_{j}(X)=f(X)-q_{j}(X)g(X)\qquad\text{for}\qquad j=1,2,

which implies that

r1(X)-r2(X)\displaystyle r_{1}(X)-r_{2}(X) =(f(X)-q1(X)g(X))-(f(X)-q2(X)g(X))\displaystyle=\bigl(f(X)-q_{1}(X)g(X)\bigr)-\bigl(f(X)-q_{2}(X)g(X)\bigr)
=f(X)-q1(X)g(X)-f(X)+q2(X)g(X)\displaystyle=f(X)-q_{1}(X)g(X)-f(X)+q_{2}(X)g(X)
=q2(X)g(X)-q1(X)g(X)=(q2(X)-q1(X))g(X).\displaystyle=q_{2}(X)g(X)-q_{1}(X)g(X)=\bigl(q_{2}(X)-q_{1}(X)\bigr)g(X). (7.1.4)

Assume towards a contradiction that r1(X)r2(X)r_{1}(X)\neq r_{2}(X), so that the left-hand side of (7.1) is non-zero. On the one hand, this implies that

deg(r1(X)-r2(X))=deg(q1(X)-q2(X))+degg(X)degg(X)\deg\bigl(r_{1}(X)-r_{2}(X)\bigr)=\deg\bigl(q_{1}(X)-q_{2}(X)\bigr)+\deg g(X)% \geqslant\deg g(X)

by (7.1.1). On the other hand, deg(r1(X)-r2(X))<degg(X)\deg\bigl(r_{1}(X)-r_{2}(X)\bigr)<\deg g(X) because r1(X)r_{1}(X) and r2(X)r_{2}(X) are either 00 or of smaller degree than g(X)g(X). This is clearly impossible, so we must have r1(X)=r2(X)r_{1}(X)=r_{2}(X). Combining this with (7.1), we obtain (q2(X)-q1(X))g(X)=0\bigl(q_{2}(X)-q_{1}(X)\bigr)g(X)=0. Since g(X)0g(X)\neq 0, we deduce that q1(X)=q2(X)q_{1}(X)=q_{2}(X) by (7.1.2). \Box

We conclude this section with two remarks about polynomials in general.

Remark 7.1.11
  1. (i)

    One can consider polynomials over number systems other than \mathbb{R}, such as \mathbb{Z} (called ‘‘integral polynomials’’), \mathbb{Q} (‘‘rational polynomials’’) or \mathbb{C} (‘‘complex polynomials’’), or even m\mathbb{Z}_{m} (called ‘‘polynomials over m\mathbb{Z}_{m}’’) for m{2,3,4,}m\in\{2,3,4,\ldots\}.

  2. (ii)

    We do not think of polynomials as functions (that is, ‘‘machines’’ that map one real number to another). This is why we refer to XX as an ‘‘indeterminate’’, not a ‘‘variable’’. To emphasize this difference, we use a capital XX rather than a small xx.

    To exemplify the difference between polynomials and functions, consider the polynomials f(X)=Xf(X)=X and g(X)=X2g(X)=X^{2} over 2\mathbb{Z}_{2}. They are, by definition, distinct polynomials; however, we have

    f(0^)=0^=g(0^)  and  f(1^)=1^=g(1^)f(\widehat{0})=\widehat{0}=g(\widehat{0})\qquad\hbox{and}\qquad f(\widehat{1})% =\widehat{1}=g(\widehat{1})

    so if we regarded ff and gg as functions from 2\mathbb{Z}_{2} to 2\mathbb{Z}_{2}, they would actually be equal!