# Full text of "The enhancement of data by data compression using polynomial fitting"


NATIONAL AERONAUTICS AND SPACE ADMINISTRATION

Technical Report 32-1078

The Enhancement of Data by Data Compression Using Polynomial Fitting

William Kizner

Approved by: W. G. Melbourne, Manager, Systems Analysis Research Section

JET PROPULSION LABORATORY, CALIFORNIA INSTITUTE OF TECHNOLOGY, PASADENA, CALIFORNIA

October 1, 1967

Copyright © 1967 Jet Propulsion Laboratory, California Institute of Technology. Prepared under Contract No. NAS 7-100, National Aeronautics & Space Administration.

Contents

I. Introduction
II. Approximation of Functions by Polynomials
III. The Convergence of Chebyshev Series
IV. The Use of Convergence Properties in Estimation
V. An Example
VI. General Conclusion
Appendix. The Evaluation of Chebyshev Series

Tables

1. Chebyshev polynomial expansions for e^x and e^{-x} on the interval [0, 1], giving coefficients a_n of Σ a_n T_n(t), t = 2x − 1
2. Chebyshev polynomial expansion for ln(1 + x), 0 ≤ x ≤ 1, giving coefficients a_n of Σ a_n T_n(t), t = 2x − 1

Figure

1. Expansion of ln(1 + x) = Σ a_n T_n(t), 0 ≤ x ≤ 1, t = 2x − 1

Abstract

Optimal data compression by polynomial fitting is discussed from a rigorous theoretical point of view. By means of information gained from smoothness properties of the functions, based on the constructive theory of functions, it is shown how estimates can be improved over the usual minimum-variance unbiased estimates. A conclusion is that measurements should be made as often as possible. The theory developed here extends beyond the applications given to polynomial fitting. Proof is given for various theorems concerning biased estimators.
One theorem, given on the result of using an approximation to the covariance matrix of the errors of the observations, is believed to be new. A similar technique might be used in estimating any set of parameters whose magnitudes decrease to zero in a reasonable way.

I. Introduction

One method of data compression used in orbit determination consists of fitting many observations of a given data type in a fixed time interval by a polynomial involving fewer parameters. Then, instead of dealing with the many observations, the parameters of the polynomial are used. It is commonly assumed that such data compression must necessarily degrade the orbit determination. Here it is shown that the usual methods of fitting polynomials, which have a bad theoretical foundation, may lead to decreased accuracies, but optimal methods should significantly increase the accuracies obtained by the usual unbiased minimum-variance formulas.

The first problem with which we will concern ourselves is that of polynomial fitting for the general case when the function to be fitted is not an exact polynomial of finite degree. Using the results of approximation theory, we show how one can take advantage of the smoothness of the function to be approximated in an optimal way by obtaining, in a sense, a priori values for the polynomial coefficients. The result is an estimate of the orbital parameters that may be slightly biased, but has better accuracy than that obtained from the minimum-variance unbiased estimate without data compression. Also, one need not choose a unique degree for the approximating polynomial. With the method used in this Report, the same result obtains if the degree is chosen large enough. This means that one does not have to apply the usual F test, which usually cannot be theoretically justified. Various theorems are proved about biased estimators.
One theorem, which the author believes to be new, concerns bounds for the errors introduced by using a wrong noise model.

II. Approximation of Functions by Polynomials

Engineers know that any "smooth" function f(x), a ≤ x ≤ b, can be represented by a polynomial with arbitrary accuracy in a given interval, so they often use the following simplistic method for obtaining the desired approximation. They choose an integer n, calculate a step size Δx = (b − a)/n, and evaluate f(x) at x = a, a + Δx, a + 2Δx, …, a + nΔx = b. A polynomial of degree n, p_n(x), is fitted to these points. If n is chosen high enough, the actual result is supposed to be a polynomial p_m(x) of degree m with m < n. This is not the case, however. Such a procedure may "blow up" [p_n(x) may not converge to f(x) in any sense for increasing n] even for functions f(x) that are infinitely differentiable. If the reader cares to verify this he can try equally spaced interpolation on f(x) = 1/(1 + x²) for the interval [−5, 5]. No matter how accurately he does the interpolation, the error in the representation

max |f(x) − p_n(x)|, −5 ≤ x ≤ 5

will increase beyond all bounds as n becomes large.

An introductory account of the constructive theory of functions, which deals with problems of approximation by polynomials, is given by Todd (Ref. 1). A more complete account, which covers the material we will use, is given by Lorentz (Ref. 2). Some classical results that we will use are:

1. Let H_n be the set of all polynomials whose degree does not exceed n, and p_n(x) a member of H_n. For continuous f(x), a ≤ x ≤ b, define

E_n = inf_{p_n ∈ H_n} max_x |f(x) − p_n(x)|

Then there exists a unique member of H_n, p̄_n(x), such that E_n = max_x |f(x) − p̄_n(x)|; p̄_n(x) is called the polynomial of best approximation (in the sense of the uniform norm).

2. lim_{n→∞} E_n = 0 for all continuous functions.

3. The manner in which E_n approaches zero depends on the differentiability of f(x).
The set of real functions having k continuous derivatives on [a, b] is denoted by C^k. The greater the value of k, the faster E_n will approach zero with increasing n. In fact, one can deduce the value of k from the behavior of E_n as a function of n.

4. It does not make sense (in a practical sort of way) to try to accurately approximate a function with a polynomial unless it is many times differentiable, or better still, infinitely differentiable.

5. The polynomial of best approximation p̄_n(x) cannot be obtained in general by linear methods. There is no way of finding p̄_n(x) by evaluating f(x) at a predetermined number of values of x and calculating coefficients of the polynomial by linear relations. The algorithms for determining p̄_n(x) are nonlinear and are usually time-consuming.

6. Usually acceptable polynomial approximations can be obtained by truncating a Chebyshev series expansion. Let f(x) be defined for −1 ≤ x ≤ 1 and satisfy a Dini-Lipschitz condition, or

lim_{δ→0} ω(δ) ln δ = 0   (1)

where ω(δ) is the modulus of continuity. Let T_n(x) be the Chebyshev polynomial of the first kind of degree n. Then if f(x) satisfies the Dini-Lipschitz condition, f(x) can be represented by a uniformly convergent Chebyshev polynomial series

f(x) = Σ_{n=0}^{∞} a_n T_n(x), −1 ≤ x ≤ 1   (2)

where

a_0 = (1/π) ∫_{−1}^{1} f(x) (1 − x²)^{−1/2} dx

a_n = (2/π) ∫_{−1}^{1} f(x) T_n(x) (1 − x²)^{−1/2} dx, n = 1, 2, …   (3)

The Chebyshev polynomials are defined by

T_n(x) = cos(n arccos x)   (4)

The condition (1) is satisfied if f(x) satisfies a Lipschitz condition, or if f′(x) is bounded on [−1, 1].

For well-behaved functions the truncated expansion of Eq. (2) is a good approximation and normally has a maximum absolute error not much greater than for p̄_n(x). The drawback of this expansion is that it is usually not possible to express the a_n in terms of known functions. However, approximate numerical methods for obtaining the a_n are available.

7.
The coefficients a_n for n = 0, 1, 2, …, N are given approximately by

a_0 ≈ (1/m) Σ_{j=1}^{m} f(x_j^m)

a_n ≈ (2/m) Σ_{j=1}^{m} f(x_j^m) T_n(x_j^m), n = 1, 2, …, N   (5)

where m is an integer > N and x_j^m, j = 1, 2, …, m, are the m zeros of T_m(x). The roots are given by

x_j^m = cos[(π/2m)(2j − 1)], j = 1, 2, …, m   (6)

When m is taken to be N + 1, Eq. (5) gives us the coefficients of the interpolating Nth-degree polynomial determined by the N + 1 roots of T_{N+1}(x). It can be shown that for fixed N the approximate values of Eq. (5) will converge to the exact values given by the integrals in Eq. (3) as m → ∞ for any continuous function f(x).

8. To establish this result we interpret Eq. (5) as the approximation of Eq. (3), using Gaussian quadrature with the weight function (1 − x²)^{−1/2}. Here the form of the approximation is

∫_{−1}^{1} g(x) (1 − x²)^{−1/2} dx ≈ (π/m) Σ_{j=1}^{m} g(cos[(π/2m)(2j − 1)])   (7)

We next use a theorem by Stieltjes, which establishes that the general Gaussian quadrature scheme (increasing m) for any weight function on a finite interval is convergent for any continuous function g(x).

In evaluating Chebyshev series and the special series in Eq. (5), various tricks can be used (discussed in the Appendix). The important point is that Chebyshev series can be accurately and quickly evaluated. We thus have an efficient method for determining a polynomial that uniformly approximates a function (which we assume to be at least many times differentiable) to any desired accuracy. We note that the coefficients are linear combinations of certain values of f(x).

III. The Convergence of Chebyshev Series

The central idea of this Report is that in orbit-determination problems, at least, we can do much better than determine the degree of a polynomial by the usual F test. Without discussing this standard method in detail, we note that it seeks to determine whether the improvement in fit that results from adding one or more parameters, or making the degree of the polynomial higher, is statistically significant.
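The quadrature of Eq. (5) and (6) is straightforward to implement. The following Python sketch (illustrative code with hypothetical helper names, not the report's program) computes the leading Chebyshev coefficients of e^x on [0, 1] after the change of variable t = 2x − 1; the values agree with Table 1.

```python
import math

def chebyshev_coeffs(f, N, m):
    """Approximate a_0..a_N of f on [-1, 1] via Eq. (5)-(6),
    sampling f at the m zeros of T_m (Gauss-Chebyshev abscissas)."""
    assert m > N
    thetas = [math.pi * (2 * j - 1) / (2 * m) for j in range(1, m + 1)]
    nodes = [math.cos(t) for t in thetas]          # zeros of T_m, Eq. (6)
    coeffs = []
    for n in range(N + 1):
        scale = 1.0 / m if n == 0 else 2.0 / m
        # T_n(cos theta) = cos(n theta), by Eq. (4)
        coeffs.append(scale * sum(f(x) * math.cos(n * t)
                                  for x, t in zip(nodes, thetas)))
    return coeffs

# e^x on [0, 1], mapped to t in [-1, 1] by x = (t + 1)/2
a = chebyshev_coeffs(lambda t: math.exp((t + 1) / 2), N=8, m=20)
```

With m = 20 the aliasing error of the quadrature is far below the rounding of the tabulated values, so the computed a_n match Table 1 to the printed digits.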
The drawbacks of this method are many. One is that there is usually no polynomial of finite degree that will fit the data exactly, even if no noise were present. Next, one does not know what significance level to use for the test. Also, one must assume that the noise is Gaussian. In addition, the test is difficult to apply when the noise is correlated.

The most serious objection to this method is that it does not consider any a priori knowledge about the nature of the function to be approximated. If we know that there exists a polynomial, and its approximate degree, that will fit the data "satisfactorily," we really know quite a bit about the function to be approximated. As intimated before, we really know how its coefficients approach zero. Let us illustrate with a few examples.

We give in Table 1 the Chebyshev coefficients for the expansions of e^x and e^{−x} in the interval [0, 1]. Of course the variable t in T_n(t) has to be related to x by a linear transformation so that t = −1 ↔ x = 0 and t = 1 ↔ x = 1.

Table 1. Chebyshev polynomial expansions for e^x and e^{−x} on the interval [0, 1], giving coefficients a_n of Σ a_n T_n(t), t = 2x − 1

| n | a_n for e^x | a_n for e^{−x} |
|---|---|---|
| 0 | 1.753387654 | 0.645035270 |
| 1 | 0.850391654 | −0.312841606 |
| 2 | 0.105208694 | 0.038704116 |
| 3 | 0.008722105 | −0.003208683 |
| 4 | 0.000543437 | 0.000199919 |
| 5 | 0.000027115 | −0.000009975 |
| 6 | 0.000001128 | 0.000000415 |
| 7 | 0.000000040 | −0.000000015 |
| 8 | 0.000000001 | 0.000000000 |

Note how the magnitude of the coefficients decreases approximately according to K p^n, where K and p are constants and p ≤ 0.1. In fact, for e^x, 0 ≤ x ≤ 1, we have for all n

|a_n| ≤ 75.59 (0.0514)^n

For e^{−x}, 0 ≤ x ≤ 1, we have for all n

|a_n| ≤ 31.68 (0.0502)^n

The fact that we are able to do this, and the fact that these relations hold for all a_n (which we do not show here), is no accident. We next define analyticity on the line for functions.
A function f(x) defined on [a, b] is said to be analytic on the interval if for any x_0 ∈ [a, b] there is a power series

Σ_{k=0}^{∞} c_k(x_0) (x − x_0)^k

convergent for |x − x_0| ≤ R, which represents the function at all points belonging simultaneously to [a, b] and (x_0 − R, x_0 + R). We denote by A[a, b] the class of functions analytic on the segment [a, b]. If R = ∞, the function is said to be an entire function. Then we have a result obtained by Bernstein (see Bernstein, Ref. 3, p. 110, or Lorentz, Ref. 2, p. 77, for the proof).

Theorem 1

Let f(x) be continuous on [a, b] and E_n = E_n(f) be the best approximation to f(x) by polynomials belonging to H_n. In this case f(x) ∈ A[a, b] if and only if

E_n ≤ K p^n   (8)

where K and p < 1 are constants. Moreover, f(x) is an entire function if and only if

lim_{n→∞} (E_n)^{1/n} = 0   (9)

We next have a theorem derived by Bernstein (Ref. 3, p. 112). We restate the theorem more precisely in the terminology of the theory of functions of a complex variable.

Theorem 2

A. If the best approximation of the function f(x) ∈ C[a, b] [f(x) is continuous] is such that

lim_{n→∞} [E_n(f)]^{1/n} = 1/R, R > 1   (10)

then f(x) is analytic inside an ellipse, the foci of which are the points a and b, the semiaxes having the sum [(b − a)/2] R, with a singular point lying on its boundary.

B. Conversely, if f(x) is analytic inside an ellipse whose foci are a and b, and a singular point lies on the ellipse boundary, then the best approximation by polynomials on [a, b] satisfies Eq. (10), [(b − a)/2] R being the sum of both semiaxes of the ellipse.

Thus p ≈ 1/R, which in turn is determined by the location of the singularities of the function. For entire functions we can find values of K and p that satisfy the inequality (8) with p arbitrarily small.

These results carry over to the convergence of coefficients of Chebyshev series. To establish this we need some preliminary results. For simplicity assume that [a, b] has been linearly transformed into [−1, 1]. We introduce a standard mapping in the complex plane

z = (1/2)(w + 1/w)   (11)

where z = x + iy and w = Re^{iφ}. The circle in the w-plane with R constant > 1 maps into the ellipse E_R, (x²/c²) + (y²/d²) = 1, with the semiaxes c = (1/2)[R + (1/R)] and d = (1/2)[R − (1/R)], and the foci ±1. For R = 1, E_R is the interval [−1, 1] covered twice. For each point z not on [−1, 1] there is exactly one ellipse E_R, R > 1, passing through z, which can be found from the equation

R = |z ± √(z² − 1)|   (12)

where the sign of the square root is chosen to make the absolute value of the expression > 1.

We next prove a theorem on the growth of polynomials in the complex plane.

Theorem 3

If a polynomial P_n with complex coefficients satisfies |P_n(x)| ≤ M for −1 ≤ x ≤ 1, then

|P_n(z)| ≤ M R^n, z ∈ E_R, R > 1   (13)

Proof. Let

Q_{2n}(w) = w^n P_n[(1/2)(w + 1/w)]

where Q_{2n}(w) is a 2nth-degree polynomial in w. For |w| = 1, using the property of the mapping (11), the point z = (1/2)(w + 1/w) is real and lies in [−1, 1], so

|Q_{2n}(w)| = |P_n(x)| ≤ M

Thus

|Q_{2n}(w)| ≤ M   (14)

From the maximum principle the inequality (14) holds for |w| ≤ 1. Let |w| = 1/R, so that z = (1/2)[w + (1/w)] lies on the ellipse E_R. Then

|P_n(z)| = |Q_{2n}(w)/w^n| ≤ M R^n

which was to be proved.

We now can prove the following theorem about the convergence of Chebyshev series in a manner that closely follows Theorem 2.

Theorem 4

If the Chebyshev series expansion of f(x) ∈ C[−1, 1] is such that

lim_{n→∞} |a_n|^{1/n} = 1/R_0, R_0 > 1   (15)

then f(x) is analytic inside the ellipse E_{R_0}, with a singular point lying on E_{R_0}. Conversely, if f(x) is analytic inside E_{R_0}, with a singular point on E_{R_0}, for R_0 > 1, then the coefficients satisfy Eq. (15).

Proof. Let D_R be the interior of the ellipse E_R. We show first that if f is analytic in D_R, 1 < R < ∞, then

lim sup_{n→∞} |a_n|^{1/n} ≤ 1/R   (16)

We expand f(x) on [−1, 1] in Chebyshev polynomials according to Eq. (2) and (3); the fact that f(x) has a continuous derivative assures us that the series is uniformly convergent.
By the substitution cos t = x we obtain

f(cos t) = Σ_{n=0}^{∞} a_n cos nt

a_n = (2/π) ∫_0^π f(cos t) cos nt dt, n = 1, 2, …

The substitution z = e^{it} in the integral for a_n gives the line integral along the circle |z| = 1,

a_n = (1/2πi) ∮_{|z|=1} f[(1/2)(z + 1/z)] (z^{n−1} + z^{−n−1}) dz, n = 1, 2, …   (17)

Choose R_1 with 1 < R_1 < R. Consider the function

g(z) = f[(1/2)(z + 1/z)]

in the closed ring G bounded by the circles C_1: |z| = R_1^{−1} and C_2: |z| = R_1. From the discussion following Eq. (11) we see that if z is on either of the circles |z| = σ or |z| = σ^{−1}, then w = (1/2)(z + z^{−1}) is on the ellipse E_σ. Since f(w) is analytic in D_R, it follows that g(z) is analytic in the ring G.

We now change the path of integration to estimate the a_n. We split the integral (17) into two parts; the integral containing z^{−n−1} is taken over the circle with the large radius, and the integral containing z^{n−1} is taken over the circle with the small radius. Thus

a_n = (1/2πi) ∫_{C_1} g(z) z^{n−1} dz + (1/2πi) ∫_{C_2} g(z) z^{−(n+1)} dz

Let M be the maximum value of |f| on E_{R_1}. Then the absolute value of the first integral is bounded by

M (1/R_1)^n

In the same way the second integral is bounded by M R_1^{−n}. Thus

|a_n| ≤ 2M/R_1^n, n = 1, 2, …   (18)

It follows that |a_n|^{1/n} ≤ (2M)^{1/n}/R_1, and lim sup_{n→∞} |a_n|^{1/n} ≤ 1/R_1. Since R_1 can be taken arbitrarily close to R, we obtain

lim sup_{n→∞} |a_n|^{1/n} ≤ 1/R   (19)

Conversely, let the inequality (16) hold for some R, 1 < R < ∞. We will show that f has an analytic extension onto each elliptic disc D_{R_1}, 1 < R_1 < R, and therefore onto D_R. This, taken with Eq. (19), will show that the largest elliptic disc in which f is analytic is D_{R_0}, which will prove the theorem. Let R_1 < R_2 < R. Then |a_n|^{1/n} ≤ 1/R_2 for n ≥ N with N sufficiently large. It follows that on [−1, 1]

|a_n T_n(x)| ≤ (1/R_2)^n, n > N   (20)

where we use the fact that |T_n(x)| ≤ 1 for x ∈ [−1, 1]. Applying Theorem 3 to expression (20), using z in D_{R_1},

|a_n T_n(z)| ≤ (R_1/R_2)^n, z ∈ D_{R_1}, n > N   (21)

Hence the Chebyshev series expansion of f(x), with x replaced by z, f(z) = Σ_{n=0}^{∞} a_n T_n(z), converges uniformly on each compact subset of D_R.
Its sum is analytic on D_R and provides the desired analytic extension of f. This completes the proof.

The importance of Theorem 4 is that it shows that for functions analytic on the line there exist bounds for the Chebyshev coefficients of the form

K p^n   (22)

where K and p < 1 are constants, and p is arbitrarily close to 1/R_0, given by Theorem 4.

Consider for example the expansion of f(x) = 4/(5 + 4x), where −1 ≤ x ≤ 1. Here there is a pole at −5/4. From Eq. (12), R = |−5/4 − √(9/16)| = 2. Hence we expect

a_n ≈ K (1/2)^n

for large n. In fact, Elliott (Ref. 4) shows that for all n ≥ 1

a_n = [8(−1)^n/3] (1/2)^n

Another example that shows the regularity with which a_n → 0 is given by the expansion of ln(1 + x), 0 ≤ x ≤ 1, presented in Table 2. Figure 1 shows the behavior of log a_n vs n. In addition, log (K p^n) vs n is graphed, which is a straight line.

[Fig. 1. Expansion of ln(1 + x) = Σ a_n T_n(t), 0 ≤ x ≤ 1, t = 2x − 1: log |a_n| vs. n, with the straight line log [0.34 (0.161)^n] shown for comparison.]

Here

R = |−3 − √8| = 5.8284…

since the singularity (in the scale we use) occurs at z = −3.

Table 2. Chebyshev polynomial expansion for ln(1 + x), 0 ≤ x ≤ 1, giving coefficients a_n of Σ a_n T_n(t), t = 2x − 1

| n | a_n |
|---|---|
| 0 | 0.376 × 10^0 |
| 1 | 0.343 × 10^0 |
| 2 | −0.294 × 10^−1 |
| 3 | 0.336 × 10^−2 |
| 4 | −0.433 × 10^−3 |
| 5 | 0.594 × 10^−4 |
| 6 | −0.850 × 10^−5 |
| 7 | 0.125 × 10^−5 |
| 8 | −0.187 × 10^−6 |
| 9 | 0.286 × 10^−7 |
| 10 | −0.442 × 10^−8 |
| 11 | 0.689 × 10^−9 |
| 12 | −0.108 × 10^−9 |
| 13 | 0.171 × 10^−10 |
| 14 | −0.273 × 10^−11 |
| 15 | 0.438 × 10^−12 |
| 16 | −0.704 × 10^−13 |
| 17 | 0.113 × 10^−13 |
| 18 | −0.184 × 10^−14 |
| 19 | 0.299 × 10^−15 |
| 20 | −0.488 × 10^−16 |
| 21 | 0.798 × 10^−17 |
| 22 | −0.131 × 10^−17 |

The value of p used for the straight line in Fig. 1 is 0.161 and is not exactly equal to 1/R, which is 0.1715…. Since only a finite number of terms was used, one can expect some discrepancy.

This raises the question of whether or not we can be sure that the function we are approximating is analytic on the line, since in practice we do not have an analytic expression for f(x).
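Both halves of the argument above are easy to check numerically. The sketch below (illustrative code, not from the report) first applies Eq. (12) to recover R = 2 for the pole of 4/(5 + 4x) at z = −5/4 and R = 3 + √8 ≈ 5.8284 for the singularity of ln(1 + x) at z = −3, and then estimates p by a log-linear fit to the tabulated e^x coefficients of Table 1; since e^x is entire, the fitted p reflects only the terms retained, as discussed above, rather than 1/R for a singularity.

```python
import cmath
import math

def ellipse_R(z):
    """R of the unique ellipse E_R through z, Eq. (12): the sign of the
    square root is chosen so that |z +/- sqrt(z^2 - 1)| > 1."""
    z = complex(z)
    s = cmath.sqrt(z * z - 1)
    return max(abs(z + s), abs(z - s))

R_pole = ellipse_R(-1.25)   # pole of 4/(5 + 4x) at z = -5/4
R_log = ellipse_R(-3.0)     # singularity of ln(1 + x) at z = -3 (t-scale)

# estimate K, p from finitely many coefficients: least-squares line
# log10|a_n| = log10(K) + n log10(p), using Table 1 (e^x, n = 1..6)
coeffs = [0.850391654, 0.105208694, 0.008722105,
          0.000543437, 0.000027115, 0.000001128]
ns = range(1, 7)
logs = [math.log10(c) for c in coeffs]
nbar = sum(ns) / 6.0
lbar = sum(logs) / 6.0
slope = (sum((n - nbar) * (l - lbar) for n, l in zip(ns, logs))
         / sum((n - nbar) ** 2 for n in ns))
p_est = 10.0 ** slope
```

With these six terms the fitted p comes out near 0.07, consistent with the bound |a_n| ≤ 75.59 (0.0514)^n quoted with Table 1 only to order of magnitude, which is the point made above about finite-term estimates.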
In many cases we can find a differential equation for the quantity to be approximated and infer the analyticity from the nature of the equation [see for instance Lefschetz (Ref. 5)]. If the function does seem poorly behaved in a small region, it is best to eliminate this region from the approximation by using a smaller interval.

The author has generated tens of thousands of Chebyshev series (Ref. 6) and can report from the many cases he has examined that a very simple convergence pattern always holds for rapidly converging series. In fact, for functions analytic on the line,

|a_n| ≈ K p^n   (23)
If we consider pairs of poles (or any other singularity) on Er, we find that their individual effects are conjugate complex values of each other. Their effects may be characterized by a slowly varying term similar to Eq. (24) multiplied by a rapidly varying term of maximum amplitude one. Similar results hold for the case of a branch point on the real axis. Thus when there is only one singularity on Er, Eq. (24) is a good approximation to |a„|, which behaves much like Eq. (23) for a restricted range of n. Our main hypothesis is that we can use the inequality (22) by estimating K and p from a finite number of terms. The effect of errors caused by taking a finite number of terms will be discussed in connection with a general algorithm for implementing this curve-fitting method. IV. The Use of Convergence Properties in Estimation Let us first recall the usual minimum-variance linear unbiased estimator (see Scheffe, Ref. 7). Let the n-dimensional column vector y represent the observa- tions, and column vector representing the unknown elements we wish to estimate, and e is an n-dimensional column noise vector. We assume that the expectation of e is zero. E (c) = (26) and assume that the covariance matrix of the noise E {ee^ = Ae (27) is known and is positive definite. Here the superscript T indicates transpose. If the columns of U are linearly inde- pendent, by the Gauss-Markoff theorem we have the un- biased minimum-variance estimate of q, ^=(l/-VC7)-(7^A-y (28) The covariance matrix of q is given by E[(^-q){^-qr]={U'A',^U)-^ (29) We now show how to obtain "better" estimates by mak- ing use of the convergence properties of Chebyshev series. Since the new estimates may be biased, we define a "best estimate" to be a linear minimum second-moment estimate oi q,'q, where the second moments are taken with respect to the true value of q. 
Since ordinarily it is the size of the errors in one experiment that matters, and not whether under some hypothetical situation the expectation of the errors over the hypothetical ensemble is zero, we dispense with the condition of no bias. Thus we seek a linear estimate q̄ such that its second-moment matrix

E[(q̄ − q)(q̄ − q)^T] = Λ_q̄   (30)

is a minimum in some sense.

In connection with the minimization of matrices we review the following definitions. A square matrix A is called positive definite if x^T A x > 0 for all x ≠ 0. If x^T A x ≥ 0 for all x and there exists a nonzero vector x for which the equality holds, A is called positive semidefinite. By definition, a positive definite matrix is less than or equal to a second positive definite matrix if the second matrix minus the first matrix is positive semidefinite.

We now state the mathematical problem we shall solve. As before we are given a model for a set of observations y, as in Eq. (25), with U and y known and with certain known
In fact,- using Eq. (34) and (36), Eiq-q)-^ (V A-' Uy U A;' e (37) We next develop a special formula for the use of a priori information with bias. The most general formula is left as an exercise for the reader. We reinterpret the observation vector to include the a priori estimate of q, q^ for the first part of y. Thus y' is partitioned as '-(?) (38) By definition -(i) (39) is the augmented U matrix. We consider the special case where the augmented A^ matrix is of the form V , Ae/ (40) When Eq. (36) is evaluated with the new quantities, cer- tain simplifications result: 'q = {A-^ + U^AtU)-^{A;y^U^Af)y' (41) and A~=(A- + C7-A-t7)-' (42) We now have the basic mathematical machinery to make use of convergence properties of series. We wish to estimate A, the vector of coefficients for the Chebyshev polynomial series. Without a priori information y^BA + e (43) where the B matrix is known in terms of the Chebyshev polynomials. For programming purposes it is convenient to have indices start at one, instead of zero. Hence, Bi) — jTy-i (tj), t = l,2, / = 1,2, • n ■ L (44) where n is the number of observations of the particular data type, or of the curve we are fitting, L is the dimension of A, and tj is the transformed time (to [ — 1,1]) of the ith observation. For the moment let us choose L to be so large that the error in the representation of E (y) is neg- ligible. Mathematically there is no limit to how large L might be chosen, although if L > n we have to introduce some complicated algebraic notions (such as generalized inverses). If the interval on which the observations are given is denoted by [to, tf], then 2 /. t^±±\ . ^'-^7^(*' — 2-)' ' = 1'2' (45) JPL TECHNICAL REPORT 32-7078 The evaluation of a Chebyshev polynomial for Eq. (44) may be regarded as a special case of the evaluation of a Chebyshev series, which is discussed in the Appendix. Now we assume that we know that the series is bounded by an expression such as (22). 
Then there exists an a priori estimate of A, which we take to be zero, and a finite second-moment matrix of this a priori estimate, which we call Aa.p.a.- It is easily shown that the best second-moment matrix for the a priori estimate depends on the unknown A. The matrix is given by AA''', which obviously is positive semi- definite. Since the value of A is unknown, we might approximate AA''' in some manner by using the available observations and any other a priori information. Thus we will have to consider the effect of using an approximate value for Aa.p.a. . This will be done in a more general con- text when we consider the effect of using a wrong second- moment matrix of the noise. For simplicity we assume that the approximate A„,p,a. that is used is positive definite. If we assume that the true value of each component of A is uncorrelated with the error in any component of obser- vation other than A, or From Eq. (5), (6), and (48), we have £ (0 - fli) (e,) = 0, i = 1,2, • • • L j = L + l, ■ ■■ n+ L (46) then we can partition A' as in Eq. (40) and obtain A = (A-.„. + B-A-B)B^A-/y (47) where use has been made of the fact that the a priori estimate of A is taken to be zero. For the case of orbit determination we use A as a means of estimating q, the orbital parameters. If then where A = Cq ?=[C-A-C]-C^A-A A^^ = (A-.„. +B^A-B) (48) (49) (50) L'ij L'iy / = 1,2, Ti-i (ts). • m i = 2,3, / = 1,2, • L • m (51) where m is the dimension of q. Here we assume that the dimension of q is less than that of A and the indicated inverses exist. We assume that f{t) is given in the form Also 77 /2k - 1\ to ~^ tf tf to (52) t, "Tft If we combine Eq. (49) and (47), 'q={C^A-^C)-^C^B^A;^y (53) where the largest matrix to be inverted (besides A^, which requires special treatment) is A^ = (C^A-^C) (54) instead of A"' as in Eq. (47) and (49). The disadvantage of Eq. (53) is that it deals with coordinate deviations directly and the data are not "compressed." Note that in Eq. 
(53) the product C^T B^T is simply the transpose of the matrix U, which relates the observations to the orbital elements as in Eq. (25).

We now discuss the consequences of the assumptions or approximations that we are making. One assumption, that the columns of the U matrix are linearly independent, can be dealt with very simply. This is discussed by Scheffé (Ref. 7). The result is that if the columns are in fact linearly dependent, then the minimum-variance estimates are no longer unique as far as their representation in terms of the parameters goes. However, they are unique in terms of the residual quadratic form. For the orbit-determination problem this means that if we try to determine too many astronomical constants from a set of observations of a space probe, the result will be failure to determine the constants uniquely, but nevertheless the residuals of the observations will be uniquely determined. Thus if one is primarily interested in orbit prediction for short times in the future, it should not matter whether or not the columns of the U matrix are linearly dependent. To simplify the discussion we can always eliminate some parameters to make the columns linearly independent.

Our next topic is the effect of using approximations for Λ_e, in particular for Λ_a.p.a., which forms a part of Λ_e. A general discussion of the approximation of using a diagonal matrix for Λ_e is given by Magness and McGuire (Ref. 8). A question they ask is not how much better one can do with a minimum-variance estimate, since this is unattainable when the cross correlations are unknown, but whether the unknown correlations "hurt" you. Unfortunately, one has to go through certain transformations to find out the effect of the unknown correlations. Here we take a different approach.
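A small numerical illustration of this approach (toy numbers, scalar parameter, not from the report): if the assumed noise matrix Γ dominates the true Λ_e, in the sense that Γ − Λ_e is positive semidefinite, then the estimator built from Γ has an actual second moment QΛ_eQᵀ no larger than the nominal value (UᵀΓ⁻¹U)⁻¹.

```python
# true noise covariance (correlated) and assumed Gamma = 2I;
# 2 exceeds the largest eigenvalue of Lam_e, so Gamma - Lam_e >= 0
lam_e = [[1.0, 0.3, 0.0],
         [0.3, 1.0, 0.3],
         [0.0, 0.3, 1.0]]
gamma_inv = 0.5
U = [1.0, 1.0, 1.0]                    # one parameter, three observations

# nominal second moment (U^T Gamma^-1 U)^-1 and estimator row Q
nominal = 1.0 / (gamma_inv * sum(u * u for u in U))
Q = [nominal * gamma_inv * u for u in U]

# actual second moment Q Lam_e Q^T, Eq. (35)
actual = sum(Q[i] * lam_e[i][j] * Q[j] for i in range(3) for j in range(3))
```

Here Q reduces to the plain average (1/3, 1/3, 1/3), the nominal second moment is 2/3, and the actual one is 4.2/9 ≈ 0.467, so the conservative Γ indeed over-reports the error.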
is positive semi- definite and r = S + Ap, where r, the nominal-error second-moment matrix, is positive definite and S is posi- tive semidefinite. Equations (34) and (35) will still hold and Eq. (56) and (57) follow as before, which proves the corollary. Corollary. T [used in Eq. (55) and (57)] may be chosen to be a diagonal matrix. It is easily shown that if the elements of the diagonal matrix r are made large enough, then r — Ae is a positive semidefinite matrix. In fact, if each element of r is larger than the largest eigenvalue of A^, the corollary holds. Theorem 5 Given a model for the observations y — Uq + e, where the columns of U are linearly independent and £ iee'^) = A^ is positive definite, £ (e) = e, which may not be zero. Sup- pose we construct an estimator for q according to Eq. (36), but use an incorrect value of A,,, which we denote by r, and assume that r is positive definite. Then the matrix multiplying the observations is Q = {WT-^U)-^Wi^^ (55) Let r ^ Ae, or r — Ae = S, where S is positive semidefinite. Then the actual second-moment matrix of the error in "q will be less than or equal to the nominal one, {V' V~^ (7)"'. Proof. We recall the result from matrix theory that V^ AU is positive definite if A is positive definite and the columns of U are linearly independent. Hence (t/T'^ U)~^ is positive definite. Also, the columns of Q''' are linearly independent by the construction of Eq. (55). Hence the actual second-moment matrix, from Eq. (35), is Aact^QA,Q^^Q{Ae + S)Q' (56) The inequality in Eq. (56) certainly holds if S is positive definite. If it is positive semidefinite, the inequality can be Incidentally, it is easy to obtain a simple bound for the largest eigenvalue of a matrix A. For instance, it can be shown that the largest eigenvalue, Xmax, is bounded by / " I IN : max ( 2 A;,i ) ;■ \k = 1 I I / where n is the dimension of A. (This is a modification of a theorem derived by L. Levy.) V. 
In Table 1 we gave the Chebyshev expansions for e^x and e^{−x} on the interval [0,1]. Suppose we are trying to estimate the function f(x) = q e^{−x} on the same interval [0,1], where q is unknown. Assume that in reality q = 1. Assume also that we have some knowledge of the size of the c_j, obtained from a study of comparison functions. For simplicity we will limit the dimension of Λ to 3; that is, we assume that we adequately approximate the function with a polynomial of degree 3 − 1 = 2. We also assume that we are limited to three measurements, at x = 0, ½, and 1.

If we restrict ourselves to these three time points, it can easily be shown that e^{−x} evaluated at these points determines a second-degree polynomial with slightly different coefficients than those given by Table 1. This polynomial is obtained by evaluating e^{−x} at the three time points 0, ½, and 1 (1, 0.60653, and 0.36788 to five decimal places) and interpolating. The resulting series is

    f(τ) = 0.645235 − 0.31606 T_1(τ) + 0.038705 T_2(τ),   −1 ≤ τ ≤ 1    (58)

Next, assume that the a priori covariance matrix available to us for the coefficients is 10 times the squares of the values of a_n given in Table 1. The factor of 10 provides us with a margin of safety for our estimating method:

    Λ_a.p.a. = diag (4.16025, 0.97969, 0.01521)                          (59)

    Λ_a.p.a.^{-1} = diag (0.24037, 1.02073, 65.7462)                     (60)

For the measurement errors we assume that

    Λ_e = I,   E(e) = 0                                                  (61)

These are our basic assumptions. First we use the usual unbiased minimum-variance estimate given by Eq. (28) with the covariance matrix given by Eq. (29). Here

    U^T = (1, 0.60653, 0.36788)                                          (62)

Then the variance of the estimate given by Eq. (28) is, from Eq. (29),

    (U^T U)^{-1} = 0.66524                                               (63)

We now redo the problem, using our a priori knowledge of the size of the coefficients, and see what we gain. From Eq. (44) and a knowledge of the behavior of Chebyshev polynomials,

        ( 1  −1   1 )
    B = ( 1   0  −1 )                                                    (64)
        ( 1   1   1 )

                                       ( 3.24037   0         1       )
    Λ^{-1} = B^T B + Λ_a.p.a.^{-1}  =  ( 0         3.02073   0       )   (65)
                                       ( 1         0         68.7462 )
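Equations (58), (64), and (65) can be checked numerically. A minimal Python sketch (the closed-form expressions for the interpolated coefficients follow from the values of T_1 and T_2 at τ = −1, 0, 1; they are our own derivation, not the report's):

```python
import math

# Samples of f(x) = e^(-x) at x = 0, 1/2, 1, i.e. at tau = 2x - 1 = -1, 0, 1.
f = [1.0, math.exp(-0.5), math.exp(-1.0)]

# On tau = (-1, 0, 1) the Chebyshev values are T1 = (-1, 0, 1) and
# T2 = (1, -1, 1); solving the 3-by-3 interpolation system gives:
c0 = (f[0] + f[2] + 2.0 * f[1]) / 4.0   # 0.645235, as in Eq. (58)
c1 = (f[2] - f[0]) / 2.0                # -0.31606
c2 = (f[0] + f[2] - 2.0 * f[1]) / 4.0   # 0.038705

# Design matrix of Eq. (64): row j holds (T0, T1, T2) evaluated at tau_j.
B = [[1, -1, 1], [1, 0, -1], [1, 1, 1]]
BtB = [[sum(B[r][i] * B[r][j] for r in range(3)) for j in range(3)]
       for i in range(3)]
# Adding diag(0.24037, 1.02073, 65.7462) to BtB reproduces Eq. (65).
```

Note that B^T B = [[3, 0, 1], [0, 2, 0], [1, 0, 3]], whose diagonal plus Λ_a.p.a.^{-1} gives the diagonal entries 3.24037, 3.02073, and 68.7462 of Eq. (65).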
From Eqs. (58) and (51),

    C = (0.645235, −0.31606, 0.038705)                                   (66)

    λ_q^{-1} = C^T Λ^{-1} C = 1.8037447                                  (67)

    λ_q = 0.5544022                                                      (68)

Finally, from Eq. (53),

    q̂ = (0.5544022, 0.3362614, 0.2039532) y                             (69)

We note that Eq. (68) gives the estimated variance of q̂. The actual variance of q̂ is obtained by using Eq. (69) and the statistics of y. The result is a variance of 0.462030 plus a bias term of 0.166661. The total second moment about the true value is

    0.462030 + (0.166661)^2 = 0.48979

which is smaller than the value 0.66524 given by Eq. (63), using the usual formula. The reason that the actual second moment is smaller than the one calculated from Eq. (63) is that we allowed ourselves plenty of margin for error in estimating the size of the c_n.

VI. General Conclusion

The method we used to improve our polynomial fitting can be used in the estimation of other parameters if we give up the condition of unbiasedness. Equation (28) is the best linear unbiased estimate. If we can determine bounds for E(qq^T), then we can employ the same procedure and use an a priori estimate of q equal to zero, with the associated second-moment matrix.

Appendix

The Evaluation of Chebyshev Series

The evaluation of a series of orthogonal polynomials can be conveniently accomplished by a method described in Ref. 6. For a Chebyshev polynomial series the algorithm is as follows. Given the series

    p(x) = Σ_{n=0}^{N} a_n T_n(x)

define

    b_{N+1} = b_{N+2} = 0

and compute recursively

    b_n = 2x b_{n+1} − b_{n+2} + a_n,   n = N, N−1, …, 1

    b_0 = x b_1 − b_2 + a_0

    p(x) = b_0

These formulas may also be used in curve fitting, as in Eqs. (5) and (6). If maximum accuracy is desired for curve fitting, however, another method is recommended. This alternative method is most useful when n is large and the roundoff error in x, as used in the above algorithm, may be significant.
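The recurrence above is easily coded. A short Python sketch (ours, not from the report), which can be checked against direct evaluation via T_n(x) = cos(n arccos x):

```python
import math

def chebyshev_eval(a, x):
    """Evaluate p(x) = sum_{n=0}^{N} a[n]*T_n(x) by the backward recurrence
    b_n = 2x*b_{n+1} - b_{n+2} + a_n, n = N, ..., 1, then b_0 = x*b_1 - b_2 + a_0.
    """
    b1 = b2 = 0.0                      # b_{N+1} = b_{N+2} = 0
    for an in reversed(a[1:]):         # n = N, N-1, ..., 1
        b1, b2 = 2.0 * x * b1 - b2 + an, b1
    return x * b1 - b2 + a[0]          # final step uses a single factor of x
```

With the coefficients of Eq. (58), for example, chebyshev_eval([0.645235, -0.31606, 0.038705], x) agrees with the cosine-based evaluation to machine precision on [−1, 1].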
To obtain the Chebyshev coefficients in curve fitting, we make use of the definition of T_n(x) as given by Eq. (4) in Eqs. (5) and (6), which results in

    a_n = (2/m) Σ_{j=1}^{m} f(x_j) cos [nπ(2j − 1)/2m],   n = 1, 2, …, N

We now observe that the cosine function need only be evaluated in the first quadrant, for the values

    cos (πk/2m),   k = 0, 1, …, m

For values of n(2j − 1) that are greater than m, the quadrant L is found from

    L − 1 = [n(2j − 1)/m]  (mod 4)

where integer arithmetic is used for the first division. Then the well-known formulas for transforming the cosine function to a function of the first quadrant are used. With

    k_0 = n(2j − 1)  (mod m)

we obtain

    T_n(x_j) = cos (πk_0/2m),    L = 1
    T_n(x_j) = −cos (πk̄/2m),    L = 2
    T_n(x_j) = −cos (πk_0/2m),   L = 3
    T_n(x_j) = cos (πk̄/2m),     L = 4

where k̄ = m − k_0.

References

1. Todd, J., Introduction to the Constructive Theory of Functions, Academic Press, Inc., New York, 1963.

2. Lorentz, G. G., Approximation of Functions, Holt, Rinehart and Winston, Inc., New York, 1966.

3. Bernstein, S., Leçons sur les Propriétés Extrémales et la Meilleure Approximation des Fonctions Analytiques d'une Variable Réelle, Gauthier-Villars, Paris, 1926.

4. Elliott, D., "The Evaluation and Estimation of the Coefficients in the Chebyshev Series Expansion of a Function," Mathematics of Computation, Vol. 18, No. 86, p. 274, 1964.

5. Lefschetz, S., Differential Equations: Geometric Theory, 2nd edition, Interscience Publishers, New York, p. 43, 1957.

6. Kizner, W., Error Curves for Lanczos' "Selected Points" Method, Technical Report 32-703, Jet Propulsion Laboratory, Pasadena, Calif., April 30, 1966 (reprint from Computer Journal, Vol. 8, No. 4, pp. 372–382, January 1966).

7. Scheffé, H., The Analysis of Variance, Ch. 1, John Wiley & Sons, Inc., New York, 1959.

8. Magness, T. A., and McGuire, J. B., "Comparison of Least Squares and Minimum Variance Estimates of Regression Parameters," Annals of Mathematical Statistics, Vol. 33, pp. 462–470, 1962.
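As a numerical check on the quadrant reduction described in the Appendix, the following Python sketch evaluates T_n(x_j) at the selected points x_j = cos [π(2j − 1)/2m] using only a table of first-quadrant cosines (the function and variable names are ours, not the report's):

```python
import math

def T_at_selected_point(n, j, m, quarter_table):
    """T_n(x_j) for x_j = cos(pi*(2j-1)/(2m)), using only the tabulated
    first-quadrant values quarter_table[k] = cos(pi*k/(2m)), k = 0..m."""
    k = n * (2 * j - 1)
    L = (k // m) % 4 + 1          # quadrant; integer division is done first
    k0 = k % m
    kbar = m - k0
    if L == 1:
        return quarter_table[k0]
    if L == 2:
        return -quarter_table[kbar]
    if L == 3:
        return -quarter_table[k0]
    return quarter_table[kbar]    # L = 4

m = 8
quarter_table = [math.cos(math.pi * k / (2 * m)) for k in range(m + 1)]
```

For every n and j the result agrees with the direct value cos [nπ(2j − 1)/2m], confirming the four-case reduction (including the quadrant-boundary cases k_0 = 0).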