Introduction to Numerical Analysis: Linear Systems of Equations
Definitions
Linear Vector Spaces
In this course, by a linear vector space we mean the set $\mathbb{R}^n$ of ordered $n$-tuples of real numbers.
The dimension of this vector space is $n$. A vector in this space has $n$ components, each of which is a real number. The zero vector is the vector whose components are all equal to zero.
For example, $\mathbb{R}^2$ is the two-dimensional linear vector space. Each vector in this space has two components. $\mathbb{R}^2$ can be represented geometrically by a plane. $\mathbb{R}^3$ is a three-dimensional linear vector space. Each vector in this space has three components. The vector $(0,0)$ is the zero vector in $\mathbb{R}^2$. The vector $(0,0,0)$ is the zero vector in $\mathbb{R}^3$.
Linear Functions between Vector Spaces (Matrices)
Given a linear vector space $\mathbb{R}^n$ and another linear vector space $\mathbb{R}^m$, a linear function $f:\mathbb{R}^n\rightarrow\mathbb{R}^m$ has the following form, $\forall x\in\mathbb{R}^n$:
$$f(x)=\begin{pmatrix}M_{11}x_1+M_{12}x_2+\cdots+M_{1n}x_n\\ M_{21}x_1+M_{22}x_2+\cdots+M_{2n}x_n\\ \vdots\\ M_{m1}x_1+M_{m2}x_2+\cdots+M_{mn}x_n\end{pmatrix}$$
The function $f$ takes a vector $x$ with $n$ components and returns a vector $f(x)$ which has $m$ components. Using matrix notation, the above can be written in the following more convenient form:
$$f(x)=\begin{pmatrix}M_{11}&M_{12}&\cdots&M_{1n}\\ M_{21}&M_{22}&\cdots&M_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ M_{m1}&M_{m2}&\cdots&M_{mn}\end{pmatrix}\begin{pmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{pmatrix}$$
If we denote the matrix of numbers by $M$, then the linear function can be written in the following simple form:
$$f(x)=Mx$$
We noted that $x$ is an element of the linear vector space $\mathbb{R}^n$. Similarly, we view $M$ as an element of the set of possible linear functions from $\mathbb{R}^n$ to $\mathbb{R}^m$, which is denoted:
$$M\in\mathbb{M}^{m\times n}$$
If $m=n$, then for simplicity, we say that $M\in\mathbb{M}^{n}$. We also use the symbol $\mathbb{M}^{n}$ to denote the set of square matrices that live in the space $\mathbb{M}^{n\times n}$.
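As a quick sketch in Mathematica (the matrix and vector below are chosen here for demonstration and are not part of the original examples), a matrix acts on a vector through the dot operator:
View Mathematica Code
M = {{1, 0, 2}, {-1, 3, 1}}; (*a 2x3 matrix: a linear function from R^3 to R^2*)
x = {1, 2, 3};
M.x (*returns a vector with two components: {7, 8}*)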
For a visual representation showing the action of a matrix on a vector space, see the tool shown here.
Matrix Transpose
If $M\in\mathbb{M}^{m\times n}$, then the transpose of $M$, namely $M^T\in\mathbb{M}^{n\times m}$, is defined as the matrix whose columns are the rows of $M$. Equivalently, $(M^T)_{ij}=M_{ji}$.
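A minimal illustration of the definition in Mathematica (the matrix below is chosen here and is not one of the examples that follow): the built-in Transpose function returns the matrix whose columns are the rows of its argument.
View Mathematica Code
M = {{1, 2, 3}, {4, 5, 6}}; (*a 2x3 matrix*)
Transpose[M] // MatrixForm (*a 3x2 matrix whose columns are the rows of M*)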
Examples
Example 1
Consider the function defined as . Following the above definition, is a one-dimensional vector and the matrix has dimensions . In this case and .
Example 2
Consider the function $f:\mathbb{R}^3\rightarrow\mathbb{R}^2$ defined as follows:
In matrix form, this function can be represented as follows:
This function takes vectors that have three components and gives vectors that have two components. The matrix $M$ associated with this linear function has dimensions $2\times 3$, lives in the space $\mathbb{M}^{2\times 3}$, and has the form:
In this case, $M^T$ is simply:
Example 3
Consider the function $f:\mathbb{R}^3\rightarrow\mathbb{R}^3$ defined as follows:
In matrix form, this function can be represented as follows:
This function takes vectors that have three components and gives vectors that have three components. The matrix $M$ associated with this linear function has dimensions $3\times 3$, lives in the space $\mathbb{M}^{3\times 3}$, and has the form:
In this case, $M^T$ is simply:
Example 4
Consider the function $f:\mathbb{R}^3\rightarrow\mathbb{R}$ defined as follows:
In matrix form, this function can be represented as follows:
This function takes vectors that have three components and gives vectors that have one component. The matrix $M$ associated with this linear function has dimensions $1\times 3$, lives in the space $\mathbb{M}^{1\times 3}$, and has the form:
In this case, $M^T$ is simply:
Linear System of Equations
Given a matrix $A\in\mathbb{M}^{m\times n}$ and a vector $b\in\mathbb{R}^m$, a linear system of equations seeks the solution $x\in\mathbb{R}^n$ to the equation:
$$Ax=b$$
In matrix component form, the above equation is:
Another way of viewing this set of equations is as follows:
The solution of a linear system of equations is the set of numerical values of the variables that satisfies the above equations. There are three possible scenarios:
Scenario 1
If $m<n$, then we have more unknown components of the $n$-dimensional vector $x$ than we have equations. The system is then termed underdetermined. This results in having many possible solutions. For example, consider the linear system of equations defined as:
There are many possible solutions that would satisfy the above equation. One solution can be obtained by setting . Then:
which results in and . Therefore one possible solution is:
Another possible solution can be obtained by setting . Then:
which results in and . Therefore another possible solution is:
In fact, there are infinitely many possible solutions that depend on what value you choose for . So, if is set such that , then we have:
Therefore:
and
Therefore:
Scenario 2
If $m>n$ and the equations are linearly independent, then we have more equations than we have unknown components of the $n$-dimensional vector $x$. This results in having too many restrictions, which prevents us from finding a solution. In this case, the system is termed overdetermined. For example, consider the linear system of equations defined as:
In this system, we have three independent equations and two unknowns $x_1$ and $x_2$. If the first two equations are used to find a solution, then the third equation will present a contradiction. The first two equations have the following form:
which result in the solution , and . If we substitute this solution into equation 3 we get:
Scenario 3
If $m=n$ and the equations are independent, i.e., the rows of the matrix $A$ are linearly independent, then one unique solution exists and the matrix $A$ that forms the equation has an inverse. In this case, there exists a matrix $A^{-1}$ such that $A^{-1}A=I$, and the solution can be obtained as $x=A^{-1}b$.
To know whether the equations are independent or not, the determinant function defined below is sufficient. If the determinant of a matrix is equal to 0, then the rows of the matrix are linearly dependent, and so an inverse cannot be found. If, on the other hand, the determinant of a matrix is not equal to 0, then the rows of the matrix are linearly independent and an inverse can be found.
For example, consider the system defined as:
The rows of the matrix $A$ are clearly linearly independent. The inverse of $A$ can be found (which will be discussed later in more detail). For now, we can simply solve the two equations as follows. From the second equation we get:
Substituting into the first equation we get:
Therefore, the system has a unique solution.
If $m=n$ and the equations are dependent, i.e., the rows of the matrix $A$ are linearly dependent, then it is not possible to find a unique solution for the linear system of equations. For example, consider the system defined as:
The rows of the matrix $A$ are clearly linearly dependent. The inverse of $A$ cannot be found (which will be discussed later in more detail). The system presents a contradiction and cannot be solved because the first and second equations predict contradictory values for the same combination of the unknowns!
As a different example, consider the system defined as:
The rows of the matrix $A$ are clearly linearly dependent. The inverse of $A$ cannot be found (which will be discussed later in more detail). The system has an infinite number of possible solutions. The first and second equations give us the relationship . So, by assuming any value for , we can find a value for . For example, if , then . Another possible solution is and .
In the remaining part of this section, we will focus on how to find a solution for a linear system of equations when $m=n$.
Special Types of Square Matrices
Symmetric Matrix
A symmetric matrix is a matrix that satisfies $M=M^T$. For example:
is a symmetric matrix.
Identity Matrix
The identity matrix $I$ is the matrix whose diagonal components have the value of 1 while its off-diagonal components have a value of zero. For example, $I\in\mathbb{M}^{3\times 3}$ has the form:
$$I=\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}$$
This is called the identity matrix because it takes every vector and returns the same vector. For example, consider the general vector $x$ with components $x_1$, $x_2$, and $x_3$. The same vector is obtained after the operation $Ix$:
$$Ix=\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}=\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}=x$$
Diagonal Matrix
A matrix is called diagonal if all its off diagonal components are zero. The identity matrix is a diagonal matrix. Similarly, the following matrix is diagonal:
Upper Triangular Matrix
Let $U\in\mathbb{M}^{n\times n}$. $U$ is an upper triangular matrix if all the components below the diagonal components are equal to zero. For example, the following matrix is an upper triangular matrix:
Lower Triangular Matrix
Let $L\in\mathbb{M}^{n\times n}$. $L$ is a lower triangular matrix if all the components above the diagonal components are equal to zero. For example, the following matrix is a lower triangular matrix:
Banded Matrix
A matrix is called banded if all its components are zero except the components on the main diagonal and one or more diagonals on either side. The band width is the maximum number of columns on the same row that have non-zero entries. For example, the following is banded with a band width of 3.
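As a small sketch (the matrix below is chosen here for demonstration and is not the example from the original text), a tridiagonal matrix, which is banded with a band width of 3, can be generated in Mathematica as follows:
View Mathematica Code
n = 5;
B = Table[Which[i == j, 2, Abs[i - j] == 1, -1, True, 0], {i, 1, n}, {j, 1, n}]; (*main diagonal plus one diagonal on each side*)
B // MatrixForm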
Matrix Operations
Matrix Addition
Let $A,B\in\mathbb{M}^{m\times n}$. Consider the linear function $f_1(x)=Ax$ and the linear function $f_2(x)=Bx$. Then, the function $f_1(x)+f_2(x)$ is given as:
$$f_1(x)+f_2(x)=Ax+Bx=(A+B)x$$
where $A+B$ is computed by adding each component in $A$ to the corresponding component in $B$. For example, consider the two matrices:
Then, the matrix $A+B$ can be computed as follows:
Two matrices can only be added if they have the same dimensions.
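A quick sketch in Mathematica (the two matrices below are chosen for illustration and are not the matrices of the original example): matrix addition is performed componentwise with the ordinary + operator.
View Mathematica Code
A = {{1, 2}, {3, 4}};
B = {{5, 6}, {7, 8}};
(A + B) // MatrixForm (*each component of A is added to the corresponding component of B*)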
Matrix Multiplication
Consider the matrix $N\in\mathbb{M}^{l\times m}$ which takes vectors of dimension $m$ and returns vectors of dimension $l$. The matrix $N$ has dimensions $l\times m$. Consider the matrix $M\in\mathbb{M}^{m\times n}$ which takes vectors of dimension $n$ and gives vectors of dimension $m$. The matrix $M$ has dimensions $m\times n$. The combined operation $N(Mx)$ takes vectors of dimension $n$ and gives vectors of dimension $l$. The matrix $NM$ corresponding to the combined operation can be obtained by taking the dot product of the row vectors of the matrix $N$ with the column vectors of the matrix $M$. This can only be done if the inner dimensions are equal to each other ($N$ has $m$ columns and $M$ has $m$ rows). The product $NM$ would have dimensions of $l\times n$ and it is such that $\forall x\in\mathbb{R}^n$:
$$(NM)x=N(Mx)$$
For example, consider the two matrices:
$$N=\begin{pmatrix}1&4\\2&2\\3&5\end{pmatrix},\qquad M=\begin{pmatrix}2&3&1&5\\0&-1&2&4\end{pmatrix}$$
We can only perform the operation $NM$; the opposite, $MN$, is not defined. The matrix multiplication results in:
$$NM=\begin{pmatrix}2&-1&9&21\\4&4&6&18\\6&4&13&35\end{pmatrix}$$
You can use “.” in Mathematica to multiply matrices, or you can use “MMULT” in Excel to multiply matrices.
View Mathematica Code
Nn = {{1, 4}, {2, 2}, {3, 5}}; M = {{2, 3, 1, 5}, {0, -1, 2, 4}}; L = Nn.M; L // MatrixForm
It is important to note that when the dimensions are appropriate, matrix multiplication is associative and distributive, i.e., given three matrices $A$, $B$, and $C$, then:
$$A(BC)=(AB)C,\qquad A(B+C)=AB+AC$$
However, matrix multiplication is not necessarily commutative, i.e., it is possible that $AB\neq BA$. For example, consider the matrices
Then:
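Since the matrices of the original example are not reproduced here, the following sketch uses two matrices chosen for illustration to show that the products in the two orders are generally different:
View Mathematica Code
A = {{1, 2}, {3, 4}};
B = {{0, 1}, {1, 0}};
A.B // MatrixForm (*{{2, 1}, {4, 3}}*)
B.A // MatrixForm (*{{3, 4}, {1, 2}}, which is different from A.B*)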
Matrix Determinant
If $M\in\mathbb{M}^{2\times 2}$ such that:
$$M=\begin{pmatrix}M_{11}&M_{12}\\M_{21}&M_{22}\end{pmatrix}$$
then $\det(M)=M_{11}M_{22}-M_{12}M_{21}$. If $M\in\mathbb{M}^{3\times 3}$ such that:
$$M=\begin{pmatrix}M_{11}&M_{12}&M_{13}\\M_{21}&M_{22}&M_{23}\\M_{31}&M_{32}&M_{33}\end{pmatrix}$$
then $\det(M)=M_{11}(M_{22}M_{33}-M_{23}M_{32})-M_{12}(M_{21}M_{33}-M_{23}M_{31})+M_{13}(M_{21}M_{32}-M_{22}M_{31})$.
Let $M\in\mathbb{M}^{n\times n}$. The matrix determinant can be defined using the recursive relationship:
$$\det(M)=\sum_{i=1}^{n}(-1)^{i+1}M_{1i}\det(\bar{M}_{1i})$$
where $\bar{M}_{1i}\in\mathbb{M}^{(n-1)\times(n-1)}$ is formed by eliminating the 1st row and the $i$th column of the matrix $M$. It can be shown that $\det(M)=0$ if and only if the rows of $M$ are linearly dependent. In other words, $\det(M)=0$ if and only if the matrix $M$ cannot be inverted. The function “Det[]” in Mathematica can be used to calculate the determinant of a matrix. In Excel, you can use the function “MDETERM()” to calculate the determinant of a matrix as well.
The following function in Mathematica computes the determinant of the matrix using the above recursive relationship and then compares the solution with the built-in function “Det[]” in Mathematica.
View Mathematica Code
fdet[M_] := (n = Length[M]; If[n == 2, (M[[1, 1]]*M[[2, 2]] - M[[1, 2]]*M[[2, 1]]), Sum[(-1)^(i + 1) M[[1, i]]*fdet[Drop[M, 1, {i}]], {i, 1, n}]]);
M = {{1, 1, 3, 7}, {4, 4, 5, 0}, {1, 5, 5, 6}, {1, -4, 3, -2}};
fdet[M]
Det[M]
You can check the section on the determinant of a matrix for additional information on some of the properties of the determinant of a matrix.
Solving Systems of Linear Equations
Solving systems of linear equations has been the subject of study for many centuries. In particular, for every engineering discipline and almost every engineering application, systems of linear equations appear naturally with large numbers of unknowns. For example, the stiffness method in structural analysis as well as the finite element analysis method seek the solution of a linear system of equations in which the matrix is a symmetric matrix of dimensions $n\times n$, the right-hand side is a vector of external forces, and the unknown is the vector of the displacements of the structure nodes. The software used needs to implement a method by which the displacements can be obtained given particular values for the forces. An appropriate method to solve a linear system of equations has to be computationally efficient. In the following, we will present a few methods to solve systems in which the number of equations is equal to the number of unknowns, along with their pros and cons.
Cramer’s Rule
Cramer’s rule is one of the early methods to solve systems of linear equations $Ax=b$. It dates back to the eighteenth century! The method works really well; however, it is inefficient when the number of equations is very large. Nevertheless, with the current computing capabilities, the method can still be pretty fast. The method works using the following steps. First, the determinant of the matrix $A$ is calculated. Each component $x_i$ of the unknowns can then be computed as:
$$x_i=\frac{\det(D_i)}{\det(A)}$$
where $D_i$ is the matrix formed by replacing the $i$th column in $A$ with the vector $b$.
Example 1
Consider the linear system of equations defined as:
We will form the following two matrices by replacing the first column in $A$ with $b$ and then the second column in $A$ with $b$:
The determinants of these matrices are:
Therefore:
The following is code that can be used for Cramer’s rule for a two-dimensional system of equations.
View Mathematica Code
A = {{a11, a12}, {a21, a22}};
AT = Transpose[A]
b = {b1, b2};
s1 = Drop[AT, {1}]
s2 = Drop[AT, {2}]
D1T = Insert[s1, b, 1]
D2T = Insert[s2, b, 2]
D1 = Transpose[D1T]
D2 = Transpose[D2T]
x1 = Det[D1]/Det[A]
x2 = Det[D2]/Det[A]
Example 2
Consider the linear system of equations defined as:
We will form the following three matrices by replacing the corresponding columns in $A$ with $b$:
The determinants of these matrices are:
Therefore:
The following procedure applies Cramer's rule to a general system. Test the procedure to see how fast it is for higher dimensions.
View Mathematica Code
Cramer[A_, b_] := (n = Length[A]; Di = Table[A, {i, 1, n}]; Do[Di[[i]][[All, i]] = b, {i, 1, n}]; x = Table[Det[Di[[i]]]/Det[A], {i, 1, n}])
A = {{1, 2, 3, 5}, {2, 0, 1, 4}, {1, 2, 2, 5}, {4, 3, 2, 2}};
b = {-4, 8, 0, 10};
Cramer[A, b]
The following link provides the MATLAB code for implementing Cramer's rule.
Naive Gauss Elimination
The naive Gauss elimination is a procedure in which the linear system of equations is manipulated such that the coefficients of the component $x_i$ are eliminated from equation $i+1$ to equation $n$. It would then be possible to solve for $x_n$ using the last equation, and then backward substitute to find the remaining components. The procedure is divided into two steps: forward elimination followed by backward substitution. We will illustrate the process in the following example. Consider the system of equations defined as:
$$\begin{aligned}2x_1+2x_2+2x_3&=6\\2x_1+3x_2+4x_3&=4\\-x_1+2x_2+3x_3&=8\end{aligned}$$
Forward elimination: First, we will use the first equation to eliminate the coefficients of $x_1$ in equations 2 and 3. By multiplying the first equation by $-1$ and adding it to the second equation we get:
$$\begin{aligned}2x_1+2x_2+2x_3&=6\\(2-2)x_1+(3-2)x_2+(4-2)x_3&=4-6\\-x_1+2x_2+3x_3&=8\end{aligned}$$
which becomes:
$$\begin{aligned}2x_1+2x_2+2x_3&=6\\x_2+2x_3&=-2\\-x_1+2x_2+3x_3&=8\end{aligned}$$
By multiplying the first equation by $\frac{1}{2}$ and adding it to the third equation we get:
$$\begin{aligned}2x_1+2x_2+2x_3&=6\\x_2+2x_3&=-2\\3x_2+4x_3&=11\end{aligned}$$
We are now ready to eliminate the coefficient of $x_2$ from equation 3 by multiplying equation 2 by $-3$ and adding it to equation 3:
$$\begin{aligned}2x_1+2x_2+2x_3&=6\\x_2+2x_3&=-2\\-2x_3&=17\end{aligned}$$
Backward substitution: Solving the above system becomes very simple. $x_3$ is straightforward; it is equal to $x_3=\frac{17}{-2}=-8.5$. $x_2$ can be obtained from the second equation as $x_2=-2-2x_3=-2-2(-8.5)=15$, and finally $x_1=\frac{6-2x_2-2x_3}{2}=\frac{6-2(15)-2(-8.5)}{2}=-3.5$. Therefore, the solution is:
$$x=\begin{pmatrix}-3.5\\15\\-8.5\end{pmatrix}$$
Of course, as expected, if we multiply the original matrix $A$ by the solution vector $x$ we should get the vector $b$.
The naive Gauss elimination method can be implemented in matrix form using the augmented matrix of dimensions $3\times 4$ as follows:
The following code implements the above calculations in Mathematica
View Mathematica Code
A = {{2, 2, 2}, {2, 3, 4}, {-1, 2, 3}};
b = {6, 4, 8};
Aa = {{2, 2, 2, b[[1]]}, {2, 3, 4, b[[2]]}, {-1, 2, 3, b[[3]]}};
Aa[[2]] = Aa[[2]] + -1*Aa[[1]]
Aa[[3]] = Aa[[3]] + 1/2*Aa[[1]]
Aa[[3]] = Aa[[3]] - 3 Aa[[2]]
Aa // MatrixForm
x3 = Aa[[3, 4]]/Aa[[3, 3]]
x2 = (Aa[[2, 4]] - Aa[[2, 3]]*x3)/Aa[[2, 2]]
x1 = (Aa[[1, 4]] - Aa[[1, 2]]*x2 - Aa[[1, 3]]*x3)/Aa[[1, 1]]
Programming the Method
The following are the steps to program the Naive Gauss Elimination method. Assuming a number $n$ of equations in $n$ unknowns of the form $Ax=b$:
- Form the combined (augmented) matrix $G$ of dimensions $n\times(n+1)$ by appending the vector $b$ as an additional column to the matrix $A$.
- Forward elimination: Use the diagonal elements $G_{kk}$ as “Pivot” elements. Use the “Pivot” elements to eliminate the components $G_{ik}$ with $i$ from $k+1$ to $n$. This is done by iterating $k$ from 1 to $n$ and, for each row $i$ below row $k$, doing the operation $\text{Row}_i=\text{Row}_i-\frac{G_{ik}}{G_{kk}}\text{Row}_k$.
- Backward substitution: The element $x_i$ with $i$ running backwards from $n$ to 1 can be found using the equation:
$$x_i=\frac{G_{i,n+1}-\sum_{l=i+1}^{n}G_{il}x_l}{G_{ii}}$$
The following is the code that relies on the “Sum” built-in function in Mathematica.
View Mathematica Code
(*Procedure*)
GaussElimination[A_, b_] := (n = Length[A]; G = Transpose[Insert[Transpose[A], b, n + 1]]; For[k = 1, k <= n, k++, Do[G[[i]] = G[[i]] - G[[i, k]]/G[[k, k]]*G[[k]], {i, k + 1, n}]]; x = Table[0, {i, 1, n}]; For[i = n, i >= 1, i = i - 1, x[[i]] = (G[[i, n + 1]] - Sum[G[[i, l]]*x[[l]], {l, i + 1, n}])/G[[i, i]]]; x)
(*An example*)
A = {{1, 2, 3, 5}, {2, 0, 1, 4}, {1, 2, 2, 5}, {4, 3, 2, 2}};
b = {-4, 8, 0, 10};
(*Applying the procedure to the example*)
N[GaussElimination[A, b]]
GaussElimination[A, b]
The following is the code that does not rely on the built-in function in Mathematica.
View Mathematica Code
(*Procedure*)
GaussElimination[A_, b_] := (n = Length[A]; G = Transpose[Insert[Transpose[A], b, n + 1]]; For[k = 1, k <= n, k++, Do[G[[i]] = G[[i]] - G[[i, k]]/G[[k, k]]*G[[k]], {i, k + 1, n}]]; x = Table[0, {i, 1, n}]; For[i = n, i >= 1, i = i - 1, (BB = 0; Do[BB = G[[i, l]]*x[[l]] + BB, {l, i + 1, n}]; x[[i]] = (G[[i, n + 1]] - BB)/G[[i, i]])]; x)
(*An example*)
A = {{1, 2, 3, 5}, {2, 0, 1, 4}, {1, 2, 2, 5}, {4, 3, 2, 2}};
b = {-4, 8, 0, 10};
(*Applying the procedure to the example*)
N[GaussElimination[A, b]]
GaussElimination[A, b]
Naive?
The method described above is considered “naive” for two reasons. The first reason is that it fails to find a solution if a pivot element (one of the diagonal elements) is found to be zero. For example, consider the linear system:
This is really simple and the solution is just . However, the code fails to find a solution because when the second row is used for elimination, the pivot element is equal to zero, and so, an error is encountered. Try the above systems out and see the error that Mathematica spits out. This problem can be solved by simply exchanging rows 2 and 3. The same issue will be encountered for the following system:
In this case, one solution would be to rearrange the equations such that equation 3 would be equation 1. In general, the naive Gauss elimination code presented above can be augmented so that the pivot elements are chosen to be the largest elements in the column below. This augmentation is termed: “Partial Pivoting”. Partial Pivoting is performed by exchanging the rows. In this case, the order of the variables stays the same. In rare cases, there could be a need to exchange the columns as well; a procedure termed “Complete Pivoting”, but this adds the complication of re-arranging the variables.
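The following is a sketch (not the code from the notes) of how the naive Gauss elimination procedure above could be augmented with partial pivoting: at each step $k$, the row with the largest absolute value in column $k$ (from row $k$ downwards) is swapped into the pivot position before elimination. The procedure name and the example system are chosen here for illustration.
View Mathematica Code
GEPP[A_, b_] := (n = Length[A]; G = Transpose[Insert[Transpose[A], b, n + 1]]; For[k = 1, k <= n, k++, p = k - 1 + First[Ordering[Abs[G[[k ;; n, k]]], -1]]; (*row with the largest pivot candidate*) If[p != k, G[[{k, p}]] = G[[{p, k}]]]; (*exchange rows k and p*) Do[G[[i]] = G[[i]] - G[[i, k]]/G[[k, k]]*G[[k]], {i, k + 1, n}]]; x = Table[0, {i, 1, n}]; For[i = n, i >= 1, i = i - 1, x[[i]] = (G[[i, n + 1]] - Sum[G[[i, l]]*x[[l]], {l, i + 1, n}])/G[[i, i]]]; x)
A = {{0, 2, 1}, {1, 1, 1}, {2, 1, -1}};
b = {3, 3, 2};
GEPP[A, b] (*the zero pivot in the first row is handled by the row exchange; the solution is {1, 1, 1}*)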
The second reason why it is called naive is that the method essentially produces an upper triangular matrix that can be easily solved. It is important to note that if the system is already a lower triangular matrix, then there is no need to employ the forward elimination procedure, because we can simply use forward substitution to find $x_1, x_2, \ldots, x_n$ in this order.
Consider the following system:
It is straightforward that , , and . However, employing the naive Gauss elimination would conduct the following steps:
Gauss-Jordan Elimination
Another method that is rooted in the Gauss elimination method is the Gauss-Jordan elimination method. Two steps are added to the previous algorithm. The first step is that each pivot row is normalized by the pivot element. The second step is that the coefficients above the pivot element are also eliminated. This results in not needing the backward substitution step. The following example illustrates the method. Consider the system of equations defined as:
The first row is normalized:
The first (pivot) row is used to eliminate the coefficients of $x_1$ in the second and third rows:
The second row is already normalized and can be used to eliminate the coefficients of $x_2$ in the third equation:
It will also be used to eliminate the coefficients of $x_2$ in the first equation:
The third pivot element will now be normalized:
The third row is used to eliminate the coefficients of $x_3$ in the second and first equations:
The right hand side is the required solution! In matrix form the following are essentially the steps above:
The same issue of zero pivot applies to this method and another step of “pivoting” could be added to ensure no zero pivot is found in the matrix.
View Mathematica Code
GJElimination[A_, b_] := (n = Length[A]; G = Transpose[Insert[Transpose[A], b, n + 1]]; For[k = 1, k <= n, k++, (G[[k]] = G[[k]]/G[[k, k]]; Do[G[[i]] = G[[i]] - G[[i, k]]*G[[k]], {i, 1, k - 1}]; Do[G[[i]] = G[[i]] - G[[i, k]]*G[[k]], {i, k + 1, n}])]; x = G[[All, n + 1]])
A = {{1, 2, 3, 5}, {2, 0, 1, 4}, {1, 2, 2, 5}, {4, 3, 2, 2}};
b = {-4, 8, 0, 10};
GJElimination[A, b]
LU Decomposition
Another method that is comparable in efficiency and speed to the Gauss elimination methods stated above is the LU decomposition. It was implied when the Gauss elimination method was introduced that if a linear system of equations $Ax=b$ is such that the matrix $A$ is an upper triangular matrix, then the system can be solved directly using backward substitution. Similarly, if a linear system of equations $Ax=b$ is such that the matrix $A$ is a lower triangular matrix, then the system can be solved directly using forward substitution. The following two examples illustrate this.
Solving Lower Triangular Systems
Consider the linear system defined as:
which in matrix form has the form:
The matrix is a lower triangular matrix. The system can be solved by forward substitution. Namely, the first equation gives . The second equation gives: . The third equation gives: . Forward substitution can be programmed in a very fast and efficient manner. Consider the following system:
Therefore:
This can be written in the following simple form with $i$ starting at 1 and increasing to $n$:
$$x_i=\frac{b_i-\sum_{j=1}^{i-1}A_{ij}x_j}{A_{ii}}$$
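A short sketch of forward substitution in Mathematica (the procedure name and the example system are chosen here for illustration):
View Mathematica Code
ForwardSubstitution[L_, b_] := (n = Length[L]; y = Table[0, {i, 1, n}]; Do[y[[i]] = (b[[i]] - Sum[L[[i, j]]*y[[j]], {j, 1, i - 1}])/L[[i, i]], {i, 1, n}]; y)
L = {{2, 0, 0}, {1, 3, 0}, {4, -1, 5}}; (*a lower triangular matrix chosen for illustration*)
b = {4, 7, 12};
ForwardSubstitution[L, b]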
Solving Upper Triangular Systems
Similarly, consider the linear system defined as:
which in matrix form has the form:
The matrix is an upper triangular matrix. The system can be solved by backward substitution. Namely, the third equation gives . The second equation gives: . The first equation gives: . Backward substitution can be easily programmed in a very fast and efficient manner. Consider the following system:
Therefore:
This can be written in the following simple form with $i$ starting at $n$ and decreasing to 1:
$$x_i=\frac{b_i-\sum_{j=i+1}^{n}A_{ij}x_j}{A_{ii}}$$
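Similarly, a short sketch of backward substitution (again, the names and the example system are chosen for illustration):
View Mathematica Code
BackwardSubstitution[U_, b_] := (n = Length[U]; x = Table[0, {i, 1, n}]; Do[x[[i]] = (b[[i]] - Sum[U[[i, j]]*x[[j]], {j, i + 1, n}])/U[[i, i]], {i, n, 1, -1}]; x)
U = {{3, 1, -2}, {0, 2, 4}, {0, 0, 5}}; (*an upper triangular matrix chosen for illustration*)
b = {1, 14, 10};
BackwardSubstitution[U, b]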
LU Decomposition Example
We will first introduce the LU decomposition using an example. Consider the system defined as:
Without explaining how at this point, we can decompose the matrix $A$ as the multiplication of two matrices $A=LU$, where $L$ is a lower triangular matrix and $U$ is an upper triangular matrix. For this particular example, $L$ and $U$ have the forms:
The reader should verify that indeed $A=LU$. The linear system of equations $Ax=b$ can then be written as:
The above system can be solved quickly as follows. In the equation $LUx=b$, set $y=Ux$, then $Ly=b$. Since $L$ is a lower triangular matrix, we can solve for $y$ easily. Then, since $Ux=y$ and $U$ is an upper triangular matrix, we can solve for $x$ easily. The equation $Ly=b$ has the form:
The above system can be solved using forward substitution. Therefore . The second equation yields: . The third equation yields: . We can now solve $Ux=y$ as follows:
This system can be solved using backward substitution. The third equation yields . The second equation yields . The first equation yields . This example implies that if the matrix $A$ can be decomposed such that $A=LU$, where $L$ is a lower triangular matrix and $U$ is an upper triangular matrix, then the system $Ax=b$ can be easily solved using forward and backward substitution. It turns out that such a decomposition always exists provided that the rows of the matrix $A$ are properly ordered (by that we mean there are no zero pivots during the Gauss elimination step).
Formal Representation
The following is a more formal representation of the method. Let $Ax=b$ be a linear system of equations. Let $A=LU$ be an LU decomposition for $A$, with $L$ being a lower triangular matrix and $U$ being an upper triangular matrix. Then: $Ax=LUx=b$. Set $y=Ux$, then $Ly=b$. The elements of the vector $y$ can be solved using forward substitution as explained above. Then, the system $Ux=y$ can be solved using backward substitution as explained above.
LU Decomposition Algorithm
There are various algorithms to find the decomposition $A=LU$. Mathematica has a built-in function to find the LU decomposition called “LUDecomposition”. Here we present the most common algorithm. In the first step, using Gauss elimination, the matrix $A$ is turned into an upper triangular matrix, which is the matrix $U$. The matrix $L$ has values of 1 in the diagonal entries. The entries of $L$ that are below the diagonal are the negatives of the factors that were used to reduce the corresponding rows using the pivot rows during the Gauss elimination step. The following example illustrates the process. Consider the matrix
To find both $L$ and $U$ we will write the identity matrix on the left and the matrix $A$ on the right:
Gauss elimination is then used to eliminate the coefficient $A_{21}$ by multiplying row 1 by $-\frac{A_{21}}{A_{11}}$ and then adding it to row 2. Then, the coefficient $\frac{A_{21}}{A_{11}}$ (the negative of the factor used) is stored in row 2, column 1 of the matrix on the left:
Gauss elimination is then used to eliminate the coefficient $A_{31}$ by multiplying row 1 by $-\frac{A_{31}}{A_{11}}$ and then adding it to row 3. Then, the coefficient $\frac{A_{31}}{A_{11}}$ is stored in row 3, column 1 of the matrix on the left:
Gauss elimination is then used to eliminate the coefficient in row 3, column 2 by multiplying row 2 by the appropriate factor and then adding it to row 3. Then, the negative of that factor is stored in row 3, column 2 of the matrix on the left:
The matrix on the left-hand side is $L$ and the one on the right-hand side is $U$.
Programming the Method
Programming the LU decomposition method requires three steps. In the first one, Gauss elimination is used to create the $L$ and $U$ matrices. Then, forward substitution is used to solve for the vector $y$ in $Ly=b$. Then, backward substitution is used to solve for the vector $x$ in $Ux=y$. The following code has two procedures: the first one takes the matrix $A$ and produces the two matrices $L$ and $U$. The second procedure utilizes the first procedure to produce $L$ and $U$ and then uses forward and backward substitution to find the vector $y$ and then the vector $x$.
View Mathematica Code
LU[A_] := (n = Length[A]; L = IdentityMatrix[n]; U = A; For[k = 1, k <= n, k++, Do[L[[i, k]] = U[[i, k]]/U[[k, k]]; U[[i]] = U[[i]] - U[[i, k]]/U[[k, k]]*U[[k]], {i, k + 1, n}]]; {L, U})
LUAB[A_, b_] := (n = Length[A]; L = LU[A][[1]]; U = LU[A][[2]]; y = Table[0, {i, 1, n}]; x = Table[0, {i, 1, n}]; For[i = 1, i <= n, i = i + 1, (BB = 0; Do[BB = L[[i, l]]*y[[l]] + BB, {l, 1, i - 1}]; y[[i]] = (b[[i]] - BB)/L[[i, i]])]; For[i = n, i >= 1, i = i - 1, (BB = 0; Do[BB = U[[i, l]]*x[[l]] + BB, {l, i + 1, n}]; x[[i]] = (y[[i]] - BB)/U[[i, i]])]; x)
A = {{1, 2, 3, 5}, {2, 0, 1, 4}, {1, 2, 2, 5}, {4, 3, 2, 2}};
b = {-4, 8, 0, 10};
LUAB[A, b]
Comparison with Previous Methods
In comparison to the naive Gauss elimination method, this method reduces the computations in the forward elimination step by conducting the forward elimination step on the matrix $A$ without appending the vector $b$. However, it adds another step, which is the forward substitution step described above. The advantage of this method is that no manipulations are done on the vector $b$. So, if multiple solutions corresponding to multiple possible values of the vector $b$ are sought, then this method could be faster than the Gauss elimination methods.
Cholesky Factorization for Positive Definite Symmetric Matrices
Positive definite symmetric matrices appear naturally in many engineering problems. In particular, the stiffness matrices of elastic structures that appear in the stiffness method and the finite element analysis method for structural engineering applications are positive definite symmetric matrices. Positive definite symmetric matrices, as shown in this section, have positive eigenvalues, and so working with them can be very easy. Also, being symmetric, only one half of the off-diagonal components need to be stored, which reduces the computational time significantly. The Cholesky factorization is a particular type of LU decomposition that can be applied to symmetric matrices. In particular, a positive definite symmetric matrix $A$ can be decomposed such that:
$$A=U^TU$$
where $L=U^T$ is a lower triangular matrix and its transpose $U$ is an upper triangular matrix. For example, the following symmetric matrix is positive definite (because all its eigenvalues when calculated are positive):
and it admits the following decomposition:
This decomposition is similar to the LU decomposition, and if a linear system $Ax=b$ is such that $A$ is positive definite, then the Cholesky decomposition enables the quick calculation of $U$; the backward and forward substitutions can then be used to solve the system (as shown in the LU decomposition section). The factorization can be found by noticing that a diagonal entry $A_{ii}$ of $A$ admits the following:
$$A_{ii}=\sum_{k=1}^{n}U_{ki}U_{ki}=\sum_{k=1}^{i}U_{ki}^2=U_{ii}^2+\sum_{k=1}^{i-1}U_{ki}^2$$
where the fact that $U_{ki}=0$ whenever $k>i$ was used. Therefore:
$$U_{ii}=\sqrt{A_{ii}-\sum_{k=1}^{i-1}U_{ki}^2}\qquad(1)$$
On the other hand, an off-diagonal component $A_{ij}$ with $j>i$ admits the following:
$$A_{ij}=\sum_{k=1}^{n}U_{ki}U_{kj}=\sum_{k=1}^{i}U_{ki}U_{kj}=U_{ii}U_{ij}+\sum_{k=1}^{i-1}U_{ki}U_{kj}$$
where again, the fact that $U_{ki}=0$ whenever $k>i$ was used. Therefore:
$$U_{ij}=\frac{A_{ij}-\sum_{k=1}^{i-1}U_{ki}U_{kj}}{U_{ii}}\qquad(2)$$
Notice that for the factorization to work, we can’t have negative values under the square root in Equation 1 or zero values for $U_{ii}$, which is guaranteed not to occur because $A$ is positive definite.
Programming
Equations 1 and 2 can be programmed by first setting $i=1$ in Equation 1, then iterating $j$ from 2 to $n$ in Equation 2. Then, setting $i=2$ in Equation 1 and iterating $j$ from 3 to $n$ in Equation 2, and so on. As an illustration, we will apply the algorithm to the following matrix:
Setting $i=1$ in Equation 1 yields:
Iterating $j$ from 2 to 3 in Equation 2 yields:
Setting $i=2$ in Equation 1 yields:
Iterating $j$ from 3 to 3 in Equation 2 yields:
Finally, setting $i=3$ in Equation 1 yields:
This yields the following matrix $U$:
The following is the Mathematica code for the algorithm, along with the code to solve a linear system of equations $Ax=b$ with $A$ being positive definite. The algorithm is divided into two procedures. The first procedure calculates the Cholesky decomposition $U$ of a matrix $A$, and the second procedure uses $U$ and $L=U^T$ for backward and forward substitution as described in the LU decomposition. It should be noted that in general the Cholesky decomposition should be much more efficient than the LU decomposition for positive definite symmetric matrices. However, the code below would need to be optimized to achieve that.
View Mathematica Code
CD[A_] := (n = Length[A]; U = Table[0, {i, 1, n}, {j, 1, n}]; Do[U[[i, i]] = Sqrt[A[[i, i]] - Sum[U[[k, i]] U[[k, i]], {k, 1, i - 1}]]; U[[i, j]] = (A[[i, j]] - Sum[U[[k, i]] U[[k, j]], {k, 1, i - 1}])/U[[i, i]], {i, 1, n}, {j, i, n}]; U)
CDAB[A_, b_] := (n = Length[A]; U = CD[A]; L = Transpose[U]; y = Table[0, {i, 1, n}]; x = Table[0, {i, 1, n}]; For[i = 1, i <= n, i = i + 1, (BB = 0; Do[BB = L[[i, l]]*y[[l]] + BB, {l, 1, i - 1}]; y[[i]] = (b[[i]] - BB)/L[[i, i]])]; For[i = n, i >= 1, i = i - 1, (BB = 0; Do[BB = U[[i, l]]*x[[l]] + BB, {l, i + 1, n}]; x[[i]] = (y[[i]] - BB)/U[[i, i]])]; x)
A = {{39, 25, 36, 26}, {25, 21, 24, 18}, {36, 24, 34, 24}, {26, 18, 24, 33}};
b = {-196, 392, 0, 490}
N[CDAB[A, b]]
FullSimplify[CDAB[A, b]]
The above code relies on using the “Sum” command in Mathematica which is not very efficient. The following code is more efficient and uses the “Do” command for the sum function:
View Mathematica Code
CD[A_] := (n = Length[A]; U = Table[0, {i, 1, n}, {j, 1, n}]; Do[BB = 0; Do[BB = BB + U[[k, i]] U[[k, i]], {k, 1, i - 1}]; U[[i, i]] = Sqrt[A[[i, i]] - BB]; Do[BB = 0; Do[BB = BB + U[[k, i]] U[[k, j]], {k, 1, i - 1}]; U[[i, j]] = (A[[i, j]] - BB)/U[[i, i]], {j, i + 1, n}], {i, 1, n}]; U)
CDAB[A_, b_] := (n = Length[A]; U = CD[A]; L = Transpose[U]; y = Table[0, {i, 1, n}]; x = Table[0, {i, 1, n}]; For[i = 1, i <= n, i = i + 1, (BB = 0; Do[BB = L[[i, l]]*y[[l]] + BB, {l, 1, i - 1}]; y[[i]] = (b[[i]] - BB)/L[[i, i]])]; For[i = n, i >= 1, i = i - 1, (BB = 0; Do[BB = U[[i, l]]*x[[l]] + BB, {l, i + 1, n}]; x[[i]] = (y[[i]] - BB)/U[[i, i]])]; x)
A = {{39, 25, 36, 26}, {25, 21, 24, 18}, {36, 24, 34, 24}, {26, 18, 24, 33}};
b = {-196, 392, 0, 490}
N[CDAB[A, b]]
FullSimplify[CDAB[A, b]]
Mathematica
Mathematica can be used to calculate the Cholesky factorization using the function “CholeskyDecomposition[]”, which outputs the matrix $U$ in the decomposition $A=U^TU$.
View Mathematica Code
A = {{1, 3, 2}, {3, 13, 8}, {2, 8, 6}}
U = CholeskyDecomposition[A]
Transpose[U].U // MatrixForm
Iterative Methods to Solve Linear Systems
In the previous section, we introduced methods that produce an exact solution for the determined linear system $Ax=b$. These methods rely on exactly solving the set of equations at hand. There are other “numerical techniques” that involve iterative methods similar to the iterative methods shown in the root finding section. When $n$ is relatively large, and when the matrix $A$ is banded, these methods might become more efficient than the traditional methods above.
The Jacobi Method
The Jacobi method is named after Carl Gustav Jacob Jacobi. The method is akin to the fixed-point iteration method used in single-variable root finding described before. First notice that a linear system of size $n\times n$ can be written as:
The left hand side can be decomposed as follows:
Effectively, we have separated $A$ into two additive matrices: $A=R+D$, where $R$ has zero entries in the diagonal components and $D$ is a diagonal matrix. If we start with nonzero diagonal components for $A$, then $D$ is a diagonal matrix with nonzero entries in the diagonal; it can easily be inverted and its inverse is:
$$D^{-1}=\begin{pmatrix}\frac{1}{A_{11}}&0&\cdots&0\\0&\frac{1}{A_{22}}&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&\frac{1}{A_{nn}}\end{pmatrix}$$
Therefore:
$$x=D^{-1}(b-Rx)$$
This form is similar to the fixed-point iteration method. By assuming initial guesses for the components of the vector $x$ and substituting in the right-hand side, a new estimate for the components of $x$ can be computed. In addition to having non-zero diagonal components for $A$, there are other requirements for the matrix $A$ for this method to converge to a proper solution, which are beyond the scope of these notes. One useful fact is that this method will converge if the diagonal components of $A$ are large compared to the rest of the matrix components. The criterion for stopping this algorithm will be based on the size or the norm of the difference between the vectors obtained in successive iterations. For this, we can use the Euclidean norm. So, if the vector after iteration $i$ is $x^{(i)}$, and after iteration $i+1$ it is $x^{(i+1)}$, then the stopping criterion would be:
$$\varepsilon_r=\frac{\|x^{(i+1)}-x^{(i)}\|}{\|x^{(i+1)}\|}\leq\varepsilon_s$$
where
$$\|x\|=\sqrt{\sum_{k=1}^{n}x_k^2}$$
Note that any other norm function can work as well.
Example
Consider the system defined as:
The Jacobi method with a stopping criterion of will be used. First the system is rearranged to the form:
Then, the initial guesses for the components are used to calculate the new estimates:
The relative approximate error in this case is
The next iteration:
The relative approximate error in this case is
The third iteration:
The relative approximate error in this case is
The fourth iteration:
The relative approximate error in this case is
Therefore convergence has been achieved. The exact solution is in fact:
Programming the Method
We will use the built-in Norm function for the stopping criterion. In the following code, the procedure “J” takes the matrix $A$, the vector $b$, and the guess $x$ to return a new guess for the vector $x$. The maximum number of iterations is 100 and the stopping criteria are either that the maximum number of iterations is reached or that $\varepsilon_r\leq 0.001$:
View Mathematica Code
J[A_, b_, x_] := (n = Length[A]; InvD = Table[0, {i, 1, n}, {j, 1, n}]; R = A; Do[InvD[[i, i]] = 1/A[[i, i]]; R[[i, i]] = 0, {i, 1, n}]; newx = InvD.(b - R.x))
A = {{3, -0.1, -0.2}, {0.1, 7, -0.3}, {0.3, -0.2, 10}}
b = {7.85, -19.3, 71.4}
x = {{1, 1, 1.}};
MaxIter = 100;
ErrorTable = {1};
eps = 0.001;
i = 2;
While[And[i <= MaxIter, Abs[ErrorTable[[i - 1]]] > eps], xi = J[A, b, x[[i - 1]]]; x = Append[x, xi]; ei = Norm[x[[i]] - x[[i - 1]]]/Norm[x[[i]]]; ErrorTable = Append[ErrorTable, ei]; i++]
x // MatrixForm
ErrorTable // MatrixForm
The Gauss-Seidel Method
The Gauss-Seidel method offers a slight modification to the Jacobi method which can cause it to converge faster. In the Gauss-Seidel method, the system is solved using forward substitution so that each component uses the most recent values obtained for the previous components. This is different from the Jacobi method, where all the components in an iteration are calculated based on the previous iteration. First notice that a linear system of size $n\times n$ can be written as:
The left hand side can be decomposed as follows:
Effectively, we have separated $A$ into two additive matrices: $A=L+U$, where $U$ is an upper triangular matrix with zero diagonal components, while $L$ is a lower triangular matrix with its diagonal components equal to the diagonal components of $A$. If we start with nonzero diagonal components for $A$, then $L$ can be used to solve the system $Lx=b-Ux$ using forward substitution:
The same stopping criterion as in the Jacobi method can be used for the Gauss-Seidel method.
Example
Consider the system defined as:
The Gauss-Seidel method with a stopping criterion of will be used. First the system is rearranged to the form:
Then, the initial guesses for the components are used to calculate the new estimates using forward substitution. Notice that in forward substitution, the value of $x_2$ uses the newly computed value of $x_1$, and the value of $x_3$ uses the newly computed values of $x_1$ and $x_2$:
The relative approximate error in this case is
For iteration 2:
The relative approximate error in this case is
For iteration 3:
The relative approximate error in this case is
And for iteration 4:
The relative approximate error in this case is
Therefore, the system converges after iteration 4 similar to the Jacobi method. It is not obvious in this example, but the Gauss-Seidel method can converge faster than the Jacobi method.
Programming
Using forward substitution in the equation $Lx=b-Ux$, the component $x_i^{(k+1)}$ in iteration $k+1$ can be computed as follows:
$$x_i^{(k+1)}=\frac{b_i-\sum_{j=i+1}^{n}A_{ij}x_j^{(k)}-\sum_{j=1}^{i-1}A_{ij}x_j^{(k+1)}}{A_{ii}}\qquad(3)$$
The following code utilizes the procedure “GS”, which takes a matrix $A$, a vector $b$, and a vector $x$ of some guesses to return the new guess for the components of the vector $x$.
View Mathematica Code
GS[A_, b_, x_] := (n = Length[A]; xnew = Table[0, {i, 1, n}]; Do[xnew[[i]] = (b[[i]] - Sum[A[[i, j]] x[[j]], {j, i + 1, n}] - Sum[A[[i, j]] xnew[[j]], {j, 1, i - 1}])/A[[i, i]], {i, 1, n}]; xnew)
A = {{3, -0.1, -0.2}, {0.1, 7, -0.3}, {0.3, -0.2, 10}}
b = {7.85, -19.3, 71.4}
x = {{1, 1, 1.}};
MaxIter = 100;
ErrorTable = {1};
eps = 0.001;
i = 2;
While[And[i <= MaxIter, Abs[ErrorTable[[i - 1]]] > eps], xi = GS[A, b, x[[i - 1]]]; x = Append[x, xi]; ei = Norm[x[[i]] - x[[i - 1]]]/Norm[x[[i]]]; ErrorTable = Append[ErrorTable, ei]; i++]
x // MatrixForm
ErrorTable // MatrixForm
Convergence of the Jacobi and the Gauss-Seidel Methods
If the matrix $A$ is diagonally dominant, i.e., the values in the diagonal components are large enough, then this is a sufficient condition for the two methods to converge. In particular, if every diagonal component satisfies $|A_{ii}|>\sum_{j\neq i}|A_{ij}|$, then the two methods are guaranteed to converge. Generally, when these methods are used, the programmer should first use pivoting (exchanging the rows and/or columns of the matrix $A$) to ensure the largest possible diagonal components. Use the code above and see what happens after 100 iterations for the following system when the initial guess is:
The system above can be manipulated to make it a diagonally dominant system. In this case, the columns are interchanged and so the order of the variables is reversed:
This system will now converge for any choice of an initial guess!
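A small sketch (not part of the original notes) that checks whether a given matrix is strictly diagonally dominant, and hence whether the Jacobi and Gauss-Seidel iterations are guaranteed to converge for it:
View Mathematica Code
DiagonallyDominantQ[A_] := (n = Length[A]; And @@ Table[Abs[A[[i, i]]] > Total[Abs[A[[i]]]] - Abs[A[[i, i]]], {i, 1, n}])
DiagonallyDominantQ[{{3, -0.1, -0.2}, {0.1, 7, -0.3}, {0.3, -0.2, 10}}] (*True: the example matrix used with the Jacobi code above*)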
Successive Over-Relaxation (SOR) Method
The successive over-relaxation (SOR) method is another form of the Gauss-Seidel method in which the new estimate at iteration $k+1$ for the component $x_i$ is calculated as a weighted average of the previous estimate $x_i^{(k)}$ and the Gauss-Seidel estimate $x_i^{(k+1),GS}$:
$$x_i^{(k+1)}=(1-\omega)x_i^{(k)}+\omega\,x_i^{(k+1),GS}$$
where $x_i^{(k+1),GS}$ can be obtained using Equation 3 above. The weight factor $\omega$ is chosen such that $0<\omega<2$. If $\omega=1$, then the method is exactly the Gauss-Seidel method. Otherwise, to understand the effect of $\omega$ we will rearrange the above equation to:
$$x_i^{(k+1)}=x_i^{(k)}+\omega\left(x_i^{(k+1),GS}-x_i^{(k)}\right)$$
If $\omega<1$, then the method gives a new estimate that lies somewhere between the old estimate and the Gauss-Seidel estimate; in this case, the algorithm is termed “under-relaxation” (Figure 1). Under-relaxation can be used to help establish convergence if the method is diverging. If $\omega>1$, then the new estimate lies on the extension of the vector joining the old estimate and the Gauss-Seidel estimate, and hence the algorithm is termed “over-relaxation” (Figure 1). Over-relaxation can be used to speed up the convergence of a slow-converging process. The following code defines and uses the SOR procedure to calculate the new estimates. Compare with the Gauss-Seidel procedure shown above.
View Mathematica Code
SOR[A_, b_, x_, w_] := (n = Length[A]; xnew = Table[0, {i, 1, n}]; Do[xnew[[i]] = (1 - w)*x[[i]] + w*(b[[i]] - Sum[A[[i, j]] x[[j]], {j, i + 1, n}] - Sum[A[[i, j]] xnew[[j]], {j, 1, i - 1}])/A[[i, i]], {i, 1, n}]; xnew)
A = {{3, -0.1, -0.2}, {0.1, 7, -0.3}, {0.3, -0.2, 10}}
b = {7.85, -19.3, 71.4}
x = {{1, 1, 1.}};
MaxIter = 100;
ErrorTable = {1};
eps = 0.001;
i = 2;
omega = 1.1;
While[And[i <= MaxIter, Abs[ErrorTable[[i - 1]]] > eps], xi = SOR[A, b, x[[i - 1]], omega]; x = Append[x, xi]; ei = Norm[x[[i]] - x[[i - 1]]]/Norm[x[[i]]]; ErrorTable = Append[ErrorTable, ei]; i++]
x // MatrixForm
ErrorTable // MatrixForm
Problems
- Create a recursive function in Mathematica that calculates the determinant and compare it with the built-in determinant function. Test your function on matrices in for , and .
- Use Cramer’s rule to solve the linear system of equations defined as:
- Use the naive Gauss elimination method to solve the linear system of equations defined as:
- Use the Cramer’s rule code, the naive Gauss elimination method code, and the Gauss-Jordan elimination method code given above to test how many “simple” equations each algorithm can solve in less than a minute. You can do so by constructing the identity matrix of dimension $n$ (using the command IdentityMatrix[n]) and a vector $b$ of values 1 of dimension $n$ and then running the code. The solution to this should be $x=b$. Repeat for higher values of $n$. If you surround your code with the command Timing[] it should give you an indication of how long the computations are taking. Draw the curve between the size of the system and the time taken by each method to find the solution. Is the relationship between the time taken to solve the system of equations and the size of the matrix linear or nonlinear?
- Repeat problem 4, however, in this case, use the commands:
A = RandomReal[{-1, 1}, {n, n}]; b = RandomReal[{-1, 1}, n];
to generate an $n\times n$ matrix $A$ with random coefficients between -1 and 1 and a vector $b$ with random numbers between -1 and 1. Does this affect the results of the time taken to solve the system as a function of the matrix size and the method used?
- Using random numbers similar to the previous problem, draw the curve showing the time taken to solve the system of equations as a function of the matrix size. Compare the LU decomposition, the naive Gauss elimination, and the Gauss-Jordan method. Try to optimize the code of the three methods and see if the generated curves are affected.
- Calculate the Cholesky Decomposition for the following positive definite symmetric matrix:
- Calculate the Cholesky Decomposition for the following positive definite symmetric matrix:
Employ the results to find the solution to the system of equations:
- Find the LU factorization of the following matrix:
Employ the results to find two solutions for the vector corresponding to the following two linear systems of equations:
- Using the “Timing” keyword in Mathematica, compare the efficiency of the Cholesky decomposition code provided above with the built-in “CholeskyDecomposition” function in Mathematica by finding the decomposition of the identity matrix of size . Can you produce a more efficient version of the provided code?
- Use the “CholeskyDecomposition” built-in function in Mathematica to produce a code that utilizes the Cholesky decomposition method to solve the linear system $Ax=b$ when $A$ is a positive definite symmetric matrix. Compare with the Gauss elimination, the Gauss-Jordan elimination, and the LU decomposition methods when solving the simple system of equations where $b$ is a vector whose components are all equal to 1. Similar to problem 4, draw the relationship showing the timing as a function of the size of the matrix $A$.
- Use Mathematica to program the Jacobi method such that the code would re-arrange the rows of the matrix to ensure that the diagonal entries are all non-zero.
- Compare the Jacobi method and the Gauss elimination method to solve the simple system of equations where $b$ is a vector whose components are all equal to 5. Similar to problem 4, draw the relationship showing the timing as a function of the size of the matrix $A$. What do you notice about the Jacobi method in this case?
- Use the Gauss-Seidel method to find a solution to the following system after converting it into a diagonally dominant system:
- The following system is not diagonally dominant. Check whether the Jacobi and Gauss-Seidel methods converge or not using an initial guess of the zero vector.
- Assuming a stopping criterion of , find the optimal value of $\omega$ in the SOR method producing the fastest convergence in finding the solution to the following system. Compare with the Gauss-Seidel method.
- Use the Gauss-Seidel method with to solve the following system of linear equations:
- An electrical engineer supervises the production of three types of electrical components. Three kinds of material, metal, plastic, and rubber, are required for production. The amounts needed to produce one unit of component 1 are 15 g of metal, 0.25 g of plastic, and 1.0 g of rubber. The amounts needed to produce one unit of component 2 are 17 g of metal, 0.33 g of plastic, and 1.2 g of rubber. The amounts needed to produce one unit of component 3 are 19 g of metal, 0.42 g of plastic, and 1.6 g of rubber. If a total of 2.12, 0.0434, and 0.164 kg of metal, plastic, and rubber, respectively, are available each day, how many units of each component on average can be produced every day?
- Use the Gauss-Jordan elimination method to solve the following linear system of equations: