Numerical with NumLib | Examples¶
The notes and examples on this page are based on the Free Pascal NumLib official doc.
Credits
Thanks to Kees van Ginneken, Wil Kortsmit and Loek van Reij of the Computational centre of the Eindhoven University of Technology for making it available for the Free Pascal project.
Thanks to Marco van de Voort (marco@freepascal.org) and Michael van Canneyt (michael@freepascal.org) for porting to FPC and documenting NumLib.
Note
The code snippets are not exactly the same as the ones in the Free Pascal NumLib official doc. I modified them for my learning.
Routines in NumLib
- Basic operations with matrices and vectors - unit
omv
- Calculation of determinants - unit
det
- Matrix inversion - unit
inv
- Solution of systems of linear equations - unit
sle
- Calculation of eigenvalues - unit
eig
- Finding roots of equations - unit
roo
- Calculates integrals - unit
int
- Oridnary differential equations - unit
ode
- Fitting routines - unit
ipf
- Calculation of special functions - unit
spe
Unit omv
- Operations with Matrices and Vectors¶
Inner Product of Two Vectors¶
NumLib provides the function omvinp
for calculation of the inner product:
Parameters
a
andb
are the first elements of 1-dimensional arrays representing the vectors \(a\) and \(b\), respectively.n
defines the dimension of the vectors (count of array elements). Both vectors must have the same number of elements.
Example
Calculate the dot product of these vectors.
Product of a Matrix with a Vector¶
Example
Calculate the product of the following.
Transpose Matrix¶
The transpose matrix \(A^{T}\) of matrix \(A\) is obtained by flipping rows and columns:
Use the procedure omvtrm to perform this operation with NumLib:
Parameters
a
is the first element of the input matrix \(A\). The elements of this array are not changed by the procedure.m
is the number of rows of matrix \(A\).n
is the number of columns of matrix \(A\).rwa
is the number of allocated columns of \(A\). This way the array of \(A\) can be larger than acutally needed.c
is the first element of the transposed matrix \(A^{T}\). It has \(n\) rows and \(m\) columns.rwc
is the number of allocated columns for the transposed matrix.
Example
Transpose the following 2 x 4 matrix.
Norm of Vectors and Matrices¶
A norm assigns a positive "length" to a vector or matrix.
For vectors, NumLib
supports these norms:
- The 1-norm is the sum of the absolute values of vector components, also known as the "Taxicab" or "Manhattan" norm because it represents the distance a taxi drives in a grid.
- The 2-norm (Euclidean norm) is the distance from point a to the origin.
- The infinity norm is the largest absolute value of the vector components.
For matrices, NumLib
defines norms based on rows or columns:
- The 1-norm is the largest absolute column sum.
- The infinity norm is the largest absolute row sum.
- The Frobenius norm sums the squares of all matrix elements, similar to the vector 2-norm.
These are the NumLib routines for calculating norms:
function omvn1v(var a: ArbFloat; n: ArbInt): ArbFloat; // 1-norm of a vector
function omvn1m(var a: ArbFloat; m, n, rwidth: ArbInt): ArbFloat; // 1-norm of a matrix
function omvn2v(var a: ArbFloat; n: ArbInt): ArbFloat; // 2-norm of a vector
function omvnfm(Var a: ArbFloat; m, n, rwidth: ArbInt): ArbFloat; // Frobenius norm of a matrix
function omvnmv(var a: ArbFloat; n: ArbInt): ArbFloat; // Maximum infinite norm of a vector
function omvnmm(var a: ArbFloat; m, n, rwidth: ArbInt): ArbFloat; // Maximum infinite norm of a matrix
Parameters
a
: First element of the vector or matrix for norm calculation.m
: Number of rows (for matrix norms).n
: Number of elements (for vector norms) or columns (for matrix norms).rwidth
: Allocated column count, allowing for a larger matrix than required.
Example
Determinant of a Matrix¶
Coming soon.
Inverse of a Matrix¶
Let's say we have a square matrix \( A \). The matrix \( A^{-1} \) is called the inverse of \( A \) if, when you multiply \( A^{-1} \) by \( A \), you get the identity matrix \( I \). The identity matrix is special because it has 1s on the diagonal (from the top left to bottom right) and 0s everywhere else.
The inverse of a matrix only exists if the determinant of \( A \) is not zero. If the determinant is zero, \( A \) doesn't have an inverse.
Using NumLib to Find the Inverse¶
NumLib is a tool that has several ways (called routines) to calculate the inverse of a matrix. Which method it uses depends on what kind of matrix you have. All the routines expect the matrix to be stored in a common way—as a 2D array of real numbers, or a 1D array of numbers arranged in rows and columns.
Here are some routines that NumLib uses:
- invgen: This routine works for any square matrix.
- invgsy: This routine works for matrices that are symmetric (meaning the left side mirrors the right side).
- invgpd: This routine works for symmetric positive definite matrices (a special kind of symmetric matrix).
procedure invgen(n, rwidth: ArbInt; var ai: ArbFloat; var term: ArbInt); // generic matrix
procedure invgsy(n, rwidth: ArbInt; var ai: ArbFloat; var term: ArbInt); // symmetric matrix
procedure invgpd(n, rwidth: ArbInt; var ai: ArbFloat; var term: ArbInt); // symmetric positive definite matrix
Parameters
- \( n \): The size of the matrix.
- \( rwidth \): The width of the rows (usually, this is the same as \( n \)).
- \( ai \): The first element of the matrix (the top-left corner).
- \( term \): A number that tells if the calculation was successful:
- 1: Success! The inverse was found.
- 2: The inverse couldn't be found because the matrix is too close to being singular (meaning its determinant is almost zero).
- 3: There was a mistake with the input data (like if \( n \) was less than 1).
Example
Calculate the inverse of the symmetric matrix.
Note, this matrix is symmetric and positive definite (the product \(b^{T} A b\) with any vector \(b = [b1, b2, b3]^{T}\) cannot become zero or negative since all elements of this matrix are positive). Therefore, invgpd
is best-suited for this task although the other routines can be used as well (uncomment their calls below).
Unit sle
- Solving Systems of Linear Equations¶
A system of linear equations (or linear system) is a collection of two or more linear equations involving the same set of variables, \(x_1 ... x_n\).
In order to find a solution which satisfies all equations simultaneously the system of equations is usually converted to a matrix equation \(A x = b\), where
NumLib has several procedures to solve the matrix equation depending on the properties of the matrix.
Square Matrices¶
Pass the matrix to the procedures in the standard NumLib way as the first element of a 2D or 1D array. The 2D array must be dimensioned to contain at least \(n\) rows and \(n\) columns, the 1D array must contain at least \(n^2\) elements.
procedure slegen(n, rwidth: ArbInt; var a, b, x, ca: ArbFloat; var term: ArbInt); // generic matrix
procedure slegsy(n, rwidth: ArbInt; var a, b, x, ca: ArbFloat; var term: ArbInt); // symmetric matrix
procedure slegpd(n, rwidth: ArbInt; var a, b, x, ca: ArbFloat; var term: ArbInt); // symmetric positive definite matrix
Parameters
slegen
is the General-purpose for any square matrix (Gaussian elimination with partial pivoting).
slegsy
is specialised for symmetric matrices (more stable reduction to tridiagonal form).
slegpd
is optimisedfor symmetric positive definite matrices (Cholesky decomposition).
n
is the matrix size.rwidth
is the number of allocated columns (must ben <= rwidth
).a
is the first element of the coefficient matrix (unchanged during calculation).b
is the constant vector (also unchanged).x
is where the solution vector will be stored.ca
indicates solution accuracy.term
returns an error code:1
: Success, solution inx
.2
: Matrix is nearly singular, no solution.3
: Invalid input,n < 1
.
Example
Solve this system of linear equations:
Band Matrix¶
slegba
is the optimised solution for band matrices, i.e. matrices in which all elements are zero outside a band of width l
below and of width r
above the main diagonal.
Warning
Note that this routine cannot work with a 2D array.
Parameters
n
is the number of columns and rows of the matrix (it must be a square matrix).l
is the left bandwidth, i.e. the number of diagonals the band extends below (or to the left of) the main diagonal.r
is the right bandwidth, i.e. the number of diagonals the band extends above (or to the right of) the main diagonsl.a
is the first element of a 1D array which contains the elements of the diagonal band, see this.- This array contains only the band elements and is obtained by running across the rows of the band matrix from left to right and top to bottom, starting at \(A_11\).
- It must be dimensioned to contain at least
n*(l+1+r) - (r*(r+1)+l*(l+1)) div 2
elements. - Note that a 2D array cannot be used for this routine.
b
is the first element of the array containing the constant vector \(b\). The array length at least must be equal to \(n\). The vector will not be changed during the calculation.x
is the first element of the array to receive the solution vector \(x\). It must be allocated to contain at least \(n\) values.ca
is a parameter to describe the accuracy of the solution.term
returns an error code:- 1 - successful completion, the solution vector \(x\) is valid
- 2 - the solution could not have been determined because the matrix is (almost) singular.
- 3 - error in input values: n < 1, l < 0, l >= n, r < 0, or r >= n
Example
Solve this system of linear equations:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 |
|
Symmetric Positive Definite Band Matrix¶
slegpb
is the optimised solution for symmetric positive band matrices.
Warning
Note that this routine cannot work with a 2D array.
Parameters
n
is the number of rows and columns in the matrix (the matrix must be square).w
is the bandwidth (one-sided), which is the number of diagonals that extend below (or to the left of) the main diagonal. This value is also the number of diagonals that extend above (or to the right of) the main diagonal.a
points to the first element of a 1D array that contains the elements of the diagonal band.- This array holds only the band elements and is constructed by traversing the rows of the band matrix from top-left to bottom-right, stopping at the main diagonal.
- As a result, it contains only the left band and the main diagonal elements.
- The array must be dimensioned to hold at least
n*(w+1) - (w*(w+1)) div 2
elements. - Note that this routine cannot work with a 2D array.
b
points to the first element of the array holding the constant vector \(b\). The length of this array must be at least \(n\), and the vector remains unchanged during the computation.x
is the first element of the array where the solution vector \(x\) will be stored. It must be large enough to hold at leastn
values. If term is not equal to 1, the solution array may contain invalid data.ca
specifies the accuracy of the solution.term
is an error code that indicates the result of the calculation:- 1: The computation was successful, and the solution vector \(x\) is valid.
- 2: The matrix is (nearly) singular, so a solution could not be determined.
- 3: There was an error in the input values (e.g.,
n < 1
,w < 0
, orw >= n
).
Example
Solve this system of linear equations.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 |
|
Tridiagonal Matrix¶
NumLib provides two special procedures to solve linear equations based on a tridiagonal matrix. These procedures use an efficient way to store tridiagonal matrices, with slight differences in their calculation methods and handling of certain cases.
sledtr(n: ArbInt; var l, d, u, b, x: ArbFloat; var term: ArbInt)
slegtr(n: ArbInt; var l, d, u, b, x, ca: ArbFloat; var term: ArbInt)
Parameters
n
is the number of unknown variables. It must be the same as the number of columns and rows of the coefficent matrix.l
specifies the first element in the subdiagonal of the matrix \(A\). This 1D array must be dimensioned to at leastn-1
elements.d
specifies the first element along the main diagonal of the matrix \(A\). This 1D array must be dimensioned to contain at leastn
elements.u
specifies the first element along the superdiagonal of the matrix \(A\). The array must be dimensioned to have at leastn-1
elements.b
is the first element of the array containing the constant vector \(b\). The array length at least must be equal ton
. The vector will not be changed during the calculation.x
is the first element of the array to receive the solution vector \(x\). It must be allocated to contain at leastn
values.ca
is a parameter to describe the accuracy of the solution.term
returns an error code:1
- successful completion, the solution vector \(x\) is valid2
- the solution could not have been determined because the matrix is (almost) singular.3
- error in input values:n < 1
.
The sledtr
routine is numerically stable if matrix \(A\) fulfills one of these conditions:
- Matrix \(A\) is regular (i.e. its inverse matrix exists), and \(A\) is columnar-diagonally dominant, this means:
- |d1| ≥ |l2|,
- |di| ≥ |ui-1| + |li+1|, i = 2, ..., n-1,
- |dn| ≥ |un-1|
- Matrix \(A\) is regular, and \(A\) is diagonally dominant, this means:
- |d1| ≥ |l2|,
- |di| ≥ |ui| + |li|, i = 2, ..., n-1,
- |dn| ≥ |un|
- Matrix \(A\) is symmetric and positive-definite.
Note, the sledtr
routine does not provide the parameter ca
from which the accuracy of the determined solution can be evaluated. If this is needed the (less stable) procedure slegtr
must be used.
Example
Solve this tridiagonal system of linear equations where \(n=8\).
Overdetermined Systems (Least Squares)¶
Unlike other routines in the sle unit that require a square matrix \(A\), slegls
can solve systems with a rectangular matrix, where there are more equations than unknowns. These systems usually can't be solved exactly. However, an approximate solution can minimize the sum of squared residuals, i.e. the norm \(\displaystyle{ \|\mathbf{b} - A \mathbf{x}\|_2 }\) is as small as possible. This method is commonly used for fitting equations to data (regression analysis).
Parameters
a
is the first element of an array of matrix \(A\). The array won't be modified.m
is the number of rows in matrix \(A\) (i.e., the number of equations).n
is the number of columns in matrix \(A\) (i.e., the number of unknown variables).- Note: \(n\) must not exceed \(m\).
rwidtha
specifies the allocated number of columns for matrix \(A\), which can be larger than needed, with \(n≤rwidth\).b
is the first element of vector \(b\), with a lenth matching \(m\), though it can be allocated larger.x
is the first element of the array for the solution vector \(x\), with length matching \(n\), though it can also be allocated larger.term
returns an error code:- 1 - Success, solution \(x\) is valid
- 2 - No clear solution due to linearly dependent columns.
- 3 - Input error:
n < 1
, orn > m
.
The method uses Householder transformation to reduce \(A\) to an upper triangular form.
Example
Find the least-squares solution for the system \(A x = b\) of 4 equations and 3 unknowns.
The output is:
Solve A x = b with the least-squares method
A =
1 0 1
1 1 1
0 1 0
1 1 0
b =
21 39 21 30
Solution x =
10 20 10
Residuals A x - b =
-1 1 -1 -0
Sum of squared residuals
3
----------------------------------------------------------------------------
Modified solution x' (to show that it has a larger sum of squared residuals)
11 19 10
Sum of squared residuals
5
Why Calculate the Sum of Squared Residuals and Why Modify the Solution to Increase the Sum of Squared Residuals?¶
Why Calculate the Sum of Squared Residuals?
The sum of squared residuals is a measure of how well the computed solution fits the data. It tells us how far off the computed values (from \(A × x\)) are from the actual values in \(b\).
In a least-squares solution, this sum is minimised. By calculating it, the code verifies how close the solution is to the actual results. A smaller sum of squared residuals means a better fit.
A smaller sum indicates a better fit. In this case, a sum of 3 is reasonable and shows that the solution is a close approximation.
What about the residual of \((-1, 1, -1, 0)\)?
The residuals \(A × x - b\) being \((-1, 1, -1, 0)\) indicate that the least-squares solution is a close approximation, but not perfect.
However, in an over-determined system (more equations than unknowns, as here with 4 equations and 3 unknowns), an exact solution is usually not possible. The least-squares method tries to find a solution \(x\) that minimizes the sum of the squared residuals.
This means that, when solving the system, the predictions for some rows of \(b\) are slightly off:
- The first equation is off by -1 (under-predicted).
- The second equation is off by 1 (over-predicted).
- The third equation is off by -1 (under-predicted).
- The fourth equation is off by 0 (almost perfect prediction).
Why Modify the Solution to Increase the Sum of Squared Residuals?
In the second part of the code, the solution vector \(x\) is manually modified:
This shows what happens if the solution is perturbed. By changing \(x\), the code demonstrates that the sum of squared residuals increases:
The point of this modification is to highlight the importance of the least-squares solution. The original solution minimizes the residuals, but altering the solution results in a worse fit (higher residuals).
Deviating from the least-squares solution results in a higher sum of squared residuals, indicating a poorer fit.
Unit eig
- Eigenvalues and eigenvectors¶
An eigenvector is a special type of vector that keeps its direction even when a linear transformation (like multiplying by a matrix) is applied to it.When you multiply a square matrix \(A\) by a non-zero vector \(x\) and the result is the same vector multiplied by a number (called a scalar), then: - \(x\) is the eigenvector - The number is called the eigenvalue (represented by \(λ\)).
NumLib has tools to calculate eigenvalues and eigenvectors for different types of matrices. Each matrix type has two to four procedures with numbers 1 to 4 that indicate their complexity:
- Procedures with "1": Calculate all eigenvalues.
- Procedures with "2": Calculate specific eigenvalues in a certain range.
- Procedures with "3": Calculate all eigenvalues and eigenvectors.
- Procedures with "4": Calculate specific eigenvalues and eigenvectors in a certain range.
Matrices with General Storage¶
The routines assume the \(n \times n\) matrix is stored as a 2D (or 1D) array of ArbFloat values.
For Generic Matrices
- eigge1: Calculates all eigenvalues.
- eigge3: Calculates all eigenvalues and eigenvectors.
For Generic Symmetric Matrices
- eiggs1: Calculates all eigenvalues.
- eiggs2: Calculates some eigenvalues in a specific range.
- eiggs3: Calculates all eigenvalues and eigenvectors.
- eiggs4: Calculates some eigenvalues and eigenvectors in a specific range.
For Symmetric Positive Definite Matrices
- eiggg1: Calculates all eigenvalues.
- eiggg2: Calculates some eigenvalues in a specific range.
// Generic matrix (without any special symmetries)
procedure eigge1(var a: ArbFloat; n, rwidth: ArbInt; var lam: complex; var term: ArbInt); // all eigenvalues
procedure eigge3(var a: ArbFloat; n, rwidtha: ArbInt; var lam, x: complex; rwidthx: ArbInt; var term: ArbInt); // all eigenvalues and eigenvectors
// Generic symmetric matrix
procedure eiggs1(var a: ArbFloat; n, rwidth: ArbInt; var lam: ArbFloat; var term: ArbInt); // all eigenvalues
procedure eiggs2(var a: ArbFloat; n, rwidth, k1, k2: ArbInt; var lam: ArbFloat; var term: ArbInt); // some eigenvalues (index k1..k2)
procedure eiggs3(var a: ArbFloat; n, rwidtha: ArbInt; var lam, x: ArbFloat; var term: ArbInt); // all eigenvalues and eigenvectors
procedure eiggs4(var a: ArbFloat; n, rwidtha, k1, k2: ArbInt; var lam, x: ArbFloat; var term: ArbInt); // some eigenvalues and eigenvectors (index k1..k2)
// Symmetric positive definite matrix
procedure eiggg1(var a: ArbFloat; n, rwidth: ArbInt; var lam: ArbFloat; var term: ArbInt); // all eigenvalues
procedure eiggg2(var a: ArbFloat; n, rwidth, k1, k2: ArbInt; var lam: ArbFloat; var term: ArbInt); // some eigenvalues (index k1..k2)
procedure eiggg3(var a: ArbFloat; n, rwidtha: ArbInt; var lam, x: ArbFloat; var term: ArbInt); // all eigenvalues and eigenvectors
procedure eiggg4(var a: ArbFloat; n, rwidtha, k1, k2: ArbInt; var lam, x: ArbFloat; var term: ArbInt); // some eigenvalues and eigenvectors (index k1..k2)
Parameters
a
is the first element of an array containing the matrix \(A\) for which the eigenvalue/eigenvector has to be calculated.- The array must be dimensioned to provide space for at least \(n^{2}\) floating point values.
n
specifies the size of the matrix \(A\), i.e. the number of rows or columns. Note that the input matrix must be square, i.e. the number of rows and columns is equal.rwidth
is the allocated row length of the array a. It can be larger than n if the array is allocated larger than necessary, but normallyrwidth = n
.lam
is the first element of an array receiving the calculated eigenvalues.- In case of a generic matrix (
eigge1
,eigge3
) the eigenvalues can be complex; therefore, the array must be dimensioned for values of typecomplex
as declared in unittyp
. - In the other cases (
eiggs1..4
oreiggg1..4
) the eigenvalues are real, and the array must be dimensioned for datatypeArbFloat
. - Since a \(n \times n\) matrix has \(n\) eigenvalues, the array must be allocated for at least
n
values, in case of the procedures with appended 2 or 4 onlyk2-k1+1
values are sufficient (see below).
- In case of a generic matrix (
term
returns an error code:- 1 -- successful calculation
- 2 -- calculation failed
- 3 -- error in input data: n<1, k1<1, k1>k2, or k2>n.
Additionally, in case of eigge3
:
x
is the first element of a matrix to receive the calculated eigenvectors.- Again, the eigenvectors of a generic matrix can have complex components. Therefore, the matrix must be declared for the datatype
complex
, and it must be large enough to hold at least \(n^{2}\) values. - If the matrix is symmetric or positive definite, the eigenvectors are real, and the array must be declared for datatype
ArbFloat
. In any case, the eigenvectors are normalized to unit length and arranged in the columns of this matrix.
- Again, the eigenvectors of a generic matrix can have complex components. Therefore, the matrix must be declared for the datatype
rwidthx
denotes the allocated row length of the matrix \(x\). Thus it is possible to dimension the result matrix larger than actually needed.
Additionally, in case of procedures eiggs2
, eiggs4
, eiggg2
and eiggg4
:
k1
andk2
define the interval of indexesk
for which the eigenvalues (\(\lambda k\)) and eigenvalues (\(c_k\)) are to be calculated. They are integers and must be ordered such that \(1<=k1<=k2<=n\).
Example
Calculate the eigenvalues and eigenvectors of the matrix.
Output
Matrix a =
8.000 -1.000 -5.000
-4.000 4.000 -2.000
18.000 -5.000 -7.000
Eigenvalues: lambda =
2.000 + 4.000i 2.000 - 4.000i 1.000 + 0.000i
Eigenvectors (as columns): x =
0.316 + 0.316i 0.316 - 0.316i 0.408 + 0.000i
0.000 + 0.632i 0.000 - 0.632i 0.816 + 0.000i
0.632 + 0.000i 0.632 + 0.000i 0.408 + 0.000i
Press enter to quit
Symmetric Band Matrices¶
NumLib provides four routines for calculating the eigenvalues and eigenvectors of symmetric band matrices (matrices with non-zero elements in a band around the main diagonal):
-
eigbs1: Calculates all eigenvalues.
-
eigbs2: Calculates some of the eigenvalues, within a specified range (from index
k1
tok2
). -
eigbs3: Calculates all eigenvalues and their corresponding eigenvectors.
-
eigbs4: Calculates some eigenvalues and their corresponding eigenvectors (within the range
k1
tok2
).
Parameters
-
a: This is the first element of a 1D array that contains the matrix's diagonal and left band elements. The right band is ignored because the matrix is symmetric. The array needs to be large enough to hold at least
n*(w+1) - (w*(w+1)) div 2
elements. -
n: The number of rows and columns in the matrix (since the matrix is square).
-
w: The bandwidth, or the number of diagonals the band extends on either side of the main diagonal (the left and right bandwidth must be the same since the matrix is symmetric).
-
lam: The first element of an array where the computed eigenvalues will be stored. Since a matrix of size
n x n
hasn
eigenvalues, this array should have space for at leastn
values of typeArbFloat
. -
x: (Used in
eigbs3
andeigbs4
) The first element of a matrix that will store the computed eigenvectors. The matrix must be large enough to hold at leastn x n
values and will contain the eigenvectors as columns. These vectors are normalized (scaled to unit length). -
rwidthx: (Used in
eigbs3
andeigbs4
) The allocated row length for the eigenvector matrixx
. This allows the matrix to be larger than needed if desired. -
k1, k2: (Used in
eigbs2
andeigbs4
) These define the range of eigenvalues and eigenvectors to compute. For example, settingk1 = 1
andk2 = 3
calculates the first three eigenvalues and their vectors. The values must satisfy1 <= k1 <= k2 <= n
. -
m2: (Used in
eigbs4
) The index of the largest eigenvalue for which an eigenvector has been computed. -
term: A return code indicating the result:
- 1: Calculation was successful.
- 2: Calculation failed.
- 3: There was an error in the input parameters, such as
n < 1
,w < 0
,w >= n
,k1 < 1
,k1 > k2
, ork2 > n
.
If the bandwidth w
is larger than one-third of the matrix size (n/3
), it is generally more efficient to compute all eigenvalues and eigenvectors, even if you don't need all of them.
Example
Calculate the eigenvalues and eigenvectors of the symmetric 7 x 7 matrix
The eigenvalues are
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 |
|
Output
n = 7
(One-sided) band width w = 2
Diagonal elements of A = 5 -4 6 1 -4 6 1 -4 6 1 -4 6 1 -4 6 1 -4 5
Reconstructed A =
5 -4 1 0 0 0 0
-4 6 -4 1 0 0 0
1 -4 6 -4 1 0 0
0 1 -4 6 -4 1 0
0 0 1 -4 6 -4 1
0 0 0 1 -4 6 -4
0 0 0 0 1 -4 5
Expected eigenvalues:
0.023 0.343 1.524 4.000 7.647 11.657 14.805
Calculated eigenvalues: lambda =
0.023 0.343 1.524 4.000 7.647 11.657 14.805
Eigenvectors (as columns): x =
-0.191 -0.354 0.462 0.500 0.462 0.354 0.191
-0.354 -0.500 0.354 0.000 -0.354 -0.500 -0.354
-0.462 -0.354 -0.191 -0.500 -0.191 0.354 0.462
-0.500 -0.000 -0.500 0.000 0.500 0.000 -0.500
-0.462 0.354 -0.191 0.500 -0.191 -0.354 0.462
-0.354 0.500 0.354 -0.000 -0.354 0.500 -0.354
-0.191 0.354 0.462 -0.500 0.462 -0.354 0.191
Press enter to quit
Symmetric Tridiagonal Matrices¶
Symmetric tridiagonal matrices can be handled using the following routines: eigts1
, eigts2
, eigts3
, and eigts4
.
-
All Eigenvalues
This procedure computes all eigenvalues of the matrix.
-
Some Eigenvalues (with indices k1..k2)
This computes the eigenvalues within the index range
k1
tok2
. -
All Eigenvalues and Eigenvectors
This calculates all eigenvalues and their corresponding eigenvectors.
-
Some Eigenvalues and Eigenvectors (with indices k1..k2)
This computes eigenvalues and eigenvectors within the specified index rangeprocedure eigts4(var d, cd: ArbFloat; n, k1, k2: ArbInt; var lam, x: ArbFloat; rwidth: ArbInt; var m2, term: ArbInt);
k1
tok2
.
Parameters
- d: A 1D array representing the main diagonal of the matrix. It must contain at least
n
elements. - cd: A 1D array representing the subdiagonal of the matrix. It must have at least
n-1
elements. - n: The size of the matrix (number of rows or columns), which must be square.
- k1, k2: The index range for the eigenvalues and eigenvectors to be calculated, where
1 <= k1 <= k2 <= n
. - lam: A 1D array to store the calculated eigenvalues. For routines
eigts1
andeigts3
, it must contain at leastn
elements. Foreigts2
andeigts4
, it only needs to storek2 - k1 + 1
elements. - x: A 2D array to store the calculated eigenvectors. For
eigts3
, the array must have at leastn^2
elements. Foreigts4
, it must provide space forn * (k2 - k1 + 1)
elements. Eigenvectors are normalized to unit length and stored in columns. - rwidth: The row length of the matrix
x
. This allows the result matrix to be dimensioned larger than necessary. - term: An error code returned after calculation:
1
: Successful calculation.2
: Calculation failed.3
: Error in input data (e.g.,n < 1
,k1 < 1
,k1 > k2
, ork2 > n
).
This explanation organizes the routines, clearly defines the parameters, and explains the error codes in a structured way for easier understanding.
Example
Calculate eigenvalues and eigenvectors of the matrix:
The expected eigenvalues are:
Output
n = 4
Elements of main diagonal = 1 1 1 1
Elements of subdiagonal = 1 2 1
Reconstructed A =
1 1 0 0
1 1 2 0
0 2 1 1
0 0 1 1
Expected eigenvalues:
-1.414 0.586 1.414 3.414
Calculated eigenvalues: lambda =
-1.414 0.586 1.414 3.414
Eigenvectors (as columns): x =
-0.271 -0.653 0.653 0.271
0.653 0.271 0.271 0.653
-0.653 0.271 -0.271 0.653
0.271 -0.653 -0.653 0.271
Press enter to quit
Unit roo
- Finding the roots of a function¶
Roots of a Polynomial¶
A polynomial of degree n
always has n
complex solutions, though they may not all be distinct. The datatype complex
is described in the section on complex numbers. You can calculate the roots using the roopol
function.
Note
The polynomial must be normalized, meaning the coefficient of the highest-degree term should be 1.
Parameters
-
a: An array containing the polynomial coefficients, ordered from highest to lowest degree.
- The polynomial must be normalized, meaning the coefficient of the highest-degree term should be 1. This coefficient is not included in the array, so the array must have at least
n
elements. - Since only real polynomials are handled, the array elements should be of type
ArbFloat
. Note: the data organization for this array is different from other polynomial routines in this library.
- The polynomial must be normalized, meaning the coefficient of the highest-degree term should be 1. This coefficient is not included in the array, so the array must have at least
-
n: The degree of the polynomial. It must be a positive integer.
-
z: An array of complex values that will store the roots of the polynomial. The array must have at least
n
elements. The values returned are undefined if an error occurs (i.e., ifterm <> 1
). -
k: The number of roots found. This value should always equal
n
; if it doesn't, an error has occurred. -
term: An error code returned by the function:
1
: Successful completion, and the arrayz
contains valid root data.2
: Not all roots were found (k < n
).3
: Error in input data (n < 1
).
Example
Calculate the roots of the polynomial \(z^5 + 3 z^4 + 4 z^3 - 8 z^2\). The expected zero points are:
Roots of a Quadratic Equation¶
A quadratic equation is a polynomial of degree 2. It always has two complex roots, though they may not be distinct. You can find these roots using the rooqua
procedure.
Parameters
- p: The coefficient of the linear term in the quadratic equation.
- q: The constant term of the quadratic equation.
- z1, z2: Variables that will store the two complex roots of the equation. The type
complex
is declared in the unittyp
.
rooqua
assumes that the quadratic term has been normalized, meaning the coefficient of the quadratic term is set to 1.
Example
Determine the roots of the equation \(z^2 + 2 z + 5 = 0\).
Roots of the Binomial Equation¶
The binomial equation \( z^n = a \) is a special type of polynomial where all terms except for the highest- and lowest-order terms have zero coefficients. It has n
complex solutions, which can be calculated using the roobin
procedure.
Parameters
- n: The exponent in the binomial equation. It must be a positive integer.
- a: The constant term on the right-hand side of the equation. This is expected to be a complex number (refer to the section on complex numbers).
- z: An array of type
complex
that will store the calculated roots. It must have space for at leastn
complex values. - term: An error code returned by the procedure:
1
: Successful termination.2
: Error in input data (n < 1
).
Example
Calculate the roots of the equation:
The exact solutions are
Output
Results of procedure roobin:
Solution #1: 0.707107 + 0.707107 i
Solution #2: 0.707107 - 0.707107 i
Solution #3: -0.707107 + 0.707107 i
Solution #4: -0.707107 - 0.707107 i
Expected results:
Solution #1: 0.707107 + 0.707107 i
Solution #2: 0.707107 - 0.707107 i
Solution #3: -0.707107 + 0.707107 i
Solution #4: -0.707107 - 0.707107 i
Press enter to quit
Bisection Method¶
The bisection method is used to estimate the root of a function by identifying two values, a
and b
, such that the function has opposite signs at those points. The midpoint of the interval is calculated, and the subinterval where the function's signs differ is selected for the next iteration. This process continues until the desired precision (i.e., interval length) is achieved.
In NumLib, this method is implemented using the roof1r
procedure:
Parameters
- f: The function for which the root is to be determined. It must take a single floating-point argument of type
ArbFloat
. The function typerfunc1r
is declared in thetyp
unit. - a, b: The endpoints of the interval. The root must lie between these two values, meaning the function values
f(a)
andf(b)
should have opposite signs. - ae, re: These determine the absolute (
ae
) and relative (re
) precision of the root.re
is relative to the maximum ofabs(a)
andabs(b)
. Higher accuracy is achieved by settingae
toMachEps
(from thetyp
unit). Both parameters must be non-negative. - x: The variable that returns the found root.
- term: An error code indicating the result of the process:
1
: Successful termination. A root has been found with the specified absolute or relative precision.2
: The required accuracy could not be achieved, but the value ofx
is the "best achievable" approximation.3
: Input error:ae < 0
orre < 0
, or the function values ata
andb
do not have opposite signs.
Example
The following program uses the bisection method to find the square root of 2. The root is the value of x
where the function \( f(x) = x^2 - 2 \) equals zero. Since \( f(1) = -1 \) and \( f(2) = 2 \), the root lies between 1 and 2.
Output
Here’s a tidied version of your text:
Roots of a System of Nonlinear Equations¶
We want to find the roots of a system of nonlinear equations:
The roofnr
procedure can be used to find the roots of such a system:
procedure roofnr(f: roofnrfunc; n: ArbInt; var x, residu: ArbFloat; ra: ArbFloat; var term: ArbInt);
Parameters
- f: The address of the procedure that calculates the function values \( f_i(x_1, x_2, \dots, x_n) \). The function type
roofnrfunc
is declared in thetyp
unit: - x: An array of at least
n
ArbFloat
values, providing the \( x_j \) values at which the functions are evaluated. - fx: An array to store the calculated values of the functions \( f_i \).
- deff: A boolean flag that can be used to stop the root-finding process based on a condition.
For example, to solve the system of equations:
The function f
can be written as:
procedure func(var x1, f1x: real; var deff: boolean);
var
x: array[1..2] of real absolute x1;
f: array[1..2] of real absolute f1x;
begin
f[1] := sqr(x[1]) + sqr(x[2]) - 2;
f[2] := -sqr(x[1] - 1) + x[2];
end;
To stop the process when \( x_1 < 1 \), set deff
to false
:
procedure func(var x1, f1x: real; var deff: boolean); far;
var
x: array[1..2] of real absolute x1;
f: array[1..2] of real absolute f1x;
begin
deff := x[1] >= 1;
if deff then begin
f[1] := sqr(x[1]) + sqr(x[2]) - 2;
f[2] := -sqr(x[1] - 1) + x[2];
end;
end;
Parameters
- n: The number of equations or variables ( \( x_i \) ) in the system.
- x: The first element of an array (with at least
n
values) used for both input and output. Initially, it should contain estimates of the roots. After computation, it holds the found roots. - residu: The 2-norm of the vector of residuals in the solution: $$ |f|2 = \sqrt{\sum^{n} f_i^2} $$
- ra: The relative accuracy for calculating the solution. Typical values are \( 10^{-3}, 10^{-5}, 10^{-8} \) depending on the precision (
single
,real
, ordouble
). - term: An error code indicating the result:
1
: Successful solution with desired accuracy.2
: The solution accuracy could not be achieved, but the returnedx
values are the best achievable.3
: Incorrect input data (e.g.,n < 0
orre < 0
).4
: The calculation process was stopped due to exceeding function call limits.5
: Insufficient progress—no solution found or needs a different starting value.6
: The procedure attempted to compute a value outside the range defined bydeff
.
Example
Solve the system of equations:
Output
term = 1
Results found by procedure roofnr:
Solution #1: 1.404698
Solution #2: 0.163780
Norm of residuals: 0.000000000000000
Press enter to quit
Unit int
- Numerical integration of a function¶
The int1fr
function in the NumLib
library calculates the integral of a given function between limits a
and b
, with a specified absolute accuracy ae
:
Parameters
-
f
: A pointer to the function to be integrated. The function must take a single real variable of typeArbFloat
and return a value of typeArbFloat
. The function type is declared asrfunc1r
in thetyp
unit. -
a, b
: The integration limits. These can also be set to+/-Infinity
to allow integration over infinite intervals. The order ofa
andb
follows correct mathematical conventions. -
ae
: The required absolute accuracy for the result. -
integral
: The result of the integration. It is valid only ifterm = 1
. -
err
: The error in accuracy if the requested precision could not be achieved. In this case,term = 2
. -
term
: The termination status code:1
: Integration successful with absolute accuracyae
.2
: Requested accuracy not reached, but the result is approximated with errorerr
.3
: Invalid input (e.g.,ae < 0
, or botha
andb
are infinite).4
: Integration failed due to divergence or slow convergence.
Example
This example demonstrates calculating the integral:
for various integration limits a
and b
. Since the function diverges at x = 0
, the integration interval must not include this point. The expected analytic result is provided for comparison.
Notes
-
recipx2
: This function returns \( \frac{1}{x^2} \), which is the integrand. -
integral_recipx2
: Computes the expected analytic result for the given integration limits. -
Execute
: Runs the integration for the specified limitsa
andb
, handles potential errors, and prints the result.
Output
Integral of f(x) = 1/x^2
The integral from 1 to 2 is 0.500000000, expected: 0.500000000
The integral from 1 to 1 is 0.000000000, expected: 0.000000000
The integral from 2 to 1 is -0.500000000, expected: -0.500000000
The integral from 1 to +Inf is 1.000000000, expected: 1.000000000
The integral from -Inf to -1 is 1.000000000, expected: 1.000000000
The integral from 0 to +Inf cannot be calculated: Divergence or slow convergence.
Press enter to quit
Unit ode
- Ordinary differential equations¶
Solving a Single First-Order Differential Equation¶
The procedure odeiv1
solves an initial value problem for a first-order differential equation of the form:
where a
, b
, f
, and the initial condition y(a) = α
are given.
Parameters
f
: The function to be solved, depending on two real variables and returning a real value. Defined astype rfunc2r = function(x, y: ArbFloat): ArbFloat
.a
: The startingx
value of the interval.ya
: The initial valueα
atx = a
.b
: The endx
value of the interval. After the calculation, ifterm = 2
,b
contains the new endpoint with the required accuracy,ae
.yb
: Returns the computed valuey(b)
ifterm < 3
. Ifterm = 3
, the result is undefined.ae
: Specifies the absolute accuracy required fory(b)
.term
: Returns the following error codes:1
: Successful completion.2
: The solution could not reachb
with the required accuracy.yb
is an approximation at the deliveredb
.3
: Input error,ae <= 0
.
This algorithm is based on a fifth-order adaptive Runge-Kutta method. It may not be accurate for stiff differential equations, and accuracy issues can arise in unstable problems where small variations in y(a)
cause large variations in y(b)
.
If you want to solve for multiple points, like from x = 0
to x = 1
with a step size of 0.1, avoid restarting at each step and "integrate" instead.
Example
Solve the equation \(y'' = -10 (y - x^2)\) with initial condition \(y(0) = 0\) and compare it to the exact solution \(y(x) = -0.02 exp(-10 x) + x^2 - 0.2 x + 0.02\).
Output
x y exact error code
0.00000 0.00000 0.00000 -
0.50000 0.16986 0.16987 1
1.00000 0.82000 0.82000 1
1.50000 1.97000 1.97000 1
2.00000 3.62000 3.62000 1
2.50000 5.77000 5.77000 1
3.00000 8.42000 8.42000 1
3.50000 11.57000 11.57000 1
4.00000 15.22000 15.22000 1
4.50000 19.37000 19.37000 1
5.00000 24.02000 24.02000 1
Press enter to quit
Solving a System of First-Order Differential Equations¶
To solve a system of first-order differential equations:
where:
y
is a vector \([y_1(x), y_2(x), ..., y_n(x)]\),f(x, y)
is a vector function \([f_1(x, y), f_2(x, y), ..., f_n(x, y)]\),- The initial conditions are given as \(y(a) = [y_1(a), y_2(a), ..., y_n(a)]\).
This can be solved using the procedure odeiv2
, which uses an adaptive fifth-order Runge-Kutta method with variable step size.
procedure odeiv2(f: oderk1n; a: ArbFloat; var ya, b, yb: ArbFloat; n: ArbInt; ae: ArbFloat; var term: ArbInt);
Parameters
f
: The procedure calculating \(f_i(x, y_i)\) values. It is of typeoderk1n = procedure(x: ArbFloat; var y, fxy: ArbFloat)
.a
: Starting point of the calculation.ya
: Initial values for the system, stored in an array.b
: Endpoint of the interval. Ifterm = 2
,b
will be updated.yb
: Stores the results.n
: Number of equations in the system.ae
: Specifies absolute accuracy.term
: Error codes:1
: Successful completion.2
: Solution couldn't reachb
with desired accuracy.yb
contains approximations.3
: Input error,n < 1
orae <= 0
.
Example
Integrate the following system of ODEs between x = 0
and x = 1
:
The exact solutions are:
Output
x y[1] y[2] exact[1] exact[2] error code
0.00000 0.00000 1.00000 0.00000 1.00000 -
0.10000 0.10084 1.00500 0.10084 1.00500 1
0.20000 0.20678 1.02006 0.20678 1.02006 1
0.30000 0.32335 1.04530 0.32335 1.04530 1
0.40000 0.45699 1.08088 0.45699 1.08088 1
0.50000 0.61559 1.12684 0.61559 1.12684 1
0.60000 0.80932 1.18298 0.80932 1.18298 1
0.70000 1.05157 1.24846 1.05157 1.24846 1
0.80000 1.36045 1.32129 1.36045 1.32129 1
0.90000 1.76085 1.39732 1.76085 1.39732 1
1.00000 2.28736 1.46869 2.28736 1.46869 1
Press enter to quit
Unit ipf
- Interpolation and fitting¶
The unit ipf
contains routines for:
- Fitting a set of data points with a polynomial.
- Interpolating or fitting a set of data points with a natural cubic spline. A spline is a piecewise function of polynomials with a high degree of smoothness at the connection points (called "knots"). In the case of a natural cubic spline, the polynomials are of degree 3, and their second derivatives are zero at the first and last knots.
Definitions
- Fitting: Finding an approximating smooth function such that deviations from the data points are minimized.
- Interpolation: Determining a function that passes exactly through the data points.
Use of Polynomials or Splines
Polynomials or splines are recommended unless the data follows a known function but with some unknown parameters. For example, if the data is modeled by the function \( f(x) = a + b e^{-c x} \), fitting to that form may be better.
If the data contains measurement errors, fitting with a low-degree polynomial (e.g., degree ≤ 5) is usually a good first approach. If the data’s shape is complex, consider fitting with a spline.
If the data has no measurement noise, it’s often best to interpolate using a spline.
Fitting with a Polynomial¶
You can fit a polynomial to your data using the ipfpol
routine. This routine takes a set of data points \((x_i, y_i)\) and calculates a polynomial of the form:
The coefficients \( b_0, \dots, b_n \) are adjusted to minimize the sum of squared deviations from the data points, using the least squares fitting method:
Parameters
m
: Number of data points (must be \( m \geq 1 \)).n
: Degree of the polynomial (must be \( n > 0 \)).x
: Array containing the \( x \)-values of the data points.y
: Array containing the \( y \)-values of the data points.b
: Array to store the calculated polynomial coefficients.term
: Error code.1
: Success.3
: Input error (e.g., \( n < 0 \) or \( m < 1 \)).
Once the coefficients are determined, you can calculate the value of \( p(x) \) using the spepol
procedure from the spe
unit.
Example: Fitting a Degree 2 Polynomial
Fit a polynomial of degree 2 to these data points:
Output
Fitting a polynomial of degree 2: p(x) = b[0] + b[1] x + b[2] x^2
Data points
i x_i y_i
1 0.00 1.26
2 0.08 1.37
3 0.22 1.72
4 0.33 2.08
5 0.46 2.31
6 0.52 2.64
7 0.67 3.12
8 0.81 3.48
Fitted coefficients b = 1.222195 2.224385 0.759178
Interpolated values:
0.20 1.70
0.40 2.23
0.60 2.83
0.80 3.49
1.00 4.21
Press enter to quit
Interpolation with a Natural Cubic Spline¶
To interpolate data with a cubic spline, use the ipfisn
routine. This calculates the spline’s second derivatives. Once these are known, you can calculate the spline value at any point using the ipfspn
procedure.
Parameters
n
: Number of data points minus one.x
,y
: Arrays containing the data points' \( x \)- and \( y \)-values.d2s
: Array to store the second derivatives at each point (except at the endpoints, where it's set to 0).term
: Error code.1
: Success.2
: Error in calculating second derivatives.3
: Invalid input parameters (e.g., \( n \leq 1 \)).
Once the second derivatives are determined, you can use ipfspn
to evaluate the spline at any point.
Example: Natural Cubic Spline Interpolation
Interpolate a natural cubic spline through the following data points:
Output
Cubic spline interpolation
Data points
i x_i y_i
0 0.00 0.990
1 0.09 0.927
2 0.22 0.734
3 0.34 0.542
4 0.47 0.388
5 0.58 0.292
6 0.65 0.248
7 0.80 0.179
8 0.93 0.139
Second derivatives
-10.810021
0.374744
4.380190
1.978794
3.200753
1.357095
1.268361
Interpolated values
0.20 0.77
0.40 0.46
0.60 0.28
0.80 0.18
1.00 0.12
Press enter to quit
Unit spe
- Special functions¶
Coming soon.