Home Page
1. The two programs
2. Fundamental features of the two programs
3. Computation of [X] ( = arccot(X) )
4. Precision
5. The "common factors" phenomenon
6. Outline of implementation
7. Some practical time-saving features incorporated

PROGRAMS USED FOR EVALUATING PI

1. The two programs

Two complementary programs, here referred to as "ArcCot" and "CompPI", were used. Both were written in 'C' by M.R.Wetherfield, for use in a MS-DOS environment on PCs; they incidentally make use of some of the MASM subroutines which form part of our Machination software for finding new identities. A "self-checking" pair of identities has to be used for any evaluation; the choice of pair is up to the user, subject to the limitations on cotangent values mentioned in the next paragraph.

ArcCot requires two parameters, the first being an integer or half-integer cotangent value; as in all our identities, integers and the (odd) numerators of half-integers may take any value up to, but not including, 2^63. The second parameter is the "target precision", i.e. number of decimal places - permitted target precisions range from one thousand up to five million. The program computes the inverse cotangent, in radians, in the form of a binary file, created by the program, long enough to achieve the required precision - the file is assigned a name which associates it uniquely with the two parameters (and also with whichever series, Gregory's or Euler's, is used in the evaluation - however the user may in practice ignore this aspect simply by renaming the file).

CompPI requires a matching "target precision" parameter and also the name of a text file containing details, in a defined format (see section 6.2, below), of the two self-checking identities to be used (i.e. the cotangent values involved and their coefficients in the two identities). For each inverse cotangent term in either identity, this program will look for a pre-existing file matching the required cotangent value and precision; if it does not find one it will generate one itself, using exactly the same procedures as the "ArcCot" program. It then evaluates the two resultant approximations to PI. These are simultaneously converted to decimal and output, five digits at a time, continuing until the first discrepancy between the two results is detected.

Apart from its built-in accuracy, the advantage of this system lies in the opportunity to use the "ArcCot" program to compute all the required inverse cotangents, not only in advance but (provided that sufficient PCs are available) in parallel.

The first digit of any permitted target precision has to be 1, 2 or 5; the following digits must all be zeroes - at least three, up to a maximum of six. This artificial limit of 5,000,000 decimal digits has been set, partly because it is not known how long, in practice, MS-DOS permits the binary files on which the computation depends to grow, and partly because testing the programs to greater lengths would have taken too much time. The only real limitations are that the ‘C’ low-level input-output procedures restrict the sizes of binary files to 2^31 bytes (equivalent to more than twice this number of decimal digits), and also that the "iteration counter", which counts the number of terms in whichever series is used to approximate to the inverse cotangent (and is therefore used, in some cases multiplied by 8, in the computation of each term), must be less than 2^29.

The compiled programs should be completely portable between PCs supporting the Intel 80386 instruction set. Filenames are restricted to 8 characters to allow for running under Windows 98, etc.

The source coding does assume that the ‘C’ "char", "int", and "long int" storage units are respectively 8, 16, and 32 bits long, and moreover relies on the Intel "byte-swapping" feature whereby the addresses of 16-bit, 32-bit and 64-bit integers refer to their least-significant bytes; specifically, the "casting" of 32-bit and 64-bit integer types (so that they can be referred to as byte-strings, for transferring to and from binary files) is assumed not to involve any change in internal format. For portability to any other type of computer, quite significant changes to the source code of the programs would almost certainly be required, unless the processor and its ‘C’ compiler supported 8-bit characters, 32-bit integer arithmetic, and byte-swapping.

Top of page

2. Fundamental features of the two programs

Two key features to be incorporated in the implementation were considered fundamental from the outset of the project.

(A) The "ArcCot" program, which computes the inverse of the given cotangent value to the specified precision, would generate the sum of the requisite number of terms of either Euler’s or Gregory’s series in the form of exact "Numerator" and "Denominator" values, and would then (after removal of common factors) divide the Numerator by the Denominator and store the resultant quotient in a file, for subsequent use by the "ComputePI" program. The number of terms summed would depend on the target precision.

This approach, avoiding the need for multiple division operations, derives from one programmed by M.R.Wetherfield in 1957 to perform an undocumented evaluation of PI to 150 decimal places, using an English Electric DEUCE computer. However that implementation, which relied on Gregory’s series, only halved the number of divisions needed. The present implementation was initially based on Euler’s series; it was then found that it could be adapted without difficulty to use Gregory’s series, so both versions are now incorporated in the source code. Each program therefore exists in two executable versions, derived from one source coding; the choice (of which series the object code will exploit) is made using conditional compilation.

(As mentioned below in Section 3, practical measurements showed that the program's performance when using Gregory's series was slightly inferior, suggesting that a fresh design, rather than an adaptation of existing code, might have been preferable).

(B) The Numerator and Denominator values, their Quotient, and any "outsize" numbers used in the process of generating them, would be stored as unsigned (i.e. positive) binary numbers in files created and manipulated using the ‘C’ low-level file procedures: ‘open’, ‘read’, ’write’, ‘lseek’, ‘filelength’, ‘chsize’, etc. In practice the number of such files which a program can have "open" at any time appears to be limited to 16.

These files would all be a multiple of 4 bytes in length, the least significant end of the number being at the start of its file. The numbers could be either pure integers, or rational values with an integer part and a fractional part; in the latter case, the last four bytes of the file would hold the integer part, and the length of the fractional part would be predetermined by the target decimal precision (see section 4, below). "Pure integer" files would be allowed to grow, or shrink, naturally, in units of 4 bytes, as various operations were performed on the stored integers.

It had originally been thought that some more "secure" method of filing inverse cotangent values might be needed, as it was quite possible that they would have to be transported between computers via external media, telephone lines, etc.; but this idea was rejected, reliance being placed in the use of mutually-checking identities to show up any adventitious errors.

Top of page

3. Computation of [X] ( = arccot(X) )

Two infinite series were considered, Gregory’s (1671) and Euler’s (1755).

As mentioned in 2(A) above, the initial decision was to implement Euler’s series. This was to be performed by an iterative process, starting with initial versions of the "Numerator" and "Denominator" values (defined in section 2, above) called N0 and D0, each held in a binary integer file (referred to respectively as the N and D files). The iteration count, i, would start at 1, at each stage generating revised values Ni and Di. It was found not difficult to adapt this process to be applicable to Gregory's series as well (however, as already mentioned, a different approach might have proved beneficial).

The iterative process, for either series, utilises additional binary files A (for "Addend"), I (for "Increment") and M (for "Multiplier"). The use of these files in computing Euler’s series may be illustrated by describing their initialisation and the subsequent iterative process of updating Ni-1 and Di-1 to become Ni and Di (in the course of which the files A and M are also updated - the Increment value is invariant).

Cotangent:   C (even integer)   C (odd integer)   C/2 (half integer - C odd)
N0           C                  C                 2C 
D0           C²+1               C²+1              C²+4 
A0           C                  C                 2C 
M0           C²+1 (odd)         (C²+1)/2          C²+4 (odd)
I            2(C²+1)            C²+1 (even)       2(C²+4)

It will be observed that in each case the initial value in the A (Addend) file is the same as the initial Numerator, and that the (constant) Increment value is double the initial M (multiplier) value, the latter having the same initial value as the Denominator except when the cotangent is an odd integer; in this case the initial I and M values are halved, thereby ensuring that, although all subsequent Numerator and Denominator values will be even, when it comes to the removal of their common factors ‘2’ will only occur once.

To update Ni-1 and Di-1 (where i is the iteration number), the following steps are carried out:

  1. The Increment value is added to the Multiplier (since the former is initially twice the latter, the effect is to make the Multiplier successively 3, 5, 7, …, (2i + 1) times the Increment).
  2. The Numerator is multiplied by the updated Multiplier.
  3. The Addend value is multiplied by a factor which is a multiple of i (the iteration number); this factor is i if the cotangent value is odd, 2i if it is even, 8i if it is a "half-integer". The Addend is therefore a multiple of i! (factorial i).
  4. The updated Addend value is added to the revised Numerator (this completes the updating of the Numerator).
  5. The Denominator is multiplied by the updated Multiplier.

For Gregory’s series, N0 is 1 for integer cotangents and 2 for half-integers; A0 is the same as N0; D0 is C in both cases (no distinction is made between odd and even integer cotangents). I and M0 are respectively 2C² and C² (as with Euler's series, I is twice M0; but M0 does not have the same relationship to D0). The updating process is the same, except that the factor which multiplies the Addend (in step 3) is (2i - 1) for integer cotangents and (8i - 4) for half-integers; and (since Gregory’s is an alternating series) in step 4 the updated Addend is subtracted from, rather than added to, the Numerator when the iteration number ‘i’ is odd.

Note that this approach precludes the removal of common factors from N and D until the summation is complete.

Practical tests with this implementation indicate that the code to implement Euler's series seems to be more efficient than that for Gregory's; it may be that a fresh approach to the design of the latter might have been beneficial (however, there are no plans for this).

 

Top of page

4. Precision

The lengths in bytes of binary files (in particular those used to hold the Numerator, Denominator, Addend, and Quotient values referred to in the preceding two sections) play an important part in the estimation of precision. In order to achieve a target precision of 1000N decimal digits, the fractional part of the quotient needs to be accurate to approximately 415.25N bytes (since 1/Log10256 = 0.4152410). Therefore, for a specified target precision of 1000N digits, and to allow a "safety margin", the program allots 440N bytes for the fractional part (and an additional 4 bytes for the integer part) of those binary files which hold Quotients, the products of Quotients and coefficients (of the corresponding terms in the pair of identities), and also the partial sums of these products while the two values of PI are computed.

The obvious question which arises is: how should the program determine, at run time, how many terms of a series are needed to ensure this level of accuracy for the quotient? Having established the "44%" ratio for specifying the length of the fractional part of the files containing Quotients (etc.), it seemed sensible to settle on a "safe" value for the difference in lengths of the Addend file and the updated Numerator file (which is a rough measure of "the number of leading zero bytes which would be present in a fractional quotient produced by dividing the Addend by the Numerator", and hence helps to answer the question: "Will the addition of another term make any difference in practice to the ‘visible’ fractional part of the Numerator / Denominator Quotient?"). It has to be borne in mind that this number (of leading zero bytes) will inevitably be reduced proportionally by the diminution of the Numerator when common factors are removed from the Numerator and Denominator (in practice, the number and multiplicity of common prime factors proves to be rather startling), so 44% should be assumed "perhaps unsafe". Practical experience led to the conclusion that 46% can be considered a "safe" value; so, again assuming a target precision of 1000N decimal digits, the program stops adding further terms when the difference in lengths of the current Numerator and Addend files exceeds 460N bytes.

Top of page

5. The "common factors" phenomenon

After the completion of summation, common prime factors are removed from the Numerator and Denominator values before performing the actual division. Although this obviously makes no difference to the Quotient, it is desirable because the time thereby saved in the division process is normally greater than the extra time needed to remove the common factors. In practice consecutive (increasing) primes are removed (note that 2 cannot occur as a common factor to a higher power than 1), the process ceasing either when the next prime in sequence is not a factor, or after removing the prime 65521.

Knowing the cotangent value, and the number of terms included in the summed series (which is 1 greater than the number of iterations performed ), one might expect to be able to determine from the definition of each term which primes would be present in the Denominator. What in actuality seems rather remarkable is that it is feasible to predict with startling accuracy not only the value of the last of the common consecutive primes removed, but also the multiplicity of most of the removed primes. This is because, if the number of terms is T, and P (a prime greater than 2) is one of the consecutive common factors of the Numerator and Denominator, with multiplicity m - i.e. P^m (P to the power m) is a common factor - then the following empirical rule appears to apply:

      (P - 1)(m + ½) <= T

For example, in a case where the number of terms T in an Euler summation happened to be 6543, it was observed that the common primes present with multiplicity 1 ranged (in reverse order) from 4363 (the largest, and last, prime removed) to 2621; those with multiplicity 2, from 2617 to 1871; and so on. In this case, also, the first few primes removed occur with the multiplicities shown below:

  3^3264  5^1630  7^1087  11^651  13^542  17^406  19^360  23^295  29^232  31^216

Anomalies have been observed, e.g. a "rogue" prime in a range of consecutive primes with the same multiplicity unexpectedly occurring with a multiplicity 1 greater than its neighbours - so this "rule" has exceptions. It is also interesting in that it seems to apply quite independently of the actual cotangent value, and of whether Euler’s or Gregory’s series is used in the summation - though, of course, the cotangent value and the target precision together indirectly determine the number of terms in the series.

Top of page

6. Outline of implementation

6.1 ArcCot program

The initial task of this program is to read and check the two parameters - target precision, followed by cotangent value. These determine the names assigned to the Quotient file and the Control and back-up files needed by the restart mechanisms.

All the required files (including two work files used by various functions) are then opened, before a function ‘InvCot’ is called. This function assigns the initial values to the various files, as described in section 3 above, and then performs the "Series Summation" and "Common Factor Removal" phases.

Finally the ‘Division’ function (not part of ‘InvCot’) is called, to fill the Quotient file.

After exit from ‘Division’, the Quotient file is saved as ‘read-only’; all other files are deleted.

6.2 ComputePI program

The two parameters required by this program are the target precision (exactly as supplied to the ArcCot program) and the name of a text file whose contents specify the pair of mutually-checking identities used in the evaluation of PI. The first line of this file contains the coefficients of [1] on the LHS of the two identities; each subsequent line starts with an integer or half-integer cotangent value (enclosed in square brackets, e.g. "[23866411/2]"), followed by the two signed coefficients of the corresponding RHS term in the two identities (0 if it only occurs in the other identity). "White space" is used to separate items in each line, but must not occur between a coefficient and its sign (+ signs may be omitted). For example:

                   1   7
           [15]    0  83
          [107]   83   0
         [1710]   17 -47
       [103697]  -22  12
    [2513489/2]  -12  -1
[18280007883/2]  -22  12          

The two identities defined in this example are:
[1]= 83[107]+17[1710]-22[103697]-12[2513489/2]-22[18280007883/2]
7[1]= 83[15] -47[1710]+12[103697] -[2513489/2]+12[18280007883/2].

Files are opened as in the ArcCot program, with the addition of another four files to hold fractional quantities - initially, the (unsigned) subtotals of the positive and negative terms of both identities; subsequently, they are used to hold the two representations of PI from which successive decimal digits are extracted and also their back-up copies ("snapshots").

The program deals with each cotangent term specified in the text file in turn; if the corresponding quotient file (whose name depends on the cotangent and precision values, and on the series used) does not exist, ‘InvCot’ and ‘Division’ are called to generate it. Each inverse cotangent value is multiplied by the specified coefficients (treated as unsigned) and the two products are added to the appropriate subtotals, depending on the signs of the two coefficients. When all the cotangent terms have thus been incorporated, the "negative" subtotal for each identity is subtracted from the "positive" subtotal (the difference replacing the latter in its file), and the results multiplied by 4 and then divided by the appropriate coefficients of [1] specified in the first line of the text file. At this stage checks are made that the "integer" parts of both the resultant values are exactly 3 (if not, the program terminates after issuing recommendations to "check the specifications of the two identities" - this may also happen earlier if, for instance, the product of an inverse cotangent and its coefficient appears too large. The easiest way to ensure that this does not happen is to pre-run the program with the smallest allowed target precision value, 1000).

At this stage most of the files (apart from the subtotal files) opened for use by ‘InvCot’ will be deleted, except for the two work files which are still needed by the decimal conversion and comparison phase, and one other file. The two "negative subtotal" files, and the "one other", are now renamed and pressed into service for "snapshots" and "Resumption Control" (explained in section 6.3, below).

Decimal conversion and comparison and output of decimal digits follows, operating on the "positive subtotal" files which now contain the two approximations to PI. Output, five digits at a time, is to the stdout stream, so it is usual to specify, in the DOS command which calls the ComputePI program, the name of a file to which output will be "piped".

Output continues until two corresponding groups of five decimal digits fail to match; within this pair of groups, any leading digits which do match are also output (though the last matching digit may be considered "suspect").

On successful conclusion, all files are deleted, the back-up and "Resumption Control" files last of all.

6.3 Restart opportunities

Since run times for these programs may be measured in weeks, it is essential to include facilities for resumption (as opposed to restarting from scratch) in the event of one of those momentary "blips", or worse, in the power supply which tend to stop PCs in their tracks. This necessitates periodically recording "snapshots" of all necessary data in back-up files (especially the data in the binary files described in sections 3 and 6.1, above), and, in the case of the ArcCot program, recording additional data identifying the "phase" which the program is currently engaged in.

In the preliminary phase of the ArcCot program, once the Precision and Cotangent value parameters have been read (and therefore the unique file names associated with the program’s output can be determined), the program checks for the existence of a "Restart Control file" which might have been left over from a prematurely-terminated previous run of the program with the same parameters. If no such file exists (or if any of the associated back-up files appear in some way "incorrect") the program will have to start from scratch anyway, which will involve creating new back-up and Restart Control files; however if there is a valid Restart Control file and the necessary back-up files appear in order, the program branches to the interrupted phase, reloading the appropriate files from the back-up copies, and restarts that phase - either from the beginning of the phase, or (in the "Series Summation" phase) at the iteration following the last one at which a "snapshot" was successfully taken - the relevant iteration number (a multiple of 2000) will have been recorded in the Restart Control file as part of the phase identification information.

Assuming that all the necessary inverse cotangents have been computed in advance, there is only one really time-consuming phase in the ComputePI program, which starts when the two identities have been evaluated, and which involves the generation of the decimal representation. It therefore proved more convenient to write a separate "Resume ComputePI" program capable only of resuming this phase (in which "snapshots" take place after the generation of every 5000 digits). The "Resumption Control" and back-up files have names dependent only on the target precision value, which is the only parameter required by this program; the preliminary checking process is similar to that used in the ArcCot program, the penalty for failure being an invitation to re-run the whole ComputePI program.

As a "fail-safe" measure for both ArcCot and ComputePI programs, every time the back-up files are updated the Control file is first overwritten in such a way that, in the event of a mishap at this time, only a restart from scratch would be feasible; it is only updated itself when the snapshot has been completed.

Top of page

7. Some practical time-saving features incorporated

7.1 Avoidance of "factor 2 multiplicity"
When Euler’s series is used to evaluate the inverse cotangent of an odd value, the method of initialising and updating various files, described in section 3, ensures that the factor 2 only occurs with multiplicity 1 in the Denominator.

7.2 Speeding-up common prime factor removal
Primes up to 1621 are initially tested with the highest multiplicity (greater than 2) consistent with a 32-bit unsigned divisor - e.g. when testing whether 3 is a factor, initial tests will be made by dividing by 3^20.

7.3 Progressive truncation of Denominator during Division
The ‘Division’ operation, fully described below, follows the call of ‘InvCot’ (see sections 6.1 and 6.2).

The unsigned Denominator, whose length is a multiple of 4 bytes, is first shifted up (if necessary) until the most significant bit of its most significant 32 bits is a 1. The contents of these top 32 bits are referred to as the ‘divisor’. The Numerator, which is less (and possibly shorter) than the Denominator, is shifted up by the same amount, at the same time ensuring that its length remains a multiple of 4.

A ‘residual dividend’ is created in a work file by copying the shifted-up Numerator into it, with a suitable number (see below) of 4-byte zero slices inserted at its least significant end (i.e. at offset 0). If the most-significant non-zero 32-bit slice of the shifted-up Numerator is not less than the ‘divisor’, the length of the residual dividend is further extended with 32 zero bits at its most significant end - ensuring that the 32-bit ‘quotient slice’, obtained by dividing the number in the top 64 bits of the residual dividend by the 32-bit divisor, does not exceed 32 bits (the number of 4-byte zero slices inserted at the least significant end of the residual dividend is chosen to ensure that the total size of the latter, including the 4 zero bytes possibly added at its more significant end, is 4 bytes more than that of the shifted-up Denominator). The relative sizes of the residual dividend and the shifted-up Denominator determine the offset of the first 4-byte quotient slice to be inserted in the Quotient file.

Division proceeds by repeating the following sequence of steps until the fractional part of the Quotient file (whose length in bytes is preset to be 44% of the target precision) is "full":

  1. The number in the top 64 bits of the residual dividend is divided by the divisor to obtain a new 32-bit "quotient slice", and the product of this quotient slice and the truncated Denominator is computed and stored in a work file ("truncation" of the Denominator is explained in the Note below).
  2. This product is compared with the residual dividend (this will usually only involve comparing the top 4 bytes of each), and, if found to be larger than the latter, is adjusted by deducting the truncated Denominator, and (for consistency) subtracting 1 from the quotient slice. This is repeated until the product is less than (or equals) the residual dividend.
  3. The quotient slice is stored in the next 4 bytes (working down from the most significant end of the fractional part) of the Quotient file. When the offset of this slice is zero, the division process is complete.
  4. If the process is not complete, the final version of the product (unless zero) is subtracted from the residual dividend; the length of the latter (whose top 4 bytes will now be zero) is reduced by 4; and the "truncation" of the Denominator is increased by 4 bytes.

Note: "Truncation" of the Denominator means that a steadily-increasing number of 4-byte slices at its least significant end are excluded from participating in the multiplication, comparison, and subtraction operations specified in the first two steps above. Initially the complete (shifted-up) Denominator participates; as each 4-byte quotient slice is inserted in the Quotient file, the amount of truncation is increased by 4 bytes (in practice, the un-truncated Denominator is always longer than the fractional part of the Quotient file). "Truncation" significantly reduces the time taken by Division.

7.4 Generation of decimal digits (last phase of the ‘ComputePI’ program)
As a precaution against the possibility that the two binary values of PI are identical (such a case, resulting in the output of spurious extra digits, did actually occur in initial testing) a check is made, before the generation of decimal digits begins, that their least significant 32 bits differ; if not, a difference of 1 is introduced, which effectively rules out the possibility, without prejudicing precision.

Subsequently the fractional parts of the two files being repeatedly multiplied by 100000 (during the digit-by-digit generation of the decimal representation - see section 6.2) are also "truncated" in rather the same way as the Denominator during division (see previous section). A current "Truncation Index" (TI) is maintained, and a certain number of bytes (equal to the value of TI, rounded down to the nearest multiple of 4) at each fraction’s least significant end are excluded from participating in the multiplication. Initially TI is zero. After every multiplication by 100000, TI is increased by 2 bytes; after 200 such increments TI is further increased by 15 - thus, TI increases by 415 for every thousand decimal digits generated (the rationale for this value 415 is outlined in section 4). The effect is to save as much time as possible - in fact, the rate at which decimal digits are generated steadily increases - without affecting the point of detection of the first non-matching digit in the two fractions.

Top of page