Through a generalization of the binary algorithm on page 80 the number of modular multiplications for exponentiation can be reduced even further. The idea is to represent the exponent in a base greater than 2 and to replace multiplication by a in step 3 by multiplication by powers of a. Thus let the exponent e be given by e = (en−1 en−2...e0)M, to a base M yet to be determined. The following algorithm calculates the powers ae mod m.
M-ary algorithm for exponentiation ae mod m
Calculate and store a2 mod m, a3 mod m,..., aM−1 mod m as a table.
Set p ←
mod m and i ← n − 2.
Set p ← pM mod m.
If ei ≠ 0, set p ←
mod m.
Set i ← i − 1; if i ≥ 0, go to step 3.
Output p.
The number of necessary multiplications evidently depends on the number of digits of the exponent e and thus on the choice of M. Therefore, we would like to determine M such that the exponentiation in step 3 can be computed to the greatest extent possible by means of squaring, as in the example above for 216, and such that the number of multiplications by the precomputed powers of a be minimized to a justifiable cost of storage space for the table.
The first condition suggests that we choose M as a power of 2: M = 2k. In view of the second condition we consider the number of modular multiplications as a function of M:
We require
| (6.1) |
squares in step 3 and on average
modular multiplications in step 4, where
is the probability that a digit ei of e is nonzero. If we include the M − 2 multiplications for the computation of the table, then the M-ary algorithm requires on average
| (6.4) | ![]() |
modular squarings and multiplications.
For exponents e and moduli m of, say, 512 binary places and M = 2k we obtain the numbers of modular multiplications for the calculation of ae mod m as shown in Table 6.1. The table shows as well the memory requirement for the precomputed powers of a mod m, which result from the product (2k − 2)CLINTMAXSHORT · sizeof(USHORT).
|
k |
Multiplications |
Memory (in bytes) |
|---|---|---|
|
1 |
766 |
0 |
|
2 |
704 |
1028 |
|
3 |
666 |
3084 |
|
4 |
644 |
7196 |
|
5 |
640 |
15420 |
|
6 |
656 |
31868 |
We see from the table that the average number of multiplications reaches a minimum of 640 at k = 5, where the required memory for each larger k grows by approximately a factor of 2. But what are the time requirements for other orders of magnitude of the exponents?
Table 6.2 gives information about this. It displays the requirements for modular multiplication for exponentiation with exponents with various numbers of binary digits and for various values of M = 2k. The exponent length of 768 digits was included because it is a frequently used key length for the RSA cryptosystem (see Chapter 16). The favorable numbers of multiplications appear in boldface type.
|
Number of binary digits in the exponent |
||||||||
|---|---|---|---|---|---|---|---|---|
|
k |
32 |
64 |
128 |
512 |
768 |
1024 |
2048 |
4096 |
|
1 |
45 |
93 |
190 |
766 |
1150 |
1534 |
3070 |
6142 |
|
2 |
44 |
88 |
176 |
704 |
1056 |
1408 |
2816 |
5632 |
|
3 |
46 |
87 |
170 |
666 |
996 |
1327 |
2650 |
5295 |
|
4 |
52 |
91 |
170 |
644 |
960 |
1276 |
2540 |
5068 |
|
5 |
67 |
105 |
181 |
640 |
945 |
1251 |
2473 |
4918 |
|
6 |
98 |
135 |
209 |
656 |
954 |
1252 |
2444 |
4828 |
|
7 |
161 |
197 |
271 |
709 |
1001 |
1294 |
2463 |
4801 |
|
8 |
288 |
324 |
396 |
828 |
1116 |
1404 |
2555 |
4858 |
In consideration of the ranges of numbers for which the FLINT/C package was developed, it appears that with k = 5 we have found the universal base M = 2k, with which, however, there is a rather high memory requirement of 15 kilobytes for the powers a2, a3,..., a31 to base a that are to be precomputed. The M-ary algorithm can be improved, however, according to [Cohe], Section 1.2, to the extent that we can employ not M − 2, but only M/2, premultiplications and thus require only half the memory. The task now additionally consists in the calculation of the power ae mod m, where e = (en−1en−2...e0)M is the representation of the exponent to the base M = 2k.
M-ary Algorithm for exponentiation with reduced number of premultiplications
Compute and store a3 mod m, a5 mod m, a7 mod m,...,
mod m.
If en−1 = 0, set p ← 1.
If en−1 ≠ 0, factor en−1 = 2tu with odd u. Set p ← au mod m.
In each case set i ← n − 2.
If ei = 0, set p ←
mod m by calculating
mod m (k-fold squaring modulo m).
If ei ≠ 0, factor ei = 2tu with odd u; set p ←
mod m and then p ← pau mod m; now set p ←
mod m.
Set i ← i − 1; if i ≥ 0, go to step 3.
Output p.
The trick of this algorithm consists in dividing up the squarings required in step 3 in a clever way, such that the exponentiation of a is taken care of together with the even part 2t of ei. Within the squaring process the exponentiation of a by the odd part u of ei remains. The balance between multiplication and squaring is shifted to the more favorable squaring, and only the powers of a with odd exponent need to be precomputed and stored.
For this splitting one requires the uniquely determined representation ei = 2tu, u odd, of the exponent digit ei. For rapid access to t and u a table is used, which, for example, for k = 5 is displayed in Table 6.3.
|
ei |
t |
u |
ei |
t |
u |
ei |
t |
u |
|---|---|---|---|---|---|---|---|---|
|
0 |
0 |
0 |
11 |
0 |
11 |
22 |
1 |
11 |
|
1 |
0 |
1 |
12 |
2 |
3 |
23 |
0 |
23 |
|
2 |
1 |
1 |
13 |
0 |
13 |
24 |
3 |
3 |
|
3 |
0 |
3 |
14 |
1 |
7 |
25 |
0 |
25 |
|
4 |
2 |
1 |
15 |
0 |
15 |
26 |
1 |
13 |
|
5 |
0 |
5 |
16 |
4 |
1 |
27 |
0 |
27 |
|
6 |
1 |
3 |
17 |
0 |
17 |
28 |
2 |
7 |
|
7 |
0 |
7 |
18 |
1 |
9 |
29 |
0 |
29 |
|
8 |
3 |
1 |
19 |
0 |
19 |
30 |
1 |
15 |
|
9 |
0 |
9 |
20 |
2 |
5 |
31 |
0 |
31 |
|
10 |
1 |
5 |
21 |
0 |
21 |
To calculate these values we can use the auxiliary function twofact_l(), which will be introduced in Section 10.4.1. Before we can program the improved M-ary algorithm there remains one problem to be solved: How, beginning with the binary representation of the exponent or the representation to base B = 216, do we efficiently obtain its representation to base M = 2k for a variable k > 0? It will be of use here to do a bit of juggling with the various indices, and we can "mask out" the required digits ei to base M from the representation of e to base B. For this we set the following: Let (εr−1εr−2...ε0)2 be the representation of the exponent e to base 2 (we need this on account of the number r of binary digits). Let (eu−1eu−2...e0)B be the representation of e as a CLINT type to base B = 216, and let
be the representation of e to the base M = 2k, k ≤ 16 (M should not be greater than our base B). The representation of e in memory as a CLINT object e_l corresponds to the sequence [u + 1],[e0],[e1],...,[eu−1],[0] of USHORT values e_l[i] for i = 0,..., u + 1; one should note that we have added a leading zero.
Let f :=
, and for i = 0,..., f let si :=
and di := ki mod 16. With these settings the following statements hold:
There are f + 1 digits in
; that is, n − 1 = f.
contains the least-significant bit of the digit
.
di specifies the position of the least-significant bit of
in
(counting of positions begins with 0). If i < f and di > 16 − k, then not all the binary digits of
are in
; the remaining (higher-valued) bits of
are in
+1. The desired digit
thus corresponds to the k least-significant binary digits of
As a result we have for i Î {0,..., f} the following expression for determining
:
Since for the sake of simplicity we set e_l [sf + 2] ← 0, this expression holds as well for i = f.
We have thus found an efficient method for accessing the digits of the exponent in its CLINT representation, which arise from its representation in a power-of-two base 2k with k ≤ 16, whereby we are saved an explicit transformation of the exponent. The number of necessary multiplications and squarings for the exponentiation is now
where in comparison to μ1(k) (see page 86) the expenditure for the precomputations has been reduced by half. The table for determining the favorable values of k (Table 6.4) now has a somewhat different appearance.
|
Number of binary digits in the exponent |
||||||||
|---|---|---|---|---|---|---|---|---|
|
k |
32 |
64 |
128 |
512 |
768 |
1024 |
2048 |
4096 |
|
1 |
47 |
95 |
191 |
767 |
1151 |
1535 |
3071 |
6143 |
|
2 |
44 |
88 |
176 |
704 |
1056 |
1408 |
2816 |
5632 |
|
3 |
44 |
85 |
168 |
664 |
994 |
1325 |
2648 |
5293 |
|
4 |
46 |
85 |
164 |
638 |
954 |
1270 |
2534 |
5066 |
|
5 |
53 |
91 |
167 |
626 |
931 |
1237 |
2459 |
4904 |
|
6 |
68 |
105 |
179 |
626 |
924 |
1222 |
2414 |
4798 |
|
7 |
99 |
135 |
209 |
647 |
939 |
1232 |
2401 |
4739 |
|
8 |
162 |
198 |
270 |
702 |
990 |
1278 |
2429 |
4732 |
Starting with 768 binary digits of the exponent, the favorable values of k are larger by 1 than those given in the table for the previous version of the exponentiation algorithm, while the number of required modular multiplications has easily been reduced. It is to be expected that this procedure is on the whole more favorable than the variant considered previously. Nothing now stands in the way of an implementation of the algorithm.
To demonstrate the implementation of these principles we select an adaptive procedure that uses the appropriate optimal value for k. To accomplish this we rely again on [Cohe] and look for, as is specified there, the smallest integer value k that satisfies the inequality
| (6.7) | ![]() |
which comes from the formula μ2(k) given previously for the number of necessary multiplications based on the condition μ2(k + 1) − μ2(k) ≥ 0. The constant number of modular squarings ⌊log2 e⌋ for all algorithms for exponentiation introduced thus far is eliminated; here only the "real" modular multiplications, that is, those that are not squarings, are considered.
The implementation of exponentiation with variable k requires a large amount of main memory for storing the precomputed powers of a; for k = 8 we require about 64 Kbyte for 127 CLINT variables (this is arrived at via (27 − 1) * sizeof(USHORT) * CLINTMAXSHORT), where two additional automatic CLINT fields were not counted. For applications with processors or memory models with segmented 16-bit architecture this already has reached the limit of what is possible (see in this regard, for example, [Dunc], Chapter 12, or [Petz], Chapter 7).
Depending on the system platform there are thus various strategies appropriate for making memory available. While the necessary memory for the function mexp5_l() is taken from the stack (as automatic CLINT variables), with each call of the following function mexpk_l() memory is allocated from the heap. To save the expenditure associated with this, one may imagine a variant in which the maximum needed memory is reserved during a one-time initialization and is released only at the end of the entire program. In each case it is possible to fit memory management to the concrete requirements and to orient oneself to this in the commentaries on the following code.
One further note for applications: It is recommended always to check whether it suffices to employ the algorithm with the base M = 25. The savings in time that comes with larger values of k is relatively not so large in comparison to the total calculation time so as to justify in all cases the greater demand on memory and the thereby requisite memory management. Typical calculation times for various exponentiation algorithms, on the basis of which one can decide whether to use them, are given in Appendix D.
The algorithm, implemented with M = 25, is contained in the FLINT/C package as the function mexp5_l(). With the macro EXP_L() contained in flint.h one can set the exponentiation function to be used: mexp5_l() or the following function mexpk_l() with variable k.
static int twotab[] =
{0,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,5, ...};
static USHORT oddtab[]=
{0,1,1,3,1,5,3,7,1,9,5,11,3,13,7,15,1,17,9,19,5,21,11,23,3,25,13, ...};
int
mexpk_l (CLINT bas_l, CLINT exp_l, CLINT p_l, CLINT m_l)
{
CLINT a_l, a2_l; clint e_l[CLINTMAXSHORT + 1]; CLINTD acc_l; clint **aptr_l, *ptr_l; int noofdigits, s, t, i; unsigned int k, lge, bit, digit, fk, word, pow2k, k_mask;
if (EQZ_L (m_l))
{
return E_CLINT_DBZ;
}
if (EQONE_L (m_l))
{
SETZERO_L (p_l); /* modulus = 1 ==> residue = 0 */
return E_CLINT_OK;
}
cpy_l (a_l, bas_l); cpy_l (e_l, exp_l);
if (EQZ_L (e_l))
{
SETONE_L (p_l);
return E_CLINT_OK;
}
if (EQZ_L (a_l))
{
SETZERO_L (p_l);
return E_CLINT_OK;
}
lge = ld_l (e_l);
k = 8;
while (k > 1 && ((k - 1) * (k << ((k - 1) << 1))/((1 << k ) - k - 1)) >= lge - 1)
{
--k;
}
pow2k = 1U << k;
k_mask = pow2k - 1U;
if ((aptr_l = (clint **) malloc (sizeof(clint *) * pow2k)) == NULL)
{
return E_CLINT_MAL;
}
mod_l (a_l, m_l, a_l);
aptr_l[1] = a_l;
if (k > 1)
{
if ((ptr_l = (clint *) malloc (sizeof(CLINT) * ((pow2k >> 1) - 1))) == NULL)
{
return E_CLINT_MAL;
}
aptr_l[2] = a2_l;
for (aptr_l[3] = ptr_l, i = 5; i < (int)pow2k; i+=2)
{
aptr_l[i] = aptr_l[i - 2] + CLINTMAXSHORT;
}
msqr_l (a_l, aptr_l[2], m_l);
for (i = 3; i < (int)pow2k; i += 2)
{
mmul_l (aptr_l[2], aptr_l[i - 2], aptr_l[i], m_l);
}
}
*(MSDPTR_L (e_l) + 1) = 0;
noofdigits = (lge - 1)/k; fk = noofdigits * k;
word = fk >> LDBITPERDGT; /* fk div 16 */ bit = fk & (BITPERDGT-1U); /* fk mod 16 */
switch (k)
{
case 1:
case 2:
case 4:
case 8:
digit = ((ULONG)(e_l[word + 1] ) >> bit) & k_mask;
break;
default:
digit = ((ULONG)(e_l[word + 1] | ((ULONG)e_l[word + 2]
<< BITPERDGT)) >> bit) & k_mask;
}
if (digit != 0) /* k-digit > 0 */
{
cpy_l (acc_l, aptr_l[oddtab[digit]]);
t = twotab[digit];
for (; t > 0; t--)
{
msqr_l (acc_l, acc_l, m_l);
}
}
else /* k-digit == 0 */
{
SETONE_L (acc_l);
}
for (--noofdigits, fk -= k; noofdigits >= 0; noofdigits--, fk -= k)
{
word = fk >> LDBITPERDGT; /* fk div 16 */
bit = fk & (BITPERDGT - 1U); /* fk mod 16 */
switch (k)
{
case 1:
case 2:
case 4:
case 8:
digit = ((ULONG)(e_l[word + 1] ) >> bit) & k_mask;
break;
default:
digit = ((ULONG)(e_l[word + 1] | ((ULONG)e_l[word + 2]
<< BITPERDGT)) >> bit) & k_mask;
}
if (digit != 0) /* k-digit > 0 */
{
t = twotab[digit];
for (s = k - t; s > 0; s--)
{
msqr_l (acc_l, acc_l, m_l);
}
mmul_l (acc_l, aptr_l[oddtab[digit]], acc_l, m_l);
for (; t > 0; t--)
{
msqr_l (acc_l, acc_l, m_l);
}
}
else /* k-digit == 0 */
{
for (s = k; s > 0; s--)
{
msqr_l (acc_l, acc_l, m_l);
}
}
}
cpy_l (p_l, acc_l);
The various processes of M-ary exponentiation can be clarified with the help of a numerical example. To this end let us examine the calculation of the power 1234667 mod 18577, which will be carried out by the function mexpk_l() in the following steps:
Precomputations
The representation of the exponent e = 667 can be expressed to the base 2k with k = 2 (cf. the algorithm for M-ary exponentiation on page 86), whereby the exponent e has the representation e =
.
The power a3 mod 18577 has the value 17354. Further powers of a do not arise in the precomputation because of the small size of the exponent.
Exponentiation loop
|
exponent digit ei = 2tu |
21 · 1 |
21 · 1 |
20 · 1 |
21 · 1 |
20 · 3 |
|
p ← p2 mod n |
- |
14132 |
13261 |
17616 |
13599 |
|
p ← |
- |
- |
4239 |
- |
17343 |
|
p ← pau mod n |
1234 |
13662 |
10789 |
3054 |
4445 |
|
p ← p2 mod n |
18019 |
7125 |
- |
1262 |
- |
Result
p = 1234667 mod 18577 = 4445.
As an extension to the general case we shall introduce a special version of exponentiation with a power of two 2k as exponent. From the above considerations we know that this function can be implemented very easily by means of k-fold exponentiation. The exponent 2k will be specified by k.
int
mexp2_l (CLINT a_l, USHORT k, CLINT p_l, CLINT m_l)
{
CLINT tmp_l;
if (EQZ_L (m_l))
{
return E_CLINT_DBZ;
}
if (k > 0)
{
cpy_l (tmp_l, a_l);
while (k-- > 0)
{
msqr_l (tmp_l, tmp_l, m_l);
}
cpy_l (p_l, tmp_l);
}
else
{
mod_l (a_l, m_l, p_l);
}
return E_CLINT_OK;
}
|
Team-Fly |