Users >> Pisto-UniversalCodes
Pisto-Universal Codes

The C++ codes listed here are protected by GPL v3. The algorithms behind these codes should remain free of any patent.

Universal Codes to Compress Data

Please, take a look at this topic on Wikipedia.org, and look at the graphs on the right of that topic's page.

Once it is done, the current page on 3niti.org will present you three different universal codes adapted for balanced ternary integers: Fibonacci, Hybrid Fibonacci/Elias (my own creation) and Levenstein.

These universal codes could be used to compress an array of integers that are generally very close to 0. In page Pisto-DataTypes, I have introduced the concept of encoding two integer components in a single integer. This could be useful for addresses (ID+Address) and for compacted/compressed float (cfloat: exponent+mantissa).

With the results shown in the current page, I am not able to suggest one preferable universal code for 90-trits integers. I will let the choice open.

Fibonacci Code with Trits - SignedFibo3

First of all, the name "Fibonacci" may not be appropriate for the coding method with trits, since the current method no longer uses numbers of the Fibonacci sequence. Maybe we should call it "Zeckendorf coding". In fact, the Zeckendorf's theorem is the basis of the Fibonacci coding. On the other hand, the sequence that is used in the current method (A028859) has been found by many other researchers (including myself). So, maybe it is simplest calling the current method "Fibonacci" or SignedFibo3.

The main characteristic of this coding method is that we have a criterion that tells us where the code stops. In the original method for bits, the last two bits of an encoded integer are "11". So, we can decode the integer until we reach two consecutive bits "1". All bits except the last one are multiply by their corresponding Fibonacci number, and all products are added together. All this works because of the Zeckendorf's theorem. Fortunately, there is an equivalent for ternary integers.

For ternary integers, the criterion that ends the code is "PP". As in the original method for bits, the method only encodes positive integers. In order to encode negative numbers, and value 0, we must prepend a trit which tells us the sign.

Before going any further, let me introduce you the chosen sequence (A028859):

 1 3 8 22 60 164 ...

 f(0) = 1
 f(1) = 3
 f(n) = 2 * (f(n-1) + f(n-2))

Here is an example with integer 40:

Sequence1382260
TritsNPONP
Products-130-2260
Sum40
SignedFibo3 CodeP NPONPP

In the previous table, the first "P" stands for the sign (positive), and the last "P" is the stopping criterion. So, to encode 40, we need 7 trits. Integer -40 would have been encoded as "N NPONPP" instead.

Here is a table containing the maximum value for each number of trits (the sign counts in the number of trits):

Nb TritsSignFibo3 CodeValue
1O 0
3PPP1
4POPP3
5PPOPP9
6POPOPP25
7PPOPOPP69
8POPOPOPP189
9PPOPOPOPP517
10POPOPOPOPP1413
11PPOPOPOPOPP3861
12POPOPOPOPOPP10549
13PPOPOPOPOPOPP28821
14POPOPOPOPOPOPP78741
15PPOPOPOPOPOPOPP215125
16POPOPOPOPOPOPOPP587733
............
47P...POPOPOPOPOPP1.99679e+019
48P...OPOPOPOPOPOPP5.45533e+019
............
89P...POPOPOPOPOPOPP4.29399e+037

Please note that code "P NNNNN...NNNPP" is allowed, since the stopping criterion is only "PP". It would have been possible to use two stopping criteria (PP and NN) instead of using an additional trit for the sign, but this method would have generated longer codes. So, we are better keeping the sign apart, since it gives us the opportunity to encode value "0" with only one trit.

The way it should work for cfloat in CPU's multimedia registers is described in the following table:

cfloat with SignedFibo3-coded exponent
SignExponent (absolute value)Mantissa
O LST ................ MST
P or N(P?(O or N))*PPLST ................ MST
Min. 1 trit, max. 89 tritsMin. 1 trit, max. 89 trits
Min. 2 trits, max. 90 trits

In practice, we need at least 4 trits for the whole cfloat, because we need to support all the following cases: +Inf, -Inf, NaN, -1, 0, 1. If not, the 2-trits cfloat would only support -1, 0, 1 and NaN (because the encoded exponent would be corrupted if it is not O).

With at least 4 trits, the NaN case is defined by a O at the most significant trit of the mantissa, but the encoded floating point value is not 0 (which is all made of "O"s). For example: PPPO is NaN because the most significant trit of the mantissa is "O", but the encoded value is not 0. Finally, if the Fibo3 code is corrupted (the sign is P or N, but the whole cfloat contains no "PP"), then this is NaN.

The + or - Infinity is defined by a sign P or N and a valid Fibo3 code that leaves no trit for the mantissa. Then, the sign of the exponent becomes the sign of Infinitiy. For example with 4 trits: NOPP is -Inf, and POPP is +Inf.

Hybrid Fibonacci/Elias Code with Trits - FiboElias3

There are three main Elias codes: Elias gamma code, Elias delta code and Elias omega code. All these methods have the following structure:

In the hybrid method I am presenting here, I have used the (unsigned) Fibo3 code (previous section) to encode the length of the integer. The sign is given by the integer itself, so all trits are preserved. Finally, everything is encoded in little endian.

Here is a table containing the maximum value for each number of trits:

Nb TritsLengthIntegerValue
3PPO0
3PPP1
5NPPPP4
6OPPPPP13
8NNPPPPPP40
9ONPPPPPPP121
10PNPPPPPPPP364
11NOPPPPPPPPP1093
12OOPPPPPPPPPP3280
13POPPPPPPPPPPP9841
15NNNPPPPPPPPPPPP29524
16ONNPPPPPPPPPPPPP88573
17PNNPPPPPPPPPPPPPP265720
18NONPPPPPPPPPPPPPPP797161
............
47OPONPP...PPPPPPPPPPP1.82365e+019
48NNPNPP...PPPPPPPPPPPP5.47095e+019
............
89OOONNPP...PPPPPPPPPPPPP6.6514e+038

The way it should work for cfloat in CPU's multimedia registers is described in the following table:

cfloat with FiboElias3-coded exponent
LengthSigned ExponentMantissa
(P?(O or N))*PPLST ......... MSTLST ................ MST
Min. 3 trits, max. 89 tritsMin. 1 trit, max. 87 trits
Min. 4 trits, max. 90 trits

Here are the special cases for the smallest cfloat with FiboElias3-coded exponent:

-1: PP O N0: PP O O1: PP O P
-Inf: PP P NInf: PP P P
NaN: PP (P|N) O or corrupted FiboElias3 code

Levenstein Code with Trits - SignedLeven3

As the Elias Omega code, the Levenstein method is a technique that recursively encodes the length of the last prepended integer in a code. In other words, you write down the integer you want to encode. Then, you write down the number of bits or trits that have just been written down, minus 1. And so on, until you obtain a one or two bits or trits integer. Finally, you use 0 or O to finalize the code.

In the case of the Levenstein code, the value 0 is encoded as 0. Furthermore, all most significant bits or trits of all written integers are placed together at the beginning of the code. This let us use the little endian encoding. Here is an example with bits in little endian:

 10 in base 10 is "1010" in base 2 or "0101" in little endian:

 0101 -> 4 bits, encode 4-1=3
 11 -> 2 bits, encode 2-1=1
 1 -> 1 bit, encode 1-1=0
 0 -> end of process

 0 1 11 0101 -> Take all most significant bits

 0 1  1    1 -> reverse the order -> 1110
 _ _ 1_ 010_

 Finale code: 1110 1 010

What happens when using trits instead of bits? The original integer is written as it is. Then, the number of written trits, minus 1, is the next number to write down. But, before we write it down, we may discard the most significant trit P only if the second most significant trit is a N. Finally, when we reach the trit O (end of process), all remaining most significant trits are placed together in the header of the code. Here is an example:

 365 in base 10 is NNNNNNP in balanced ternary notation:

     Even if the original integer satisfies the (N|O|P)*NP pattern, keep all its trits.
     This is mandatory since we want to encode (and being able to decode) signed integers.
 NNNNNNP -> 7 trits, encode 7-1=6
     ONP -> Pattern (N|O|P)*NP is present, so get rid of the last P
 ON -> 2 trits, encode 2-1=1
     P -> Pattern (N|O|P)*NP is absent
 P -> 1 trit, encode 1-1=0
     O -> Pattern (N|O|P)*NP is absent
 O -> End of process

 O P ON NNNNNNP -> Take all most significant trits

 O P  N       P -> reverse the order -> PNPO (the header)
 _ _ O_ NNNNNN_ -> The body

 Finale code: PNPO O NNNNNN

Here is a table containing the maximum value for each number of trits:

Nb TritsHeaderBodyValue
1O 0
2PO 1
4PPOP4
5PNOPP13
8PPPOO PPP40
9PPPOP PPPP121
10PNPON PPPPP364
11PNPOO PPPPPP1093
12PNPOP PPPPPPP3280
14PPNONO PPPPPPPP9841
15PPNOOO PPPPPPPPP29524
16PPNOPO PPPPPPPPPP88573
17PPNONP PPPPPPPPPPP265720
18PPNOOP PPPPPPPPPPPP797161
............
89PPPPOP PNOO PPP...PPP7.39044e+037

We may observe that the first trit of the header is the sign of the encoded integer.

The way it should work for cfloat in CPU's multimedia registers is described in the following table:

cfloat with SignedLeven3-coded exponent
HeaderBodyMantissa
(P or O or N)*O...LST ................ MST
Min. 1 trits, max. 89 tritsMin. 1 trit, max. 89 trits
Min. 2 trits, max. 90 trits

To support -Inf, -1, 0, 1, +Inf and NaN, we need at least 3 trits for the whole cfloat:

-1: O ON0: O OO1: O OP
-Inf: PO NInf: PO P
NaN: (P|N)O O or corrupted SignedLeven3 code

Comparison

The three previous coding methods for integer numbers are now compared together with two other types of codes which are not universal: PureInt3, which is the integer written normally, and FixedInt3Length, which is like a Elias code (length+integer) with the length of the integer that is always encoded with 5 trits. I have chosen 5 trits because, in our application, the maximum number of trits is less than 121 (PPPPP).

The comparison has been done with these C++ files.

The next graph shows, for each coding method, the number of trits needed for each value greater than 0 (because of the semi-log graph).

The next graph stops at 32 trits, so it looks like a zoom on smaller values of the previous graph:

So! Which coding method is the best?

While none of the previous method is optimal for all cases, it is practically possible to choose the best of them according to the length of a cfloat:

Conclusion

While all this theory is very interesting (my personnal opinion), I did not talk about how it should be implemented in processors. In fact, which method is the easiest and the fastest one to compute? Another, more important, question: is all this really useful for scientists and programmers? I will leave these questions open.

Page last modified on November 15, 2009, at 12:45 PM

  © 2015 TERNARY RESEARCH CORPORATION All rights reserved. Users' works are copyrighted by their respective authors.