© 2014 W.Ehrhardt Last update Aug. 25, 2014
Home CRC/Hash Crypto Misc. Info Links Deutsch

Cryptographic algorithms

On this page there are several cipher algorithms: the 128-bit block ciphers AES and Twofish, the 64-bit block cipher Blowfish, and the stream ciphers Salsa20/XSalsa20/ChaCha and Sosemanuk. Please note: In the section other block ciphers there are some more block ciphers with short descriptions and source code archives: AnubisCamelliaSEEDSerpentSHACAL-2SkipJackXTEA.

The basic/test/demo source codes can be compiled with most current Pascal (TP 5/5.5/6, BP 7, VP 2.1, FPC 1.0/2.0/2.2/2.4/2.6) and Delphi versions (tested with V1 up to V7/9/10/12/17).

Before downloading any software from this site please read this legal notice.


A E S

Here are the AES (Advanced Encryption Standard) related Pascal / Delphi sources: basic AES routines and recommended block cipher modes of operation (with test programs that verify compilation and results), demo programs for file encryption / authentication and decryption / verification, and a look at other implementations.


Last changes:  D17/64-bit adjustments

The block level routines in aes_2013-01-07.zip include separate units for encryption and decryption, the source code for basic encryption/decryption is split into several include files. At the lowest level there are type definitions and common routines. Key sizes of 128, 192, and 256 bits are supported.

Building on these routines, the following block cipher modes of operation are implemented: CBC, CFB128, CFB8, CTR, ECB, OFB as well as OMAC, CMAC, CCM, EAX, GCM, XTS. All modes allow plain and cipher text lengths that need not be multiples of the block length (for ECB, CBC, and XTS cipher text stealing is used for the short block; only one short block is allowed and there must be at least one full block). CTR mode can use 4 built-in incrementing functions or a user supplied one, and provides seek functions for random access reads.

All AES routines are included in the AES_DLL.DLL, there are two interface units for this DLL (one for Virtual Pascal, the second for the other Win32 compilers). The source code archive contains many test programs which are used to check compilation processes, the implementation against known test values, etc.

With the t_gsp128 test program the CPU cycles per block and the (theoretical) encryption speed in MB/s were measured for the different compilers (this program needs the high resolution timer routines from hrtimer). Here are the results for a Pentium 4, 1.8 GHz, Win98, 128 bit key size; the first data pairs are for full tables, the second for compressed tables (8 byte aligned for 32 bit compiler):

Compiler Cyc/Bl-F MB/s-F Cyc/Bl-C MB/s-C
TP5 6302 4.6 5356 5.4
TP55 6321 4.5 6980 4.1
TP6 1436 20.0 1762 16.3
BP7 1493 19.2 1927 14.9
Delphi2 365 78.6 398 72.1
Delphi3 373 76.9 398 72.1
Delphi4 386 74.3 398 72.1
Delphi5 375 76.5 398 72.1
Delphi6 380 75.5 398 72.1
Delphi7 380 75.5 398 72.1
Delphi9 381 76.3 397 72.3
Delphi10 380 75.0 398 72.1
VPC 426 67.3 425 67.3
FPC 2.2 -O3 416 69.0 417 68.8
FPC 1 GoV2 542 53.0 541 53.1

TP5 and 5.5 have pure 8086 Pascal code, TP6/BP7 use 16-bit 80386 BASM code in the core routines. The Delphi 2..7/9/10, VPC, and FPC sources are written in pure 32-bit Pascal code.



In the following table my routines (WE/.., program t_aes_ws) are compared with other open source packages. The source code to build the test programs for the third party packages is in aes_cmpsrc_2006-07-30.zip. The original sources are required in order to compile the test programs, the URLs are given on the Links page. The times are in s for encrypting 512 MB (8000000 blocks of 64 byte) with 128-bit keys using a 1.8 GHz P4 with Win98.

Package/Compiler CTR CFB OFB ECB CBC OMAC
LibTom117/VC6 13.0 16.3 16.4 12.9 13.9 15.6
LTC117/GCC4.2.1 10.1 14.5 10.6 9.0 9.3 14.2
dcpcrypt2/D6 28.8 32.7 28.6 - 32.7 -
DEC5.1/D6 - 13.9 10.9 10.2 11.8 -
StrSecII/D6 9.0 11.5 9.1 7.7 9.3 -
WE/D3 9.0 8.1 8.1 7.7 9.1 9.1
WE/D6 9.0 8.0 8.0 7.7 8.4 9.1
WE/FPC 2.2 -O3 9.9 9.1 9.1 9.0 11.2 9.0
WE/VPC 2.1 10.4 10.2 10.3 9.3 13.9 12.0
WE/BP7 47.1 41.4 41.4 34.3 51.0 45.3


The comparison with one of the fastest known open source implementations (Brian Gladman using C/ASM) is a little more detailed. With the program t_gspeed the cycles/block of the encryption and decryption core routines are measured for all AES key lengths, in fact with the same method that Gladman uses in his aestmr.cpp file. ASM is Gladman's highly optimized NASM version, C++ his optimized standard version, the other entries are my versions from above.

Func/Bit ASM C++ D6 VPC FPC2 BP7
Enc/128 295 385 370 425 546 1490
Dec/128 293 376 382 405 551 1545
Enc/192 352 439 434 532 648 1768
Dec/192 346 443 451 476 653 1723
Enc/256 403 497 498 580 751 1948
Dec/256 407 507 518 549 755 1971


Since the July 2006 release there are conditional defines to support compressed tables: one 2K encryption table (calculated with t_mkctab) replaces the four 1K tables (same for decryption, here the inverse SBox is no longer needed). Besides using less static memory, compressed tables are considered as a countermeasure against cache timing attacks.

It seems that modern x86 processors can use these inherently unaligned/overlapping data from cache with very little speed penalty, IF (a big if) the whole table is at least aligned to a 8 byte boundary. Unfortunately Delphi has no means to force this alignment for a single table. So there are some other defines and diagnostic variables to deal with the proper alignment (gathered in the configuration file aes_conf.inc). If the compressed tables are not 8 byte aligned, the cycle counts nearly double.

Another countermeasure against cache timing attacks is the forced disabling of large tables for the last rounds if compressed tables are used (the compressed tables have copies of the SBox and InvSBox bytes).

Compressed tables should be used if timing data are available for an attacker, he may be able to recover some or all key bits. Here are some links to background material on AES cache timing attacks:
The basic routines are used to build the simple FCA demo program for file encryption / authentication and decryption / verification (a dual OS program running as 16-bit EXE under DOS and as a 32-bit console application supporting long file names under Win32). It uses two modes, the (old) one was inspired by Brian Gladman's encfile with 128 bit AES-CTR mode and HMAC-SHA1-128; the second mode is EAX. The 96 bit salt is a truncated SHA1 hash over date, time, randseed, and TSC; keys and pass phrase verifier are derived from the pass phrase according to RFC 2898.

Since Nov. 2006 the archive contains the contributed fcaes256 unit started by Giorgio Tani using 256 bit AES-EAX mode or AES-CTR and HMAC-Whirlpool-128; fcaes256 is used in Giorgio's PeaZip project. fca256 is the 256 bit equivalent of FCA.

Last changes:  D17/64-bit adjustments

To compile the demo programs from the sources in fca_2013-01-07.zip you need units from aes_2013-01-07.zip, util_YYYY-MM-DD.zip, and crc_hash_YYYY-MM-DD.zip. The FZCA program optionally compresses the data with zlib before encryption / authentication, not available for 64-bit.

Note: the demo programs should only be applied to files with sizes less than 2 GB if compiled with Delphi or 16-bit Pascal (Delphi eof bug and/or 32-bit filesize function). Newer FreePascal versions with 64 bit filesize function may work for such big files.


Twofish

The Twofish algorithm as an alternative to AES was implemented as a response to user requests. The source layout found in tf_2013-01-07.zip is similar to AES, there is code for a DLL and the following modes of operation are supported: CBC, CFB128, CTR, ECB, OFB, OMAC, and EAX. All modes allow plain and cipher text lengths that need not be multiples of the block length and (except OMAC/EAX) support a reset function that re-initializes the chaining variables without the (time consuming) recalculation of the round keys. For the ECB and CBC modes cipher text stealing is used for the last short block, CTR mode can use 4 built-in incrementing functions or a user supplied one, and provides seek functions for random access reads.

Last changes:  D17/64-bit adjustments

My implementation is compared to the same or updated libraries as used in the AES part, the source for the test programs is in tf_cmpsrc_2006-06-15.zip. The times are in s for encrypting 512 MB (8000000 blocks of 64 byte) with 128-bit keys using a 1.8 GHz P4 with Win98.

Package/Compiler CTR CFB OFB ECB CBC OMAC
LibTom112/VC6 11.9 14.6 14.4 10.3 11.2 14.4
LTC112/GCC4.0.1 13.4 14.9 13.5 12.5 11.7 14.2
dcpcrypt2/D6 21.5 25.4 21.3 - 25.7 -
DEC5.1/D6 - 31.9 28.7 27.7 29.6 -
StrSecII/D6 13.0 14.7 13.0 10.7 12.1 -
WE/D3 14.8 14.0 13.9 13.8 16.0 16.3
WE/D6 15.2 14.0 13.9 13.9 15.6 15.0
WE/FPC2 20.4 20.4 20.3 19.5 21.9 20.1
WE/VPC 17.6 17.3 17.6 17.0 22.9 18.4
WE/BP7 72.0 68.5 67.9 61.6 76.0 72.2


Blowfish

The Blowfish source layout found in bf_2013-12-02.zip is similar to AES. There is code for a DLL and the following modes of operation are supported: CBC, CFB64, CTR, ECB,OFB, OMAC, and EAX . All modes allow plain and cipher text lengths that need not be multiples of the block length (for ECB and CBC cipher text stealing is used for the short block). CTR mode can use 4 built-in incrementing functions or a user supplied one, and provides seek functions for random access reads.

All modes (except OMAC/EAX) support a reset function that re-initializes the chaining variables without the (time consuming) recalculation of the round keys.

Last changes:  New BCrypt unit for OpenBSD-style password hashing

My implementation is compared to the same or updated libraries as used in the AES part, the source for the test programs is in bf_cmpsrc_2006-06-15.zip. The times are in s for encrypting 512 MB (8000000 blocks of 64 byte) with 128 bit keys using a 1.8 GHz P4 with Win98.

Package/Compiler CTR CFB OFB ECB CBC OMAC
LibTom112/VC6 12.0 14.1 13.8 11.3 15.6 16.7
LTC112/GCC4.0.1 13.1 12.8 11.8 9.9 10.5 17.8
dcpcrypt2/D6 21.0 28.5 20.8 - 30.3 -
DEC5.1/D6 - 13.7 12.0 12.1 13.7 -
StrSecII/D6 14.1 15.1 14.0 10.9 13.5 -
WE/D3 14.0 12.2 10.8 10.2 12.4 11.5
WE/D6 14.1 12.3 10.8 10.6 12.4 11.4
WE/FPC2 18.2 17.4 18.8 14.5 16.8 16.5
WE/VPC 20.0 21.0 19.2 19.2 26.3 22.6
WE/BP7 64.8 57.1 57.1 40.7 72.8 58.3


Salsa20 / XSalsa20 / ChaCha

Salsa20 by D.J. Bernstein is a stream cipher that is included in the final eSTREAM Portfolio. The core of Salsa20 is a hash function with 64 bytes input and 64 bytes output, the input blocks consist of 128 or 256 bits of key material, the nonce or IV, and block numbers. The hashed output is used as a pseudo random key stream or is xored with the plain text to produce the cipher text.

Last ChaCha changes:  Improved PurePascal code by Martok (speed factor about 3), and 16-bit code with inline RotL (factor ~5 for BP7),

The original ECRYPT Salsa20 specification demands 20 rounds (or 10 double rounds) for the hash function, but D.J. Bernstein himself has formally proposed two variants with 8 and 12 rounds; the portfolio version has 128 bit keys and 12 rounds. My defaults are: 12 rounds for 128 bit keys and 20 rounds for 256 bit keys.

In the following table my implementations (program t_salwe, 1.8 GHz P4 with Win98) are compared with the reference C and ASM code from D.J. Bernstein. The times in s for processing 576 MB (1 Mio 576 byte blocks) are measured: the columns kx show the values for key stream generation with x rounds, and the ex columns have the corresponding encryption times (all with 128 bit keys). The reference C code for the key stream generation is suboptimal because it encrypts a zero stream.

Compiler k8 k12 k20 e8 e12 e20 Rem.
ASM P4 F12 - - - 2.54 3.62 5.57 (1)
GCC 4.3.2 -O3 5.71 7.36 9.50 5.44 7.03 9.12 (2)
VC6 SP4 /O2 6.65 8.29 11.59 6.37 8.07 11.37 (2)
Delphi 3 3.57 4.93 7.93 6.16 7.53 10.23 -
Delphi 6 3.57 4.94 7.94 6.25 7.62 10.32 -
Delphi 10 3.57 4.93 7.94 6.12 7.48 10.21 -
VPC 2.1.279 4.40 5.77 8.51 6.89 8.25 10.98 -
FPC 2.2.4 -O3 4.57 5.93 8.66 7.92 9.33 11.92 -
BP7 Real 4.85 6.19 8.88 5.34 6.68 9.37 -

(1) Calculated from D.J. Bernstein's cycles for 576 bytes
(2) Modified reference C code with static rounds variable

My source implementation in the archive salsa_2013-10-07.zip uses BASM for the core hash function (PurePascal for BIT64, TP5.X uses Pascal with some inline).

Salsa20 is used as cryptographic random number generator in the salsar unit, see the PRNG section on the Misc. page.

XSalsa20 uses 256 bit keys and 192 bit IVs, there is no separate key/IV setup, and packet processing functions have no user supplied context.

ChaCha is one of D.J. Bernstein's variant of Salsa with improved diffusion per round (and conjectured increased resistance to cryptanalysis). My implementation allows 128/256 bit keys, ChaCha functions are exact counterparts of the Salsa functions, except that chacha_xkeysetup allows any even number of rounds > 0. Special thanks goes to EddyHawk for requesting the inclusion of ChaCha and his contributed functions, and to Martok for contributing optimized PurePascal source.


Sosemanuk

Sosemanuk by C. Berbain et al. is a stream cipher that is included in the final eSTREAM Portfolio. The key length is variable between 128 and 256 bits, and IVs up to 128 bit are supported (although my code uses 128 bit IVs). A reduced variant of the Serpent block cipher is used for key and IV setup. The core key stream is generated in 80 byte blocks using (among others) an LFSR (Linear Feedback Shift Register) with ten 32 bit values and a FSM (Finite State Machine) with 64 bits.

Last changes:  D17/64-bit adjustments

In the following table my Pascal implementations (program t_sosewe, 1.8 GHz P4 with Win98) are compared with the reference C code (rows GCC and VC6) from the eSTREAM archive. Times and speeds for processing 800 MB (1 Mio 800 byte blocks) are measured for key stream generation (columns k) and encryption (columns e). The speed does not depend on the key length.

Compiler   k [s] k [MB/s]   e [s] e [MB/s]
GCC 4.3.2 -O3 7.253 110.3 10.220 78.3
VC6 SP4 /O2 7.520 106.4 10.930 73.2
Delphi 3 3.022 264.7 6.868 116.5
Delphi 6 3.013 265.5 6.897 116.0
Delphi 10 3.634 220.1 7.565 105.7
VPC 2.1.279 4.613 173.4 8.356 95.7
FPC 2.2.4 -O3 6.284 127.3 11.543 69.3
BP7 Real 5.490 145.7 7.679 104.2

My source in the archive sosemanuk_2013-01-07.zip is based on sosemanukfast.java from the eSTREAM submission for the 32 bit code and 16 bit key stream generator; for 16 bit key/IV setup a modification of my Serpent code is used.


Other block ciphers

Last changes:  Updated Serpent archive with new fast BASM32 routines contributed by EddyHawk

The source layout in the archives is as usual: the CBC, CFB, CTR, ECB, OFB modes of operation are implemented; Anubis, Camellia, Serpent, and SEED additionally support OMAC and EAX. CCM mode is available for Camellia.
Home CRC/Hash Crypto Misc. Info Links Deutsch