Skip to content

Methods

Paweł Górski edited this page Jan 14, 2021 · 5 revisions

Extended Line algorithm

The general idea is presented in this paper. The implementation does add up several things to the original idea, but the core concept stays the same.

This method does work by encoding secret bits line-by-line into the cover text. In each pass, it does encode three bits using smaller encoders. You can say it is a combination of the so-called Random Whitespace, Line Extend, and Trailing Whitespace methods.

How does it work?

So the first bit is encoded by putting double ASCII whitespace (the hex equivalent of 0x20) between randomly selected two words in the line. The next one is embedded by using the so-called pivot, which determines the line length. It boils down to extending the line length above the pivot when the encoder receives bit 1. The last bit is encoded by simply putting the ASCII whitespace (the 0x20) at the end of the line - it is worth noting that such whitespace does not count as a line extension.

How good is it?

The capacity of this method is low in each line we can only encode 3 bits - the exact bitrate is largely dependent on the used pivot. As for imperceptibility, the random double whitespace could raise an alert that there is something encoded in the text - the same goes if the text has some shape. For example, we did not use the most of the cover text capacity, the last lines would be smaller and we would see a drop in line length there. This method is not very robust either, as simple formatting change or sanitization of the text breaks the method.

Security

As this is a kind of classic steganography method, it does work mostly by security through obscurity. The first thing is the usage of double whitespace in randomly selected words - the attacker might want to find the pattern, so it can make this method more obscure and therefore more enduring.

Assuming we only have access to some limited amount of stegotext, we have to get two things - applied permutation of methods (as the order of methods may obviously be changed), and the pivot (which is some kind of a key in our steganography method). The first thing is actually not so hard as we can just brute-force it (we only have 3 possibilities) and take a look at every result. As for pivot, let's take a look at example:

As this story continues, - 24 
I shall  have to⁦         - 15 
speak  again, and at‌     - 19
length,  of this⁠         - 15
creature and record      - 19
his speech. I confess    - 21
I find it very           - 14
difficult to do          - 15
so because I             - 12
could not say            - 13
now, as I could          - 15
never understand         - 16
then, what               - 10
language he              - 11
spoke. It was            - 13
not Latin, in            - 13

The number at the end of the line indicates the line length (skipping doubled and trailing whitespace). At first, we can see that the lines at the beginning of the text are longer than the ones at the end of the text. Let's assume that the pivot is somewhere near the number 15 because at the end (where possibly there were no data left to encode). If we were about to take take the next word on each such line we would get numbers like 18, 18, 17, 21, 22, 19, 17, 17 (starting from the line difficult to do). We can see that indeed every line is longer than 15 or 16, so we could check these numbers as the pivots. Of course, brute-force is still viable, as in this case, we would have to check no more than 24 pivots (without any optimizations). The problem arises when the encoded secret is binary, as we might not be sure which result is the correct one.

ELUV algorithm (Extended Line Unicode Variation)

It builds upon the foundation made by the Extended Line algorithm, but it does change one thing - the trailing whitespace method. Instead of using one character at the line end, it does stretch the set of possible line endings by incorporating several other invisible or wider whitespace characters defined by Unicode. See the source code for the available sets, if you wish.

The number of encoded bits is tied to the size of the whitespace set and is equal to the binary logarithm of the aforementioned size. Is it good to note that the size of the whitespace set should be a power of 2, so we always get the integer number.

How does it work?

So the first bit is encoded by putting double ASCII whitespace (the hex equivalent of 0x20) between randomly selected two words in the line. The next one is embedded by using the so-called pivot, which determines the line length. It boils down to extending the line length above the pivot when the encoder receives bit 1. The last n-bits are encoded by selecting the concrete character from the used set. For example, let's assume we encode 5 bits per pass:

  • if the bits 0b00000 are received, no character is appended,
  • if the bits 0b00010 are received, the character having an index equal to 1 is selected and appended at the end of the line.

How good is it?

The capacity of this method depends on the used set of the characters, though it cannot exceed 7 bits per line. As this method is re-iteration on Extended Line algorithm, it shares the same pros and cons. One thing to note, if the medium in which we share the text does show some of the used invisible characters this might definitely hint the attacker that there is something hidden in it.

Security

Most of the ideas explored in the Extended Line's security section are true, but we have to take in the account that we also don't know the used invisible character set. If we use 32 possible line endings (including no character as the one possibility), we have 32! possibilities when brute-forcing. When we also take into account the pivots and the method ordering it gets even larger, so the brute-force method might not be viable.

Also, the attacker might not even know the exact size of the used character set (unless of course he/she possess large enough amount of stegotexts which exhaust all possibly used characters). This method theoretically seems to be much more resistant to a naive brute-force attack.