Storing Data in QR Codes
Encoding data on physical media for Dummy Programmers

In “Paper as a Digital Storage Medium” I described the benefits of storing data on physical media, including long-term storage capacity, publisher anonymity, and privacy of distribution. The question that was not answered was, how do you do it?
Computers are computers, and use disks and files… how would you manage to store digital information on paper!?
This is an idea I've had kicking around since 2017 while I was teaching introductory programming at my local community college. It was my hope that it could be used as a way to identify students interested in very particular types of computing. It was also seen as a way to tap into student interest in social justice and social change. I would put posters up around the school with an invitation to a Data Analysis club, but encoded in a way that only interested students would spot it.
The idea of trying to keep it a secret while not panicking administrators didn't sound like it would fly. Instead, it became a lesson plan, and as such, a demonstrative, hands-on tutorial was created.
A simple demonstration is probably the easiest… it certainly is the most fun. I encourage you to play along.
How do you store digital data as an image?
Remember when you were in grade school, and your teacher separated you from your friend so you couldn't talk to one another? Naturally, you created a secret code with your friend and started passing notes.
We are going to create a secret message to pass on to a friend.
Pre-Requisites
Optionally, going through this exercise with a friend might be fun too. It's like passing secret messages around the class in elementary school.
Step 1: Inspect the file
Open your sample file with a hex editor. You should see something like this…
│50 4B 03 04 14 00 00 08 00 00 25 30 8B 51│PK........%0.Q│
│6F 61 AB 2C 14 00 00 00 14 00 00 00 08 00│oa.,..........│
│00 00 6D 69 6D 65 74 79 70 65 61 70 70 6C│..mimetypeappl│
│69 63 61 74 69 6F 6E 2F 65 70 75 62 2B 7A│ication/epub+z│
│69 70 50 4B 03 04 14 00 00 08 08 00 25 30│ipPK........%0│
│8B 51 00 00 00 00 02 00 00 00 00 00 00 00│.Q............│
│09 00 00 00 4D 45 54 41 2D 49 4E 46 2F 03│....META-INF/.│
│00 50 4B 03 04 14 00 00 08 08 00 25 30 8B│.PK........%0.│
│51 90 9F 06 74 9A 00 00 00 F4 00 00 00 16│Q...t.........│
Most people don't bother to inspect the actual contents of files (HINT: That's why people prefer data transfers as text) but you can get a lot of interesting information by bypassing the computer programs designed to use them.
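If you don't have a hex editor handy, a few lines of Python can produce a similar view. This is only a minimal sketch: the function name hex_dump and the 14-bytes-per-row width are my own choices to mimic the listing above, not something a particular hex editor does.

```python
def hex_dump(data: bytes, width: int = 14) -> str:
    """Format bytes as rows of hex values next to their printable ASCII."""
    rows = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        # Two uppercase hex digits per byte, separated by spaces.
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Pad a short final row so the text column stays aligned.
        hex_part = hex_part.ljust(width * 3 - 1)
        # Bytes outside printable ASCII (0x20 to 0x7E) are shown as a dot.
        text_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        rows.append(f"│{hex_part}│{text_part}│")
    return "\n".join(rows)


# The first 14 bytes of the sample file shown above.
sample = bytes.fromhex("504B030414000008000025308B51")
print(hex_dump(sample))
```

To inspect a real file, read it in binary mode (for example with open(path, "rb")) and pass the first few hundred bytes to hex_dump.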
For example, with a bit of know-how, we can immediately tell two things about the file we have open:
- It is probably a zip file. We can see that the first two bytes of the file are the values 50 and 4B (in hex notation). Interestingly, these values correspond to the ASCII characters PK. Many years ago, it became customary to sign the start of your application's files so that you could tell your files apart from other formats. PK stands for Phil Katz, the creator of PKZIP and founder of PKWARE, the company that originated the file format.
- It is an epub file. Secondly, I can see that the mimetype is application/epub+zip. So it's an ePUB file (and confirmed as a zip).
There is a lot of information at the binary level.
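The two observations above can be checked mechanically. Here is a rough sketch, assuming the file starts with a standard zip local file header (30 fixed bytes, then the entry name, then any extra field, then the entry contents). The names sniff and ZIP_SIGNATURE are made up for this example.

```python
ZIP_SIGNATURE = b"PK\x03\x04"  # the bytes 50 4B 03 04 at the start of the file


def sniff(header: bytes) -> list[str]:
    """Guess what a file is from its first bytes (zip, and possibly epub)."""
    findings = []
    if not header.startswith(ZIP_SIGNATURE):
        return findings
    findings.append("zip")
    # Bytes 26-27 hold the name length, bytes 28-29 the extra-field length,
    # both stored little-endian.
    name_len = int.from_bytes(header[26:28], "little")
    extra_len = int.from_bytes(header[28:30], "little")
    name = header[30:30 + name_len]
    body_start = 30 + name_len + extra_len
    expected = b"application/epub+zip"
    body = header[body_start:body_start + len(expected)]
    if name == b"mimetype" and body == expected:
        findings.append("epub")
    return findings


# The first 58 bytes of the sample file from the hex listing.
header = bytes.fromhex(
    "504B030414000008000025308B51"
    "6F61AB2C14000000140000000800"
    "0000"
) + b"mimetype" + b"application/epub+zip"
print(sniff(header))
```

This only looks at the very first entry in the archive, which is enough for the sample file here.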
Serialise
The goal is to convert the file to a readable format. The easiest way to do this is to convert one byte at a time.
This has the advantage of doing it in order. Order counts, so by reading from the start to the end in order, we ensure that the person we send the message to gets it in the correct order.
So let's read the first byte: it is the hexadecimal value 50.
Write that down on your grid paper (and maybe the next couple of values while we are at it)

Convert to Binary
What we are looking for is a sequence of bits, and each hex digit represents 4 bits (half a byte or a “nibble”). So we need to convert each digit to its binary form.
Taking the first one:
- 5₁₆
- 5₁₀
- 0101₂
Don't be afraid to use your computer's calculator.
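If the calculator gets tedious, the same conversion can be sketched in Python. The helper name to_nibbles is hypothetical. It walks the bytes in order and writes each hex digit as a row of four bits, high digit first.

```python
def to_nibbles(data: bytes) -> list[str]:
    """Return one 4-bit string per hex digit, in file order."""
    rows = []
    for byte in data:
        rows.append(f"{byte >> 4:04b}")    # high hex digit, e.g. 5 in 50
        rows.append(f"{byte & 0x0F:04b}")  # low hex digit, e.g. 0 in 50
    return rows


# The first three bytes of the sample file: 50 4B 03.
for row in to_nibbles(bytes.fromhex("504B03")):
    print(row)
# prints 0101, 0000, 0100, 1011, 0000, 0011 on separate lines
```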

Now, because this is a secret note, we need to remove our original working numbers. Grab your scissors and cut the first column off the paper.
(I'm switching my notation to ASCII art… for those that want to play along in a text editor)
0101
0000
0100
1011
0000
0011
????
????
____
Convert to Image
A barcode is just an image that can be interpreted as numbers. The key for us is that we don't have to use the symbols 0 and 1, any two symbols that are easily distinguishable from one another would work just fine.
This is similar to how Morse code works: each character is represented by a sequence of short and long tones. It doesn't matter what is used, as long as the two things are distinguishable.
One really good symbol that would be easily distinguishable by a computer with a camera would be light and dark. This is convenient because colour can easily be printed on paper. We can use "the absence of pigment" (light) to represent 0, and "the presence of pigment" to represent 1.
Remember how I said to use a pencil?
- Take your eraser, and erase every 0
- Take your pencil, and colour every 1
______
_ █ █_
_    _
_ █  _
_█ ██_
______
_    _
_  ██_
_????_
_????_
______
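For those playing along in a text editor, the pencil-and-eraser step can be sketched like this. The function name render and the choice of █ and a space as the two symbols are assumptions for illustration; any two distinguishable characters would do.

```python
def render(rows: list[str], dark: str = "█", light: str = " ") -> str:
    """Swap every 1 for a dark cell and every 0 for a light cell."""
    lines = []
    for row in rows:
        lines.append("".join(dark if bit == "1" else light for bit in row))
    return "\n".join(lines)


# The six rows of bits worked out for the bytes 50 4B 03.
bits = ["0101", "0000", "0100", "1011", "0000", "0011"]
print(render(bits))
```

Passing dark="#" and light="." gives a version that survives fonts without block characters.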
My Eyes are Buggy
This is coming along nicely. We now have a series of binary digits encoded as bars of colour. This is also known as a “barcode”.
There's still one problem.
I'm getting old.
My eyes aren't what they used to be.
It's hard for me to follow where the lines start and stop.
This is especially problematic on lines with nothing in them at all. The number zero (line number 2) has “nothing” to show that it is a zero. To help our friend that needs to decode our secret message, let's put some guidelines in place. This will help them see where lines start and stop, or that there is a line at all. The decoder also needs some way to know how big the squares are to help distinguish where digits start and stop.
You will notice I left some placeholders in my notation, let's fill them in:
- Colour all the blocks down the left
- Colour every other block across the top
█ █ █
█ █ ██
█
█ █  █
██ ██
██████
█ █ █
█    █
█  ██
█????█
█????
██████
With these guides in place, we can tell where blocks start and how big each bit square is on the paper.
Huh… that looks an awful lot like a 2-d barcode.
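Here is a small sketch of adding the guides around one block of four rows. The instructions only mention the left column and the top row; the alternating right column and the solid bottom row are my reading of the drawing, so treat them as assumptions. The name frame is also made up for this example.

```python
DARK, LIGHT = "█", " "


def frame(rows: list[str]) -> list[str]:
    """Wrap rows of bits in guide lines so a reader can find the grid."""
    width = len(rows[0]) + 2
    # Top guide: every other block coloured, starting with a dark one.
    top = "".join(DARK if i % 2 == 0 else LIGHT for i in range(width))
    body = []
    for i, row in enumerate(rows):
        cells = "".join(DARK if bit == "1" else LIGHT for bit in row)
        # Left guide is always dark; right guide alternates row by row.
        right = DARK if i % 2 == 0 else LIGHT
        body.append(DARK + cells + right)
    # Bottom guide: a solid dark row (assumed from the drawing).
    bottom = DARK * width
    return [top] + body + [bottom]


for line in frame(["0101", "0000", "0100", "1011"]):
    print(line)
```

The alternating guides let a decoder measure how wide one square is, and the solid edges show where a block begins, even on a row that is all zeros.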

Homework
Considering an ANSI-character table, and considering bytes come in 8-bit sets, it is probably a little easier to write the blocks in 8x8 grids:
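As a starting point for the homework, here is one possible sketch: one byte per row, eight rows per grid. The name byte_grids is hypothetical, and padding the last grid with all-zero rows is an assumption of mine. Note that the padding hides the true length of the message, so a real scheme would need to record it somewhere.

```python
def byte_grids(data: bytes, rows_per_grid: int = 8) -> list[list[str]]:
    """Split data into grids of 8-bit rows, one byte per row."""
    rows = [f"{byte:08b}" for byte in data]
    grids = []
    for start in range(0, len(rows), rows_per_grid):
        grid = rows[start:start + rows_per_grid]
        # Assumption: fill an incomplete final grid with empty (zero) rows.
        grid += ["0" * 8] * (rows_per_grid - len(grid))
        grids.append(grid)
    return grids


# The first ten bytes of the sample file fill one grid and part of a second.
grids = byte_grids(bytes.fromhex("504B0304140000080000"))
for grid in grids:
    for row in grid:
        print(row.replace("1", "█").replace("0", " "))
    print("-" * 8)
```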

Extra: Parallel Delivery
You will notice that, with these guidelines in place, we can treat each 8x8 grid as a separate block to decode. This makes it a little easier on us mentally, and it offers another way to make our message easier for the receiver to decode:
- On the back of each block, write its sequence number.
- Cut out each block.
Now, when your receiver gets all the blocks, they can share the work with some helpers. Each person can decode their little block, and the blocks can be stitched back together later.
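The cut-and-number idea can be sketched in a few lines. All of the function names here are invented for the example, and the blocks are plain 8-bit rows without guide lines to keep it short. The point is only that blocks decoded in any order can be put back together using their sequence numbers.

```python
import random


def split_blocks(data: bytes, size: int = 8) -> list[tuple[int, bytes]]:
    """Cut the data into chunks and pair each with its sequence number."""
    starts = range(0, len(data), size)
    return [(seq, data[i:i + size]) for seq, i in enumerate(starts)]


def encode_block(chunk: bytes) -> list[str]:
    """Write each byte of a chunk as a row of eight bits."""
    return [f"{byte:08b}" for byte in chunk]


def decode_block(rows: list[str]) -> bytes:
    """Read rows of eight bits back into bytes."""
    return bytes(int(row, 2) for row in rows)


def reassemble(decoded: list[tuple[int, bytes]]) -> bytes:
    """Sort the decoded chunks by sequence number and join them."""
    return b"".join(chunk for _, chunk in sorted(decoded))


message = b"Data Analysis club"
blocks = [(seq, encode_block(chunk)) for seq, chunk in split_blocks(message)]
random.shuffle(blocks)  # the helpers receive their blocks in any order
decoded = [(seq, decode_block(rows)) for seq, rows in blocks]
print(reassemble(decoded))
```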
Further Reading
It is time to point out that this is a simplified example. This was a demonstration that such a thing is possible.
When going from nothing to something, the first step is to understand that it's possible. Now that you understand that such a thing is possible, it's time to go on to the insane ways to make it better:
- The Unicode Miracle. One of the big problems with the above solution to the barcode problem is the number of wasted bits. Take a look at Unicode if you want to see how much information can be packed into a bit.
- Wikipedia: Datamatrix. Once you get your head wrapped around that, consider DataMatrix and how it packs more data into the same space. (Then just know that there is an actual spec to conform to: the GS1 Datamatrix Specification.)
- 1D, 2D, and 3D Barcodes. Now have your mind blown by 3D Barcodes. (For the record, I reject 4D Barcodes as a matter of principle.)
If you are interested in a practical application and this just whets your appetite, I encourage you to check out Barcode Epub, a barcode-to-epub converter suitable for anonymous transfer and archiving of everything from Digital Marketing posters to publishing data used in your thesis.
Maybe you can even pin it to a message board.