Thanks for the Memory

Storage Solutions, Open Experimentation, and their Relationship to Progress

Thankfully for our data engineering team, memory & storage are now mostly solved issues.

In computing, two technologies dominate: semiconductor memory (flash & RAM) and magnetic data encoding (HDDs). This was not always the case, and the history of computer science is filled with bizarre alternative storage technologies that didn’t quite endure. Brand new memory solutions are much less common today, but some computer scientists are still finding joy in hacking existing services, tools, and even organisms to find fresh ways of managing data.

Join us as we explore the history of strange storage, and look at how technologists are still finding ways to push the envelope – showing how experimentation, play, and the impractical can reveal new horizons for computer science.

Paper Pusher: Early Storage Tech

On paper (literally), computer storage originated in 1801. Data was encoded using holes punched in stiff cards. These so-called punch cards were then chained together in long sequences to store complex information. Initially, punch cards were implemented in a decidedly analogue context: Joseph Marie Jacquard’s eponymous loom. Here the holes in each card told the loom which threads should be raised on each pass of the shuttle – permitting unskilled workers to produce highly complex textile designs without extensive training. The concept was picked up by the father of modern computing, Charles Babbage, who in 1837 wrote it into his designs for the Analytical Engine, the successor to his Difference Engine. Though these devices were never fully constructed during Babbage’s lifetime, with assistance from pioneers like Ada Lovelace, punch card driven designs would go on to form the backbone of program and storage methodologies well into the 20th century. Early card-based systems used punch cards organised into 45 columns & 12 rows, which were passed between wire brushes and a metal plate. Wherever a brush could pass through a punched hole and touch the plate, a circuit was completed. Each circuit corresponded to a number determined by the position of the hole within its column – slowly sketching out a numerical function.

The Babbage Difference Engine. Photo By Geni, CC BY-SA 4.0
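To make the brush-and-plate idea concrete, here is a minimal Python sketch of a 45-column, 12-row card. The layout is an assumption for illustration only: one hole per column, punched in the row labelled with the digit, with rows ordered 12, 11, 0–9. The names `punch_card` and `read_card` are hypothetical, and real tabulating equipment varied in its conventions.

```python
ROWS = 12      # rows per card, as described in the text
COLUMNS = 45   # columns per card, as described in the text

# Assumption: rows run top to bottom as 12, 11, 0, 1, ..., 9,
# and a digit is stored as a single hole in the row carrying its label.
ROW_LABELS = ["12", "11"] + [str(d) for d in range(10)]


def punch_card(digits: str) -> list:
    """Return a ROWS x COLUMNS grid where True marks a punched hole."""
    if len(digits) > COLUMNS:
        raise ValueError("too many digits for one card")
    if any(ch not in "0123456789" for ch in digits):
        raise ValueError("this sketch only stores decimal digits")
    card = [[False] * COLUMNS for _ in range(ROWS)]
    for col, ch in enumerate(digits):
        card[ROW_LABELS.index(ch)][col] = True
    return card


def read_card(card: list) -> str:
    """Mimic the brushes: a hole closes a circuit, and its row gives the digit."""
    digits = []
    for col in range(COLUMNS):
        for row in range(ROWS):
            if card[row][col]:          # brush touches the plate through the hole
                digits.append(ROW_LABELS[row])
                break                   # one punch per column in this sketch
    return "".join(digits)


card = punch_card("1801")
print(read_card(card))
```

Unpunched columns are simply skipped by the reader here, which is a simplification rather than a claim about how any particular machine treated blanks.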

This process was revolutionary – but of course, very unwieldy. As such, alternatives like magnetic encoding were quickly adopted as standard once discovered. Indeed, with the most extreme modern pipelines handling petabytes of data, a punch card system would be utterly impractical today. Imagine the warehouses needed to run even consumer-familiar systems like Microsoft’s Windows OS! This digitisation of memory technology has shaped the physicality of computing and information design – removing the need for users to organise and store piles of paper, and shrinking computers from room-filling mainframes to desktop devices. This in turn has allowed much more complex programs to be written, stored and read – a great example of the positive feedback loop between hardware and software.

Stack of Punched Cards with Red Sorting Line. Photo By Arnold Reinhold, CC BY-SA 3.0

Despite the relative ease of modern computing, some enthusiasts still like to see what they can do with punch card computing. Tools like Masswerk’s Virtual Keypunch allow users to translate text into cards for classic languages including FORTRAN & COBOL, as well as more modern options like Python. Programs stored as cards can then be run using a digital reader. Though this is obviously a totally inefficient method of storing a program, it can be a lot of fun to enjoy some retrofiction by writing something decidedly modern in such a relict format. Other technologists have reversed this structure, interpreting classical physical punch cards using modern technology. In one such example, Kyle Owen read punch card data using an older ‘Documation’ reader connected to a modern laptop via an Arduino Uno. The Arduino translated IBM’s complex keypunch format to serial and passed the data to the laptop, which revealed a lovely linear pattern when explored in a spreadsheet. Blending new and older tech in this case constitutes little more than fun hobby programming – but there is always value in practising this kind of exploratory process to keep the inquisitive nature of science alive.

The Power of Voice: Fun with Mumble Tubs

Moving on to memory, things get a little more complicated. Even an early solution for RAM demanded some advanced physics, and involved the use of Williams-Kilburn vacuum tubes. These were glass cylinders containing electron guns targeted at a phosphorescent screen – operating under the same cathode ray principles that enabled early televisions. Basically, electron beams striking the phosphorescent screen force electrons out of the phosphor. This generates a positive charge at the point of beam impact and a negative charge halo around this point – a charge well. The charge well is temporary, and is used to store one bit of information in the data writing process. Thus, charge well patterns could encode numerical systems just like punched cards. The major drawback here was the need for large racks of vacuum tubes. The later advent of semiconductor memory therefore paralleled the rise of magnetic storage, in that it slashed the necessary size of a computer system.

Memory Pattern on Williams-Kilburn CRT. Photo By National Institute of Standards and Technology, Public Domain

The major benefit of Williams-Kilburn vacuum tubes was speed: any bit could be reached directly by steering the electron beam, with no waiting around. This is why in the 1950s they began to displace a much slower, acoustic-based technology known as Delay Line Memory (DLM). In classical DLM devices, mercury-filled glass tubes were heated to 40°C, allowing ultrasonic transducers to pass a ‘pulse’ of sound through the warm metal medium – transferring encoded data in this audio format. The delay between the sound signal being emitted and received gave computers of the time a RAM-like ‘temporary memory’ functionality. Given they emitted constant, irregular pulses of audio, DLM devices were audible, and were known for an ominous speech-like sound that earned them the nickname ‘mumble tub’. Obviously, giant tubes of heated, toxic mercury are not the safest thing to have lying around a computer lab, and the hazard combined with the strange gurgling to earn DLM the ire of most technicians of the time.
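The recirculating behaviour described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of any real machine: bits travel through a queue standing in for the mercury, each bit is re-injected at the far end as it arrives, and the hypothetical `DelayLine` class can only read or write a bit at the moment it reaches the receiver.

```python
from collections import deque


class DelayLine:
    """Toy recirculating delay line: a bit is only accessible as it exits."""

    def __init__(self, bits):
        self.line = deque(bits)  # pulses currently travelling through the medium
        self.ticks = 0           # pulse periods elapsed so far

    def step(self, write=None):
        """Advance one pulse period and return the bit that reached the receiver."""
        bit = self.line.popleft()
        # Regenerate the same pulse at the transmitter, or overwrite it.
        self.line.append(bit if write is None else write)
        self.ticks += 1
        return bit

    def _wait_for(self, index):
        # The bit with a given index arrives once every len(line) ticks.
        while self.ticks % len(self.line) != index:
            self.step()

    def read(self, index):
        """Wait for the bit at `index` to come around, then return it."""
        self._wait_for(index)
        return self.step()

    def write(self, index, bit):
        """Wait for the slot at `index` to come around, then replace it."""
        self._wait_for(index)
        self.step(write=bit)


memory = DelayLine([1, 0, 1, 1, 0, 0, 1, 0])
print(memory.read(3), "after", memory.ticks, "ticks")
```

The `ticks` counter makes the cost visible: access time depends on where the wanted bit happens to be in its lap, which is the waiting that a directly addressed tube avoids.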

For modern data enthusiasts though, DLM offers some novel, if impractical and unreliable, models for storing data. One such application was devised by Dr. Tom Murphy, and uses internet ICMP echo (ping) signals to hold information in memory. Dr. Murphy wrote extensive code which sends blocks of data in 32-bit format to IPv4 internet addresses; these blocks are then returned to the original computer via ICMP echoes. DLM can refer to any technology which uses signal transit time to preserve information in temporary memory. To make Dr. Murphy’s internet use case more readily comparable to classical acoustic DLM tech, think of the ICMP echo as the audio wave and the network route to each IPv4 address as the mercury. Given the likelihood of a successful ICMP echo, with enough failsafe addresses this internet DLM tech can operate without data loss, and without permission from any owners of the IPv4 nodes. This tool is far from ethical, and actually fairly anti-social – but it’s certainly an interesting exercise. By experimenting with memory in this fringe manner Dr. Murphy brings a lot of visibility to computer science, creating web content with views in the millions. This is hugely important in securing the future of the field by piquing the interest of the next generation of technologists.

Visualisation of the Web DLM System, where Darkening Block Colour Indicates Weaker Ping Success for Data Blocks. Number Strings are IPv4 Address Hosts and Their Performance. Photo by Dr. Tom Murphy

Data Dissidents: Hacking YouTube

In a similarly transgressive vein, internet anons have found a way to take advantage of YouTube’s free video hosting service to store their data. GitHub power user DvorakDwarf first conceived of bending YouTube’s ToS by encoding non-video files to a video format, unlocking limitless storage capabilities for all files under 128GB. DvorakDwarf went on to write a fairly sophisticated Rust program that performs exactly this process: embedding data into videos which resemble static and uploading them to the YouTube system. These video files can then be redownloaded and passed through the same tool to decode the original file. The principle behind this tool relies on the binary encoding of bytes of data. Given all files are ‘made’ of bytes, and each byte can be represented by a number between 0 and 255, byte values can be written using a string of eight binary bits. These binary bits are translated to 2×2 pixel blocks of black (0) and white (1), and embedded into a video, providing all the information necessary to recompile the embedded file. Just like Dr. Murphy’s, DvorakDwarf’s work is genuinely valuable in the way it brings visibility and public interest. These kinds of subversive and novel works cultivate informal researcher communities: groups willing to make things for their own sake and develop interesting solutions to small-scale and non-existent problems. This kind of freedom, in combination with more guided science, can spark creativity in unexpected places and bring true innovation.
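The byte-to-pixel principle can be sketched in Python as follows. This is an illustration of the idea described above, not DvorakDwarf’s actual Rust implementation: the function names are hypothetical, the ‘frame’ is just a grid of greyscale values (0 for black, 255 for white), and the decoder averages each 2×2 block on the assumption that some pixels may be disturbed by video compression.

```python
BLOCK = 2  # each bit becomes a BLOCK x BLOCK square of pixels


def bytes_to_bits(data: bytes) -> list:
    """Split every byte (0-255) into its eight bits, most significant first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bits_to_frame(bits: list, width_blocks: int) -> list:
    """Lay bits out row by row as blocks: 0 -> black (0), 1 -> white (255)."""
    padded = bits + [0] * (-len(bits) % width_blocks)  # fill the last row
    frame = []
    for start in range(0, len(padded), width_blocks):
        pixel_row = []
        for bit in padded[start:start + width_blocks]:
            pixel_row.extend([255 * bit] * BLOCK)
        for _ in range(BLOCK):          # repeat the row to make square blocks
            frame.append(list(pixel_row))
    return frame


def frame_to_bytes(frame: list, n_bytes: int) -> bytes:
    """Average each block back to a bit, then pack the bits into bytes."""
    bits = []
    for y in range(0, len(frame), BLOCK):
        for x in range(0, len(frame[y]), BLOCK):
            block = [frame[y + dy][x + dx]
                     for dy in range(BLOCK) for dx in range(BLOCK)]
            bits.append(1 if sum(block) / len(block) >= 128 else 0)
    out = bytearray()
    for i in range(0, n_bytes * 8, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


frame = bits_to_frame(bytes_to_bits(b"Hi!"), width_blocks=8)
print(frame_to_bytes(frame, 3))
```

The original length has to be supplied when decoding because the final row is padded with zeros; a real tool would need to record that length somewhere, and how it does so is not covered here.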

Liquid Snake: Non-Solid State Storage

There are many ways in which computer science is meaningfully innovative, but perhaps none more so than quantum computing. A quantum computer replaces the familiar bit with the exotic qubit. Like bits, qubits constitute the smallest possible parcels of computer data. Unlike bits however, qubits possess the qualities of entanglement and superposition. This means qubits in a quantum computer system can be linked to one another, and that each qubit can hold a combination of 0 and 1 at once rather than a single fixed value. These characteristics could allow quantum computers to perform certain incredibly complex calculations and tackle problems with far more variables than a classical computer can handle. Imagine modelling the behaviour of water molecules in a stream, or grains of sand in a desert. This kind of problem is intensive and challenging for a normal computer, but may be far more tractable for a quantum computer.

How then does memory work in a quantum context? Well, there are many solutions, but a leading option is through the pulsing of radiation in gas. This is termed ‘Atomic Raman memory’, in reference to the Raman effect, where the wavelength of a beam of radiation is shifted as it scatters through a medium. By timing input radiation pulses with ‘control’ pulses that indicate read and write commands, information can be encoded to and retrieved from quantum memory. If you think this sounds a little bit like DLM or the Williams-Kilburn vacuum tube, you’re right. The common thread is that data exists as a signal of energy in a specific pattern, which is temporarily stored as it passes through a medium. Suddenly, the hard-won knowledge that built systems like Dr. Murphy’s Web Drive doesn’t seem so impractical.

Just like memory, storage isn’t content to remain in a solid state. Researchers at the University of Michigan and New York University have successfully stored computer data using nanoparticles inside a liquid. The nanoparticles in this suspension, called a ‘colloid’, exist in 12-particle clusters of varying arrangements. The structure of these clusters represents a numerical value, and can be altered by heating. In this way, a colloid liquid can serve as a reusable hard drive to store a range of data types. Whilst a fully functional liquid hard drive remains in development, these colloids are likely to find a use in environmental and chemical sciences – with changes in their arrangement being monitored to indicate the presence of chemicals in even trace amounts.

Garden Fun or Data Engineering? You Decide. Photo By Rene Asmussen

Wet, Wet, Wetware: Transhumanism in Memory Tech

Speaking of liquid, computer memory has overlapped with human memory in wetware applications. Living organisms already contain a vast amount of data stored in the form of DNA. This molecule is quite stable, and its double helix structure means even physically small volumes are information dense. Scientists have begun to pilot DNA as a medium for storing non-biological data. The ultimate goal here is to promote sustainable and environmentally friendly data storage, with leaders in the space proposing everything from computer-compatible hard drives to entire data centres that occupy a fraction of their current physical space. Much like a liquid hard drive, the ultimate fate of DNA as a data storage method is currently unclear – but the interaction between advancing understanding of data storage, memory, and DNA manipulation is sure to shape the future of the human race. As the advent of magnetic data encoding revolutionised the computer, so too could DNA data encoding revolutionise the relationship between living beings and technology – proving once again that history, open experimentation and progress are all intrinsically linked.

Even Looking at a 3D Render of DNA, it is Easy to See how this Molecule can Encode So Much Data. Photo By Zephyris CC BY-SA 3.0
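As a rough illustration of why DNA is so information dense, each of its four bases can stand for two bits, so a byte fits in four bases. The Python sketch below assumes a simple mapping (00→A, 01→C, 10→G, 11→T) chosen purely for illustration; published DNA storage schemes use more elaborate encodings with error correction, which are not modelled here.

```python
BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def encode(data: bytes) -> str:
    """Turn each byte into four bases, two bits per base, high bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def decode(strand: str) -> bytes:
    """Rebuild bytes from a strand produced by encode()."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4 bases")
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for base in strand[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


strand = encode(b"memory")
print(strand, len(strand))
print(decode(strand))
```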

Wrapping Up

Memory is a funny thing. It can be physical or digital. It can be a solid, liquid, or gas. It can be loud or silent, bright or dull. Despite being well solved for standard applications, there is still a lot of fun to be had pushing the limits and practicality of computer memory technology, and this exercise is far from useless. Interest-driven experimentation with alternative approaches to storage has always been a driver of broader advancements in computer science. For example, magnetic storage and semiconductor memory made way for more complex programs and reduced the size of computers by removing the need for tube banks, punch card readers and mumble tubs. Even seemingly ‘useless’ novelty tools like Dr. Murphy’s Web Drive and DvorakDwarf’s YouTube hack could spark the next big jump in memory tech, just by inspiring those with an interest to get involved. In fact, some fringe memory technologies have impact beyond the boundaries of computer science – showing real potential to fuel exciting new developments in quantum and transhumanist domains.

Of course, technologies need to be applicable to a real context to have meaning, and are often inspired by this need. But it would be wrong to say that there isn’t also value in just ‘messing around’: trying things out for fun and exploring technologies that might not be feasible for years to come. If necessity is the mother of invention, then irreverence is its father.

To learn more about harder drives, follow DvorakDwarf on GitHub or Dr. Murphy on YouTube. Alternatively, explore academic resources to stay up to date with advancements in computer and data science.

Join the conversation by commenting below – or click to discover more Distributed Analytics Data Engineering blog posts.
