Could Synthetic DNA Be The Future Of Data Storage?

Synthetic DNA is DNA whose sequences are created in a laboratory setting, as opposed to being cloned or gathered from an existing natural specimen. DNA synthesis is very useful in fields such as medicine and biotechnology, where it can be used for gene therapy, vaccine development, and molecular engineering.

But cutting-edge research is slowly expanding the use of synthetic DNA from the medical and biotech fields into computing and technology. Researchers from the University of Washington, working together with Microsoft, have found a way to use synthetic DNA to archive data. To conduct the experiment, the research team first selected four separate image files.

The data from each file was converted into sequences of the four nucleotides that make up DNA, and the synthetic nucleotides were combined to create snippets of synthetic DNA. Making this breakthrough more promising, the researchers were able to reverse the process: they retrieved the proper sequences from a much larger pool of DNA and successfully reconstructed all four images without losing a single byte of data.

In a second experiment, the team not only encoded archival video from the university’s ‘Voices from the Rwanda Tribunal’ project, but also retrieved that data without any losses. While the research is still in its early stages, if synthetic DNA technology can be made robust enough for mainstream data storage, it could potentially take a data center as large as a Walmart, filled with the highest-capacity storage devices, and shrink it down to the size of a sugar cube.

Microsoft & University of Washington Experiments

The Microsoft and University of Washington research team developed a novel approach for encoding the data for these two experiments. They converted the digital data, which is composed of ones and zeroes, into the four nucleotides that make up a strand of DNA, and they encoded the equivalent of ZIP codes and street addresses into the DNA sequences so that each piece of data could be located later.
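
To make the idea of turning ones and zeroes into nucleotides concrete, here is a minimal sketch in Python. The mapping of two bits to one base (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions chosen for illustration; the article does not describe the exact code the researchers used.

```python
# Illustrative two-bits-per-base mapping. This is an assumption for the
# example, not the scheme used in the experiments described above.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Turn each byte into four nucleotides, two bits at a time."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(strand: str) -> bytes:
    """Reverse the mapping: every four nucleotides become one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```

With this mapping, bytes_to_dna(b"Hi") gives "CAGACGGC", and dna_to_bytes turns that strand back into the original two bytes.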

To read this converted data, the team used polymerase chain reaction (PCR) techniques to identify the specific ZIP codes they were looking for. To convert the DNA back to an image, video, or other data, the team used DNA sequencing techniques to read the DNA strands, using the addresses to reorder the data properly for conversion.
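
The ZIP code and street address idea can be imitated in software. The sketch below is only an analogy: a fixed prefix (the hypothetical file_tag) stands in for the ZIP code that PCR would target, a short base-4 number stands in for the street address, and a simple prefix filter stands in for the lab work. The tag values, field lengths, and function names are all assumptions made for the example.

```python
import random

BASES = "ACGT"
ADDRESS_LEN = 4    # hypothetical: room for 4**4 = 256 fragments per file
PAYLOAD_BYTES = 4  # hypothetical: bytes of data carried by each strand


def to_bases(value: int, length: int) -> str:
    """Write an integer as a fixed-length base-4 string of nucleotides."""
    digits = []
    for _ in range(length):
        value, digit = divmod(value, 4)
        digits.append(BASES[digit])
    return "".join(reversed(digits))


def from_bases(strand: str) -> int:
    """Read a base-4 string of nucleotides back into an integer."""
    value = 0
    for base in strand:
        value = value * 4 + BASES.index(base)
    return value


def make_strands(file_tag: str, data: bytes) -> list[str]:
    """Split data into short strands: file tag + address + payload."""
    strands = []
    for start in range(0, len(data), PAYLOAD_BYTES):
        chunk = data[start:start + PAYLOAD_BYTES]
        address = to_bases(start // PAYLOAD_BYTES, ADDRESS_LEN)
        payload = "".join(to_bases(byte, 4) for byte in chunk)
        strands.append(file_tag + address + payload)
    return strands


def read_file(pool: list[str], file_tag: str) -> bytes:
    """Pick out one file's strands, reorder them by address, and decode."""
    # Stand-in for PCR: keep only strands that start with the wanted tag.
    selected = [s for s in pool if s.startswith(file_tag)]
    head = len(file_tag)
    selected.sort(key=lambda s: from_bases(s[head:head + ADDRESS_LEN]))
    out = bytearray()
    for strand in selected:
        payload = strand[head + ADDRESS_LEN:]
        out.extend(from_bases(payload[i:i + 4]) for i in range(0, len(payload), 4))
    return bytes(out)


# Two files mixed into one unordered pool, as in a test tube.
pool = make_strands("ACGTAC", b"first image") + make_strands("TGCATG", b"second image")
random.shuffle(pool)
```

Calling read_file(pool, "ACGTAC") recovers the first file even though the pool is shuffled, because each strand carries its own position. The sketch assumes the tags are the same length and a file fits in 256 fragments.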

Research into synthetic DNA as a means of data storage has grown by leaps and bounds over the past two decades. Back in 1999, encoding and recovering a 23-character text message was considered a ‘cutting edge’ use of DNA data storage. By 2013, a U.K.-based bioinformatics firm claimed it had converted and archived an entire MP3 file of Dr. Martin Luther King’s ‘I Have a Dream’ speech.

According to this same team of researchers, 100 million hours of video footage can fit into approximately a cup of DNA, and data stored in synthetic DNA strands can last for tens of thousands of years.

The technology is not without its shortcomings, however. Using current technologies, researchers can only synthesize DNA in short strands. There are also issues with errors in the encoding process, particularly when letters in the sequence are repeated.
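
One way to sidestep repeated letters, sketched here under assumptions, is to store base-3 digits and let each digit choose among the three nucleotides that differ from the previous one, so the same letter never appears twice in a row. This is an illustrative technique, not necessarily what the researchers above used, and the function names and the starting letter "A" are hypothetical.

```python
BASES = "ACGT"


def trits_to_dna(trits: list[int], previous: str = "A") -> str:
    """Encode base-3 digits so that no nucleotide repeats back to back."""
    strand = []
    for trit in trits:
        # Only the three bases that differ from the previous one are allowed.
        choices = [base for base in BASES if base != previous]
        previous = choices[trit]
        strand.append(previous)
    return "".join(strand)


def dna_to_trits(strand: str, previous: str = "A") -> list[int]:
    """Decode a strand produced by trits_to_dna back into base-3 digits."""
    trits = []
    for base in strand:
        choices = [b for b in BASES if b != previous]
        trits.append(choices.index(base))
        previous = base
    return trits


def longest_run(strand: str) -> int:
    """Length of the longest stretch of one repeated letter."""
    best = run = 0
    last = ""
    for base in strand:
        run = run + 1 if base == last else 1
        best = max(best, run)
        last = base
    return best
```

For example, trits_to_dna([0, 0, 0, 0]) gives "CACA" rather than four copies of one letter, and longest_run reports 1 for any non-empty strand encoded this way. The trade-off is density: each nucleotide carries one base-3 digit instead of two full bits.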

There are also issues with randomly accessing data contained in the DNA sequence; in order to access it, the entire pool has to be sequenced and decoded, which is a comparatively slow process. But once these issues and errors are dealt with, synthetic DNA data storage should become significantly more efficient and ready for the mass market.

Katrina is a product specialist, specializing in all your server rack needs!

