Leaving an Imprint for Eternity: Microsoft, UWash, Twist Bioscience Project Will Store Data in DNA

Recognizing the potential of synthetic DNA to deliver life-changing benefits in a multitude of ways, a team of researchers from the Molecular Information Systems Lab at the University of Washington, Microsoft and Twist Bioscience is making breakthroughs in using long-chain oligonucleotides to encode and store digital data.
In January, the #MemoriesinDNA Project was launched, aimed at collecting 10,000 original images from around the world to preserve them indefinitely in synthetic DNA manufactured by Twist Bioscience. As part of the project, the public is invited to submit original photographs that they’d like to see preserved in DNA for millennia. The images — which can be uploaded at the project website — will be encoded in synthetic DNA and made available to researchers worldwide. 
DNA, because it is much denser and lasts many orders of magnitude longer than current technologies, holds promise as a revolutionary storage medium.
“The #MemoriesinDNA Project highlights the scientific, technical, and cultural importance of DNA,” said Twist Bioscience Co-Founder and CEO Emily Leproust. “With proof-of-concept achieved for DNA as a digital data storage media, we are working to drive down the cost of synthesizing DNA to enable its potential as a widely-available commercial solution for the growing body of precious data in digital format, such as archival data, financial and health record backups, and all long-term data retention where current media is not practical.” 
The human body—by its very DNA nature—is elegant and complex in ways we can’t begin to imagine, Twist Bioscience Co-Founder and CTO Bill Peck explained. DNA is a perfect storage medium, with far more DNA "code" present in a single human body than the equivalent digital data stored in massive data centers, Peck said. There are 75 trillion cells in the human body, with 6.5 billion base pairs per cell. 
DNA can store information millions of times more compactly than traditional data centers, which require acres of land and account for nearly 2 percent of total electricity consumption in the U.S. Theoretically, just one gram of DNA can store almost a zettabyte of digital data, or one trillion gigabytes. Fewer than twenty grams of DNA could store all the digital data in the world.
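The figures quoted above can be checked with some quick arithmetic. The sketch below is illustrative only: it uses the cell and base-pair counts as quoted in the article and assumes each base pair carries two bits (four possible bases), which is a simplification not stated by the researchers. The variable names are hypothetical.

```python
# Back-of-the-envelope check of the storage figures quoted in the article.
# Assumption (not from the article): one base pair encodes 2 bits.

GIGABYTE = 10**9    # bytes, decimal (SI) definition
ZETTABYTE = 10**21  # bytes, decimal (SI) definition

# "almost a zettabyte ... or one trillion gigabytes"
gigabytes_per_zettabyte = ZETTABYTE // GIGABYTE

# Figures as quoted in the article for a single human body.
cells_in_body = 75 * 10**12
base_pairs_per_cell = 6_500_000_000

BITS_PER_BASE_PAIR = 2  # assumed: 4 bases -> log2(4) = 2 bits
total_base_pairs = cells_in_body * base_pairs_per_cell
total_bytes = total_base_pairs * BITS_PER_BASE_PAIR // 8
body_zettabytes = total_bytes / ZETTABYTE

print(f"1 ZB = {gigabytes_per_zettabyte:,} GB")
print(f"Raw DNA 'capacity' of one body: about {body_zettabytes:.1f} ZB")
```

Under these assumptions, the raw sequence content of one body comes to roughly 120 zettabytes. Nearly every cell carries the same genome, so this counts redundant copies rather than distinct information.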
What’s more, Peck said DNA is “nature’s information medium.” Methods developed to synthesize and read DNA at the large scale needed for data storage will provide benefits that can help sustain our planet. “Synthetic DNA/synthetic biology is truly playing a major role in the drive to address large data storage issues, as well as other applications such as personalized medicine, the security of food supply, and more,” Peck said.
The basic process converts the strings of ones and zeros seen in digital data into the four basic building blocks of DNA sequences—adenine, guanine, cytosine and thymine. Since these sequences are unique, the process requires synthetic DNA created in a lab, not naturally pre-existing DNA sequences.
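As a rough illustration of the conversion described above, here is a minimal sketch that maps every two bits to one of the four bases. The mapping table and the function names `encode` and `decode` are assumptions made for this example, not the scheme used by the UW, Microsoft and Twist Bioscience team. Practical encodings typically also add error correction and addressing, and avoid long runs of the same base.

```python
# Naive 2-bits-per-base encoding (hypothetical mapping, for illustration only).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn raw bytes into a string of A/C/G/T, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence: str) -> bytes:
    """Invert encode(): read bases back into bits, then into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
```

Each byte becomes four bases, so the sequence to be synthesized is fully determined by the data, which is why it has to be manufactured rather than taken from existing DNA.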
The #MemoriesinDNA Project is a great way for the public to be part of a revolutionary project that will last a lifetime and beyond. The crowdsourced images—which can be uploaded at the project website—will be encoded in synthetic DNA and made available to researchers worldwide. Part of the project allows people to share their images on social media with the hashtag #MemoriesInDNA and include a story about why the photograph or video is important to them.
“It’s your turn to show us what should be preserved in DNA forever,” said Luis Ceze, professor in the UW’s Paul G. Allen School of Computer Science & Engineering. “We want people to go out and take a picture of something that they want the world to remember — it’s a fun opportunity to send a message to future generations and help our research in the process.”
So far, the team of UW computer scientists and electrical engineers, in collaboration with Microsoft researchers and Twist Bioscience, has been able to encode photographic images in DNA and retrieve and convert those individual molecular “files” back into digital photographs. Their next challenge involves exploring how to perform meaningful data processing directly in DNA, without having to convert the images back into their electronic form.
“Let’s suppose you have a trillion images encoded in DNA and want to find all the photographs that have a red car in them, or to find out whether a person’s face exists in those images,” said Ceze. “We want to be able to do that information processing in DNA directly — to search in a smart way and make the molecules themselves carry out that computer vision work.”
The team previously encoded important compositions in DNA molecules, including The Universal Declaration of Human Rights, the top 100 books of Project Gutenberg, songs from the Montreux Jazz Festival and an OK Go video.
“It is thrilling to bring computer science and molecular biology together in this project,” said Karin Strauss, Microsoft senior researcher and collaborator. “There has been amazing progress recently in both areas and, when combined, they can be very powerful in tackling problems created by the massive amounts of data we’ve been generating.”
Going forward, the researchers will employ machine learning to devise methods to map and encode all the visual features contained in a photograph, such as colors, curves, lines and objects, in DNA. The main challenge is doing that in a way that allows scientists to retrieve images with similar features and perform meaningful data processing.
“We will use neural networks to explore ways to classify visual patterns in the images and video that we encode in DNA,” said Georg Seelig, UW associate professor of electrical engineering. “For example, are there more red cars than blue cars in a photograph? Or are there people riding bicycles?”
#MemoriesInDNA will provide an important library of images to be encoded in a separately funded project supported by the Defense Advanced Research Projects Agency (DARPA) Molecular Informatics program. UW was recently awarded $6.3 million to accelerate the pace at which data can be encoded in DNA, and to develop new capabilities to process this data through image search and classification. The work will build the foundation on which UW can advance its next-generation work in molecular information processing.  
Those interested in participating in the project can get details about how to upload and share images at the #MemoriesInDNA Project website.
For Peck, the team’s work is just the beginning of the discovery of synthetic DNA’s potential to transform our lives in a variety of fields, such as environmental sustainability.
“The message of synthetic biology,” Peck said, “is that Nature’s biological efficiencies provide us a much more efficient path to the future, from sustainable agriculture to sustainable energy supply.”
 Image source: NASA