Sylvan's DM2572 learning portfolio: Pre-Theme 3: Research and theory

Journal

I chose the journal Nature (nature.com), published by Nature Publishing Group. It has an impressive impact factor of 38.597.

The journal's main categories (chemistry, clinical, environment, life sciences and physics) are not directly relevant to media technology. However, I found a very interesting article on applied genetics that touches on an important part of media technology, so I consider the journal relevant.

Paper

I've chosen a paper titled "Towards practical, high-capacity, low-maintenance information storage in synthesized DNA", published in Nature. It is a multidisciplinary study on how synthetic DNA can be used as a carrier of digital information. The hypothesis is that synthesized DNA is suitable both for carrying large amounts of digital data and for long-term storage.

Problem and background

The main problem is described as the great increase in the production of digital media, which makes archiving it an increasingly complex and maintenance-intensive task.

It is already known that DNA stores genetic information, and the technology to sequence synthetic DNA is available. Storing data in DNA has been done before; the main difficulty seems to be synthesizing long sequences of DNA to an exact design. I think the argument here is very logical: if DNA cannot store an exact representation of binary data, the stored data will become corrupt.

Research design

This is applied research, since it clearly builds on known technology and previous studies in DNA and data storage. It is a proof of concept in which a number of digital media files (text, audio and pictures) were selected and stored in DNA. I find this an excellent method.

The generated DNA was shipped from the USA to Germany without any special packaging, to demonstrate its durability and its long-term storage capabilities. One flaw here is that there is no data about the environment during transport or about how it would or would not affect the DNA.

The paper explains that an Illumina HiSeq 2000 system was used to read the DNA back in order to decode it. It is unclear which equipment produced the DNA.

Findings, discussion and conclusions

The encoding of data files into DNA is clearly explained with comprehensible diagrams. The diagrams also show how the encoded data is then split into DNA fragments, and how every second fragment is the reverse complement of the previous one for redundancy.
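The reverse-complement step can be illustrated with a minimal Python sketch. The function names and the sample fragments are my own, made up for illustration; they are not taken from the paper, and the sketch only shows the general idea of flipping every second fragment.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(fragment: str) -> str:
    """Read the fragment backwards and swap each base for its partner."""
    return "".join(COMPLEMENT[base] for base in reversed(fragment))


def alternate_strands(fragments: list[str]) -> list[str]:
    """Reverse-complement every second fragment (hypothetical helper)."""
    return [
        reverse_complement(frag) if index % 2 == 1 else frag
        for index, frag in enumerate(fragments)
    ]


# Made-up example fragments, not data from the study.
fragments = ["ACGTAC", "GTACGA", "ACGATC"]
stored = alternate_strands(fragments)
print(stored)
```

Applying `reverse_complement` twice returns the original fragment, so a decoder can undo the step without losing information.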

Although I do not know much about genetics, I could still follow the paper. It does help to understand the concept of base calling, the step in sequencing where the individual bases are identified from the sequencer's raw signal.

The method used to overcome the difficulty of sequencing long strings of DNA without error was to produce segments that overlap other segments carrying the same information, which creates redundancy. The method seems relevant to the problem.
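How overlapping segments create redundancy can be sketched in a few lines of Python. The fragment length, the step size and the example sequence below are assumptions picked to keep the output small, not the values used in the study.

```python
from collections import Counter


def overlapping_fragments(sequence: str, length: int, step: int) -> list[tuple[int, str]]:
    """Cut fixed-length fragments whose start positions are `step` bases apart."""
    return [
        (start, sequence[start:start + length])
        for start in range(0, len(sequence) - length + 1, step)
    ]


def coverage(sequence_length: int, fragments: list[tuple[int, str]]) -> list[int]:
    """Count how many fragments contain each position of the sequence."""
    counts = Counter()
    for start, fragment in fragments:
        for position in range(start, start + len(fragment)):
            counts[position] += 1
    return [counts[position] for position in range(sequence_length)]


# Hypothetical 16-base sequence, 8-base fragments, step of 2 bases.
sequence = "ACGTACGTACGTACGT"
frags = overlapping_fragments(sequence, length=8, step=2)
depth = coverage(len(sequence), frags)
print(depth)
```

In this sketch each interior base appears in `length / step` = 4 different fragments, so a single faulty fragment can be outvoted by the others. The bases near the two ends are covered fewer times.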

Diagrams are presented for efficiency, base error and relative cost.

While the efficiency diagram clearly explains the relation between amount of data and current synthesis costs, it is unclear how the projected future costs have been obtained.

The base error data is obtained both from an empirical analysis of the five shipped DNA samples and from a theoretical model. One of the samples contained errors but was manually repaired and used anyway.

The samples match the theoretical plot.

Regarding cost effectiveness, the study gives an application example: CERN currently has 80 PB of data and produces 15 PB each year, while only 10% can be stored on disk. It seems possible to solve this capacity problem with DNA storage.

It is proposed that DNA storage could be cost effective, with a breathtaking price tag of $12,400 per MB! A quick look reveals that the average cost per MB of hard disk storage in 2013 is $0.05, so DNA would be 248,000 times more expensive than HDDs. However, the study compares the relative cost of long-term storage. Since data on traditional disks has to be transferred to new media regularly to keep the information intact, the argument becomes slightly more relevant.
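The factor of 248,000 follows directly from the two prices quoted above. A minimal Python check, using only those two figures (the variable names are mine):

```python
from decimal import Decimal

# Both prices are the figures quoted in the text, in USD per MB.
DNA_COST_PER_MB = Decimal("12400")
HDD_COST_PER_MB = Decimal("0.05")

# Decimal avoids floating-point rounding in the division.
ratio = DNA_COST_PER_MB / HDD_COST_PER_MB
print(f"DNA is {int(ratio):,} times more expensive per MB")
```

This only compares the one-off cost of writing the data. It says nothing about the recurring cost of migrating disk archives, which is the part the study's long-term comparison relies on.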

The conclusions are that DNA storage could be feasible for archiving huge amounts of data while being less expensive over time compared to traditional storage.

Weak points: the study relies on assumptions about the future cost of DNA synthesis, and it completely overlooks DNA read and write times, which are far too long.

Future research

The results are relevant for further research into new storage technology, but are currently not of practical use. Reducing access time and investigating how DNA synthesis can become cheaper should be the next steps.

---

What theory is

Theory is a way of thinking systematically in order to accumulate knowledge.

A theory can be constructed with statements backed up by facts.

There should be consensus in the construction of the theory.

Theories can be used to analyze, explain, enlighten about and predict concepts.

What theory is not

Collected data, hypotheses and references to other works are not theory. They are components of a theory, but they are not theories by themselves.

Theories in my selected paper

My selected paper is a mix of prediction theory and design and action theory. This is because it describes the current situation, proposes a future solution and, at least partially, explains how to achieve that solution.

Benefits and limitations of using the selected theory or theories

The benefits of using these two theories are that there is a concrete problem to be solved, that it can easily be described, and that there are available technologies that can be developed further into a solution to the problem. Using the selected theories makes it easy for other researchers to continue where this paper left off.

Friday, 22 November 2013