TITLE: Randomness of DNA Sequences and Evolutionary Relationships
AUTHOR: Yuriy Brun
SCHOOL: Massachusetts Academy of Math and Science
SCHOOL ADDRESS: 100 Institute Road, Worcester, MA 01609
I wanted to determine how random the DNA of various organisms is. The prevailing theory in the field holds that when life originated, DNA formed a short string that copied itself over and over again, with the copies attached end to end. The resulting strand was not random at all, since it consisted of one major repeating pattern. As organisms evolved, mutations caused the DNA to drift farther and farther from its original state, making it more and more random. Based on this theory, I hypothesized that the genome of an organism located later on the evolutionary tree must be more random than that of an earlier organism.
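The drift argument above can be illustrated with a small simulation. This is a hypothetical sketch, not the program used in this project: it assumes the ancestral strand is the unit `ACGT` copied end to end, models mutations as random point substitutions only (no insertions, deletions, or selection), and scores randomness as first-order Markov conditional entropy divided by 2 bits so the score falls between 0 and 1. The names `markov_entropy` and `mutate` are illustrative.

```python
import random
from collections import Counter
from math import log2


def markov_entropy(seq: str) -> float:
    """First-order Markov conditional entropy H(X_i | X_{i-1}), scaled to [0, 1]."""
    pair_counts = Counter(zip(seq, seq[1:]))  # dinucleotide counts
    prev_counts = Counter(seq[:-1])           # how often each base precedes another
    n_pairs = len(seq) - 1
    h = 0.0
    for (a, _b), c in pair_counts.items():
        # joint probability P(a, b) times log of conditional probability P(b | a)
        h -= (c / n_pairs) * log2(c / prev_counts[a])
    return h / log2(4)  # divide by the 2-bit maximum for four bases


def mutate(seq: str, n_mutations: int, rng: random.Random) -> str:
    """Apply random point substitutions; each one swaps a base for a different base."""
    bases = list(seq)
    for _ in range(n_mutations):
        i = rng.randrange(len(bases))
        bases[i] = rng.choice([b for b in "ACGT" if b != bases[i]])
    return "".join(bases)


# Hypothetical ancestral strand: one short unit copied end to end.
ancestor = "ACGT" * 1000
for n in (0, 100, 1000, 10000):
    drifted = mutate(ancestor, n, random.Random(n))
    print(f"{n:>6} mutations -> randomness score {markov_entropy(drifted):.3f}")
```

The unmutated strand scores exactly 0 because every base fully determines the next one; as substitutions accumulate, the score climbs toward 1, which is the trend the hypothesis predicts.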
I defined randomness largely on the basis of its definition in information theory. First, I proved that a Markov model of a stochastic process describes DNA sequences better than the conventionally used Bernoulli model. I then measured the degree of randomness of the genomes of 8 bacteria using two measurements: information entropy, which produces a score between 0 and 1, and the Km factor (used in the statistical analysis of synthetic polymers), which produces a score between 0 and 2 and provides insight into the structure of DNA as well as its randomness.
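Why a Markov model can capture structure that a Bernoulli model misses is easiest to see on a perfectly periodic strand. The sketch below is an illustration under stated assumptions, not the project's actual program: it takes the Bernoulli score to be the order-0 Shannon entropy of base frequencies and the Markov score to be first-order conditional entropy, and it assumes both are normalized by log2(4) = 2 bits to land between 0 and 1. The Km factor is not reproduced here. Function names are hypothetical.

```python
from collections import Counter
from math import log2

MAX_BITS = log2(4)  # maximum entropy per base for a 4-letter alphabet (2 bits)


def bernoulli_entropy(seq: str) -> float:
    """Order-0 (Bernoulli, i.i.d.) Shannon entropy of base frequencies, scaled to [0, 1]."""
    n = len(seq)
    h = -sum((c / n) * log2(c / n) for c in Counter(seq).values())
    return h / MAX_BITS


def markov_entropy(seq: str) -> float:
    """First-order Markov conditional entropy H(X_i | X_{i-1}), scaled to [0, 1]."""
    pair_counts = Counter(zip(seq, seq[1:]))  # dinucleotide counts
    prev_counts = Counter(seq[:-1])           # how often each base precedes another
    n_pairs = len(seq) - 1
    h = 0.0
    for (a, _b), c in pair_counts.items():
        p_pair = c / n_pairs          # joint probability P(a, b)
        p_next = c / prev_counts[a]   # conditional probability P(b | a)
        h -= p_pair * log2(p_next)
    return h / MAX_BITS


periodic = "ACGT" * 250  # a perfectly repetitive strand of 1000 bases
print(bernoulli_entropy(periodic), markov_entropy(periodic))
```

This prints `1.0 0.0`: all four bases are equally frequent, so the Bernoulli score calls the strand maximally random, while the Markov score is zero because each base fully determines the next. Both functions assume the sequence has at least two bases.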
I analyzed the genomes of the 8 bacteria as well as 6 yeast genes. Based on their scores, I was able to rank the organisms and genes in order of evolutionary development. The technique I developed and the program I wrote can be used to identify evolutionary relationships as well as the relative ages of organisms.