The Probability of Information -- Part 2
In my first blog entry about the probability of information, I used the English language to show that the probability of generating a significant amount of information randomly is essentially 0. It’s important to note that the example is not just an analogy; it illustrates a fact that applies to all kinds of information – including DNA. The calculation may vary depending on the type of information, but the end result will always be the same. I only chose English because it is a type of information we can all understand. In this blog entry, however, I want to talk about the properties of DNA that make it so improbable that a randomly generated sequence of DNA would represent a life form.
Coding segments of DNA consist of a sequence of codons (nucleotide triplets). There are 64 possible codon types (4 nucleotide types to the third power). Of these codon types, one is the start codon and 3 are stop codons. Every coding segment of DNA begins with a start codon and ends when the first stop codon is encountered. There may be more start codons between the initial start codon and the stop codon, but they don’t matter. The length of a coding sequence is determined by the distance between the first start codon and the first stop codon. Given that information, what would be the average length of a randomly generated coding sequence? It would be 1 + 64/3 ≈ 22.333 codons. The 1 is for the start codon, and the 64/3 is the average number of codons up to and including the first stop codon, since each random codon has a 3 in 64 chance of being a stop codon.
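To check the 22.333 figure without the Silverlight application, here is a minimal Python sketch of the same idea. It is not the application's source code; the function names, the choice to label codons 0, 1 and 2 as the stop codons, and the fixed random seed are all assumptions made for illustration.

```python
import random

NUM_CODON_TYPES = 64   # 4 nucleotide types to the third power
NUM_STOP_CODONS = 3

def random_sequence_length(rng):
    """Length in codons of one random coding sequence, counting the
    start codon and the stop codon that ends it."""
    length = 1  # the start codon
    while True:
        length += 1
        # Assumption: codon indices 0, 1 and 2 stand in for the 3 stop codons.
        if rng.randrange(NUM_CODON_TYPES) < NUM_STOP_CODONS:
            return length

def simulate(num_sequences, seed=0):
    """Return (average length, longest length) over num_sequences random sequences."""
    rng = random.Random(seed)
    lengths = [random_sequence_length(rng) for _ in range(num_sequences)]
    return sum(lengths) / len(lengths), max(lengths)

average, longest = simulate(100_000)
print(f"average length: {average:.3f} codons (predicted {1 + 64 / 3:.3f})")
print(f"longest sequence: {longest} codons")
```

With enough sequences the average should settle close to 1 + 64/3, while the longest sequence stays far short of 1000 codons.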
Above is a Silverlight application that I wrote to generate codon sequences. It uses the 64 characters below to represent the 64 codon types. A is used for the start codon and . ? ! are used for the stop codons.
Click the Generate Codon Sequence button. It will generate a random codon sequence and display it. Click it several times and note that it keeps track of the length of the longest codon sequence generated and computes the average length of all the codon sequences generated. Try clicking this button for a few minutes and see if a codon sequence longer than 300 pops up. If you get tired, click the button to generate 1 million codon sequences. The application will display the longest codon sequence generated and its length.
Human DNA contains thousands of coding sequences. The average length of a human coding sequence is about 3000 bases, or approximately 1000 codons. The longest coding sequence in human DNA is over 2.4 million bases, or 800,000 codons. Click the Generate 1 Million Sequences button until you get one over 500. Well, on second thought, that would be a waste of time. Click the Generate Codon Sequences Perpetually button and let the program run. Come back tomorrow and see if you got one.
What are the chances that a randomly generated coding sequence would be longer than 1000 codons? Each random codon has a 61 in 64 chance of not being a stop codon, so the chances are about 1 in (64/61)^1000, that is, about 1 in 7.1e+20. So what is the probability that a codon sequence could be generated randomly that would be of use to a life form? It’s really impossible to calculate. We have shown that the probability of randomly generating an average-length coding sequence is low, and we haven't even begun to talk about what properties would need to be satisfied for the sequence to be useful to a viable life form. If we use the example in my first blog entry on the Probability of Information as a basis for comparison, it’s like we haven't even checked to see if property 1 is satisfied yet.
Someone is going to argue that natural selection can help us to generate a viable coding sequence. My question is: how? Natural selection works on traits that give life forms an advantage in their environment. At a minimum, a trait would be a by-product of several genes working together to give a life form some advantage. In other words, natural selection can’t begin to work until after viable coding sequences already exist.
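The 1 in (64/61)^1000 figure can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function name prob_no_stop is made up for this example.

```python
NUM_CODON_TYPES = 64
NUM_STOP_CODONS = 3

def prob_no_stop(num_codons):
    """Probability that num_codons consecutive random codons
    contain no stop codon."""
    p_not_stop = (NUM_CODON_TYPES - NUM_STOP_CODONS) / NUM_CODON_TYPES  # 61/64
    return p_not_stop ** num_codons

p = prob_no_stop(1000)
print(f"probability: {p:.2e}")
print(f"about 1 in {1 / p:.2e}")
```

Calling prob_no_stop with other lengths, such as 300 or 500, shows how quickly the odds fall as the target length grows.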
ABOUT THE PROGRAM
One of the annoying things about Silverlight is that it doesn’t support DoEvents like Windows applications do. The source code of this Silverlight program demonstrates how to simulate the DoEvents method in Silverlight using a Thread and Dispatcher. This allows the program to update the UI and process the STOP button while running in a tight loop to generate codon sequences perpetually.
CodonSequenceGenerator Source Code
Human Genome Project