The Human Genome Project international consortium announced the publication of a draft sequence and initial analysis of the human genome, the genetic blueprint for a human being. The paper appears in the Feb. 15 issue of the journal Nature.
Rockville, MD-based Celera Genomics (Nasdaq: CRA) announced that it had published its human genome sequence in the journal Science. The company used a combination of its own data and the consortium's data, available freely online, to assemble its sequence.
In a related development, Celera announced on February 12 that the company had also sequenced the mouse genome.
The draft sequence reported by the international consortium covers more than 90 percent of the human genome. The consortium's initial analysis of this text is scientists' first global view of the human genomic landscape, with its extraordinary trove of information about human development, physiology, medicine and evolution.
Among the highlights:
• The distribution of genes on mammalian chromosomes is striking. It turns out that our chromosomes have crowded urban centers with many genes in close proximity to one another, and also vast expanses of unpopulated desert with only non-coding "junk" DNA.
This distribution of genes is in marked contrast to the genomes of many other organisms, such as the mustard weed, the worm and the fly. Their genomes more closely resemble uniform, sprawling suburbs, with genes relatively evenly spaced throughout.
• Though a definitive count of human genes must await further experimental and computational analysis, scientists now estimate that humans have some 30,000-35,000 genes in their genomes. This new estimate indicates that humans have only about twice as many genes as the worm or the fly.
How can human complexity be explained by a genome with such a paucity of genes? It turns out that humans are very thrifty with their genes, able to do more with what they have than other species. Instead of producing only one protein per gene, the average human gene produces three different proteins.
There may be wide disagreement within the scientific community about the number of human genes. As reported in the New York Times, William Haseltine of Human Genome Sciences (Rockville, MD) believes there are 120,000 genes, while Incyte (Palo Alto, CA) claims there are at least 140,000 genes.
• The full set of proteins encoded by the human genome is more complex than that of invertebrates because humans and other vertebrates have rearranged old protein domains into a rich collection of new architectures.
• Scientists have identified more than 200 genes in the human genome whose closest relatives are in bacteria. Analogous genes are not found in non-vertebrate organisms such as the worm, fly and yeast.
This suggests that these genes were acquired more recently in evolutionary history, perhaps after the emergence of vertebrates. Scientists didn't find any single bacterial source for the transferred genes, indicating that the genes were acquired in several independent transfers from different bacteria.
• Our junk DNA, characterized by long stretches of repeating sequences, represents a rich fossil record of clues to our evolutionary past. It is possible to date groups of so-called "repeats" to the point in the evolutionary process when they were "born" and to follow their fates in different regions of the genome and in different species.
Scientists used 3 million such repeating elements as dating tools. Based on such "DNA dating," scientists can build family trees of the repeats, showing exactly where they came from and when. These repeats have reshaped the genome by rearranging it, creating entirely new genes, and modifying and reshuffling existing genes.
• We have a greater percentage of junk DNA in our genomes -- 50% -- than the mustard weed (11%), the worm (7%), or the fly (3%). Also, shockingly, there seems to have been a dramatic decrease in the activity of repeats in the human genome over the past 50 million years -- as if the human species decided 50 million years ago to stop collecting junk. In contrast, there seems to be no such decline in repeats in rodents.
• By dating the 3 million repeat elements and examining the pattern of interspersed repeats on the Y chromosome, scientists estimated the relative mutation rates in the X and the Y chromosome and in the male and female germ lines.
They found that the ratio of mutations in males versus females is 2:1. Scientists point to several possible reasons for the higher mutation rate in the male germ line, including the fact that there are a greater number of cell divisions involved in the formation of sperm than in the formation of eggs.
• In a companion volume to the Book of Life, scientists have created a catalogue of 1.4 million single-letter differences, or single nucleotide polymorphisms (SNPs), and specified their exact location in the human genome. This SNP map, the world's largest publicly available catalogue of SNPs, promises to revolutionize both mapping diseases and tracing human history.
The sequence information from the consortium has been immediately and freely released to the world, with no restrictions on its use or redistribution. The information is scanned daily by scientists in academia and industry, as well as by commercial database companies, providing key information services to biotechnologists. Already, many tens of thousands of genes have been identified from the genome sequence, including more than 30 that play a direct role in human disease.
The scientific work reported here will serve as a basis for research and discovery in the coming decades. Such research will have profound long-term consequences for medicine. It will help elucidate the underlying molecular mechanisms of disease. This in turn will allow researchers to design better drugs and therapies for many illnesses.
The consortium's ultimate goal is to produce a completely "finished" sequence, with no gaps and 99.99% accuracy. Although the near-finished version is adequate for most biomedical research, the HGP has made a commitment to filling all gaps and resolving all ambiguities in the sequence by 2003.
Production of genome sequence has skyrocketed over the past year, with more than 90% of the sequence having been produced in the past 15 months alone. Sequencing the human genome was expedited by technological advances in deciphering DNA and the collaborative nature of the effort, which has drawn upon the talents of about 1,000 scientists worldwide.
The international Human Genome Sequencing Consortium includes scientists at 20 institutions in France, Germany, Japan, China, Great Britain and the United States.
The project is funded by grants from government agencies and public charities in the various countries. These include the National Human Genome Research Institute at the US National Institutes of Health, the Wellcome Trust in England, and the US Department of Energy, as well as agencies in Japan, France, Germany and China.
The total cost for Phase One ("working draft") is approximately $300 million worldwide, with roughly half ($150 million) being funded by the US National Institutes of Health.
The Human Genome Project is sometimes reported to have a cost of $3 billion. However, this figure refers to the total projected funding over a 15-year period (1990-2005) for a wide range of scientific activities related to genomics. Human genome sequencing represents only a small fraction of the overall 15-year budget.