A duo of scientists at Penn State
University has achieved a major milestone in understanding how genomic
"dark matter" originates. This "dark matter" -- called non-coding RNA --
does not contain the blueprint for making proteins, yet non-coding
sequences make up more than 95 percent of the human genome. The
researchers have
discovered that essentially all coding and non-coding RNA originates at
the same types of locations along the human genome. The team's findings
eventually may help to pinpoint exactly where complex-disease traits
lie, since the genetic origins of many diseases reside outside of the
coding region of the genome.
The research, which will be published as an Advance Online Publication in the journal Nature on 18 September 2013,
was performed by B. Franklin Pugh, holder of the Willaman Chair in
Molecular Biology at Penn State, and postdoctoral scholar Bryan Venters,
who now holds a faculty position at Vanderbilt University.
In their research, Pugh and Venters
set out to identify the precise location of the beginnings of
transcription -- the first step in the expression of genes into
proteins. "During transcription, DNA is copied into RNA -- the
single-stranded genetic material that is thought to have preceded the
appearance of DNA on Earth -- by an enzyme called RNA polymerase and,
after several more steps, genes are encoded and proteins eventually are
produced," Pugh explained. He added that other scientists trying to
learn just where transcription begins had looked directly at RNA. Pugh
and Venters instead determined where along human chromosomes the
proteins that initiate transcription of non-coding RNA were located.
"We took this approach because so
many RNAs are rapidly destroyed soon after they are made, and this makes
them hard to detect," Pugh said. "So rather than look for the RNA
product of transcription we looked for the 'initiation machine' that
makes the RNA. This machine assembles RNA polymerase, which goes on to
make RNA, which goes on to make a protein." Pugh added that he and
Venters were stunned to find 160,000 of these "initiation machines,"
because humans have only about 30,000 genes. "This finding is even more
remarkable, given that fewer than 10,000 of these machines actually were
found right at the site of genes. Since most genes are turned off in
cells, it is understandable why they are typically devoid of the
initiation machinery."
The remaining 150,000 or so initiation
machines -- those Pugh and Venters did not find right at genes --
were somewhat mysterious. "These initiation machines that were not
associated with genes were clearly active since they were making RNA and
aligned with fragments of RNA discovered by other scientists," Pugh
said. "In the early days, these fragments of RNA were generally
dismissed as irrelevant since they did not code for proteins." Pugh
added that it was easy to dismiss these fragments because they were not
polyadenylated: they lacked a long tail of adenosine bases that protects
RNA from being destroyed. Pugh and
Venters further validated their surprising findings by determining that
these non-coding initiation machines recognized the same DNA sequences
as the ones at coding genes, indicating that these RNAs have a specific
origin and that their production is regulated, just as it is at coding
genes.
"These non-coding RNAs have been
called the 'dark matter' of the genome because, just like the dark
matter of the universe, they are massive in terms of coverage -- making
up over 95 percent of the human genome. However, they are difficult to
detect and no one knows exactly what they all are doing or why they are
there," Pugh said. "Now at least we know that they are real, and not
just 'noise' or 'junk.' Of course, the next step is to answer the
question, 'what, in fact, do they do?'"
Pugh added that this research could
represent a step toward solving the problem of
"missing heritability" -- a concept that describes how most traits,
including many diseases, cannot be accounted for by individual genes and
seem to have their origins in regions of the genome that do not code
for proteins. "It is difficult to pin down the source of a disease when
the mutation maps to a region of the genome with no known function,"
Pugh said. "However, if such regions produce RNA then we are one step
closer to understanding that disease."