New computational tool
Regulatory patterns in genomic sequences
The human genome contains about 20,000 genes, roughly as many as the tiny roundworm C. elegans. The reason for the difference in complexity therefore lies not in our genes but in the regulatory sequences that control their expression. These regions encode, for instance, the complex networks that specify the developmental programs leading from a single fertilized cell to a human being. To understand how such regions encode regulatory programs, we need to find out when and how regulatory factors bind to them.
A group led by LMU bioinformatician Johannes Söding has now developed a new computational approach to analyze the regulation of gene expression. Most experimental methods used so far to probe the regulation of gene expression yield a list of sequences regulated by one or several factors. Each such factor binds specifically to a group of similar genomic sequences according to its intrinsic binding preferences. These preferences can be described by so-called position weight matrices (PWMs), which record for each position of a binding site how strongly the factor prefers each of the four nucleotides.
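To make the idea of a PWM concrete, here is a minimal sketch in Python. It is a generic textbook construction, not the group's software: the binding sites, the pseudocount, the uniform background frequency and the function names (build_pwm, score, best_hit) are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equally long, aligned binding sites.

    Returns one {base: log2-odds score} dict per motif position.
    The pseudocount and uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so no frequency is zero.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm

def score(pwm, window):
    """Sum the per-position log-odds scores of a window as wide as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Slide the PWM along the sequence; return (best score, start index)."""
    width = len(pwm)
    hits = [(score(pwm, sequence[i:i + width]), i)
            for i in range(len(sequence) - width + 1)]
    return max(hits)

# Made-up example sites resembling a TATA-like element (hypothetical data).
sites = ["TATAAA", "TATAAA", "TATATA", "TATAAG"]
pwm = build_pwm(sites)
best_score, best_pos = best_hit(pwm, "GGCGTATAAAGGC")
print(best_pos, round(best_score, 2))
```

A window that resembles the example sites receives a high positive score, while unrelated windows score below zero, which is how a PWM separates likely binding sites from background sequence.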
Patterns in the noise
However, PWMs are problematic when it comes to computationally discovering the binding preferences of factors: infinitely many PWMs are possible, and calculating the statistical significance of their enrichment in the input sequences is time-consuming. Therefore, Söding developed a new method for finding patterns in sequence data. "We have found a way to directly optimize the statistical significance of PWMs," says Söding. And the approach has paid off: the new method represents a considerable improvement over state-of-the-art tools, both in its ability to discover weak patterns and in the accuracy of the PWMs it generates.
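What "statistical significance of enrichment" means can be illustrated with a simple binomial tail probability. This is only a generic sketch and not the calculation used in the new method: the function name binomial_tail and the numbers (100 input sequences, 30 containing the pattern, a 10% chance of a random match per sequence) are assumptions made up for the example.

```python
import math

def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p).

    Here n is the number of input sequences, k the number that contain
    the pattern, and p the assumed chance that a single sequence
    contains it purely by chance.
    """
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# Hypothetical numbers: 30 of 100 sequences match, 10% expected by chance.
p_value = binomial_tail(30, 100, 0.1)
print(f"{p_value:.2e}")
```

The smaller this probability, the less plausible it is that the pattern occurs so often by chance alone. Evaluating such a quantity for every candidate PWM is what makes a naive search expensive.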
The scientists have also applied their method to human core promoter sequences - the short regions upstream of genes where transcription starts - and found a number of novel patterns that are expected to be important determinants of regulatory properties. "We hope that our method will contribute to a better understanding of the regulation of our genomes by sequence-specific DNA and RNA binding factors", Söding concludes. (Genome Research, September 2012) göd
- Press Release 25.12.2011:
Faster, more accurate, more sensitive - Improved method for protein sequence comparisons
- Press Release 23.02.2009:
Relationships in Rank and File - Better sequence searches of genes and proteins