The findMiRNA and precExtract Software
Source code for both programs, findMiRNA and precExtract, is available for download (updated for Fedora Core 4). The programs were developed and tested under Red Hat Linux 9 with the GNU C++ compiler version 3.X; you may have problems with other compilers. All source code files fall under the terms of the GNU General Public License. By downloading the source code you agree to the terms of that license. You will need the Vienna RNA Package in order to compile this software. You can retrieve the Vienna RNA Package as source code from its original site, or as an RPM from BioLinux.
1/13/2005: Some cosmetic and organizational changes were made to the source code, and another supplemental program, hashedPrecExtract, was incorporated. No changes to functionality were made, so results should be identical.
Notes on use:
At the time findMiRNA was developed, high-throughput sequencing was still under development, and thus genome-wide, large-scale direct evidence of miRNAs was not available. For this reason findMiRNA was the only way to see any potential miRNAs, their potential targets, and the regions of the genome that might contain them. The specificity of findMiRNA is very low: it is poor at distinguishing real miRNAs from merely potential ones, but at the time it was a starting point in the absence of an alternative. In excess of 99.9% of the candidates identified by findMiRNA will be false positives. Applying an enrichment procedure to the results of findMiRNA is essential but nevertheless insufficient for miRNA discovery, and it is for this reason that we opted to display the resulting miRNA precursor candidates on a genome browser, so that the community can browse them in relation to their own secondary sources of information. For example, a non-protein-coding transcript may have been identified as coming from a region for which a miRNA precursor candidate has been predicted, and conservation of a target site across multiple members of a target gene family may be evident. In the absence of other information, findMiRNA is not particularly useful as a tool for predicting miRNAs or their precursors. The results displayed in the genome browser on our website have been filtered based on known criteria for miRNA-target pairs; however, this filtering is not part of findMiRNA and would need to be performed as a separate post-processing step.
In light of the limitations of findMiRNA output, you need to consider how you are going to deal with the large amounts of resulting potential miRNA-target pair data.
Some options after basic filtering are:
Whole Genome Scan
findmirna_results.gz - (2.1 GB uncompressed) - All predictions for the entire Arabidopsis thaliana genome. The raw data is gzipped (uncompress with gzip). There is also a README explaining the findMiRNA output format.
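Because the file expands to roughly 2.1 GB, it can be more convenient to stream it than to uncompress it to disk first. The following is a minimal Python sketch, assuming only that the file is gzipped text; it does not parse any fields, since the record layout is described in the README and is not reproduced here. The function names preview_gzip_lines and count_gzip_lines are hypothetical helpers introduced for illustration.

```python
import gzip
import itertools


def preview_gzip_lines(path, limit=5):
    """Return the first `limit` lines of a gzipped text file.

    The file is read as a stream, so only a small part of it is
    decompressed. Undecodable bytes are replaced rather than raising.
    """
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in itertools.islice(handle, limit)]


def count_gzip_lines(path):
    """Count the lines in a gzipped text file without holding it in memory."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


if __name__ == "__main__":
    # Assumes findmirna_results.gz has been downloaded to the working directory.
    for line in preview_gzip_lines("findmirna_results.gz", limit=10):
        print(line)
```

Previewing the first few lines alongside the README is a cheap way to confirm the layout before committing to a full pass over the data.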
Exploratory Scan - Ranked Clusters
1599.ranked_clusters.gz - The 1599 ranked clusters generated from the exploratory scan of 5701 transcripts. Potential precursor sequences that were predicted to target the same region of a transcript and had candidate miRNA sequences on the same arm of the hairpin were grouped into sequence families. These sequence families were ranked using a formula that takes advantage of the characteristic pattern of sequence conservation observed for miRNA precursor families, in which the miRNA/miRNA* regions are more conserved than the interstice (the region between the miRNA and miRNA* sequences). Known miRNA precursor families were shown to be enriched towards the top of these ranked clusters.
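The ranking formula itself is not given on this page. To illustrate the kind of signal it exploits, the Python sketch below computes a simple conservation contrast for one sequence family: mean per-column conservation in the miRNA and miRNA* regions minus mean conservation in the interstice. This is an illustrative stand-in, not the formula used to produce the ranked clusters. It assumes the family members are already aligned to equal length and that the miRNA and miRNA* positions are supplied as half-open column ranges; the function names are hypothetical.

```python
from collections import Counter


def column_conservation(aligned):
    """Per-column fraction of sequences sharing the most common character.

    `aligned` is a list of equal-length strings. Gap characters, if present,
    are counted like any other character (a simplifying assumption).
    """
    n = len(aligned)
    return [Counter(col).most_common(1)[0][1] / n for col in zip(*aligned)]


def conservation_contrast(aligned, mirna, mirna_star):
    """Mean conservation of miRNA/miRNA* columns minus that of the interstice.

    `mirna` and `mirna_star` are (start, end) half-open column ranges.
    The interstice is taken to be the columns between the two ranges.
    Higher values mean the duplex regions are more conserved than the
    region separating them.
    """
    cons = column_conservation(aligned)
    first, second = sorted([mirna, mirna_star])
    duplex = cons[first[0]:first[1]] + cons[second[0]:second[1]]
    interstice = cons[first[1]:second[0]]
    if not duplex or not interstice:
        raise ValueError("miRNA, miRNA* and interstice regions must be non-empty")
    return sum(duplex) / len(duplex) - sum(interstice) / len(interstice)


if __name__ == "__main__":
    # Toy family: identical arms, variable middle (made-up sequences).
    family = ["AAAACGUAAAA", "AAAAGCAAAAA", "AAAAUUGAAAA"]
    print(conservation_contrast(family, mirna=(0, 4), mirna_star=(7, 11)))
```

Sorting families by a score of this kind would place those with well-conserved arms and a variable interstice near the top, which is the pattern described above for known miRNA precursor families.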