FIMO scans input sequences to identify the positions of matches to each input motif. FIMO has no sequence length or motif number restrictions.
runFimo(
sequences,
motifs,
bfile = "motif",
outdir = "auto",
parse_genomic_coord = TRUE,
skip_matched_sequence = FALSE,
max_strand = TRUE,
text = TRUE,
meme_path = NULL,
silent = TRUE,
...
)
path to fasta file, or stringset input.
path to .meme format file, or universalmotif/universalmotif list input.
path to background file, or special values: "motif" to use 0-order frequencies contained in the motif, or "uniform" to use a uniform letter distribution. (default: "motif")
output directory location. Only used if text = FALSE. Default: "auto" to autogenerate directory name. Note: if not using a fasta path as input, this will be a temporary location unless explicity set.
logical(1)
whether to parse genomic position
from fasta headers. Fasta headers must be UCSC format positions (ie
"chr:start-end"), but base 1 indexed (GRanges format). If names of fasta
entries are genomic coordinates and parse_genomic_coord == TRUE, results
will contain genomic coordinates of motif matches, otherwise FIMO will return
relative coordinates (i.e. positions from 1 to length of the fasta entry).
logical(1)
whether or not to include the DNA
sequence of the match. Default: FALSE
. Note: jobs will complete faster if
set to TRUE
. add_sequence()
can be used to lookup the sequence after data import if
parse_genomic_coord
is TRUE
, so setting this flag is not strictly needed.
if match is found on both strands, only report strand with best match (default: TRUE).
logical(1)
(default: TRUE
). No output files will be created
on the filesystem. The results are unsorted and no q-values are computed.
This setting allows fast searches on very large inputs. When set to FALSE
FIMO will discard 50% of the lower significance matches if >100,000 matches are
detected. text = FALSE
will also incur a performance penalty because it
must first read a file to disk, then read it into memory. For these reasons,
I suggest keeping text = TRUE
.
path to meme/bin/
(optional). Defaut: NULL
, searches
"MEME_PATH" environment variable or "meme_path" option for path to "meme/bin/".
logical(1)
whether to suppress stdout/stderr printing to
console (default: TRUE). If the command is failing or giving unexpected
output, setting silent = FALSE
can aid troubleshooting.
additional commandline arguments to fimo. See the FIMO Flag table below.
GRanges object containing positions of each match. Note: if
parse_genomic_coords = FALSE
, each seqnames
entry will be the full fasta
header, and start/end will be the relative position within that sequence of the
match. The GRanges object has the following additional mcols
:
* motif_id = primary name of matched motif
* motif_alt_id = alternate name of matched motif
* score = score of match (higher score is a better match)
* pvalue = pvalue of the match
* qvalue = qvalue of the match
* matched_sequence = sequence that matches the motif
Additional arguments passed to ...
. See: Fimo web manual
for a complete description of FIMO flags.
FIMO Flag | allowed values | default | description |
alpha | numeric(1) | 1 | alpha for calculating position-specific priors. Represents fraction of sites that are binding sites of TF of interest. Used in conjunction with psp |
bfile | "motif", "motif-file", "uniform", file path, | "motif" | If "motif" or "motif-file", use 0-order letter frequencies from motif. "uniform" sets uniform letter frequencies. |
max_stored_scores | integer(1) | NULL | maximum number of scores to be stored for computing q-values. used when text = FALSE , see FIMO webpage for details |
motif_pseudo | numeric(1) | 0.1 | pseudocount added to motif matrix |
no_qvalue | logical(1) | FALSE | only needed when text = FALSE , do not compute q-value for each p-value |
norc | logical(1) | FALSE | Do not score reverse complement strand |
prior_dist | file path | NULL | file containing binned distribution of priors |
psp | file path | NULL | file containing position specific priors. Requires prior_dist |
qv_thresh | logical(1) | FALSE | use q-values for the output threshold |
thresh | numeric(1) | 1e-4 | output threshold for returning a match, only matches with values less than thresh are returned. |
The MEME Suite is free for non-profit use, but for-profit users should purchase a license. See the MEME Suite Copyright Page for details.
If you use runFimo()
in your analysis, please cite:
Charles E. Grant, Timothy L. Bailey, and William Stafford Noble, "FIMO: Scanning for occurrences of a given motif", Bioinformatics, 27(7):1017-1018, 2011. full text
if (meme_is_installed()){
# Generate some example input sequences
seq <- universalmotif::create_sequences()
# sequences must have values in their fasta headers
names(seq) <- seq_along(seq)
# Create random example motif to search for
motif <- universalmotif::create_motif()
# Search for motif in sequences
# parse_genomic_coord set to FALSE since fasta headers aren't in "chr:start-end" format.
runFimo(seq, motif, parse_genomic_coord = FALSE)
}
#> GRanges object with 1 range and 6 metadata columns:
#> seqnames ranges strand | motif_id motif_alt_id score pvalue
#> <Rle> <IRanges> <Rle> | <character> <character> <numeric> <numeric>
#> [1] 2 62-71 + | motif <NA> 12.6024 2.32e-05
#> qvalue matched_sequence
#> <numeric> <character>
#> [1] NA CGCTGACGGT
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths