FIMO scans input sequences to identify the positions of matches to each input motif. FIMO has no sequence length or motif number restrictions.

runFimo(
  sequences,
  motifs,
  bfile = "motif",
  outdir = "auto",
  parse_genomic_coord = TRUE,
  skip_matched_sequence = FALSE,
  max_strand = TRUE,
  text = TRUE,
  meme_path = NULL,
  silent = TRUE,
  ...
)

Arguments

sequences

path to fasta file, or stringset input.

motifs

path to .meme format file, or universalmotif/universalmotif list input.

bfile

path to background file, or special values: "motif" to use 0-order frequencies contained in the motif, or "uniform" to use a uniform letter distribution. (default: "motif")

outdir

output directory location. Only used if text = FALSE. Default: "auto" to autogenerate directory name. Note: if not using a fasta path as input, this will be a temporary location unless explicity set.

parse_genomic_coord

logical(1) whether to parse genomic position from fasta headers. Fasta headers must be UCSC format positions (ie "chr:start-end"), but base 1 indexed (GRanges format). If names of fasta entries are genomic coordinates and parse_genomic_coord == TRUE, results will contain genomic coordinates of motif matches, otherwise FIMO will return relative coordinates (i.e. positions from 1 to length of the fasta entry).

skip_matched_sequence

logical(1) whether or not to include the DNA sequence of the match. Default: FALSE. Note: jobs will complete faster if set to TRUE. add_sequence() can be used to lookup the sequence after data import if parse_genomic_coord is TRUE, so setting this flag is not strictly needed.

max_strand

if match is found on both strands, only report strand with best match (default: TRUE).

text

logical(1) (default: TRUE). No output files will be created on the filesystem. The results are unsorted and no q-values are computed. This setting allows fast searches on very large inputs. When set to FALSE FIMO will discard 50% of the lower significance matches if >100,000 matches are detected. text = FALSE will also incur a performance penalty because it must first read a file to disk, then read it into memory. For these reasons, I suggest keeping text = TRUE.

meme_path

path to meme/bin/ (optional). Defaut: NULL, searches "MEME_PATH" environment variable or "meme_path" option for path to "meme/bin/".

silent

logical(1) whether to suppress stdout/stderr printing to console (default: TRUE). If the command is failing or giving unexpected output, setting silent = FALSE can aid troubleshooting.

...

additional commandline arguments to fimo. See the FIMO Flag table below.

Value

GRanges object containing positions of each match. Note: if parse_genomic_coords = FALSE, each seqnames entry will be the full fasta header, and start/end will be the relative position within that sequence of the match. The GRanges object has the following additional mcols: * motif_id = primary name of matched motif * motif_alt_id = alternate name of matched motif * score = score of match (higher score is a better match) * pvalue = pvalue of the match * qvalue = qvalue of the match * matched_sequence = sequence that matches the motif

Details

Additional arguments passed to .... See: Fimo web manual for a complete description of FIMO flags.

FIMO Flagallowed valuesdefaultdescription
alphanumeric(1)1alpha for calculating position-specific priors. Represents fraction of sites that are binding sites of TF of interest. Used in conjunction with psp
bfile"motif", "motif-file", "uniform", file path,"motif"If "motif" or "motif-file", use 0-order letter frequencies from motif. "uniform" sets uniform letter frequencies.
max_stored_scoresinteger(1)NULLmaximum number of scores to be stored for computing q-values. used when text = FALSE, see FIMO webpage for details
motif_pseudonumeric(1)0.1pseudocount added to motif matrix
no_qvaluelogical(1)FALSEonly needed when text = FALSE, do not compute q-value for each p-value
norclogical(1)FALSEDo not score reverse complement strand
prior_distfile pathNULLfile containing binned distribution of priors
pspfile pathNULLfile containing position specific priors. Requires prior_dist
qv_threshlogical(1)FALSEuse q-values for the output threshold
threshnumeric(1)1e-4output threshold for returning a match, only matches with values less than thresh are returned.

Licensing

The MEME Suite is free for non-profit use, but for-profit users should purchase a license. See the MEME Suite Copyright Page for details.

Citation

If you use runFimo() in your analysis, please cite:

Charles E. Grant, Timothy L. Bailey, and William Stafford Noble, "FIMO: Scanning for occurrences of a given motif", Bioinformatics, 27(7):1017-1018, 2011. full text

Examples

if (meme_is_installed()){
# Generate some example input sequences
seq <- universalmotif::create_sequences()
# sequences must have values in their fasta headers
names(seq) <- seq_along(seq)
# Create random example motif to search for
motif <- universalmotif::create_motif()

# Search for motif in sequences
# parse_genomic_coord set to FALSE since fasta headers aren't in "chr:start-end" format.
runFimo(seq, motif, parse_genomic_coord = FALSE)
}
#> GRanges object with 1 range and 6 metadata columns:
#>       seqnames    ranges strand |    motif_id motif_alt_id     score    pvalue
#>          <Rle> <IRanges>  <Rle> | <character>  <character> <numeric> <numeric>
#>   [1]        2     62-71      + |       motif         <NA>   12.6024  2.32e-05
#>          qvalue matched_sequence
#>       <numeric>      <character>
#>   [1]        NA       CGCTGACGGT
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths