Find instances of motifs using FIMO

FIMO scans input sequences to identify the positions of matches to each input motif. FIMO has no sequence length or motif number restrictions.

runFimo(
  sequences,
  motifs,
  bfile = "motif",
  outdir = "auto",
  parse_genomic_coord = TRUE,
  skip_matched_sequence = FALSE,
  max_strand = TRUE,
  text = TRUE,
  meme_path = NULL,
  silent = TRUE,
  ...
)

Arguments

sequences: path to fasta file, or stringset input.
motifs: path to .meme format file, or universalmotif/universalmotif list input.
bfile: path to background file, or special values: "motif" to use 0-order frequencies contained in the motif, or "uniform" to use a uniform letter distribution. (default: "motif")
outdir: output directory location. Only used if text = FALSE. Default: "auto" to autogenerate directory name. Note: if not using a fasta path as input, this will be a temporary location unless explicity set.
parse_genomic_coord: logical(1) whether to parse genomic position from fasta headers. Fasta headers must be UCSC format positions (ie "chr:start-end"), but base 1 indexed (GRanges format). If names of fasta entries are genomic coordinates and parse_genomic_coord == TRUE, results will contain genomic coordinates of motif matches, otherwise FIMO will return relative coordinates (i.e. positions from 1 to length of the fasta entry).
skip_matched_sequence: logical(1) whether or not to include the DNA sequence of the match. Default: FALSE. Note: jobs will complete faster if set to TRUE. add_sequence() can be used to lookup the sequence after data import if parse_genomic_coord is TRUE, so setting this flag is not strictly needed.
max_strand: if match is found on both strands, only report strand with best match (default: TRUE).
text: logical(1) (default: TRUE). No output files will be created on the filesystem. The results are unsorted and no q-values are computed. This setting allows fast searches on very large inputs. When set to FALSE FIMO will discard 50% of the lower significance matches if >100,000 matches are detected. text = FALSE will also incur a performance penalty because it must first read a file to disk, then read it into memory. For these reasons, I suggest keeping text = TRUE.
meme_path: path to meme/bin/ (optional). Defaut: NULL, searches "MEME_PATH" environment variable or "meme_path" option for path to "meme/bin/".
silent: logical(1) whether to suppress stdout/stderr printing to console (default: TRUE). If the command is failing or giving unexpected output, setting silent = FALSE can aid troubleshooting.
...: additional commandline arguments to fimo. See the FIMO Flag table below.

Value

GRanges object containing positions of each match. Note: if parse_genomic_coords = FALSE, each seqnames entry will be the full fasta header, and start/end will be the relative position within that sequence of the match. The GRanges object has the following additional mcols: * motif_id = primary name of matched motif * motif_alt_id = alternate name of matched motif * score = score of match (higher score is a better match) * pvalue = pvalue of the match * qvalue = qvalue of the match * matched_sequence = sequence that matches the motif

Details

Additional arguments passed to .... See: Fimo web manual for a complete description of FIMO flags.

FIMO Flag	allowed values	default	description
alpha	`numeric(1)`	1	alpha for calculating position-specific priors. Represents fraction of sites that are binding sites of TF of interest. Used in conjunction with `psp`
bfile	"motif", "motif-file", "uniform", file path,	"motif"	If "motif" or "motif-file", use 0-order letter frequencies from motif. "uniform" sets uniform letter frequencies.
max_stored_scores	`integer(1)`	NULL	maximum number of scores to be stored for computing q-values. used when `text = FALSE`, see FIMO webpage for details
motif_pseudo	`numeric(1)`	0.1	pseudocount added to motif matrix
no_qvalue	`logical(1)`	FALSE	only needed when `text = FALSE`, do not compute q-value for each p-value
norc	`logical(1)`	FALSE	Do not score reverse complement strand
prior_dist	file path	NULL	file containing binned distribution of priors
psp	file path	NULL	file containing position specific priors. Requires `prior_dist`
qv_thresh	`logical(1)`	FALSE	use q-values for the output threshold
thresh	`numeric(1)`	`1e-4`	output threshold for returning a match, only matches with values less than `thresh` are returned.

Licensing

The MEME Suite is free for non-profit use, but for-profit users should purchase a license. See the MEME Suite Copyright Page for details.

Citation

If you use runFimo() in your analysis, please cite:

Charles E. Grant, Timothy L. Bailey, and William Stafford Noble, "FIMO: Scanning for occurrences of a given motif", Bioinformatics, 27(7):1017-1018, 2011. full text

Examples

if (meme_is_installed()){
# Generate some example input sequences
seq <- universalmotif::create_sequences()
# sequences must have values in their fasta headers
names(seq) <- seq_along(seq)
# Create random example motif to search for
motif <- universalmotif::create_motif()

# Search for motif in sequences
# parse_genomic_coord set to FALSE since fasta headers aren't in "chr:start-end" format.
runFimo(seq, motif, parse_genomic_coord = FALSE)
}
#> GRanges object with 1 range and 6 metadata columns:
#>       seqnames    ranges strand |    motif_id motif_alt_id     score    pvalue
#>          <Rle> <IRanges>  <Rle> | <character>  <character> <numeric> <numeric>
#>   [1]        2     62-71      + |       motif         <NA>   12.6024  2.32e-05
#>          qvalue matched_sequence
#>       <numeric>      <character>
#>   [1]        NA       CGCTGACGGT
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths