STREME discovers short, ungapped, *de-novo* motifs that are enriched in the input sequences relative to a control set of sequences. STREME can be run to discover motifs relative to a shuffled copy of the input sequences, relative to a separately provided set of "control" sequences, or to determine whether motifs are centrally enriched within the input sequences.

runStreme(
  input,
  control,
  outdir = "auto",
  objfun = "de",
  alph = "dna",
  meme_path = NULL,
  silent = TRUE,
  ...
)

Arguments

input

regions to scan for motifs. If using `objfun = "cd"` to test for centrally enriched motifs, be sure to include sufficient flanking sequence (e.g. +/- 500bp) for an accurate estimate. Can be any of:

- path to fasta file
- DNAStringSet object (can be generated from GRanges using `get_sequence()`)
- List of DNAStringSet objects (generated from `get_sequence()`)
- *NOTE:* if using StringSet inputs, each entry must be named (set with `names()`).
- *NOTE:* If you want to retain the raw streme output files, you must use a path to fasta file as input, or specify an "outdir".
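The naming requirement for StringSet inputs can be checked before calling `runStreme()`. The following is a minimal sketch: the sequences and the helper `has_valid_names()` are hypothetical and are not part of memes.

```r
# Hypothetical example sequences; real input would usually come from a
# fasta file or from get_sequence() applied to a GRanges object.
seqs <- c(
  peak_1 = "ACGTTGCATGACGTCAGGCT",
  peak_2 = "TTGACGTCATCCGATAGGCA",
  peak_3 = "GGCATGACGTCATTAGCCTA"
)

# Hypothetical helper: TRUE when every entry has a non-empty name.
has_valid_names <- function(x) {
  nms <- names(x)
  !is.null(nms) && !anyNA(nms) && all(nzchar(nms))
}

# Building the DNAStringSet from a named character vector carries the
# names over; the conversion is skipped if Biostrings is not installed.
if (requireNamespace("Biostrings", quietly = TRUE)) {
  input <- Biostrings::DNAStringSet(seqs)
  stopifnot(has_valid_names(input))
}
```

If the names are missing, they can be assigned with `names(input) <- ...` before the object is passed to `runStreme()`.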

control

regions to use as background for motif search. These should have a similar length distribution as the input sequences. Can be any of:

- path to fasta file
- DNAStringSet object (can be generated from GRanges using `get_sequence()`)
- A Biostrings::BStringSetList (generated using `get_sequence()`), in which case all sequences in the list will be combined as the control set.
- if `input` is a list of DNAStringSet objects, a character vector of names in `input` will use those sequences as background. `runStreme()` will not scan those regions as input.
- "shuffle" to use streme's built-in dinucleotide shuffle feature (NOTE: if `input` is a list object with an entry named "shuffle", the list entry will be used instead). Optionally can also pass `seed = <any number>` to `...` to use as the random seed during shuffling. If no seed is passed, streme will use 0 as the random seed, so results will be reproducible if rerunning.
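Two of the control choices described above can be sketched as thin wrappers. Both wrapper names are hypothetical and exist only for illustration; the `runStreme()` arguments they use (`control`, and `seed` passed through `...`) are the ones described in this section.

```r
# Hypothetical wrapper: discover motifs against a dinucleotide-shuffled
# control, passing an explicit seed through `...`.
streme_vs_shuffle <- function(input, seed = 0) {
  memes::runStreme(input, control = "shuffle", seed = seed)
}

# Hypothetical wrapper: `seq_list` is a named list of DNAStringSet objects.
# Entries named in `control_names` serve as background and are not scanned
# as input.
streme_vs_named_control <- function(seq_list, control_names) {
  # Fail early if a requested control is not an entry of the list
  stopifnot(all(control_names %in% names(seq_list)))
  memes::runStreme(seq_list, control = control_names)
}
```

For example, with a list holding entries named `bound` and `unbound` (hypothetical names), `streme_vs_named_control(seq_list, "unbound")` would scan `bound` using `unbound` as background.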

outdir

path to the output directory for STREME files, or "auto" to autogenerate the path. With "auto", output is written next to the input fasta file in a directory named "\<input\>_vs_\<control\>"; if the input is a DNAStringSet, a temporary path is used. This means that if you want to save the raw output files, you must use fasta files as input or set an informative (and unique) outdir name. memes will **not check** whether it overwrites files in a directory. Directories will be recursively created if needed. (default: "auto")
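Because existing files in the output directory are not protected, one option is to guard the call yourself. This is a minimal sketch under stated assumptions: the path and the wrapper `streme_to_dir()` are hypothetical, and the emptiness check is an extra precaution added here, not something memes does.

```r
# Hypothetical output location; any unique, informative path works.
outdir <- file.path(tempdir(), "streme", "bound_vs_shuffle")

# Hypothetical wrapper: refuse to write into a directory that already
# contains files, then hand the path to runStreme().
streme_to_dir <- function(input, outdir, control = "shuffle") {
  if (dir.exists(outdir) && length(list.files(outdir)) > 0) {
    stop("outdir already contains files: ", outdir)
  }
  memes::runStreme(input, control = control, outdir = outdir)
}
```

For results that should persist, replace `tempdir()` with a project directory, since temporary paths are removed when the R session ends.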

objfun

one of c("de", "cd"). Default: "de" for differential enrichment. "cd" for central distance (control must be set to NA for "cd").
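The pairing of `objfun = "cd"` with `control = NA` can be made explicit in a small wrapper. This is only a sketch: `run_streme_mode()` is a hypothetical helper, and its early check is an added safeguard rather than documented memes behavior.

```r
# Hypothetical wrapper that validates the objfun/control combination
# before calling runStreme().
run_streme_mode <- function(input, control, objfun = c("de", "cd")) {
  objfun <- match.arg(objfun)
  # Central-distance mode takes no control sequences
  if (objfun == "cd" && !identical(control, NA)) {
    stop('control must be NA when objfun = "cd"')
  }
  memes::runStreme(input, control = control, objfun = objfun)
}
```

A central-enrichment run would then look like `run_streme_mode(input, control = NA, objfun = "cd")`, with `input` holding sequences that include enough flank around the expected motif position.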

alph

one of c("dna", "rna", "protein") or a path to a MEME format alph file. (default: "dna")

meme_path

path to "meme/bin"

silent

Whether to suppress printing stdout & stderr to console (default: TRUE). Warnings are always printed regardless of this setting.

...

pass any commandline options as R function arguments. For a complete list of STREME options, see [the STREME manual](https://meme-suite.org/meme/doc/streme.html).
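As an illustration of passing command-line options through `...`, the sketch below sets three STREME options (`--nmotifs`, `--minw`, `--maxw`) as R arguments of the same name. The wrapper name and the chosen values are hypothetical; check the STREME manual for the options supported by your installed version.

```r
# Hypothetical wrapper: limit the number of motifs and constrain motif
# width by passing STREME options as R arguments.
streme_tuned <- function(input, control = "shuffle") {
  memes::runStreme(
    input,
    control = control,
    nmotifs = 5,  # STREME --nmotifs: stop after this many motifs
    minw = 6,     # STREME --minw: minimum motif width
    maxw = 12     # STREME --maxw: maximum motif width
  )
}
```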

Value

a `universalmotif_df` of STREME Motifs

Details

Properly setting the `control` parameter is key to discovering biologically relevant motifs. Often, using `control = "shuffle"` will produce a suboptimal set of motifs; however, some discriminative analysis designs have no proper "control" regions, leaving shuffling as the only option.

If you have fewer than 50 sequences, consider using [runMeme()] instead.

# Citation

If you use `runStreme()` in your analysis, please cite:

Timothy L. Bailey, "STREME: Accurate and versatile sequence motif discovery", Bioinformatics, 2021. https://doi.org/10.1093/bioinformatics/btab203

# Licensing

The MEME Suite is free for non-profit use, but for-profit users should purchase a license. See the [MEME Suite Copyright Page](http://meme-suite.org/doc/copyright.html) for details.

See also

`?universalmotif::tidy-motifs`