Identify motifs with MEME — runMeme • memes

MEME performs de-novo discovery of ungapped motifs present in the input sequences. It can be used in both discriminative and non-discriminative modes.

runMeme(
  input,
  control = NA,
  outdir = "auto",
  alph = "dna",
  parse_genomic_coord = TRUE,
  combined_sites = FALSE,
  silent = TRUE,
  meme_path = NULL,
  ...
)

# S3 method for list
runMeme(
  input,
  control = NA,
  outdir = "auto",
  alph = "dna",
  parse_genomic_coord = TRUE,
  combined_sites = FALSE,
  silent = TRUE,
  meme_path = NULL,
  ...
)

# S3 method for BStringSetList
runMeme(
  input,
  control = NA,
  outdir = "auto",
  alph = "dna",
  parse_genomic_coord = TRUE,
  combined_sites = FALSE,
  silent = TRUE,
  meme_path = NULL,
  ...
)

# S3 method for default
runMeme(
  input,
  control = NA,
  outdir = "auto",
  alph = "dna",
  parse_genomic_coord = TRUE,
  combined_sites = FALSE,
  silent = TRUE,
  meme_path = NULL,
  ...
)

Arguments

input

path to a fasta file, a Biostrings::BStringSetList, or a list of Biostrings::BStringSet objects (can be generated using get_sequence())

control

any data type accepted by input, or a character vector of names(input) identifying which regions to use as control sequences. Using sequences as background requires an alternative objective function: when control is not NA, users must pass a non-default value of objfun via ... (default: NA)
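As a rough illustration of discriminative mode, the sketch below passes a named list as input and selects one entry as the control set by name. The list names "treated" and "background" and the random toy sequences are made up for this example, and objfun = "de" (MEME's differential enrichment objective) is only one possible non-default choice; check the MEME documentation for the objective that suits your question.

```r
# Sketch only: list names and random sequences are hypothetical.
result <- NULL
if (requireNamespace("memes", quietly = TRUE) && memes::meme_is_installed()) {
  seqs <- list(
    treated = universalmotif::create_sequences(seqnum = 10, seqlen = 100),
    background = universalmotif::create_sequences(seqnum = 10, seqlen = 100)
  )
  # Give every sequence a name, as in the Examples section
  seqs <- lapply(seqs, function(x) {
    names(x) <- as.character(seq_along(x))
    x
  })
  # control names an entry of `seqs`; objfun must be non-default when
  # a control set is supplied ("de" is an assumed choice here)
  result <- memes::runMeme(
    seqs,
    control = "background",
    objfun = "de",
    parse_genomic_coord = FALSE
  )
}
```

If memes or MEME is not available, the block is skipped and result stays NULL.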

outdir

(default: "auto") Directory where output data will be stored.

alph

one of c("dna", "rna", "protein") or path to alphabet file (default: "dna").

parse_genomic_coord

logical(1) whether to parse genomic coordinates from fasta headers. Requires headers in the form "chr:start-end"; any other format will result in an error. Automatically set to FALSE if alph = "protein". This setting only needs to be changed when using a custom-built fasta file without genomic coordinates in the header.

combined_sites

logical(1) whether to return combined sites information (coerces output to list) (default: FALSE)

silent

Whether to suppress printing stdout to terminal (default: TRUE)

meme_path

path to "meme/bin/". If unset, will use default search behavior:

meme_path setting in options()
MEME_PATH setting in .Renviron or .bashrc
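The two search locations listed above could be set as follows. This is a minimal sketch: "/opt/meme/bin" is a hypothetical install location, and in practice MEME_PATH would usually be defined in .Renviron or .bashrc rather than with Sys.setenv().

```r
# Hypothetical install location; replace with the real path to "meme/bin/"
meme_bin <- "/opt/meme/bin"

# Option 1: R option, checked when meme_path is left as NULL
options(meme_path = meme_bin)

# Option 2: environment variable for the current session
# (equivalent to a line MEME_PATH=/opt/meme/bin in .Renviron)
Sys.setenv(MEME_PATH = meme_bin)
```

Passing meme_path directly to runMeme() remains the most explicit choice when several MEME installations exist.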

...

additional arguments passed to MEME (see below)

Value

MEME results in universalmotif_df format (see: universalmotif::to_df()). sites_hits is a nested data.frame column containing the position within each input sequence of matches to the identified motif.

Details

Note that MEME can take a long time to run. Runtime increases drastically with the number of input sequences, the width of the motifs searched for, and the number of motifs MEME is asked to discover. For this reason, MEME usually performs best on a few (<50) short (100-200 bp) sequences, although this is not a requirement. Additional details on how data size affects runtime can be found here.

MEME works best when specifically tuned to the analysis question. The default settings are unlikely to be ideal. It has several complex arguments documented here, which runMeme() accepts as R function arguments (see details below).

If discovering motifs within ChIP-seq, ATAC-seq, or similar peaks, MEME may perform best when using sequences flanking the summit (the site of maximum signal) of each peak rather than the center. ChIP-seq or similar data can also benefit from setting revcomp = TRUE, minw = 5, maxw = 20. For more tips on using MEME to analyze ChIP-seq data, see the following tips page.
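The summit-flanking idea can be sketched in base R. The chromosome names, summit positions, and the 50 bp flank below are invented for illustration; the resulting headers follow the "chr:start-end" form that parse_genomic_coord expects.

```r
# Hypothetical peak summits (made-up coordinates)
summits <- data.frame(
  chr = c("chr2L", "chr3R"),
  summit = c(1500L, 20400L)
)
flank <- 50L  # assumed flank size, giving short windows around each summit

# Build a window around each summit instead of the peak center
windows <- transform(summits, start = summit - flank, end = summit + flank)

# Fasta headers in "chr:start-end" form
headers <- sprintf("%s:%d-%d", windows$chr, windows$start, windows$end)

# Settings suggested in the text for ChIP-seq-like data, to be passed
# to runMeme() together with the sequences for these windows
chip_args <- list(revcomp = TRUE, minw = 5, maxw = 20)
```

Sequences for these windows would then be extracted from the genome (for example with get_sequence()) before being passed to runMeme().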

Additional arguments

runMeme() accepts all valid arguments to meme as arguments passed to .... For flags without values, pass them as flag = TRUE. The dna, rna, and protein flags should instead be passed to the alph argument of runMeme(). The arguments passed to MEME often interact with each other in complex ways. For a detailed description of each argument, see the MEME Commandline Documentation.
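A short sketch of this calling convention, using random toy sequences: nmotifs is a MEME option that takes a value, while revcomp is a valueless flag and is therefore passed as TRUE. The specific values are arbitrary and chosen only for illustration.

```r
# Sketch only: random toy sequences, arbitrary argument values.
result <- NULL
if (requireNamespace("memes", quietly = TRUE) && memes::meme_is_installed()) {
  seqs <- universalmotif::create_sequences(seqnum = 4, seqlen = 100)
  names(seqs) <- as.character(seq_along(seqs))
  result <- memes::runMeme(
    seqs,
    nmotifs = 2,      # option with a value
    revcomp = TRUE,   # valueless flag, passed as flag = TRUE
    alph = "dna",     # alphabet goes through alph, not a dna flag
    parse_genomic_coord = FALSE
  )
}
```

If memes or MEME is not available, the block is skipped and result stays NULL.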

Citation

If you use runMeme() in your analysis, please cite:

Timothy L. Bailey and Charles Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers", Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994.

Licensing

The MEME Suite is free for non-profit use, but for-profit users should purchase a license. See the MEME Suite Copyright Page for details.

Examples

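if (meme_is_installed()) {
  # Generate four toy sequences and give each one a name
  seqs <- universalmotif::create_sequences("CCRAAAW", seqnum = 4)
  names(seqs) <- seq_along(seqs)
  # Names are not in "chr:start-end" form, so skip coordinate parsing
  runMeme(seqs, parse_genomic_coord = FALSE)
}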
#>          motif     name altname consensus alphabet strand  icscore nsites  eval
#> 1 <mot:CCAC..> CCACAAAC  MEME-1  CCAMAAAC      DNA      + 5.897346      2 40000
#>   type pseudocount                                bkg width
#> 1  PPM           1 0.52200, 0.46800, 0.00488, 0.00488     8
#>                                          sites_hits
#> 1 3, 2, 10, 18, 0.00356, 0.0101, CCAWAAAC, CCACAWRC
#> 
#> [Hidden empty columns: family, organism, bkgsites, pval, qval.]