MEME performs de-novo discovery of ungapped motifs present in the input sequences. It can be used in both discriminative and non-discriminative modes.
runMeme(
input,
control = NA,
outdir = "auto",
alph = "dna",
parse_genomic_coord = TRUE,
combined_sites = FALSE,
silent = TRUE,
meme_path = NULL,
...
)
# S3 method for list
runMeme(
input,
control = NA,
outdir = "auto",
alph = "dna",
parse_genomic_coord = TRUE,
combined_sites = FALSE,
silent = TRUE,
meme_path = NULL,
...
)
# S3 method for BStringSetList
runMeme(
input,
control = NA,
outdir = "auto",
alph = "dna",
parse_genomic_coord = TRUE,
combined_sites = FALSE,
silent = TRUE,
meme_path = NULL,
...
)
# S3 method for default
runMeme(
input,
control = NA,
outdir = "auto",
alph = "dna",
parse_genomic_coord = TRUE,
combined_sites = FALSE,
silent = TRUE,
meme_path = NULL,
...
)
path to fasta, Biostrings::BStringSet list, or list of Biostrings::BStringSet (can generate using get_sequence())
any data type as in input, or a character vector of names(input) to use those regions as control sequences. Using sequences as background requires an alternative objective function. Users must pass a non-default value of objfun to ... if using a non-NA control set (default: NA)
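A minimal sketch of a discriminative run follows. It assumes that MEME's differential enrichment objective (objfun = "de") suits the analysis, and the list names target and background, along with the simulated sequences, are illustrative placeholders rather than anything prescribed by the package. MEME is only invoked if memes, universalmotif, and a MEME Suite install are all found.

```r
# Sketch only: MEME is invoked only when everything needed is available.
can_run <- requireNamespace("memes", quietly = TRUE) &&
  requireNamespace("universalmotif", quietly = TRUE) &&
  memes::meme_is_installed()

# A non-NA control requires a non-default objective function, passed via `...`.
# "background" refers to an entry of names(input); the name is illustrative.
discriminative_args <- list(
  control = "background",
  objfun = "de",
  parse_genomic_coord = FALSE
)

if (can_run) {
  # Simulated placeholder sequences (assumption: real data would replace these).
  seqs <- list(
    target = universalmotif::create_sequences(seqnum = 20),
    background = universalmotif::create_sequences(seqnum = 20)
  )
  names(seqs$target) <- seq_along(seqs$target)
  names(seqs$background) <- seq_along(seqs$background)

  results <- do.call(memes::runMeme, c(list(input = seqs), discriminative_args))
}
```

Other objective functions that use control sequences may be better suited to a given dataset; the MEME documentation for objfun describes the options.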
(default: "auto") Directory where output data will be stored.
one of c("dna", "rna", "protein") or path to alphabet file (default: "dna").
logical(1) whether to parse genomic coordinates from fasta headers. Requires that headers be in the form "chr:start-end"; any other form will result in an error. Automatically set to FALSE if alph = "protein". This setting only needs to be changed if using a custom-built fasta file without genomic coordinates in the header.
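To decide whether parse_genomic_coord can stay TRUE for a custom fasta file, the headers can be checked against the "chr:start-end" form first. The helper is_genomic_header() below is hypothetical (it is not part of the package), and its regular expression is an assumption about how strict the expected format is.

```r
# Hypothetical helper: TRUE for headers shaped like "chr:start-end".
is_genomic_header <- function(headers) {
  grepl("^[^:]+:[0-9]+-[0-9]+$", headers)
}

headers <- c("chr2L:100-200", "chrX:5000-5100", "peak_17")
is_genomic_header(headers)
#> [1]  TRUE  TRUE FALSE

# Only keep coordinate parsing on when every header matches.
parse_coords <- all(is_genomic_header(headers))
```

Here parse_coords is FALSE because "peak_17" carries no coordinates, so such a file would be run with parse_genomic_coord = FALSE.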
logical(1) whether to return combined sites information (coerces output to list) (default: FALSE)
Whether to suppress printing stdout to terminal (default: TRUE)
path to "meme/bin/". If unset, will use default search behavior: (1) the meme_path setting in options(); (2) the MEME_PATH setting in .Renviron or .bashrc
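As a sketch of the search behavior described above, the path can be set once per session instead of being passed on every call. The directory "/opt/meme/bin" is a placeholder, and the option and variable names are taken from this page; they may differ between versions of the package, so check the installed documentation.

```r
# Placeholder path (assumption): replace with the real location of meme/bin/.
meme_bin_dir <- "/opt/meme/bin"

# Session-wide setting, consulted when `meme_path` is not supplied explicitly.
options(meme_path = meme_bin_dir)
getOption("meme_path")

# Alternatively, add a line like this to ~/.Renviron (not run here):
# MEME_PATH=/opt/meme/bin
```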
additional arguments passed to MEME (see below)
MEME results in universalmotif_df format (see: universalmotif::to_df()). sites_hits is a nested data.frame column containing the position within each input sequence of matches to the identified motif.
Note that MEME can take a long time to run. Runtime increases drastically with the number of input sequences, the width of the motifs searched for, and the number of motifs MEME is asked to discover. For this reason, MEME usually performs best on a few (<50) short (100-200 bp) sequences, although this is not a requirement. Additional details on how data size affects runtime can be found here.
MEME works best when specifically tuned to the analysis question. The default settings are unlikely to be ideal. It has several complex arguments documented here, which runMeme() accepts as R function arguments (see details below).
If discovering motifs within ChIP-seq, ATAC-seq, or similar peaks, MEME may perform best if using sequences flanking the summit (the site of maximum signal) of each peak rather than the center. ChIP-seq or similar data can also benefit from setting revcomp = TRUE, minw = 5, maxw = 20. For more tips on using MEME to analyze ChIP-seq data, see the following tips page.
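The summit-centered windows mentioned above can be sketched in base R. The peak table, its column names, and the 50 bp flank are all hypothetical; real peak callers report summits in different ways (for example as an offset from the peak start), so the arithmetic may need adjusting.

```r
# Hypothetical peak table; `summit` is the absolute position of maximum signal.
peaks <- data.frame(
  chr = c("chr2L", "chr3R"),
  start = c(1000L, 5000L),
  end = c(1400L, 5200L),
  summit = c(1100L, 5150L)
)

# Build 100 bp windows around each summit rather than the peak midpoint.
flank <- 50L
windows <- data.frame(
  chr = peaks$chr,
  start = peaks$summit - flank,
  end = peaks$summit + flank - 1L
)

# Headers in "chr:start-end" form, as required when parse_genomic_coord = TRUE.
window_names <- sprintf("%s:%d-%d", windows$chr, windows$start, windows$end)
window_names
#> [1] "chr2L:1050-1149" "chr3R:5100-5199"
```

Sequences for these windows would then be extracted from a genome (for example with get_sequence()) before being passed to runMeme().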
runMeme() accepts all valid meme arguments via the ... argument. For flags without values, pass them as flag = TRUE. The dna, rna, and protein flags should instead be passed to the alph argument of runMeme(). The arguments passed to MEME often have many interactions with each other; for a detailed description of each argument, see the MEME Commandline Documentation.
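The mapping from MEME command-line options to R arguments can be sketched as follows. The specific values are illustrative rather than recommendations, and the simulated input mirrors the example at the end of this page. MEME is only invoked if memes, universalmotif, and a MEME Suite install are found.

```r
# MEME command-line options expressed as named R arguments.
meme_args <- list(
  nmotifs = 3,    # -nmotifs 3
  minw = 5,       # -minw 5
  maxw = 20,      # -maxw 20
  revcomp = TRUE  # flag without a value: -revcomp
)

can_run <- requireNamespace("memes", quietly = TRUE) &&
  requireNamespace("universalmotif", quietly = TRUE) &&
  memes::meme_is_installed()

if (can_run) {
  seqs <- universalmotif::create_sequences("CCRAAAW", seqnum = 4)
  names(seqs) <- seq_along(seqs)
  # The alphabet is chosen with `alph`, not with a dna/rna/protein flag.
  results <- do.call(
    memes::runMeme,
    c(list(input = seqs, alph = "dna", parse_genomic_coord = FALSE), meme_args)
  )
}
```

Keeping the options in a named list makes it easy to reuse the same settings across several runs.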
If you use runMeme() in your analysis, please cite:
Timothy L. Bailey and Charles Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers", Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994.
The MEME Suite is free for non-profit use, but for-profit users should purchase a license. See the MEME Suite Copyright Page for details.
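if (meme_is_installed()) {
  # Simulate four sequences and give each a name
  seqs <- universalmotif::create_sequences("CCRAAAW", seqnum = 4)
  names(seqs) <- seq_along(seqs)
  # Headers are not genomic coordinates, so skip coordinate parsing
  runMeme(seqs, parse_genomic_coord = FALSE)
}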
#> motif name altname consensus alphabet strand icscore nsites eval
#> 1 <mot:CCAC..> CCACAAAC MEME-1 CCAMAAAC DNA + 5.897346 2 40000
#> type pseudocount bkg width
#> 1 PPM 1 0.52200, 0.46800, 0.00488, 0.00488 8
#> sites_hits
#> 1 3, 2, 10, 18, 0.00356, 0.0101, CCAWAAAC, CCACAWRC
#>
#> [Hidden empty columns: family, organism, bkgsites, pval, qval.]