DREME discovers short, ungapped, de-novo motifs that are relatively enriched relative to a control set of sequences. DREME can be run to discover motifs relative to a shuffled set of input sequences, or against a separately provided set of "control" sequences.
runDreme(input, control, outdir = "auto", meme_path = NULL, silent = TRUE, ...)
regions to scan for motifs. Can be any of:
path to fasta file
DNAStringSet object (can be generated from GRanges using get_sequence()
)
List of DNAStringSet objects (generated from get_sequence()
)
NOTE: if using StringSet inputs, each entry must be named (set with names()
).
NOTE: If you want to retain the raw dreme output files, you must use a path to fasta file as input, or specify an "outdir"
regions to use as background for motif search. Can be any of:
path to fasta file
DNAStringSet object (can be generated from GRanges using get_sequence)
A Biostrings::BStringSetList (generated using get_sequence
), in which
case all sequences in the list will be combined as the control set.
if input
is a list of DNAStringSet objects, a character vector of names
in input
will use those sequences as background. runDreme will not scan
those regions as input.
"shuffle" to use dreme's built-in dinucleotide shuffle feature (NOTE: if
input
is a list object with an entry named "shuffle", the list entry will
be used instead).
Optionally can also pass seed = <any number>
to ...
to use as the random
seed during shuffling. If no seed is passed, dreme will use 1 as the random
seed, so results will be reproducible if rerunning. NOTE: beware
system-specific differences. As of v5, dreme will compile using the default
python installation on a system (either python2.7 or python3). The random
number generator changed between python2.7 and python3, so results will not
be reproducible between systems using python2.7 vs 3.
path to output directory of dreme files, or "auto" to autogenerate path. Default: location of input fasta in dir named "\<input\>vs\<control\>". If input is DNAstringset, will be temporary path. This means that if you want to save the raw output files, you must use fasta files as input or use an informative (and unique) outdir name. memes will not check if it overwrites files in a directory. Directories will be recursively created if needed.
optional, path to "meme/bin/" on your local machine. runDreme will search 3 places in order for meme if this flag is unset:
the option "meme_bin" (set with options(meme_bin = "path/to/meme/bin"))
the environment variable "MEME_PATH" (set in .Renviron)
"~/meme/bin/" as the default location
If the user sets meme_path in the function call, this value will always be used
whether to suppress printing dreme stdout as a message when finishing with no errors. Can be useful for troubleshooting in situations where no motifs are discovered, but command completes successfully. (default: TRUE)
dreme flags can be passed as R function arguments to use
non-default behavior. For a full list of valid arguments, run your local
install of dreme -h, or visit the dreme documentation
website. See list below for aliases
of common flags. To set flags with no values (ex. -dna
), pass the
argument as a boolean value (ex. dna = TRUE
).
universalmotif_df
with statistics for each discovered motif. The motif
column contains a universalmotif object representation in PCM format of
each DREME motif. If no motifs are discovered, returns NULL. The column
values are as follows:
rank = ranked order of discovered motif
name = primary name of motif
altname = alternative name of motif
seq = consensus sequence of the motif
length = length of discovered motif
nsites = number of times the motif is found in input sequences
positive_hits = number of sequences in input containing at least 1 of the motif
negative_hits = number of sequences in control containing at least 1 of the motif
pval = p-value
eval = E-value
unerased_eval = Unerased E-Value
positive_total = number of sequences in input
negative_total = number of sequences in control
pos_frac = fraction of positive sequences with a hit
neg_frac = fraction of negative sequences with a hit
motif = a universalmotif object of the discovered motif
Properly setting the control
parameter is key to discovering biologically
relevant motifs. Often, using control = "shuffle"
will produce a suboptimal
set of motifs; however, some discriminative analysis designs don't have
proper "control" regions other than to shuffle.
As of MEME version 5.2.0, DREME is deprecated. Consider runStreme()
instead.
In addition to allowing any valid flag of dreme to be passed to ...
, we
provide a few user-friendly aliases for common flags which are more readable (see list below).
For example, e = 1 will use a max evalue cutoff of 1. This is equivalent to
setting evalue = 1. For additional details about each DREME flag, see the
DREME Manual Webpage.
List of values which can be passed to ...
:
NOTE: values must be referred to using their name in the "memes alias"
column, not the "DREME Flag" column.
memes alias | DREME Flag | description | default |
nmotifs | m | max number of motifs to discover | NULL |
sec | t | max number of seconds to run | NULL |
evalue | e | max E-value cutoff | 0.05 |
seed | s | random seed if using "shuffle" as control | 1 |
ngen | g | nuber of REs to generalize | 100 |
mink | mink | minimum motif width to search | 3 |
maxk | maxk | maximum motif width to search | 7 |
k | k | set mink and maxk to this value | NULL |
norc | norc | search only the input strand for sequences | FALSE |
dna | dna | use DNA alphabet | TRUE |
rna | rna | use RNA alphabet | FALSE |
protein | protein | use protein alphabet (NOT RECCOMENDED) | FALSE |
If you use runDreme()
in your analysis, please cite:
Timothy L. Bailey, "DREME: Motif discovery in transcription factor ChIP-seq data", Bioinformatics, 27(12):1653-1659, 2011. full text
The MEME Suite is free for non-profit use, but for-profit users should purchase a license. See the MEME Suite Copyright Page for details.
if (meme_is_installed()) {
# Create random named sequences as input for example
seqs <- universalmotif::create_sequences(rng.seed = 123)
names(seqs) <- seq_along(seqs)
# Runs dreme with default settings, shuffles input as background
runDreme(seqs, "shuffle")
# Runs searching for max 2 motifs, e-value cutoff = 0.1, explicitly using the DNA alphabet
runDreme(seqs, "shuffle", nmotifs = 2, e = 0.1, dna = TRUE)
}
#> NULL