Denovo motif discovery of target regions using DREME

DREME discovers short, ungapped, de-novo motifs that are relatively enriched relative to a control set of sequences. DREME can be run to discover motifs relative to a shuffled set of input sequences, or against a separately provided set of "control" sequences.

runDreme(input, control, outdir = "auto", meme_path = NULL, silent = TRUE, ...)

Arguments

input

regions to scan for motifs. Can be any of:

path to fasta file
DNAStringSet object (can be generated from GRanges using get_sequence())
List of DNAStringSet objects (generated from get_sequence())
NOTE: if using StringSet inputs, each entry must be named (set with names()).
NOTE: If you want to retain the raw dreme output files, you must use a path to fasta file as input, or specify an "outdir"

control

regions to use as background for motif search. Can be any of:

path to fasta file
DNAStringSet object (can be generated from GRanges using get_sequence)
A Biostrings::BStringSetList (generated using get_sequence), in which case all sequences in the list will be combined as the control set.
if input is a list of DNAStringSet objects, a character vector of names in input will use those sequences as background. runDreme will not scan those regions as input.
"shuffle" to use dreme's built-in dinucleotide shuffle feature (NOTE: if input is a list object with an entry named "shuffle", the list entry will be used instead). Optionally can also pass seed = <any number> to ... to use as the random seed during shuffling. If no seed is passed, dreme will use 1 as the random seed, so results will be reproducible if rerunning. NOTE: beware system-specific differences. As of v5, dreme will compile using the default python installation on a system (either python2.7 or python3). The random number generator changed between python2.7 and python3, so results will not be reproducible between systems using python2.7 vs 3.

outdir

path to output directory of dreme files, or "auto" to autogenerate path. Default: location of input fasta in dir named "\<input\>vs\<control\>". If input is DNAstringset, will be temporary path. This means that if you want to save the raw output files, you must use fasta files as input or use an informative (and unique) outdir name. memes will not check if it overwrites files in a directory. Directories will be recursively created if needed.

meme_path

optional, path to "meme/bin/" on your local machine. runDreme will search 3 places in order for meme if this flag is unset:

the option "meme_bin" (set with options(meme_bin = "path/to/meme/bin"))
the environment variable "MEME_PATH" (set in .Renviron)
"~/meme/bin/" as the default location

If the user sets meme_path in the function call, this value will always be used

silent

whether to suppress printing dreme stdout as a message when finishing with no errors. Can be useful for troubleshooting in situations where no motifs are discovered, but command completes successfully. (default: TRUE)

...

dreme flags can be passed as R function arguments to use non-default behavior. For a full list of valid arguments, run your local install of dreme -h, or visit the dreme documentation website. See list below for aliases of common flags. To set flags with no values (ex. -dna), pass the argument as a boolean value (ex. dna = TRUE).

Value

universalmotif_df with statistics for each discovered motif. The motifcolumn contains a universalmotif object representation in PCM format of each DREME motif. If no motifs are discovered, returns NULL. The column values are as follows:

rank = ranked order of discovered motif
name = primary name of motif
altname = alternative name of motif
seq = consensus sequence of the motif
length = length of discovered motif
nsites = number of times the motif is found in input sequences
positive_hits = number of sequences in input containing at least 1 of the motif
negative_hits = number of sequences in control containing at least 1 of the motif
pval = p-value
eval = E-value
unerased_eval = Unerased E-Value
positive_total = number of sequences in input
negative_total = number of sequences in control
pos_frac = fraction of positive sequences with a hit
neg_frac = fraction of negative sequences with a hit
motif = a universalmotif object of the discovered motif

Details

Properly setting the control parameter is key to discovering biologically relevant motifs. Often, using control = "shuffle" will produce a suboptimal set of motifs; however, some discriminative analysis designs don't have proper "control" regions other than to shuffle.

As of MEME version 5.2.0, DREME is deprecated. Consider runStreme() instead.

In addition to allowing any valid flag of dreme to be passed to ..., we provide a few user-friendly aliases for common flags which are more readable (see list below). For example, e = 1 will use a max evalue cutoff of 1. This is equivalent to setting evalue = 1. For additional details about each DREME flag, see the DREME Manual Webpage.

List of values which can be passed to ...: NOTE: values must be referred to using their name in the "memes alias" column, not the "DREME Flag" column.

memes alias	DREME Flag	description	default
nmotifs	m	max number of motifs to discover	NULL
sec	t	max number of seconds to run	NULL
evalue	e	max E-value cutoff	0.05
seed	s	random seed if using "shuffle" as control	1
ngen	g	nuber of REs to generalize	100
mink	mink	minimum motif width to search	3
maxk	maxk	maximum motif width to search	7
k	k	set mink and maxk to this value	NULL
norc	norc	search only the input strand for sequences	FALSE
dna	dna	use DNA alphabet	TRUE
rna	rna	use RNA alphabet	FALSE
protein	protein	use protein alphabet (NOT RECCOMENDED)	FALSE

Citation

If you use runDreme() in your analysis, please cite:

Timothy L. Bailey, "DREME: Motif discovery in transcription factor ChIP-seq data", Bioinformatics, 27(12):1653-1659, 2011. full text

Licensing

The MEME Suite is free for non-profit use, but for-profit users should purchase a license. See the MEME Suite Copyright Page for details.

Examples

if (meme_is_installed()) {
# Create random named sequences as input for example
seqs <- universalmotif::create_sequences(rng.seed = 123)
names(seqs) <- seq_along(seqs)

# Runs dreme with default settings, shuffles input as background
runDreme(seqs, "shuffle")

# Runs searching for max 2 motifs, e-value cutoff = 0.1, explicitly using the DNA alphabet
runDreme(seqs, "shuffle", nmotifs = 2, e = 0.1, dna = TRUE)
}
#> NULL