See package website for full vignette

The Bioconductor build system does not have the MEME Suite installed, therefore these vignettes will not contain any R output. To view the full vignette, visit this article page on the memes website at this link

# Verify that memes detects your meme install
# should return all green checks if so.
check_meme_install()
#> checking main install
#> ✔ /opt/meme/bin
#> checking util installs
#> ✔ /opt/meme/bin/dreme
#> ✔ /opt/meme/bin/ame
#> ✔ /opt/meme/bin/fimo
#> ✔ /opt/meme/bin/tomtom
#> ✔ /opt/meme/bin/meme
#> ✔ /opt/meme/bin/streme
fa <- system.file("extdata/fasta_ex/fa1.fa", package = "memes")
# NOTE: setting e > 1 is usually not recomended. 
# the example fasta file only has 1 sequence in it
# to keep the file size low and let the example run quickly.
# I set evalue = 39 because dreme cannot detect high confidence motifs from only 1 sequence.
dreme_out <- runDreme(fa, "shuffle", evalue = 39, outdir = tempdir())

DREME Commandline Documentation

Aliased flags

memes alias DREME Flag description
nmotifs m* max number of motifs to discover
sec t max number of seconds to run
evalue e max E-value cutoff
seed s* random seed if using “shuffle” as control
ngen g number of REs to generalize

* flags marked with * must be assigned using their alias

# equivalent to above
runDreme(fa, "shuffle", evalue = 39, outdir = tempdir())
runDreme(fa, "shuffle", e = 39, outdir = tempdir(), nmotifs = 2)

dreme results are a data.frame. The motif column contains a universalmotif object with the PCM information for each de-novo discovered motif. This is so that any filtering of the results object also simply filter the available motifs. For more details about each column see the “Value” section of ?runDreme.

dreme_out
#>                     motif         name altname consensus alphabet strand
#> m01_ATTTATTA <mot:m01_..> m01_ATTTATTA DREME-1  ATTTATTA      DNA     +-
#> m02_AAA      <mot:m02_..>      m02_AAA DREME-2       AAA      DNA     +-
#> m03_AATAATAA <mot:m03_..> m03_AATAATAA DREME-3  AATAATAA      DNA     +-
#> m04_AAT      <mot:m04_..>      m04_AAT DREME-4       AAT      DNA     +-
#> m05_TAA      <mot:m05_..>      m05_TAA DREME-5       TAA      DNA     +-
#>              icscore nsites bkgsites pval eval type pseudocount
#> m01_ATTTATTA      16      4        1  0.5 20.0  PCM           0
#> m02_AAA            6      1        1  1.0 24.0  PCM           0
#> m03_AATAATAA      16     10        1  0.5  9.0  PCM           0
#> m04_AAT            6      2        1  1.0 12.0  PCM           0
#> m05_TAA            6      4        1  0.5  0.5  PCM           0
#>                                     bkg rank      seq length positive_hits
#> m01_ATTTATTA 0.667, 0.007, 0.007, 0.319    1 ATTTATTA      8             1
#> m02_AAA      0.667, 0.007, 0.007, 0.319    2      AAA      3             1
#> m03_AATAATAA 0.667, 0.007, 0.007, 0.319    3 AATAATAA      8             1
#> m04_AAT      0.667, 0.007, 0.007, 0.319    4      AAT      3             1
#> m05_TAA      0.667, 0.007, 0.007, 0.319    5      TAA      3             1
#>              negative_hits unerased_evalue positive_total negative_total
#> m01_ATTTATTA             0              20              1              1
#> m02_AAA                  1              24              1              1
#> m03_AATAATAA             0              18              1              1
#> m04_AAT                  1              12              1              1
#> m05_TAA                  0               1              1              1
#>              pos_frac neg_frac
#> m01_ATTTATTA        1        0
#> m02_AAA             1        1
#> m03_AATAATAA        1        0
#> m04_AAT             1        1
#> m05_TAA             1        0
#> 
#> [Hidden empty columns: family, organism, qval.]

The results can be converted back to universalmotif format using to_list(). The view_motifs() function accepts a universalmotif list and can be used to visualize the motifs.

library(universalmotif)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

dreme_out %>% 
  to_list() %>% 
  view_motifs()

The primary advantage of using the data.frame output allows simple integration with base subsetting, piping, and the tidyverse.

dreme_out %>% 
  # after filtering with dplyr, only motifs with length 3 will be plotted
  filter(length == 3) %>% 
  to_list() %>% 
  universalmotif::view_motifs()

universalmotif manipulations can easily be executed on the motifs as well. For example:

Updating motif information

Occasionally, it can be useful to update the metadata associated with a dicovered motif (for example, to assign a new name to a denovo motif). memes provides a few utilities to accomplish this.

update_motifs() will search for specific column names which describe properties of the motif column and update the metadata in the motif column to reflect those values. See ?update_motifs for details.

as_universalmotif() will convert one of these special universalmotif data.frames into a universalmotif list after updating the metadata to reflect values as in update_motifs().

# update_motifs will update the values in the motif column
# to values in the data.frame
dreme_edit <- dreme_out %>% 
  dplyr::mutate(name = c("one", "two", "three", "four", "five")) %>% 
  update_motifs()

# to_list() will first update motif information
# before returning only the motif column
edit_motifs <- dreme_out %>% 
  dplyr::mutate(name = c("one", "two", "three", "four", "five")) %>% 
  to_list()

# The following outputs are identical
# where edit_motifs is a list of motifs
# and dreme_edit is a data.frame with a motif list column
identical(edit_motifs$one, dreme_edit$motif$one)
#> [1] TRUE

Notes about shuffled control sequences

Setting control = "shuffle" will use dreme’s random number generator to shuffle the input sequences. By default, dreme will use 1 as the random seed, so repeat runs of the same shuffle command will produce the same output. To change the random seed, pass seed = [your random seed] to runDreme(). NOTE: beware system-specific differences. As of MEME v5, dreme will compile using the default python installation on a system (either python2.7 or python3). The random number generator changed between python2.7 and python3, so results will not be reproducible between systems using python2.7 vs 3 even if setting the same random seed.

One way to overcome this is to manually shuffle the sequences within R. This can be done easily using universalmotif::shuffle_sequences(). Set k = 2 to preserve dinucleotide frequency (similar to dreme’s built-in shuffle), and set rng.seed to any number to create a reproducible shuffle. The output of this function can be used directly as the control sequences.

# Create random sequences to use for this example
seq <- create_sequences(rng.seed = 100)
# Shuffle sequences preserving dinucleotide frequency
shuffle <- shuffle_sequences(seq, k = 2, rng.seed = 100)

Analysis on Multiple Groups and Differential Analysis

Often, users want to perform motif analysis on many groups of sequences. For example, here we have ChIP-seq peaks for a transcription factor, E93. Analysis of chromatin accessibility in E93 peaks revealed sites that Increase accessibility, Decrease accessibility, or remain Static following E93 binding.

suppressPackageStartupMessages(library(GenomicRanges))
suppressPackageStartupMessages(library(plyranges))

data("example_chip_summits", package = "memes")
peaks <- example_chip_summits

To examine whether there are differences in motif content between increasing, decreasing, and static sites, we split the peaks into a list by their response to E93.

by_behavior <- peaks %>% 
  anchor_center() %>% 
  mutate(width = 100) %>% 
  split(mcols(.)$e93_sensitive_behavior)

Next, this list can be used directly in get_sequences() to generate a list of sequences for each set of peaks.

dm.genome <- BSgenome.Dmelanogaster.UCSC.dm3::BSgenome.Dmelanogaster.UCSC.dm3

seq_by_behavior <- by_behavior %>% 
  get_sequence(dm.genome)
names(seq_by_behavior)
#> [1] "Decreasing" "Increasing" "Static"

To run DREME on each set using shuffled input sequence as background, run:

runDreme(seq_by_behavior, control = "shuffle")

Discriminative analysis using list input

For this analysis, however, we are most interested in identifying motifs associated with increasing and decreasing that do not involve E93 binding. Therefore, a more appropriate control is to use the Static sites as background.

As always, an XStringSet object can be used as the control regions. However, running dreme in this way will run 3 jobs:

  1. Increasing vs Static
  2. Decreasing vs Static
  3. Static vs Static

This will waste time, as job #3 will detect no motifs (since input & control are identical), but will still take a long time to run. runDreme() has additional functionality to help avoid these issues, and to facilitate more complicated analysis designs.

runDreme(seq_by_behavior, control = seq_by_behavior$Static)

If the input to runDreme is a named list of XStringSet objects, control can be set to one or more values from names(input) to use those regions as background. It will skip running those regions as the input. The following code will result in these comparisons:

  1. Increasing vs Static
  2. Decreasing vs Static
runDreme(seq_by_behavior, control = "Static")

If multiple names are used in the control section, they will be combined together to make a single control set which will be used for all comparisons. Here, we use “Static” and “Decreasing” sites as the control, which will result in only running 1 comparison: Increasing vs Static+Decreasing.

runDreme(seq_by_behavior, control = c("Static", "Decreasing"))

Importing previous data

importDremeXML() can be used to import a dreme.xml file from a previous run on the MEME server or on the commandline. Details for how to save data from the DREME webserver are below.

Saving data from DREME Web Server

To download XML data from the MEME Server, right-click the DREME XML output link and “Save Target As” or “Save Link As” (see example image below), and save as <filename>.xml. This file can be read using importDremeXML()

Citation

memes is a wrapper for a select few tools from the MEME Suite, which were developed by another group. In addition to citing memes, please cite the MEME Suite tools corresponding to the tools you use.

If you use runDreme() in your analysis, please cite:

Timothy L. Bailey, “DREME: Motif discovery in transcription factor ChIP-seq data”, Bioinformatics, 27(12):1653-1659, 2011. full text

Licensing Restrictions

The MEME Suite is free for non-profit use, but for-profit users should purchase a license. See the MEME Suite Copyright Page for details.

Session Info

sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.2 LTS
#> 
#> Matrix products: default
#> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#> [1] plyranges_1.13.1      GenomicRanges_1.45.0  GenomeInfoDb_1.29.3  
#> [4] IRanges_2.27.0        S4Vectors_0.31.0      BiocGenerics_0.39.1  
#> [7] dplyr_1.0.7           universalmotif_1.11.3 memes_1.1.4          
#> 
#> loaded via a namespace (and not attached):
#>  [1] bitops_1.0-7                          matrixStats_0.60.0                   
#>  [3] fs_1.5.0                              usethis_2.0.1                        
#>  [5] rprojroot_2.0.2                       tools_4.1.0                          
#>  [7] bslib_0.2.5.1                         utf8_1.2.2                           
#>  [9] R6_2.5.0                              colorspace_2.0-2                     
#> [11] withr_2.4.2                           tidyselect_1.1.1                     
#> [13] processx_3.5.2                        compiler_4.1.0                       
#> [15] textshaping_0.3.5                     cli_3.0.1                            
#> [17] Biobase_2.53.0                        xml2_1.3.2                           
#> [19] desc_1.3.0                            DelayedArray_0.19.1                  
#> [21] rtracklayer_1.53.0                    sass_0.4.0                           
#> [23] scales_1.1.1                          readr_2.0.0                          
#> [25] pkgdown_1.6.1.9001                    systemfonts_1.0.2                    
#> [27] stringr_1.4.0                         digest_0.6.27                        
#> [29] Rsamtools_2.9.1                       rmarkdown_2.9                        
#> [31] R.utils_2.10.1                        XVector_0.33.0                       
#> [33] pkgconfig_2.0.3                       htmltools_0.5.1.1                    
#> [35] MatrixGenerics_1.5.2                  BSgenome_1.61.0                      
#> [37] fastmap_1.1.0                         highr_0.9                            
#> [39] rlang_0.4.11                          BSgenome.Dmelanogaster.UCSC.dm3_1.4.0
#> [41] BiocIO_1.3.0                          jquerylib_0.1.4                      
#> [43] farver_2.1.0                          generics_0.1.0                       
#> [45] jsonlite_1.7.2                        BiocParallel_1.27.2                  
#> [47] R.oo_1.24.0                           RCurl_1.98-1.3                       
#> [49] magrittr_2.0.1                        GenomeInfoDbData_1.2.6               
#> [51] Matrix_1.3-4                          Rcpp_1.0.7                           
#> [53] waldo_0.2.5                           munsell_0.5.0                        
#> [55] fansi_0.5.0                           lifecycle_1.0.0                      
#> [57] R.methodsS3_1.8.1                     stringi_1.7.3                        
#> [59] yaml_2.2.1                            MASS_7.3-54                          
#> [61] SummarizedExperiment_1.23.1           zlibbioc_1.39.0                      
#> [63] grid_4.1.0                            ggseqlogo_0.1                        
#> [65] parallel_4.1.0                        crayon_1.4.1                         
#> [67] lattice_0.20-44                       Biostrings_2.61.2                    
#> [69] hms_1.1.0                             knitr_1.33                           
#> [71] ps_1.6.0                              pillar_1.6.2                         
#> [73] rjson_0.2.20                          pkgload_1.2.1                        
#> [75] XML_3.99-0.6                          glue_1.4.2                           
#> [77] evaluate_0.14                         vctrs_0.3.8                          
#> [79] tzdb_0.1.2                            testthat_3.0.4                       
#> [81] gtable_0.3.0                          purrr_0.3.4                          
#> [83] tidyr_1.1.3                           rematch2_2.1.2                       
#> [85] cachem_1.0.5                          ggplot2_3.3.5                        
#> [87] xfun_0.24                             restfulr_0.0.13                      
#> [89] ragg_1.1.3                            tibble_3.1.3                         
#> [91] diffobj_0.3.4                         GenomicAlignments_1.29.0             
#> [93] memoise_2.0.0                         cmdfun_1.0.2                         
#> [95] ellipsis_0.3.2