Run TomTom on target motifs — runTomTom • memes

TomTom compares input motifs to a database of known, user-provided motifs to identify matches.

runTomTom(
  input,
  database = NULL,
  outdir = "auto",
  thresh = 10,
  min_overlap = 5,
  dist = "ed",
  evalue = TRUE,
  silent = TRUE,
  meme_path = NULL,
  ...
)

Arguments

input: path to .meme format file of motifs, a list of universalmotifs, or a universalmotif data.frame object (such as the output of runDreme())
database: path to .meme format file to use as reference database (or list of universalmotifs). NOTE: p-value estimates are inaccurate when the database has fewer than 50 entries.
outdir: directory to store tomtom results (will be overwritten if it exists). Default: location of the input .meme file, or a temporary location if using universalmotif input.
thresh: report matches with significance values less than or equal to this value. If evalue = TRUE (default), this is an E-value threshold (default = 10). If evalue = FALSE, this is a q-value threshold and must be set to a value between 0 and 1 (default = 0.5).
min_overlap: only report matches that overlap by this value or more, unless input motif is shorter, in which case the shorter length is used as the minimum value
dist: distance metric. Valid arguments: allr | ed | kullback | pearson | sandelin | blic1 | blic5 | llr1 | llr5. Default: ed (euclidean distance).
evalue: whether to use E-value as significance threshold (default: TRUE). If evalue = FALSE, uses q-value instead.
silent: suppress printing stderr to console (default: TRUE).
meme_path: path to "meme/bin/" (optional). If unset, will check the R environment variable "MEME_BIN" (set in .Renviron), or the option "meme_bin" (set with options(meme_bin = "path/to/meme/bin")).
...: additional flags passed to tomtom using cmdfun formatting (see table below for details)
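
As a minimal sketch of how thresh, evalue, and min_overlap interact: effective_min_overlap() below is a hypothetical helper, not a memes function, that restates the min_overlap rule described above. The runTomTom() calls reuse the motif and database from the Examples section and only run when memes and the MEME Suite are available.

```r
# Hypothetical helper (not part of memes): the overlap actually required
# for a query motif, following the min_overlap rule described above.
effective_min_overlap <- function(min_overlap, query_width) {
  min(min_overlap, query_width)
}

effective_min_overlap(5, 7)  # 7-position query such as "CCRAAAW": 5
effective_min_overlap(5, 4)  # query shorter than min_overlap: 4

# The two thresholding modes; skipped unless memes and MEME are installed.
if (requireNamespace("memes", quietly = TRUE) && memes::meme_is_installed()) {
  motif <- universalmotif::create_motif("CCRAAAW")
  database <- system.file("extdata", "flyFactorSurvey_cleaned.meme",
                          package = "memes")

  # Default mode: thresh is interpreted as an E-value cutoff.
  by_evalue <- memes::runTomTom(motif, database, thresh = 10, evalue = TRUE)

  # q-value mode: thresh must lie between 0 and 1.
  by_qvalue <- memes::runTomTom(motif, database, thresh = 0.5, evalue = FALSE)
}
```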

Value

data.frame of match results. Contains best_match_motif column of universalmotif objects with the matched PWM from the database, a series of best_match_* columns describing the TomTom results of the match, and a tomtom list column storing the ranked list of possible matches to each motif. If a universalmotif data.frame is used as input, these columns are appended to the data.frame. If no matches are returned, tomtom and best_match_motif columns will be set to NA and a message indicating this will print.

Details

runTomTom will rank matches by significance and return a best match motif for each input (whose properties are stored in the best_match_* columns) as well as a ranked list of all possible matches stored in the tomtom list column.
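
The following sketch shows one way to read both levels of the result. summarise_best_match() is a hypothetical helper, not a memes function; it assumes the input has the name and best_match_* columns shown in the Examples output. The structure of each tomtom entry is inspected rather than assumed.

```r
# Hypothetical helper (not part of memes): one row per input motif with the
# headline statistics of its best match.
summarise_best_match <- function(res) {
  data.frame(
    name       = res$name,
    best_match = res$best_match_name,
    pval       = res$best_match_pval,
    eval       = res$best_match_eval,
    qval       = res$best_match_qval,
    stringsAsFactors = FALSE
  )
}

# Skipped unless memes and MEME are installed.
if (requireNamespace("memes", quietly = TRUE) && memes::meme_is_installed()) {
  motif <- universalmotif::create_motif("CCRAAAW")
  database <- system.file("extdata", "flyFactorSurvey_cleaned.meme",
                          package = "memes")
  res <- memes::runTomTom(motif, database)

  # Best match per input motif.
  print(summarise_best_match(res))

  # Ranked list of all matches for the first input motif. The entry is NA
  # when no matches were returned, so check its type before using it.
  ranked <- res$tomtom[[1]]
  if (is.data.frame(ranked)) {
    print(names(ranked))
    print(nrow(ranked))
  }
}
```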

Additional arguments

runTomTom() can accept all valid tomtom arguments passed to ... as described in the tomtom commandline reference. For convenience, below is a table of valid arguments, their default values, and their description.

TomTom flag	Allowed values	Default	Description
bfile	file path	`NULL`	path to background model for converting frequency matrix to log-odds score (not used when `dist` is set to "ed", "kullback", "pearson", or "sandelin")
motif_pseudo	`numeric`	0.1	pseudocount to add to motifs
xalph	`logical`	FALSE	convert alphabet of target database to alphabet of query database
norc	`logical`	FALSE	Do not score reverse complements of motifs
incomplete_scores	`logical`	FALSE	Compute scores using only aligned columns
thresh	`numeric`	0.5	only report matches with significance values <= this value. Unless `evalue = TRUE`, this value must be < 1.
internal	`logical`	FALSE	forces the shorter motif to be completely contained in the longer motif
min_overlap	`integer`	1	only report matches that overlap by this number of positions or more. If query motif is smaller than this value, its width is used as the min overlap for that query
time	`integer`	`NULL`	Maximum runtime in CPU seconds (default: no limit)
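
As a sketch of passing flags from the table through ..., the example below collects a few of them in a named list and forwards them with do.call(). The list name extra_flags is illustrative only, and the same flags could equally be written directly in the runTomTom() call. The motif and database are the ones used in the Examples section.

```r
# Flags from the table above, named exactly as in the "TomTom flag" column.
extra_flags <- list(
  norc         = TRUE,  # do not score reverse complements of motifs
  internal     = TRUE,  # shorter motif must lie entirely within the longer one
  motif_pseudo = 0.1    # pseudocount to add to motifs
)

# Skipped unless memes and MEME are installed.
if (requireNamespace("memes", quietly = TRUE) && memes::meme_is_installed()) {
  motif <- universalmotif::create_motif("CCRAAAW")
  database <- system.file("extdata", "flyFactorSurvey_cleaned.meme",
                          package = "memes")

  # Equivalent to runTomTom(motif, database, norc = TRUE, internal = TRUE,
  # motif_pseudo = 0.1); the extra arguments are passed on via `...`.
  res <- do.call(
    memes::runTomTom,
    c(list(input = motif, database = database), extra_flags)
  )
}
```

Keeping the flags in a list makes it easy to reuse one set of tomtom options across several calls.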

Citation

If you use runTomTom() in your analysis, please cite:

Shobhit Gupta, JA Stamatoyannopoulos, Timothy Bailey and William Stafford Noble, "Quantifying similarity between motifs", Genome Biology, 8(2):R24, 2007.

Licensing

The MEME Suite is free for non-profit use, but for-profit users should purchase a license. See the MEME Suite Copyright Page for details.

Examples

if (meme_is_installed()) {
  motif <- universalmotif::create_motif("CCRAAAW")
  database <- system.file("extdata", "flyFactorSurvey_cleaned.meme", package = "memes")

  runTomTom(motif, database)
}
#>         motif  name consensus alphabet strand icscore type pseudocount
#> 1 <mot:motif> motif   CCRAAAW      DNA     +-      12  PPM           0
#>                      bkg  best_match_name best_match_altname
#> 1 0.25, 0.25, 0.25, 0.25 Eip93F_SANGER_10             Eip93F
#>              best_db_name best_match_offset best_match_pval best_match_eval
#> 1 flyFactorSurvey_cleaned                 4        1.91e-07        0.000106
#>   best_match_qval best_match_strand
#> 1        0.000213                 +
#>                                                       best_match_motif
#> 1 <S4 class ‘universalmotif’ [package “universalmotif”] with 20 slots>
#>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   tomtom
#> 1 Eip93F_SANGER_10, rib_SANGER_5, Ets65A_SANGER_10, CG12768_SANGER_5, Eip93F, rib, Ets65A, CG12768, <S4 class ‘universalmotif’ [package “universalmotif”] with 20 slots>, <S4 class ‘universalmotif’ [package “universalmotif”] with 20 slots>, <S4 class ‘universalmotif’ [package “universalmotif”] with 20 slots>, <S4 class ‘universalmotif’ [package “universalmotif”] with 20 slots>, flyFactorSurvey_cleaned, flyFactorSurvey_cleaned, flyFactorSurvey_cleaned, flyFactorSurvey_cleaned, 4, 1, 0, 5, 1.91e-07, 0.0015, 0.0126, 0.015, 0.000106, 0.832, 7, 8.36, 0.000213, 0.832, 1, 1, +, +, +, +
#> 
#> [Hidden empty columns: altname, family, organism, nsites, bkgsites,
#>   pval, qval, eval.]