TomTom compares input motifs to a database of known, user-provided motifs to identify matches.
runTomTom(
  input,
  database = NULL,
  outdir = "auto",
  thresh = 10,
  min_overlap = 5,
  dist = "ed",
  evalue = TRUE,
  silent = TRUE,
  meme_path = NULL,
  ...
)
input: path to .meme format file of motifs, a list of universalmotifs, or a universalmotif data.frame object (such as the output of runDreme()).
database: path to .meme format file to use as reference database (or list of universalmotifs). NOTE: p-value estimates are inaccurate when the database has fewer than 50 entries.
outdir: directory to store tomtom results (will be overwritten if it exists). Default: location of the input .meme file, or a temporary location if using universalmotif input.
thresh: report matches less than or equal to this value. If evalue = TRUE (default), this is an E-value threshold (default = 10). If evalue = FALSE, this is a q-value threshold and must be between 0 and 1 (default = 0.5).
min_overlap: only report matches that overlap by this many positions or more, unless the input motif is shorter, in which case the motif's own length is used as the minimum.
dist: distance metric. Valid arguments: allr | ed | kullback | pearson | sandelin | blic1 | blic5 | llr1 | llr5. Default: ed (Euclidean distance).
evalue: whether to use E-value as the significance threshold (default: TRUE). If evalue = FALSE, uses q-value instead.
silent: suppress printing stderr to console (default: TRUE).
meme_path: path to "meme/bin/" (optional). If unset, will check the R environment variable "MEME_BIN" (set in .Renviron), or the option "meme_bin" (set with options(meme_bin = "path/to/meme/bin")).
...: additional flags passed to tomtom using cmdfun formatting (see table below for details).
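The interplay between thresh, evalue, and dist described above can be sketched as follows. This is an illustrative example, not output from the package documentation: it reuses the motif and database from the example at the end of this page, the cutoff 0.05 is an arbitrary choice, and the calls only run when memes and a local MEME Suite install are available.

```r
# Arbitrary q-value cutoff chosen for illustration (an assumption, not a default).
qvalue_cutoff <- 0.05

# Guarded so the sketch is a no-op when memes or the MEME Suite is missing.
if (requireNamespace("memes", quietly = TRUE) && memes::meme_is_installed()) {
  motif <- universalmotif::create_motif("CCRAAAW")
  database <- system.file("extdata", "flyFactorSurvey_cleaned.meme", package = "memes")

  # Default behaviour: thresh is interpreted as an E-value cutoff.
  by_evalue <- memes::runTomTom(motif, database, thresh = 10, evalue = TRUE)

  # With evalue = FALSE, thresh is interpreted as a q-value cutoff (between 0 and 1).
  by_qvalue <- memes::runTomTom(motif, database, thresh = qvalue_cutoff, evalue = FALSE)

  # Use Pearson correlation instead of the default Euclidean distance.
  by_pearson <- memes::runTomTom(motif, database, dist = "pearson")
}
```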
data.frame of match results. Contains a best_match_motif column of universalmotif objects with the matched PWM from the database, a series of best_match_* columns describing the TomTom results of the match, and a tomtom list column storing the ranked list of possible matches to each motif. If a universalmotif data.frame is used as input, these columns are appended to the data.frame. If no matches are returned, the tomtom and best_match_motif columns will be set to NA and a message indicating this will print.
runTomTom will rank matches by significance and return a best match motif for each input (whose properties are stored in the best_match_* columns) as well as a ranked list of all possible matches stored in the tomtom list column.
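The shape of this result (one best match per input plus a ranked list column) can be illustrated with a toy base R sketch. It is not the package's implementation: the match names and p-values are taken from the example output on this page, while the column names match_name and match_pval are illustrative assumptions rather than the names memes necessarily uses.

```r
# Toy table of candidate matches for a single input motif, deliberately unsorted.
# Column names are hypothetical; values come from the example output.
matches <- data.frame(
  match_name = c("rib_SANGER_5", "CG12768_SANGER_5",
                 "Eip93F_SANGER_10", "Ets65A_SANGER_10"),
  match_pval = c(0.0015, 0.015, 1.91e-07, 0.0126),
  stringsAsFactors = FALSE
)

# Rank by significance: smallest p-value first.
ranked <- matches[order(matches$match_pval), ]
rownames(ranked) <- NULL

# The top-ranked row plays the role of the best match.
best <- ranked[1, ]

# One row per input motif: best_match_* summary columns plus a list column
# holding the full ranked table.
result <- data.frame(
  motif = "CCRAAAW",
  best_match_name = best$match_name,
  best_match_pval = best$match_pval,
  stringsAsFactors = FALSE
)
result$tomtom <- list(ranked)

# The ranked alternatives for the first input motif.
result$tomtom[[1]]
```

With real runTomTom() output the same access pattern applies: the best_match_* columns are read directly, and the ranked alternatives for the i-th input are in the i-th element of the tomtom column.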
Additional arguments
runTomTom() can accept all valid tomtom arguments passed to ... as described in the tomtom command-line reference. For convenience, below is a table of valid arguments, their default values, and their descriptions.
TomTom Flag | allowed values | default | description |
bfile | file path | NULL | path to background model for converting frequency matrix to log-odds score (not used when dist is set to "ed", "kullback", "pearson", or "sandelin") |
motif_pseudo | numeric | 0.1 | pseudocount to add to motifs |
xalph | logical | FALSE | convert alphabet of target database to alphabet of query database |
norc | logical | FALSE | Do not score reverse complements of motifs |
incomplete_scores | logical | FALSE | Compute scores using only aligned columns |
thresh | numeric | 0.5 | only report matches with significance values <= this value. Unless evalue = TRUE, this value must be < 1. |
internal | logical | FALSE | forces the shorter motif to be completely contained in the longer motif |
min_overlap | integer | 1 | only report matches that overlap by this number of positions or more. If query motif is smaller than this value, its width is used as the min overlap for that query |
time | integer | NULL | Maximum runtime in CPU seconds (default: no limit) |
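As a hedged sketch of how flags from the table above might be supplied through ..., the example below passes norc and internal as named arguments. The motif and database are reused from the example at the end of this page, the choice of flags is arbitrary, and the call only runs when memes and a local MEME Suite install are available.

```r
# Extra tomtom flags from the table above, collected as named R arguments.
extra_flags <- list(norc = TRUE, internal = TRUE)

# Guarded so the sketch is a no-op when memes or the MEME Suite is missing.
if (requireNamespace("memes", quietly = TRUE) && memes::meme_is_installed()) {
  motif <- universalmotif::create_motif("CCRAAAW")
  database <- system.file("extdata", "flyFactorSurvey_cleaned.meme", package = "memes")

  # Equivalent to runTomTom(motif, database, norc = TRUE, internal = TRUE):
  # the named flags are forwarded to tomtom through `...`.
  res <- do.call(
    memes::runTomTom,
    c(list(input = motif, database = database), extra_flags)
  )
}
```

Building the flags as a list is only a convenience for reusing the same settings across calls; writing them directly in the runTomTom() call works the same way.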
If you use runTomTom() in your analysis, please cite:
Shobhit Gupta, JA Stamatoyannopoulos, Timothy Bailey and William Stafford Noble, "Quantifying similarity between motifs", Genome Biology, 8(2):R24, 2007.
The MEME Suite is free for non-profit use, but for-profit users should purchase a license. See the MEME Suite Copyright Page for details.
if (meme_is_installed()) {
  motif <- universalmotif::create_motif("CCRAAAW")
  database <- system.file("extdata", "flyFactorSurvey_cleaned.meme", package = "memes")
  runTomTom(motif, database)
}
#> motif name consensus alphabet strand icscore type pseudocount
#> 1 <mot:motif> motif CCRAAAW DNA +- 12 PPM 0
#> bkg best_match_name best_match_altname
#> 1 0.25, 0.25, 0.25, 0.25 Eip93F_SANGER_10 Eip93F
#> best_db_name best_match_offset best_match_pval best_match_eval
#> 1 flyFactorSurvey_cleaned 4 1.91e-07 0.000106
#> best_match_qval best_match_strand
#> 1 0.000213 +
#> best_match_motif
#> 1 <S4 class ‘universalmotif’ [package “universalmotif”] with 20 slots>
#> tomtom
#> 1 Eip93F_SANGER_10, rib_SANGER_5, Ets65A_SANGER_10, CG12768_SANGER_5, Eip93F, rib, Ets65A, CG12768, <S4 class ‘universalmotif’ [package “universalmotif”] with 20 slots>, <S4 class ‘universalmotif’ [package “universalmotif”] with 20 slots>, <S4 class ‘universalmotif’ [package “universalmotif”] with 20 slots>, <S4 class ‘universalmotif’ [package “universalmotif”] with 20 slots>, flyFactorSurvey_cleaned, flyFactorSurvey_cleaned, flyFactorSurvey_cleaned, flyFactorSurvey_cleaned, 4, 1, 0, 5, 1.91e-07, 0.0015, 0.0126, 0.015, 0.000106, 0.832, 7, 8.36, 0.000213, 0.832, 1, 1, +, +, +, +
#>
#> [Hidden empty columns: altname, family, organism, nsites, bkgsites,
#> pval, qval, eval.]