A lightweight wrapper around `Biostrings::getSeq()` that returns a named `DNAStringSet` from input genomic coordinates.
get_sequence(regions, genome, score_column, ...)
regions: a GRanges or GRangesList object. A data.frame is also accepted as long as it can be coerced to a GRanges object, as is a string in the format "chr:start-end" (NOTE: use 1-based closed intervals, not BED-format 0-based half-open intervals).
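The two coordinate conventions differ only in how the start is counted. A BED record with 0-based start 99 and exclusive end 200 covers the same 101 bases as the 1-based closed string "chr2L:100-200", which matches the width of 101 reported for that region in the examples. The sketch below shows the conversion; the helper `bed_to_region_string()` is hypothetical and not part of memes.

```r
# Hypothetical helper (not part of memes): convert BED-style coordinates
# (0-based start, half-open end) into 1-based, closed "chr:start-end"
# strings suitable for the `regions` argument of get_sequence().
bed_to_region_string <- function(chrom, start, end) {
  # BED starts are 0-based, so add one. A BED end is exclusive in 0-based
  # space, which is the same number as the inclusive end in 1-based space.
  # sprintf("%d") avoids scientific notation (e.g. "1e+05") for round numbers.
  sprintf("%s:%d-%d", chrom, as.integer(start) + 1L, as.integer(end))
}

bed_to_region_string("chr2L", 99, 200)
#> [1] "chr2L:100-200"

# Both conventions describe the same number of bases
(200 - 99) == (200 - 100 + 1)
#> [1] TRUE
```

The helper is vectorised, so whole BED columns can be passed at once. Note that BED files read with `rtracklayer::import()` are already converted to 1-based GRanges and need no adjustment.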
genome: an object of any type listed by `showMethods(Biostrings::getSeq)`, commonly a BSgenome object or a fasta file. Used to look up the sequences of `regions`.
score_column: optional name of a column in `mcols(regions)` containing a score that is added to the fasta header of each entry. Used when running [runAme()] in partitioning mode. (default: `NULL`)
...: additional arguments passed to `Biostrings::getSeq()`.
Returns a `Biostrings::DNAStringSet` object whose names correspond to the genomic coordinates of each region. If the input is a list object, the output is a `Biostrings::BStringSetList` whose names correspond to the input list names.
# using character string as coordinates
# using BSgenome object for genome
drosophila.genome <- BSgenome.Dmelanogaster.UCSC.dm6::BSgenome.Dmelanogaster.UCSC.dm6
get_sequence("chr2L:100-200", drosophila.genome)
#> DNAStringSet object of length 1:
#> width seq names
#> [1] 101 TGCCAACATATTGTGCTCTTTGA...GCCGCTAATCAGAAATAAATTCA chr2L:100-200
# using GRanges object for coordinates
data(example_peaks, package = "memes")
get_sequence(example_peaks, drosophila.genome)
#> DNAStringSet object of length 10:
#> width seq names
#> [1] 207 ATCAGAATGTTATATATTCAAGA...GTTTCTAGAATAGCCCCGGTCT chr3L:14551117-14...
#> [2] 201 TGGGCCCATTTTTATCATTTTCC...CCCCTCACATTTTAATTGTTGT chr3L:14625651-14...
#> [3] 265 AACAAAAAAAGGAAATAAAATGA...TTGTCACACCGCTTTTACACAT chr3L:14634333-14...
#> [4] 277 GAGCTGATTTTAGTTTACTGCGC...ACTGCATCCGCCGACTGCCCCT chr3L:14636768-14...
#> [5] 230 ATGGAGCGAGATAACATTTTGCC...CCAAAGAAAGAAAGAGATGCCT chr3L:14638220-14...
#> [6] 344 TTCACTTACGTCACGCAGCAATT...TTGAGTGAATGGGTATGAATGA chr3L:14743289-14...
#> [7] 379 AAGAAAACAGAGCAGCGCTTAAA...TACATCTTGTACTTTTCGGACT chr3L:14755114-14...
#> [8] 519 AAGGAGAAGGAGAGTGGCACTTC...TTTTATTAAAATGTATGAAACT chr3L:14759085-14...
#> [9] 247 TCTTCCACACGCACGTTCGGCAG...GGACACAACAGATCCAGATACA chr3L:14978519-14...
#> [10] 1424 CAGCGGTAGCCGTGGAGCCACAG...CCTGATGGCGGATGGCTTGACT chr3L:15104231-15...