A light wrapper around `Biostrings::getSeq` that returns named DNAStringSets from input genomic coordinates.

get_sequence(regions, genome, score_column, ...)

Arguments

regions

A GRanges or GRangesList object. A data.frame is also accepted as long as it can be coerced to a GRanges object, as is a string in the format "chr:start-end" (NOTE: use 1-based closed intervals, not BED format 0-based half-open intervals).

genome

Object of any type listed as valid by `showMethods(Biostrings::getSeq)`. Commonly a BSgenome object or a FASTA file. Used to look up the sequences of `regions`.

score_column

Optional name of a column (in `mcols()` of `regions`) containing a score that is added to the FASTA header of each entry. Useful when running `runAme()` in partitioning mode. (default: `NULL`)

...

Additional arguments passed to `Biostrings::getSeq`.

Value

`Biostrings::DNAStringSet` object with names corresponding to genomic coordinates. If input is a list object, output will be a `Biostrings::BStringSetList` with list names corresponding to input list names.
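
The list behaviour described above can be sketched as follows. This is a minimal illustration, not output copied from the package: it assumes that memes, GenomicRanges, and BSgenome.Dmelanogaster.UCSC.dm6 are installed, and the group names `first_half` and `second_half` are hypothetical labels chosen for the example.

```r
library(memes)
library(GenomicRanges)

drosophila.genome <- BSgenome.Dmelanogaster.UCSC.dm6::BSgenome.Dmelanogaster.UCSC.dm6
data(example_peaks, package = "memes")

# Hypothetical grouping of the 10 example peaks into two named sets.
peak_groups <- GRangesList(
  first_half = example_peaks[1:5],
  second_half = example_peaks[6:10]
)

# One set of sequences is returned per list element.
grouped_sequences <- get_sequence(peak_groups, drosophila.genome)

# The list names should match the names given to the input list.
names(grouped_sequences)
length(grouped_sequences)
```

Naming the list elements up front keeps each group identifiable in the returned object.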

Examples

# using character string as coordinates
# using BSgenome object for genome
drosophila.genome <- BSgenome.Dmelanogaster.UCSC.dm6::BSgenome.Dmelanogaster.UCSC.dm6
get_sequence("chr2L:100-200", drosophila.genome)
#> DNAStringSet object of length 1:
#>     width seq                                               names               
#> [1]   101 TGCCAACATATTGTGCTCTTTGA...GCCGCTAATCAGAAATAAATTCA chr2L:100-200
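
Because `regions` expects 1-based closed intervals, coordinates taken from a BED file need their start shifted by one before use. The sketch below shows that arithmetic in base R only. The BED record and the variable names are hypothetical, and passing the result to `get_sequence()` is left as a comment since it requires a genome object.

```r
# Hypothetical BED-style record: 0-based start, end not included in the interval.
bed <- data.frame(
  chrom = "chr2L",
  chromStart = 99L,
  chromEnd = 200L,
  stringsAsFactors = FALSE
)

# Shift only the start by one; the BED end already equals the 1-based closed end.
regions_df <- data.frame(
  seqnames = bed$chrom,
  start = bed$chromStart + 1L,
  end = bed$chromEnd,
  stringsAsFactors = FALSE
)

# Build "chr:start-end" strings in the format described for `regions`.
region_strings <- paste0(regions_df$seqnames, ":", regions_df$start, "-", regions_df$end)

# Width of a 1-based closed interval is end - start + 1.
region_widths <- regions_df$end - regions_df$start + 1L

print(region_strings)
#> [1] "chr2L:100-200"
print(region_widths)
#> [1] 101

# With a genome object available, the string could then be used as:
# get_sequence(region_strings, drosophila.genome)
```

The converted interval matches the 101 bp region used in the example above.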

# using GRanges object for coordinates
data(example_peaks, package = "memes")
get_sequence(example_peaks, drosophila.genome)
#> DNAStringSet object of length 10:
#>      width seq                                              names               
#>  [1]   207 ATCAGAATGTTATATATTCAAGA...GTTTCTAGAATAGCCCCGGTCT chr3L:14551117-14...
#>  [2]   201 TGGGCCCATTTTTATCATTTTCC...CCCCTCACATTTTAATTGTTGT chr3L:14625651-14...
#>  [3]   265 AACAAAAAAAGGAAATAAAATGA...TTGTCACACCGCTTTTACACAT chr3L:14634333-14...
#>  [4]   277 GAGCTGATTTTAGTTTACTGCGC...ACTGCATCCGCCGACTGCCCCT chr3L:14636768-14...
#>  [5]   230 ATGGAGCGAGATAACATTTTGCC...CCAAAGAAAGAAAGAGATGCCT chr3L:14638220-14...
#>  [6]   344 TTCACTTACGTCACGCAGCAATT...TTGAGTGAATGGGTATGAATGA chr3L:14743289-14...
#>  [7]   379 AAGAAAACAGAGCAGCGCTTAAA...TACATCTTGTACTTTTCGGACT chr3L:14755114-14...
#>  [8]   519 AAGGAGAAGGAGAGTGGCACTTC...TTTTATTAAAATGTATGAAACT chr3L:14759085-14...
#>  [9]   247 TCTTCCACACGCACGTTCGGCAG...GGACACAACAGATCCAGATACA chr3L:14978519-14...
#> [10]  1424 CAGCGGTAGCCGTGGAGCCACAG...CCTGATGGCGGATGGCTTGACT chr3L:15104231-15...
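
The `score_column` argument can be sketched in the same way. This is an illustrative example under stated assumptions: memes, GenomicRanges, and BSgenome.Dmelanogaster.UCSC.dm6 are assumed to be installed, and the `peak_score` column with its placeholder values is hypothetical, added here only so that there is a score to pass along. No output is shown because the exact header format is not documented in this section.

```r
library(memes)
library(GenomicRanges)

drosophila.genome <- BSgenome.Dmelanogaster.UCSC.dm6::BSgenome.Dmelanogaster.UCSC.dm6
data(example_peaks, package = "memes")

# Add a hypothetical score column to the metadata columns (mcols) of the peaks.
# The values are placeholders, not real peak scores.
scored_peaks <- example_peaks
scored_peaks$peak_score <- seq_along(scored_peaks)

# Ask get_sequence() to add the score to the FASTA header of each entry.
scored_sequences <- get_sequence(
  scored_peaks,
  drosophila.genome,
  score_column = "peak_score"
)

# Inspect the resulting entry names.
head(names(scored_sequences))
```

In practice the score would come from the peak caller or another per-region measurement rather than from a placeholder sequence of integers.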