mira.tl.get_motif_hits_in_peaks

mira.tl.get_motif_hits_in_peaks(adata, chrom='chr', start='start', end='end', pvalue_threshold=0.0001, *, genome_fasta, factor_type='motifs')

Scan peak sequences for motif hits using JASPAR position frequency matrices and MOODS 3. A motif is recorded as a hit in a peak when the p-value of its match falls below the given threshold.

Parameters
adata : anndata.AnnData

AnnData object of chromatin accessibility. Peak locations are stored in .var, in the columns named by the chrom, start, and end parameters, which hold the chromosome, start coordinate, and end coordinate of each peak, respectively.

genome_fasta : str

Path to the FASTA file of your organism's genome.

chrom : str, default = “chr”

The column in adata.var corresponding to the chromosome of peaks.

start : str, default = “start”

The column in adata.var corresponding to the start coordinate of peaks.

end : str, default = “end”

The column in adata.var corresponding to the end coordinate of peaks.

pvalue_threshold : float > 0, default = 0.0001

Adjusted p-value threshold for calling a motif hit within a peak.

Returns
adata : anndata.AnnData

.varm["motifs_hits"] : scipy.spmatrix[float] of shape (n_motifs, n_peaks)

Called motif hits for each peak. Each value is the affinity score of a motif for a sequence. Non-significant hits are left empty in the sparse matrix.

.uns['motifs'] : dict of type {str : list}

Dictionary of metadata for motifs. Each key is an attribute. Attributes recorded for each motif are the ID, name, parsed factor name (for lookup in expression data), and whether expression data exists for that factor. The columns are labeled id, name, parsed_name, and in_expr_data, respectively.
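The sparse layout of the hits matrix can be explored without running a motif scan. The following is a minimal sketch, not MIRA's own code: it builds a toy SciPy sparse matrix with the documented (n_motifs, n_peaks) orientation and made-up scores, then counts and binarizes the stored hits. The variable names and values are assumptions for illustration only.

```python
import numpy as np
from scipy import sparse

# Toy stand-in for adata.varm["motifs_hits"]: 3 motifs x 4 peaks.
# The scores are invented; zeros are not stored, which mirrors
# "non-significant hits are left empty in the sparse matrix".
dense = np.array([
    [0.0, 7.2, 0.0, 5.1],
    [0.0, 0.0, 0.0, 0.0],
    [9.4, 0.0, 0.0, 6.3],
])
hits = sparse.csr_matrix(dense)

# Number of peaks in which each motif was called (stored entries per row).
peaks_per_motif = hits.getnnz(axis=1)

# Binarized copy: 1.0 wherever a hit is stored, nothing stored elsewhere.
binary_hits = hits.copy()
binary_hits.data[:] = 1.0

# Column indices of the peaks hit by the motif in row 2.
peaks_for_motif_2 = hits[2].indices

print(peaks_per_motif)
print(binary_hits.toarray())
print(peaks_for_motif_2)
```

Working on the stored entries directly (`getnnz`, `.data`, `.indices`) avoids densifying the matrix, which matters when the real matrix covers hundreds of motifs and many thousands of peaks.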

Note

To retrieve the metadata for motifs, one may use the method mira.utils.fetch_factor_meta(adata).

Currently, MIRA ships with the 2021 JASPAR core vertebrates collection. In the future, this will be expanded to include options for updated JASPAR collections and user-provided PFMs.
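The motif metadata described under Returns is a dictionary of parallel lists, one list per attribute. As a rough sketch of how such a structure can be regrouped into one record per motif, the snippet below uses a hand-written toy dictionary with the four documented keys. The helper `to_records` and the example values are hypothetical and are not part of MIRA.

```python
# Toy stand-in for adata.uns["motifs"]; the values are illustrative only.
motif_meta = {
    "id": ["MA0139.1", "MA0002.2"],
    "name": ["CTCF", "RUNX1"],
    "parsed_name": ["CTCF", "RUNX1"],
    "in_expr_data": [True, False],
}


def to_records(meta):
    """Turn a dict of equal-length lists into a list with one dict per motif."""
    keys = list(meta)
    lengths = {len(meta[key]) for key in keys}
    if len(lengths) > 1:
        raise ValueError("all metadata lists must have the same length")
    return [dict(zip(keys, row)) for row in zip(*(meta[key] for key in keys))]


records = to_records(motif_meta)

# Keep only factors flagged as present in the expression data.
expressed = [rec["parsed_name"] for rec in records if rec["in_expr_data"]]

print(records[0])
print(expressed)
```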

Examples

>>> atac_data.var
...                 chr   start     end
...    chr1:9778-10670     chr1    9778   10670
...    chr1:180631-181281  chr1  180631  181281
...    chr1:183970-184795  chr1  183970  184795
...    chr1:190991-191935  chr1  190991  191935
>>> mira.tl.get_motif_hits_in_peaks(atac_data,
...    chrom = "chr", start = "start", end = "end",
...    genome_fasta = "~/genomes/hg38/hg38.fa"
... )
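
If the peak table only carries names of the form chrom:start-end, as in the index shown above, the three coordinate columns have to be derived before calling the function. The sketch below is one possible way to do that with the standard library. The helper `parse_peak_name` is hypothetical, and the assumed name format is taken from the example index rather than from any MIRA requirement.

```python
import re

# Assumed peak-name format: "<chrom>:<start>-<end>", e.g. "chr1:9778-10670".
PEAK_NAME = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_peak_name(name):
    """Split a peak name into (chrom, start, end), with integer coordinates."""
    match = PEAK_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized peak name: {name!r}")
    start, end = int(match["start"]), int(match["end"])
    if start >= end:
        raise ValueError(f"start must be smaller than end in {name!r}")
    return match["chrom"], start, end


peak_names = ["chr1:9778-10670", "chr1:180631-181281"]

# Collect the values column by column, ready to assign to adata.var.
columns = {"chr": [], "start": [], "end": []}
for name in peak_names:
    chrom, start, end = parse_peak_name(name)
    columns["chr"].append(chrom)
    columns["start"].append(start)
    columns["end"].append(end)

print(columns)
```

Each list in `columns` can then be assigned to the matching column of the peak table, using the column names passed as chrom, start, and end.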