mira.tl.get_ChIP_hits_in_peaks

mira.tl.get_ChIP_hits_in_peaks(adata, chrom='chr', start='start', end='end', species='mm10', *, factor_type='chip')

Find ChIP hits that overlap with accessible regions using CistromeDB’s catalogue of publicly available datasets.

Parameters
adata : anndata.AnnData

AnnData of accessibility features.

species : {“hg38”, “mm10”}, default = “mm10”

Organism. CistromeDB’s catalogue contains samples for hg38 and mm10.

chrom : str, default = “chr”

The column in adata.var that holds the chromosome of each peak.

start : str, default = “start”

The column in adata.var that holds the start coordinate of each peak.

end : str, default = “end”

The column in adata.var that holds the end coordinate of each peak.

Returns
adata : anndata.AnnData
.varm[“chip_hits”] : scipy.spmatrix[float] of shape (n_motifs, n_peaks)

Called ChIP hits for each peak. Non-significant hits are left empty in the sparse matrix.

.uns[“chip”] : dict of type {str : list}

Dictionary of metadata for ChIP samples. Each key is an attribute. The attributes recorded for each ChIP sample are the ID, name, parsed factor name (for lookup in expression data), and whether expression data exists for that factor. The keys are labeled id, name, parsed_name, and in_expr_data, respectively.
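
The metadata layout described above is a dictionary of equal-length lists, with one position per ChIP sample. The following is a minimal sketch of turning that layout into per-sample records. The dictionary here is a hand-written stand-in for the stored metadata, the IDs and factor names are invented placeholders rather than real CistromeDB entries, and to_records is a hypothetical helper that is not part of MIRA.

```python
# Hypothetical stand-in for the metadata dictionary; all values are invented.
chip_meta = {
    "id": ["sample_1", "sample_2", "sample_3"],
    "name": ["CTCF", "SPI1", "GATA1"],
    "parsed_name": ["CTCF", "SPI1", "GATA1"],
    "in_expr_data": [True, False, True],
}


def to_records(meta):
    """Convert a dict of equal-length lists into a list of per-sample dicts."""
    keys = list(meta)
    lengths = {len(meta[key]) for key in keys}
    if len(lengths) > 1:
        raise ValueError("all attribute lists must have the same length")
    # zip(*columns) walks the lists in parallel, yielding one row per sample.
    return [dict(zip(keys, row)) for row in zip(*(meta[key] for key in keys))]


records = to_records(chip_meta)

# Keep only the factors that can be looked up in the expression data.
with_expression = [r["parsed_name"] for r in records if r["in_expr_data"]]
print(with_expression)
```

Working with records makes it easy to filter samples on in_expr_data before relating ChIP hits to expression.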

Note

To retrieve the metadata for ChIP samples, use mira.utils.fetch_factor_meta(adata, factor_type = “chip”). Methods that interact with binding site data always have a factor_type parameter. It defaults to “motifs”, so when using ChIP data, specify factor_type = “chip”.

Examples

>>> atac_data.var
...                       chr   start     end
...    chr1:9778-10670     chr1    9778   10670
...    chr1:180631-181281  chr1  180631  181281
...    chr1:183970-184795  chr1  183970  184795
...    chr1:190991-191935  chr1  190991  191935
>>> mira.tl.get_ChIP_hits_in_peaks(atac_data, 
...    chrom = "chr", start = "start", end = "end",
...    species = "hg38")
...    Grabbing hg38 data (~15 minutes):
...       Downloading from database    
...       Done
...    Loading gene info ...
...    Validating user-provided regions ...
...    WARNING: 71 regions encounted from unknown chromsomes: KI270728.1,GL000194.1,GL000205.2,GL000195.1,GL000219.1,KI270734.1,GL000218.1,KI270721.1,KI270726.1,KI270711.1,KI270713.1
...    INFO:mira.adata_interface.regulators:Added key to varm: chip_hits
...    INFO:mira.adata_interface.regulators:Added key to uns: chip
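
The warning in the output above concerns regions on contigs that the function does not recognize. One way to spot such regions beforehand is to parse peak names of the form chrom:start-end, as in the index of the example table, and check the chromosome name. The sketch below assumes that naming scheme. parse_region and is_canonical are hypothetical helpers that are not part of MIRA, the "chr"-prefix rule is an assumed heuristic that suits contig names such as KI270728.1, and the contig coordinates are invented.

```python
def parse_region(name):
    """Split a peak name like 'chr1:9778-10670' into (chrom, start, end)."""
    chrom, _, span = name.rpartition(":")
    start, _, end = span.partition("-")
    if not chrom or not start.isdigit() or not end.isdigit():
        raise ValueError(f"cannot parse region name: {name!r}")
    return chrom, int(start), int(end)


def is_canonical(chrom):
    """Assumed heuristic: treat names with a 'chr' prefix as known chromosomes."""
    return chrom.startswith("chr")


peak_names = [
    "chr1:9778-10670",
    "chr1:180631-181281",
    "KI270728.1:1000-1500",  # unplaced contig with invented coordinates
]

regions = [parse_region(name) for name in peak_names]

# Regions on recognized chromosomes, and the set of contigs that would be flagged.
kept = [region for region in regions if is_canonical(region[0])]
dropped = sorted({region[0] for region in regions if not is_canonical(region[0])})
print(kept)
print(dropped)
```

A boolean mask built from is_canonical could then be used to subset the AnnData object along its variables before calling the function, if excluding those regions is acceptable for the analysis.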