Topic Modeling#

mira.topics.make_model(n_samples, ...[, ...])

Instantiates a topic model, which learns regulatory "topics" from single-cell RNA-seq or ATAC-seq data.

mira.topics.ExpressionTopicModel(*args, **kwargs)

Generic class for topic models that analyze gene expression data.

mira.topics.AccessibilityTopicModel(*args, ...)

Generic class for topic models that analyze chromatin accessibility data.

mira.topics.BayesianTuner(*, model, ...[, ...])

Tuner object that chooses the number of topics and the appropriate regularization to produce a model that best fits the user's dataset.

mira.topics.gradient_tune(model, data[, ...])

Tune the number of topics using a gradient-based estimator based on the Dirichlet Process model.

mira.topics.Redis([url, heartbeat_interval, ...])

Connects the Optuna hyperparameter optimization instance to a Redis server backend.

Regulatory Potential Modeling#

mira.rp.LITE_Model(*, expr_model, ...[, ...])

Container for multiple regulatory potential (RP) LITE models.

mira.rp.NITE_Model(*, expr_model, ...[, ...])

Container for multiple regulatory potential (RP) NITE models.

mira.rp_model.rp_model.GeneModel(*, gene[, ...])

Gene-level RP model object.


mira.time.normalize_diffmap(adata[, ...])

Calculates the eigengap heuristic for selecting the optimal number of diffusion components to represent the dataset.

mira.pl.plot_eigengap(adata[, basis, ...])

Plots the eigengap, the difference between consecutive eigenvalues, for estimating the optimal number of diffusion components to represent the dataset.
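The eigengap idea can be illustrated without MIRA. The sketch below is a minimal, standard-library example that assumes a made-up list of eigenvalues and picks the number of components at the largest gap. The function names and the "largest gap" rule are illustrative assumptions, not MIRA's own implementation, which may rank or threshold gaps differently.

```python
def eigengaps(eigenvalues):
    """Return the differences between consecutive eigenvalues (sorted descending)."""
    vals = sorted(eigenvalues, reverse=True)
    return [vals[i] - vals[i + 1] for i in range(len(vals) - 1)]


def components_by_largest_gap(eigenvalues):
    """Keep every component that comes before the largest eigengap."""
    gaps = eigengaps(eigenvalues)
    largest = max(range(len(gaps)), key=gaps.__getitem__)
    return largest + 1


# Hypothetical eigenvalues of a diffusion map (illustrative numbers only).
example_eigenvalues = [0.98, 0.95, 0.91, 0.55, 0.50, 0.47]
gaps = eigengaps(example_eigenvalues)
n_components = components_by_largest_gap(example_eigenvalues)
print(n_components)
```

Here the drop between the third and fourth eigenvalues dominates, so the first three components would be kept under this rule.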

mira.time.get_connected_components(adata[, ...])

Finds subgraphs in diffusion map or KNN graph.
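As a rough illustration of what finding subgraphs in a KNN graph involves, the sketch below labels connected components with a breadth-first search. It assumes a symmetric neighbor dictionary with hypothetical cell names in which every neighbor also appears as a key; it is a generic graph routine and does not reproduce how MIRA stores or traverses its graph.

```python
from collections import deque


def connected_components(neighbors):
    """Label each node with the index of the connected component it belongs to."""
    labels = {}
    current = 0
    for start in neighbors:
        if start in labels:
            continue
        # Breadth-first search from an unlabeled node marks one whole component.
        labels[start] = current
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for other in neighbors[node]:
                if other not in labels:
                    labels[other] = current
                    queue.append(other)
        current += 1
    return labels


# Hypothetical undirected KNN graph with two disconnected groups of cells.
knn_graph = {
    "cell1": ["cell2"],
    "cell2": ["cell1", "cell3"],
    "cell3": ["cell2"],
    "cell4": ["cell5"],
    "cell5": ["cell4"],
}
component_labels = connected_components(knn_graph)
print(component_labels)
```

Cells that end up in different components cannot be linked by any path through the graph, which is why they are usually analyzed as separate trajectories.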

mira.time.get_transport_map(adata[, ...])

Calculate pseudotime and a stochastic forward Markov model of differentiation.

mira.time.find_terminal_cells(adata[, ...])

Uses transport map to identify terminal cells where differentiation progress reaches a steady state.

mira.time.get_branch_probabilities(adata[, ...])

Simulate forward random walks through the transport map, modeling a stochastic differentiation process.
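The general technique of estimating branch probabilities from forward random walks can be sketched on a toy Markov chain. Everything below is an assumption for illustration: the state names, the transition probabilities, the `simulate_branch_probabilities` helper, and the choice to drop walks that never reach a terminal state. MIRA's transport map operates on cells and may compute these probabilities differently.

```python
import random


def simulate_branch_probabilities(transitions, terminal, start,
                                  n_walks=2000, max_steps=1000, seed=0):
    """Estimate the probability of ending in each terminal state by simulation."""
    rng = random.Random(seed)
    counts = {state: 0 for state in terminal}
    for _ in range(n_walks):
        state = start
        for _ in range(max_steps):
            if state in terminal:
                counts[state] += 1
                break
            # Draw the next state according to the outgoing transition probabilities.
            next_states = list(transitions[state])
            weights = [transitions[state][s] for s in next_states]
            state = rng.choices(next_states, weights=weights)[0]
    finished = sum(counts.values())
    if finished == 0:
        raise ValueError("no walk reached a terminal state")
    # Walks that ran out of steps are ignored in the estimate.
    return {state: count / finished for state, count in counts.items()}


# Hypothetical three-state chain: one progenitor state and two terminal fates.
toy_transitions = {
    "progenitor": {"progenitor": 0.5, "fate_A": 0.35, "fate_B": 0.15},
}
branch_probs = simulate_branch_probabilities(
    toy_transitions, terminal={"fate_A", "fate_B"}, start="progenitor"
)
print(branch_probs)
```

For this chain the exact probability of reaching `fate_A` is 0.35 / (0.35 + 0.15) = 0.7, so the simulated estimate should land close to that value.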

mira.time.get_tree_structure(adata[, ...])

Parse tree structure from terminal state probabilities.

mira.time.trace_differentiation(adata[, ...])

Starting from a group of initial cells, trace the diffusion over time through the Markov chain model of differentiation.


mira.pl.plot_stream(adata[, data, layers, ...])

Plot a streamgraph representation of a differentiation or continuous process.

mira.pl.plot_chromatin_differential(adata[, ...])

Plot the expression, local accessibility prediction, chromatin differential, and LITE vs.


Plots chromatin differential scatterplot with more flexibility for coloring cells.


Plot geneset enrichment results.

mira.pl.compare_driver_TFs_plot(adata[, ...])

Use pISD (probabilistic in silico deletion) association scores between transcription factors and genes to compare and contrast the driving regulators of two genesets.

mira.pl.plot_topic_contributions(...[, ...])

Utility plot for choosing a representative number of topics for a dataset in conjunction with the gradient_tune method.

mira.pl.plot_disentanglement(adata[, gene, ...])


mira.tl.get_motif_hits_in_peaks(adata[, ...])

Scan peak sequences for motif hits given by JASPAR position frequency matrices using MOODS 3.

mira.tl.get_ChIP_hits_in_peaks(adata[, ...])

Find ChIP hits that overlap with accessible regions using CistromeDB's catalogue of publicly available datasets.


Post genelist to Enrichr for comparison against pre-compiled ontologies.

mira.tl.fetch_ontology(list_id[, ontology])

Fetch enrichment results from an ontology.

mira.tl.fetch_ontologies(list_id[, ontologies])

Fetch enrichment results from ontologies.

mira.tl.get_distance_to_TSS(adata[, ...])

Given TSS data for genes, find the distance between the TSS of each gene and the center of each accessible site measured in the data.
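A minimal sketch of the distance calculation described here, using plain tuples instead of an AnnData object. The gene names, coordinates, and the `distances_to_tss` helper are hypothetical, and the conventions chosen (signed distance as peak center minus TSS, no strand handling, `None` across chromosomes) are assumptions rather than MIRA's documented behavior.

```python
def peak_center(start, end):
    """Midpoint of a peak interval, rounded down to an integer position."""
    return (start + end) // 2


def distances_to_tss(tss_by_gene, peaks):
    """For each gene, list the signed distance from its TSS to each peak center.

    Peaks on a different chromosome get None, since no distance is defined.
    """
    table = {}
    for gene, (tss_chrom, tss_pos) in tss_by_gene.items():
        row = []
        for chrom, start, end in peaks:
            if chrom != tss_chrom:
                row.append(None)
            else:
                row.append(peak_center(start, end) - tss_pos)
        table[gene] = row
    return table


# Hypothetical TSS annotations and accessible sites (chrom, start, end).
tss_by_gene = {"GeneA": ("chr1", 1000), "GeneB": ("chr2", 5000)}
peaks = [("chr1", 400, 600), ("chr1", 1900, 2100), ("chr2", 4900, 5000)]
distances = distances_to_tss(tss_by_gene, peaks)
print(distances)
```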

mira.tl.get_NITE_score_genes(adata[, ...])

Calculates the NITE score (Non-locally Influenced Transcriptional Expression) for each gene.

mira.tl.get_NITE_score_cells(adata[, ...])

Calculates the NITE score (Non-locally Influenced Transcriptional Expression) for each cell.

mira.tl.get_chromatin_differential(adata, *)

The per-cell difference between the LITE and NITE model predictions for a gene is called the "chromatin differential", and reflects the over- or under-estimation of expression levels by local chromatin.
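To make the definition concrete, the sketch below computes a per-cell difference for one gene from two hypothetical prediction vectors. The sign convention (LITE minus NITE) and the use of a plain difference on the raw values are assumptions made for illustration; MIRA may work on a transformed scale, so treat this only as a picture of the idea.

```python
def chromatin_differential(lite_prediction, nite_prediction):
    """Per-cell difference between LITE and NITE predictions for one gene.

    Under this sketch's convention, positive values mean local chromatin (LITE)
    predicts more expression than the NITE model does; negative values mean less.
    """
    if len(lite_prediction) != len(nite_prediction):
        raise ValueError("predictions must cover the same cells")
    return [lite - nite for lite, nite in zip(lite_prediction, nite_prediction)]


# Hypothetical predictions for three cells (illustrative numbers only).
lite = [2.0, 1.0, 0.5]
nite = [1.5, 1.0, 1.25]
differential = chromatin_differential(lite, nite)
print(differential)
```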

Joint Representation#

mira.utils.make_joint_representation(adata1, ...)

Finds common cells between two dataframes and concatenates features to form the joint representation.
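The two steps named here, intersecting cells and concatenating features, can be sketched with plain dictionaries keyed by cell barcode. The barcodes, feature values, and the `make_joint` helper are hypothetical; the real function works on AnnData objects and may transform the features before joining them.

```python
def make_joint(features_a, features_b):
    """Concatenate per-cell feature vectors for cells present in both inputs."""
    # Keep the order of the first input so results are reproducible.
    shared = [cell for cell in features_a if cell in features_b]
    return {cell: list(features_a[cell]) + list(features_b[cell]) for cell in shared}


# Hypothetical per-cell topic compositions from two modalities.
expression_features = {"AAAC": [0.7, 0.3], "CCGT": [0.2, 0.8], "GGTA": [0.5, 0.5]}
accessibility_features = {"CCGT": [0.1, 0.6, 0.3], "AAAC": [0.4, 0.4, 0.2]}
joint = make_joint(expression_features, accessibility_features)
print(joint)
```

Cells measured in only one modality (`GGTA` above) are dropped, so the joint representation has one row per shared cell and as many columns as both feature sets combined.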


For each cell, calculate the pointwise mutual information between RNA and ATAC topic compositions.

mira.tl.summarize_mutual_information(adata1, ...)

Calculate the total mutual information between expression and accessibility topics.

mira.tl.get_relative_norms(adata1, adata2, *)

One may assume that the influence of the two modalities on the joint representation is driven by the relative magnitude of the norm of these modalities' embeddings.

mira.tl.get_topic_cross_correlation(adata1, ...)

Get a DataFrame of Pearson cross-correlations between expression and accessibility topics.
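A small sketch of what a topic cross-correlation table contains: one Pearson coefficient for every pair of expression and accessibility topics, computed across cells. The topic names, per-cell values, and helper functions are hypothetical, and nested dictionaries stand in for the DataFrame the real function returns.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def cross_correlation(expr_topics, atac_topics):
    """Correlate every expression topic with every accessibility topic."""
    return {
        expr_name: {
            atac_name: pearson(expr_values, atac_values)
            for atac_name, atac_values in atac_topics.items()
        }
        for expr_name, expr_values in expr_topics.items()
    }


# Hypothetical topic activations across the same four cells.
expr_topics = {"rna_0": [0.1, 0.2, 0.3, 0.4], "rna_1": [0.9, 0.8, 0.7, 0.6]}
atac_topics = {"atac_0": [0.2, 0.4, 0.6, 0.8]}
correlations = cross_correlation(expr_topics, atac_topics)
print(correlations)
```

In this toy input `rna_0` rises with `atac_0` and `rna_1` falls as `atac_0` rises, so the two coefficients sit at opposite ends of the range.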



Makes Jupyter notebooks take up whole screen.


Changes stderr color to blue in Jupyter notebooks.

mira.utils.subset_factors(atac_adata, *, ...)

Subset which transcription factor binding annotations are used in downstream analysis.


Returns TSS metadata from mira.tl.get_distance_to_TSS.


Returns matrix of distances between gene transcription start sites and peaks.

mira.utils.fetch_factor_meta(atac_adata[, ...])

Fetch metadata associated with transcription factor binding annotations.

mira.utils.fetch_factor_hits(atac_adata[, ...])

Returns AnnData object of transcription factor binding annotations.

mira.utils.fetch_binding_sites(atac_adata[, ...])

Returns .var field of atac_adata, but subset to only contain peaks which are predicted to bind a certain transcription factor.


Display GIF in Jupyter notebook.



SHARE-seq skin dataset used in paper and tutorials.


Streamgraph tutorial data.


Pseudotime trajectory inference tutorial data.


Topic models trained on SHARE-seq dataset.


Raw count matrices for SHARE-seq skin dataset.


Annotated and modeled count matrices for SHARE-seq skin dataset.


Example RP models for the tutorial.


Count matrix and topic models for the mouse brain dataset.


Small synthetic test dataset for topic model tuning.


Chromosome sizes for mm10 genome.


Non-redundant canonical TSS locations for mm10 genome.


Chromosome sizes for hg38 genome.


Chromosome sizes for hg38 genome.