API#
Topic Modeling#
|
Instantiates a topic model, which learns regulatory "topics" from single-cell RNA-seq or ATAC-seq data. |
|
Generic class for topic models for analyzing gene expression data. |
|
Generic class for topic models for analyzing chromatin accessibility data. |
|
A SpeedyTuner object chooses the number of topics and the appropriate regularization to produce a model that best fits the user's dataset. |
|
Tune the number of topics using a gradient-based estimator based on the Dirichlet Process model. |
|
Connects the optuna hyperparameter optimization instance to a REDIS server backend. |
Regulatory Potential Modeling#
|
Container for multiple regulatory potential (RP) LITE models. |
|
Container for multiple regulatory potential (RP) NITE models. |
|
Gene-level RP model object. |
Pseudotime#
|
Calculates the eigengap heuristic for selecting the optimal number of diffusion components to represent the dataset. |
|
Plots the eigengap, the difference between consecutive eigenvalues, for estimating the optimal number of diffusion components to represent the dataset. |
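The eigengap idea described in the two entries above can be illustrated with a minimal, library-independent sketch. This is not the package's implementation: the function name `eigengap` and the eigenvalues are hypothetical, and the rule "keep the components before the largest gap" is the generic form of the heuristic.

```python
def eigengap(eigenvalues):
    """Return (gaps, k): the gaps between consecutive eigenvalues and the
    number of leading components that precede the largest gap."""
    # Sort in descending order so the leading components come first.
    vals = sorted(eigenvalues, reverse=True)
    # Gap i is the drop from eigenvalue i to eigenvalue i + 1.
    gaps = [vals[i] - vals[i + 1] for i in range(len(vals) - 1)]
    # Keep every component up to and including the one before the largest gap.
    k = max(range(len(gaps)), key=gaps.__getitem__) + 1
    return gaps, k


# Hypothetical eigenvalues of a diffusion operator (illustrative only).
eigenvalues = [0.98, 0.95, 0.91, 0.55, 0.52, 0.50]
gaps, n_components = eigengap(eigenvalues)
print(n_components)
```

Plotting `gaps` against the component index gives the kind of curve the plotting entry refers to; the largest drop marks the suggested cutoff.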
|
Finds subgraphs in diffusion map or KNN graph. |
|
Calculate pseudotime and a stochastic forward Markov model of differentiation. |
|
Uses transport map to identify terminal cells where differentiation progress reaches a steady state. |
|
Simulate forward random walks through the transport map, modeling a stochastic differentiation process. |
|
Parse tree structure from terminal state probabilities. |
|
Starting from a group of initial cells, trace the diffusion over time through the Markov chain model of differentiation. |
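Tracing diffusion through a Markov chain, as in the last entry above, amounts to repeatedly pushing a probability distribution over cells through a transition matrix. The sketch below shows that general technique in plain Python; the function `trace_diffusion`, the three-cell transition matrix, and the uniform start over the initial cells are all assumptions made for illustration, not the package's API.

```python
def trace_diffusion(transition, start_cells, n_steps):
    """Propagate probability mass from `start_cells` through a row-stochastic
    transition matrix, returning the distribution after every step."""
    n = len(transition)
    # Assumption: probability starts uniformly spread over the initial cells.
    state = [0.0] * n
    for cell in start_cells:
        state[cell] = 1.0 / len(start_cells)
    history = [state]
    for _ in range(n_steps):
        # One step of the chain: new_state[j] = sum_i state[i] * P[i][j].
        state = [
            sum(state[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
        history.append(state)
    return history


# Hypothetical chain: cell 0 -> cell 1 -> cell 2, where cell 2 is absorbing.
transition = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
history = trace_diffusion(transition, start_cells=[0], n_steps=10)
print(history[-1])
```

In this toy chain the mass accumulates in the absorbing cell, which plays the role of a terminal state.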
Plotting#
|
Plot a streamgraph representation of a differentiation or continuous process. |
|
Plot the expression, local accessibility prediction, chromatin differential, and LITE vs. NITE predictions. |
|
Plots chromatin differential scatterplot with more flexibility for coloring cells. |
|
Plot geneset enrichment results. |
|
Use pISD (probabilistic in silico deletion) association scores between transcription factors and genes to compare and contrast the driving regulators of two genesets. |
|
Utility plot for choosing a representative number of topics for a dataset in conjunction with the gradient_tune method. |
|
Tools#
|
Scan peak sequences for motif hits given by JASPAR position frequency matrices using MOODS 3. |
|
Find ChIP hits that overlap with accessible regions using CistromeDB's catalogue of publicly available datasets. |
|
Post genelist to Enrichr for comparison against pre-compiled ontologies. |
|
Fetch enrichment results from an ontology. |
|
Fetch enrichment results from ontologies. |
|
Given TSS data for genes, find the distance between the TSS of each gene and the center of each accessible site measured in the data. |
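The distance computation described above can be sketched without any genomics library. The data layout here (a dict of gene to chromosome and TSS position, and a list of peak intervals), the function name `distance_to_tss`, and the choice to return `None` for peaks on a different chromosome are assumptions for illustration; strand handling is omitted.

```python
def distance_to_tss(genes, peaks):
    """genes: {name: (chrom, tss_position)}; peaks: [(chrom, start, end)].
    Returns {gene: [signed distance from TSS to peak center, or None]}."""
    distances = {}
    for gene, (chrom, tss) in genes.items():
        row = []
        for peak_chrom, start, end in peaks:
            if peak_chrom != chrom:
                # Distances are only defined within the same chromosome.
                row.append(None)
            else:
                center = (start + end) // 2
                # Positive values mean the peak center lies at a higher coordinate.
                row.append(center - tss)
        distances[gene] = row
    return distances


# Hypothetical genes and accessible sites.
genes = {"GeneA": ("chr1", 1000), "GeneB": ("chr2", 5000)}
peaks = [("chr1", 1400, 1600), ("chr1", 200, 400), ("chr2", 4900, 5100)]
distances = distance_to_tss(genes, peaks)
print(distances)
```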
|
Calculates the NITE score (Non-locally Influenced Transcriptional Expression) for each gene. |
|
Calculates the NITE score (Non-locally Influenced Transcriptional Expression) for each cell. |
|
The per-cell difference in predictions between the LITE and NITE models of a gene is called the "chromatin differential", and reflects the over- or under-estimation of expression levels by local chromatin. |
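As a rough illustration of the chromatin differential, the sketch below takes per-cell predictions from a LITE model and a NITE model for one gene and subtracts them. Taking the difference on a log2 scale is an assumption of this sketch, as are the function name and the example values; the entry above only states that it is a per-cell difference between the two models' predictions.

```python
from math import log2


def chromatin_differential(lite_predictions, nite_predictions):
    """Per-cell difference between LITE and NITE predictions for one gene.
    Assumption: the difference is taken on a log2 scale (LITE minus NITE)."""
    return [
        log2(lite) - log2(nite)
        for lite, nite in zip(lite_predictions, nite_predictions)
    ]


# Hypothetical predicted expression for three cells.
lite = [4.0, 1.0, 2.0]
nite = [2.0, 2.0, 2.0]
differential = chromatin_differential(lite, nite)
print(differential)
```

Under this convention a positive value means local chromatin (the LITE model) over-estimates expression relative to the NITE model, and a negative value means it under-estimates it.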
Joint Representation#
|
Finds common cells between two dataframes and concatenates features to form the joint representation. |
|
For each cell, calculate the pointwise mutual information between RNA and ATAC topic compositions. |
|
Calculate the total mutual information between expression and accessibility topics. |
|
One may assume that the influence of the two modalities on the joint representation is driven by the relative magnitude of the norm of these modalities' embeddings. |
|
Get DataFrame of Pearson cross-correlation between expression and accessibility topics. |
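A Pearson cross-correlation between two sets of topics is the correlation of every expression topic with every accessibility topic across cells. The plain-Python sketch below shows that calculation with nested dicts standing in for a DataFrame; the topic names, values, and function names are hypothetical and this is not the package's implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cross_correlation(expr_topics, acc_topics):
    """Correlate every expression topic with every accessibility topic.
    Inputs map topic name -> per-cell topic composition values."""
    return {
        e_name: {a_name: pearson(e_vals, a_vals)
                 for a_name, a_vals in acc_topics.items()}
        for e_name, e_vals in expr_topics.items()
    }


# Hypothetical topic compositions for four cells.
expr_topics = {"rna_0": [0.1, 0.2, 0.3, 0.4], "rna_1": [0.9, 0.8, 0.7, 0.6]}
acc_topics = {"atac_0": [0.2, 0.4, 0.6, 0.8]}
corr = cross_correlation(expr_topics, acc_topics)
print(corr)
```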
Utils/Accessors#
|
Finds common cells between two dataframes and concatenates features to form the joint representation. |
|
Makes Jupyter notebooks take up the whole screen. |
|
Changes stderr color to blue in Jupyter notebooks. |
|
Subset which transcription factor binding annotations are used in downstream analysis. |
|
Returns TSS metadata from mira.tl.get_distance_to_TSS. |
|
Returns matrix of distances between gene transcription start sites and peaks. |
|
Fetch metadata associated with transcription factor binding annotations. |
|
Returns AnnData object of transcription factor binding annotations. |
|
Returns .var field of atac_adata, but subset to only contain peaks which are predicted to bind a certain transcription factor. |
|
Display GIF in Jupyter notebook. |
Datasets#
SHARE-seq skin dataset used in paper and tutorials. |
|
Streamgraph tutorial data |
|
Pseudotime trajectory inference tutorial data |
|
Topic models trained on SHARE-seq dataset. |
|
Raw count matrices for SHARE-seq skin dataset. |
|
Annotated and modeled count matrices for SHARE-seq skin dataset. |
|
Example RP models for tutorial |
|
Count matrix and topic models for mouse brain dataset |
|
Small synthetic test dataset for topic model tuning. |
|
Chromosome sizes for mm10 genome. |
|
Non-redundant canonical TSS locations for mm10 genome. |
|
Chromosome sizes for hg38 genome. |
|
Chromosome sizes for hg38 genome. |