mira.rp.LITE_Model

class mira.rp.LITE_Model(*, expr_model, accessibility_model, genes, learning_rate=1, counts_layer=None, initialization_model=None, search_reps=1)

Container for multiple regulatory potential (RP) LITE models. LITE models learn a relationship between a gene’s expression and accessibility in nearby cis-regulatory elements (CREs). The MIRA model assumes that the regulatory influence of a CRE on a gene decays with distance from that gene. MIRA learns this decay distance using variational Bayesian inference.
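
To make the distance-decay assumption concrete, here is a minimal sketch that weights each CRE's accessibility by an exponentially decaying function of its distance from the TSS. The function names, the `2 ** (-d / decay)` form, and the fixed 10 kb decay are illustrative assumptions, not MIRA's actual parameterization; MIRA learns its decay parameters from data.

```python
def decay_weight(distance_bp, decay_bp):
    """Weight in (0, 1] that halves every `decay_bp` base pairs (assumed form)."""
    return 2.0 ** (-abs(distance_bp) / decay_bp)


def regulatory_potential(accessibility, distances_bp, decay_bp=10_000):
    """Sum CRE accessibility, down-weighting elements far from the TSS."""
    return sum(
        a * decay_weight(d, decay_bp)
        for a, d in zip(accessibility, distances_bp)
    )


# Hypothetical CREs: accessibility values and signed distances to the TSS.
accessibility = [1.0, 1.0, 1.0]
distances_bp = [0, 10_000, -20_000]
rp = regulatory_potential(accessibility, distances_bp)
```

With these toy inputs, the element at the TSS contributes fully, the one 10 kb away contributes half, and the one 20 kb away contributes a quarter.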

With a trained RP model, one may assess the

  • LITE/NITE characteristics of a gene: whether that gene’s expression is decoupled from changes in local chromatin.

  • Chromatin differential: the relative levels of nearby accessibility versus gene expression.

  • In silico deletion: predicts transcription factor regulators based on a model of nearby binding in influential CREs, as determined by the RP model.

Parameters
expr_model: mira.topics.ExpressionTopicModel

Trained MIRA expression topic model.

accessibility_model: mira.topics.AccessibilityTopicModel

Trained MIRA accessibility topic model.

genes: np.ndarray[str], list[str]

List of genes for which to learn RP models.

learning_rate: float > 0, default=1

Learning rate for the L-BFGS optimizer.

counts_layer: str, default=None

Layer in AnnData that contains raw counts for modeling.

initialization_model: mira.rp.LITE_Model, mira.rp.NITE_Model, None

Initialize parameters of the RP model using the provided model before further optimization with L-BFGS. This is used when training the NITE model, which is initialized with the LITE model parameters learned for the same genes, then retrained to optimize the NITE model’s extra parameters. This procedure speeds up training.

Examples

Setup requires RNA and ATAC AnnData objects with shared cell barcodes and trained topic models for both modes:

>>> rp_args = dict(expr_adata = rna_data, atac_adata = atac_data)

Instantiating a LITE model (local chromatin accessibility only):

>>> litemodel = mira.rp.LITE_Model(
...     expr_model = rna_model, 
...     accessibility_model = atac_model,
...     counts_layer = 'counts',
...     genes = ['LEF1','WNT3','EDA','NOTCH1'],
... )
>>> litemodel.fit(**rp_args)
Attributes
genes: np.ndarray[str]

Array of gene names for models

features: np.ndarray[str]

Array of gene names for models

models: list[mira.rp.GeneModel]

List of trained RP models

model_type: {“NITE”, “LITE”}

Methods

fit([callback, n_workers, atac_topic_comps_key])

Optimize parameters of RP models to learn cis-regulatory relationships.

get_model(gene)

Gets model for gene

join(rp_model)

Merge RP models from two model containers.

load(prefix)

Load RP models saved with prefix.

load_dir([counts_layer])

Load directory of RP models.

predict(*, expr_adata, atac_adata[, ...])

Predicts the expression of genes given their cis-accessibility state.

probabilistic_isd([n_samples, checkpoint, ...])

For each gene, calculate association scores with each transcription factor.

save(prefix)

Save RP models.

spawn_NITE_model()

Returns a NITE model seeded with the LITE model's parameters.

subset(genes)

Return a subset container of RP models.

spawn_NITE_model()

Returns a NITE model seeded with the LITE model’s parameters.

classmethod load_dir(counts_layer=None, *, expr_model, accessibility_model, prefix)

Load directory of RP models. Adds all available RP models into a container.

Parameters
expr_model: mira.topics.ExpressionTopicModel

Trained MIRA expression topic model.

accessibility_model: mira.topics.AccessibilityTopicModel

Trained MIRA accessibility topic model.

counts_layer: str, default=None

Layer in AnnData that contains raw counts for modeling.

prefix: str

Prefix under which RP models were saved.

Examples

>>> litemodel = mira.rp.LITE_Model.load_dir(
...     counts_layer = 'counts',
...     expr_model = rna_model, 
...     accessibility_model = atac_model,
...     prefix = 'path/to/rpmodels/'
... )
subset(genes)

Return a subset container of RP models.

Parameters
genes: np.ndarray[str], list[str]

List of genes to subset from the RP model container.

Examples

>>> less_models = litemodel.subset(['LEF1','WNT3'])
join(rp_model)

Merge RP models from two model containers.

Parameters
rp_model: mira.rp.LITE_Model, mira.rp.NITE_Model

RP model container from which to append new RP models.

Examples

>>> model1.genes
['LEF1','WNT3']
>>> model2.genes
['CTSC','EDAR']
>>> merged_model = model1.join(model2)
>>> merged_model.genes
['LEF1','WNT3','CTSC','EDAR']
__getitem__(gene)

Alias for get_model(gene).

Examples

>>> rp_model["LEF1"]
<mira.rp_model.rp_model.GeneModel at 0x7fa07af1cf10>
save(prefix)

Save RP models.

Parameters
prefix: str

Prefix under which to save RP models. May be filename prefix or directory. RP models will save with format: {prefix}_{LITE/NITE}_{gene}.pth
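
As a small illustration of the naming format above, the sketch below builds the file names one would expect for a given prefix, model type, and gene list. The helper `rp_model_filename` is hypothetical and not part of MIRA; it only applies the `{prefix}_{LITE/NITE}_{gene}.pth` pattern literally, and how the library treats a directory prefix is not covered here.

```python
def rp_model_filename(prefix, model_type, gene):
    """Apply the documented pattern {prefix}_{LITE/NITE}_{gene}.pth (hypothetical helper)."""
    if model_type not in ("LITE", "NITE"):
        raise ValueError(f"model_type must be 'LITE' or 'NITE', got {model_type!r}")
    return f"{prefix}_{model_type}_{gene}.pth"


# Expected file names for two genes saved under an example prefix.
paths = [rp_model_filename("rpmodels/run1", "LITE", g) for g in ["LEF1", "WNT3"]]
```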

get_model(gene)

Gets the RP model for a gene.

Parameters
gene: str

Gene for which to fetch the RP model.

load(prefix)

Load RP models saved with prefix.

Parameters
prefix: str

Prefix under which RP models were saved.

fit(callback=None, *, expr_adata, atac_adata, n_workers=1, atac_topic_comps_key='X_topic_compositions')

Optimize parameters of RP models to learn cis-regulatory relationships.

Parameters
expr_adata: anndata.AnnData

AnnData of expression features

atac_adata: anndata.AnnData

AnnData of accessibility features. Must be annotated with mira.tl.get_distance_to_TSS.

Returns
rp_model: mira.rp.LITE_Model, mira.rp.NITE_Model

RP model with optimized parameters

predict(*, expr_adata, atac_adata, n_workers=1, atac_topic_comps_key='X_topic_compositions')

Predicts the expression of genes given their cis-accessibility state. Also evaluates the probability of that prediction for LITE/NITE evaluation.

Parameters
expr_adata: anndata.AnnData

AnnData of expression features

atac_adata: anndata.AnnData

AnnData of accessibility features. Must be annotated with mira.tl.get_distance_to_TSS.

Returns
anndata.AnnData
.layers['LITE_prediction'] or .layers['NITE_prediction']: np.ndarray[float] of shape (n_cells, n_features)

Predicted relative frequencies of features using the LITE or NITE model, respectively.

.layers['LITE_logp'] or .layers['NITE_logp']: np.ndarray[float] of shape (n_cells, n_features)

Log-probability of observed expression given the posterior predictive estimate of the LITE or NITE model, respectively.

probabilistic_isd(n_samples=1500, *, checkpoint=None, hits_matrix, metadata, expr_adata, atac_adata, n_workers=1, atac_topic_comps_key='X_topic_compositions', factor_type='motifs')

For each gene, calculate association scores with each transcription factor (TF). Association scores detect when a TF binds within cis-regulatory elements (CREs) that are influential to expression predictions for that gene. CREs that influence the RP model's expression prediction are near a gene’s TSS and have accessibility that correlates with expression. This model assumes these attributes indicate that a factor is more likely to regulate a gene.

Parameters
expr_adata: anndata.AnnData

AnnData of expression features

atac_adata: anndata.AnnData

AnnData of accessibility features. Must be annotated with TSS and factor binding data using mira.tl.get_distance_to_TSS and mira.tl.get_motif_hits_in_peaks/mira.tl.get_CHIP_hits_in_peaks.

n_samples: int > 0, default=1500

Downsample cells to this amount for calculations. Speeds up computation time. Cells are sampled by stratifying over expression levels.

checkpoint: str, default=None

Path to checkpoint h5 file. pISD calculations can be slow, and saving a checkpoint ensures progress is not lost if calculations are interrupted. To resume from a checkpoint, just pass the path to the h5.
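
The n_samples description above mentions downsampling cells by stratifying over expression levels. The sketch below shows one way such stratification could work: rank cells by expression, cut the ranking into bins, and draw an equal share from each bin. The function `stratified_downsample`, its `n_bins` and `seed` arguments, and the binning scheme are assumptions for illustration and do not reflect MIRA's internal implementation.

```python
import random


def stratified_downsample(expression, n_samples, n_bins=10, seed=0):
    """Pick up to n_samples cell indices spread evenly across expression ranks."""
    n_cells = len(expression)
    if n_samples >= n_cells:
        return list(range(n_cells))
    rng = random.Random(seed)
    # Order cells by expression, then cut the ordering into contiguous bins.
    order = sorted(range(n_cells), key=lambda i: expression[i])
    n_bins = min(n_bins, n_samples)
    bins = [
        order[b * n_cells // n_bins:(b + 1) * n_cells // n_bins]
        for b in range(n_bins)
    ]
    chosen = []
    for b, members in enumerate(bins):
        # Split the sample budget across bins; early bins absorb the remainder.
        quota = n_samples // n_bins + (1 if b < n_samples % n_bins else 0)
        chosen.extend(rng.sample(members, min(quota, len(members))))
    return sorted(chosen)


# Toy expression vector: a permutation of 0..99 across 100 hypothetical cells.
expression = [(i * 37) % 100 for i in range(100)]
picked = stratified_downsample(expression, n_samples=20)
```

Compared with uniform sampling, this keeps rare high- or low-expression cells represented in the subsample.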

Returns
anndata.AnnData
.varm['motifs-prob_deletion'] or .varm['chip-prob_deletion']: np.ndarray[float] of shape (n_genes, n_factors)

Association scores for each gene-TF combination. Higher scores indicate greater predicted association/regulatory influence.
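
The checkpoint parameter described above lets a long pISD run resume after an interruption. The general pattern can be sketched as follows, using a JSON file in place of the h5 checkpoint so the example needs only the standard library. The function `run_with_checkpoint` and the toy scoring function are hypothetical and are not MIRA's implementation.

```python
import json
import os
import tempfile


def run_with_checkpoint(genes, score_gene, checkpoint_path):
    """Score each gene once, persisting results so an interrupted run can resume."""
    results = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            results = json.load(f)
    for gene in genes:
        if gene in results:
            continue  # Already computed in an earlier run.
        results[gene] = score_gene(gene)
        # Rewrite the checkpoint after every gene (simple, not atomic).
        with open(checkpoint_path, "w") as f:
            json.dump(results, f)
    return results


calls = []


def toy_score(gene):
    """Stand-in for an expensive per-gene computation."""
    calls.append(gene)
    return len(gene)


path = os.path.join(tempfile.mkdtemp(), "pisd_checkpoint.json")
first = run_with_checkpoint(["LEF1", "WNT3"], toy_score, path)
second = run_with_checkpoint(["LEF1", "WNT3", "EDA"], toy_score, path)
```

On the second call only the new gene is scored; results for the first two genes are read back from the checkpoint file.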

property parameters_

Returns parameters of all contained RP models.