Regulatory potential modeling#

MIRA approaches the problem of comparing gene expression and accessibility at the cell and gene levels. The previous tutorials covered cell-level analyses - topic modeling, pseudotime trajectory inference, etc. This tutorial will cover gene-level Regulatory Potential modeling (RP modeling).

A regulatory potential model relates a gene’s expression to nearby cis-regulatory elements (CREs). While linking changes in accessibility to changes in gene expression is confounded by cell-wide changes in state, proximity of a CRE to the TSS of a gene gives a mechanistic justification to assume some regulatory effect. Thus, MIRA learns upstream and downstream decay distances that describe the range at which changes in local accessible chromatin appear to influence expression. Each RP model is a statistical model describing a gene’s relationship with its local chromatin neighborhood.

We can use RP models to:

Predict the expression state of a gene given its local chromatin state
Find transcription factors that bind potential regulatory sites around each gene
Understand the regulatory ranges of influence around genes
Identify genes where local chromatin and expression are out-of-sync.

Let’s start by importing some packages:

[1]:

import mira
import scanpy as sc
import anndata
import warnings
warnings.simplefilter("ignore")
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
matplotlib.rc("font", size = 12)
from IPython.display import Image
mira.utils.pretty_sderr()

INFO:numexpr.utils:Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.

[1]:

and downloading the SHARE-seq skin dataset (if not downloaded already):

[4]:

mira.datasets.ShareseqTopicModels()
mira.datasets.ShareseqAnnotatedData()

rna_data = anndata.read_h5ad('mira-datasets/shareseq_annotated_data/rna_data.joint_representation.rp_modeled.h5ad')
atac_data = anndata.read_h5ad('mira-datasets/shareseq_annotated_data/atac_data.joint_representation.motif_calls.tss_annotated.h5ad')

rna_model = mira.topics.load_model('mira-datasets/shareseq_topic_models/rna_model.pth')
atac_model = mira.topics.load_model('mira-datasets/shareseq_topic_models/atac_model.pth')

INFO:mira.datasets.datasets:Dataset already on disk.
INFO:mira.datasets.datasets:Dataset contents:
        * mira-datasets/shareseq_topic_models
                * atac_model.pth
                * rna_model.pth
INFO:mira.datasets.datasets:Dataset already on disk.
INFO:mira.datasets.datasets:Dataset contents:
        * mira-datasets/shareseq_annotated_data
                * rna_data.joint_representation.rp_modeled.h5ad
                * atac_data.joint_representation.motif_calls.tss_annotated.h5ad
INFO:mira.topic_model.base:Moving model to CPU for inference.
INFO:mira.topic_model.base:Moving model to device: cpu
INFO:mira.topic_model.base:Moving model to CPU for inference.
INFO:mira.topic_model.base:Moving model to device: cpu

TSS annotations#

To start, we must annotate the genes in our multiome dataset with TSS locations. Each gene symbol (or however you wish to identify your genes), must be associated with a unique TSS. If a gene symbol has multiple TSS annotations, only the first will be used.

The required information for each gene is the:

chromosome
transcription start
transcription end
strand
symbol/identifier

Non-redundant human (hg38 GENCODE VM39) and mouse (mm10 GENCODE VM23) TSS annotations are available to download through MIRA, but you can assemble your own TSS dataframe. You can see how these annotations were assembled and get instructions on assembling your own annotation by following this tutorial.

To run mira.tl.get_distance_to_TSS, first download the chromosome sizes and TSS annotations provided by MIRA. Equivalent methods are available for hg38:

mira.datasets.hg38_chrom_sizes()
mira.datasets.hg38_tss_data()

[6]:

mira.datasets.mm10_chrom_sizes()
mira.datasets.mm10_tss_data()

INFO:mira.datasets.datasets:Dataset contents:
        * mira-datasets/mm10.chrom.sizes

INFO:mira.datasets.datasets:Dataset contents:
        * mira-datasets/mm10_tss_data.bed12

[7]:

!head -n3 mira-datasets/mm10.chrom.sizes

chr1    195471971
chr2    182113224
chrX    171031299

[8]:

!head -n3 mira-datasets/mm10_tss_data.bed12

#chrom  #txStart        #txEnd  #geneSymbol     #score  #strand #thickStart     #thickEnd       #itemRGB        #exonCount      #blockSizes     #blockStarts
chr1    3073252 3074322 4933401J01RIK   0       +       0       0       0       1       1070    0
chr1    3102015 3102125 GM26206 0       +       0       0       0       1       110     0

If you have your own annotation with different column names, you can pass those column names as keyword arguments to mira.tl.get_distance_to_TSS. The column names must exclude the preceeding “#”, if applicable.

[9]:

mira.tl.get_distance_to_TSS(atac_data,
                            tss_data='mira-datasets/mm10_tss_data.bed12',
                            genome_file='mira-datasets/mm10.chrom.sizes')

INFO:mira.tools.connect_genes_peaks:Finding peak intersections with promoters ...
INFO:mira.tools.connect_genes_peaks:Calculating distances between peaks and TSS ...
INFO:mira.tools.connect_genes_peaks:Masking other genes' promoters ...
INFO:mira.adata_interface.rp_model:Added key to var: distance_to_TSS
INFO:mira.adata_interface.rp_model:Added key to uns: distance_to_TSS_genes

RP model training#

With TSS annotations in place, we can train RP models. Most RP-model related functions take the expr_adata and atac_adata keyword arguments, so it’s easiest to put them into a dictionary for repeated use:

[10]:

rp_args = dict(expr_adata = rna_data, atac_adata= atac_data)

Next, we instantiate an RP Model. MIRA refers to the RP model variant that uses local chromatin to predict gene expression as a LITE model - or Local chromatin accessibility-Influenced Transcriptional Expression model.

The mira.rp.LITE_Model object takes the expression and accessibility topic models. You must also define a list of genes to model. To keep this short, I will demonstrate training with four genes, but in a full analysis, we recommend training models for all highly-variable genes, plus all genes that scored in the top 5% most-activated for any topic, which gives a good survey of interesting gene expression variation in your data. This following snippet gives you that genelist:

rp_genes = list(rna_model.features[rna_model.highly_variable])
for topic in range(rna_model.num_topics):
    rp_genes.extend(rna_model.get_top_genes(topic, 200))
rp_genes = list(set(rp_genes))

[11]:

litemodel = mira.rp.LITE_Model(expr_model = rna_model,
                              accessibility_model=atac_model,
                              genes = ['LEF1','WNT3','KRT23','MT2'])

Now fit the models. Most LITE_model methods take the n_workers parameter, which parallelizes across cores. You can provide the mira.rp.SaveCallback to the fit function, which will save each model as it is trained. One must simply provide the prefix or directory where RP models are to be saved.

[12]:

!mkdir -p data/
!mkdir -p data/rpmodels/

litemodel.fit(**rp_args, n_workers=4,
              callback = mira.rp.SaveCallback('data/rpmodels/'))

INFO:mira.adata_interface.core:Added cols to obs: model_read_scale
INFO:mira.adata_interface.topic_model:Fetching key X_topic_compositions from obsm

INFO:mira.adata_interface.core:Added cols to obs: softmax_denom

[12]:

<mira.rp_model.rp_model.LITE_Model at 0x7fa602ffb410>

Otherwise, one can save RP models after training via:

[13]:

litemodel.save('data/rpmodels/')

!ls data/rpmodels/

LITE_KRT23.pth LITE_MT2.pth   NITE_KRT23.pth NITE_MT2.pth
LITE_LEF1.pth  LITE_WNT3.pth  NITE_LEF1.pth  NITE_WNT3.pth

To reload LITE models from disk, either instantiate the container object and use the load function with the file prefix given above:

litemodel = mira.rp.LITE_Model(expr_model = rna_model,
                              accessibility_model=atac_model,
                              genes = ['LEF1','WNT3','KRT23','MT2'])

litemodel.load('data/rpmodels/')

Or to skip having to provide the gene list, use mira.rp.LITE_Model.load_dir. This function will load every model in the given directory.

litemodel = mira.rp.LITE_Model.load_dir(
    expr_model = rna_model,
    accessibility_model = atac_model,
    prefix='data/rpmodels/'
)

Defining local chromatin neighborhoods#

If you are interested in the distance of estimated regulatory influence for a certain gene, you can index on the litemodel object with a gene name, then use the parameters_ attribute (distance is decay rate in kilobases):

[14]:

litemodel['LEF1'].parameters_

[14]:

{'theta': 1.2137538,
 'gamma': 0.9651811,
 'bias': 1.6418066,
 'bn_mean': 0.87357396,
 'bn_var': 0.54260945,
 'bn_eps': 1e-05,
 'a_upstream': 0.2547144,
 'a_promoter': 6.3659315,
 'a_downstream': 0.21996741,
 'distance_upstream': 37.609966,
 'distance_downstream': 11.843691}

Or access the parameters of all models like so:

[15]:

pd.DataFrame(
    litemodel.parameters_
).T

[15]:

	theta	gamma	bias	bn_mean	bn_var	bn_eps	a_upstream	a_promoter	a_downstream	distance_upstream	distance_downstream
LEF1	1.213754	0.965181	1.641807	0.873574	0.542609	0.00001	0.254714	6.365932	0.219967	37.609966	11.843691
WNT3	2.090939	0.953565	-0.168771	1.041202	0.314111	0.00001	0.451865	2.252736	0.438339	26.511311	28.357067
KRT23	0.398770	1.048645	-0.222739	0.460833	0.025817	0.00001	1.456381	0.956366	0.298903	16.317459	15.376928
MT2	1.212501	0.905758	1.345756	3.491027	1.094696	0.00001	0.643497	0.168498	3.534304	248.008972	8.631128

Say one wanted a list of all peaks within the influence of WNT3’s RP model. You can quickly access the peaks that make up a gene’s local cis-regulatory neighborhood using get_influential_local_peaks. This function takes the parameter decay_periods, which defines the distance covered by the gene’s chromatin neighborhood in terms of the decay_periods times the RP model’s upstream and downstream decay distances.

[16]:

litemodel['WNT3'].get_influential_local_peaks(atac_data, decay_periods = 5.).head(5)

[16]:

	chr	start	end	endogenous	distance_to_TSS	is_upstream
peak_id
7081	chr11	103697694	103697994	False	76305.0	True
8549	chr11	103892817	103893117	False	118818.0	False
13683	chr11	103897002	103897302	False	123003.0	False
14605	chr11	103773897	103774197	False	102.0	True
20920	chr11	103900925	103901225	True	126926.0	False

You can also manually access TSS-peak distances via:

[17]:

tss_distances = mira.utils.fetch_gene_TSS_distances(atac_data)
tss_distances

[17]:

AnnData object with n_obs × n_vars = 55304 × 334124
    obs: 'gene', 'chromosome', 'txStart', 'txEnd', 'strand'
    var: 'chr', 'start', 'end', 'endogenous'

In this matrix, peaks that are upstream of a gene have negative distances, while downstream have positive. All peaks that are outside of some range (by default 600 kilobases) are masked and have “zero” distance. if a peak begins exactly on the TSS of a gene, it is adjusted to be one base pair distant to avoid getting masked.

Predicting expression from accessibility#

With trained RP models, the predict function calculates the maximum aposteriori prediction of expression given the accessibility state of each gene in each cell. Additionally, the model quantifies the likelihood of that prediction.

[18]:

litemodel.predict(**rp_args)

INFO:mira.adata_interface.core:Added layer: LITE_prediction
INFO:mira.adata_interface.core:Added layer: LITE_logp

You can survey the goodness of fit by checking UMAPs. For the gene MT2, the correspondance between accessibility and expression looks almost perfect. For the other genes, not so much … more on that later!

LITE Model Predictions:

[19]:

sc.pl.umap(rna_data, color = litemodel.genes, frameon=False, color_map='viridis',
          layer='LITE_prediction', ncols=2, vmin = 0, vmax = 'p97')

../_images/notebooks_tutorial_cisregulatory_modeling_33_0.png

Gene Expression:

[20]:

sc.pl.umap(rna_data, color = litemodel.genes, **mira.pref.raw_umap(ncols=2, size = 20))

../_images/notebooks_tutorial_cisregulatory_modeling_35_0.png

Gene-TF targeting#

With probabilistic in-silico deletion (pISD)#

RP models define a local chromatin neighborhood where changes in accessibility appear to influence gene expression. One may assume that transcription factor binding in many cis-regulatory elements within this neighborhood suggests the transcription factor regulates the gene of interest. MIRA can query for these types of interactions at a systems level, finding potential regulatory associations across many gene-TF pairs.

The algorithm for calculating these associations is called probabilistic in-silico deletion (pISD), and it measures the ability of the RP model to predict expression of a gene before and after the regulatory elements predicted to bind a certain transcription factor are masked. In this way, pISD simulates a “computational knock out” of that factor.

To compute association scores using each RP model against motif-based binding site predictions of all available transcription factors, use litemodel.probabilistic_isd:

[28]:

litemodel.probabilistic_isd(**rp_args, n_workers = 4)

INFO:mira.adata_interface.rp_model:Appending to expression adata:
INFO:mira.adata_interface.rp_model:Added key to varm: 'motifs-prob_deletion')
INFO:mira.adata_interface.rp_model:Added key to layers: motifs-informative_samples
INFO:mira.adata_interface.rp_model:Added key to uns: motifs

This test defines a matrix of association scores between gene-TF pairs. We can access this matrix directly with:

[29]:

isd_matrix = mira.utils.fetch_ISD_matrix(rna_data) # ISD results stored in RNA AnnData

isd_matrix # genes are rows, TFs are columns

[29]:

	FOXQ1	NFYC	SREBF1	MYF5	MLXIPL	SMAD2::SMAD3::SMAD4	HINFP	MAFB	ALX1	NR1H2::RXRA	...	ZBTB6	EMX2	IRF6	MYBL1	LIN54	SNAI2	POU4F1	ETS2	NFIB	TBX1
gene
0610012D04RIK	0.169540	-0.029384	-0.200878	-0.099164	-0.126589	-0.523728	-0.001994	-0.068799	0.005452	0.107080	...	0.605548	-1.518619e-02	0.710122	-2.396497e-07	-0.005993	-0.267877	0.041505	0.076527	-0.106636	0.067892
0610038B21RIK	-0.036928	0.085300	-0.051138	-0.393942	-0.061608	-0.098026	0.003808	-0.072545	0.030532	-0.095745	...	0.074316	-3.394143e-02	0.000000	-1.471498e-03	-0.149618	0.097972	-0.090758	0.116396	0.048799	-0.022324
1110012L19RIK	0.036556	-0.275474	-0.221381	0.011563	0.183237	0.078137	0.000000	0.028468	0.028468	1.900235	...	-0.001522	-1.032316e+00	0.000000	0.000000e+00	-1.075561	0.004174	0.064352	-0.221381	-0.000001	-0.000096
1110059G10RIK	0.175102	-0.074012	-0.006563	0.122240	-0.019347	0.352985	-0.000747	-0.093108	0.000213	0.001827	...	0.070042	-5.139177e-03	0.019617	-4.050315e-03	0.397353	-0.256149	0.053545	0.300384	0.053623	-0.021751
1600012H06RIK	0.021406	0.592367	0.277713	0.015611	-0.028802	0.198631	0.096183	0.171284	0.929829	-0.011851	...	0.034424	9.359678e-01	0.048430	-1.033540e-01	0.385391	0.057265	0.289109	1.565489	0.116852	0.173030
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
ZSWIM2	-0.033843	-0.000013	0.024423	-0.006741	0.000005	0.000964	0.067987	0.000080	0.000088	-0.034123	...	0.034215	2.064183e-09	0.000000	0.000000e+00	-0.002852	0.000030	0.001128	0.000372	0.033620	0.000547
ZSWIM3	-0.590584	1.978670	0.079675	-0.524485	0.052527	-2.281603	-0.172332	0.075240	-0.199830	0.137679	...	1.032466	2.336549e-01	0.433290	2.409590e-06	0.448944	0.449440	-0.194346	0.295567	-1.740476	-1.016739
ZSWIM5	0.050175	-0.397321	0.161102	0.146898	0.047569	0.228481	0.191042	-0.100273	0.091490	-0.636234	...	-0.963587	-4.722838e-02	0.063973	7.622661e-02	-0.109906	0.035298	-0.114043	-0.715872	0.204014	0.104210
ZSWIM7	0.208178	0.147906	0.440069	0.309050	-0.565245	0.246133	0.000028	0.793261	0.318095	-0.047203	...	0.021071	1.185795e-01	0.234073	-1.586491e-02	-0.804975	0.133644	0.584839	-0.187613	-1.530654	-0.036285
ZYX	-0.927998	-0.577818	-3.116855	2.348907	-0.020789	2.062772	-0.984240	0.968229	-0.193598	-2.775327	...	1.922306	-5.982126e-01	0.001017	4.485060e-05	0.951619	10.972371	-1.677477	0.074832	2.782266	1.382568

4883 rows × 555 columns

To find the top transcription factors associated with binding sites within WNT3’s neighborhood:

[30]:

isd_matrix.loc['WNT3'].sort_values().tail(10)

[30]:

KLF12         16.497324
RBPJ          17.133034
KLF16         18.547176
STAT3         18.610457
PRDM1         22.032345
ARID3A        23.249511
KLF11         24.054482
FOSL2         24.516716
EWSR1-FLI1    26.278431
FOS-1         66.844178
Name: WNT3, dtype: float64

Of these transcription factors, RBPJ is interesting because it is a key regulator of late hair follicle differentiation spurred by Notch signaling.

Querying with many genes#

One key limitation of the pISD algorithm is that binding site predictions are noisy - whether based on ChIP samples or especially as defined by motifs. Additionally, proximal binding predictions do not gaurantee mechanistic regulation. As such, testing one gene or one TF at a time is inadvisable.

Querying for shared association with transcription factors across many co-regulated genes, however, produces results much more robust to the afforementioned sources of error. As part of the MIRA study, we computed association scores between 4000 genes and 555 TF motifs. Let’s load those results and test them against an interseting geneset.

[31]:

rna_data = anndata.read_h5ad('mira-datasets/shareseq_annotated_data/rna_data.joint_representation.rp_modeled.h5ad')

[32]:

print('{} Genes, {} Factors'.format(
    *mira.utils.fetch_ISD_matrix(rna_data).shape
))

4883 Genes, 555 Factors

Now we can query with many genes at a time to find potential regulators. The following query finds TF regulators of genes activated by the hair follicle Cortex cells topic via Wilcoxon test over the association scores:

[33]:

isd_results = mira.tl.driver_TF_test(rna_data, geneset=rna_model.get_top_genes(10, 150))

pd.DataFrame(isd_results).sort_values('pval').head(15)

INFO:mira.tools.tf_targeting:Testing with 144 query genes and 4739 background genes, against 555 factors

[33]:

	id	name	parsed_name	pval	test_statistic
247	MA0523.1	TCF7L2	TCF7L2	4.800274e-08	430104.5
205	MA0905.1	HOXC10	HOXC10	1.480306e-06	419074.0
164	MA0768.1	LEF1	LEF1	2.535205e-06	417234.0
315	MA0879.1	DLX1	DLX1	3.612823e-06	415980.0
496	MA0485.2	HOXC9	HOXC9	4.001443e-06	415557.0
411	MA0634.1	ALX3	ALX3	4.295351e-06	415366.0
385	MA0902.2	HOXB2	HOXB2	4.534243e-06	415160.5
540	MA0880.1	DLX3	DLX3	5.452141e-06	414503.0
457	MA0881.1	DLX4	DLX4	5.452141e-06	414503.0
195	MA1507.1	HOXD4	HOXD4	6.311274e-06	413973.0
232	MA0726.1	VSX2	VSX2	7.569086e-06	413309.0
531	MA0701.2	LHX9	LHX9	8.899616e-06	412714.0
399	MA0075.3	PRRX2	PRRX2	8.899616e-06	412714.0
499	MA1476.1	DLX5	DLX5	1.142593e-05	411775.0
311	MA0907.1	HOXC13	HOXC13	1.151225e-05	411748.0

Fantastic. The top results show WNT-related (TCF, LEF1), homeobox (DLX), and HOXC (HOXC13) motifs are enriched in the chromatin neighborhoods of genes upregulated in the cortex.

As with topic enrichment, it is often instructive to compare and contrast regulators acting on two sets of genes. The mira.pl.compare_driver_TFs_plot function takes two genesets and calculates TF enrichment against both. In this case, I compare regulators of the top 150 genes associated with hair follicle Cortex and Medulla topics, respectively:

[34]:

mira.pl.compare_driver_TFs_plot(rna_data,
                                geneset1=rna_model.get_top_genes(10, 150),
                                geneset2=rna_model.get_top_genes(5, 150),
                                label_factors=['SMAD2::SMAD3','SMAD4','LEF1','HOXC13','RUNX1','DLX3'],
                                fontsize=20, figsize=(5,5), color='lightgrey',
                                axlabels= ('-log10 p-value\nCortex Regulators','-log10 p-value\nMedulla Regulators'))
plt.show()

INFO:mira.tools.tf_targeting:Testing with 144 query genes and 4739 background genes, against 555 factors

INFO:mira.tools.tf_targeting:Testing with 146 query genes and 4737 background genes, against 555 factors

../_images/notebooks_tutorial_cisregulatory_modeling_48_4.png

Visualizing RP models#

After identifying TF regulators of a gene-of-interest, you may want to visualize that gene’s cis-regulatory neighborhood and assess binding sites. More comprehensive epigenetic visualization is coming in future updates of MIRA, but for now, we recommend using pygenometracks. Install pygenometracks using:

conda install -c conda-forge -c bioconda -y pygenometracks

or -

pip install pygenometracks

You can use the RP model’s write_bedgraph method to write the RP model profile to a bedgraph file, which can be plotted by pygenometracks.

[35]:

!mkdir -p data/trackdata

[36]:

litemodel['WNT3'].write_bedgraph(atac_data,
                                 save_name = 'data/trackdata/WNT3_rpmodel.bedgraph')

Earlier, we found that WNT3 and RBPJ share a high association score. Let’s see where RBPJ motifs are landing around the WNT3 locus. One can access the matrix of all TF motif binding predictions by fetch_factor_hits, or get a list of peaks predicted to be bound by RBPJ via fetch_binding_sites. Motifs are accessed by unique IDs, so first we need to find the ID for RBPJ:

[37]:

factors = mira.utils.fetch_factor_meta(atac_data)
factors[factors.name == 'RBPJ']

[37]:

	id	name	parsed_name
1035	MA1116.1	RBPJ	RBPJ

[38]:

rbpj_binding = mira.utils.fetch_binding_sites(atac_data, id = 'MA1116.1')
rbpj_binding.head(3)

[38]:

	chr	start	end	endogenous
peak_id
15	chr6	113306525	113306825	False
18	chr18	21300223	21300523	False
21	chr13	119690114	119690414	False

[39]:

# sort by chromosome, then start site
rbpj_binding[['chr','start','end']].sort_values(['chr','start'])\
    .to_csv('data/trackdata/RBPJ_hits.bed',  # save as bed file
           index = None, header=None, sep = '\t',
           )

Now we define a configuration file for the pygenometracks plot we wish to make (see their extensive documentation on building plots). Of note, for the [genes] heading, we can provide the same TSS annotation as was used in training the RP models, since it is a valid bed12 file.

[40]:

config_file = """

[x-axis]

[genes]
file = mira-datasets/mm10_tss_data.bed12
title = Genes
height = 3

[rp model]
file = data/trackdata/WNT3_rpmodel.bedgraph
height = 3
color = #e6e6e6
title = WNT3 RP Model
max_value = 1.1
min_value = 0
file_type = bedgraph
alpha = 0.2

[rp model2]
file = data/trackdata/WNT3_rpmodel.bedgraph
type = line:0.5
color = black
file_type = bedgraph
max_value = 1.1
min_value = 0
overlay previous = yes

[spacer]

[RBPJ hits]
file = data/trackdata/RBPJ_hits.bed
height = 1.5
style = UCSC
gene_rows = 1
color = red
title = RBPJ Motif Hits
labels = off

"""

with open('data/trackdata/config.ini', 'w') as f:
    print(config_file, file = f)

Finally run the plotting command. To find a plotting location for a given gene, you can reference:

[41]:

mira.utils.fetch_TSS_data(atac_data)['WNT3']

[41]:

{'gene_chrom': 'chr11',
 'gene_start': 103774149,
 'gene_end': 103817957,
 'gene_strand': '+'}

[45]:

!pygenometracks \
    --tracks data/trackdata/config.ini \
    --region chr11:103674149-103917957 \
    -out data/trackdata/wnt3_plot.png \
    --dpi 300 \
    --width 20 \
    --fontSize 9

WARNING:pygenometracks.tracksClass:Deprecated Warning: The section 4. [rp model2] uses parameter overlay previous but there is no more parameter with space in name. Will be substituted by overlay_previous.

WARNING:pygenometracks.tracksClass:Deprecation Warning: In section 6. [RBPJ hits], labels was set to off whereas in the future only true and false value will be accepted.

INFO:pygenometracks.tracksClass:initialize 1. [x-axis]
INFO:pygenometracks.tracksClass:initialize 2. [genes]
100%|████████████████████████████████████████| 12/12 [00:00<00:00, 15582.55it/s]
INFO:pygenometracks.tracksClass:initialize 3. [rp model]
100%|████████████████████████████████████| 4818/4818 [00:00<00:00, 33998.49it/s]
INFO:pygenometracks.tracksClass:initialize 4. [rp model2]
100%|████████████████████████████████████| 4818/4818 [00:00<00:00, 34002.38it/s]
INFO:pygenometracks.tracksClass:initialize 5. [spacer]
INFO:pygenometracks.tracksClass:initialize 6. [RBPJ hits]
100%|████████████████████████████████████████| 13/13 [00:00<00:00, 26001.88it/s]
INFO:pygenometracks.tracksClass:time initializing track(s):
INFO:pygenometracks.tracksClass:0.45090484619140625
DEBUG:pygenometracks.tracksClass:Figure size in cm is 20.0 x 10.50531914893617. Dpi is set to 300

INFO:pygenometracks.tracksClass:plotting 1. [x-axis]
INFO:pygenometracks.tracksClass:plotting 2. [genes]
DEBUG:pygenometracks.tracks.GenomeTrack:ylim 7.9799999999999995,-0.08
INFO:pygenometracks.tracksClass:plotting 3. [rp model]
INFO:pygenometracks.tracksClass:plotting 4. [rp model2]
INFO:pygenometracks.tracksClass:plotting 5. [spacer]
INFO:pygenometracks.tracksClass:plotting 6. [RBPJ hits]
DEBUG:pygenometracks.tracks.GenomeTrack:ylim 3.38,-0.08

[46]:

Image('data/trackdata/wnt3_plot.png')

[46]:

../_images/notebooks_tutorial_cisregulatory_modeling_61_0.png

RBPJ appears to bind many regions proximal to WNT3’s local chromatin neighborhood, defined by the bounds of the learned regulatory potential function.

Next#

LITE models quantify a relationship between local chromatin accessibility and gene expression. Interestingly, for some genes, changes in local chromatin appear to diverge from changes in gene expression. The next tutorial on NITE regulation demonstrates how to identify and analyze this phenomenon.

Pseudotime Trajectory Inference

NITE regulation