Regulatory potential modeling#

Open In Colab

MIRA approaches the problem of comparing gene expression and accessibility at the cell and gene levels. The previous tutorials covered cell-level analyses - topic modeling, pseudotime trajectory inference, etc. This tutorial will cover gene-level Regulatory Potential modeling (RP modeling).

A regulatory potential model relates a gene’s expression to nearby cis-regulatory elements (CREs). While linking changes in accessibility to changes in gene expression is confounded by cell-wide changes in state, proximity of a CRE to the TSS of a gene gives a mechanistic justification to assume some regulatory effect. Thus, MIRA learns upstream and downstream decay distances that describe the range at which changes in local accessible chromatin appear to influence expression. Each RP model is a statistical model describing a gene’s relationship with its local chromatin neighborhood.

We can use RP models to:

  • Predict the expression state of a gene given its local chromatin state

  • Find transcription factors that bind potential regulatory sites around each gene

  • Understand the regulatory ranges of influence around genes

  • Identify genes where local chromatin and expression are out-of-sync.

Let’s start by importing some packages:

[1]:
import mira
import scanpy as sc
import anndata
import warnings
warnings.simplefilter("ignore")
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
matplotlib.rc("font", size = 12)
from IPython.display import Image
mira.utils.pretty_sderr()
INFO:numexpr.utils:Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
[1]:

and downloading the SHARE-seq skin dataset (if not downloaded already):

[4]:
mira.datasets.ShareseqTopicModels()
mira.datasets.ShareseqAnnotatedData()

rna_data = anndata.read_h5ad('mira-datasets/shareseq_annotated_data/rna_data.joint_representation.rp_modeled.h5ad')
atac_data = anndata.read_h5ad('mira-datasets/shareseq_annotated_data/atac_data.joint_representation.motif_calls.tss_annotated.h5ad')

rna_model = mira.topics.load_model('mira-datasets/shareseq_topic_models/rna_model.pth')
atac_model = mira.topics.load_model('mira-datasets/shareseq_topic_models/atac_model.pth')
INFO:mira.datasets.datasets:Dataset already on disk.
INFO:mira.datasets.datasets:Dataset contents:
        * mira-datasets/shareseq_topic_models
                * atac_model.pth
                * rna_model.pth
INFO:mira.datasets.datasets:Dataset already on disk.
INFO:mira.datasets.datasets:Dataset contents:
        * mira-datasets/shareseq_annotated_data
                * rna_data.joint_representation.rp_modeled.h5ad
                * atac_data.joint_representation.motif_calls.tss_annotated.h5ad
INFO:mira.topic_model.base:Moving model to CPU for inference.
INFO:mira.topic_model.base:Moving model to device: cpu
INFO:mira.topic_model.base:Moving model to CPU for inference.
INFO:mira.topic_model.base:Moving model to device: cpu

TSS annotations#

To start, we must annotate the genes in our multiome dataset with TSS locations. Each gene symbol (or however you wish to identify your genes), must be associated with a unique TSS. If a gene symbol has multiple TSS annotations, only the first will be used.

The required information for each gene is the:

  1. chromosome

  2. transcription start

  3. transcription end

  4. strand

  5. symbol/identifier

Non-redundant human (hg38 GENCODE VM39) and mouse (mm10 GENCODE VM23) TSS annotations are available to download through MIRA, but you can assemble your own TSS dataframe. You can see how these annotations were assembled and get instructions on assembling your own annotation by following this tutorial.

To run mira.tl.get_distance_to_TSS, first download the chromosome sizes and TSS annotations provided by MIRA. Equivalent methods are available for hg38:

mira.datasets.hg38_chrom_sizes()
mira.datasets.hg38_tss_data()
[6]:
mira.datasets.mm10_chrom_sizes()
mira.datasets.mm10_tss_data()
INFO:mira.datasets.datasets:Dataset contents:
        * mira-datasets/mm10.chrom.sizes
INFO:mira.datasets.datasets:Dataset contents:
        * mira-datasets/mm10_tss_data.bed12
[7]:
!head -n3 mira-datasets/mm10.chrom.sizes
chr1    195471971
chr2    182113224
chrX    171031299
[8]:
!head -n3 mira-datasets/mm10_tss_data.bed12
#chrom  #txStart        #txEnd  #geneSymbol     #score  #strand #thickStart     #thickEnd       #itemRGB        #exonCount      #blockSizes     #blockStarts
chr1    3073252 3074322 4933401J01RIK   0       +       0       0       0       1       1070    0
chr1    3102015 3102125 GM26206 0       +       0       0       0       1       110     0

If you have your own annotation with different column names, you can pass those column names as keyword arguments to mira.tl.get_distance_to_TSS. The column names must exclude the preceeding “#”, if applicable.

[9]:
mira.tl.get_distance_to_TSS(atac_data,
                            tss_data='mira-datasets/mm10_tss_data.bed12',
                            genome_file='mira-datasets/mm10.chrom.sizes')
INFO:mira.tools.connect_genes_peaks:Finding peak intersections with promoters ...
INFO:mira.tools.connect_genes_peaks:Calculating distances between peaks and TSS ...
INFO:mira.tools.connect_genes_peaks:Masking other genes' promoters ...
INFO:mira.adata_interface.rp_model:Added key to var: distance_to_TSS
INFO:mira.adata_interface.rp_model:Added key to uns: distance_to_TSS_genes

RP model training#

With TSS annotations in place, we can train RP models. Most RP-model related functions take the expr_adata and atac_adata keyword arguments, so it’s easiest to put them into a dictionary for repeated use:

[10]:
rp_args = dict(expr_adata = rna_data, atac_adata= atac_data)

Next, we instantiate an RP Model. MIRA refers to the RP model variant that uses local chromatin to predict gene expression as a LITE model - or Local chromatin accessibility-Influenced Transcriptional Expression model.

The mira.rp.LITE_Model object takes the expression and accessibility topic models. You must also define a list of genes to model. To keep this short, I will demonstrate training with four genes, but in a full analysis, we recommend training models for all highly-variable genes, plus all genes that scored in the top 5% most-activated for any topic, which gives a good survey of interesting gene expression variation in your data. This following snippet gives you that genelist:

rp_genes = list(rna_model.features[rna_model.highly_variable])
for topic in range(rna_model.num_topics):
    rp_genes.extend(rna_model.get_top_genes(topic, 200))
rp_genes = list(set(rp_genes))
[11]:
litemodel = mira.rp.LITE_Model(expr_model = rna_model,
                              accessibility_model=atac_model,
                              genes = ['LEF1','WNT3','KRT23','MT2'])

Now fit the models. Most LITE_model methods take the n_workers parameter, which parallelizes across cores. You can provide the mira.rp.SaveCallback to the fit function, which will save each model as it is trained. One must simply provide the prefix or directory where RP models are to be saved.

[12]:
!mkdir -p data/
!mkdir -p data/rpmodels/

litemodel.fit(**rp_args, n_workers=4,
              callback = mira.rp.SaveCallback('data/rpmodels/'))
INFO:mira.adata_interface.core:Added cols to obs: model_read_scale
INFO:mira.adata_interface.topic_model:Fetching key X_topic_compositions from obsm
INFO:mira.adata_interface.core:Added cols to obs: softmax_denom

[12]:
<mira.rp_model.rp_model.LITE_Model at 0x7fa602ffb410>

Otherwise, one can save RP models after training via:

[13]:
litemodel.save('data/rpmodels/')

!ls data/rpmodels/
LITE_KRT23.pth LITE_MT2.pth   NITE_KRT23.pth NITE_MT2.pth
LITE_LEF1.pth  LITE_WNT3.pth  NITE_LEF1.pth  NITE_WNT3.pth

To reload LITE models from disk, either instantiate the container object and use the load function with the file prefix given above:

litemodel = mira.rp.LITE_Model(expr_model = rna_model,
                              accessibility_model=atac_model,
                              genes = ['LEF1','WNT3','KRT23','MT2'])

litemodel.load('data/rpmodels/')

Or to skip having to provide the gene list, use mira.rp.LITE_Model.load_dir. This function will load every model in the given directory.

litemodel = mira.rp.LITE_Model.load_dir(
    expr_model = rna_model,
    accessibility_model = atac_model,
    prefix='data/rpmodels/'
)

Defining local chromatin neighborhoods#

If you are interested in the distance of estimated regulatory influence for a certain gene, you can index on the litemodel object with a gene name, then use the parameters_ attribute (distance is decay rate in kilobases):

[14]:
litemodel['LEF1'].parameters_
[14]:
{'theta': 1.2137538,
 'gamma': 0.9651811,
 'bias': 1.6418066,
 'bn_mean': 0.87357396,
 'bn_var': 0.54260945,
 'bn_eps': 1e-05,
 'a_upstream': 0.2547144,
 'a_promoter': 6.3659315,
 'a_downstream': 0.21996741,
 'distance_upstream': 37.609966,
 'distance_downstream': 11.843691}

Or access the parameters of all models like so:

[15]:
pd.DataFrame(
    litemodel.parameters_
).T
[15]:
theta gamma bias bn_mean bn_var bn_eps a_upstream a_promoter a_downstream distance_upstream distance_downstream
LEF1 1.213754 0.965181 1.641807 0.873574 0.542609 0.00001 0.254714 6.365932 0.219967 37.609966 11.843691
WNT3 2.090939 0.953565 -0.168771 1.041202 0.314111 0.00001 0.451865 2.252736 0.438339 26.511311 28.357067
KRT23 0.398770 1.048645 -0.222739 0.460833 0.025817 0.00001 1.456381 0.956366 0.298903 16.317459 15.376928
MT2 1.212501 0.905758 1.345756 3.491027 1.094696 0.00001 0.643497 0.168498 3.534304 248.008972 8.631128

Say one wanted a list of all peaks within the influence of WNT3’s RP model. You can quickly access the peaks that make up a gene’s local cis-regulatory neighborhood using get_influential_local_peaks. This function takes the parameter decay_periods, which defines the distance covered by the gene’s chromatin neighborhood in terms of the decay_periods times the RP model’s upstream and downstream decay distances.

[16]:
litemodel['WNT3'].get_influential_local_peaks(atac_data, decay_periods = 5.).head(5)
[16]:
chr start end endogenous distance_to_TSS is_upstream
peak_id
7081 chr11 103697694 103697994 False 76305.0 True
8549 chr11 103892817 103893117 False 118818.0 False
13683 chr11 103897002 103897302 False 123003.0 False
14605 chr11 103773897 103774197 False 102.0 True
20920 chr11 103900925 103901225 True 126926.0 False

You can also manually access TSS-peak distances via:

[17]:
tss_distances = mira.utils.fetch_gene_TSS_distances(atac_data)
tss_distances
[17]:
AnnData object with n_obs × n_vars = 55304 × 334124
    obs: 'gene', 'chromosome', 'txStart', 'txEnd', 'strand'
    var: 'chr', 'start', 'end', 'endogenous'

In this matrix, peaks that are upstream of a gene have negative distances, while downstream have positive. All peaks that are outside of some range (by default 600 kilobases) are masked and have “zero” distance. if a peak begins exactly on the TSS of a gene, it is adjusted to be one base pair distant to avoid getting masked.

Predicting expression from accessibility#

With trained RP models, the predict function calculates the maximum aposteriori prediction of expression given the accessibility state of each gene in each cell. Additionally, the model quantifies the likelihood of that prediction.

[18]:
litemodel.predict(**rp_args)
INFO:mira.adata_interface.core:Added layer: LITE_prediction
INFO:mira.adata_interface.core:Added layer: LITE_logp

You can survey the goodness of fit by checking UMAPs. For the gene MT2, the correspondance between accessibility and expression looks almost perfect. For the other genes, not so much … more on that later!

LITE Model Predictions:

[19]:
sc.pl.umap(rna_data, color = litemodel.genes, frameon=False, color_map='viridis',
          layer='LITE_prediction', ncols=2, vmin = 0, vmax = 'p97')
../_images/notebooks_tutorial_cisregulatory_modeling_33_0.png

Gene Expression:

[20]:
sc.pl.umap(rna_data, color = litemodel.genes, **mira.pref.raw_umap(ncols=2, size = 20))
../_images/notebooks_tutorial_cisregulatory_modeling_35_0.png

Gene-TF targeting#

With probabilistic in-silico deletion (pISD)#

RP models define a local chromatin neighborhood where changes in accessibility appear to influence gene expression. One may assume that transcription factor binding in many cis-regulatory elements within this neighborhood suggests the transcription factor regulates the gene of interest. MIRA can query for these types of interactions at a systems level, finding potential regulatory associations across many gene-TF pairs.

The algorithm for calculating these associations is called probabilistic in-silico deletion (pISD), and it measures the ability of the RP model to predict expression of a gene before and after the regulatory elements predicted to bind a certain transcription factor are masked. In this way, pISD simulates a “computational knock out” of that factor.

To compute association scores using each RP model against motif-based binding site predictions of all available transcription factors, use litemodel.probabilistic_isd:

[28]:
litemodel.probabilistic_isd(**rp_args, n_workers = 4)
INFO:mira.adata_interface.rp_model:Appending to expression adata:
INFO:mira.adata_interface.rp_model:Added key to varm: 'motifs-prob_deletion')
INFO:mira.adata_interface.rp_model:Added key to layers: motifs-informative_samples
INFO:mira.adata_interface.rp_model:Added key to uns: motifs

This test defines a matrix of association scores between gene-TF pairs. We can access this matrix directly with:

[29]:
isd_matrix = mira.utils.fetch_ISD_matrix(rna_data) # ISD results stored in RNA AnnData

isd_matrix # genes are rows, TFs are columns
[29]:
FOXQ1 NFYC SREBF1 MYF5 MLXIPL SMAD2::SMAD3::SMAD4 HINFP MAFB ALX1 NR1H2::RXRA ... ZBTB6 EMX2 IRF6 MYBL1 LIN54 SNAI2 POU4F1 ETS2 NFIB TBX1
gene
0610012D04RIK 0.169540 -0.029384 -0.200878 -0.099164 -0.126589 -0.523728 -0.001994 -0.068799 0.005452 0.107080 ... 0.605548 -1.518619e-02 0.710122 -2.396497e-07 -0.005993 -0.267877 0.041505 0.076527 -0.106636 0.067892
0610038B21RIK -0.036928 0.085300 -0.051138 -0.393942 -0.061608 -0.098026 0.003808 -0.072545 0.030532 -0.095745 ... 0.074316 -3.394143e-02 0.000000 -1.471498e-03 -0.149618 0.097972 -0.090758 0.116396 0.048799 -0.022324
1110012L19RIK 0.036556 -0.275474 -0.221381 0.011563 0.183237 0.078137 0.000000 0.028468 0.028468 1.900235 ... -0.001522 -1.032316e+00 0.000000 0.000000e+00 -1.075561 0.004174 0.064352 -0.221381 -0.000001 -0.000096
1110059G10RIK 0.175102 -0.074012 -0.006563 0.122240 -0.019347 0.352985 -0.000747 -0.093108 0.000213 0.001827 ... 0.070042 -5.139177e-03 0.019617 -4.050315e-03 0.397353 -0.256149 0.053545 0.300384 0.053623 -0.021751
1600012H06RIK 0.021406 0.592367 0.277713 0.015611 -0.028802 0.198631 0.096183 0.171284 0.929829 -0.011851 ... 0.034424 9.359678e-01 0.048430 -1.033540e-01 0.385391 0.057265 0.289109 1.565489 0.116852 0.173030
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
ZSWIM2 -0.033843 -0.000013 0.024423 -0.006741 0.000005 0.000964 0.067987 0.000080 0.000088 -0.034123 ... 0.034215 2.064183e-09 0.000000 0.000000e+00 -0.002852 0.000030 0.001128 0.000372 0.033620 0.000547
ZSWIM3 -0.590584 1.978670 0.079675 -0.524485 0.052527 -2.281603 -0.172332 0.075240 -0.199830 0.137679 ... 1.032466 2.336549e-01 0.433290 2.409590e-06 0.448944 0.449440 -0.194346 0.295567 -1.740476 -1.016739
ZSWIM5 0.050175 -0.397321 0.161102 0.146898 0.047569 0.228481 0.191042 -0.100273 0.091490 -0.636234 ... -0.963587 -4.722838e-02 0.063973 7.622661e-02 -0.109906 0.035298 -0.114043 -0.715872 0.204014 0.104210
ZSWIM7 0.208178 0.147906 0.440069 0.309050 -0.565245 0.246133 0.000028 0.793261 0.318095 -0.047203 ... 0.021071 1.185795e-01 0.234073 -1.586491e-02 -0.804975 0.133644 0.584839 -0.187613 -1.530654 -0.036285
ZYX -0.927998 -0.577818 -3.116855 2.348907 -0.020789 2.062772 -0.984240 0.968229 -0.193598 -2.775327 ... 1.922306 -5.982126e-01 0.001017 4.485060e-05 0.951619 10.972371 -1.677477 0.074832 2.782266 1.382568

4883 rows × 555 columns

To find the top transcription factors associated with binding sites within WNT3’s neighborhood:

[30]:
isd_matrix.loc['WNT3'].sort_values().tail(10)
[30]:
KLF12         16.497324
RBPJ          17.133034
KLF16         18.547176
STAT3         18.610457
PRDM1         22.032345
ARID3A        23.249511
KLF11         24.054482
FOSL2         24.516716
EWSR1-FLI1    26.278431
FOS-1         66.844178
Name: WNT3, dtype: float64

Of these transcription factors, RBPJ is interesting because it is a key regulator of late hair follicle differentiation spurred by Notch signaling.

Querying with many genes#

One key limitation of the pISD algorithm is that binding site predictions are noisy - whether based on ChIP samples or especially as defined by motifs. Additionally, proximal binding predictions do not gaurantee mechanistic regulation. As such, testing one gene or one TF at a time is inadvisable.

Querying for shared association with transcription factors across many co-regulated genes, however, produces results much more robust to the afforementioned sources of error. As part of the MIRA study, we computed association scores between 4000 genes and 555 TF motifs. Let’s load those results and test them against an interseting geneset.

[31]:
rna_data = anndata.read_h5ad('mira-datasets/shareseq_annotated_data/rna_data.joint_representation.rp_modeled.h5ad')
[32]:
print('{} Genes, {} Factors'.format(
    *mira.utils.fetch_ISD_matrix(rna_data).shape
))
4883 Genes, 555 Factors

Now we can query with many genes at a time to find potential regulators. The following query finds TF regulators of genes activated by the hair follicle Cortex cells topic via Wilcoxon test over the association scores:

[33]:
isd_results = mira.tl.driver_TF_test(rna_data, geneset=rna_model.get_top_genes(10, 150))

pd.DataFrame(isd_results).sort_values('pval').head(15)
INFO:mira.tools.tf_targeting:Testing with 144 query genes and 4739 background genes, against 555 factors
[33]:
id name parsed_name pval test_statistic
247 MA0523.1 TCF7L2 TCF7L2 4.800274e-08 430104.5
205 MA0905.1 HOXC10 HOXC10 1.480306e-06 419074.0
164 MA0768.1 LEF1 LEF1 2.535205e-06 417234.0
315 MA0879.1 DLX1 DLX1 3.612823e-06 415980.0
496 MA0485.2 HOXC9 HOXC9 4.001443e-06 415557.0
411 MA0634.1 ALX3 ALX3 4.295351e-06 415366.0
385 MA0902.2 HOXB2 HOXB2 4.534243e-06 415160.5
540 MA0880.1 DLX3 DLX3 5.452141e-06 414503.0
457 MA0881.1 DLX4 DLX4 5.452141e-06 414503.0
195 MA1507.1 HOXD4 HOXD4 6.311274e-06 413973.0
232 MA0726.1 VSX2 VSX2 7.569086e-06 413309.0
531 MA0701.2 LHX9 LHX9 8.899616e-06 412714.0
399 MA0075.3 PRRX2 PRRX2 8.899616e-06 412714.0
499 MA1476.1 DLX5 DLX5 1.142593e-05 411775.0
311 MA0907.1 HOXC13 HOXC13 1.151225e-05 411748.0

Fantastic. The top results show WNT-related (TCF, LEF1), homeobox (DLX), and HOXC (HOXC13) motifs are enriched in the chromatin neighborhoods of genes upregulated in the cortex.

As with topic enrichment, it is often instructive to compare and contrast regulators acting on two sets of genes. The mira.pl.compare_driver_TFs_plot function takes two genesets and calculates TF enrichment against both. In this case, I compare regulators of the top 150 genes associated with hair follicle Cortex and Medulla topics, respectively:

[34]:
mira.pl.compare_driver_TFs_plot(rna_data,
                                geneset1=rna_model.get_top_genes(10, 150),
                                geneset2=rna_model.get_top_genes(5, 150),
                                label_factors=['SMAD2::SMAD3','SMAD4','LEF1','HOXC13','RUNX1','DLX3'],
                                fontsize=20, figsize=(5,5), color='lightgrey',
                                axlabels= ('-log10 p-value\nCortex Regulators','-log10 p-value\nMedulla Regulators'))
plt.show()
INFO:mira.tools.tf_targeting:Testing with 144 query genes and 4739 background genes, against 555 factors
INFO:mira.tools.tf_targeting:Testing with 146 query genes and 4737 background genes, against 555 factors
../_images/notebooks_tutorial_cisregulatory_modeling_48_4.png

Visualizing RP models#

After identifying TF regulators of a gene-of-interest, you may want to visualize that gene’s cis-regulatory neighborhood and assess binding sites. More comprehensive epigenetic visualization is coming in future updates of MIRA, but for now, we recommend using pygenometracks. Install pygenometracks using:

conda install -c conda-forge -c bioconda -y pygenometracks
  • or -

pip install pygenometracks

You can use the RP model’s write_bedgraph method to write the RP model profile to a bedgraph file, which can be plotted by pygenometracks.

[35]:
!mkdir -p data/trackdata
[36]:
litemodel['WNT3'].write_bedgraph(atac_data,
                                 save_name = 'data/trackdata/WNT3_rpmodel.bedgraph')

Earlier, we found that WNT3 and RBPJ share a high association score. Let’s see where RBPJ motifs are landing around the WNT3 locus. One can access the matrix of all TF motif binding predictions by fetch_factor_hits, or get a list of peaks predicted to be bound by RBPJ via fetch_binding_sites. Motifs are accessed by unique IDs, so first we need to find the ID for RBPJ:

[37]:
factors = mira.utils.fetch_factor_meta(atac_data)
factors[factors.name == 'RBPJ']
[37]:
id name parsed_name
1035 MA1116.1 RBPJ RBPJ
[38]:
rbpj_binding = mira.utils.fetch_binding_sites(atac_data, id = 'MA1116.1')
rbpj_binding.head(3)
[38]:
chr start end endogenous
peak_id
15 chr6 113306525 113306825 False
18 chr18 21300223 21300523 False
21 chr13 119690114 119690414 False
[39]:
# sort by chromosome, then start site
rbpj_binding[['chr','start','end']].sort_values(['chr','start'])\
    .to_csv('data/trackdata/RBPJ_hits.bed',  # save as bed file
           index = None, header=None, sep = '\t',
           )

Now we define a configuration file for the pygenometracks plot we wish to make (see their extensive documentation on building plots). Of note, for the [genes] heading, we can provide the same TSS annotation as was used in training the RP models, since it is a valid bed12 file.

[40]:
config_file = """

[x-axis]

[genes]
file = mira-datasets/mm10_tss_data.bed12
title = Genes
height = 3

[rp model]
file = data/trackdata/WNT3_rpmodel.bedgraph
height = 3
color = #e6e6e6
title = WNT3 RP Model
max_value = 1.1
min_value = 0
file_type = bedgraph
alpha = 0.2

[rp model2]
file = data/trackdata/WNT3_rpmodel.bedgraph
type = line:0.5
color = black
file_type = bedgraph
max_value = 1.1
min_value = 0
overlay previous = yes

[spacer]

[RBPJ hits]
file = data/trackdata/RBPJ_hits.bed
height = 1.5
style = UCSC
gene_rows = 1
color = red
title = RBPJ Motif Hits
labels = off

"""

with open('data/trackdata/config.ini', 'w') as f:
    print(config_file, file = f)

Finally run the plotting command. To find a plotting location for a given gene, you can reference:

[41]:
mira.utils.fetch_TSS_data(atac_data)['WNT3']
[41]:
{'gene_chrom': 'chr11',
 'gene_start': 103774149,
 'gene_end': 103817957,
 'gene_strand': '+'}
[45]:
!pygenometracks \
    --tracks data/trackdata/config.ini \
    --region chr11:103674149-103917957 \
    -out data/trackdata/wnt3_plot.png \
    --dpi 300 \
    --width 20 \
    --fontSize 9
WARNING:pygenometracks.tracksClass:Deprecated Warning: The section 4. [rp model2] uses parameter overlay previous but there is no more parameter with space in name. Will be substituted by overlay_previous.

WARNING:pygenometracks.tracksClass:Deprecation Warning: In section 6. [RBPJ hits], labels was set to off whereas in the future only true and false value will be accepted.

INFO:pygenometracks.tracksClass:initialize 1. [x-axis]
INFO:pygenometracks.tracksClass:initialize 2. [genes]
100%|████████████████████████████████████████| 12/12 [00:00<00:00, 15582.55it/s]
INFO:pygenometracks.tracksClass:initialize 3. [rp model]
100%|████████████████████████████████████| 4818/4818 [00:00<00:00, 33998.49it/s]
INFO:pygenometracks.tracksClass:initialize 4. [rp model2]
100%|████████████████████████████████████| 4818/4818 [00:00<00:00, 34002.38it/s]
INFO:pygenometracks.tracksClass:initialize 5. [spacer]
INFO:pygenometracks.tracksClass:initialize 6. [RBPJ hits]
100%|████████████████████████████████████████| 13/13 [00:00<00:00, 26001.88it/s]
INFO:pygenometracks.tracksClass:time initializing track(s):
INFO:pygenometracks.tracksClass:0.45090484619140625
DEBUG:pygenometracks.tracksClass:Figure size in cm is 20.0 x 10.50531914893617. Dpi is set to 300

INFO:pygenometracks.tracksClass:plotting 1. [x-axis]
INFO:pygenometracks.tracksClass:plotting 2. [genes]
DEBUG:pygenometracks.tracks.GenomeTrack:ylim 7.9799999999999995,-0.08
INFO:pygenometracks.tracksClass:plotting 3. [rp model]
INFO:pygenometracks.tracksClass:plotting 4. [rp model2]
INFO:pygenometracks.tracksClass:plotting 5. [spacer]
INFO:pygenometracks.tracksClass:plotting 6. [RBPJ hits]
DEBUG:pygenometracks.tracks.GenomeTrack:ylim 3.38,-0.08
[46]:
Image('data/trackdata/wnt3_plot.png')
[46]:
../_images/notebooks_tutorial_cisregulatory_modeling_61_0.png

RBPJ appears to bind many regions proximal to WNT3’s local chromatin neighborhood, defined by the bounds of the learned regulatory potential function.

Next#

LITE models quantify a relationship between local chromatin accessibility and gene expression. Interestingly, for some genes, changes in local chromatin appear to diverge from changes in gene expression. The next tutorial on NITE regulation demonstrates how to identify and analyze this phenomenon.