Identifying Promoter Regions Using the Viterbi Algorithm

Hidden Markov Models

Example of a Hidden Markov Model.

Converting to Code

from Bio.HMM.MarkovModel import HiddenMarkovModeltransition_alphabet = ['Rainy', 'Sunny']
emission_alphabet = ['W', 'S', 'C'] # W - walk, S - shop, C - clean
initial_prob = {'Rainy': 0.3, 'Sunny': 0.7}
transition_prob = {('Rainy', 'Rainy'): 0.4,
('Rainy', 'Sunny'): 0.6,
('Sunny', 'Rainy'): 0.2,
('Sunny', 'Sunny'): 0.8}
emission_prob = {('Rainy', 'W'): 0.1,
('Rainy', 'S'): 0.4,
('Rainy', 'C'): 0.5,
('Sunny', 'W'): 0.6,
('Sunny', 'S'): 0.3,
('Sunny', 'C'): 0.1}
model = HiddenMarkovModel(transition_alphabet, emission_alphabet, initial_prob, transition_prob, emission_prob, 1, 1)

Viterbi Algorithm

model.viterbi(Seq('WSC'), transition_alphabet)
(Seq('SunnySunnyRainy'), -4.5972020163389145)

Identifying Promoter Regions

Example of Hidden Markov Model in biology.
from Bio.HMM.MarkovModel import HiddenMarkovModeltransition_alphabet = ['B', 'P'] # B - background, P - promoter
emission_alphabet = ['A', 'T', 'C', 'G']
initial_prob = {'B': 0.5, 'P': 0.5}
transition_prob = {('B', 'B'): 0.85,
('B', 'P'): 0.15,
('P', 'P'): 0.75,
('P', 'B'): 0.25}
emission_prob = {('B', 'A'): 0.25,
('B', 'T'): 0.25,
('B', 'C'): 0.25,
('B', 'G'): 0.25,
('P', 'A'): 0.15,
('P', 'T'): 0.13,
('P', 'C'): 0.42,
('P', 'G'): 0.3}
model = HiddenMarkovModel(transition_alphabet, emission_alphabet, initial_prob, transition_prob, emission_prob, 1, 1)
model.viterbi(Seq('AGTACACTGGT'), transition_alphabet)
model.viterbi(Seq('GCGCGCGCAA'), transition_alphabet)
(Seq('BBBBBBBBBBB'), -17.567574447856494)
(Seq('PPPPPPPPBB'), -15.314217188702496)

HMMs with “Memory”

array_promoter = [0.18, 0.274, 0.426, 0.12, 0.171, 0.368, 0.274, 0.188, 0.161, 0.339, 0.375, 0.125, 0.079, 0.355, 0.384, 0.182]
vals_pp = [0.75*val for val in array_promoter]
vals_pb = [0.25*val for val in array_promoter]
array_background = [0.3, 0.205, 0.285, 0.21, 0.322, 0.298, 0.078, 0.302, 0.248, 0.246, 0.298, 0.208, 0.177, 0.239, 0.292, 0.292]
vals_bp = [0.15*val for val in array_background]
vals_bb = [0.85*val for val in array_background]
model.viterbi(Seq('AGTACACTGGT'), transition_alphabet)
model.viterbi(Seq('GCGCGCGCAA'), transition_alphabet)
(Seq('A-G-T-A-C-A-C-T-G-G-T-'), -17.77362274426406)
(Seq('G+C+G+C+G+C+G+C-A-A-'), -16.064944938458574)

Viterbi Algorithm

def viterbi_algorithm(observations, states, start_p, trans_p, emit_p, table):
V = [{}]

for st in states:
V[0][st] = {"prob": start_p[st] * emit_p[st][observations[0]], "prev": None}
for t in range(1, len(observations)):
V.append({})
for st in states:
max_tr_prob = V[t - 1][states[0]]["prob"] * trans_p[states[0]][st]
prev_st_selected = states[0]
for prev_st in states[1:]:
tr_prob = V[t - 1][prev_st]["prob"] * trans_p[prev_st][st]
if tr_prob > max_tr_prob:
max_tr_prob = tr_prob
prev_st_selected = prev_st

max_prob = max_tr_prob * emit_p[st][observations[t]]
V[t][st] = {"prob": max_prob, "prev": prev_st_selected}
opt = []
max_prob = 0.0
best_st = None

for st, data in V[-1].items():
if data["prob"] > max_prob:
max_prob = data["prob"]
best_st = st
opt.append(best_st)
previous = best_st
for t in range(len(V) - 2, -1, -1):
opt.insert(0, V[t + 1][previous]["prev"])
previous = V[t + 1][previous]["prev"]
print (" ".join(opt) + ", probability = %s" % max_prob)
Final result of the self-coded Viterbi algorithm.
HiddenMarkovModel Implementation
G+C+G+C+G+C+G+C-A-A-, probability = -16.064944938458574

Own Implementation
G+C+G+C+G+C+G+C-A-A-, probability = 1.0545885728228732e-07

TL;DR

--

--

--

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

Blue Book for Bulldozers Competition Part 7 (Optional) — Deep Learning for Tabular Data I

Tokenization (Building a Tokenizer and a Sentencizer)

Pandas-Categorical and Continuous values encoding.

Predicting Movie Tags Using Plot Summaries

Build Knowledge Graph from unstructured corpus using Machine Learning

Detecting Pneumonia in X-Ray Images with a Convolutional Neural Network

New major release for nebullvm, an open-source to speed up AI inference by 2–30 times

CDS PhD Students Co-Author Papers Present at CogSci 2021 Conference

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Aditya Mittal

Aditya Mittal

More from Medium

Data Scientist: Which area of specialization to take?

Metrics for the performance measurement of machine learning classifiers

Prediction using XGBRegressor (Kaggle Competition- Tabular Playground Series — Jan 2022)

Nine Tools and Approaches For Creating Unbiased AI Products