International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.4, No.5, October 2014
DOI : 10.5121/ijcsea.2014.45011
TRUNCATED BOOLEAN MATRICES FOR DNA COMPUTATION
Nordiana Rajaee$^{1}$, Awang Ahmad Sallehin Awang Hussaini$^{2}$ and Sharifah Masniah Wan Masra$^{3}$
$^{1}$ Faculty of Engineering, Universiti Malaysia Sarawak
$^{2}$ Faculty of Resource Science and Technology, Universiti Malaysia Sarawak
$^{3}$ Faculty of Engineering, Universiti Malaysia Sarawak
ABSTRACT
Although DNA computing has emerged as a new computing paradigm with massively parallel computing capabilities, the large quantity of DNA required for larger computational problems remains a stumbling block to its development as a practical computing method. In this paper, we propose a modification for implementing a physical experiment on the multiplication of two Boolean matrices with DNA computing. The Truncated Matrices approach reduces the number and length of the DNA sequences used to compute the problem with DNA computing.
KEYWORDS
DNA computation, Boolean Matrices, Biomolecular tools, Parallel Overlap Assembly
1. INTRODUCTION
When Leonard M. Adleman first proposed the use of DNA for computation by solving the Hamiltonian Path Problem (HPP) in 1994, the computation was implemented as an in vitro experiment with DNA oligonucleotide sequences designed to represent the vertices and the edges. The solution to the computation was then derived from chemical reactions via biomolecular tools such as the hybridization-ligation method, polymerase chain reaction and cutting by restriction enzymes. The output was then visualized by gel electrophoresis. The computation of the seven-node HPP took seven days to complete [1]. Since then, many proposals have been presented for computing problems with DNA, but most of them still rely on L. M. Adleman's architecture to carry out the computation. Although the massively parallel computing capabilities of DNA computing promise faster and denser computation, there remain several drawbacks which prevent it from becoming a practical computing material. One reason is the exponential requirement of DNA in computing larger computational problems [2]. The current strategy in DNA computing is to embed the computation problem in DNA oligonucleotide sequences and derive the solution by eliminating incorrect DNA via selective processes. For a seven-node HPP, the problem was encoded in a 20 oligonucleotide sequence. For a 23-node HPP, the computation will require 1 kg of DNA, and for a 70-node HPP, the computation will require $10^{25}$ kg of DNA to represent all the nodes [3]. Other problems such as maximal clique problems, vertex-cover problems and set packing problems all show a similarly exponential requirement of DNA and increased time for the computations. LaBean et al. (2000) proposed a $1.89^{n}$ volume, $O(n^{2} + m^{2})$ time molecular algorithm for the 3-coloring problem
and a $1.51^{n}$ volume, $O(n^{2} m^{2})$ time molecular algorithm for the independent set problem, where n and m are, respectively, the number of vertices and the number of edges in the problems resolved [4]. Fu (1997) presented a polynomial time algorithm with a $1.497^{n}$ volume for the 3-SAT problem, a polynomial time algorithm with a $1.345^{n}$ volume for the 3-coloring problem and a polynomial time algorithm with a $1.229^{n}$ volume for the independent set problem [5]. Bunow goes on to estimate that an extension combinatorial database would require nearly $10^{70}$ nucleotides (by comparison, the universe is estimated to contain roughly $10^{80}$ subatomic particles) [7].

The original algorithm to solve a Boolean matrix multiplication with DNA requires the generation of initial vertices, intermediate vertices, terminal vertices and directed edges to link the initial-intermediate and intermediate-terminal vertices. In our previous works on solving Boolean matrices with DNA computing, the quantity of DNA oligonucleotides needed to encode the problem is proportional to the number of vertices and edges in the graph representing the matrix multiplication. The number of primers to represent the elements in the product matrix is derived from its total number of row and column indicators, whereas the total number of tubes to represent each element in the product matrix is derived from the total number of primer combinations [7-9]. For a 2 x 2 product matrix, the total number of primers required is 4 and the total number of tubes is also 4. In other words, an (m x k) x (k x n) matrix multiplication problem requires m + n primers and m x n tubes to represent the problem.

Given any matrix multiplication problem with an increasing number N of intermediate matrices, the number of intermediate vertices for the problem also increases (though not necessarily the number of test tubes representing the product matrix). For example, consider two matrix problems raised to the pth power: a $(10 \times 10)^{2}$ and a $(10 \times 10)^{10}$ matrix multiplication. The number of test tubes representing both problems is 100, but the number of intermediate vertices is 10 (for p = 2) and 90 (for p = 10). The number of primers and tubes also increases drastically for a larger N x N computation. For a 10 x 10 product matrix, the total number of primers required is 20 and the total number of tubes to represent all elements in the product matrix is 100, as shown in Figure 1. As the size of the problem increases, the volume of DNA increases exponentially and the amount of experimental work becomes too tedious and impractical for the approach to be considered a viable technology.
[Figure 1: a chain of ten 10 x 10 Boolean matrices multiplied together (p = 10), giving a 10 x 10 product matrix]
Figure 1. $(10 \times 10)^{10}$ matrix multiplication
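The counts quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the rule for intermediate vertices (n vertices at each boundary between adjacent matrices) is an assumption inferred from the two figures given in the text, not a formula stated by the authors.

```python
def product_resources(m, n):
    """Primers and tubes for an m x n product matrix, as stated in the text."""
    primers = m + n   # one primer per row indicator and per column indicator
    tubes = m * n     # one tube per element of the product matrix
    return primers, tubes


def intermediate_vertices(n, p):
    """Intermediate vertices for a chain of p square n x n matrices.

    Assumption: n vertices at each of the p - 1 boundaries between adjacent
    matrices, which reproduces the 10 (p = 2) and 90 (p = 10) quoted above.
    """
    return n * (p - 1)


if __name__ == "__main__":
    for p in (2, 10):
        primers, tubes = product_resources(10, 10)
        print(f"p={p}: primers={primers}, tubes={tubes}, "
              f"intermediate vertices={intermediate_vertices(10, p)}")
```

The sketch makes the point of the paragraph concrete: the tube and primer counts depend only on the size of the product matrix, while the number of intermediate vertices grows with the length of the chain.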
2. TRUNCATED BOOLEAN MATRICES
In in vitro implementations of Boolean matrix multiplication problems, DNA oligonucleotide strands are synthesized to represent the initial vertices, intermediate vertices, terminal vertices and edges of the problem. All generated single-stranded sequences are quoted in 5' → 3' order and length is denoted in mer, in which one mer represents one nucleotide of the DNA oligonucleotide. The output of the computation is analysed quantitatively based on the direct proportionality of the DNA sequence strands.

Let us assume a Boolean matrix multiplication problem as shown in Figure 2.
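$$
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}
$$
Row and column labels: 1st matrix, rows a, b and columns c, d; 2nd matrix, rows c, d and columns e, f; 3rd matrix, rows e, f and columns g, h; 4th matrix, rows g, h and columns i, j; product matrix, rows a, b and columns i, j.
[Graph representation: vertices a to j, with a directed edge for each element of value 1]
Figure 2. Four Boolean Matrix Multiplication and its graph representation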
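As a plain software check of the problem in Figure 2, the Boolean product can be computed directly. The sketch below is illustrative only: the function name `bool_matmul` is hypothetical, and the matrix entries are placed according to the row and column intersections of value 1 that this section gives for each matrix (for example row a, column d in the 1st matrix).

```python
from functools import reduce


def bool_matmul(a, b):
    """Boolean product: entry (i, j) is 1 if some k has a[i][k] = b[k][j] = 1."""
    return [
        [int(any(a[i][k] and b[k][j] for k in range(len(b))))
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Rows/columns: M1 is (a, b) x (c, d), M2 is (c, d) x (e, f),
# M3 is (e, f) x (g, h), M4 is (g, h) x (i, j).
M1 = [[0, 1], [1, 0]]   # 1s at a-d and b-c
M2 = [[1, 0], [1, 0]]   # 1s at c-e and d-e
M3 = [[1, 0], [0, 1]]   # 1s at e-g and f-h
M4 = [[0, 1], [0, 1]]   # 1s at g-j and h-j

product = reduce(bool_matmul, [M1, M2, M3, M4])

if __name__ == "__main__":
    print(product)
```

An entry of 1 in the product at row a, column j corresponds to a directed path from vertex a to vertex j in the graph, which is what the DNA strands are meant to assemble.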
Using the original algorithm, the problem is represented by a directed graph G, where the total number of DNA strands generated for the problem is 18, as shown in Table 1.
Table 1 Number of DNA sequence strands for vertices and edges
DNA strand sequences for vertices and edges                                  Number of DNA strands
Initial vertices = {V_a, V_b}                                                2
Intermediate vertices = {V_c, V_d, V_e, V_f, V_g, V_h}                       6
Terminal vertices = {V_i, V_j}                                               2
Directed edges = {E_ad, E_bc, E_ce, E_de, E_eg, E_fh, E_gj, E_hj}            8
Total:                                                                       18
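The totals in Table 1 follow directly from the shapes and entries of the four matrices. The following sketch reproduces them under the assumption that the original algorithm needs one strand per vertex and one per element of value 1; the function name `count_strands` is hypothetical.

```python
def count_strands(matrices):
    """Count vertex and edge strands for a chain of Boolean matrices.

    Assumes one strand per vertex (row labels of the first matrix, column
    labels of every matrix) and one strand per element of value 1.
    """
    initial = len(matrices[0])                             # rows of the first matrix
    terminal = len(matrices[-1][0])                        # columns of the last matrix
    intermediate = sum(len(m[0]) for m in matrices[:-1])   # shared labels in between
    edges = sum(sum(row) for m in matrices for row in m)   # elements of value 1
    return {
        "initial": initial,
        "intermediate": intermediate,
        "terminal": terminal,
        "edges": edges,
        "total": initial + intermediate + terminal + edges,
    }


# The four 2 x 2 matrices of Figure 2, with 1s at the edges listed in Table 1.
FIGURE_2 = [
    [[0, 1], [1, 0]],   # E_ad, E_bc
    [[1, 0], [1, 0]],   # E_ce, E_de
    [[1, 0], [0, 1]],   # E_eg, E_fh
    [[0, 1], [0, 1]],   # E_gj, E_hj
]

if __name__ == "__main__":
    print(count_strands(FIGURE_2))
```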
In the case of square matrices, whereby the number of rows and columns for all matrices in the multiplication are equal, we propose a modification, the Truncated Matrix, in order to reduce the generation of vertices. The main idea of truncating matrices is to eliminate or reduce the generation of vertices by replacing them directly with directed edges. Originally, the directed edge for an element of value 1 in the matrices is constructed from partial complementary strands from a previous vertex to a next vertex. With Truncated Matrices, edges are constructed directly from the combination of sequences for rows and columns containing an element of value 1.

Thus, the problem in Figure 2 is modelled such that each row and column indicator for all the matrices refers to a library containing predefined sequences for a vertex label. Table 2 shows the library of generated DNA strand sequences for the vertex labels.
Table 2 DNA Sequence for Vertex Labels
Label   Sequence     Complementary Sequence
a       cgatggcgtg   gctaccgcac
b       cccgagcgtt   gggctcgcaa
c       ggacagccct   cctgtcggga
d       tgccgtagcg   acggcatcgc
e       caacggtggc   gttgccaccg
f       ctctcaggcg   gagagtccgc
g       gctcctgggg   cgaggacccc
h       gagggcgtcg   ctcccgcagc
i       cgatggcgtg   gctaccgcac
j       ggggcggaat   ccccgcctta
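The complementary column of Table 2 is the base-by-base Watson-Crick complement of the sequence column, written in the same left-to-right order (not reversed). A minimal sketch of that relationship, with the hypothetical names `LIBRARY` and `complement`, is:

```python
# Vertex-label library copied from Table 2: label -> (sequence, complement).
LIBRARY = {
    "a": ("cgatggcgtg", "gctaccgcac"),
    "b": ("cccgagcgtt", "gggctcgcaa"),
    "c": ("ggacagccct", "cctgtcggga"),
    "d": ("tgccgtagcg", "acggcatcgc"),
    "e": ("caacggtggc", "gttgccaccg"),
    "f": ("ctctcaggcg", "gagagtccgc"),
    "g": ("gctcctgggg", "cgaggacccc"),
    "h": ("gagggcgtcg", "ctcccgcagc"),
    "i": ("cgatggcgtg", "gctaccgcac"),
    "j": ("ggggcggaat", "ccccgcctta"),
}

# Watson-Crick pairing: a <-> t, c <-> g.
_PAIRS = str.maketrans("acgt", "tgca")


def complement(seq):
    """Return the base-wise complement of a lowercase DNA sequence."""
    return seq.translate(_PAIRS)


if __name__ == "__main__":
    for label, (seq, comp) in LIBRARY.items():
        print(label, seq, complement(seq), complement(seq) == comp)
```

Running the check against every row is a cheap way to catch transcription errors in a hand-built sequence library before strands are ordered.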
The edges constructed to represent the problem are shown in Table 3 and Table 4 for the odd and even matrices in the problem. SeqLen represents the sequence length of the strands, GC% represents the GC content of the sequence strands, which may influence the stability of the strands, and Tm refers to the melting temperature.

For matrix = Odd:
Generate edge sequences for all elements of value 1 from the corresponding row and column sequences. From the 1st matrix, elements of value 1 exist at the intersections of row a – column d and row b – column c. From the 3rd matrix, elements of value 1 exist at the intersections of row e – column g and row f – column h. Therefore, the generated edges for the odd matrices are as shown in Table 3:
Table 3 DNA Sequence for Edges (Matrix: Odd)
Edges   Sequence               SeqLen   GC%    Tm
ad      cgatggcgtgtgccgtagcg   20       0.70   74.5
bc      cccgagcgttggacagccct   20       0.70   73.3
eg      caacggtggcgctcctgggg   20       0.75   76.5
fh      ctctcaggcggagggcgtcg   20       0.75   74.2
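The edge sequences in Table 3 are the row-label sequence followed by the column-label sequence from Table 2. The sketch below rebuilds them and recomputes SeqLen and the GC fraction; the names `edge_sequence` and `gc_fraction` are hypothetical, and Tm is left out because the melting-temperature formula used by the authors is not given in this section.

```python
# Sequences for the vertex labels used by the odd matrices (from Table 2).
SEQUENCES = {
    "a": "cgatggcgtg", "b": "cccgagcgtt", "c": "ggacagccct", "d": "tgccgtagcg",
    "e": "caacggtggc", "f": "ctctcaggcg", "g": "gctcctgggg", "h": "gagggcgtcg",
}

# (row, column) intersections of value 1 in the 1st and 3rd matrices.
ODD_EDGES = [("a", "d"), ("b", "c"), ("e", "g"), ("f", "h")]


def edge_sequence(row, col, sequences=SEQUENCES):
    """Concatenate the row-label and column-label sequences into one edge."""
    return sequences[row] + sequences[col]


def gc_fraction(seq):
    """Fraction of bases in seq that are g or c."""
    return sum(base in "gc" for base in seq) / len(seq)


if __name__ == "__main__":
    for row, col in ODD_EDGES:
        seq = edge_sequence(row, col)
        print(f"{row}{col}  {seq}  {len(seq)}  {gc_fraction(seq):.2f}")
```

Each edge is 20 mer because it joins two 10-mer label sequences, so no separate vertex strands are needed for the intermediate labels c to h.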
For matrix = Even:
Generate complementary edge sequences for all elements of value 1 from the corresponding row and column sequences. From the 2nd matrix, elements of value 1 exist at the intersections of row c – column e and row d – column e. From the 4th matrix, elements of value 1 exist at the intersections of row g – column j and row h – column j. Therefore, the generated edges for the even matrices are as shown in Table 4: