Splice Site Detection in DNA Sequences Using a Fast Classification Algorithm

Problems

Biological research raises many problems related to processing DNA data, and these fall under the field of bioinformatics. They include classifying groups of sequences, detecting similarity, separating the protein-coding regions of a DNA sequence (splicing), predicting molecular structure, and searching for new drug structures. DNA research involves large data sets containing genes, protein sequences, and other biological data, so the processing time and memory required are relatively large.

Pattern recognition in DNA is not an easy problem. Besides the relatively large size of the data, DNA is composed of exons (which are encoded into proteins) and introns (which are not), and no explicit marker character separates the two.

Goal

The main aim of the method described in the paper is to predict the locations of exons and introns in large data sets with high accuracy and in reasonable time. More generally, pattern recognition in DNA calls for a system that can handle the storage, processing, and analysis of large DNA data.

Method

Previous research tended to use the SVM method to determine the boundaries separating exons and introns. The paper describes a fundamental weakness of SVM: its memory requirement grows with the square of the number of input data points, so the cost of SVM depends heavily on the size of the data set. The main idea of the proposed method comes from this weakness, namely the high complexity of the SVM training process. The improvement reduces the number of data points used in training, on the consideration that points close to the decision boundary are the important ones, while points far from the hyperplane contribute nothing to SVM training. As a result, the training set is much smaller than the entire data set used by a regular SVM.
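To make the quadratic growth concrete, here is a rough back-of-the-envelope sketch. It assumes the memory cost is dominated by a dense n-by-n kernel matrix of 8-byte floats. The function name and the byte size are illustrative assumptions, not figures from the paper.

```python
def kernel_matrix_bytes(n_samples, bytes_per_entry=8):
    """Estimate memory for a dense n x n kernel matrix.

    Assumption: one float64 (8 bytes) per entry, no sparsity or caching tricks.
    """
    return n_samples * n_samples * bytes_per_entry


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        gib = kernel_matrix_bytes(n) / 2**30
        print(f"{n:>7} samples -> about {gib:.2f} GiB")
```

Under this assumption, doubling the number of training points quadruples the memory, which is why shrinking the training set pays off so quickly.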

The new method, an improvement on SVM, is divided into three stages. The first stage determines support vectors (SVs) from a small data set. The second stage trains a Bayesian classifier using the SVs and non-SVs obtained in the first stage, removes input data considered less representative, and turns the important data into candidate SVs. In the third stage, the candidate SVs produced by the previous stage are used to train the SVM a second time.
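The three stages can be sketched roughly as follows. This is not the paper's implementation. To keep the example runnable without external libraries, a toy centroid-based linear classifier stands in for the SVM, and a plain distance threshold stands in for the Bayesian filtering step. All names and parameters (`train_centroid`, `three_stage`, `step`, `margin`) are hypothetical.

```python
def train_centroid(points, labels):
    """Toy linear classifier used as a stand-in for an SVM trainer (assumption).

    Returns f(x), the signed distance of x to the perpendicular bisector
    of the two class centroids. Labels are +1 and -1.
    """
    dim = len(points[0])
    pos = [p for p, y in zip(points, labels) if y == 1]
    neg = [p for p, y in zip(points, labels) if y == -1]
    if not pos or not neg:
        raise ValueError("both classes are needed to train")
    mu_p = [sum(p[i] for p in pos) / len(pos) for i in range(dim)]
    mu_n = [sum(p[i] for p in neg) / len(neg) for i in range(dim)]
    w = [a - b for a, b in zip(mu_p, mu_n)]
    norm = sum(c * c for c in w) ** 0.5 or 1.0
    mid = [(a + b) / 2 for a, b in zip(mu_p, mu_n)]
    bias = -sum(wi * mi for wi, mi in zip(w, mid))
    return lambda x: (sum(wi * xi for wi, xi in zip(w, x)) + bias) / norm


def three_stage(points, labels, step, margin, train=train_centroid):
    # Stage 1: train on a small subset (every `step`-th point).
    f0 = train(points[::step], labels[::step])
    # Stage 2: keep only points close to the first boundary as candidates.
    # (A distance threshold replaces the Bayesian step here.)
    keep = [i for i, p in enumerate(points) if abs(f0(p)) <= margin]
    # Stage 3: retrain on the reduced candidate set.
    f = train([points[i] for i in keep], [labels[i] for i in keep])
    return f, keep


if __name__ == "__main__":
    pts = [(1, 0), (-1, 0), (2, 0), (-2, 0), (3, 0), (-3, 0), (10, 0), (-10, 0)]
    lbl = [1, -1, 1, -1, 1, -1, 1, -1]
    model, kept = three_stage(pts, lbl, step=3, margin=4)
    print("kept", len(kept), "of", len(pts), "points:", kept)
```

The final model is trained only on the points that survive stage two, so its training cost depends on the size of the candidate set rather than on the full data set.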

Result

The tests in the paper measure the accuracy and the training time on the datasets used.

The table above compares the error, true negative, and false negative values for the two datasets tested, Acceptor and Donor.
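For readers unfamiliar with these measures, the sketch below shows how they are commonly computed from true and predicted labels. It assumes labels of +1 (splice site) and -1 (non-site). The function names are illustrative and the sample labels are made up for the example, not taken from the paper's results.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for labels +1 and -1."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1:
            counts["tp" if p == 1 else "fn"] += 1
        else:
            counts["tn" if p == -1 else "fp"] += 1
    return counts


def error_rate(counts):
    """Fraction of misclassified examples."""
    total = sum(counts.values())
    return (counts["fp"] + counts["fn"]) / total


if __name__ == "__main__":
    truth = [1, 1, -1, -1, -1]       # made-up labels for illustration
    guess = [1, -1, -1, -1, 1]
    c = confusion_counts(truth, guess)
    print(c, "error rate:", error_rate(c))
```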

Conclusion

The paper describes an improvement to SVM for classifying large data sets. The algorithm selects which data are relevant enough to include as training data and which are not. This is intended to reduce the complexity of building the model during training. The results show that the time spent on training is reduced significantly when building the model.

ADVANTAGE
  • The proposed method is simple, yet it significantly reduces the time needed to build the model from the training data
  • The experimental procedure is clearly described, and the research results are reported in detail

DISADVANTAGE
  • The title does not indicate that the method is derived from a pre-existing method, namely SVM
  • For the first stage, neither a reason nor a specific number is given for the data set used; the paper only states that a small data set is used.

SUGGESTION
  • The chapter should state that the method is derived from another method (SVM), so that readers get a clear picture of the proposed method.
  • Add a comparison showing whether the accuracy of the proposed method is lower than or equal to that of other methods, to further highlight the reduced time needed to train the model.
