Discovery of significant porcine SNPs for swine breed identification by a hybrid of information gain, genetic algorithm, and frequency feature selection technique
Metadata
Document Title
Discovery of significant porcine SNPs for swine breed identification by a hybrid of information gain, genetic algorithm, and frequency feature selection technique
Author
Pasupa K., Rathasamuth W., Tongsima S.
Affiliations
Faculty of Information Technology, King Mongkut's Institute of Technology Ladkrabang, Bangkok, 10520, Thailand; National Biobank of Thailand, National Science and Technology Development Agency, Khong Luang, 12120, Thailand
Type
Article
Source Title
BMC Bioinformatics
ISSN
1471-2105
Year
2020
Volume
21
Issue
1
Open Access
All Open Access, Gold, Green
Publisher
BioMed Central Ltd.
DOI
10.1186/s12859-020-3471-4
Format
Abstract
Background: The number of porcine Single Nucleotide Polymorphisms (SNPs) used in genetic association studies is very large, which suits statistical testing. In the breed classification problem, however, one needs a much smaller set of porcine-classifying SNPs (PCSNPs) that can accurately classify pigs into different breeds. This study attempted to find such PCSNPs by using several combinations of feature selection and classification methods. We experimented with different combinations of feature selection methods, including information gain, conventional as well as modified genetic algorithms, and our own frequency feature selection method, each paired with a common classification method, the Support Vector Machine, to evaluate the performance of each combination. Experiments were conducted on a comprehensive data set containing SNPs from native pigs from America, Europe, Africa, and Asia, including Chinese breeds, Vietnamese breeds, and hybrid breeds from Thailand. Results: The best combination of feature selection methods (a hybrid of information gain, modified genetic algorithm, and frequency feature selection) reduced the number of possible PCSNPs to only 1.62% (164 PCSNPs) of the total number of SNPs (10,210 SNPs) while maintaining a high classification accuracy (95.12%). Moreover, the performance of this PCSNP set was nearly identical to that of larger SNP sets and even of the entire data set. In addition, most PCSNPs were well matched to a set of 94 genes in the PANTHER pathway, conforming to a suggestion by the Porcine Genomic Sequencing Initiative. Conclusions: The best hybrid method provided a sufficiently small number of porcine SNPs that accurately classified swine breeds. © 2020 The Author(s).
Funding Sponsor
King Mongkut's Institute of Technology Ladkrabang
License
CC BY
Rights
N/A
Publication Source
Scopus