LINflow: a data driven approach to genome taxonomic identification

LINflow is designed to automate LIN assignments to genomes using their data, i.e., nucleotide content, by considering one or more organism comparison measurements.

Published on Nov 21, 2023

Reza Mazloom1, Parul Sharma2,3, Kassaye Belay2,3, Boris A. Vinatzer2, Lenwood S. Heath1
1 Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States
2 School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, Virginia, United States
3 Graduate Program in Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, Virginia, United States

 

In the world of organisms, the purpose of taxonomy is to systematically classify and cluster groups of similar organisms, name them, and facilitate their identification. Recognized taxonomies, such as those used by NCBI or GTDB, have until recently proven accurate and resilient, but their curated nature leaves a lot to be desired when 1) novel or mutated organisms are encountered, 2) an organism has more than one potential species affiliation, or 3) subspecies-level resolution is required. Life Identification Numbers (LINs) were introduced as a data-driven classification system that uses genome similarity as the exclusive criterion. An assigned LIN places an organism into a finite number of clusters; the number and boundaries of these clusters depend on the similarity thresholds bundled into a scheme and on the organisms already in the system. LINflow is designed to automate LIN assignments to genomes using their data, i.e., nucleotide content, by considering one or more organism comparison measurements. This allows users to easily tune the LIN system to their needs by customizing the scheme, combining measurements with varying hyper-parameters, changing the LIN assignment process, and eventually parallelizing and optimizing the system. LINbase.org and genomeRxiv are two customized, web-accessible versions of LINflow designed to work with all prokaryotes. Our efforts to build a similar system for fungi have also been successful, and further development is under way for viruses.
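To make the roles of the scheme and of the organisms already in the system concrete, the following is a minimal Python sketch of threshold-based LIN assignment. It is an illustration under stated assumptions, not LINflow's actual code: the threshold values in `SCHEME`, the function name `assign_lin`, and the representation of a LIN as a list of integers are all hypothetical, and the similarity score is assumed to have been computed beforehand by some genome comparison measurement.

```python
from typing import List, Optional, Tuple

# Hypothetical scheme: ascending similarity thresholds (in percent),
# one per LIN position. Real schemes may use different values and lengths.
SCHEME: List[float] = [70.0, 80.0, 90.0, 95.0, 99.0]


def assign_lin(
    best_match: Optional[Tuple[List[int], float]],
    existing: List[List[int]],
    scheme: List[float] = SCHEME,
) -> List[int]:
    """Return a LIN for a new genome.

    best_match is (LIN of the most similar genome, similarity to it),
    or None when the system holds no genomes yet.
    existing is the list of LINs already assigned.
    """
    if best_match is None or not existing:
        # First genome in the system: all positions are zero.
        return [0] * len(scheme)

    ref_lin, similarity = best_match

    # Count the leading positions whose threshold the similarity meets.
    shared = 0
    for threshold in scheme:
        if similarity < threshold:
            break
        shared += 1

    if shared == len(scheme):
        # Similar enough at every threshold: same LIN as the best match.
        return list(ref_lin)

    prefix = ref_lin[:shared]
    # At the first position that is not shared, take the next unused value
    # among existing LINs with the same prefix.
    used = [lin[shared] for lin in existing if lin[:shared] == prefix]
    new_value = max(used, default=-1) + 1
    # Remaining positions start a new branch and are set to zero.
    return prefix + [new_value] + [0] * (len(scheme) - shared - 1)
```

In this sketch, a genome that is 92% similar to an existing genome with LIN `[0, 0, 0, 0, 0]` shares the first three positions (thresholds 70, 80, and 90) and opens a new cluster at the fourth, giving `[0, 0, 0, 1, 0]`. Changing the thresholds in the scheme changes how many positions are shared, which is how the scheme controls cluster boundaries.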

 
