In biology, the purpose of taxonomy is to systematically classify organisms into groups of similar organisms, name those groups, and facilitate their identification. Although recognized taxonomies such as those used by NCBI and GTDB are accurate and resilient, their curated nature leaves much to be desired when 1) novel or mutated organisms are encountered, 2) an organism has more than one potential species affiliation, or 3) subspecies-level resolution is required. Life Identification Numbers (LINs) were introduced as a data-driven classification system that uses genome similarity as the exclusive criterion. An assigned LIN places an organism into a finite number of clusters, which are determined by the organisms already in the system and by multiple similarity thresholds, bundled into a scheme, that set the cluster boundaries. LINflow is designed to automate LIN assignment to genomes from their data, i.e., nucleotide content, by considering one or more measures for comparing organisms. Users can therefore tune the LIN system to their needs by customizing the scheme, combining measurements with varying hyper-parameters, changing the LIN assignment process, and eventually parallelizing and optimizing the system. LINbase.org and genomeRxiv are two customized versions of LINflow, available on the web, that are designed to work with all prokaryotes. Our efforts to build a similar system for fungi have also been successful, and further development is under way for viruses.
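To make the role of the scheme concrete, the following is a minimal sketch of how a threshold scheme could turn a similarity measurement into a LIN. It is not LINflow's actual implementation: the threshold values in `SCHEME`, the function names `shared_positions` and `assign_lin`, and the representation of a LIN as a tuple of integers are all assumptions made for illustration. The sketch assumes that the new genome has already been compared against the system and that its most similar genome (`best_match_lin`) and the corresponding similarity are known.

```python
from typing import Sequence, Tuple

LIN = Tuple[int, ...]

# Hypothetical scheme: ascending similarity thresholds (in percent),
# one threshold per LIN position. Real schemes may differ.
SCHEME: Tuple[float, ...] = (70.0, 80.0, 90.0, 95.0, 99.0)


def shared_positions(similarity: float, scheme: Sequence[float]) -> int:
    """Count the leading thresholds that the similarity meets or exceeds."""
    count = 0
    for threshold in scheme:
        if similarity < threshold:
            break
        count += 1
    return count


def assign_lin(
    similarity: float,
    best_match_lin: LIN,
    existing_lins: Sequence[LIN],
    scheme: Sequence[float] = SCHEME,
) -> LIN:
    """Derive a LIN for a new genome from its most similar genome (illustrative)."""
    if not existing_lins:
        # The first genome in the system gets zeros at every position.
        return (0,) * len(scheme)

    k = shared_positions(similarity, scheme)
    if k == len(scheme):
        # Similar at every threshold: same cluster at every level.
        return tuple(best_match_lin)

    # Keep the positions shared with the best match ...
    prefix = tuple(best_match_lin[:k])
    # ... open a new cluster at the first position that is not shared ...
    used = [lin[k] for lin in existing_lins if tuple(lin[:k]) == prefix]
    next_id = max(used, default=-1) + 1
    # ... and start every remaining position at zero.
    return prefix + (next_id,) + (0,) * (len(scheme) - k - 1)
```

Under these assumptions, a genome that is 92% similar to a genome with LIN `(0, 0, 0, 0, 0)` shares the first three positions (thresholds 70, 80, and 90) and opens a new cluster at the fourth. Changing the thresholds in the scheme changes where that split happens, which is the kind of tuning the text describes.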