IEEE BIBM 2022

Yejin Kan, Dongyeon Kim, Gangman Yi

A Novel Gap-Filling Method Based on Hybrid Read Information Analysis

De novo assembly reconstructs a complete nucleotide sequence from the reads produced by next-generation sequencing and is essential for the analysis of genetic information. Reads are assembled in several steps, but some gaps remain unresolved even after scaffolding. Gap-filling is performed as the last assembly stage to fill these unidentified regions, significantly improving overall assembly performance. We propose a gap-filling method that uses hybrid reads and resolves gaps through sequence similarity estimation and graph search. The proposed method consists of three key steps: extracting candidate sequences, estimating similarity, and filling the gaps with a graph-based search. Hybrid reads allow sequences carrying more accurate information to be extracted, and similarity estimation effectively removes candidate sequences that correspond to noise. Finally, a graph search using statistical information derives a final sequence that guarantees high coverage, resolves gaps, reduces misassemblies, and improves accuracy.
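The three steps named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, and every name, parameter, and threshold in it (`fill_gap`, `flank_similarity`, `k`, `min_sim`, `max_len`) is hypothetical. It assumes that candidates are reads sharing at least one k-mer with a gap flank, that similarity is the fraction of a read's k-mers found in the flanks, and that the graph step is a plain breadth-first search over a de Bruijn graph. That search stands in for the statistics-guided graph search the abstract mentions, and the sketch does not model hybrid (short plus long) reads at all.

```python
from collections import defaultdict, deque


def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def flank_similarity(read, flank_kmers, k):
    """Fraction of the read's k-mers that also occur in the gap flanks."""
    read_kmers = kmers(read, k)
    return len(read_kmers & flank_kmers) / len(read_kmers) if read_kmers else 0.0


def fill_gap(left_flank, right_flank, reads, k=4, min_sim=0.3, max_len=50):
    """Toy gap filler; returns the gap sequence, or None if no path is found."""
    flank_kmers = kmers(left_flank, k) | kmers(right_flank, k)

    # Step 1 (assumed): candidates are reads sharing a k-mer with a flank.
    candidates = [r for r in reads if kmers(r, k) & flank_kmers]

    # Step 2 (assumed): drop candidates whose flank similarity is too low.
    kept = [r for r in candidates
            if flank_similarity(r, flank_kmers, k) >= min_sim]

    # Step 3 (assumed): de Bruijn graph over the kept reads, where each
    # k-mer links its (k-1)-mer prefix to its (k-1)-mer suffix.
    graph = defaultdict(set)
    for read in kept:
        for km in kmers(read, k):
            graph[km[:-1]].add(km[1:])

    # Breadth-first search from the end of the left flank to the start
    # of the right flank, spelling out the sequence along the way.
    start, goal = left_flank[-(k - 1):], right_flank[:k - 1]
    queue = deque([(start, start)])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            # Strip the two anchors so only the gap sequence remains.
            return path[k - 1:-(k - 1)]
        if len(path) >= max_len:
            continue
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + nxt[-1]))
    return None


reads = [
    "GTTGCAGGCTAA",  # overlaps the left flank and enters the gap
    "GCTAAGTCCA",    # leaves the gap and overlaps the right flank
    "TGCATTTTTTTT",  # noisy read: shares one k-mer, low similarity
    "CCCCCCCCCC",    # unrelated read: never becomes a candidate
]
print(fill_gap("ACGTTGCA", "AGTCCATG", reads))
```

In this example the noisy read passes candidate extraction but is removed by the similarity filter, so the graph contains a single path between the two flanks. A read lying entirely inside a long gap would share no k-mers with the flanks and would be missed by this simple scheme, which is one reason a practical method needs more than this sketch.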