Forensic STR allele extraction using a machine learning paradigm.

Abstract

We present Fragsifier, a machine learning approach to short tandem repeat (STR) sequence detection and extraction from massively parallel sequencing data. With this approach, STRs are detected on each read by first locating the longest repeat stretches and then predicting the locus from k-mers with a machine learning sequence model. Reference flanking sequences are then aligned to determine precise STR boundaries. We show that Fragsifier produces genotypes that are concordant with profiles obtained using capillary electrophoresis (CE), and we compare its results with those of STRait Razor and the ForenSeq UAS. The data pre-processing and the training of the sequence classifier are readily scripted, allowing the analyst to experiment with different thresholds, datasets, loci of interest, and machine learning models.
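
The per-read steps described in the abstract can be illustrated with a minimal sketch. This is not Fragsifier's code: the function names, the fixed motif length, the minimum unit count, and the use of exact string matching in place of flanking sequence alignment are all assumptions made for illustration, and the locus classifier itself is omitted (only the k-mer counts it might consume are shown).

```python
from collections import Counter


def longest_repeat_stretch(read, motif_len=4, min_units=3):
    """Return (start, end, motif) for the longest tandem run of a motif, or None.

    Hypothetical helper: scans every offset for a motif of fixed length that
    repeats back to back at least `min_units` times.
    """
    best = None
    for start in range(len(read) - motif_len + 1):
        motif = read[start:start + motif_len]
        if len(set(motif)) == 1:
            continue  # ignore homopolymer runs such as "AAAA"
        end = start + motif_len
        while read[end:end + motif_len] == motif:
            end += motif_len
        units = (end - start) // motif_len
        if units >= min_units and (best is None or end - start > best[1] - best[0]):
            best = (start, end, motif)
    return best


def kmer_counts(seq, k=5):
    """Count overlapping k-mers; a possible feature input for a locus classifier."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def trim_to_flanks(read, left_flank, right_flank):
    """Return the sequence between two flanks, or None if either is missing.

    Assumption: exact matching stands in for the alignment step here.
    """
    i = read.find(left_flank)
    if i == -1:
        return None
    start = i + len(left_flank)
    j = read.find(right_flank, start)
    if j == -1:
        return None
    return read[start:j]


# Toy read: invented flanks around six AGAT repeat units.
read = "ACGTTGCA" + "AGAT" * 6 + "CCGGTTAA"
stretch = longest_repeat_stretch(read)
features = kmer_counts(read, k=5)
allele = trim_to_flanks(read, "TTGCA", "CCGG")
```

In this toy read the repeat stretch and the flank-delimited allele coincide. On real reads the flanking step matters because the reported allele can include bases outside the longest uninterrupted repeat run.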