A selection of computational methods that we have either developed or used in our research. For a more comprehensive review of methods, see Tanudisastro et al. (2024).
Short-read genotyping methods
Short-read novel locus detection methods
Long-read genotyping methods
There are often multiple ways to define a given repeat locus. For example, defining a locus as a stretch of perfectly repeating motifs tends to result in a narrower locus than strategies allowing for interruptions. In coding regions the locus boundary might be chosen to align with the reading frame.
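The difference between the two strategies can be illustrated with a minimal Python sketch. It is not the procedure used to build any particular catalog: the function names, the `max_gap` and `min_copies` parameters, and the toy sequence are all assumptions made for illustration. Coordinates are 0-based and half-open.

```python
import re


def _perfect_runs(seq, motif, min_copies=2):
    """Return (start, end) spans of uninterrupted motif copies."""
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_copies},}}")
    return [(m.start(), m.end()) for m in pattern.finditer(seq)]


def perfect_locus(seq, motif, min_copies=2):
    """Narrow definition: the longest run of perfect motif copies."""
    runs = _perfect_runs(seq, motif, min_copies)
    if not runs:
        return None
    return max(runs, key=lambda r: r[1] - r[0])


def broad_locus(seq, motif, max_gap=3, min_copies=2):
    """Broad definition: merge perfect runs separated by at most
    max_gap bases (a hypothetical tolerance for interruptions)."""
    runs = _perfect_runs(seq, motif, min_copies)
    if not runs:
        return None
    merged = [runs[0]]
    for start, end in runs[1:]:
        if start - merged[-1][1] <= max_gap:
            # Close enough to the previous run: extend it over the interruption.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return max(merged, key=lambda r: r[1] - r[0])


# Toy sequence: five CAG copies, a single CAA interruption, four more CAG copies.
seq = "TTT" + "CAG" * 5 + "CAA" + "CAG" * 4 + "GGG"
print(perfect_locus(seq, "CAG"))  # narrow locus stops at the interruption
print(broad_locus(seq, "CAG"))    # broad locus spans the interruption
```

In this toy case the perfect-repeat definition ends at the CAA interruption, while the interruption-tolerant definition extends across it to include the second run of CAG copies.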
Several of these definitions can be "correct", but the choice should be made with careful consideration of the downstream uses of the data. In particular, genotyping accuracy can be affected by the choice of locus definition.
The loci in STRchive were defined broadly, allowing for interruptions and taking into account the biological context and clinical utility of each locus. This strategy increases the chance that the locus will overlap with a relevant variant call when STRchive is used to annotate a VCF file. It is also the preferred approach to defining loci for improved genotyping accuracy with TRGT. In contrast, ExpansionHunter tends to perform better with narrower loci that exclude repeat interruptions. For this reason, repeat definitions used in gnomAD tend to be narrower than those used in STRchive.
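Why a broader locus is more likely to overlap a variant call can be shown with a simple interval test. The sketch below is an assumption-laden illustration, not the annotation logic of any specific tool: the locus names, coordinates, and the `overlapping_loci` helper are hypothetical, and all intervals are assumed to have already been converted to 0-based, half-open coordinates.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def overlapping_loci(call, loci):
    """Return the names of loci on the same chromosome that overlap the call.

    call: (chrom, start, end)
    loci: dict mapping a locus name to (chrom, start, end)
    """
    chrom, start, end = call
    return [
        name
        for name, (l_chrom, l_start, l_end) in loci.items()
        if l_chrom == chrom and overlaps(start, end, l_start, l_end)
    ]


# Hypothetical definitions of the same repeat: one stops at an
# interruption, the other extends across it.
loci = {
    "narrow_definition": ("chr1", 103, 118),
    "broad_definition": ("chr1", 103, 133),
}

# Hypothetical variant call that falls after the interruption.
call = ("chr1", 121, 133)

print(overlapping_loci(call, loci))
```

Here the call lies entirely beyond the end of the narrow definition, so only the broad definition would annotate it.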