Contact me for internships, Bachelor, Master or PhD thesis (co-)supervision
Currently a post-doc in Bioinformatics working with Ivo Hofacker in the research group TBI, University of Vienna.
Our main goal is to predict RNA structure from sequence at different levels (secondary, pseudoknotted, tertiary) while integrating different sources of knowledge, such as probing data.
Meanwhile, I am interested in RNA Design problems, from the theoretical side to the practical.
See “RNA Design with Infrared” for more details.
Applications in biotechnology and biomedical research call for effective strategies to design novel RNAs with very specific properties. Such advanced design tasks require support by computational design tools but at the same time put high demands on their flexibility and expressivity to model the application-specific requirements. To address such demands, we present the computational framework Infrared. It supports developing advanced customized design tools, which generate RNA sequences with specific properties, often in a few lines of Python code. This text guides the reader in tutorial format through the development of complex design applications. Thanks to the declarative, compositional approach of Infrared, we can describe this development as a step-by-step extension of an elementary design task. Thus, we start with generating sequences that are compatible with a single RNA structure and go all the way to RNA design targeting complex positive and negative design objectives with respect to single or even multiple target structures. Finally, we present a “real-world” application of computational RNA design of a biotechnological device. We use Infrared to generate design candidates of an artificial AND-riboswitch, which could activate gene expression (only) in the simultaneous presence of two different metabolites.
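As a rough illustration of the elementary task the chapter starts from (generating sequences compatible with a single RNA structure), here is a minimal standard-library sketch. It deliberately does not use the Infrared API; the function names `parse_dot_bracket`, `sample_compatible` and `is_compatible` are hypothetical, and the set of allowed pairs (Watson-Crick plus GU wobble) is an assumption.

```python
import random

# Assumed set of allowed base pairs: Watson-Crick plus GU wobble.
PAIRS = ["AU", "UA", "CG", "GC", "GU", "UG"]


def parse_dot_bracket(structure):
    """Return the base pairs (i, j), i < j, of a dot-bracket string."""
    stack, pairs = [], []
    for j, char in enumerate(structure):
        if char == "(":
            stack.append(j)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), j))
        elif char != ".":
            raise ValueError(f"unexpected character {char!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def sample_compatible(structure, rng=random):
    """Draw a sequence uniformly among those compatible with the structure."""
    # Unpaired positions are free; paired positions are overwritten below.
    seq = [rng.choice("ACGU") for _ in structure]
    for i, j in parse_dot_bracket(structure):
        seq[i], seq[j] = rng.choice(PAIRS)
    return "".join(seq)


def is_compatible(seq, structure):
    """Check that every base pair of the structure is an allowed pair."""
    return len(seq) == len(structure) and all(
        seq[i] + seq[j] in PAIRS for i, j in parse_dot_bracket(structure)
    )
```

Calling `sample_compatible("((..))..(...)")` returns one random compatible sequence. In a declarative framework, the same requirement would be stated as one complementarity constraint per base pair, and further objectives would be layered on top of it.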
@incollection{yao:hal-03711828,author={Yao, Hua-Ting and Ponty, Yann and Will, Sebastian},title={Developing Complex RNA Design Applications in the Infrared Framework},booktitle={RNA Folding: Methods and Protocols},publisher={Springer US},year={2024},editor={Lorenz, Ronny},pages={285--313},address={New York, NY},isbn={978-1-0716-3519-3},doi={10.1007/978-1-0716-3519-3_12},hal_id={hal-03711828},hal_version={v1},url={https://hal.archives-ouvertes.fr/hal-03711828},}
Journal
Infrared: a declarative tree decomposition-powered framework for bioinformatics
__Motivation:__ Many bioinformatics problems can be approached as optimization or controlled sampling tasks, and solved exactly and efficiently using Dynamic Programming (DP). However, such exact methods are typically tailored towards specific settings, complex to develop, and hard to implement and adapt to problem variations.
__Methods:__ We introduce the Infrared framework to overcome such hindrances for a large class of problems. Its underlying paradigm is tailored toward problems that can be declaratively formalized as sparse feature networks, a generalization of constraint networks. Classic Boolean constraints specify a search space, consisting of putative solutions whose evaluation is performed through a combination of features. Problems are then solved using generic cluster tree elimination algorithms over a tree decomposition of the feature network. Their overall complexities are linear in the number of variables, and only exponential in the treewidth of the feature network. For sparse feature networks, associated with low to moderate treewidths, these algorithms make it possible to find optimal solutions, or generate controlled samples, with practical empirical efficiency.
__Results:__ Implementing these methods, the Infrared software allows Python programmers to rapidly develop exact optimization and sampling applications based on efficient tree decomposition-based processing. Instead of directly coding specialized algorithms, problems are declaratively modeled as sets of variables over finite domains, whose dependencies are captured by constraints and functions. Such models are then automatically solved by generic DP algorithms. To illustrate the applicability of Infrared in bioinformatics and guide new users, we model and discuss variants of bioinformatics applications. We provide reimplementations and extensions of methods for RNA design, RNA sequence-structure alignment, parsimony-driven inference of ancestral traits in phylogenetic trees/networks, and design of coding sequences. Moreover, we demonstrate multidimensional Boltzmann sampling. These applications of the framework, together with our novel results, underline the practical relevance of Infrared. Remarkably, the achieved complexities are typically equivalent to the ones of specialized algorithms and implementations.
__Availability:__ Infrared is available at [https://amibio.gitlabpages.inria.fr/Infrared/](https://amibio.gitlabpages.inria.fr/Infrared/) with extensive documentation, including various usage examples and API reference; it can be installed using Conda or from source.
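To make the complexity statement in the Methods paragraph more concrete, the following toy sketch counts the solutions of a path-shaped constraint network by eliminating one variable at a time. It is not Infrared code: `count_path_model` and `count_brute_force` are hypothetical names, and the path-shaped model (treewidth 1) is an assumption chosen to keep the example short.

```python
from itertools import product


def count_path_model(n, domain_size, allowed):
    """Count assignments of x_0..x_{n-1} with allowed(x_i, x_{i+1}) for all i.

    Variables are eliminated from right to left. Each step costs
    O(domain_size ** 2), so the total is linear in the number of variables.
    """
    if n == 0:
        return 1
    # message[v]: number of valid completions to the right of a variable
    # that currently holds value v (one completion when nothing is to the right).
    message = [1] * domain_size
    for _ in range(n - 1):
        message = [
            sum(message[w] for w in range(domain_size) if allowed(v, w))
            for v in range(domain_size)
        ]
    return sum(message)


def count_brute_force(n, domain_size, allowed):
    """Reference count by full enumeration, exponential in n."""
    return sum(
        all(allowed(a, b) for a, b in zip(xs, xs[1:]))
        for xs in product(range(domain_size), repeat=n)
    )
```

For example, `count_path_model(5, 4, lambda a, b: a != b)` counts the sequences of length 5 over four letters in which no two neighbours are equal. On general sparse networks, the same idea runs over the bags of a tree decomposition, which is where the exponential dependence on treewidth comes from.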
@article{yao:hal-04211173,author={Yao, Hua-Ting and Marchand, Bertrand and Berkemer, Sarah J. and Ponty, Yann and Will, Sebastian},title={{Infrared: a declarative tree decomposition-powered framework for bioinformatics}},journal={{Algorithms for Molecular Biology}},year={2024},doi={10.1186/s13015-024-00258-2},hal_id={hal-04211173},hal_version={v2},keywords={Bioinformatics ; Fixed-parameter tractable algorithms ; Tree decomposition ; Boltzmann sampling ; Network phylogeny ; RNA sequence design ; RNA alignment ; Pseudoknots},publisher={{BioMed Central}},url={https://inria.hal.science/hal-04211173},}
The design of RNA sequences with desired structural properties presents a challenging computational problem with promising applications in biotechnology and biomedicine. Most regulatory RNAs function by forming RNA-RNA interactions, e.g., in order to regulate mRNA expression. It is therefore natural to consider problems where a sequence is designed to form a desired RNA-RNA interaction and switch between structures upon binding. This contribution demonstrates the use of the Infrared framework to design interacting sequences. Specifically, we consider the regulation of the rpoS mRNA by the sRNA DsrA and design artificial 5’UTRs that place a downstream protein coding gene under control of DsrA. The design process is explained step-by-step in a Jupyter notebook, accompanied by Python code. The text discusses setting up design constraints for sampling sequences in Infrared, computing quality measures, constructing a suitable cost function, as well as the optimization procedure. We show that not only thermodynamic, but also kinetic folding features can be relevant. Kinetics of interaction formation can be estimated efficiently using the RRIkinDP tool, and the chapter explains how to include kinetic folding features from RRIkinDP directly in the cost function. The protocol implemented in our Jupyter notebook can easily be extended to consider additional requirements or adapted to novel design scenarios.
@incollection{waldl:hal-04517643,author={Waldl, Maria and Yao, Hua-Ting and Hofacker, Ivo},title={{Sequence design for RNA-RNA interactions}},booktitle={RNA Design: Methods and Protocols},publisher={Springer},year={2024},pages={1--16},month=mar,doi={10.1007/978-1-0716-4079-1_1},hal_id={hal-04517643},hal_version={v1},keywords={RNA sequence design ; RNA-RNA interactions ; RNA structure ; RNA folding kinetics},url={https://hal.science/hal-04517643},}
Conference
Old dog, new tricks: Exact seeding strategy improves RNA design performances
In RECOMB 2025 - 29th International Conference of Research in Computational Molecular Biology, Mar 2025
The Inverse Folding problem involves identifying RNA sequences that adopt a target structure with respect to free-energy minimization, i.e., that prefer it to all alternative structures. The problem has historically been regarded as challenging, largely because an extended version using the base pair maximization energy model has been proven NP-complete. In contrast, it has recently been shown that a large subset called m-separable structures, notably including those comprising helices of length 3+, can be solved in linear time within the same energy model. This permits not only the identification of a single solution, but also the characterization of a language of solutions.
In this work, we seek to describe the “hardness” of Inverse Folding, bridging (at least heuristically) the gap between a simplified energy model and a more realistic Turner energy model. We used LinearBPDesign to generate seed sequences for RNAinverse, thereby improving the design process in a Turner energy model. To this end, we extended LinearBPDesign to accommodate biseparability and to handle non- or high modulo separable structures by minimalist addition of base pairs.
Our study suggests that seeds generated by LinearBPDesign capture long-range interactions, thereby improving the performance of RNAinverse compared to seeds focusing on refining the energy model itself. Most surprisingly, a significant number of LinearBPDesign seeds uniquely fold into the target structure in the Turner model, especially when helices have length at least 2. This observation suggests that the “hardness” of design may arise from the intrinsic properties of the structures themselves.
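For readers unfamiliar with the base pair maximization model mentioned in the abstract above, the following sketch checks whether a sequence is a design for a target in that simplified model: the target must be the unique structure with the maximum number of base pairs. This is a plain Nussinov-style dynamic program, not LinearBPDesign or RNAinverse. The function names, the allowed pairs (Watson-Crick plus GU) and the default `min_loop=0` (no minimum hairpin size) are assumptions.

```python
from functools import lru_cache

# Assumed set of allowed base pairs: Watson-Crick plus GU wobble.
PAIRS = {"AU", "UA", "CG", "GC", "GU", "UG"}


def parse_dot_bracket(structure):
    """Return the base pairs (i, j), i < j, of a balanced dot-bracket string."""
    stack, pairs = [], []
    for j, char in enumerate(structure):
        if char == "(":
            stack.append(j)
        elif char == ")":
            pairs.append((stack.pop(), j))
    return pairs


def max_pairs_and_count(seq, min_loop=0):
    """Return (maximum number of base pairs, number of structures reaching it)."""

    @lru_cache(maxsize=None)
    def solve(i, j):
        # The empty interval has exactly one structure, with zero pairs.
        if i > j:
            return 0, 1
        # Case 1: position i is unpaired.
        best, count = solve(i + 1, j)
        # Case 2: position i pairs with k; the two cases never overlap,
        # so every structure is counted exactly once.
        for k in range(i + 1 + min_loop, j + 1):
            if seq[i] + seq[k] in PAIRS:
                in_pairs, in_count = solve(i + 1, k - 1)
                out_pairs, out_count = solve(k + 1, j)
                total = 1 + in_pairs + out_pairs
                if total > best:
                    best, count = total, in_count * out_count
                elif total == best:
                    count += in_count * out_count
        return best, count

    return solve(0, len(seq) - 1)


def is_bp_design(seq, structure, min_loop=0):
    """True if the target is the unique pair-maximizing structure of seq."""
    pairs = parse_dot_bracket(structure)
    if len(seq) != len(structure):
        return False
    if any(seq[i] + seq[j] not in PAIRS for i, j in pairs):
        return False
    best, count = max_pairs_and_count(seq, min_loop)
    return len(pairs) == best and count == 1
```

For instance, `GCGC` admits two structures with two base pairs each, `()()` and `(())`, so it is a design for neither of them. The recursion is cubic in the sequence length and limited by Python's recursion depth, so the sketch is only meant for short examples.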
@inproceedings{boury:hal-04756160,author={Boury, Th{\'e}o and Sidl, Leonhard and Hofacker, Ivo L. and Ponty, Yann and Yao, Hua-Ting},title={{Old dog, new tricks: Exact seeding strategy improves RNA design performances}},booktitle={{RECOMB 2025 - 29th International Conference of Research in Computational Molecular Biology}},year={2025},address={Seoul, South Korea},doi={10.1007/978-3-031-90252-9_9},hal_id={hal-04756160},hal_version={v2},keywords={RNA design ; RNA secondary structure ; Dynamic programming ; Sampling},url={https://hal.science/hal-04756160},}