Publications | Hua-Ting Yao

2026

Preprint
Rational design of mechanically active RNAs: de novo engineering of functional exoribonuclease-resistant RNAs

Jule Walter, Leonhard Sidl, Katrin Gutenbrunner, Denis Skibinski, and 5 more authors

bioRxiv, 2026

Mechanically active RNAs represent an emerging class of biomolecules whose function derives from resisting molecular forces. Among them, exoribonuclease-resistant RNAs (xrRNAs) achieve this by folding into a ring-like topology that physically blocks 5’ → 3’ degradation. However, despite years of structural insight, the rational design of such mechanically functional RNA devices has remained elusive. Here, we describe a mechanics-aware RNA design approach that enables de novo engineering of functional xrRNAs. We first identify structural determinants of force resistance by perturbing pseudoknot architecture in a model xrRNA and quantifying resulting efficiencies in the stalling of exoribonuclease XRN1. We then implement these rules in a design framework that integrates explicit topological constraints with molecular dynamics-guided optimization. The resulting synthetic xrRNAs reproduce the ring-like architecture and stall exoribonuclease XRN1 with wild-type-like efficiency. Our top-performing constructs exhibit minimal sequence similarity to known xrRNAs and evade detection by covariance models, yet remain fully functional in vitro. Together, our results show that mechanical function can be rationally designed independent of evolutionary ancestry, laying the groundwork for the design of RNA elements that modulate decay and fine-tune the mechanical stability of engineered transcripts.
@article{walter2026rational, title = {Rational design of mechanically active RNAs: de novo engineering of functional exoribonuclease-resistant RNAs}, author = {Walter, Jule and Sidl, Leonhard and Gutenbrunner, Katrin and Skibinski, Denis and Kolberg, Tim and Hofacker, Ivo L and Yao, Hua-Ting and M{\"o}rl, Mario and Wolfinger, Michael T}, journal = {bioRxiv}, pages = {2026--01}, year = {2026}, publisher = {Cold Spring Harbor Laboratory}, doi = {10.64898/2026.01.08.698366}, }

2025

Journal
Computational complexity, algorithmic scope, and evolution

Leonhard Sidl^†, Maximilian Faissner^†, Manuel Uhlir^†, Cristian A Velandia-Huerto, and 4 more authors

Journal of Physics: Complexity, Mar 2025

Biological systems are widely regarded as performing computations. It is much less clear, however, what exactly is computed and how biological computation fits within the framework of standard computer science. Here we explore the idea that evolution confines biological computation to subsets of instances that can be solved efficiently with algorithms that are ‘hardcoded’ in the system itself. We use RNA secondary structure prediction as a simple surrogate for developmental programs to demonstrate that the salient features of the genotype–phenotype map remain intact even if ‘simpler’ algorithms are employed that correctly compute the structures only for small subsets of instances, although quantitative differences can be observed depending on the choice of alternative algorithm.
@article{sidl:hal-04715891, author = {Sidl, Leonhard and Faissner, Maximilian and Uhlir, Manuel and Velandia-Huerto, Cristian A and Waldl, Maria and Yao, Hua-Ting and Hofacker, Ivo L and Stadler, Peter F}, title = {{Computational complexity, algorithmic scope, and evolution}}, journal = {Journal of Physics: Complexity}, year = {2025}, volume = {6}, number = {1}, pages = {015013}, month = mar, doi = {10.1088/2632-072X/adb928}, hal_id = {hal-04715891}, hal_version = {v2}, keywords = {Computational Complexity ; Biological Computation ; RNA Folding ; Developmental Program ; Genotype-Phenotype Map}, publisher = {IOP Publishing}, url = {https://hal.science/hal-04715891}, }
Conference
Old dog, new tricks: Exact seeding strategy improves RNA design performances

Théo Boury, Leonhard Sidl, Ivo L. Hofacker, Yann Ponty, and 1 more author

In RECOMB 2025 - 29th International Conference of Research in Computational Molecular Biology, Mar 2025

The Inverse Folding problem involves identifying RNA sequences that adopt a target structure with respect to free-energy minimization, i.e. preferentially to all alternative structures. The problem has historically been regarded as challenging, largely due to the proven NP-completeness of an extended version where the base pair maximization energy model is used. In contrast, it has recently been shown that a large subset called m-separable structures, notably including those comprising helices of length 3+, can be solved in linear time within the same energy model. This permits not only the identification of a single solution, but also the characterization of a language of solutions. In this work, we seek to describe the “hardness” of Inverse Folding, bridging (at least heuristically) the gap between a simplified energy model and a more realistic Turner energy model. We used LinearBPDesign to generate seed sequences for RNAinverse, thereby improving the design process in a Turner energy model. To this end, we extended LinearBPDesign to accommodate biseparability and to handle non- or high modulo separable structures by minimalist addition of base pairs. Our study suggests that seeds generated by LinearBPDesign capture long-range interactions, thereby improving the performance of RNAinverse compared to seeds focusing on refining the energy model itself. Most surprisingly, a significant number of LinearBPDesign seeds uniquely fold into the target structure in the Turner model, especially when helices are at least of length 2. This observation suggests that the “hardness” of design may arise from the intrinsic properties of the structures themselves.
@inproceedings{boury:hal-04756160, author = {Boury, Th{\'e}o and Sidl, Leonhard and Hofacker, Ivo L. and Ponty, Yann and Yao, Hua-Ting}, title = {{Old dog, new tricks: Exact seeding strategy improves RNA design performances}}, booktitle = {{RECOMB 2025 - 29th International Conference of Research in Computational Molecular Biology}}, year = {2025}, address = {Seoul, South Korea}, doi = {10.1007/978-3-031-90252-9_9}, hal_id = {hal-04756160}, hal_version = {v2}, keywords = {RNA design ; RNA secondary structure ; Dynamic programming ; Sampling}, url = {https://hal.science/hal-04756160}, }
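The notion of a sequence "uniquely folding" into a target under the base pair maximization model, as mentioned in the abstract above, can be illustrated with a small Python sketch. It is not LinearBPDesign or RNAinverse: the function name `max_pairs_and_count`, the allowed pair set (Watson-Crick plus G-U wobble), and the minimum hairpin loop size of 3 are assumptions chosen for this example. The sketch counts, by dynamic programming, how many secondary structures reach the maximum number of base pairs; a count of 1 means the optimum is unique.

```python
from typing import Tuple

# Assumed pairing rules for this illustration: Watson-Crick and G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs_and_count(seq: str, min_loop: int = 3) -> Tuple[int, int]:
    """Return (maximum number of base pairs, number of structures reaching it).

    Uses an unambiguous decomposition: the last position j of an interval is
    either unpaired, or paired with some k, which splits the interval into an
    independent left part and an enclosed part.
    """
    n = len(seq)
    # table[i][j] = (best pair count, number of optimal structures) on seq[i..j]
    table = [[(0, 1)] * n for _ in range(n)]

    def get(i: int, j: int) -> Tuple[int, int]:
        # An empty interval has exactly one (empty) structure.
        return (0, 1) if j < i else table[i][j]

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best, count = get(i, j - 1)  # case: j is unpaired
            for k in range(i, j - min_loop):  # case: j pairs with k
                if (seq[k], seq[j]) not in PAIRS:
                    continue
                left_best, left_count = get(i, k - 1)
                in_best, in_count = get(k + 1, j - 1)
                cand = left_best + in_best + 1
                ways = left_count * in_count
                if cand > best:
                    best, count = cand, ways
                elif cand == best:
                    count += ways
            table[i][j] = (best, count)
    return get(0, n - 1)


if __name__ == "__main__":
    print(max_pairs_and_count("GGGAAAACCC"))
    print(max_pairs_and_count("GAAAACC"))
```

Under these assumptions, a candidate seed would count as a design for a target only if the target has the maximum number of pairs and the count of optimal structures is 1; the Turner model discussed in the abstract requires a full thermodynamic folding tool instead.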
Preprint
Undesignable motifs in structural RNAs and combinatorial consequences

Hua-Ting Yao, Cedric Chauve, Mireille Regnier, and Yann Ponty

May 2025

working paper or preprint

RNA design aims at constructing RiboNucleic Acid (RNA) sequences that perform a predefined biological function, usually modeled by multiple constraints on the sequence and structure level. In its most popular setting, called the inverse folding problem, designed RNAs should adopt a predefined target secondary structure, preferentially to any alternative structure. It was previously observed that some secondary structures are undesignable, i.e. no RNA sequence can fold uniquely into the target structure while satisfying some criterion measuring how preferential this folding is compared to alternative conformations. We show that the proportion of designable secondary structures decreases exponentially with the size of the target secondary structure, for various popular combinations of energy models and design objectives. This exponential decay is, at least in part, due to the existence of undesignable motifs, which can be generically constructed and jointly analyzed to yield asymptotic upper bounds on the number of designable structures. Finally, we define a lower bound on the minimal ensemble defect of a secondary structure. We show that, across uniformly distributed secondary structures, this lower bound admits a normal limiting distribution whose two parameters, the expected value and the variance, both grow linearly with the size of the secondary structure.
@unpublished{yao:hal-05067902, author = {Yao, Hua-Ting and Chauve, Cedric and Regnier, Mireille and Ponty, Yann}, title = {{Undesignable motifs in structural RNAs and combinatorial consequences}}, note = {working paper or preprint}, month = may, year = {2025}, hal_id = {hal-05067902}, hal_version = {v1}, keywords = {RNA design ; Secondary structure ; RNA Evolution ; Enumerative Combinatorics}, url = {https://hal.science/hal-05067902}, }
Conference
Integrating High-Throughput RNA-RNA Interaction Data into RNA Secondary Structure Prediction

Denis Skibinski^†, Thomas Spicher^†, Leonhard Sidl^†, Paulína Holotová, and 7 more authors

In International Symposium on Bioinformatics Research and Applications (ISBRA) 2025, Apr 2025

In recent years, several methods for detecting RNA-RNA interactions have become available that use a combination of crosslinking, ligation, and sequencing of the resulting chimeric reads. In principle, such data also convey information on intramolecular helices. They are, however, not accurate enough to identify base pairs directly. Instead, only regions of direct contacts can be inferred. Here, we show that such data can be incorporated as pseudo-energies into RNA secondary structure prediction algorithms by assigning a bonus term to all potential pairs between crosslinked intervals. Using simulated data, we show that given sufficient coverage, such data can push the accuracy of the predicted structure to a base pair-wise MCC of above 90%. Moreover, we observe that the beneficial effect of such interval-wise pseudo-energies is quite robust w.r.t. the length of the interval and the value of the bonus term, but depends strongly on the fraction of the sequence that is covered by significant interaction data.
@inproceedings{skibinski:hal-05032055, author = {Skibinski, Denis and Spicher, Thomas and Sidl, Leonhard and Holotov{\'a}, Paul{\'i}na and Pan, Yingjie and Faissner, Maximilian and Velandia-Huerto, Cristian A and Lorenz, Ronny and Waldl, Maria and Yao, Hua-Ting and Stadler, Peter F}, title = {{Integrating High-Throughput RNA-RNA Interaction Data into RNA Secondary Structure Prediction}}, booktitle = {International Symposium on Bioinformatics Research and Applications (ISBRA) 2025}, year = {2025}, month = apr, doi = {10.1007/978-981-95-0695-8_13}, hal_id = {hal-05032055}, hal_version = {v1}, keywords = {RNA folding algorithms ; Pseudo-energy ; Dynamic programming ; RNA crosslinking}, url = {https://hal.science/hal-05032055}, }
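As a rough illustration of the interval-wise pseudo-energy idea in the abstract above, the following Python sketch assigns a fixed bonus to every potential base pair between two crosslinked intervals. It is not the authors' implementation: the function name `interval_bonuses`, the default bonus of -1.0 (assumed to be in kcal/mol), the minimum hairpin loop size of 3, and the choice to assign the bonus once even when several contacts cover the same pair are all assumptions made for this example.

```python
from typing import Dict, Iterable, Tuple

Interval = Tuple[int, int]  # inclusive, 0-based (start, end)

# Assumed pairing rules: Watson-Crick and G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def interval_bonuses(
    seq: str,
    contacts: Iterable[Tuple[Interval, Interval]],
    bonus: float = -1.0,  # hypothetical value, assumed kcal/mol
    min_loop: int = 3,    # assumed minimum number of unpaired bases in a hairpin
) -> Dict[Tuple[int, int], float]:
    """Map each potential base pair (i, j), i < j, between crosslinked
    intervals to a pseudo-energy bonus."""
    energies: Dict[Tuple[int, int], float] = {}
    for (a_start, a_end), (b_start, b_end) in contacts:
        for i in range(a_start, a_end + 1):
            for j in range(b_start, b_end + 1):
                lo, hi = (i, j) if i < j else (j, i)
                if hi - lo <= min_loop:
                    continue  # too close to close a hairpin loop
                if (seq[lo], seq[hi]) not in PAIRS:
                    continue  # not a potential base pair
                energies[(lo, hi)] = bonus  # assigned once per pair
    return energies


if __name__ == "__main__":
    bonuses = interval_bonuses("GGGAAAACCC", [((0, 2), (7, 9))])
    for pair in sorted(bonuses):
        print(pair, bonuses[pair])
```

The resulting dictionary is only a data structure; passing it to a folding program as soft constraints depends on that program's own interface and is not shown here.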

2024

Book Chapter
Developing Complex RNA Design Applications in the Infrared Framework

Hua-Ting Yao, Yann Ponty, and Sebastian Will

In RNA Folding: Methods and Protocols, Apr 2024

Applications in biotechnology and bio-medical research call for effective strategies to design novel RNAs with very specific properties. Such advanced design tasks require support by computational design tools but at the same time put high demands on their flexibility and expressivity to model the application-specific requirements. To address such demands, we present the computational framework Infrared. It supports developing advanced customized design tools, which generate RNA sequences with specific properties, often in a few lines of Python code. This text guides the reader in tutorial format through the development of complex design applications. Thanks to the declarative, compositional approach of Infrared, we can describe this development as a step-by-step extension of an elementary design task. Thus, we start with generating sequences that are compatible with a single RNA structure and go all the way to RNA design targeting complex positive and negative design objectives with respect to single or even multiple target structures. Finally, we present a ’real-world’ application of computational RNA design of a biotechnological device. We use Infrared to generate design candidates of an artificial AND-riboswitch, which could activate gene expression (only) in the simultaneous presence of two different metabolites.
@incollection{yao:hal-03711828, author = {Yao, Hua-Ting and Ponty, Yann and Will, Sebastian}, title = {Developing Complex RNA Design Applications in the Infrared Framework}, booktitle = {RNA Folding: Methods and Protocols}, publisher = {Springer US}, year = {2024}, editor = {Lorenz, Ronny}, pages = {285--313}, address = {New York, NY}, isbn = {978-1-0716-3519-3}, doi = {10.1007/978-1-0716-3519-3_12}, hal_id = {hal-03711828}, hal_version = {v1}, url = {https://hal.archives-ouvertes.fr/hal-03711828}, }
Journal
Infrared: a declarative tree decomposition-powered framework for bioinformatics

Hua-Ting Yao, Bertrand Marchand, Sarah J. Berkemer, Yann Ponty, and 1 more author

Algorithms for Molecular Biology, Apr 2024

__Motivation:__ Many bioinformatics problems can be approached as optimization or controlled sampling tasks, and solved exactly and efficiently using Dynamic Programming (DP). However, such exact methods are typically tailored towards specific settings, complex to develop, and hard to implement and adapt to problem variations. __Methods:__ We introduce the Infrared framework to overcome such hindrances for a large class of problems. Its underlying paradigm is tailored toward problems that can be declaratively formalized as sparse feature networks, a generalization of constraint networks. Classic Boolean constraints specify a search space, consisting of putative solutions whose evaluation is performed through a combination of features. Problems are then solved using generic cluster tree elimination algorithms over a tree decomposition of the feature network. Their overall complexities are linear in the number of variables, and only exponential in the treewidth of the feature network. For sparse feature networks, associated with low to moderate treewidths, these algorithms make it possible to find optimal solutions, or generate controlled samples, with practical empirical efficiency. __Results:__ Implementing these methods, the Infrared software allows Python programmers to rapidly develop exact optimization and sampling applications based on efficient tree decomposition-based processing. Instead of directly coding specialized algorithms, problems are declaratively modeled as sets of variables over finite domains, whose dependencies are captured by constraints and functions. Such models are then automatically solved by generic DP algorithms. To illustrate the applicability of Infrared in bioinformatics and guide new users, we model and discuss variants of bioinformatics applications. 
We provide reimplementations and extensions of methods for RNA design, RNA sequence-structure alignment, parsimony-driven inference of ancestral traits in phylogenetic trees/networks, and design of coding sequences. Moreover, we demonstrate multidimensional Boltzmann sampling. These applications of the framework—together with our novel results—underline the practical relevance of Infrared. Remarkably, the achieved complexities are typically equivalent to the ones of specialized algorithms and implementations. __Availability:__ Infrared is available at [https://amibio.gitlabpages.inria.fr/Infrared/](https://amibio.gitlabpages.inria.fr/Infrared/) with extensive documentation, including various usage examples and API reference; it can be installed using Conda or from source.
@article{yao:hal-04211173, author = {Yao, Hua-Ting and Marchand, Bertrand and Berkemer, Sarah J. and Ponty, Yann and Will, Sebastian}, title = {{Infrared: a declarative tree decomposition-powered framework for bioinformatics}}, journal = {{Algorithms for Molecular Biology}}, year = {2024}, doi = {10.1186/s13015-024-00258-2}, hal_id = {hal-04211173}, hal_version = {v2}, keywords = {Bioinformatics ; Fixed-parameter tractable algorithms ; Tree decomposition ; Boltzmann sampling ; Network phylogeny ; RNA sequence design ; RNA alignment ; Pseudoknots}, publisher = {{BioMed Central}}, url = {https://inria.hal.science/hal-04211173}, }
Book Chapter
Sequence design for RNA-RNA interactions

Maria Waldl^†, Hua-Ting Yao^†, and Ivo Hofacker

In RNA Design: Methods and Protocols, Mar 2024

The design of RNA sequences with desired structural properties presents a challenging computational problem with promising applications in biotechnology and biomedicine. Most regulatory RNAs function by forming RNA-RNA interactions, e.g., in order to regulate mRNA expression. It is therefore natural to consider problems where a sequence is designed to form a desired RNA-RNA interaction and switch between structures upon binding. This contribution demonstrates the use of the Infrared framework to design interacting sequences. Specifically, we consider the regulation of the rpoS mRNA by the sRNA DsrA and design artificial 5’UTRs that place a downstream protein coding gene under control of DsrA. The design process is explained step-by-step in a Jupyter notebook, accompanied by Python code. The text discusses setting up design constraints for sampling sequences in Infrared, computing quality measures, constructing a suitable cost function, as well as the optimization procedure. We show that not only thermodynamic, but also kinetic folding features can be relevant. Kinetics of interaction formation can be estimated efficiently using the RRIkinDP tool, and the chapter explains how to include kinetic folding features from RRIkinDP directly in the cost function. The protocol implemented in our Jupyter notebook can easily be extended to consider additional requirements or adapted to novel design scenarios.
@incollection{waldl:hal-04517643, author = {Waldl, Maria and Yao, Hua-Ting and Hofacker, Ivo}, title = {{Sequence design for RNA-RNA interactions}}, booktitle = {RNA Design: Methods and Protocols}, publisher = {Springer}, year = {2024}, pages = {1--16}, month = mar, doi = {10.1007/978-1-0716-4079-1_1}, hal_id = {hal-04517643}, hal_version = {v1}, keywords = {RNA sequence design ; RNA-RNA interactions ; RNA structure ; RNA folding kinetics}, url = {https://hal.science/hal-04517643}, }
Journal
Phylogenetic and Chemical Probing Information as Soft Constraints in RNA Secondary Structure Prediction

Sarah Löhneysen, Thomas Spicher, Yuliia Varenyk, Hua-Ting Yao, and 3 more authors

Journal of Computational Biology, Jun 2024

Extrinsic, experimental information can be incorporated into thermodynamics-based RNA folding algorithms in the form of pseudo-energies. Evolutionary conservation of RNA secondary structure elements is detectable in alignments of phylogenetically related sequences and provides evidence for the presence of certain base pairs that can also be converted into pseudo-energy contributions. We show that the centroid base pairs computed from a consensus folding model such as RNAalifold result in a substantial improvement of the prediction accuracy for single sequences. Evidence for specific base pairs turns out to be more informative than a position-wise profile for the conservation of the pairing status. A comparison with chemical probing data, furthermore, strongly suggests that phylogenetic base pairing data are more informative than position-specific data on (un)pairedness as obtained from chemical probing experiments. In this context we demonstrate, in addition, that the conversion of signal from probing data into pseudo-energies is possible using thermodynamic structure predictions as a reference instead of known RNA structures.
@article{Loehneysen2024, author = {von Löhneysen, Sarah and Spicher, Thomas and Varenyk, Yuliia and Yao, Hua-Ting and Lorenz, Ronny and Hofacker, Ivo and Stadler, Peter F.}, journal = {Journal of Computational Biology}, title = {Phylogenetic and Chemical Probing Information as Soft Constraints in RNA Secondary Structure Prediction}, year = {2024}, issn = {1557-8666}, month = jun, number = {6}, pages = {549--563}, volume = {31}, doi = {10.1089/cmb.2024.0519}, publisher = {Mary Ann Liebert Inc}, }

2023

Journal
Mono-valent salt corrections for RNA secondary structures in the ViennaRNA package

Hua-Ting Yao, Ronny Lorenz, Ivo L Hofacker, and Peter F Stadler

Algorithms for Molecular Biology, Jul 2023

__Background__: RNA features a highly negatively charged phosphate backbone that attracts a cloud of counter-ions that reduce the electrostatic repulsion in a concentration-dependent manner. Ion concentrations thus have a large influence on folding and stability of RNA structures. Despite their well-documented effects, salt effects are not handled by currently available secondary structure prediction algorithms. Combining Debye-Hückel potentials for line charges and Manning’s counter-ion condensation theory, Einert et al. [Biophys. J. 100: 2745-2753 (2011)] modeled the energetic effects of monovalent cations on loops and helices. __Results__: The model of Einert et al. is adapted to match the structure of the dynamic programming recursion of RNA secondary structure prediction algorithms. An empirical term describing the salt dependence of the duplex initiation energy is added to improve co-folding predictions for two or more RNA strands. The slightly modified model is implemented in the ViennaRNA package in such a way that only the energy parameters but not the algorithmic structure is affected. A comparison with data from the literature shows that predicted free energies and melting temperatures are in reasonable agreement with experiments. __Conclusion__: The new feature in the ViennaRNA package makes it possible to study effects of salt concentrations on RNA folding in a systematic manner. Strictly speaking, the model pertains only to mono-valent cations, and thus covers the most important parameter, i.e., the NaCl concentration. It remains a question for future research to what extent unspecific effects of bi- and tri-valent cations can be approximated in a similar manner. __Availability__: Corrections for the concentration of monovalent cations are available in the ViennaRNA package starting from version 2.6.0.
@article{yao:hal-04062134, author = {Yao, Hua-Ting and Lorenz, Ronny and Hofacker, Ivo L and Stadler, Peter F}, title = {{Mono-valent salt corrections for RNA secondary structures in the ViennaRNA package}}, journal = {{Algorithms for Molecular Biology}}, year = {2023}, volume = {18}, pages = {8}, month = jul, doi = {10.1186/s13015-023-00236-0}, hal_id = {hal-04062134}, hal_version = {v2}, keywords = {RNA secondary structure ; Salt concentration ; Debye-H{\"u}ckel potential}, publisher = {{BioMed Central}}, url = {https://hal.science/hal-04062134}, }
Conference
Phylogenetic Information as Soft Constraints in RNA Secondary Structure Prediction

Sarah Löhneysen, Thomas Spicher, Yuliia Varenyk, Hua-Ting Yao, and 3 more authors

In Bioinformatics Research and Applications, Jul 2023

Pseudo-energies are a generic method to incorporate extrinsic information into energy-directed RNA secondary structure predictions. Consensus structures of RNA families, usually predicted from multiple sequence alignments, can be treated as soft constraints in this manner. In this contribution we first revisit the theoretical framework and then show that pseudo-energies for the centroid base pairs of the consensus structure result in a substantial increase in folding accuracy. In contrast, only a moderate improvement can be achieved if only the information that a base is predominantly paired is utilized.
@incollection{Loehneysen2023, author = {von Löhneysen, Sarah and Spicher, Thomas and Varenyk, Yuliia and Yao, Hua-Ting and Lorenz, Ronny and Hofacker, Ivo and Stadler, Peter F.}, booktitle = {Bioinformatics Research and Applications}, publisher = {Springer Nature Singapore}, title = {Phylogenetic Information as~Soft Constraints in~{RNA} Secondary Structure Prediction}, year = {2023}, pages = {267--279}, doi = {10.1007/978-981-99-7074-2_21}, }

2021

Conference
Taming Disruptive Base Pairs to Reconcile Positive and Negative Structural Design of RNA

Hua-Ting Yao, Jérôme Waldispühl, Yann Ponty, and Sebastian Will

In RECOMB 2021 - 25th international conference on research in computational molecular biology, Apr 2021

The negative structural design of RNAs, also called Inverse folding, consists in building a synthetic nucleotide sequence adopting a targeted secondary structure as its Minimum Free Energy (MFE) structure. Computationally an NP-hard problem, it is mostly addressed as an optimization task and solved using (meta-)heuristics. Existing methods are frequently challenged by demanding instances, and typically produce a single design, hindering practical applications of design, where multiple candidates are desirable to circumvent the idealized nature of design models. In this work, we introduce RNA POsitive and Negative Design (RNAPOND), a sampling approach which generates design candidates exactly from a well-defined distribution influenced by positive design objectives, including affinity towards the target and GC-content. Negative design principles are captured by an original iterative approach, where a subset of Disruptive Base Pairs (DBPs) is identified at each step, and subsequently forbidden from pairing by the introduction of suitable constraints. Despite the NP-hardness of the associated decision problem, we propose a combinatorial sampling algorithm which is Fixed Parameter Tractable (FPT) for the tree-width of the constraint network. Our algorithm, coupled with a suitable rejection step and an automated inference of DBPs, achieves a similar or better level of success in comparison to the state of the art, while allowing for the generation of diverse designs. Interestingly, it also automatically recovers some of the strategies used by practitioners of RNA design. RNAPOND is an open source project, available at: https://gitlab.inria.fr/amibio/RNAPOND
@inproceedings{yao:hal-02987566, author = {Yao, Hua-Ting and Waldispühl, Jérôme and Ponty, Yann and Will, Sebastian}, title = {{Taming Disruptive Base Pairs to Reconcile Positive and Negative Structural Design of RNA}}, booktitle = {{RECOMB 2021 - 25th international conference on research in computational molecular biology}}, year = {2021}, address = {Padova, Italy}, month = apr, hal_id = {hal-02987566}, hal_version = {v2}, url = {https://hal.inria.fr/hal-02987566}, }
Thesis
Local Decomposition in RNA Structural Design

Hua-Ting Yao

Ecole Polytechnique (Palaiseau, France) ; Université McGill [Montréal], Dec 2021

The RNA positive structural design problem seeks RNA sequences that achieve a low free energy for the target secondary structure. In negative design, by contrast, solution sequences should adopt the target structure as their folding preferentially to any alternative structure, according to the given metric and energy model. Inverse folding, a typical negative design, requires the target to be the solution sequence’s MFE folding. Other metrics, like the ensemble defect, are also considered for design evaluation. The additivity of the energy model suggests the existence of local properties for the RNA design problem. It was discovered in several works that, due to the presence of specific local motifs, some secondary structures are undesignable, i.e., no RNA sequence can fold into the target structure while satisfying the negative design objective. The sequence sampling approach is often used in the positive design. Unwanted local structures, like base pairs, repeatedly form while folding sampled sequences toward the negative design. In this thesis, we study the impact of such local nature on the combinatorial aspect and on the development of negative design methods. From the combinatorial aspect, we show that the proportion of designable secondary structures decreases exponentially with the target structure length. Given a negative design metric, we propose an automated pipeline to identify all undesignable motifs. Enumerating secondary structures avoiding such local obstructions, followed by asymptotic analysis, yields an upper bound on the number of designable structures. In addition, we define a lower bound for the structural ensemble defect derived from the local motifs that occur. We show that the lower bound follows a Normal limiting distribution with a closed-form expression, implying also an exponential decrease. We then present Infrared, a generic framework for efficient combinatorial sampling. 
We formalize the RNA design problem as a CSP with design objectives described as a set of constraints and a set of weighted functions. Assignments satisfying constraints are generated from a Boltzmann weighted distribution using a dynamic programming algorithm followed by stochastic backtracking. The approach is FPT for the treewidth of the dependency graph induced from the problem. We show that the framework can be easily employed for RNA positive design and flexible applications. Finally, as an application of Infrared, we propose an original iterative sampling approach that captures negative design principles, implemented in RNA POsitive and Negative Design (RNAPOND). A set of Disruptive Base Pairs (DBPs) is identified at each round and subsequently prevented from pairing by introducing proper constraints into the sampling framework. Despite the NP-hardness of the associated decision problem, an efficient sequence sampling algorithm is ensured by the Infrared framework. Our approach achieves a similar or better success rate than state-of-the-art negative design tools while allowing for the generation of diverse, thermodynamically efficient designs, i.e., positive design principles. One of the research directions of the works presented in this thesis is the extension to more complicated structures, such as pseudoknotted secondary structures. The flexibility of the Infrared framework opens a door for design tool development. For example, the success of RNAPOND suggests a potential approach for RNA negative structural design.
@phdthesis{yao:tel-03538576, author = {Yao, Hua-Ting}, title = {{Local Decomposition in RNA Structural Design}}, school = {{Ecole Polytechnique (Palaiseau, France) ; Universit{\'e} McGill [Montr{\'e}al]}}, year = {2021}, month = dec, hal_id = {tel-03538576}, hal_version = {v2}, keywords = {Negative RNA Design ; Parameterized Complexity ; RNA motif ; Design negatif d'ARN ; Complexite parametree ; Motif d'ARN}, number = {2021IPPAX126}, url = {https://tel.archives-ouvertes.fr/tel-03538576}, }

2020

Conference
Stochastic Sampling of Structural Contexts Improves the Scalability and Accuracy of RNA 3D Modules Identification

Roman Sarrazin-Gendron, Hua-Ting Yao, Vladimir Reinharz, Carlos G Oliver, and 2 more authors

In RECOMB 2020 - 24th Annual International Conference on Research in Computational Molecular Biology, May 2020

RNA structures possess multiple levels of structural organization. Secondary structures are made of canonical (i.e. Watson-Crick and Wobble) helices, connected by loops whose local conformations are critical determinants of global 3D architectures. Such local 3D structures consist of conserved sets of non-canonical base pairs, called RNA modules. Their prediction from sequence data is thus a milestone toward 3D structure modelling. Unfortunately, the computational efficiency and scope of the current 3D module identification methods are still too limited to benefit from all the knowledge accumulated in module databases. Here, we introduce BayesPairing 2, a new sequence search algorithm leveraging secondary structure tree decomposition, which reduces the computational complexity and improves predictions on new sequences. We benchmarked our methods on 75 modules and 6360 RNA sequences, and report accuracies that are comparable to the state of the art, with considerable running time improvements. When identifying 200 modules on a single sequence, BayesPairing 2 is over 100 times faster than its previous version, opening new doors for genome-wide applications.
@inproceedings{sarrazingendron:hal-02354733, author = {Sarrazin-Gendron, Roman and Yao, Hua-Ting and Reinharz, Vladimir and Oliver, Carlos G and Ponty, Yann and Waldispühl, Jérôme}, title = {{Stochastic Sampling of Structural Contexts Improves the Scalability and Accuracy of RNA 3D Modules Identification}}, booktitle = {{RECOMB 2020 - 24th Annual International Conference on Research in Computational Molecular Biology}}, year = {2020}, series = {Proceedings of RECOMB - 24th Annual International Conference on Research in Computational Molecular Biology - 2020}, address = {Padova, Italy}, month = may, doi = {10.1007/978-3-030-45257-5_12}, hal_id = {hal-02354733}, hal_version = {v1}, url = {https://hal.inria.fr/hal-02354733}, }
Book Chapter
Advanced design of structural RNAs using RNARedPrint

Yann Ponty, Stefan Hammer, Hua-Ting Yao, and Sebastian Will

In RNA Bioinformatics, May 2020

RNA design addresses the need to build novel RNAs, e.g. for biotechnological applications in synthetic biology, equipped with desired functional properties. This chapter describes how to use the software RNARedPrint for the de novo rational design of RNA sequences adopting one or several desired secondary structures. Depending on the application, these structures could represent alternate configurations or kinetic pathways. The software makes such design convenient and sufficiently fast for routine practical use, where it even overcomes notorious problems in the application of RNA design, e.g. it maintains realistic GC content.
@incollection{ponty:hal-02990264, author = {Ponty, Yann and Hammer, Stefan and Yao, Hua-Ting and Will, Sebastian}, title = {{Advanced design of structural RNAs using RNARedPrint}}, booktitle = {{RNA Bioinformatics}}, year = {2020}, editor = {Picardi, Ernesto}, series = {Methods in Molecular Biology}, doi = {10.1007/978-1-0716-1307-8_1}, hal_id = {hal-02990264}, hal_version = {v1}, keywords = {RNA design ; Kinetic landscapes ; Riboswitches}, url = {https://hal.inria.fr/hal-02990264}, }

2019

Conference
Exponentially few RNA structures are designable

Hua-Ting Yao, Cedric Chauve, Mireille Regnier, and Yann Ponty

In ACM-BCB 2019 - 10th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, Sep 2019

The problem of RNA design attempts to construct RNA sequences that perform a predefined biological function, identified by several additional constraints. One of the foremost objectives of RNA design is that the designed RNA sequence should adopt a predefined target secondary structure preferentially to any alternative structure, according to a given metric and folding model. It was observed in several works that some secondary structures are undesignable, i.e. no RNA sequence can fold into the target structure while satisfying some criterion measuring how preferential this folding is compared to alternative conformations. In this paper, we show that the proportion of designable secondary structures decreases exponentially with the size of the target secondary structure, for various popular combinations of energy models and design objectives. This exponential decay is, at least in part, due to the existence of undesignable motifs, which can be generically constructed, and jointly analyzed to yield asymptotic upper bounds on the number of designable structures.
@inproceedings{yao:hal-02141853, author = {Yao, Hua-Ting and Chauve, Cedric and Regnier, Mireille and Ponty, Yann}, title = {{Exponentially few RNA structures are designable}}, booktitle = {{ACM-BCB 2019 - 10th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics}}, year = {2019}, pages = {289-298}, address = {Niagara-Falls, United States}, month = sep, publisher = {{ACM Press}}, doi = {10.1145/3307339.3342163}, hal_id = {hal-02141853}, hal_version = {v2}, keywords = {RNA Design ; Generating functions ; Pattern Avoidance ; Inverse Folding ; Pattern Matching ; Analytic Combinatorics ; Neutral Networks}, url = {https://hal.inria.fr/hal-02141853}, }

2018

Journal
MentaLiST – A fast MLST caller for large MLST schemes

Pedro Feijao, Hua-Ting Yao, Dan Fornika, Jennifer Gardy, and 3 more authors

Microbial Genomics, Feb 2018

MLST (multi-locus sequence typing) is a classic technique for genotyping bacteria, widely applied for pathogen outbreak surveillance. Traditionally, MLST is based on identifying sequence types from a small number of housekeeping genes. With the increasing availability of whole-genome sequencing data, MLST methods have evolved towards larger typing schemes, based on a few hundred genes [core genome MLST (cgMLST)] to a few thousand genes [whole genome MLST (wgMLST)]. Such large-scale MLST schemes have been shown to provide a finer resolution and are increasingly used in various contexts such as hospital outbreaks or foodborne pathogen outbreaks. This methodological shift raises new computational challenges, especially given the large size of the schemes involved. Very few available MLST callers are currently capable of dealing with large MLST schemes. We introduce MentaLiST, a new MLST caller, based on a k-mer voting algorithm and written in the Julia language, specifically designed and implemented to handle large typing schemes. We test it on real and simulated data to show that MentaLiST is faster than any other available MLST caller while providing the same or better accuracy, and is capable of dealing with MLST schemes with up to thousands of genes while requiring limited computational resources. MentaLiST source code and easy installation instructions using a Conda package are available at [https://github.com/WGS-TB/MentaLiST](https://github.com/WGS-TB/MentaLiST).
@article{Feijao2018, author = {Feijao, Pedro and Yao, Hua-Ting and Fornika, Dan and Gardy, Jennifer and Hsiao, William and Chauve, Cedric and Chindelevitch, Leonid}, journal = {Microbial Genomics}, title = {{MentaLiST} {\textendash} A fast {MLST} caller for large {MLST} schemes}, year = {2018}, month = feb, number = {2}, volume = {4}, doi = {10.1099/mgen.0.000146}, publisher = {Microbiology Society}, }