Research

My research centers on reliable machine learning systems: models and agents that can be evaluated, adapted, constrained, and deployed in workflows where mistakes matter.

LLM Agents, Control, and Safety

I study long-horizon language agents that must decide when to ask, delegate, verify, act, or escalate. This includes learning under the agent’s own continuation policy, evaluating action value and regret, and designing guard frameworks for tool-using coding agents.

Recent themes:

Reliable action selection for language agents.
Counterfactual action-value relabeling and audited evaluation.
Route, provenance, and capability checks for high-risk tool use.
Agent evaluation with explicit success, safety, and misfire criteria.

Foundation Models and Model Adaptation

I develop methods for adapting and evaluating foundation models in constrained settings, including data-free model merging and performance improvement that requires no task data, training, or test-time tuning.

Recent work raised the average benchmark score to 86.1 across 7 NLP tasks and 8 vision tasks under strict no-data constraints, with an emphasis on reproducible evaluation and comparison against strong baselines.

Sequence Modeling, Decoding, and Search

I work on structure-aware generation methods for autoregressive sequence models, especially in settings where repeated outputs and hidden structural collapse can harm constrained generation.

This work includes distributed asynchronous generation and evaluation frameworks, token-level value prediction, human-in-the-loop steering, and large-scale candidate generation over structured spaces.

Applied AI for Health and Scientific Discovery

I began my research in the Laboratory of Machine Learning and Health Informatics, where I worked on machine learning for health, genomics, multimodal data, molecular generation, and drug discovery.

Selected directions include:

Multimodal health modeling with imaging, demographic, and auxiliary biomarkers.
Depression treatment outcome prediction using mobile sensing and clinical data.
Molecular generation with hierarchical chemical graph representations and autoregressive chemical language models.
Matrix completion and tensor modeling for biomedical phenotyping.

Earlier Systems Work

Earlier in my research path, I worked on secure inter-domain routing and RPKI/ROV forecasting tools through the UConn Comcast Center of Excellence for Security Innovation. That work helped shape my interest in systems that combine algorithmic decisions with practical deployment constraints.

Xinyu Wang