Accepted Papers


LTR-ICD: A Learning-To-Rank Approach For Automatic ICD Coding

Mohammad Mansoori, Amira Soliman, and Farzaneh Etminani, Center for Applied Intelligent Systems Research (CAISR), Halmstad University, Sweden

ABSTRACT

Clinical notes contain unstructured text provided by clinicians during patient encounters. These notes are usually accompanied by a sequence of diagnostic codes following the International Classification of Diseases (ICD). Correctly assigning and ordering ICD codes is essential for medical diagnosis and reimbursement. However, automating this task remains challenging. State-of-the-art methods treat this problem as a classification task and therefore ignore the order of ICD codes, which is essential for different purposes. In this work, as a first attempt, we approach this task from a retrieval system perspective to consider the order of codes, thus formulating the problem as a classification and ranking task. Our results and analysis show that the proposed framework has a superior ability to identify high-priority codes compared to other methods. For instance, our model's accuracy in correctly ranking primary diagnosis codes is ~47%, compared to ~20% for the state-of-the-art classifier. Additionally, in terms of classification metrics, the proposed model achieves micro- and macro-F1 scores of 0.6065 and 0.2904, respectively, surpassing the previous best model with scores of 0.597 and 0.2660.

Keywords

generative language models, learning to rank, automatic medical coding, ICD coding, electronic health records, pre-trained language models.
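
The abstract above reports three kinds of numbers: micro-F1, macro-F1, and the accuracy of ranking the primary diagnosis first. The following is a minimal sketch of how such metrics are commonly computed, not the authors' evaluation code. The function names, the toy codes, and the choice to average macro-F1 over every code seen in the gold or predicted lists are assumptions made for illustration.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """gold, pred: one ordered list of codes per clinical note."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        g_set, p_set = set(g), set(p)
        for code in g_set & p_set:
            tp[code] += 1
        for code in p_set - g_set:
            fp[code] += 1
        for code in g_set - p_set:
            fn[code] += 1
    labels = set(tp) | set(fp) | set(fn)
    # Micro: pool the counts over all codes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per code, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels) if labels else 0.0
    return micro, macro


def primary_accuracy(gold, pred):
    """Share of notes whose top-ranked predicted code is the gold primary code."""
    hits = sum(1 for g, p in zip(gold, pred) if g and p and g[0] == p[0])
    return hits / len(gold) if gold else 0.0


# Hypothetical toy data: three notes, codes listed in priority order.
gold = [["I10", "E11", "N18"], ["J18", "I10"], ["E11"]]
pred = [["I10", "E11"], ["I10", "J18"], ["E11", "N18"]]

micro, macro = micro_macro_f1(gold, pred)
primary = primary_accuracy(gold, pred)
print(f"micro-F1={micro:.3f} macro-F1={macro:.3f} primary accuracy={primary:.3f}")
```

In the second toy note the predicted set is exactly right but the order is swapped, so the set-based F1 scores are unaffected while the primary-diagnosis accuracy drops. This is the gap between classification and ranking that the abstract points to.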


Household Movement Detection In Mixed-Format Occupancy Data Using LLM-based Entity Resolution

Sasirekha Oguri, John R. Talburt, and Mert Can Cakmak, Center for Entity Resolution and Information Quality (ERIQ), University of Arkansas - Little Rock, USA

ABSTRACT

Entity resolution (ER) typically relies on pairwise similarity comparisons between records, which limits its ability to capture indirect relationships present in demographic occupancy data. An important indirect pattern arises from household movement, where multiple individuals relocate together across addresses, but detecting such patterns is difficult due to mixed-format records, noise, duplication, and the absence of stable identifiers. This paper proposes an AI-enhanced framework for detecting indirect entity links associated with household movement in unstandardized name–address data. The approach integrates prompt-based large language model (LLM) named entity recognition for extracting personal names and addresses without extensive preprocessing, semantic text embeddings for robust similarity computation, and graph-based reasoning to infer group-level movement patterns. Experimental evaluation on SPX benchmark datasets (S8–S12) generated using the Synthetic Occupancy Generator demonstrates that incorporating indirect household movement evidence improves recall by 8–15% while maintaining high precision, yielding F1-score gains of 6–8% over a strong pairwise baseline.

Keywords

Entity Resolution, Household Movement Detection, Indirect Linkage, Named Entity Recognition, Large Language Models, Semantic Text Embeddings, Graph-Based Clustering, Occupancy Data, Synthetic Data, Data Integration
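
The idea of using household movement as indirect evidence can be illustrated with a small sketch. It is not the paper's framework. It assumes names and addresses have already been extracted and that pairwise similarity scores are supplied from elsewhere (for example, from text embeddings). The function name `promote_by_household_move`, the thresholds, and the toy records are all hypothetical. The rule shown is: a weak name match between two addresses is promoted to a link when a different pair of records already links the same two addresses with a strong match.

```python
from collections import defaultdict


def promote_by_household_move(address_of, scores, strong=0.9, weak=0.6):
    """address_of: record id -> address; scores: (id_a, id_b) -> similarity."""
    strong_links = {pair for pair, s in scores.items() if s >= strong}
    weak_links = {pair for pair, s in scores.items() if weak <= s < strong}

    def address_pair(pair):
        a, b = pair
        return frozenset((address_of[a], address_of[b]))

    # Index strong links by the two addresses they connect (a candidate move).
    support = defaultdict(set)
    for pair in strong_links:
        key = address_pair(pair)
        if len(key) == 2:  # skip links within a single address
            support[key].add(pair)

    # Promote a weak link only if another person made the same move.
    promoted = set()
    for pair in weak_links:
        key = address_pair(pair)
        if len(key) != 2:
            continue
        if any(set(pair).isdisjoint(other) for other in support.get(key, ())):
            promoted.add(pair)
    return strong_links, promoted


# Hypothetical toy data: two people appear at Oak St and again at Pine Ave.
address_of = {
    "r1": "12 Oak St",    # Mary Smith
    "r2": "12 Oak St",    # John Smith
    "r3": "98 Pine Ave",  # Mary Smith
    "r4": "98 Pine Ave",  # J. Smith
    "r5": "5 Elm Rd",     # Jon Smyth (unrelated address)
}
scores = {
    ("r1", "r3"): 0.98,  # confident pairwise match
    ("r2", "r4"): 0.72,  # ambiguous on its own
    ("r2", "r5"): 0.70,  # ambiguous, with no household evidence
    ("r1", "r4"): 0.40,  # below the weak threshold
}

strong_links, promoted = promote_by_household_move(address_of, scores)
print("strong:", sorted(strong_links))
print("promoted:", sorted(promoted))
```

Here the ambiguous pair `r2`/`r4` is promoted because `r1`/`r3` shows another resident making the same Oak St to Pine Ave move, while the equally ambiguous `r2`/`r5` is left alone. A fuller system would also need to resolve conflicts when one record has several promoted candidates.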


A Multi-agent Social Simulation Framework Based On Large Language Models: A Case Study Of Public Opinion Evolution On The Fukushima Nuclear Wastewater Discharge

Siying Wang1, Xuan Wang2, Yining Tang3, Chao Wu3, 1School of Information Resources Management, Renmin University of China, Beijing, China, 2China Media Group, Beijing, China, 3School of Public Affairs, Zhejiang University, Hangzhou, China

ABSTRACT

Simulating public opinion evolution is a core focus of computational social science. Traditional agent-based models rely on predefined heuristic rules and fail to capture the semantic features and cognitive processes of human natural language interactions. While large language models offer new approaches for artificial society construction, existing frameworks have limitations in scalability and memory management. Taking the Fukushima nuclear wastewater discharge event as the background, this study uses an open-source multi-agent social simulation framework and designs four progressive intervention scenarios to analyze agents' cognitive synergy and public opinion trajectories. Results show that the framework mitigates role drift and premature consensus and reproduces the public opinion evolution trajectory, providing empirical insights for policy testing and LLM-driven social computing.

Keywords

Multi-agent simulation, Public opinion evolution, Nuclear wastewater discharge, Computational social science