Accepted Papers


LTR-ICD: A Learning-To-Rank Approach For Automatic ICD Coding

Mohammad Mansoori, Amira Soliman, and Farzaneh Etminani

ABSTRACT

Clinical notes contain unstructured text provided by clinicians during patient encounters. These notes are usually accompanied by a sequence of diagnostic codes following the International Classification of Diseases (ICD). Correctly assigning and ordering ICD codes is essential for medical diagnosis and reimbursement. However, automating this task remains challenging. State-of-the-art methods treat this problem as a classification task and therefore ignore the order of ICD codes, which is essential for different purposes. In this work, as a first attempt, we approach this task from a retrieval system perspective to consider the order of codes, thus formulating this problem as a classification and ranking task. Our results and analysis show that the proposed framework has a superior ability to identify high-priority codes compared to other methods. For instance, our model's accuracy in correctly ranking primary diagnosis codes is ~47%, compared to ~20% for the state-of-the-art classifier. Additionally, in terms of classification metrics, the proposed model achieves micro- and macro-F1 scores of 0.6065 and 0.2904, respectively, surpassing the previous best model with scores of 0.597 and 0.2660.

Keywords

generative language models, learning to rank, automatic medical coding, ICD coding, electronic health records, pre-trained language models.
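
The abstract above reports two kinds of numbers: set-based classification scores (micro- and macro-F1) and an order-aware score (accuracy on the primary diagnosis code). The following is a minimal sketch of how such metrics could be computed, not the authors' evaluation code. The function names, the toy codes, and the choice to average macro-F1 over only the codes that appear in the gold or predicted lists are all assumptions made for illustration.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """gold, pred: one list of codes per note; order is ignored here."""
    gold_sets = [set(g) for g in gold]
    pred_sets = [set(p) for p in pred]
    # Assumption: the label space is every code seen in gold or pred.
    labels = sorted(set().union(*gold_sets, *pred_sets))
    counts = {c: [0, 0, 0] for c in labels}  # per code: [tp, fp, fn]
    for g, p in zip(gold_sets, pred_sets):
        for c in p & g:
            counts[c][0] += 1
        for c in p - g:
            counts[c][1] += 1
        for c in g - p:
            counts[c][2] += 1
    # Micro-F1 pools the counts over all codes before computing F1.
    micro = f1(*(sum(v[i] for v in counts.values()) for i in range(3)))
    # Macro-F1 averages per-code F1, so rare codes weigh as much as common ones.
    macro = sum(f1(*v) for v in counts.values()) / len(counts)
    return micro, macro


def primary_code_accuracy(gold, pred):
    """Share of notes whose top-ranked predicted code equals the first gold code."""
    hits = sum(1 for g, p in zip(gold, pred) if g and p and g[0] == p[0])
    return hits / len(gold)


# Toy example (hypothetical codes); the first code in each list is the primary one.
gold = [["I10", "E11"], ["J18"], ["I10"]]
pred = [["I10", "J18"], ["J18", "E11"], ["E11", "I10"]]
micro, macro = micro_macro_f1(gold, pred)
top1 = primary_code_accuracy(gold, pred)
```

In the toy data the third note contains the correct code but ranks it second, so it counts as a true positive for F1 and as a miss for primary-code accuracy. This is the gap between classification and ranking that the abstract points to.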


Household Movement Detection In Mixed-Format Occupancy Data Using LLM-based Entity Resolution

Sasirekha Oguri, John R. Talburt, and Mert Can Cakmak, Center for Entity Resolution and Information Quality (ERIQ), University of Arkansas - Little Rock

ABSTRACT

Entity resolution (ER) typically relies on pairwise similarity comparisons between records, which limits its ability to capture indirect relationships present in demographic occupancy data. An important indirect pattern arises from household movement, where multiple individuals relocate together across addresses, but detecting such patterns is difficult due to mixed-format records, noise, duplication, and the absence of stable identifiers. This paper proposes an AI-enhanced framework for detecting indirect entity links associated with household movement in unstandardized name–address data. The approach integrates prompt-based large language model (LLM) named entity recognition for extracting personal names and addresses without extensive preprocessing, semantic text embeddings for robust similarity computation, and graph-based reasoning to infer group-level movement patterns. Experimental evaluation on SPX benchmark datasets (S8–S12) generated using the Synthetic Occupancy Generator demonstrates that incorporating indirect household movement evidence improves recall by 8–15% while maintaining high precision, yielding F1-score gains of 6–8% over a strong pairwise baseline.

Keywords

Entity Resolution, Household Movement Detection, Indirect Linkage, Named Entity Recognition, Large Language Models, Semantic Text Embeddings, Graph-Based Clustering, Occupancy Data, Synthetic Data, Data Integration
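
The graph-based step described in the abstract above can be illustrated with a small sketch. It is a simplification under stated assumptions, not the paper's method: it assumes names and addresses have already been extracted and resolved to identifiers (the LLM-based extraction and embedding similarity steps are omitted), and it treats two people as linked when they co-occur at two or more distinct addresses. The function name `co_movement_groups` and the `min_shared` threshold are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def co_movement_groups(records, min_shared=2):
    """records: iterable of (person_id, address_id) occupancy pairs.

    Returns sorted groups of people linked by sharing at least
    `min_shared` distinct addresses (an assumed proxy for moving together).
    """
    occupants = defaultdict(set)  # address -> people seen there
    for person, address in records:
        occupants[address].add(person)

    shared = defaultdict(set)  # (person_a, person_b) -> addresses in common
    for address, people in occupants.items():
        for a, b in combinations(sorted(people), 2):
            shared[(a, b)].add(address)

    parent = {}  # union-find forest over linked people only

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), addresses in shared.items():
        if len(addresses) >= min_shared:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for person in list(parent):
        groups[find(person)].add(person)
    return sorted(sorted(g) for g in groups.values())


# Toy data (hypothetical): ann and bob appear together at A1 and A2,
# cat shares only A1 with them, dan lives alone at A3.
records = [
    ("ann", "A1"), ("bob", "A1"), ("cat", "A1"),
    ("ann", "A2"), ("bob", "A2"),
    ("dan", "A3"),
]
groups = co_movement_groups(records)
```

Sharing a single address is weak evidence here, since it may reflect unrelated occupants at different times, which is why the sketch requires repeated co-occurrence before it links two people.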