{"id":11067,"date":"2025-04-26T23:36:42","date_gmt":"2025-04-26T14:36:42","guid":{"rendered":"https:\/\/www.ibs.re.kr\/bimag\/?post_type=tribe_events&#038;p=11067"},"modified":"2025-06-09T09:18:25","modified_gmt":"2025-06-09T00:18:25","slug":"data-splitting-to-avoid-information-leakage-with-datasail-myna-lim","status":"publish","type":"tribe_events","link":"https:\/\/www.ibs.re.kr\/bimag\/event\/data-splitting-to-avoid-information-leakage-with-datasail-myna-lim\/","title":{"rendered":"Data splitting to avoid information leakage with DataSAIL &#8211; Myna Lim"},"content":{"rendered":"<p>In this talk, we discuss the paper, &#8220;Data splitting to avoid information leakage with DataSAIL&#8221; by Roman Joeres, et al., Nature Communications, 2025.<\/p>\n<p><strong>Abstract<\/strong><\/p>\n<p>Information leakage is an increasingly important topic in machine learning research for biomedical applications. When information leakage happens during a model\u2019s training, it risks memorizing the training data instead of learning generalizable properties. This can lead to inflated performance metrics that do not reflect the actual performance at inference time. We present DataSAIL, a versatile Python package to facilitate leakage-reduced data splitting to enable realistic evaluation of machine learning models for biological data that are intended to be applied in out-of-distribution scenarios. DataSAIL is based on formulating the problem to find leakage-reduced data splits as a combinatorial optimization problem. We prove that this problem is NP-hard and provide a scalable heuristic based on clustering and integer linear programming. Finally, we empirically demonstrate DataSAIL\u2019s impact on evaluating biomedical machine learning models.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this talk, we discuss the paper, &#8220;Data splitting to avoid information leakage with DataSAIL&#8221; by Roman Joeres, et al., Nature Communications, 2025. Abstract Information leakage is an increasingly important &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.ibs.re.kr\/bimag\/event\/data-splitting-to-avoid-information-leakage-with-datasail-myna-lim\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Data splitting to avoid information leakage with DataSAIL &#8211; Myna Lim&#8221;<\/span><\/a><\/p>\n","protected":false},"author":11,"featured_media":0,"template":"","meta":{"_editorskit_title_hidden":false,"_editorskit_reading_time":0,"_editorskit_is_block_options_detached":false,"_editorskit_block_options_position":"{}","_uag_custom_page_level_css":"","_tribe_events_status":"","_tribe_events_status_reason":"","footnotes":""},"tags":[],"tribe_events_cat":[219],"class_list":["post-11067","tribe_events","type-tribe_events","status-publish","hentry","tribe_events_cat-journal-club","cat_journal-club"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Data splitting to avoid information leakage with DataSAIL - Myna Lim - Biomedical Mathematics Group<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.ibs.re.kr\/bimag\/event\/data-splitting-to-avoid-information-leakage-with-datasail-myna-lim\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data splitting to avoid information leakage with DataSAIL - Myna Lim - Biomedical Mathematics Group\" \/>\n<meta property=\"og:description\" content=\"In this talk, we discuss the paper, &#8220;Data splitting to avoid information leakage with DataSAIL&#8221; by Roman Joeres, et al., Nature Communications, 2025. Abstract Information leakage is an increasingly important &hellip; Continue reading &quot;Data splitting to avoid information leakage with DataSAIL &#8211; Myna Lim&quot;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.ibs.re.kr\/bimag\/event\/data-splitting-to-avoid-information-leakage-with-datasail-myna-lim\/\" \/>\n<meta property=\"og:site_name\" content=\"Biomedical Mathematics Group\" \/>\n<meta property=\"article:modified_time\" content=\"2025-06-09T00:18:25+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.ibs.re.kr\\\/bimag\\\/event\\\/data-splitting-to-avoid-information-leakage-with-datasail-myna-lim\\\/\",\"url\":\"https:\\\/\\\/www.ibs.re.kr\\\/bimag\\\/event\\\/data-splitting-to-avoid-information-leakage-with-datasail-myna-lim\\\/\",\"name\":\"Data splitting to avoid information leakage with DataSAIL - Myna Lim - Biomedical Mathematics Group\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.ibs.re.kr\\\/bimag\\\/#website\"},\"datePublished\":\"2025-04-26T14:36:42+00:00\",\"dateModified\":\"2025-06-09T00:18:25+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.ibs.re.kr\\\/bimag\\\/event\\\/data-splitting-to-avoid-information-leakage-with-datasail-myna-lim\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.ibs.re.kr\\\/bimag\\\/event\\\/data-splitting-to-avoid-information-leakage-with-datasail-myna-lim\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.ibs.re.kr\\\/bimag\\\/event\\\/data-splitting-to-avoid-information-leakage-with-datasail-myna-lim\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.ibs.re.kr\\\/bimag\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Events\",\"item\":\"https:\\\/\\\/www.ibs.re.kr\\\/bimag\\\/events\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Data splitting to avoid information leakage with DataSAIL &#8211; Myna Lim\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.ibs.re.kr\\\/bimag\\\/#website\",\"url\":\"https:\\\/\\\/www.ibs.re.kr\\\/bimag\\\/\",\"name\":\"Biomedical Mathematics Group\",\"description\":\"\uae30\ucd08\uacfc\ud559\uc5f0\uad6c\uc6d0 \uc758\uc0dd\uba85\uc218\ud559\uadf8\ub8f9\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.ibs.re.kr\\\/bimag\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.ibs.re.kr\\\/bimag\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.ibs.re.kr\\\/bimag\\\/#organization\",\"name\":\"IBS Biomedical Mathematics Group\",\"url\":\"https:\\\/\\\/www.ibs.re.kr\\\/bimag\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.ibs.re.kr\\\/bimag\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.ibs.re.kr\\\/bimag\\\/cms\\\/wp-content\\\/uploads\\\/2021\\\/02\\\/ibs-circle-1.png\",\"contentUrl\":\"https:\\\/\\\/www.ibs.re.kr\\\/bimag\\\/cms\\\/wp-content\\\/uploads\\\/2021\\\/02\\\/ibs-circle-1.png\",\"width\":250,\"height\":250,\"caption\":\"IBS Biomedical Mathematics Group\"},\"image\":{\"@id\":\"https:\\\/\\\/www.ibs.re.kr\\\/bimag\\\/#\\\/schema\\\/logo\\\/image\\\/\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Data splitting to avoid information leakage with DataSAIL - Myna Lim - Biomedical Mathematics Group","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.ibs.re.kr\/bimag\/event\/data-splitting-to-avoid-information-leakage-with-datasail-myna-lim\/","og_locale":"en_US","og_type":"article","og_title":"Data splitting to avoid information leakage with DataSAIL - Myna Lim - Biomedical Mathematics Group","og_description":"In this talk, we discuss the paper, &#8220;Data splitting to avoid information leakage with DataSAIL&#8221; by Roman Joeres, et al., Nature Communications, 2025. Abstract Information leakage is an increasingly important &hellip; Continue reading \"Data splitting to avoid information leakage with DataSAIL &#8211; Myna Lim\"","og_url":"https:\/\/www.ibs.re.kr\/bimag\/event\/data-splitting-to-avoid-information-leakage-with-datasail-myna-lim\/","og_site_name":"Biomedical Mathematics Group","article_modified_time":"2025-06-09T00:18:25+00:00","twitter_card":"summary_large_image","twitter_misc":{"Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.ibs.re.kr\/bimag\/event\/data-splitting-to-avoid-information-leakage-with-datasail-myna-lim\/","url":"https:\/\/www.ibs.re.kr\/bimag\/event\/data-splitting-to-avoid-information-leakage-with-datasail-myna-lim\/","name":"Data splitting to avoid information leakage with DataSAIL - Myna Lim - Biomedical Mathematics Group","isPartOf":{"@id":"https:\/\/www.ibs.re.kr\/bimag\/#website"},"datePublished":"2025-04-26T14:36:42+00:00","dateModified":"2025-06-09T00:18:25+00:00","breadcrumb":{"@id":"https:\/\/www.ibs.re.kr\/bimag\/event\/data-splitting-to-avoid-information-leakage-with-datasail-myna-lim\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.ibs.re.kr\/bimag\/event\/data-splitting-to-avoid-information-leakage-with-datasail-myna-lim\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.ibs.re.kr\/bimag\/event\/data-splitting-to-avoid-information-leakage-with-datasail-myna-lim\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.ibs.re.kr\/bimag\/"},{"@type":"ListItem","position":2,"name":"Events","item":"https:\/\/www.ibs.re.kr\/bimag\/events\/"},{"@type":"ListItem","position":3,"name":"Data splitting to avoid information leakage with DataSAIL &#8211; Myna Lim"}]},{"@type":"WebSite","@id":"https:\/\/www.ibs.re.kr\/bimag\/#website","url":"https:\/\/www.ibs.re.kr\/bimag\/","name":"Biomedical Mathematics Group","description":"\uae30\ucd08\uacfc\ud559\uc5f0\uad6c\uc6d0 \uc758\uc0dd\uba85\uc218\ud559\uadf8\ub8f9","publisher":{"@id":"https:\/\/www.ibs.re.kr\/bimag\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.ibs.re.kr\/bimag\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.ibs.re.kr\/bimag\/#organization","name":"IBS Biomedical Mathematics Group","url":"https:\/\/www.ibs.re.kr\/bimag\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.ibs.re.kr\/bimag\/#\/schema\/logo\/image\/","url":"https:\/\/www.ibs.re.kr\/bimag\/cms\/wp-content\/uploads\/2021\/02\/ibs-circle-1.png","contentUrl":"https:\/\/www.ibs.re.kr\/bimag\/cms\/wp-content\/uploads\/2021\/02\/ibs-circle-1.png","width":250,"height":250,"caption":"IBS Biomedical Mathematics Group"},"image":{"@id":"https:\/\/www.ibs.re.kr\/bimag\/#\/schema\/logo\/image\/"}}]}},"uagb_featured_image_src":{"full":false,"thumbnail":false,"medium":false,"medium_large":false,"large":false,"1536x1536":false,"2048x2048":false,"dimag-thumbnail":false,"twentyseventeen-featured-image":false,"twentyseventeen-thumbnail-avatar":false},"uagb_author_info":{"display_name":"Gyuyoung","author_link":"https:\/\/www.ibs.re.kr\/bimag\/author\/gyuyoung\/"},"uagb_comment_info":0,"uagb_excerpt":"In this talk, we discuss the paper, &#8220;Data splitting to avoid information leakage with DataSAIL&#8221; by Roman Joeres, et al., Nature Communications, 2025. Abstract Information leakage is an increasingly important &hellip; Continue reading \"Data splitting to avoid information leakage with DataSAIL &#8211; Myna Lim\"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.ibs.re.kr\/bimag\/wp-json\/wp\/v2\/tribe_events\/11067","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ibs.re.kr\/bimag\/wp-json\/wp\/v2\/tribe_events"}],"about":[{"href":"https:\/\/www.ibs.re.kr\/bimag\/wp-json\/wp\/v2\/types\/tribe_events"}],"author":[{"embeddable":true,"href":"https:\/\/www.ibs.re.kr\/bimag\/wp-json\/wp\/v2\/users\/11"}],"version-history":[{"count":1,"href":"https:\/\/www.ibs.re.kr\/bimag\/wp-json\/wp\/v2\/tribe_events\/11067\/revisions"}],"predecessor-version":[{"id":11068,"href":"https:\/\/www.ibs.re.kr\/bimag\/wp-json\/wp\/v2\/tribe_events\/11067\/revisions\/11068"}],"wp:attachment":[{"href":"https:\/\/www.ibs.re.kr\/bimag\/wp-json\/wp\/v2\/media?parent=11067"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ibs.re.kr\/bimag\/wp-json\/wp\/v2\/tags?post=11067"},{"taxonomy":"tribe_events_cat","embeddable":true,"href":"https:\/\/www.ibs.re.kr\/bimag\/wp-json\/wp\/v2\/tribe_events_cat?post=11067"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}