An afternoon of socioeconomic data science II

Place: Lecture hall T4, The Computer Science Building, Aalto University

Date: April 2, 2026

Format: 20 min talk + 10 min Q&A

Organizers: Tomomi Kito, Graduate School of Creative Science and Engineering, Waseda University; Petter Holme, Department of Computer Science, Aalto University.

Program

13:00–13:45
Omar Guerrero, University of Helsinki
Modeling sustainable development from the bottom up

The explicit acknowledgement of the complexity of the Sustainable Development Goals (SDGs) is one of the main innovations of this international agenda. However, the formal analysis of complex systems in the SDG literature remains scant, as most attention goes to (top-down) aggregate models such as system dynamics and networks of indicators. In this talk, I will argue that an adequate treatment of complexity requires viewing development as a bottom-up process, with macro-level outcomes emerging from micro-level interventions. From a quantitative point of view, popular methodologies such as statistical analysis and machine learning are inadequate for addressing this vertical causation, as the available data are aggregate and coarse-grained (typically annual development indicators). To resolve this, models with explicit agent-level causal mechanisms are needed, and agent computing is the right tool to create them. I will present the research program of Policy Priority Inference (oguerr.com/ppi), which employs agent computing to model the SDGs from the perspective of public expenditure interventions. I will discuss several applications related to policy coherence, policy resilience, feasibility, fiscal federalism, accelerators, and bottlenecks, as well as the country case studies in which they have been applied. This program provides a fresh perspective on the challenges of multidimensional development and a rigorous approach to exploiting not only indicators but also new sources, such as open spending data.


14:00–14:45
Manuel Cebrian, Spanish National Research Council
General scales unlock AI evaluation with explanatory and predictive power

Ensuring safe and effective use of AI requires understanding and anticipating its performance on novel tasks, from advanced scientific challenges to transformed workplace activities. So far, benchmarking has guided progress in AI, but it has offered limited explanatory and predictive power for general-purpose AI systems, given the low transferability across diverse tasks. In this talk, I will introduce general scales for AI evaluation that can explain what common AI benchmarks really measure, extract ability profiles of AI systems, and predict their performance on new task instances, both in- and out-of-distribution. Our fully automated methodology builds on 18 newly crafted rubrics that place instance demands on general scales that do not saturate. Applied to 15 large language models and 63 tasks, inspecting the demand and ability profiles unleashes high explanatory power, bringing insights into the sensitivity and specificity exhibited by different benchmarks, and into how knowledge, metacognition, and reasoning are affected by model size, chain-of-thought, and distillation. Surprisingly, high predictive power at the instance level becomes possible using these demand levels, providing superior estimates over black-box baseline predictors based on embeddings or finetuning, especially in out-of-distribution settings (new tasks and new benchmarks). The scales, rubrics, battery, techniques, and results presented here represent a major step for AI evaluation, underpinning the reliable deployment of AI in the years ahead. (Collaborative platform: https://kinds-of-intelligence-cfi.github.io/ADELE/)