
Galaxy International Convention Center

Date: 08 Dec 2025 (Mon) Time: 09:00 – 17:30
09:00 - 09:05
Keynote
09:05 - 10:00

Contemporary artificial intelligence is essentially data intelligence, meaning the automation of intelligence powered by data. Without computers, there would be no internet; without the internet, there would be no big data; and without big data, there would be no current AI. Behind the data intelligence boom is a newer and deeper understanding of data engineering that emerged at almost the same time as AI. Throughout human history, engineering disciplines have played very important roles, evolving from civil engineering to mechanical engineering and then to electrical engineering. Before the advent of the internet, databases were regarded as the key infrastructure of the information society. In China, data has been defined as the fifth factor of production. Unlike traditional factors of production, it is like a new power or driving force akin to electricity, i.e., Data is Power. Therefore, the future direction for databases is to focus on data power platforms, functioning much like a power grid.

ZHOU Aoying is a Professor at the School of Data Science and Engineering, East China Normal University. He is a China Computer Federation (CCF) Fellow and a member of the CCF Standing Committee, the Chair of the CCF Technical Committee on Databases, and the Associate Editor-in-Chief of the Chinese Journal of Computers. He is the Chairman of the Shanghai Computer Society and the President of the Shanghai Society of Artificial Intelligence and Social Development. He was the Chair of the Department of Computer Science, Fudan University (1999-2002), and the Vice President of East China Normal University (2016-2023). His research interests lie in Databases, Data Management, Blockchain, Digital Transformation, FinTech, and EduTech.

10:00 - 10:30

Referral programs are a powerful tool for user acquisition, yet the optimal design of reward schemes remains underexplored. This research empirically investigates how the referral reward scheme and the reward value (high vs. low) jointly affect the behavior of both existing users (inviters) and the new users they acquire (invitees). We compare two schemes: the conditional referral incentive (CRI), where rewards depend on conversion, and the unconditional referral incentive (URI), where rewards are guaranteed for sharing. Through a large-scale field experiment with a mobile gaming company, we tracked the referral activities and in-game activities of 66,427 inviters and 97,181 invitees over 60 days. Our results reveal a critical trade-off between referral quantity and quality for inviters. In the high-reward condition, inviters under the CRI scheme sent 77.4% fewer referrals than their URI counterparts but achieved a per-referral acceptance rate that was over 11 times higher. Despite their efficiency in conversion, CRI schemes generate negative spillover effects on the subsequent engagement of invitees. Newly acquired users from CRI schemes consistently demonstrated lower engagement; in the high-reward condition, these invitees spent 30.3% less time and 76.6% less money than invitees acquired under URI schemes. We find that these contrasting outcomes can be explained by the inviters’ persuasive efforts. Inviters in CRI conditions contacted potential invitees 222.9% more frequently, suggesting that reward-motivated social pressure may increase adoption but undermine invitees’ post-adoption engagement. Our findings uncover the potential contrast between the two prevailing referral reward schemes: while the CRI scheme is effective for immediate conversions, the URI scheme fosters greater long-term value by cultivating higher-quality invitees. This suggests that managers should weigh the long-term benefits of healthier user acquisition dynamics against short-term conversion metrics.
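
As a rough check on the quantity-quality trade-off in the high-reward condition, the two reported figures can be combined. The sketch below assumes both figures are per-inviter averages and treats "over 11 times" as exactly 11, so the result is only a lower-bound illustration and not a number reported by the study.

```python
# Illustrative arithmetic only. Both inputs come from the abstract; combining
# them this way is an assumption (per-inviter averages, 11x used as a floor).
referral_drop = 0.774        # CRI inviters sent 77.4% fewer referrals than URI
acceptance_multiplier = 11   # per-referral acceptance rate "over 11 times higher"

relative_volume = 1 - referral_drop
relative_accepted = relative_volume * acceptance_multiplier

print(f"CRI referral volume relative to URI: {relative_volume:.3f}")
print(f"CRI accepted referrals relative to URI: at least {relative_accepted:.2f}x")
```

Under these assumptions, CRI inviters would still generate roughly 2.5 times as many accepted referrals as URI inviters, which is consistent with the abstract's point that CRI is effective for immediate conversions even though its invitees engage less afterwards.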

Mengzhou (Austin) Zhuang joined the University of Hong Kong in 2019, after receiving his Ph.D. in Business Administration (Marketing) from the University of Illinois Urbana-Champaign. Before that, he received his M.Phil. in Marketing from Lingnan University and bachelor's degrees in Business Administration from Xi’an Jiaotong University.

His research interests lie in online advertising and multi-channel marketing strategy. His work primarily focuses on understanding the strategic decisions of multi-channel retailers, online advertisers, retailing platforms, and consumers.

10:30 - 11:00
Keynote
11:00 - 11:55
Deep neural networks have revolutionized many fields, but their black-box nature also occasionally prevents their wider adoption in fields such as healthcare and finance, where interpretable and explainable models are needed. The recent development of Neural Additive Models (NAMs) is a significant step in the direction of interpretable deep learning for tabular datasets. In this talk, we discuss a new subclass of NAMs that use a single-layer neural network construction of the Gaussian process via random Fourier features, which we call Gaussian Process Neural Additive Models (GP-NAMs). GP-NAMs have the advantage of a convex objective function and a number of trainable parameters that grows linearly with feature dimensionality. They suffer no loss in performance compared to deeper NAM approaches because GPs are well-suited for learning complex non-parametric univariate functions. We demonstrate the performance of GP-NAMs on several tabular datasets, showing that they achieve comparable or better performance in both classification and regression tasks with a large reduction in the number of parameters.
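
To make the random Fourier feature construction concrete, here is a minimal sketch of the general idea, not the speaker's implementation. It assumes an RBF kernel with a hand-picked lengthscale, and the names `make_rff` and `additive_predict` are hypothetical. Each input feature gets its own fixed random feature map and a weight vector; because the model is linear in those weights, a squared-error objective is convex, and the parameter count is the number of random features times the number of input features.

```python
import math
import random

def make_rff(num_features, lengthscale, rng):
    """Sample a fixed random Fourier feature map for a 1-D RBF kernel (assumed kernel)."""
    # Frequencies follow the RBF spectral density N(0, 1/lengthscale^2).
    omegas = [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def phi(x):
        return [scale * math.cos(w * x + b) for w, b in zip(omegas, phases)]

    return phi

def additive_predict(x_row, feature_maps, weights, bias=0.0):
    """f(x) = bias + sum_j w_j . phi_j(x_j): one shape function per input feature."""
    total = bias
    for x_j, phi_j, w_j in zip(x_row, feature_maps, weights):
        total += sum(w * f for w, f in zip(w_j, phi_j(x_j)))
    return total

rng = random.Random(0)
D = 4000  # random features per input feature (illustrative choice)
phi = make_rff(D, lengthscale=1.0, rng=rng)

# The inner product of feature vectors approximates the RBF kernel value.
x, y = 0.3, 1.1
approx = sum(a * b for a, b in zip(phi(x), phi(y)))
exact = math.exp(-((x - y) ** 2) / 2.0)
print(f"RFF kernel approximation: {approx:.3f}, exact RBF kernel: {exact:.3f}")

# Two input features with D trainable weights each: parameters grow linearly.
maps = [phi, make_rff(D, 1.0, rng)]
weights = [[0.0] * D, [0.0] * D]
n_params = sum(len(w) for w in weights)
print("trainable weights:", n_params)
```

Only `weights` (and the bias) would be trained; the random frequencies and phases stay fixed, which is what keeps the model linear in its parameters.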

John Paisley is an Associate Professor in the Department of Electrical Engineering at Columbia University, where he is also a member of the Data Science Institute. His research interests include Bayesian models and inference, with applications to machine learning problems. Before joining Columbia in 2013, he was a postdoctoral researcher in the computer science departments at Princeton University and UC Berkeley. He received the BSE and PhD degrees in Electrical and Computer Engineering from Duke University in 2004 and 2010, respectively.

11:55 - 12:25

In the large model era, vectors, serving as metadata for large models, are being applied across various aspects of these models, such as Retrieval-Augmented Generation (RAG), sparse attention, and KV caching. Our team has taken vectors as the entry point to develop a research framework centered on the principle: “Model Capability = Memory Capability × Inference Capability.” In this talk, I will introduce two research works, AlayaLite and AlayaJet, and briefly discuss the key challenges in building AlayaDB, the data foundation for the large model era.

Dr. Bo Tang is a tenured associate professor at Southern University of Science and Technology. His research interests are big data and large model systems. He regularly publishes at top-tier conferences and journals (e.g., SIGMOD, PVLDB, TODS). His research outputs have been widely used in leading IT companies (e.g., Microsoft, Tencent, and Huawei). He has won the Huawei Spark Award three times. He was awarded SIGMOD China Rising Star in 2021 and was sponsored by the NSFC Excellent Young Researcher grant in 2024.

12:25 - 12:55
Effective traffic management systems are pivotal for urban sustainability, yet their deployment is often hindered by high costs and limited cross-city generalizability. While recent Large Language Model (LLM)-based methods attempt to automate this process, they predominantly focus on scene construction or optimization algorithms, often neglecting the alignment of generated traffic flow with real-world semantics. This oversight frequently results in significant Sim-to-Real gaps. To address this challenge, we propose an autonomous agentic framework designed to automate high-fidelity simulation construction for diverse traffic tasks. Our framework leverages LLMs to streamline the pipeline from road network extraction to demand generation. Crucially, it integrates real-time road conditions derived from navigation data to enable precise, road-level calibration. To achieve this, we introduce a novel Iterative Calibration via OD Trip-Index Table mechanism. Unlike traditional coarse-grained parameter tuning, this method targets specific vehicle trips, pruning or augmenting individual trajectories based on directed errors to ensure fine-grained alignment with real-world behaviors. Evaluated on a four-city benchmark, our framework surpasses existing traffic agents in replicating realistic traffic dynamics. Furthermore, the resulting high-fidelity traffic flow significantly enhances the performance of various reinforcement learning algorithms, effectively reducing the Sim-to-Real gap. Notably, simulation accuracy improves by up to 35.2%, and when applied to real-world optimization tasks, the system yields an average performance gain of 7%.

Leye Wang is an assistant professor at the Key Lab of High Confidence Software Technologies, School of Computer Science, Peking University, China. His research interests include ubiquitous computing, mobile crowdsensing, and urban computing. Wang received a Ph.D. in computer science from the Institut Telecom SudParis and University Paris 6, France, in 2016, and was a postdoctoral researcher with the Hong Kong University of Science and Technology. He has published 50+ papers and received 1500+ citations according to Google Scholar.

12:55 - 14:00
Keynote
14:00 - 14:55

As innovation cycles accelerate across industries, organizations struggle to track emerging technologies, detect weak signals, and map fast-evolving knowledge landscapes. Traditional technology-monitoring pipelines rely on manual curation or classical machine-learning models that scale poorly when confronted with millions of documents and tens of thousands of possible technology tags. In this talk, we present a scalable framework for technology monitoring powered by eXtreme Multi-Label Classification (XMLC). We describe a number of technological advances we have published in terms of explainable text classification, automated taxonomy update, and text classification leveraging very large and structured sets of labels. 

Philippe Cudre-Mauroux is a Full Professor and the Director of the eXascale Infolab at the University of Fribourg in Switzerland. He received his Ph.D. from the Swiss Federal Institute of Technology EPFL, where he won both the Doctorate Award and the EPFL Press Mention in 2007. Before joining the University of Fribourg, he worked on information management infrastructures at IBM Watson (NY), Microsoft Research Asia and Silicon Valley, and MIT. He recently won the Verisign Internet Infrastructures Award, a Swiss National Center in Research award, a Google Faculty Research Award, as well as a 2 million Euro grant from the European Research Council. His research interests are in next-generation, Big Data management infrastructures for non-relational data and AI.

14:55 - 15:25

The use of Kernel Density Visualization (KDV) has become widespread in several disciplines, including geography, crime science, transportation science, and ecology, for analyzing geospatial data. However, the growing scale of massive geospatial data has rendered many commonly used software tools incapable of generating high-resolution KDVs, leading to concerns about the inefficiency of KDV. This talk aims to raise awareness among database researchers about this important, emerging, database-related, and interdisciplinary topic. In this talk, I will first discuss the background and the state-of-the-art method of KDV. Then, I will further discuss the state-of-the-art method of the key variant of KDV, which is spatiotemporal kernel density visualization (STKDV). After that, I will discuss some new software packages, which are based on these state-of-the-art methods, for supporting KDV and STKDV. Lastly, I will outline some future directions for this topic.
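
To illustrate why high-resolution KDV becomes expensive, the sketch below computes a density heatmap the naive way: every pixel sums a kernel over every data point. This is a baseline illustration only, not the state-of-the-art method discussed in the talk; the Gaussian kernel, the function name `naive_kdv`, and the sample points are assumptions made for the example.

```python
import math

def naive_kdv(points, width, height, bounds, bandwidth):
    """Evaluate an (unnormalized) Gaussian kernel density on a pixel grid.

    Cost is O(width * height * len(points)), which is what becomes
    prohibitive for high resolutions and large datasets.
    """
    xmin, xmax, ymin, ymax = bounds
    inv_two_b2 = 1.0 / (2.0 * bandwidth ** 2)
    grid = []
    for row in range(height):
        qy = ymin + (row + 0.5) * (ymax - ymin) / height  # pixel-center y
        line = []
        for col in range(width):
            qx = xmin + (col + 0.5) * (xmax - xmin) / width  # pixel-center x
            density = 0.0
            for px, py in points:
                d2 = (qx - px) ** 2 + (qy - py) ** 2
                density += math.exp(-d2 * inv_two_b2)
            line.append(density / len(points))
        grid.append(line)
    return grid

# Hypothetical sample: two nearby points and one isolated point.
points = [(0.2, 0.2), (0.25, 0.22), (0.8, 0.7)]
heat = naive_kdv(points, width=4, height=4, bounds=(0.0, 1.0, 0.0, 1.0), bandwidth=0.15)

for line in heat:
    print(" ".join(f"{v:.2f}" for v in line))
```

Doubling the resolution in each dimension quadruples the work, and the cost also scales with the number of points, which is the inefficiency that motivates faster KDV methods.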

Tsz Nam Chan (Edison) is currently a Distinguished Professor in the database group of the Big Data Institute at Shenzhen University (SZU). He is a data engineering researcher who focuses on efficiency issues in big data settings. He has published several research papers in prestigious conferences and journals (CCF: A, CSRankings, and top ranking in Google Scholar) in both the database (data engineering) and data mining areas, including SIGMOD, VLDB, ICDE, SIGKDD, and TKDE. Prior to joining SZU, he was a Research Assistant Professor at Hong Kong Baptist University from Sep 2020 to Aug 2023 and a postdoctoral researcher at The University of Hong Kong from Sep 2018 to Aug 2020. He received the PhD degree in computing and the BEng degree in electronic and information engineering from The Hong Kong Polytechnic University in 2019 and 2014, respectively. He is an IEEE senior member and an ACM member.

15:25 - 16:00
16:00 - 16:30

Anomaly detection is an essential problem in data analytics with applications in many domains. In recent years, there has been an increasing interest in anomaly detection for time series. In this talk, we take a holistic view of anomaly detection in time series, starting with the core definitions and taxonomies of time series and anomaly types, and proceeding to a general overview of the anomaly detection methods proposed by different communities in the literature. We will then conclude with Ensembling and Model Selection for time-series anomaly detection, discussing strategies for automatically selecting the appropriate techniques for a given time series.

I am a researcher at Inria and a member of the VALDA project-team, which is a joint team between Inria Paris, École Normale Supérieure, and CNRS. Before that, I was a postdoctoral researcher at Ecole Normale Supérieure (ENS) Paris Saclay (Centre Borelli) in the team of Prof. Laurent Oudre. I completed my Ph.D. at the University of Paris and EDF R&D, working with Prof. Themis Palpanas, Emmanuel Remy, and Mohammed Meftah. During my Ph.D., I did an internship at the University of Chicago under the supervision of Prof. Michael J. Franklin and Prof. John Paparrizos. Before starting my Ph.D., I worked as a research engineer at the computer science lab of Ecole Polytechnique in Prof. Michalis Vazirgiannis’s team. My research interests lie at the intersection of massive time series analytics and management systems, unsupervised and supervised anomaly detection methods for large time series, and machine learning for time series analytics.

16:30 - 17:00
Large language models (LLMs) such as ChatGPT and DeepSeek have shown remarkable potential in the pursuit of general artificial intelligence and are now widely used in question answering, search, education, finance, and beyond. However, their outputs still suffer from hallucinations, limited knowledge coverage, and slow knowledge updates, which substantially undermine their reliability and practicality. Retrieval-Augmented Generation (RAG), which integrates external knowledge to improve generation quality, has therefore emerged as a key solution to these challenges. This talk presents our recent progress in building efficient and trustworthy RAG systems. We first construct large-scale multilingual text-pair datasets and train a new generation of universal semantic representation models, significantly improving robustness and generalization across tasks and domains. We then introduce task-specific retrieval enhancements that address query rewriting, intent ambiguity, and long-context modeling, further strengthening the accuracy and contextual understanding of RAG systems in complex real-world scenarios.

Defu Lian is a professor at the University of Science and Technology of China. His main research interests lie in data mining and deep learning. He has published more than 160 papers at prestigious conferences and journals, and received a best paper runner-up award at APWeb 2016, a best paper candidate nomination at WWW 2021, and a best paper award at WISE 2022. He developed a highly modularized recommender system (RecStudio) and a learned vector retrieval library (LibVQ). He received the National Science Fund for Excellent Young Scholars.

17:00 - 17:30
Developing sustainable smart cities is imperative for addressing the multifaceted challenges posed by rapid urbanization. However, conventional digital twin technologies employed in smart city management are often constrained by prohibitive modeling costs and a distinct deficiency in cognitive autonomy. To bridge this gap, we introduce AIDT, a generative AI framework designed to serve as the intelligent kernel of smart cities. By synergizing ubiquitous crowdsourced sensing for cost-effective urban 3D reconstruction and leveraging Large Language Models (LLMs) for autonomous reasoning, AIDT endows cities with the capability of “Perception, Cognition, and Action.” We demonstrate the practical efficacy of AIDT across diverse scenarios, including multi-agent training environments, smart building management, and intelligent emergency response. By establishing a closed-loop mechanism of “Perception-Cognition-Decision,” AIDT presents a scalable and sustainable intelligent paradigm for next-generation smart city management.

Longbiao CHEN is an associate professor with the Department of Computer Science, Xiamen University, China. He obtained Ph.D. degrees in computer science from Sorbonne University, France, in 2018 and Zhejiang University, China, in 2016. Before joining Xiamen University, he worked as a research assistant at Institut Mines-Télécom, France. His research interests include Ubiquitous Computing, Mobile Crowdsensing, Urban Computing, and Big Data Analytics. Dr. Chen has published over 50 papers in top-tier journals and conferences, including ACM UbiComp, IEEE Trans. Mobile Computing, and IEEE Trans. Intelligent Transportation Systems. He received two UbiComp Honorable Mention Awards, in 2015 and 2016. He is a senior member of the China Computer Federation (CCF) and a technical committee member of the ACM SIGSPATIAL China Chapter and the CCF Ubiquitous Computing Committee. He serves as a PC member of several conferences, including IJCAI and UIC.
