About the Workshop
As foundation models scale, the supply of available training data has been rapidly depleted. However, several forms of valuable data, such as medical records and legal and financial documents, are restricted from use in model training because of their sensitive nature. In addition, the strong reasoning capabilities of current generative models have opened the possibility of highly personalizable AI applications, but these remain bottlenecked by limited access to high-quality user data. Hence, it is of immense value to responsibly unlock these data sources (for example, through data transformation or constrained training paradigms) or to generate synthetic alternatives. In this workshop, we aim to bring together domain experts in data, privacy, model training, and legal policy to advance the frontier of responsibly leveraging such sensitive data with foundation models.
Topics of interest include (but are not limited to):
- Data Transformation: De-identification, Anonymization, Pseudonymization.
- Synthetic Data Generation: Controlled Regeneration, Data Diversity.
- Novel Training Paradigms: Differential Privacy (DP), Federated Learning, Architectural Solutions.
- Evaluation & Auditing: Privacy attack benchmarks, Utility-Privacy tradeoffs.
- Policy: Compliance, New regulations on data sharing.
Call for Papers
We invite long papers with novel research contributions (up to 8 pages) as well as short papers (up to 4 pages) reflecting preliminary studies or negative results.
Submissions are managed via OpenReview. Accepted papers are non-archival, and concurrent submissions are allowed. Please follow the COLM 2026 template.
Key Dates
All deadlines are 23:59 AoE (Anywhere on Earth).
- Submissions open: May 27, 2026
- Submission deadline: June 23, 2026
- Acceptance notifications: July 24, 2026
- Workshops day at COLM: October 9, 2026
Speakers
Janel Thamkul
Deputy General Counsel, ex-Anthropic, ex-Google
Committee
Program Committee
If you wish to join the program committee, please sign up here.