Discussion about this post

Ian Gow

Of course, there are many ways to read Parquet files lazily: DuckDB, Arrow, Polars, and more. There is no need to learn SQL (and there wasn't even before AI).

I suspect that "curating" data like this (1) requires specialist expertise (even with Claude Code/Codex) and (2) is not something it makes sense to have many researchers doing in parallel. I think universities (and others) should sponsor more curation and offer it as a public good. If data curation were rewarded nearly as well as yet-another-DiD-study papers …

I recently used the collection of Call Reports as a case study of AI-assisted curation (though even that has probably become much easier in the few months since): https://iangow.github.io/notes/published/curate_call_reports.html
