Understanding when different CATE estimators succeed or fail is core to both scientific discovery and policy learning. This project builds intuition about the assumptions underlying common estimators (ignorability, overlap, smoothness, and the complexity of the outcome and propensity models) and how these assumptions interact with data-generating processes (DGPs).
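To make the assumption/DGP link concrete, here is a minimal sketch (the setup and parameter names are illustrative, not part of the project spec) showing how a single DGP knob, the propensity sharpness alpha, controls how well the overlap assumption holds:

```python
# Minimal sketch (assumed setup): a single DGP knob, the propensity "sharpness"
# alpha, controls how well the overlap assumption holds in the generated data.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 5))

def propensity(X, alpha):
    # e(x) = sigmoid(alpha * x_0): larger alpha pushes propensities toward 0/1,
    # thinning the regions where both treated and control units are observed.
    return 1.0 / (1.0 + np.exp(-alpha * X[:, 0]))

for alpha in (0.5, 2.0, 8.0):
    e = propensity(X, alpha)
    print(f"alpha={alpha:>4}: propensity range [{e.min():.3f}, {e.max():.3f}]")

# Ignorability, by contrast, is broken by letting the propensity (or outcome)
# depend on a covariate that is dropped from X before estimation.
```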
Build a rigorous benchmark comparing state-of-the-art CATE methods, including doubly robust learners (DR-learner), double machine learning (DML), meta-learners (X-/T-/S-/DA-learners), Bayesian Additive Regression Trees (BART), deep-learning approaches (DragonNet, TARNet, RANet), as well as amortized methods such as CausalPFN and Do-PFN, on families of DGPs explicitly constructed to favor each algorithm. Since there is no one-size-fits-all approach, the goal is to analyze which conditions and assumptions lead one estimator to outperform the others.
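The following sketch shows one possible shape of the evaluation loop, not a fixed protocol: hand-rolled S- and T-learners (scikit-learn) stand in for the full estimator list above, each scored by PEHE against the true CATE of a synthetic DGP. In the actual benchmark, the same loop would call the library implementations of DR-learner, DML, BART, DragonNet, and the amortized methods.

```python
# Minimal sketch of the evaluation loop (assumed design): hand-rolled S- and
# T-learners stand in for the full estimator zoo; each method is scored by PEHE
# (RMSE against the known true CATE of the synthetic DGP).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
n, d = 4000, 5
X = rng.normal(size=(n, d))
e = 1.0 / (1.0 + np.exp(-X[:, 0]))              # propensity
T = rng.binomial(1, e)                          # treatment assignment
tau = 1.0 + np.sin(X[:, 1])                     # true CATE (known by construction)
Y = X[:, 0] + tau * T + rng.normal(scale=0.5, size=n)

train, test = np.arange(n // 2), np.arange(n // 2, n)

def t_learner(X, T, Y, X_test):
    # Fit separate outcome models on treated and control units.
    m1 = GradientBoostingRegressor().fit(X[T == 1], Y[T == 1])
    m0 = GradientBoostingRegressor().fit(X[T == 0], Y[T == 0])
    return m1.predict(X_test) - m0.predict(X_test)

def s_learner(X, T, Y, X_test):
    # Fit one model on (X, T) and contrast the T=1 and T=0 predictions.
    m = GradientBoostingRegressor().fit(np.column_stack([X, T]), Y)
    x1 = np.column_stack([X_test, np.ones(len(X_test))])
    x0 = np.column_stack([X_test, np.zeros(len(X_test))])
    return m.predict(x1) - m.predict(x0)

for name, learner in [("T-learner", t_learner), ("S-learner", s_learner)]:
    tau_hat = learner(X[train], T[train], Y[train], X[test])
    pehe = np.sqrt(np.mean((tau_hat - tau[test]) ** 2))
    print(f"{name}: PEHE = {pehe:.3f}")
```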
We sketch a few ideas on which aspects of DGPs to take into account when generating the synthetic benchmarks, so as to create a diverse set. Feel free to add your own ideas.
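One possible way to organize such a suite (the axis names below are illustrative assumptions, not prescribed): treat each DGP as a point in a grid of generative-process properties and take the Cartesian product to obtain a diverse family of synthetic datasets.

```python
# Illustrative organization of the DGP family (axis names are assumptions):
# each configuration in the Cartesian product defines one synthetic benchmark.
from itertools import product

dgp_axes = {
    "overlap": ["strong", "weak"],                  # how extreme the propensities are
    "confounding": ["none", "observed", "hidden"],  # hidden confounding breaks ignorability
    "outcome_complexity": ["linear", "nonlinear"],  # smoothness of the outcome surfaces
    "cate_structure": ["constant", "sparse", "dense_nonlinear"],
    "noise_level": ["low", "high"],
}

configs = [dict(zip(dgp_axes, values)) for values in product(*dgp_axes.values())]
print(f"{len(configs)} DGP configurations, e.g. {configs[0]}")
```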