Every single machine learning (ML)-driven breakthrough, from language translation to Go championships, was immediately preceded not by an algorithmic innovation but by a massive, well-controlled, purpose-built dataset. An ML-driven breakthrough in medicinal chemistry is going to require a similar dataset, but such high-quality, well-structured experimental data doesn’t exist yet. We're building that data set by making physical measurements of lots of targets (with automated in-house protein production) screened against lots of compounds (with DNA-encoded chemical libraries). We're also building the ML models, that will eventually be generalizable across any protein and novel medicines using insights from our models.