Description of the submission process.

In Phase 1, participants are asked to submit their methods (the code used for data preprocessing and training) rather than the predicted values themselves. The submission platform supports Python and R. Methods are published open source in a GitHub repository, where they are automatically run on example data to check for errors. If the checks pass, the method can be submitted and is evaluated on the holdout dataset. This workflow fosters computational reproducibility, which was a concern in the Fragile Families Challenge, where participants submitted only their predictions (Liu & Salganik, 2019). It also allows us to run a method on different (or future) variants of the data. Submissions are made through the open-source web platform Next, which enables reproducible submissions in data challenges where the data are not publicly available and common solutions like Kaggle are therefore not an option. Instructions on how to submit to the platform, along with example code, will be provided on a website dedicated to the data challenge.
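The exact interface the platform expects is not specified here; as an illustration only, here is a minimal Python sketch assuming the submitted script must expose a preprocessing function and a prediction function (the names `clean_df` and `predict_outcomes`, the `id` column, and the placeholder model are all hypothetical):

```python
"""Hypothetical Phase 1 submission sketch (function and column names assumed)."""
import pandas as pd


def clean_df(df: pd.DataFrame) -> pd.DataFrame:
    """Preprocessing step: keep numeric columns and fill missing values with 0."""
    return df.select_dtypes("number").fillna(0)


def predict_outcomes(df: pd.DataFrame) -> pd.DataFrame:
    """Return one predicted value per row, keyed by a (hypothetical) id column."""
    features = clean_df(df.drop(columns=["id"]))
    # Placeholder model: ignore the features and predict 0 for every row.
    predictions = [0] * len(features)
    return pd.DataFrame({"id": df["id"], "prediction": predictions})


if __name__ == "__main__":
    # The automated GitHub check would run something like this on example data.
    example = pd.DataFrame({"id": [1, 2], "x": [0.5, None]})
    print(predict_outcomes(example))
```

Structuring the submission as pure functions over a data frame is what makes the automated check possible: the platform can call the same functions first on example data and later on the holdout data.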

In Phase 2, participants cannot use the submission platform because the register data are only available within the RA environment. Instead, participants are asked to submit predicted values (generated by their method) by saving them in a specified folder inside the RA environment, along with the trained model and all scripts used for data preprocessing and model training.
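The name and layout of the specified folder are not given here; purely as a sketch, saving the required artifacts inside the RA environment could look like this (the `output` folder name and file names are assumptions):

```python
"""Hypothetical Phase 2 save step: predictions plus trained model in one folder."""
import pathlib
import pickle

import pandas as pd

# Hypothetical destination; the actual folder is specified inside the RA environment.
OUTPUT_DIR = pathlib.Path("output")


def save_submission(predictions: pd.DataFrame, model: object) -> None:
    """Write predicted values as CSV and the trained model as a pickle file."""
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    predictions.to_csv(OUTPUT_DIR / "predictions.csv", index=False)
    with open(OUTPUT_DIR / "model.pkl", "wb") as f:
        pickle.dump(model, f)


if __name__ == "__main__":
    preds = pd.DataFrame({"id": [1, 2], "prediction": [0, 1]})
    save_submission(preds, {"dummy_model": True})
```

The preprocessing and training scripts themselves would simply be copied into the same folder, so that the whole pipeline remains inspectable and reproducible.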
