Phases of the challenge
The challenge includes two phases.
Phase 1
In Phase 1, participants will predict the outcome using only the LISS data. After accepting the user statement, participants will be able to download the LISS training data to their own devices and run their methods on their own computers (full instructions will be sent to registered participants). They will submit their methods through a submission platform (see Submission).
Phase 1 will run from April 1 until the end of May 2024.
Phase 2
Phase 2 will run from mid-June until mid-September 2024.
Phase 2 includes three tracks. Based on all the submissions that teams make in the first phase, we will select 10-20 of the best-performing teams for tracks 1 and 2 of the second phase. These teams will work inside the secure Remote Access (RA) environment of Statistics Netherlands (CBS). You can read about how teams will be selected to participate in tracks 1-2 and the conditions for accessing the CBS data here.
Teams that are not selected into tracks 1 and 2 will continue working on the LISS data (this is track 3).
Tracks 1 and 2 differ in the holdout set for which participants will predict the outcome. Participants can choose for themselves which track(s) they will work on.
In the first track, participants will predict the fertility outcome for the LISS holdout set. This is similar to Phase 1 and track 3, except that the LISS data can be linked to CBS data inside the RA environment.
In the second track, participants will instead predict the fertility outcome for the CBS holdout set.
This setup gives participants in tracks 1 and 2 a unique opportunity to develop and test multiple approaches, and potentially to improve the performance of their methods by using both datasets.
Access to the CBS RA environment and CBS data is governed by strict data protection and privacy rules and regulations. One consequence is that the RA environment can only be accessed from the European Economic Area and a few other countries, and access is subject to CBS approval and to passing security checks. Another constraint is that computing resources in the CBS RA environment are limited. Given these limitations, only a selection of teams can participate in tracks 1 and 2 of the second phase. The costs of access to the CBS datasets will be covered by ODISSEI, and access will be subject to the vetting and agreement of Statistics Netherlands and the ODISSEI Management Board under the general grant conditions of ODISSEI.