Intro to the CBS data

LISS
dataset
guide

Here we provide some general information about the datasets based on Dutch registers, that are available for selected teams in the second part of PreFer.

Authors

Lisa Sivak

Gert Stulp

Published

July 16, 2024

The Netherlands use administrative data from different registers (for example, municipal personal records database and tax administration databases) instead of census as a main source of information about the population.

Statistics Netherlands (CBS) collects administrative data from different registers with information about persons, households, jobs, businesses, dwellings, and more. All personal identifiers in these datasets are replaced with anonymous linkable keys, such as anonymous personal ID, household ID, and address ID. CBS maintains these datasets, imputes missing data, combines them to derive additional characteristics (e.g. infer unmarried cohabitation based on addresses and tax data) and to even create whole new datasets such as networks of the entire Dutch population. This data is used for official statistics and also provided to scientists from selected organizations for scientific research under strict conditions.

Useful resources to understand and navigate CBS data

  1. Prins K. Population register data, basis for the Netherlands Population Statistics: describes Personal Records Database which is one of the main sources of CBS micro-data.

  2. van de Laan D.j.: describes how the original network datasets were created.

  3. Updates in the methodology of constructing network datasets: provides insights not only about the network datasets and changes in methodology of establishing ties, but also about other CBS datasets (e.g. details about how some characteristics are inferred which are not mentioned elsewhere).

  4. ODISSEI portal: here you can search by the dataset name or keywords for different CBS datasets and read the documentation (to translate descriptions of datasets and variables from Dutch, use Chrome and enable auto-translate)

  5. Microdata catalog: the list of all available datasets with documentation (pdf files, in Dutch)

CBS datasets used in PreFer

We selected several most relevant CBS datasets which can be used in PreFer (only data before 2021). Participants can also request additional datasets but have to provide argumentation. We also created a base file which includes variables from different datasets (mostly from 2020). The codebook for this base dataset (downloadable from here) includes some comments which might be useful even if you will not use the base file.

General comments

CBS uses different structures to store information. This is important to keep in mind to avoid mistakes in linking datasets and prevent using information from 2021 and later.

  • BUS in the name of a dataset means that there are several rows per unit of observation. A new row is added if a change occur. For example, if someone gets divorced, a new row is added to this person in the GBABURGERLIJKESTAATBUS dataset which covers civil status of Dutch residents.

  • TAB in the name of a dataset means that there is only one row per unit of observation. For example, each yearly dataset about the highest level of education (HOOGSTEOPLTAB) includes one row per person.

  • For most datasets, separate files for each year are available. Some of these yearly files include only information from a certain year (e.g. SPOLISBUS - information about people’s jobs that year; HOOGSTEOPLTAB - highest level of education that year, INPATAB - personal income that year). Some yearly datasets are cumulative: they include all data up to a certain year. For example, GBAPERSOONTAB yearly files include personal information about all people registered in the Netherlands up to December 31 YYYY. It is important to use only files up to (and including) 2020 in PreFer.

  • In some cases, there is only one file which is updated each year; when a new version is released, the older version becomes unavailable. Examples include GBABURGERLIJKESTAAT (civil status - married/registered partnership/single/etc.) and SECMBUS (socio-economic category). If you use these datasets, make sure you don’t use information starting from January 1, 2021 (civil status and socio-economic category with start date on January 1, 2021 or later).

Brief description and some tips

  1. GBAPERSOONTAB (CBS codebook, ODISSEI portal): demographic characteristics (e.g. gender, year of birth, country of origin) of people included in the Municipal Personal Records Database (BRP). Most of them are Dutch residents; but from 2014 the dataset also includes people who are not residents but have relationships with Dutch authorities (e.g. work in the Netherlands but don’t live there long-term). There is one file per year (starting from 2009). Each file includes people who are registered in the BRP from January 1, 1994 to December 31, YYYY. People who were registered before January 1, 1994, but not after, are only partly covered.

  2. GBAADRESOBJECTBUS (CBS codebook, ODISSEI portal): addresses of people registered in the Netherlands: address ID, date of registration at this address, and end date of address (when a person moved out or died). There are multiple rows per person (one row for each address). Also includes object ID (i.e. house ID) which can be used to link neighborhood characteristics to individuals.

    There is one file per year (starting from 2009). Each file contains information about all past and current addresses of all people registered in the Netherlands from January 1, 1994 to December 31, YYYY. People who were registered before January 1, 1994, but not after, are only partly covered.

    People who plan to stay in the Netherlands longer than 4 months are obliged to register at an address. This file includes all long-term residents of the Netherlands. Additionally, some people who stay in the Netherlands short-term can also register; they are also included in this data.

    When a person moves, she/he has to report the move (with an end date of previous address and start date of new address). Most people do that. Representative survey showed that most people (~95%) actually live at an address where they are registered Prins, 2017.

  3. HOOGSTEOPLTAB (CBS codebook, ODISSEI portal): highest level of education of people registered in the Netherlands. There is one file per year (with data about the highest level of education on October YYYY; one row per person) starting from 1999.

    The file is based on data from various registers and the Occupational Population Survey (EBB) which is used to impute missing data. Although the coverage rate is high, the dataset does not represent the entire target population. For older people (>40) and for immigrants information is more often missing than for younger native Dutch population. Private education is also undercovered.

  4. SECMBUS (CBS codebook, ODISSEI portal): socio-economic category of persons (e.g. employee, self-employed, social benefit recipient). Socio-economic category is established based on the datasets about employment, self-employment, study, social benefits, etc. There is only one file, updated every year, which includes information from 1999.

    There are multiple rows per person. The CBS codebook says that the dataset includes data on the socio-economic category in a given month, but the data is not per month; it contains the start and end date of all socio-economic categories that a person had. Sometimes a new row about socio-economic category is added for a person even if it hasn’t changed.

  5. SPOLISBUS (CBS codebook, ODISSEI portal): characteristics of jobs. Self-employment is not covered in this file; only people who were employed at companies are included.

    There is one file per year (starting from 2010), with data about employment this year. Each yearly file includes one row per person per job per period (usually per month). The dataset is huge. Don’t read in the whole file, pre-select variables.

  6. GBABURGERLIJKESTAATBUS (CBS codebook, ODISSEI portal): civil status of people registered in BRP (single, married, registered partnership, divorced, etc.), with start date and end date of each status. There are multiple rows per person (about each status). For current statuses (e.g. if the person is single) the end date is in the future, coded as 20501231. If this person marries, the end date is updated, and a new row about marriage is added. There is only one file which is updated every year.

    Note that sometimes divorce take months; because of that there are people who are formally married but they are not a married couple any more. Living together (e.g. belonging to the same household or having the same address) or being labelled as partners in the Familienetwerk dataset might be an additional parameter to use to estimate who are actually married.

  7. GBAVERBINTENISPARTNERBUS (CBS codebook, ODISSEI portal): personal IDs of spouses/registered partners. These IDs can be used to link information from other datasets about partners.

  8. INPATAB (CBS codebook, ODISSEI portal): personal income in year YYYY (one file per year starting from 2011) for all people who belonged to the Dutch population on January 1 of that year. The CBS codebook says that each dataset inlcudes income of individuals on January 1 of the survey year, but it includes information about income on the end of YYYY (from tax declarations).

  9. INHATAB (CBS codebook, ODISSEI portal): household income in year YYYY (one file per year starting from 2011). The CBS codebook says that each dataset includes income of households on January 1 of the survey year, but it includes information about income on the end of YYYY (derived from tax declarations). To link household income data to individuals, use the KOPPELPERSOONHUISHOUDEN dataset of the same reporting year. These files only contain the relationship between persons (RINPERSONS+RINPERSON) and special household IDs used for income data (RINPERSONSHKW+RINPERSONHKW).

  10. VEHTAB (CBS codebook, ODISSEI portal): household wealth data in year YYYY (one file per year starting from 2006). The CBS codebook says that each dataset inlcudes information about wealth of households on January 1 of the survey year, but it includes information on the end of YYYY (derived from tax declarations). To link household wealth data to individuals, use the KOPPELPERSOONHUISHOUDEN dataset of the same reporting year. These files only contain the relationship between persons (RINPERSONS+RINPERSON) and special household IDs used for income data (RINPERSONSHKW+RINPERSONHKW).

  11. NABIJHEIDKINDOPVTAB (CBS codebook, ODISSEI portal): distance from objects (e.g. houses) to childcare (one file per year starting from 2012). To link information about objects to individuals, use the GBAADRESOBJECTBUS which includes personal IDs (RINPERSOON and RINPERSOONS) and object IDs (SOORTOBJECTNUMMER and RINOBJECTNUMMER).

  12. VBOWONINGTYPETAB (CBS codebook, ODISSEI portal): type of housing (one file per year starting from 2020) for each residential object. To link information about objects to individuals, use the GBAADRESOBJECTBUS which includes personal IDs (RINPERSOON and RINPERSOONS) and object IDs (SOORTOBJECTNUMMER and RINOBJECTNUMMER).

  13. GBAHUISHOUDENSBUS (CBS codebook, ODISSEI portal): which household a person belongs to, with household ID, start date and end date of each household, and characteristics of this household. One file per year, starting from 1995. Each file includes information about all households which a person belonged to for all people who have ever been registered in the Netherlands up to December 31 YYYY.

To identify members of the same household, use two household identifiers: start date of the household (DATUMAANVANGHH) and household ID (HUISHOUDNR).

Note that when either household address or household composition changes, it receives a new combination of the date of start of the household and the household ID.

Several different households can live at the same address. Household membership is in some cases inferred using the data about taxes, social benefits and allowances, and also survey data (Occupational Population Survey (EBB)), in addition to the information about addresses. Some characteristics of the household are also inferred based on this data. For example, the household type, e.g. whether two people who live together but are not married/not in a registered partnership are unmarried partners or not.

Note that Familienetwerktab is more conservative than the HUISHOUDENSBUS in labeling people as partners (slightly less people are labelled as unmarried couples in Familienetwerktab). In Familienetwerktab, people form an unmarried couple if they have a joint child or if they are tax partners, allowance partners or cohabiting partners for social assistance or if they have moved at the same time to the same address (and they are not brothers/sisters or parent/child).

  1. VSLGWBTAB (CBS codebook, ODISSEI portal): municipal and neighborhood codes of each object. Can be used to link regional characteristics to individuals. To link information about objects to individuals, use the GBAADRESOBJECTBUS which includes personal IDs (RINPERSOON and RINPERSOONS) and object IDs (SOORTOBJECTNUMMER and RINOBJECTNUMMER).

The network files are listed below. For each network dataset, there is one file per year, starting from 2009. Each file includes IDs of people who are registered in the Netherlands on January YYYY; for each person, IDs of people in this person’s network are listed. Note that only people who are registered in the Netherlands on January YYYY are included as ego’s ties.

  1. Familienetwerktab CBS codebook, ODISSEI portal): the network of family relations.

  2. Burennetwerktab CBS codebook, ODISSEI portal): neighbors network.

  3. Colleganetwerktab CBS codebook, ODISSEI portal): colleagues network.

  4. Huisgenotennetwerktab CBS codebook, ODISSEI portal): household network. Note that here a ‘household’ means people living at the same address.

  5. Klasgenotennetwerktab CBS codebook, ODISSEI portal): classmates network.

Photo by Nastya Dulhiier on Unsplash