Data Preprocessing

The functions below create all of our raw data sets. Each one also documents where its primary data source can be found.

damage_assessment(file_name: str)

Satellite damage assessments provide us with a damage assessment table. This table contains each damaged building’s district location and its damage severity (moderate, severe, or destroyed).

The total number of buildings per city with moderate or severe damage:


The total number of buildings which were destroyed per city:
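Both counts can be sketched in a few lines of Python. This is a minimal illustration, not the actual body of damage_assessment: the row layout (city, severity), the sample rows, and the helper name count_by_city are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical rows of the damage assessment table: (city, severity).
rows = [
    ("Aden City", "moderate"),
    ("Aden City", "severe"),
    ("Aden City", "destroyed"),
    ("Taizz City", "destroyed"),
    ("Taizz City", "destroyed"),
    ("Taizz City", "severe"),
]


def count_by_city(rows, severities):
    """Count buildings per city whose severity is in `severities`."""
    return Counter(city for city, severity in rows if severity in severities)


# Buildings with moderate or severe damage, per city.
damaged = count_by_city(rows, {"moderate", "severe"})
# Buildings that were destroyed, per city.
destroyed = count_by_city(rows, {"destroyed"})
```

Keeping the severity set as a parameter lets the same helper produce either count.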


health_facilities(file_name: str)

TODO Who provided this data?

We drop any rows whose type is Other.

We apply this mapping to the DataFrame, then count the number of facilities and their functionality status per district.
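The filter-and-count step might look like the sketch below. The mapping itself is not reproduced in this section, so the sketch skips it. The row layout, the status labels, and the helper name summarise are hypothetical, and plain Python stands in for the DataFrame operations.

```python
from collections import defaultdict

# Hypothetical rows: (district, facility type, functionality status).
facilities = [
    ("District A", "Hospital", "Functional"),
    ("District A", "Health Unit", "Non-functional"),
    ("District A", "Other", "Functional"),
    ("District B", "Health Center", "Functional"),
]


def summarise(facilities):
    """Drop 'Other' rows, then count facilities per district and status."""
    counts = defaultdict(lambda: defaultdict(int))
    for district, facility_type, status in facilities:
        if facility_type == "Other":
            continue  # rows typed 'Other' are excluded from every count
        counts[district]["total"] += 1
        counts[district][status] += 1
    return {district: dict(per_status) for district, per_status in counts.items()}


summary = summarise(facilities)
```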

iom_displacement(file_name: str)

IOM has collected raw numbers of internally displaced persons (IDPs) in Yemen. This data can be found on HDX. From sheet ‘01_1_IDPs Origin’ of this csv file we extract the following fields:

  • District Pcode

  • Total IDPs Individuals

  • District of Origin Pcode

From these fields we can calculate:

Total number of IDPs in each district


Number of people who left each district and did not return
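Both totals are sums of the same column grouped by a different key. The sketch below assumes one row per (hosting district, origin district) pair. The Pcodes and figures are made up, and total_by is a hypothetical helper, not part of iom_displacement.

```python
from collections import defaultdict

# Hypothetical rows using the three fields named above; Pcodes are made up.
rows = [
    {"District Pcode": "YE1001", "District of Origin Pcode": "YE2002",
     "Total IDPs Individuals": 120},
    {"District Pcode": "YE1001", "District of Origin Pcode": "YE3003",
     "Total IDPs Individuals": 30},
    {"District Pcode": "YE2002", "District of Origin Pcode": "YE3003",
     "Total IDPs Individuals": 50},
]


def total_by(rows, key):
    """Sum 'Total IDPs Individuals' grouped by the given Pcode column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["Total IDPs Individuals"]
    return dict(totals)


# IDPs currently hosted in each district.
idps_hosted = total_by(rows, "District Pcode")
# People who left each district and are still displaced.
idps_departed = total_by(rows, "District of Origin Pcode")
```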


Also from IOM displacement data (sheet ‘01_2Returnees Origin’) we can extract:

Number of people who returned to the district after being an IDP


nightlight(file_name: str)

Using the methodology described in Nightlight analysis, we obtain a table containing the average monthly radiance per city, stored as a floating-point value in nanoWatts/cm2/sr.


nvdi(file_name: str)

Here for the docs, what do we do with NVDI?

projected_census(file_name: str)

HDX provides us with the 2017 projected census table from the Yemen Bureau of Statistics.


The 2017 population estimate at a district level, assuming there was no war.

schools(file_name: str)

HDX provides us with a school data table from 2017. This table contains information about each school's functionality status.

\[ n_{\mathrm{schools}_{\mathrm{non\,functional}}} \]

The total number of functional and non-functional schools per city.

Note about this data set:

  • There is no data provided for Marib
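One way to keep the Marib gap visible downstream is to report a missing city as None instead of as zero schools. The sketch below illustrates that choice. The row layout, status labels, city list, and helper name school_counts are assumptions, and the source does not say how the real function treats the gap.

```python
from collections import Counter

# Hypothetical rows of the school table: (city, functionality status).
schools = [
    ("Aden City", "functional"),
    ("Aden City", "non functional"),
    ("Aden City", "functional"),
    ("Ibb City", "non functional"),
]
cities_of_interest = ["Aden City", "Ibb City", "Marib"]


def school_counts(schools, cities):
    """Count schools per city and status; cities with no rows map to None."""
    counts = Counter(schools)  # keyed by (city, status)
    covered = {city for city, _ in schools}
    result = {}
    for city in cities:
        if city not in covered:
            result[city] = None  # no data at all, which differs from zero schools
        else:
            result[city] = {
                "functional": counts[(city, "functional")],
                "non functional": counts[(city, "non functional")],
            }
    return result


per_city = school_counts(schools, cities_of_interest)
```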

wfp_prices(file_name: str, base_dir: str)

HDX houses WFP data on prices of common goods and services for 24 cities. Each row in this data set contains a price observation for a specific item in a specific city.

The median daily labor rate per year

This is calculated at a national level (24-city median)


As well as on a city level
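The two levels of aggregation can be sketched with the standard library as follows. The observation layout and values are hypothetical. The national figure here pools every observation for a year, which is only one reading of "24-city median". Taking the median of the per-city medians is another, and the source does not say which is used.

```python
from collections import defaultdict
from statistics import median

# Hypothetical wage observations: (city, year, daily rate).
observations = [
    ("Aden City", 2016, 2000.0),
    ("Aden City", 2016, 2400.0),
    ("Aden City", 2016, 2600.0),
    ("Ibb City", 2016, 1500.0),
    ("Ibb City", 2016, 1700.0),
    ("Ibb City", 2017, 1900.0),
]


def median_by(observations, key):
    """Group daily rates by key(city, year) and take the median of each group."""
    groups = defaultdict(list)
    for city, year, rate in observations:
        groups[key(city, year)].append(rate)
    return {group: median(rates) for group, rates in groups.items()}


# City level: one median per (city, year).
city_median = median_by(observations, lambda city, year: (city, year))
# National level (assumption): pool all cities' observations per year.
national_median = median_by(observations, lambda city, year: year)
```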


The median cost of the food basket per year

This is calculated at a national level (24-city median)


As well as on a city level


The food basket consists of the following items, weighted according to calorie value.

  • Oil: 0.55

  • Onion: 1.42

  • Red beans: 0.81

  • Sugar: 1.63

  • Wheat: 6.45

For each year we sum the different food items multiplied by their respective weight in order to calculate the cost of a food basket.
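The weighted sum can be written directly from the list of weights above. In this sketch the weights come from the text, while the prices, the prices_by_year layout, and the helper name basket_cost are hypothetical.

```python
# Weights as listed above.
BASKET_WEIGHTS = {
    "Oil": 0.55,
    "Onion": 1.42,
    "Red beans": 0.81,
    "Sugar": 1.63,
    "Wheat": 6.45,
}


def basket_cost(prices):
    """Return the weighted sum of item prices; every basket item must be priced."""
    missing = set(BASKET_WEIGHTS) - set(prices)
    if missing:
        raise ValueError(f"missing prices for: {sorted(missing)}")
    return sum(weight * prices[item] for item, weight in BASKET_WEIGHTS.items())


# Hypothetical item prices for one city, keyed by year.
prices_by_year = {
    2016: {"Oil": 500.0, "Onion": 200.0, "Red beans": 400.0,
           "Sugar": 300.0, "Wheat": 150.0},
    2017: {"Oil": 600.0, "Onion": 250.0, "Red beans": 450.0,
           "Sugar": 350.0, "Wheat": 200.0},
}

cost_by_year = {year: basket_cost(prices) for year, prices in prices_by_year.items()}
```

Raising an error on a missing item avoids silently reporting a cheaper basket for years where one commodity has no price observation.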

Notes about this data set:

  • Data starts in 2009, but is minimal. From 2012 more data is available, but not for all of our cities of interest. From 2016 data is available for all of our cities of interest.

  • Wage information starts in 2016; we use “(qualified labour) Retail”.

Cities with data:

‘Zungubar City’ ‘Addaleh Town’ ‘Aden City’ ‘Al Bayda City’ ‘Al Hudaydah City (Hodieda)’ ‘Al Hazum’ ‘Al Ghaidha’ ‘Mahweet City’ ‘Amran City’ ‘Dhamar City’ ‘Soqatra (Hudaibo)’ ‘Sayoun City’ ‘Mukalla City’ ‘Haradh Town’ ‘Hajjah City’ ‘Ibb City’ ‘Al Hawta Town’ ‘Marib City’ “Sa’ada City” ‘Bani Matar’ ‘Attaq Town’ ‘Taizz City’ “Sana’a City” ‘Al Jabeen’