Fetching data from APIs
Last updated on 2024-12-11
Overview
Questions
- How can you get data from a public API?
- How can you convert JSON output into a pandas DataFrame?
- Why is the API documentation essential for getting the expected data?
Objectives
- Learn how to fetch data from an API using Python and load it into a Pandas DataFrame for analysis.
Prerequisites
- Basic understanding of Python (variables, functions, and loops).
- Python installed on your computer.
- Install the following libraries: requests and pandas.
Steps
2. Understand the API Endpoint
For this exercise, we will use the World Development Indicators (WDI) API from the World Bank. This API provides economic and development data for countries. An example endpoint is:
http://api.worldbank.org/v2/country/all/indicator/NY.GDP.MKTP.CD?format=json&date=2020:2021
This URL fetches GDP data (indicator NY.GDP.MKTP.CD) for all countries for the years 2020 and 2021.
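To make the parts of this URL easier to see, here is a minimal sketch that assembles it from its pieces using only the standard library. The helper name build_wdi_url and its parameters are illustrative assumptions, not part of the World Bank API.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.worldbank.org/v2"


def build_wdi_url(indicator, country="all", start_year=2020, end_year=2021):
    """Assemble a WDI indicator URL (hypothetical helper for illustration)."""
    query = urlencode(
        {"format": "json", "date": f"{start_year}:{end_year}"},
        safe=":",  # keep the colon in the date range unescaped
    )
    return f"{BASE_URL}/country/{country}/indicator/{indicator}?{query}"


url = build_wdi_url("NY.GDP.MKTP.CD")
print(url)
```

Changing the indicator code or the year range in the call is then enough to request a different series, provided the API documentation lists that indicator.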
3. Make an API Request
Use the requests library to fetch data from the API.
Example:
PYTHON
import requests
url = "http://api.worldbank.org/v2/country/all/indicator/NY.GDP.MKTP.CD?format=json&date=2020:2021"
response = requests.get(url)
response.raise_for_status()  # stop early if the server returned an HTTP error
data = response.json()
- response.json() converts the API response into a Python dictionary or list.
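The sketch below shows what this conversion does, using the standard json module on a hand-written string shaped like a WDI response. The sample text and its values are placeholders invented for illustration, not real API output.

```python
import json

# Hand-written stand-in for a response body; the values are placeholders.
sample_text = """
[
  {"page": 1, "pages": 1, "per_page": 50, "total": 1},
  [
    {"country": {"id": "XX", "value": "Example Country"},
     "date": "2020",
     "value": 123.0}
  ]
]
"""

# Parse JSON text into Python objects, the same kind of result response.json() gives
parsed = json.loads(sample_text)

print(type(parsed).__name__)     # type of the top-level object
print(type(parsed[0]).__name__)  # type of the first element (paging metadata)
print(parsed[1][0]["country"]["value"])
```

JSON arrays become Python lists and JSON objects become dictionaries, which is why the records can later be read with square-bracket indexing and string keys.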
4. Extract Relevant Data
Examine the JSON response structure. The WDI API returns a list of two elements: the first holds paging metadata and the second contains the actual data records. Extract the relevant fields for analysis:
PYTHON
# Extract data from JSON response
data_records = data[1]
# Prepare a list for DataFrame creation
data_list = []
for record in data_records:
    country = record["country"]["value"]
    year = record["date"]
    gdp_value = record["value"]
    data_list.append({
        "Country": country,
        "Year": int(year),
        "GDP": gdp_value
    })
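Two things can trip up this loop in practice: a record's value may be null (None in Python) when no figure is reported, and an error response may not have a second element at all. The following is a sketch of a more defensive version. The function name extract_records and the choice to skip missing values are assumptions for illustration, and the sample input is made up.

```python
def extract_records(data):
    """Turn a parsed WDI-style response into a list of flat dictionaries."""
    # Expect [metadata, records]; anything else is treated as "no data".
    if not isinstance(data, list) or len(data) < 2 or data[1] is None:
        return []
    rows = []
    for record in data[1]:
        if record["value"] is None:  # skip entries with no reported value
            continue
        rows.append({
            "Country": record["country"]["value"],
            "Year": int(record["date"]),
            "GDP": record["value"],
        })
    return rows


# Made-up input for illustration only
sample = [
    {"page": 1, "pages": 1},
    [
        {"country": {"value": "Example A"}, "date": "2020", "value": 10.0},
        {"country": {"value": "Example B"}, "date": "2020", "value": None},
    ],
]
rows = extract_records(sample)
print(rows)
```

Whether to drop missing values or keep them as None depends on the analysis; keeping them would let pandas represent the gaps as NaN.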
Complete Example Code
PYTHON
import requests
import pandas as pd

# API Request
url = "http://api.worldbank.org/v2/country/all/indicator/NY.GDP.MKTP.CD?format=json&date=2020:2021"
response = requests.get(url)
data = response.json()

# Extract Data
data_records = data[1]
data_list = []
for record in data_records:
    country = record["country"]["value"]
    year = record["date"]
    gdp_value = record["value"]
    data_list.append({
        "Country": country,
        "Year": int(year),
        "GDP": gdp_value
    })

# Create DataFrame
df = pd.DataFrame(data_list)

# Save DataFrame
df.to_csv("wdi_gdp_data.csv", index=False)
print("Data saved to wdi_gdp_data.csv")
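The World Bank API returns results in pages, so a single request like the one above may contain only part of the data. The sketch below loops over pages using the pages field of the metadata element. The fetch_page argument is a hypothetical callable that returns the parsed JSON for one page; here a fake one with made-up values is used so the example runs without network access.

```python
def fetch_all_records(fetch_page):
    """Collect records from every page of a WDI-style paged response."""
    records = []
    page = 1
    while True:
        # Each page is expected to look like [metadata, records]
        metadata, page_records = fetch_page(page)
        records.extend(page_records or [])
        if page >= metadata["pages"]:
            break
        page += 1
    return records


# Fake two-page source standing in for the real API (made-up values)
fake_pages = {
    1: [{"page": 1, "pages": 2}, [{"date": "2021", "value": 1.0}]],
    2: [{"page": 2, "pages": 2}, [{"date": "2020", "value": 2.0}]],
}

all_records = fetch_all_records(lambda page: fake_pages[page])
print(len(all_records))
```

With the real API, fetch_page could plausibly wrap requests.get and pass the page number as a query parameter; check the API documentation for the exact parameter names before relying on this.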