
How to get JSON from webpage into Python script

python
json-fetching
http-requests
python-requests
by Alex Kataev · Nov 12, 2024
TLDR

Use Python's requests module to fetch JSON data. If the library isn't installed on your machine, simply run pip install requests and you are good to go. A GET request to the target page can be made and parsed as JSON in one line:

import requests

json_data = requests.get('YOUR_WEBPAGE_URL').json()  # It's JSON time!
print(json_data)

Just replace 'YOUR_WEBPAGE_URL' with your target URL and voilà, you've got JSON from a web source right inside your Python script.

JSON Fetching: The Devil's in the Details

Decoding JSON data

When dealing with multiple APIs or different web data formats, you might need to decode the byte content before parsing it as JSON. No need to panic, let Python do the heavy lifting:

import requests
import json

response = requests.get('YOUR_WEBPAGE_URL')
data = response.content.decode('utf-8')  # Decoding the Enigma code
json_data = json.loads(data)  # Presto! JSON in the house.

Error Handling in JSON fetching

There's no such thing as too safe. Handle exceptions to avoid abrupt crashes when the request fails or the response isn't valid JSON. This is how we do it:

import requests
from requests.exceptions import RequestException
import json

try:
    response = requests.get('YOUR_WEBPAGE_URL')
    response.raise_for_status()
    json_data = response.json()
except RequestException as e:
    print(f"HTTP Request failed, no chocolate cookie for you: {e}")
except json.JSONDecodeError:
    print("Failed to decode JSON data. Try harder, Neo.")

Cracking the HTTP request

If you need additional control over an HTTP request, or another method suits your purpose better, Python has your back.

Riding with urllib.request

Python's standard library ships with urllib.request, which handles HTTP requests without any third-party packages. It's a bit verbose, but it's always there when you need a friend:

import urllib.request
import urllib.error
import json

url = "YOUR_WEBPAGE_URL"
req = urllib.request.Request(url)
try:
    with urllib.request.urlopen(req) as response:  # Knocking on HTTP's door
        data = response.read().decode('utf-8')  # Translate the gibberish
        json_data = json.loads(data)  # Ta-da! JSON magic.
except urllib.error.URLError as e:
    print(f"Ouch! URL Error: {e.reason}")
except json.JSONDecodeError:
    print("Whoops! Failed to decode JSON data")

The art of setting request headers

Some webpages demand that headers be sent along with the request. No biggie, requests has it covered:

import requests

url = 'YOUR_WEBPAGE_URL'
headers = {'User-Agent': 'My User Agent 1.0', 'From': '[email protected]'}
try:
    response = requests.get(url, headers=headers)  # Saying hello with the request
    response.raise_for_status()
    json_data = response.json()
except requests.exceptions.RequestException as e:
    print(f"Uh-oh! Request failed: {e}")

Parsing JSON from APIs

Let's roll with the GitHub API. It serves up JSON data about repositories:

import requests

url = 'https://api.github.com/repos/psf/requests'
json_data = requests.get(url).json()  # GitHub API at your service!
print(f"Repository name: {json_data['name']}")
print(f"Stars: {json_data['stargazers_count']}")  # Who's the popular kid now, huh?

In this case, you're fetching and parsing JSON data to get the number of stars for the requests library repository.
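API responses are often nested, and nested JSON simply becomes nested Python dicts. Here's a quick sketch building on the json_data from above; owner and license are part of GitHub's documented repository payload, though license can be null for some repos:

owner = json_data['owner']  # Nested JSON arrives as plain Python dicts
print(f"Owner: {owner['login']}")
license_info = json_data.get('license') or {}  # license can be null for some repos
print(f"License: {license_info.get('spdx_id', 'unknown')}")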

Going beyond HTTP requests

requests is usually the first weapon of choice for its simplicity. For some special missions, though, you might need specialized gear:

  • BeautifulSoup: When there is no API and you need to dig JSON out of HTML (see the sketch after this list).
  • Scrapy: For heavy web scraping and crawling scenarios that need more than simple JSON fetching.
  • aiohttp: For asynchronous web requests. Handy when juggling a plethora of simultaneous HTTP connections (sketch below as well).
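Here's a minimal BeautifulSoup sketch, assuming the target page embeds its data in a script tag of type application/ld+json (a common pattern, but by no means guaranteed; 'YOUR_WEBPAGE_URL' is a placeholder as before):

import json
import requests
from bs4 import BeautifulSoup

response = requests.get('YOUR_WEBPAGE_URL')
soup = BeautifulSoup(response.text, 'html.parser')
tag = soup.find('script', type='application/ld+json')  # Structured data often hides here
if tag:
    json_data = json.loads(tag.string)  # The JSON was inside the HTML all along
    print(json_data)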
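And a minimal aiohttp sketch for the asynchronous route (fetch_json is just an illustrative helper name, not part of aiohttp):

import asyncio
import aiohttp

async def fetch_json(url):
    async with aiohttp.ClientSession() as session:  # One session to rule the requests
        async with session.get(url) as response:
            response.raise_for_status()
            return await response.json()  # Parse the body as JSON

json_data = asyncio.run(fetch_json('YOUR_WEBPAGE_URL'))
print(json_data)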