
How to get JSON from webpage into Python script

python
json-fetching
http-requests
python-requests
by Alex Kataev · Nov 12, 2024
TLDR

Use Python's requests module to fetch JSON data. If the library isn't installed on your machine, simply run pip install requests and you are good to go. A GET request to the target page can be made and parsed as JSON in one line:

import requests

json_data = requests.get('YOUR_WEBPAGE_URL').json()  # It's JSON time!
print(json_data)

Just replace 'YOUR_WEBPAGE_URL' with your target URL and voilà, you've got JSON from a web source right inside your Python script.

JSON Fetching: The Devil's in the Details

Decoding JSON data

When dealing with multiple APIs or different web data formats, you might need to decode the byte content before parsing it as JSON. No need to panic, let Python do the heavy lifting:

import requests
import json

response = requests.get('YOUR_WEBPAGE_URL')
data = response.content.decode('utf-8')  # Decoding the Enigma code
json_data = json.loads(data)  # Presto! JSON in the house.

Error Handling in JSON fetching

There's no such thing as too safe. Handle exceptions to avoid abrupt crashes when the request fails or the response isn't valid JSON. This is how we do it:

import requests
from requests.exceptions import RequestException
import json

try:
    response = requests.get('YOUR_WEBPAGE_URL')
    response.raise_for_status()
    json_data = response.json()
except RequestException as e:
    print(f"HTTP Request failed, no chocolate cookie for you: {e}")
except json.JSONDecodeError:
    print("Failed to decode JSON data. Try harder, Neo.")

Cracking the HTTP request

If you need additional control over an HTTP request, or another method suits your purpose better, Python has your back.

Riding with urllib.request

Python's standard library ships with urllib.request, which handles HTTP requests without any third-party packages. It's a bit verbose, but it's always there when you need a friend:

import urllib.request
import urllib.error
import json

url = "YOUR_WEBPAGE_URL"
req = urllib.request.Request(url)
try:
    with urllib.request.urlopen(req) as response:  # Knocking on HTTP's door
        data = response.read().decode('utf-8')  # Translate the gibberish
        json_data = json.loads(data)  # Ta-da! JSON magic.
except urllib.error.URLError as e:
    print(f"Ouch! URL Error: {e.reason}")
except json.JSONDecodeError:
    print("Whoops! Failed to decode JSON data")

The art of setting request headers

Some webpages demand that headers be sent along with the request. No biggie, requests has it covered:

import requests

url = 'YOUR_WEBPAGE_URL'
headers = {'User-Agent': 'My User Agent 1.0', 'From': '[email protected]'}
try:
    response = requests.get(url, headers=headers)  # Saying hello with the request
    response.raise_for_status()
    json_data = response.json()
except requests.exceptions.RequestException as e:
    print(f"Uh-oh! Request failed: {e}")

Parsing JSON from APIs

Let's roll with the GitHub API. It serves up JSON data about repositories:

import requests

url = 'https://api.github.com/repos/psf/requests'
json_data = requests.get(url).json()  # GitHub API at your service!
print(f"Repository name: {json_data['name']}")
print(f"Stars: {json_data['stargazers_count']}")  # Who's the popular kid now, huh?

In this case, you're fetching and parsing JSON data to get the number of stars for the requests library repository.
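API responses are often nested, and nested JSON simply becomes nested Python dicts. Here's a quick sketch building on the json_data from above; owner and license are part of GitHub's documented repository payload, though license can be null for some repos:

owner = json_data['owner']  # Nested JSON arrives as plain Python dicts
print(f"Owner: {owner['login']}")
license_info = json_data.get('license') or {}  # license can be null for some repos
print(f"License: {license_info.get('spdx_id', 'unknown')}")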

Going beyond HTTP requests

requests is usually the first weapon of choice for its simplicity. For some special missions, though, you might need specialized gear:

  • BeautifulSoup: When there is no API and you need to dig JSON out of HTML (see the sketch after this list).
  • Scrapy: For heavy web scraping and crawling scenarios that need more than simple JSON fetching.
  • aiohttp: For asynchronous web requests. Handy when juggling a plethora of simultaneous HTTP connections (sketch below as well).
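Here's a minimal BeautifulSoup sketch, assuming the target page embeds its data in a script tag of type application/ld+json (a common pattern, but by no means guaranteed; 'YOUR_WEBPAGE_URL' is a placeholder as before):

import json
import requests
from bs4 import BeautifulSoup

response = requests.get('YOUR_WEBPAGE_URL')
soup = BeautifulSoup(response.text, 'html.parser')
tag = soup.find('script', type='application/ld+json')  # Structured data often hides here
if tag:
    json_data = json.loads(tag.string)  # The JSON was inside the HTML all along
    print(json_data)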
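And a minimal aiohttp sketch for the asynchronous route (fetch_json is just an illustrative helper name, not part of aiohttp):

import asyncio
import aiohttp

async def fetch_json(url):
    async with aiohttp.ClientSession() as session:  # One session to rule the requests
        async with session.get(url) as response:
            response.raise_for_status()
            return await response.json()  # Parse the body as JSON

json_data = asyncio.run(fetch_json('YOUR_WEBPAGE_URL'))
print(json_data)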