Explain Codes LogoExplain Codes Logo

Sending "User-agent" using Requests library in Python

python
user-agent
web-scraping
http-headers
Alex KataevbyAlex Kataev·Aug 17, 2024
TLDR

Set a User-Agent in Requests using the headers parameter:

import requests requests.get('http://example.com', headers={'User-Agent': 'Custom Agent'})

This simple yet powerful code line sends a GET request with a custom User-Agent.

Unmasking "User-Agent"

Understanding the User-Agent header is a cornerstone in HTTP communication. This field identifies the client software originating the request, allowing the server to optimise the content delivery.

Suits for session: Persistent User-Agent

The headers of a session are like the session’s wardrobe: they persist.

session = requests.Session() session.headers.update({'User-Agent': 'Persistent Agent'}) # Session now wears a suit! response = session.get('http://example.com')

This consistent User-Agent gets used across requests within the session.

Playing nice: Keeping Defaults

In older versions of the requests, keep your default headers and add your special ones:

headers = requests.utils.default_headers() headers.update({'User-Agent': 'Combiner Agent'}) # Adding to an outfit rather than replacing! response = requests.get('http://example.com', headers=headers)

This technique maintains the default headers in each request.

Eavesdropping on ourselves: Debug Headers

Ensure the headers are being sent as expected:

request = requests.Request('GET', 'http://example.com', headers={'User-Agent': 'Debugger Agent'}) prepared = request.prepare() print(f"Headers sent: {prepared.headers}") # No secrets!

This allows you to debug sent headers effectively.

Champion Sessions: Handling cookies and connections

Sessions can handle cookies and persistent connections. They're like multitasking superheroes of headers management:

with requests.Session() as session: session.headers.update({'User-Agent': 'Session Magician'}) # Not just an Agent, now a Magician! # Further requests, reusing the session

Privacy tip: To avoid tracking cookies, switch off cookie handling:

session = requests.Session() session.cookies.clear_session_cookies() # No cookie crumbs left behind!

Uppercase, Lowercase: It doesn't matter

Header names in a Python dictionary are case-insensitive:

headers = { 'user-agent': 'No-Yelling', 'content-type': 'text/html', } # The server won't get mad if you forget to capitalize

Stealth mode: Mimicking Browsers

Crafting the User-Agent to mimic a real browser can help to evade detection:

headers = {'User-Agent': 'Mozilla/5.0 (compatible; HandsomeBrowser/1.2)'} # Like a disguise mustache for your browser!

This can improve access during web scraping operations.

Imitation game: User-Agent disguise

A well-crafted User-Agent strings make your requests look like they come from major browsers:

headers = {'User-Agent': 'Mozilla/5.0...'} # I swear I’m Firefox!

This can improve your requests' passability, especially when web scraping.