A high performance, concurrent HTTP client library for Python using gevent.
`gevent.httplib` support for patching `http.client` was removed in gevent 1.0; `geventhttpclient` now provides that missing functionality.
`geventhttpclient` uses a fast HTTP parser written in C.
`geventhttpclient` has been specifically designed for high concurrency, streaming, and HTTP 1.1 persistent connections. More generally, it is designed for efficiently pulling from REST APIs and streaming APIs like Twitter’s.
Safe SSL support is provided by default. `geventhttpclient` depends on the certifi CA Bundle. This is the same CA Bundle which ships with the Requests codebase, and is derived from Mozilla Firefox’s canonical set.
Since version 2.3, `geventhttpclient` features a largely `requests`-compatible interface. It covers basic HTTP usage, including cookie management, form data encoding, and decoding of compressed data, but otherwise isn’t as feature rich as the original `requests`. For simple use cases, it can serve as a drop-in replacement.
import geventhttpclient as requests
requests.get("https://github.com").text
requests.post("http://httpbingo.org/post", data="asdfasd").json()
from geventhttpclient import Session
s = Session()
s.get("http://httpbingo.org/headers").json()
s.get("https://github.com").content
This interface builds on top of the lower level `HTTPClient`.
from geventhttpclient import HTTPClient
from geventhttpclient.url import URL
url = URL("http://gevent.org/")
client = HTTPClient(url.host)
response = client.get(url.request_uri)
response.status_code
body = response.read()
client.close()
The `geventhttpclient.httplib` module contains classes for drop-in replacement of `http.client` connection and response objects. If you use `http.client` directly, you can replace the `http.client` imports with `geventhttpclient.httplib`.
# from http.client import HTTPConnection
from geventhttpclient.httplib import HTTPConnection
If you use `httplib2`, `urllib` or `urllib2`, you can patch `httplib` to use the wrappers from `geventhttpclient`. For `httplib2`, make sure you patch before you import it, or the `super()` calls will fail.
import geventhttpclient.httplib
geventhttpclient.httplib.patch()
import httplib2
`HTTPClient` has a built-in connection pool and is greenlet safe by design. You can share the same instance among several greenlets. It is the low level building block of this library.
import gevent.pool
import json
from geventhttpclient import HTTPClient
from geventhttpclient.url import URL
# go to http://developers.facebook.com/tools/explorer and copy the access token
TOKEN = "<MY_DEV_TOKEN>"
url = URL("https://graph.facebook.com/me/friends", params={"access_token": TOKEN})
# setting the concurrency to 10 allows the client to create up to 10
# connections and reuse them.
client = HTTPClient.from_url(url, concurrency=10)
response = client.get(url.request_uri)
assert response.status_code == 200
# the response complies with the read protocol, so the JSON parser
# consumes the stream as it is being read.
data = json.load(response)["data"]
def print_friend_username(client, friend_id):
    friend_url = URL(f"/{friend_id}", params={"access_token": TOKEN})
    # the greenlet will block until a connection is available
    response = client.get(friend_url.request_uri)
    assert response.status_code == 200
    friend = json.load(response)
    if "username" in friend:
        print(f"{friend['username']}: {friend['name']}")
    else:
        print(f"{friend['name']} has no username.")
# allow 20 greenlets to run at a time; this is more than the concurrency
# of the http client, but that isn't a problem since the client has its
# own connection pool.
pool = gevent.pool.Pool(20)
for item in data:
    friend_id = item["id"]
    pool.spawn(print_friend_username, client, friend_id)
pool.join()
client.close()
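The blocking behaviour described in the comments above (more workers than connections, each worker waiting until a connection is free) can be illustrated with a small analogy. This is only a sketch using standard-library threads and `queue.Queue` instead of greenlets; it is not how `HTTPClient` is implemented, and the names `POOL_SIZE`, `WORKERS`, and `worker` are hypothetical.

```python
import queue
import threading
import time

POOL_SIZE = 2  # stands in for the client's concurrency setting
WORKERS = 5    # stands in for the greenlets sharing one client

# Hypothetical stand-in for a pool of open connections.
pool = queue.Queue()
for n in range(POOL_SIZE):
    pool.put(f"conn-{n}")

lock = threading.Lock()
in_use = 0
max_in_use = 0
completed = []


def worker(task_id):
    global in_use, max_in_use
    conn = pool.get()  # blocks until a connection is free
    try:
        with lock:
            in_use += 1
            max_in_use = max(max_in_use, in_use)
        time.sleep(0.01)  # pretend to perform a request
        with lock:
            in_use -= 1
            completed.append(task_id)
    finally:
        pool.put(conn)  # hand the connection back for reuse


threads = [threading.Thread(target=worker, args=(i,)) for i in range(WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"{len(completed)} tasks done, at most {max_in_use} connections in use")
```

All five tasks complete even though only two "connections" exist, because each worker simply waits its turn.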
`geventhttpclient` supports streaming. Response objects have `read(n)` and `readline()` methods that read the stream incrementally. See `examples/twitter_streaming.py` for an example of pulling from the Twitter streaming API.
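As a rough illustration of line-oriented streaming, the sketch below parses newline-delimited JSON from any object that offers `readline()`. It assumes `readline()` returns bytes and an empty value at end of stream; `iter_json_lines` is a hypothetical helper, not part of the library, and `io.BytesIO` stands in for a real response.

```python
import io
import json


def iter_json_lines(stream):
    """Yield one decoded JSON object per non-empty line of `stream`.

    `stream` only needs a readline() method that returns bytes and
    returns b"" at end of stream (an assumption of this sketch).
    """
    while True:
        line = stream.readline()
        if not line:
            break  # end of stream
        line = line.strip()
        if line:  # skip blank keep-alive lines
            yield json.loads(line)


# io.BytesIO stands in for a streaming response here.
fake_stream = io.BytesIO(b'{"id": 1}\r\n\r\n{"id": 2}\r\n')
events = list(iter_json_lines(fake_stream))
print(events)  # [{'id': 1}, {'id': 2}]
```

Because the generator pulls one line at a time, only the current line is held in memory regardless of how long the stream runs.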
Here is an example of how to download a big file chunk by chunk to save memory:
from geventhttpclient import HTTPClient
from geventhttpclient.url import URL
url = URL("http://127.0.0.1:80/100.dat")
client = HTTPClient.from_url(url)
# request the path of the URL, not its (empty) query string
response = client.get(url.request_uri)
assert response.status_code == 200
CHUNK_SIZE = 1024 * 16 # 16KB
# open in binary mode: response.read() returns bytes
with open("/tmp/100.dat", "wb") as f:
    data = response.read(CHUNK_SIZE)
    while data:
        f.write(data)
        data = response.read(CHUNK_SIZE)
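The same read loop can be factored into a reusable helper. The sketch below is one possible shape, assuming only that the source offers `read(n)` returning bytes and an empty value at end of stream; `copy_in_chunks` is a hypothetical name, and `io.BytesIO` objects stand in for the response and the output file.

```python
import io


def copy_in_chunks(source, dest, chunk_size=16 * 1024):
    """Copy `source` to `dest` chunk by chunk and return the bytes written."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break  # end of stream
        dest.write(chunk)
        total += len(chunk)
    return total


# In-memory stand-ins for a response and an output file.
source = io.BytesIO(b"x" * 40_000)
dest = io.BytesIO()
written = copy_in_chunks(source, dest)
print(written)  # 40000
```

At most one chunk is buffered at a time, so memory use stays bounded by `chunk_size` rather than by the size of the download.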
The benchmark runs 10000 `GET` requests against a local nginx server in the default configuration with a concurrency of 10. See the `benchmarks` folder. The table below gives the requests per second (RPS) for several popular clients. Please read `benchmarks/README.md` for more details. Also note that HTTPX is better used with `asyncio`, not `gevent`.
HTTP Client | RPS |
---|---|
GeventHTTPClient | 7268.9 |
Httplib2 (patched) | 2323.9 |
Urllib3 | 2242.5 |
Requests | 1046.1 |
Httpx | 770.3 |
Linux(x86_64), Python 3.11.6 @ Intel i7-7560U
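To put the table in relative terms, the short sketch below divides the GeventHTTPClient figure by each client's RPS. The numbers are copied from the table above and apply only to the benchmark setup described there; the variable names are illustrative.

```python
# Requests per second, copied from the benchmark table.
rps = {
    "GeventHTTPClient": 7268.9,
    "Httplib2 (patched)": 2323.9,
    "Urllib3": 2242.5,
    "Requests": 1046.1,
    "Httpx": 770.3,
}

baseline = rps["GeventHTTPClient"]
# How many times more requests per second the baseline handled.
speedups = {name: round(baseline / value, 1) for name, value in rps.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor}x")
# GeventHTTPClient: 1.0x
# Httplib2 (patched): 3.1x
# Urllib3: 3.2x
# Requests: 6.9x
# Httpx: 9.4x
```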
This package is distributed under the MIT license.
Previous versions of geventhttpclient used `http_parser.c`, which in turn was based on `http/ngx_http_parse.c` from NGINX, copyright Igor Sysoev, Joyent, Inc., and other Node contributors. For more information, see http://github.com/joyent/http-parser