v0.2.1 #3

Merged
merged 8 commits into from
Apr 18, 2024
2 changes: 1 addition & 1 deletion Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "pyreqwest_impersonate"
-version = "0.2.0"
+version = "0.2.1"
 edition = "2021"
 description = "HTTP client that can impersonate web browsers, mimicking their headers and `TLS/JA3/JA4/HTTP2` fingerprints"
 authors = ["deedy5"]
38 changes: 23 additions & 15 deletions README.md
@@ -3,7 +3,7 @@

 The fastest python HTTP client that can impersonate web browsers by mimicking their headers and `TLS/JA3/JA4/HTTP2` fingerprints.</br>
 Binding to the Rust `reqwest_impersonate` library.</br>
-🏁 Check the benchmarks for more details.
+🏁 Check the [benchmark](https://github.com/deedy5/pyreqwest_impersonate/tree/main/benchmark) for more details.


 Provides precompiled wheels:
@@ -29,7 +29,7 @@ pip install -U pyreqwest_impersonate
 ## Usage
 ### I. Client

-A blocking HTTP client that can impersonate web browsers.
+A blocking HTTP client that can impersonate web browsers. Not thread-safe!
 ```python3
 class Client:
     """Initializes a blocking HTTP client that can impersonate web browsers.
@@ -53,6 +53,10 @@ class Client:
         verify (bool, optional): Verify SSL certificates. Default is True.
         http1 (bool, optional): Use only HTTP/1.1. Default is None.
         http2 (bool, optional): Use only HTTP/2. Default is None.
+
+    Note:
+        The Client instance is not thread-safe, meaning it should be initialized once and reused across a multi-threaded environment.
+
     """
 ```

@@ -92,7 +96,8 @@ Performs a POST request to the specified URL.
 ```python
 from pyreqwest_impersonate import Client

-client = Client(impersonate="chrome_123")
+# Not thread-safe! Initialize the Client instance once and reuse it across threads
+client = Client(impersonate="chrome_123")

 # get request
 resp = client.get("https://tls.peet.ws/api/all")
@@ -118,28 +123,31 @@ TODO

 #### Response attributes and methods

-- `cookies`: Fetches the cookies from the response as a dictionary.
-- `headers`: Retrieves the headers from the response as a dictionary.
-- `status_code`: Gets the status code of the response as an integer.
-- `url`: Returns the URL of the response as a string.
-- `content`: Provides the content of the response as bytes.
-- `text`: Decodes the response body into text, automatically detecting the character encoding.
-- `json()`: Parses the response body as JSON, converting it into a Python object for easy manipulation.
+- `content` (bytes): Provides the content of the response as bytes.
+- `cookies` (dict): Fetches the cookies from the response as a dictionary.
+- `headers` (dict): Retrieves the headers from the response as a dictionary.
+- `json()` (function): Parses the response body as JSON, converting it into a Python object for easy manipulation.
+- `raw` (list[int]): Contains the raw byte representation of the HTTP response body.
+- `status_code` (int): Gets the status code of the response as an integer.
+- `text` (str): Decodes the response body into text, automatically detecting the character encoding.
+- `url` (str): Returns the URL of the response as a string.

 #### Example

 ```python
 from pyreqwest_impersonate import Client

+# Not thread-safe! Initialize the Client instance once and reuse it across threads
 client = Client()

 response = client.get("https://example.com")

-print(response.status_code) # Access the status code
-print(response.url) # Access the URL
-print(response.headers) # Access headers
-print(response.cookies) # Access cookies
 print(response.content) # Get the content as bytes
-print(response.text) # Decode the content as text
+print(response.cookies) # Access cookies
+print(response.headers) # Access headers
 print(response.json()) # Parse the content as JSON
+print(response.raw) # Raw response
+print(response.status_code) # Access the status code
+print(response.text) # Decode the content as text
+print(response.url) # Access the URL
 ```
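
The README change above documents the new `raw` attribute as `list[int]`. As a minimal sketch, assuming the list holds the same bytes that `content` exposes, such a list can be turned back into `bytes` with the built-in constructor. The sample list below is a hypothetical stand-in, not data taken from a real response.

```python
import json

# Hypothetical stand-in for response.raw; a real value would come from
# client.get(...).raw, which the README describes as list[int].
raw = [123, 34, 111, 107, 34, 58, 116, 114, 117, 101, 125]

# bytes() accepts any iterable of integers in range(256).
body = bytes(raw)

print(body)              # the body as a bytes object
print(json.loads(body))  # json.loads accepts UTF-8 encoded bytes directly
```

If `raw` and `content` do carry the same payload, `bytes(response.raw) == response.content` would be a quick way to confirm it against the installed version.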
12 changes: 6 additions & 6 deletions benchmark/1_threads.csv
@@ -1,6 +1,6 @@
-name,threads,cpu_time 50k,cpu_time 5k,duration 50k,duration 5k
-curl_cffi,1,5.735,1.521,7.957,3.25
-httpx,1,3.801,2.116,6.117,3.987
-pyreqwest_impersonate,1,0.855,0.297,1.977,1.238
-requests,1,5.787,2.814,8.355,4.73
-tls_client,1,6.414,1.96,6.941,3.153
+name,threads,cpu_time 50k,cpu_time 5k,time 50k,time 5k
+curl_cffi 0.6.2,1,5.617,1.618,7.681,3.367
+httpx 0.27.0,1,2.58,1.934,4.206,3.605
+pyreqwest_impersonate 0.2.1,1,1.706,0.38,3.486,1.133
+requests 2.31.0,1,4.852,3.121,6.993,4.743
+tls_client 1.0.1,1,5.608,1.87,6.333,2.71
12 changes: 6 additions & 6 deletions benchmark/4_threads.csv
@@ -1,6 +1,6 @@
-name,threads,cpu_time 50k,cpu_time 5k,duration 50k,duration 5k
-curl_cffi,4,4.014,1.255,1.567,0.866
-httpx,4,2.105,1.461,1.505,1.307
-pyreqwest_impersonate,4,1.15,0.399,0.875,0.751
-requests,4,4.14,3.006,3.356,2.802
-tls_client,4,3.803,1.357,1.382,0.832
+name,threads,cpu_time 50k,cpu_time 5k,time 50k,time 5k
+curl_cffi 0.6.2,4,3.859,1.124,1.415,0.703
+httpx 0.27.0,4,2.172,1.422,1.485,1.228
+pyreqwest_impersonate 0.2.1,4,1.168,0.477,2.025,1.617
+requests 2.31.0,4,4.036,3.237,3.221,3.08
+tls_client 1.0.1,4,3.52,1.185,1.252,0.723
85 changes: 56 additions & 29 deletions benchmark/benchmark.py
@@ -1,5 +1,6 @@
 import time
 from concurrent.futures import ThreadPoolExecutor, as_completed
+from importlib.metadata import version
 import pandas as pd
 import requests
 import httpx
@@ -8,32 +9,48 @@
 import curl_cffi.requests

 results = []
+PACKAGES = [
+    ("requests", requests.Session),
+    ("httpx", httpx.Client),
+    ("tls_client", tls_client.Session),
+    ("curl_cffi", curl_cffi.requests.Session),
+    ("pyreqwest_impersonate", pyreqwest_impersonate.Client),
+]
+
+
+def add_package_version(packages):
+    return [(f"{name} {version(name)}", classname) for name, classname in packages]


 def session_get_test(session_class, requests_number):
     s = session_class()
     for _ in range(requests_number):
         s.get(url).text

+
+PACKAGES = add_package_version(PACKAGES)
+
 # one thread
 requests_number = 2000
 for response_size in ["5k", "50k"]:
     url = f"http://127.0.0.1:8000/{response_size}"
     print(f"\nOne worker, {response_size=}, {requests_number=}")
-    for name, session_class in [
-        ("requests", requests.Session),
-        ("httpx", httpx.Client),
-        ("tls_client", tls_client.Session),
-        ("curl_cffi", curl_cffi.requests.Session),
-        ("pyreqwest_impersonate", pyreqwest_impersonate.Client),
-    ]:
+    for name, session_class in PACKAGES:
         start = time.perf_counter()
         cpu_start = time.process_time()
         session_get_test(session_class, requests_number)
         dur = round(time.perf_counter() - start, 3)
         cpu_dur = round(time.process_time() - cpu_start, 3)
-        results.append({"name": name, "threads": 1, "response_size": response_size, "duration": dur, "cpu_time": cpu_dur})
-        print(f" name: {name:<22} {response_size=} {dur=} {cpu_dur=}")
+        results.append(
+            {
+                "name": name,
+                "threads": 1,
+                "size": response_size,
+                "time": dur,
+                "cpu_time": cpu_dur,
+            }
+        )
+        print(f" name: {name:<30} time: {dur} cpu_time: {cpu_dur}")


 # multiple threads
@@ -42,36 +59,46 @@ def session_get_test(session_class, requests_number):
 for response_size in ["5k", "50k"]:
     url = f"http://127.0.0.1:8000/{response_size}"
     print(f"\n{threads_number} workers, {response_size=}, {requests_number=}")
-    for name, session_class in [
-        ("requests", requests.Session),
-        ("httpx", httpx.Client),
-        ("tls_client", tls_client.Session),
-        ("curl_cffi", curl_cffi.requests.Session),
-        ("pyreqwest_impersonate", pyreqwest_impersonate.Client),
-    ]:
+    for name, session_class in PACKAGES:
         start = time.perf_counter()
         cpu_start = time.process_time()
         with ThreadPoolExecutor(threads_number) as executor:
-            futures = [executor.submit(session_get_test, session_class, requests_number) for _ in range(threads_number)]
+            futures = [
+                executor.submit(session_get_test, session_class, requests_number)
+                for _ in range(threads_number)
+            ]
             for f in as_completed(futures):
                 f.result()
         dur = round(time.perf_counter() - start, 3)
         cpu_dur = round(time.process_time() - cpu_start, 3)
-        results.append({"name": name, "threads": threads_number, "response_size": response_size, "duration": dur, "cpu_time": cpu_dur})
-        print(f" name: {name:<22} {response_size=} {dur=} {cpu_dur=}")
-
+        results.append(
+            {
+                "name": name,
+                "threads": threads_number,
+                "size": response_size,
+                "time": dur,
+                "cpu_time": cpu_dur,
+            }
+        )
+        print(f" name: {name:<30} time: {dur} cpu_time: {cpu_dur}")
+

 df = pd.DataFrame(results)
-pivot_df = df.pivot_table(index=['name', 'threads'], columns='response_size', values=['duration', 'cpu_time'], aggfunc='mean')
+pivot_df = df.pivot_table(
+    index=["name", "threads"],
+    columns="size",
+    values=["time", "cpu_time"],
+    aggfunc="mean",
+)
 pivot_df.reset_index(inplace=True)
-pivot_df.columns = [' '.join(col).strip() for col in pivot_df.columns.values]
-pivot_df = pivot_df[['name', 'threads'] + [col for col in pivot_df.columns if col not in ['name', 'threads']]]
-unique_threads = pivot_df['threads'].unique()
+pivot_df.columns = [" ".join(col).strip() for col in pivot_df.columns.values]
+pivot_df = pivot_df[
+    ["name", "threads"]
+    + [col for col in pivot_df.columns if col not in ["name", "threads"]]
+]
+unique_threads = pivot_df["threads"].unique()
 for thread in unique_threads:
-    thread_df = pivot_df[pivot_df['threads'] == thread]
+    thread_df = pivot_df[pivot_df["threads"] == thread]
     print(f"\nTable for {thread} threads:")
     print(thread_df.to_string(index=False))
-    thread_df.to_csv(f'{thread}_threads.csv', index=False)
-
-
-
+    thread_df.to_csv(f"{thread}_threads.csv", index=False)
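
The benchmark script above records two numbers per run: wall-clock time from `time.perf_counter()` and CPU time from `time.process_time()`. They land in the `time` and `cpu_time` CSV columns. A minimal sketch of that measurement pattern, isolated from the HTTP clients, is given here; the `measure` and `sleepy` helpers are hypothetical names introduced for illustration and are not part of the PR.

```python
import time


def measure(func, *args):
    """Return (wall_seconds, cpu_seconds) for one call, rounded as in the benchmark."""
    start = time.perf_counter()      # wall-clock timer, includes time spent waiting
    cpu_start = time.process_time()  # CPU time of this process, excludes sleep
    func(*args)
    wall = round(time.perf_counter() - start, 3)
    cpu = round(time.process_time() - cpu_start, 3)
    return wall, cpu


def sleepy():
    # Stand-in for waiting on I/O: wall time grows, CPU time barely moves.
    time.sleep(0.2)


wall, cpu = measure(sleepy)
print(f"time: {wall} cpu_time: {cpu}")
```

This is why the two columns can diverge: a client that mostly waits on the network shows a low `cpu_time` relative to `time`, while in the threaded runs `cpu_time` sums work across all threads and can exceed `time`.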
6 changes: 0 additions & 6 deletions pyproject.toml
@@ -32,13 +32,7 @@ dependencies = []
 [project.optional-dependencies]
 dev = [
     "pytest>=8.1.1",
-    "pytest-retry>=1.6.2",
 ]

 [tool.maturin]
 features = ["pyo3/extension-module"]
-
-[tool.pytest.ini_options]
-retries = 3
-retry_delay = 0.5
-cumulative_timing = false