I can't find where this log line is coming from or what it means:

agent4 run prompt=[Scrubbed due to 'Cookie']

It appears when I run Pydantic AI with logging turned on. I'm using Playwright, but I couldn't find the keyword "Scrubbed" in either the Pydantic AI or the Playwright source, and Google returns no hits. Where is this coming from, and could it affect the results? If I prompt Qwen in Ollama directly, it returns the JSON I expect.
Any help understanding whether the prompt is actually being scrubbed, and how to remedy it, would be appreciated. Or maybe there's another way of tackling this? TIA.
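For readers trying to place the message: its wording looks like a redaction applied by the logging layer to the logged attribute, not a change to the prompt itself. That is an assumption, not something confirmed in this thread. The sketch below uses only the standard library; the pattern list and the `scrub_for_log` helper are hypothetical and only illustrate how a pattern-based scrubber could print such a placeholder while the original string stays untouched.

```python
import re

# Hypothetical pattern list, for illustration only; a real logging
# library may use a different set of patterns.
SENSITIVE = re.compile(r"password|secret|session|cookie", re.IGNORECASE)


def scrub_for_log(value: str) -> str:
    """Return a redacted placeholder if the value looks sensitive."""
    match = SENSITIVE.search(value)
    if match:
        # Report the matched text, as in the message from the question.
        return f"[Scrubbed due to {match.group(0)!r}]"
    return value


# A scraped page often contains a cookie banner, which is enough to match.
prompt = "Convert the following list of cars... Accept Cookie settings ... BMW M3, 2019"

logged = scrub_for_log(prompt)  # what would end up in the log line
sent_to_model = prompt          # the original string is not modified

print("agent4 run prompt=" + logged)
```

Under this assumption only the log output is redacted, so the quickest check is to compare the string passed to the agent with what the model actually receives.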
import asyncio
from pprint import pprint
from typing import List, Optional, TypedDict, Union

import httpx
from pydantic_ai import Agent
from pydantic_ai.models.ollama import OllamaModel
import html2text
import nest_asyncio
import logfire
from logging import basicConfig
from playwright.async_api import async_playwright
from pydantic import BaseModel, Field
from tqdm import tqdm
from devtools import debug

nest_asyncio.apply()
logfire.configure(send_to_logfire='if-token-present')
logfire.ConsoleOptions.min_log_level = 'trace'
logfire.ConsoleOptions.verbose = True
basicConfig(handlers=[logfire.LogfireLoggingHandler()])
# %%
USER_AGENT = "Mozilla/5.01 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36"
#MODEL = "llama3.2:latest"
MODEL = "qwen2.5-coder:14b"
llm = OllamaModel(model_name=MODEL)
# %%
SYSTEM_PROMPT = """You're an expert text extractor. You extract information from webpage content.
Always extract data without changing it and any other output.
Ignore everything but car information."""


def create_scrape_prompt(page_content: str) -> str:
    return f"""Convert the following list of cars into valid JSON format, including details such as model, features, horsepower,
price, mileage, and year for each car. The list includes:
\```
{page_content}
\```
""".strip()
playwright = None
browser = None


async def fetch_page(url, user_agent=USER_AGENT) -> str:
    global playwright, browser
    if playwright == None:
        playwright = await async_playwright().start()
        browser = await playwright.chromium.launch()
    context = await browser.new_context(user_agent=USER_AGENT)
    page = await context.new_page()
    await page.goto(url, timeout=10000)
    content = await page.content()
    markdown_converter = html2text.HTML2Text()
    markdown_converter.body_width = 0
    markdown_converter.ignore_links = False
    #return content
    return markdown_converter.handle(content)
auto_content = ''


async def inner_fetch_page():
    global auto_content
    print('fetching page')
    auto_content = await fetch_page("https://www.autoscout24.com/lst?atype=C&cy=D%2CA%2CB%2CE%2CF%2CI%2CL%2CNL&desc=0&fregfrom=2018&gear=M&powerfrom=309&powerto=478&powertype=hp&search_id=1tih4oks815&sort=standard&ustate=N%2CU")
    print('fetched page', auto_content)


asyncio.run(inner_fetch_page())
class CarListing(TypedDict):
    """Information about a car listing"""
    make: str | None = Field("Make of the car e.g. Toyota", examples=["Toyota", "Lexus"])
    model: str | None = Field("Model of the car, maximum 3 words e.g. Land Cruiser", examples=["Land Cruiser", "RC F Advantage"])
    horsepower: str | None = Field("Horsepower (HP) of the engine e.g. 231", examples=["231", "467"])
    price: str | None = Field("Price in euro e.g. 34000", examples=["34,000", "45000"])
    mileage: str | None = Field("Number of kilometers on the odometer e.g. 73400", examples=["73400", "12,000"])
    year: str | None = Field("Year of registration (if available) e.g. 2015", examples=["2015", "2020"])
    url: str | None = Field(
        "Url to the listing e.g. https://www.autoscout24.com/offers/lexus-rc-f-advantage-coupe-gasoline-grey-19484ec1-ee56-4bfd-8769-054f03515792",
        examples=["https://www.autoscout24.com/offers/lexus-rc-f-advantage-coupe-gasoline-grey-19484ec1-ee56-4bfd-8769-054f03515792"]
    )


class CarListings(BaseModel):
    """List of car listings"""
    cars: List[CarListing] = Field("List of cars for sale.")
ollama_model = OllamaModel(
    model_name=MODEL
)
agent4 = Agent(model=ollama_model, result_type=CarListings, retries=3, system_prompt=SYSTEM_PROMPT)
try:
    result3 = agent4.run_sync(create_scrape_prompt(auto_content))
except:
    pass
finally:
    debug(agent4.last_run_messages)
    debug(result3)

#rows = [listing.__dict__ for listing in extraction.cars]
#listings_df = pd.DataFrame(car_extract)
# print(car_extract)
# #listings_df["model"] = listings_df.model.apply(filter_model)
# #listings_df
# # %%
# listings_df.to_csv("car-listings.csv", index=None)
asyncio.run(playwright.stop())
asyncio.run(browser.close())
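One thing worth noting about the run step above: if `run_sync` raises, the bare `except` hides the error and `result3` is never assigned, so `debug(result3)` in the `finally` block raises a `NameError` that masks the real failure. A minimal sketch of an alternative that keeps the exception for inspection follows; `run_extraction` and `failing_run` are hypothetical names used only for illustration, with `failing_run` standing in for the agent call.

```python
def run_extraction(run):
    """Call `run()` and return (result, error) instead of swallowing failures."""
    try:
        return run(), None
    except Exception as exc:  # keep the exception so it can be inspected
        return None, exc


def failing_run():
    # Stand-in for a failing agent call, e.g. output that never validates.
    raise ValueError("model output did not validate")


result, error = run_extraction(failing_run)
if error is not None:
    print(f"run failed: {error!r}")
else:
    print(result)
```

With this shape, the message history can still be dumped afterwards, and the original error is available alongside it.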
I followed the link returned in the exception, but I didn't see a way to enable it other than setting up an account for their SaaS product. Does that sound right? If so, I'm not sure this is the logging solution for me, since the reason for going with local LLMs is data privacy. I'll keep an eye out for anything else online; a quick search didn't turn up much, though I didn't search in depth.
The code was adapted from this notebook, but I'm trying Pydantic AI for some extra functionality: https://github.com/curiousily/AI-Bootcamp/blob/master/20.scraping-with-llm.ipynb (last activity in the notebook).