feat: HTTP request tool #9228

Merged · 72 commits · Jun 19, 2024


@michael-radency (Contributor) commented Apr 26, 2024

Summary

This tool enables the model to make HTTP requests with various parameters, sent in the path, body, headers, or query string. It also allows the model to use credentials from other nodes for authentication. The Optimize Response option applies some optimizations to reduce the context sent to the model.
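To make the summary concrete, here is a minimal TypeScript sketch of how model-supplied values for the path, query string, headers, and body could be assembled into a single request. The `{name}` placeholder syntax, the `ToolInput` and `RequestOptions` shapes, and `buildRequest` are all hypothetical illustrations, not the node's actual implementation; credential handling and response optimization are left out.

```typescript
// Hypothetical shape of the values the model fills in when calling the tool.
interface ToolInput {
	path?: Record<string, string>;
	query?: Record<string, string>;
	headers?: Record<string, string>;
	body?: Record<string, unknown>;
}

// Hypothetical request description handed to an HTTP client.
interface RequestOptions {
	method: string;
	url: string;
	headers?: Record<string, string>;
	body?: string;
}

// Substitute `{name}` placeholders with percent-encoded path values, append
// query parameters, and attach headers and a JSON body only when non-empty.
function buildRequest(method: string, urlTemplate: string, input: ToolInput): RequestOptions {
	const filled = urlTemplate.replace(/\{(\w+)\}/g, (_match: string, name: string) => {
		const value = input.path?.[name];
		if (value === undefined) throw new Error(`Missing path parameter: ${name}`);
		return encodeURIComponent(value);
	});

	const url = new URL(filled);
	for (const [key, value] of Object.entries(input.query ?? {})) {
		url.searchParams.append(key, value);
	}

	const options: RequestOptions = { method, url: url.toString() };
	if (input.headers && Object.keys(input.headers).length > 0) {
		options.headers = { ...input.headers };
	}
	if (input.body && Object.keys(input.body).length > 0) {
		options.body = JSON.stringify(input.body);
		options.headers = { 'Content-Type': 'application/json', ...options.headers };
	}
	return options;
}

const request = buildRequest('GET', 'https://api.example.com/users/{id}', {
	path: { id: '42' },
	query: { fields: 'name' },
});
console.log(request.url); // prints https://api.example.com/users/42?fields=name
```

Encoding path values with encodeURIComponent keeps a model-supplied value such as "a/b" from silently changing the URL's path structure.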

Related tickets and issues

https://linear.app/n8n/issue/AI-162/tool-to-visit-a-website

@michael-radency added the node/new (Creation of an entirely new node) and n8n team (Authored by the n8n team) labels Apr 26, 2024
);
}
const returnData: string[] = [];
const html = cheerio.load(response);
Member commented:

We could use @mozilla/readability and jsdom here to cleanly extract the content that's likely relevant to an end-user.

Something like this perhaps:

import { JSDOM } from 'jsdom';
import { Readability } from '@mozilla/readability';

// `url` is the page the tool was asked to visit; await assumes an async context.
const dom = await JSDOM.fromURL(url);
const article = new Readability(dom.window.document, {
	keepClasses: true,
}).parse();

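Then use article.content (after checking that article is not null, since parse() returns null when it cannot find the main content).

We could also consider using Turndown to convert the HTML into Markdown, which LLMs tend to handle better than HTML, IMO.

import Turndown from 'turndown';

const turndown = new Turndown();
const markdown = article ? turndown.turndown(article.content) : '';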

@michael-radency (Contributor, Author) commented:

@netroy What would be the advantages over html-to-text + Cheerio, since we already use that setup for the HTML node?

@netroy (Member) commented May 23, 2024:

Cheerio is great for extracting text with CSS selectors, but it leaves the burden of interpreting the semantics of the markup to the end user.

First, download an article:

curl https://www.bbc.com/news/articles/cldd6x6gglxo > news.html

Then try this with cheerio:

const fs = require('fs');
const cheerio = require('cheerio');
const html = fs.readFileSync('news.html', 'utf8');
const $ = cheerio.load(html);
console.log($('body').text());

I got this: [screenshot of the output]

With just readability:

(async () => {
	const { JSDOM } = require('jsdom');
	const { Readability } = require('@mozilla/readability');

	const dom = await JSDOM.fromFile('news.html');
	const article = new Readability(dom.window.document, {
		keepClasses: true,
	}).parse();
	console.log(article.textContent);
})();

I got this: [screenshot of the output]

With readability + turndown:

(async () => {
	const { JSDOM } = require('jsdom');
	const { Readability } = require('@mozilla/readability');
	const Turndown = require('turndown');

	const dom = await JSDOM.fromFile('news.html');
	const article = new Readability(dom.window.document, {
		keepClasses: true,
	}).parse();
	const turndown = new Turndown({
		headingStyle: 'atx',
		hr: '---',
		bulletListMarker: '-',
		codeBlockStyle: 'fenced',
	});
	const markdown = turndown.turndown(article.content);
	console.log(markdown);
})();

I got this: [screenshot of the output]

Perhaps we should add an "Extract as Markdown" option to the node, letting users choose whether to convert the content to Markdown to reduce semantic noise in the extracted text?
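If such an option were added, the branching could stay small. Below is a minimal TypeScript sketch, assuming a hypothetical extractAsMarkdown flag and injecting the Markdown converter as a function so the sketch does not depend on any library. The Article interface mirrors only the content and textContent fields of Readability's parse() result, which is null when no article is found; the fallback to raw page text is an illustrative choice, not something proposed in the thread.

```typescript
// Only the two fields of Readability's parse() result that this sketch uses.
interface Article {
	content: string; // cleaned-up HTML of the main article
	textContent: string; // plain text of the main article
}

// Hypothetical node option proposed above.
interface ExtractOptions {
	extractAsMarkdown: boolean;
}

type HtmlToMarkdown = (html: string) => string;

// Choose the representation sent to the model. Fall back to the raw page text
// when Readability could not identify an article (parse() returned null).
function extractForModel(
	article: Article | null,
	pageText: string,
	options: ExtractOptions,
	toMarkdown: HtmlToMarkdown,
): string {
	if (article === null) return pageText;
	return options.extractAsMarkdown ? toMarkdown(article.content) : article.textContent;
}
```

In the node, toMarkdown could wrap the configured Turndown instance from the readability + turndown example, for instance (html) => turndown.turndown(html).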

Contributor commented:

+1 for using @mozilla/readability


Comment on lines 743 to 753
if (!Object.keys(options.headers as IDataObject).length) {
	delete options.headers;
}

if (!Object.keys(options.qs as IDataObject).length) {
	delete options.qs;
}

if (!Object.keys(options.body as IDataObject).length) {
	delete options.body;
}
Contributor commented:

Suggested change:
-if (!Object.keys(options.headers as IDataObject).length) {
-	delete options.headers;
-}
-if (!Object.keys(options.qs as IDataObject).length) {
-	delete options.qs;
-}
-if (!Object.keys(options.body as IDataObject).length) {
-	delete options.body;
-}
+if (options) {
+	options.url = encodeURI(options.url);
+	if (options.headers && !Object.keys(options.headers).length) {
+		delete options.headers;
+	}
+	if (options.qs && !Object.keys(options.qs).length) {
+		delete options.qs;
+	}
+	if (options.body && !Object.keys(options.body).length) {
+		delete options.body;
+	}
+}
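The same pruning can also be written once over the optional keys, without mutating the caller's object. This is a minimal TypeScript sketch; the RequestOptions shape and the pruneEmpty name are hypothetical simplifications, not n8n's actual request-options type.

```typescript
// Simplified, hypothetical request options; the real type has more fields.
interface RequestOptions {
	url: string;
	headers?: Record<string, unknown>;
	qs?: Record<string, unknown>;
	body?: Record<string, unknown>;
}

const PRUNABLE_KEYS = ['headers', 'qs', 'body'] as const;

// Return a shallow copy with every empty headers/qs/body object removed.
function pruneEmpty(options: RequestOptions): RequestOptions {
	const result: RequestOptions = { ...options };
	for (const key of PRUNABLE_KEYS) {
		const value = result[key];
		if (value !== undefined && Object.keys(value).length === 0) {
			delete result[key];
		}
	}
	return result;
}
```

One thing worth checking in the suggestion itself: encodeURI escapes %, so a URL the model has already percent-encoded gets encoded twice. encodeURI('https://example.com/a%20b') returns 'https://example.com/a%2520b', while encodeURI('https://example.com/a b') returns 'https://example.com/a%20b'.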

Contributor commented:

We might also want to add a .dark icon variant to improve contrast in dark mode.
[Screenshot, 2024-06-17]

cypress bot commented Jun 19, 2024

1 flaky test on run #5573

Results: 0 failed · 395 passed · 0 pending · 0 skipped · Flakiness: 1

Details:

🌳 🖥️ browsers:node18.12.0-chrome107 🤖 michael-radency 🗃️ e2e/*
Project: n8n Commit: 853da9022c
Status: Passed Duration: 04:29 💡
Started: Jun 19, 2024 7:47 AM Ended: Jun 19, 2024 7:51 AM
Flakiness: e2e/5-ndv.cy.ts • 1 flaky test

Test Artifacts
NDV > should not retrieve remote options when required params throw errors (screenshots, video)

Contributor commented:

✅ All Cypress E2E specs passed

@michael-radency merged commit be2635e into master on Jun 19, 2024
26 checks passed
@michael-radency deleted the ai-162-tool-to-visit-a-website branch on June 19, 2024 07:54
@netroy (Member) left a comment:

This PR also updated packages/nodes-base/nodes/Splunk/splunk.svg. Is that intentional?

adrian-martinez-onestic pushed a commit to onesdata/n8n-fork that referenced this pull request Jun 20, 2024
adrian-martinez-onestic pushed a commit to onesdata/n8n-fork that referenced this pull request Jun 20, 2024
This was referenced Jun 20, 2024
@janober (Member) commented Jun 20, 2024

Got released with [email protected]


4 participants