Scraping AJAX-rendered data with webmagic #1

Open
licaibo opened this issue Oct 20, 2016 · 3 comments


licaibo commented Oct 20, 2016

If the data on CSDN only appears after it has been rendered via AJAX, how should I go about crawling it?

@liyifeng1994
Owner

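I haven't tried this with webmagic yet. You can press F12 in Chrome to open the debugger, inspect the Network tab (XHR), and find the request URL and params. Then fetch the page data with httpclient (most of these endpoints return JSON) and parse the JSON with jackson (it ranks at the top of the performance leaderboards)~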
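The steps above could be sketched roughly as follows. This is only a minimal illustration, not code from this repository: it uses the JDK's built-in `java.net.http.HttpClient` (Java 11+) as a stand-in for whichever HTTP client you prefer, and the endpoint `https://example.com/api/articles`, the `page`/`size` params, and the class and method names are all hypothetical placeholders for whatever you actually see in the Network tab.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: replays an XHR request found in the browser's Network tab.
class AjaxFetchSketch {

    /** Appends URL-encoded query params to the endpoint copied from the Network tab. */
    static String buildUrl(String endpoint, Map<String, String> params) {
        if (params.isEmpty()) {
            return endpoint;
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return endpoint + "?" + query;
    }

    /** Sends a GET similar to the browser's XHR and returns the raw (usually JSON) body. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/json")
                // Some sites check this header to tell XHR calls from page loads (assumption).
                .header("X-Requested-With", "XMLHttpRequest")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status: " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) {
        // Placeholder endpoint and params; substitute the ones observed in the Network tab.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("page", "1");
        params.put("size", "20");
        System.out.println(buildUrl("https://example.com/api/articles", params));
    }
}
```

The string returned by `fetch` can then be handed to a JSON parser; with jackson on the classpath, `new ObjectMapper().readTree(body)` would give you a `JsonNode` tree to walk. Depending on the site, you may also need to copy cookies or other headers from the original request.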

@liyifeng1994
Owner

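For reference, here is something I built earlier: it crawls movie information from the "一个ONE" app. All of the data comes back via AJAX requests, which is very similar to your situation.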
https://github.com/liyifeng1994/xfshxzs/blob/master/src/main/java/com/soecode/xfshxzs/service/MovieService.java

@liyifeng1994
Owner

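I just looked into it, and webmagic itself actually supports parsing JSON data. First use the browser's debugger to work out the request URL (this is the key step); then see the official AJAX crawling example here: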
https://github.com/code4craft/webmagic/blob/master/webmagic-samples/src/main/java/us/codecraft/webmagic/samples/AngularJSProcessor.java
