If the data on CSDN only appears after it has been rendered via AJAX, how should I crawl it?
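I haven't tried this with webmagic yet. You can press F12 in Chrome to open the debugger, inspect the Network (XHR) tab, and find the request URL and params. Then use httpclient to fetch the page data (most of these requests return JSON) and parse the JSON with jackson (it ranks fastest in the benchmark comparison charts).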
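The steps above can be sketched roughly as follows. This is only a minimal illustration under some assumptions: it uses the JDK's built-in `java.net.http.HttpClient` (Java 11+) as a stand-in for Apache httpclient, the endpoint `https://example.com/api/articles` and the `page`/`keyword` params are hypothetical placeholders for whatever you find in the Network (XHR) tab, and the `X-Requested-With` header is something many AJAX endpoints expect but yours may not need. Parsing the returned JSON (for example with jackson) is left out so the sketch has no external dependencies.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class AjaxFetchSketch {

    // Appends URL-encoded query params to the endpoint found in the Network (XHR) tab.
    static String buildUrl(String endpoint, Map<String, String> params) {
        if (params.isEmpty()) {
            return endpoint;
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return endpoint + "?" + query;
    }

    // Sends a GET request the way the page's own AJAX call would and returns the raw body.
    static String fetchJson(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/json")
                .header("X-Requested-With", "XMLHttpRequest") // assumption: may not be required
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical params; replace with the ones observed in the browser debugger.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("page", "1");
        params.put("keyword", "web magic");
        String url = buildUrl("https://example.com/api/articles", params);
        System.out.println(url);

        // Only hit the network when a real endpoint is passed on the command line.
        if (args.length > 0) {
            System.out.println(fetchJson(args[0]));
        }
    }
}
```

Once `fetchJson` returns the response body, it can be handed to whichever JSON library you prefer.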
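For reference, here is something I built earlier that crawls movie information from the "一个ONE" app. All of its data is returned by AJAX requests, which is very similar to your situation: https://github.com/liyifeng1994/xfshxzs/blob/master/src/main/java/com/soecode/xfshxzs/service/MovieService.java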
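I just checked, and webmagic itself also supports parsing JSON data. First, use the browser debugger to work out the request URL (this is the key step). Then see the official AJAX crawling example: https://github.com/code4craft/webmagic/blob/master/webmagic-samples/src/main/java/us/codecraft/webmagic/samples/AngularJSProcessor.java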