Warning
This article was last updated on 2020-11-23 17:19; its content may be outdated.
import scrapy


class FirstSpider(scrapy.Spider):
    name = 'first'
    # allowed_domains = ['www.soulchild.cn']
    start_urls = ['http://www.qiushibaike.com/text']

    def parse(self, response):
        div_list = response.xpath('//div[contains(@class,"article") and contains(@class,"mb15")]')
        all_data = []
        for i in div_list:
            # The author name sits in an h2 inside the author block
            author = i.xpath('./div[@class="author clearfix"]//h2/text()').get()
            # The post body may be split across several text nodes; join them
            content = ''.join(i.xpath('.//div[@class="content"]/span//text()').getall())
            all_data.append({
                "author": author,
                "content": content,
            })
        return all_data
Write the return value of the parse method to a local CSV file:
scrapy crawl first -o qs.csv
Supported output formats:
'json', 'jsonlines', 'jl', 'csv', 'xml', 'marshal', 'pickle'
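Conceptually, each dict returned by parse becomes one row of the CSV file, with the dict keys as the header. A rough stdlib-only sketch of that mapping (the item data here is made up; Scrapy itself uses its own feed exporters, not this code):

```python
import csv
import io

# Hypothetical items, shaped like the dicts that parse() returns
all_data = [
    {"author": "user_a", "content": "first joke text"},
    {"author": "user_b", "content": "second joke text"},
]

buf = io.StringIO()
# DictWriter turns each dict into one CSV row; keys become the header line
writer = csv.DictWriter(buf, fieldnames=["author", "content"])
writer.writeheader()
writer.writerows(all_data)

print(buf.getvalue())
```

With two items this produces a header line followed by two data rows, which is the same shape `scrapy crawl first -o qs.csv` writes to qs.csv.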