My First Python Web Scraper
Acid

I don't really have any Python background (couldn't stick with learning it), so I just searched up a tutorial and wrote this scraper to get a feel for the general workflow.

Talk is cheap, show me the code.

Code

# -*- coding: UTF-8 -*-
import requests
from bs4 import BeautifulSoup

if __name__ == '__main__':
    server = 'https://www.biqukan.com/'
    target1 = 'https://www.biqukan.com/2_2760/'

    # Fetch the chapter-list page. The site serves GBK, but requests decodes it
    # as ISO-8859-1 by default, so re-encode the text back to raw bytes and let
    # BeautifulSoup detect the real charset from the page itself.
    webpage1 = requests.get(url=target1)
    html_code1 = webpage1.text.encode('iso-8859-1')
    soup1 = BeautifulSoup(html_code1, 'html.parser')

    # The <div class="listmain"> block holds the links to every chapter.
    index = soup1.find_all('div', class_='listmain')
    soup2 = BeautifulSoup(str(index[0]), 'html.parser')
    a = soup2.find_all('a')

    cnt = 0
    for each in a:
        # Build the chapter URL and fetch it, with the same encoding fix.
        target2 = server + each.get('href')
        webpage2 = requests.get(url=target2)
        html_code2 = webpage2.text.encode('iso-8859-1')
        soup3 = BeautifulSoup(html_code2, 'html.parser')

        # The chapter body lives in <div class="showtxt" id="content">.
        txt = soup3.find_all('div', class_='showtxt', id='content')

        # Append the chapter title and text; the eight non-breaking spaces the
        # site uses as paragraph padding are turned into blank lines.
        with open('./庆余年.txt', 'a', encoding='utf-8') as f:
            print('TITLE:', each.string, '\n', file=f)
            print(txt[0].text.replace('\xa0' * 8, '\n\n'), '\n', file=f)

        cnt = cnt + 1
        print(cnt)

Summary

This code can't actually scrape the complete novel, but at least it produced something, and getting a taste of web scraping was well worth it.
Along the way I hit character-encoding problems with a pile of errors, didn't know how to write output to a file, and so on (a simpler way to handle both is sketched below).
Writing Python really does feel different from C++, and it's a lot of fun.
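
Here is a minimal sketch of how those two pain points could be handled more directly, assuming the same biqukan chapter-list page and listmain markup as in the code above: let requests detect the page's real charset via apparent_encoding instead of re-encoding the text by hand, and open the output file with an explicit UTF-8 encoding.

# Minimal sketch, assuming the same chapter-list page and markup as above.
import requests
from bs4 import BeautifulSoup

resp = requests.get('https://www.biqukan.com/2_2760/')
resp.encoding = resp.apparent_encoding           # detect GBK instead of the ISO-8859-1 default
soup = BeautifulSoup(resp.text, 'html.parser')

listmain = soup.find('div', class_='listmain')   # same chapter-list block as before
with open('./庆余年.txt', 'a', encoding='utf-8') as f:   # explicit encoding avoids platform defaults
    for link in listmain.find_all('a'):
        print(link.string, file=f)               # just the chapter titles, to keep the sketch short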
