Post, February 07, 2020

I wrote the first web crawler of my life!


Talk is cheap, show me the code.

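# -*- coding: UTF-8 -*-
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

if __name__ == '__main__':
    server = 'https://www.biqukan.com/'
    target1 = 'https://www.biqukan.com/2_2760/'
    # Fetch the table-of-contents page. Passing the raw bytes lets
    # BeautifulSoup detect the page's real encoding by itself.
    webpage1 = requests.get(url=target1)
    soup1 = BeautifulSoup(webpage1.content, 'html.parser')
    # The chapter links live inside <div class="listmain">.
    index = soup1.find_all('div', class_='listmain')
    a = index[0].find_all('a')
    cnt = 0
    # Open the output file once, as UTF-8, instead of once per chapter.
    with open('./庆余年.txt', 'a', encoding='utf-8') as f:
        for each in a:
            # urljoin resolves the href against the site root, whether or
            # not the href starts with a slash.
            target2 = urljoin(server, each.get('href'))
            webpage2 = requests.get(url=target2)
            soup3 = BeautifulSoup(webpage2.content, 'html.parser')
            txt = soup3.find_all('div', class_='showtxt', id='content')
            print('TITLE:', each.string, '\n', file=f)
            # Paragraphs are separated by runs of eight non-breaking spaces.
            print(txt[0].text.replace('\xa0' * 8, '\n\n'), '\n', file=f)
            cnt = cnt + 1
            print(cnt)  # progress: number of chapters saved so far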

This code still has plenty of problems, but at least it produces results.
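One of the rough spots in a script like this is character encoding. Here is a minimal sketch of what probably goes wrong, assuming the page bytes are GBK and get decoded as ISO-8859-1 when `response.text` is read (the site's real encoding is not stated in this post, so treat GBK and the sample string as assumptions). Decoding with the wrong codec gives garbage, but encoding that garbage back with ISO-8859-1 returns the original bytes, which is the same thing `response.content` would give you directly.

```python
# Hypothetical sample: bytes of a GBK-encoded page (GBK is an assumption).
raw = '庆余年 第一章'.encode('gbk')

# What the text looks like if those bytes are decoded as ISO-8859-1.
mojibake = raw.decode('iso-8859-1')

# ISO-8859-1 maps every byte value 0-255 to exactly one code point,
# so encoding back with the same codec restores the original bytes.
recovered = mojibake.encode('iso-8859-1')

print(recovered == raw)
print(recovered.decode('gbk'))
```

Once the raw bytes are back, either decode them with the right codec or hand them to BeautifulSoup and let it work out the encoding.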


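The process felt fairly rough; I am a beginner, after all.
Along the way I ran into character-encoding problems, a pile of error messages, trouble with file output, and so on.
Still, the sense of accomplishment is pretty great.
Python does seem rather slow, though that is probably my code's fault: it manages about one chapter every 3 seconds.
But the code is relatively simple.
Learning to program takes patience! 😁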
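On the speed: most of those 3 seconds per chapter are likely spent waiting for the network, not running Python, because the script downloads one page at a time. A rough sketch of one way around that is to fetch several pages concurrently with a thread pool from the standard library. `fetch_all` and `fake_fetch` are hypothetical names made up for this example; `fake_fetch` only sleeps to imitate latency, and in the real script it would be something like a function returning `requests.get(url).content`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=8):
    """Call fetch(url) for every URL concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    # Stand-in for a real download: simulates 0.1 s of network latency.
    time.sleep(0.1)
    return 'body of ' + url


urls = ['chapter-%d' % i for i in range(8)]
start = time.perf_counter()
pages = fetch_all(urls, fake_fetch)
elapsed = time.perf_counter() - start
print(len(pages), 'pages in %.2f s' % elapsed)
```

Because `pool.map` returns results in input order, the chapters can still be written to the file in sequence afterwards. Keeping `max_workers` small avoids hammering the site.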
