Scraping Over 20,000 Rental Listings to Show You the State of Rents in Guangzhou

Published: 2019-01-01 03:06:59  Category: Tutorials  Source: zone7

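Summary: Outline: Preface, Statistical Results, Crawler Code Implementation, Crawler Analysis Implementation, Postscript. Preface: Before reading this article, please read the following three articles first, because this one builds on them: (1) A First Look at a Powerful Crawling Tool; (2) Heard Your Crawler Got Banned Again?; (3) Scraping Data Without Saving It Is Just Fooling Around. Back in August, in a burst of inspiration, I decided to use pyt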

Finally, parse the data on a single page

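# Imports used by this method. RenthousescrapyItem is defined in the project's own items module.
from bs4 import BeautifulSoup
from scrapy import Request

def parse(self, response):  # Parse the data on one page
    self.logger.info("==========================")
    soup = BeautifulSoup(response.body, "html5lib")
    divs = soup.find_all("dd", attrs={"class": "info rel"})  # The blocks to scrape (dd elements, despite the variable name)
    for div in divs:
        ps = div.find_all("p")
        try:  # Some listings are incomplete, or an ad has been inserted, so the expected tags may be missing and raise an error
            for index, p in enumerate(ps):  # Per the page source, each p tag holds a piece of the information we want, so loop over them
                text = p.text.strip()
                print(text)  # Print to check that this is the information we want
            roomMsg = ps[1].text.split("|")
            area = roomMsg[2].strip()[:-1]  # Strip whitespace first, then drop the trailing unit character
            item = RenthousescrapyItem()
            item["title"] = ps[0].text.strip()
            item["rooms"] = roomMsg[1].strip()
            item["area"] = int(float(area))
            item["price"] = int(ps[-1].text.strip()[:-3])  # Drop the 3-character unit suffix from the price
            item["address"] = ps[2].text.strip()
            item["traffic"] = ps[3].text.strip()
            if (self.baseUrl + "house/") in response.url:  # Pages that are not restricted to one district
                item["region"] = "不限"
            else:
                item["region"] = ps[2].text.strip()[:2]  # The first two characters of the address are the district
            item["direction"] = roomMsg[3].strip()
            print(item)
            yield item
        except Exception:  # A bare except would also swallow GeneratorExit inside this generator
            print("Oops, an exception occurred")
            continue
    if len(self.allUrlList) != 0:  # Move on to the next queued URL, if any
        url = self.allUrlList.pop(0)
        yield Request(url, callback=self.parse, dont_filter=True)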
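The string handling inside parse can be tried in isolation. The sketch below is only an illustration: the helper names parse_room_msg and parse_price are hypothetical, and the sample strings assume a "|"-separated summary whose area ends in a one-character unit and a price that ends in "元/月", which is what the slicing in the spider implies but which the article does not show.

```python
def parse_room_msg(room_msg):
    """Split a '|'-separated listing summary into (rooms, area, direction)."""
    parts = [part.strip() for part in room_msg.split("|")]
    rooms = parts[1]
    # Drop the one-character unit (assumed to be something like "㎡") before converting.
    area = int(float(parts[2][:-1]))
    direction = parts[3]
    return rooms, area, direction


def parse_price(price_text, unit="元/月"):
    """Convert a price string to an int, removing the unit suffix if present."""
    text = price_text.strip()
    if text.endswith(unit):
        text = text[:-len(unit)]
    return int(text)


if __name__ == "__main__":
    # Sample strings are made up for illustration only.
    print(parse_room_msg("整租 | 3室2厅 | 120㎡ | 朝南"))
    print(parse_price(" 3500元/月 "))
```

Stripping each field before slicing avoids the case where surrounding whitespace shifts the slice and leaves the unit character attached to the number.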

Data analysis implementation

The statistics here are computed mainly with pymongo aggregation operations, and the results are then displayed with a charting library.
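As a rough sketch of what such an aggregation could look like, the pipeline below groups listings by the region field and averages the price field, both of which appear in the item built by the spider. The function names, the min_listings parameter, and the database and collection names in the usage comment are assumptions for illustration, not the author's code.

```python
def region_price_pipeline(min_listings=1):
    """Build a MongoDB aggregation pipeline: average price and listing count per region."""
    return [
        {"$group": {
            "_id": "$region",
            "avg_price": {"$avg": "$price"},
            "count": {"$sum": 1},
        }},
        # Ignore regions with too few listings to give a meaningful average.
        {"$match": {"count": {"$gte": min_listings}}},
        {"$sort": {"avg_price": -1}},
    ]


def average_price_by_region(collection, min_listings=1):
    """Run the pipeline on a pymongo Collection and return (region, avg_price, count) tuples."""
    cursor = collection.aggregate(region_price_pipeline(min_listings))
    return [(doc["_id"], round(doc["avg_price"], 1), doc["count"]) for doc in cursor]


# Usage against a real database (the names "rent" and "house" are hypothetical):
#   from pymongo import MongoClient
#   collection = MongoClient("mongodb://localhost:27017")["rent"]["house"]
#   for region, avg_price, count in average_price_by_region(collection, min_listings=30):
#       print(region, avg_price, count)
```

The returned tuples can be handed directly to whichever charting library is used for the bar or pie charts.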

(Editor: 晋中站长网)

