Using a crawler to get Douyu streamers' popularity

Posted: 2023-07-02 18:54:15


Fetch the popularity values of the DOTA2 streamers listed on the Douyu page and sort them from highest to lowest.
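The page is parsed with regular expressions at two levels: one pattern cuts out each streamer's info block, and two more pull the name and the popularity text from inside that block. Below is a minimal offline sketch of that step. It reuses the same three patterns as the code below (written as raw strings), but `SAMPLE_HTML` and `extract` are hypothetical and were invented for illustration. Douyu's real markup is not reproduced here and may have changed. The sketch also shows why `name_pattern` caps the capture at 20 characters. In this sample, the first `</use></svg>` in each block sits before the popularity number, and the cap prevents the lazy match from stretching from there all the way to `</h2>`. The search therefore falls through to the icon just before the name. The original's `[1::2]` slice is left out because the sample has only one info block per card. It presumably exists because the live page has two such blocks per card.

```python
import re

# Patterns taken from the article's Spider class, written as raw strings.
ROOT_PATTERN = r'<div class="DyListCover-info">([\d\D]*?)</div>'
NAME_PATTERN = r'</use></svg>([\d\D]{0,20}?)</h2>'
NUMBER_PATTERN = r'</use></svg>([\d\D]*?)</span>'

# Hypothetical markup invented for this sketch; the real Douyu page differs.
SAMPLE_HTML = (
    '<div class="DyListCover-info"><span><svg><use></use></svg>3.5万</span>'
    '<h2><svg><use></use></svg>alice</h2></div>'
    '<div class="DyListCover-info"><span><svg><use></use></svg>980</span>'
    '<h2><svg><use></use></svg>bob</h2></div>'
)


def extract(html):
    """Two-level extraction: cut out each info block, then search inside it."""
    anchors = []
    for block in re.findall(ROOT_PATTERN, html):
        names = re.findall(NAME_PATTERN, block)
        numbers = re.findall(NUMBER_PATTERN, block)
        if names and numbers:  # skip blocks missing either field
            anchors.append({"name": names[0].strip(), "number": numbers[0].strip()})
    return anchors


print(extract(SAMPLE_HTML))
# [{'name': 'alice', 'number': '3.5万'}, {'name': 'bob', 'number': '980'}]
```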

Code:

```python
import random
import re

import requests


class Spider:
    # url = '/g_LOL'
    # NOTE: only the path survives here; the host part of the URL is missing,
    # and requests.get() needs the full address to work.
    url = '/g_DOTA2'
    # Info block of each streamer card
    root_pattern = r'<div class="DyListCover-info">([\d\D]*?)</div>'
    # Streamer name: at most 20 characters between the icon and </h2>
    name_pattern = r'</use></svg>([\d\D]{0,20}?)</h2>'
    # Popularity text, e.g. "3.5万"
    number_pattern = r'</use></svg>([\d\D]*?)</span>'

    def __fetch_content(self):
        # Download the category page and return its HTML as text
        r = requests.get(Spider.url, timeout=10)
        htmls = r.text
        return htmls

    def __analysis(self, htmls):
        # Keep every second info block (presumably the ones holding
        # the name and popularity)
        root_html = re.findall(Spider.root_pattern, htmls)[1::2]
        anchors = []
        for html in root_html:
            name = re.findall(Spider.name_pattern, html)
            number = re.findall(Spider.number_pattern, html)
            anchors.append({'name': name, 'number': number})
        return anchors

    def __refine(self, anchors):
        # Take the first match of each field and trim whitespace;
        # skip blocks where either pattern found nothing
        return [
            {'name': a['name'][0].strip(), 'number': a['number'][0].strip()}
            for a in anchors
            if a['name'] and a['number']
        ]

    def __sort(self, anchors):
        # Shuffling first only changes the order of streamers with equal
        # popularity, because sorted() is stable
        shuffle_list = list(range(len(anchors)))
        random.shuffle(shuffle_list)
        anchors_shuffle = [anchors[i] for i in shuffle_list]
        return sorted(anchors_shuffle, key=self.__sort_seed, reverse=True)

    def __sort_seed(self, anchor):
        # Convert text such as "3.5万" to 35000.0 (万 = ten thousand).
        # Returning anchor['number'] directly would be wrong: strings are
        # compared character by character, not by value.
        r = re.search(r'\d+(?:\.\d+)?', anchor['number'])
        number = float(r.group()) if r else 0.0
        if '万' in anchor['number']:
            number *= 10000
        return number

    def __show(self, anchors):
        for i, anchor in enumerate(anchors):
            print('rank', i + 1, anchor['name'], anchor['number'])

    def go(self):
        htmls = self.__fetch_content()
        anchors = self.__analysis(htmls)
        anchors = self.__refine(anchors)
        anchors = self.__sort(anchors)
        self.__show(anchors)


spider = Spider()
spider.go()
```
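The numeric conversion in `__sort_seed` is the step most likely to go wrong. Popularity is displayed as text such as `3.5万` (万 = 10,000), and sorting those strings directly compares characters, not values. The original used `re.findall('\d*', ...)[0]`, which also drops the part after the decimal point. Below is a small standalone sketch of the conversion. The name `popularity_to_number` is hypothetical, and the decimal-aware `re.search` pattern is a substitute chosen here, not the article's original call. Unparseable text is treated as zero, which is also an assumption.

```python
import re


def popularity_to_number(text):
    """Turn popularity text such as '3.5万' or '980' into a float.

    Unlike re.findall(r'\\d*', text)[0], this keeps the decimal part,
    so '3.5万' becomes 35000.0 instead of 30000.0.
    """
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        return 0.0  # assumption: unparseable text counts as zero popularity
    value = float(match.group())
    if "万" in text:  # 万 means ten thousand
        value *= 10_000
    return value


numbers = ["980", "12.3万", "3.5万", "1.2万"]

# Sorting the raw strings compares characters, so "980" wrongly comes first.
print(sorted(numbers, reverse=True))
# ['980', '3.5万', '12.3万', '1.2万']

# Sorting by the converted value gives the intended order.
print(sorted(numbers, key=popularity_to_number, reverse=True))
# ['12.3万', '3.5万', '1.2万', '980']
```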

Result:
