
Super Lotto (大乐透) Python prediction program: crawling lottery draw data with the Scrapy framework

Posted: 2022-07-05 15:51:03


The full settings.py for the lottery_spider project:

# -*- coding: utf-8 -*-

# Scrapy settings for lottery_spider project

#

# For simplicity, this file contains only settings considered important or

# commonly used. You can find more settings consulting the documentation:

#

# /en/latest/topics/settings.html

# /en/latest/topics/downloader-middleware.html

# /en/latest/topics/spider-middleware.html

BOT_NAME = 'lottery_spider'

SPIDER_MODULES = ['lottery_spider.spiders']

NEWSPIDER_MODULE = 'lottery_spider.spiders'

# Crawl responsibly by identifying yourself (and your website) on the user-agent

#USER_AGENT = 'lottery_spider (+)'

# Obey robots.txt rules

ROBOTSTXT_OBEY = False  # False: do not look up or obey the site's robots.txt file

# Configure maximum concurrent requests performed by Scrapy (default: 16)

#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)

# See /en/latest/topics/settings.html#download-delay

# See also autothrottle settings and docs

DOWNLOAD_DELAY = 1  # Throttle the crawl speed: one request per second

# The download delay setting will honor only one of:

#CONCURRENT_REQUESTS_PER_DOMAIN = 16

#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)

#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)

#TELNETCONSOLE_ENABLED = False

# Override the default request headers:

DEFAULT_REQUEST_HEADERS = {  # Set default request headers to mimic a browser request

'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',

'Accept-Language': 'en',

'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36'

}

# Enable or disable spider middlewares

# See /en/latest/topics/spider-middleware.html

#SPIDER_MIDDLEWARES = {

# 'lottery_spider.middlewares.LotterySpiderSpiderMiddleware': 543,

#}

# Enable or disable downloader middlewares

# See /en/latest/topics/downloader-middleware.html

#DOWNLOADER_MIDDLEWARES = {

# 'lottery_spider.middlewares.LotterySpiderDownloaderMiddleware': 543,

#}

# Enable or disable extensions

# See /en/latest/topics/extensions.html

#EXTENSIONS = {

# 'scrapy.extensions.telnet.TelnetConsole': None,

#}

# Configure item pipelines

# See /en/latest/topics/item-pipeline.html

ITEM_PIPELINES = {  # Uncomment this setting so the pipeline in pipelines.py runs (a sketch follows after this file)

'lottery_spider.pipelines.LotterySpiderPipeline': 300,

}

# Enable and configure the AutoThrottle extension (disabled by default)

# See /en/latest/topics/autothrottle.html

#AUTOTHROTTLE_ENABLED = True

# The initial download delay

#AUTOTHROTTLE_START_DELAY = 5

# The maximum download delay to be set in case of high latencies

#AUTOTHROTTLE_MAX_DELAY = 60

# The average number of requests Scrapy should be sending in parallel to

# each remote server

#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

# Enable showing throttling stats for every response received:

#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)

# See /en/latest/topics/downloader-middleware.html#httpcache-middleware-settings

#HTTPCACHE_ENABLED = True

#HTTPCACHE_EXPIRATION_SECS = 0

#HTTPCACHE_DIR = 'httpcache'

#HTTPCACHE_IGNORE_HTTP_CODES = []

#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
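The original post only shows settings.py, but SPIDER_MODULES points at lottery_spider.spiders, so a spider module is expected there. Below is a minimal sketch of what such a spider might look like; the spider name, start URL, and selectors are assumptions for illustration only, not the author's actual code.

# A minimal sketch of lottery_spider/spiders/daletou_spider.py.
# The name, start URL, and CSS selectors are assumptions; the real target
# page and its layout are not shown in the original post.
import scrapy

class DaletouSpider(scrapy.Spider):
    name = 'daletou'  # hypothetical spider name
    start_urls = ['https://example.com/daletou/history']  # placeholder URL

    def parse(self, response):
        # Assume each table row holds one draw: issue number, five front-area
        # numbers, and two back-area numbers.
        for row in response.css('table tr'):
            cells = [c.strip() for c in row.css('td::text').getall()]
            if len(cells) >= 8:
                yield {
                    'issue': cells[0],
                    'front_numbers': cells[1:6],
                    'back_numbers': cells[6:8],
                }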
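ITEM_PIPELINES enables lottery_spider.pipelines.LotterySpiderPipeline at priority 300, but pipelines.py itself is not shown either. A minimal sketch, assuming the dict items yielded by the hypothetical spider above and CSV output (the storage format is an assumption, not taken from the original post):

# A sketch of lottery_spider/pipelines.py; writing to CSV is an assumed choice.
import csv

class LotterySpiderPipeline:
    def open_spider(self, spider):
        # Open the output file once, when the spider starts.
        self.file = open('daletou.csv', 'w', newline='', encoding='utf-8')
        self.writer = csv.writer(self.file)
        self.writer.writerow(['issue', 'front_numbers', 'back_numbers'])

    def process_item(self, item, spider):
        # Flatten the number lists into space-separated strings for one CSV row.
        self.writer.writerow([
            item.get('issue', ''),
            ' '.join(item.get('front_numbers', [])),
            ' '.join(item.get('back_numbers', [])),
        ])
        return item

    def close_spider(self, spider):
        self.file.close()

With these files in place, the crawl would be started from the project root with "scrapy crawl daletou", where daletou is the assumed spider name.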
