
Analysis of Data Analyst Job Postings

Time: 2020-03-25 01:51:37



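1. Purpose and scope

1.1. Purpose:
Analyze job postings to understand what companies require of data analysts and what compensation they offer.

1.2. Scope:
The analysis addresses the following questions:
1. How is demand for data analysts distributed across cities?
2. How is demand distributed across levels of work experience?
3. What does the overall salary picture look like?
4. How do salaries vary by city?
5. How do salaries vary by work experience?
6. What education level do data analyst positions require?
7. How do education requirements vary with work experience?
8. Which job skills do data analyst positions require?
9. How do different skills affect salary?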

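2. Data acquisition

All data in this project comes from Lagou (拉勾网), chosen mainly because its job postings are complete and tidy.

The crawl mainly collected the following fields:
['companyName' (company), 'positionName' (position),
'city' (city), 'salary' (salary), 'education' (education), 'workYear' (work experience), 'describition' (full job description)]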

# Scrape the Lagou job site
import json
import time
import requests
import pandas as pd
from pyquery import PyQuery as pq

'''
Approach: send a first request to obtain the cookie. Send a second request with
the matching form data and cookie, and read the job list from the returned JSON.
To get the full description of each job, take showId and positionId from that
JSON and send a third request for the detail page.
'''

# Class that scrapes the Lagou data
class lagouRequestContent():
    def __init__(self):
        # The host part of these URLs is missing in this copy of the article.
        self.url_first = '/jobs/list_%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90?labelWords=sug&fromSearch=true&suginput=%E6%95%B0%E6%8D%AE'
        self.url_second = '/jobs/positionAjax.json?needAddtionalResult=false'
        self.url_third = '/jobs/{}.html?show='
        # Headers for the first request
        self.headers_first = {
            'Cache-Control': 'max-age=0',
            'Upgrade-Insecure-Requests': '1',
            # The original dict listed 'Connection' twice; the later value ('close') is the one that took effect.
            'Connection': 'close',
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8'
        }
        # Headers for the second request
        self.headers_second = {
            'Accept': 'application/json, text/javascript, */*; q=0.01',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'zh-CN,zh;q=0.9',
            'Connection': 'keep-alive',
            'Content-Length': '55',
            'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'Host': '',
            'Origin': '',
            'Referer': '/jobs/list_%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90?labelWords=sug&fromSearch=true&suginput=%E6%95%B0%E6%8D%AE',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36',
            'X-Anit-Forge-Code': '0',
            # 'Connection': 'close',
            'X-Anit-Forge-Token': 'None',
            'X-Requested-With': 'XMLHttpRequest'
        }
        # Headers for the third request
        self.headers_third = {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'zh-CN,zh;q=0.9',
            'Cache-Control': 'max-age=0',
            'Connection': 'keep-alive',
            # 'Connection': 'close',
            'Host': '',
            'Upgrade-Insecure-Requests': '1',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'
        }
        # '北京','上海','深圳','广州','杭州','成都','重庆','南京','武汉','西安','佛山','东莞','昆明','珠海','无锡','厦门','长沙',
        # '天津','福州','济南','大连','郑州','青岛','合肥','宁波','贵阳','长春','太原','石家庄','南昌'
        self.province_city = ['全国']  # '全国' = nationwide

    # First request: obtains the cookie and the page count
    def get_lagou_content_first(self, province_city):
        # Query parameters for the first request
        params_first = {'px': 'default', 'city': province_city}
        # Request the listing page in order to obtain cookies
        response_first = requests.get(url=self.url_first, headers=self.headers_first, params=params_first)
        if response_first.status_code == 200:
            doc = pq(response_first.content.decode('utf8'))
            # Page count
            content_data = doc.find('.totalNum').text()
            # Cookie
            cookie = response_first.cookies
            # Fetch the data of every page
            for num in range(1, int(content_data) + 1):
                self.get_lagou_content(num, province_city, cookie)
                time.sleep(10)
                print('---' * 10 + str(num))
        else:
            return None

    # Second request: fetch one page of the job list
    def get_lagou_content(self, nums, province_city, cookie):
        # Form data; '数据分析' is the search keyword "data analysis"
        if nums == 1:
            data = {'first': 'true', 'pn': nums, 'kd': '数据分析'}
        else:
            data = {'first': 'false', 'pn': nums, 'kd': '数据分析'}
        # Query parameters
        params = {'px': 'default', 'city': province_city, 'needAddtionalResult': 'false'}
        response_second = requests.post(url=self.url_second, headers=self.headers_second,
                                        data=data, cookies=cookie, params=params)
        cookie_second = response_second.cookies
        if response_second.status_code == 200:
            return self.parse_lagou_content(response_second, nums, cookie_second)
        else:
            return None

    # Parse the response
    def parse_lagou_content(self, data, nums, cookie):
        # Convert the JSON string into Python objects: content_loads is a dict and
        # content_loads['content']['positionResult']['result'] is a list of dicts, one per job
        content_loads = json.loads(data.content.decode("utf8"))
        # Query parameter for the detail page
        params_third = {'show': content_loads['content']['showId']}
        return self.save_lagou_content(content_loads['content']['positionResult']['result'],
                                       nums, cookie, params_third)

    # Save the fields we need
    def save_lagou_content(self, value_data, nums, cookie, params_third):
        lagou_data = []  # list of rows: [[...], [...], ...]
        for item in value_data:
            lagou_list = []
            lagou_list.append(item['companyFullName'])           # company name
            lagou_list.append(item['positionName'])              # position
            lagou_list.append(item['city'])                      # city
            lagou_list.append(item['salary'])                    # salary
            lagou_list.append(item['companySize'])               # company size
            lagou_list.append('/'.join(item['skillLables']))     # skill requirements
            lagou_list.append(item['education'])                 # education
            lagou_list.append(item['workYear'])                  # work experience
            lagou_list.append(item['financeStage'])              # funding stage
            lagou_list.append(item['createTime'])                # creation time
            lagou_list.append(item['resumeProcessRate'])         # resume processing rate
            lagou_list.append(item['resumeProcessDay'])          # resume processing days
            lagou_list.append(item['firstType'])
            lagou_list.append(item['secondType'])
            lagou_list.append(item['thirdType'])
            lagou_list.append(item['hitags'])
            lagou_list.append('/'.join(item['companyLabelList']))
            lagou_list.append('/'.join(item['positionLables']))
            position_data = self.get_lagou_detail_content(cookie, params_third, item['positionId'])
            lagou_list.append(position_data)
            lagou_data.append(lagou_list)
        lagou_pd = pd.DataFrame(data=lagou_data, columns=['公司名称', '职位', '城市', '薪资', '公司规模', '技能要求', '学历', '工作经历', '融资规模', '创建时间', '总处理率', '天处理率', '第一职位类型', '第二职位类型', '第三职位类型', '职位诱惑', '福利', '职位类型', '职位描述'])
        # Write the header row only together with page 1
        if nums == 1:
            lagou_pd.to_csv('lagouDataPositionCity.csv', mode='a', index=False)
        else:
            lagou_pd.to_csv('lagouDataPositionCity.csv', mode='a', index=False, header=False)

    # Third request: fetch the job description
    def get_lagou_detail_content(self, cookie, params_third, positionalId):
        time.sleep(10)
        response_third = requests.get(url=self.url_third.format(positionalId), headers=self.headers_third,
                                      cookies=cookie, params=params_third)
        if response_third.status_code == 200:
            doc = pq(response_third.content.decode('utf8'))
            # Extract the job description
            position_desc = doc.find('.job-detail').find('p').text()
            return position_desc
        else:
            return None

    # Entry point: loop over the cities and collect the data
    def run(self):
        for i in range(len(self.province_city)):
            self.get_lagou_content_first(self.province_city[i])
            time.sleep(5)
            print(i)

if __name__ == '__main__':
    lagou_request = lagouRequestContent()
    lagou_request.run()
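The scraper writes the header row together with page 1, so the header can end up in the file more than once when the script is run for several cities or restarted. A minimal sketch of one way to avoid that, assuming it is acceptable to decide from the file itself whether a header is needed; `append_rows` and `demo` are hypothetical helpers and not part of the original scraper:

```python
import csv
import os
import tempfile


def append_rows(path, header, rows):
    """Append rows to a CSV file, writing the header only if the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(header)
        writer.writerows(rows)


def demo():
    """Append two batches to a temporary file and return everything that was written."""
    header = ["companyName", "city", "salary"]
    path = os.path.join(tempfile.mkdtemp(), "demo.csv")
    append_rows(path, header, [["A", "北京", "10k-20k"]])
    append_rows(path, header, [["B", "上海", "15k-25k"]])
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


rows = demo()
```

With this approach the cleaning step that later drops repeated header rows would no longer be needed.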

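3. Analysis

3.1. Data import and basic cleaning

# Process the job-posting data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import font_manager
from matplotlib import style

# print(plt.style.available)
# Load the data and discard the original header row.
# Note: with skiprows=1 and the default header inference, the first data row is
# consumed as the header; passing header=None as well would keep it.
lagou_data = pd.read_csv('lagouDataPosition.csv', skiprows=1)
# Rename the columns
lagou_data.columns = ['companyName', 'positionName', 'city', 'salary',
                      'companySize', 'skillLables', 'education', 'workYear',
                      'financeStage', 'createTime', 'resumeProcessRate',
                      'resumeProcessDay', 'firstType', 'secondType',
                      'thirdType', 'hitags', 'companyLabelList', 'positionLables', 'describition']
lagou_data.head()
lagou_data.describe()
# The header row was written to the file repeatedly, so drop those rows
# ('工作经历' is the original header text of the workYear column)
lagou_data.drop(index=lagou_data[lagou_data['workYear'] == '工作经历'].index, inplace=True)
lagou_data.describe()
# lagou_data.info()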

After this initial cleaning, 1,832 records remain. The data is fairly complete, with almost no missing values.
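The statement about missing values can be checked column by column. The following is a small sketch on made-up sample rows, not the real data set; `missing_report` is a hypothetical helper that could be applied to `lagou_data` in the same way:

```python
import pandas as pd


def missing_report(df):
    """Return the number and percentage of missing values per column, worst first."""
    n_missing = df.isna().sum()
    report = pd.DataFrame({
        "missing": n_missing,
        "percent": (n_missing / len(df) * 100).round(1),
    })
    return report.sort_values("missing", ascending=False)


# Made-up rows for illustration only
sample = pd.DataFrame({
    "city": ["北京", "上海", None, "杭州"],
    "salary": ["10k-20k", "15k-25k", "8k-12k", "20k-30k"],
})
report = missing_report(sample)
print(report)
```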

3.2. Distribution of data analyst positions by city

# Number of postings per city
lagou_groupCity = lagou_data.groupby(by='city', as_index=False).count().sort_values('companyName', ascending=False)
# print(lagou_groupCity[['city', 'companyName']])
# Plot. Matplotlib 3.6 and later name this style 'seaborn-v0_8-whitegrid'.
plt.style.use('seaborn-whitegrid')
# Figure size
plt.figure(figsize=(12.7, 7.62), dpi=80)
# Bar chart
rects = plt.bar(lagou_groupCity['city'], lagou_groupCity['companyName'], width=0.5, color='b')
# Font for Chinese text (Windows path)
my_font = font_manager.FontProperties(fname=r'C:\Windows\Fonts\msyh.ttc', size=16)
# Axis tick format
plt.xticks(fontproperties=my_font, rotation=45, size=9)
plt.yticks(size=14)
plt.grid(alpha=0)
# Axis titles
# plt.xlabel('城市', fontproperties=my_font)
# plt.ylabel('职位数', fontproperties=my_font)
# Title: "Beijing has the highest demand for data analysts"
plt.title('北京对数据分析岗位需求最多', fontproperties=my_font, size=14)
plt.margins(0, 0.2)
# Get the current axes
ax = plt.gca()
# Hide the right and top borders
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
# Label each bar with its value
for rect_bar in rects:
    # Bar height (y value); rect_bar.get_x() gives the x position
    height = rect_bar.get_height()
    plt.text(rect_bar.get_x() + rect_bar.get_width() / 2, height + 2, str(height), ha='center')
plt.savefig('city1.png', bbox_inches='tight')
plt.show()

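Companies in 29 cities nationwide are hiring data analysts. The chart shows that Beijing, Shanghai, Shenzhen, Guangzhou and Hangzhou each have more than 100 postings, several times the number in any other city; every other city has fewer than 50.

Demand for data analysts is heavily concentrated in the four first-tier cities (Beijing, Shanghai, Guangzhou, Shenzhen) plus Hangzhou, so most job opportunities are in these five cities. On the other hand, these cities also attract a large pool of strong candidates, so competition there is correspondingly intense. Other cities offer comparatively few data analyst openings.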

3.2. Demand for data analysts by work experience

# Distribution by required work experience
lagou_groupWorkYear = lagou_data.groupby(by='workYear', as_index=False).count().sort_values('companyName', ascending=False)
lagou_groupWorkYear
plt.style.use('seaborn-whitegrid')
# Figure size
plt.figure(figsize=(6.5, 4), dpi=80)
# Bar chart
rects = plt.bar(lagou_groupWorkYear['workYear'], lagou_groupWorkYear['companyName'], color='b', width=0.6)
# Font for Chinese text (Windows path)
my_font = font_manager.FontProperties(fname=r'C:\Windows\Fonts\msyh.ttc', size=16)
# Axis tick format
plt.xticks(fontproperties=my_font, rotation=0, size=9)
plt.yticks(size=14)
plt.grid(alpha=0)
# Axis titles
# plt.xlabel('城市', fontproperties=my_font)
# plt.ylabel('职位数', fontproperties=my_font)
# Title: "Demand is highest for 3-5 years of experience"
plt.title('工作经验在3~5年市场需求最大', fontproperties=my_font, size=14)
plt.margins(0, 0.2)
# Get the current axes
ax = plt.gca()
# Hide the right and top borders
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
# Label each bar with its value
for rect_bar in rects:
    height = rect_bar.get_height()
    plt.text(rect_bar.get_x() + rect_bar.get_width() / 2, height + 2, str(height), ha='center', size=12)
plt.savefig('city2.png', bbox_inches='tight')
plt.show()

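Looking at the distribution by work experience, demand is highest for data analysts with 3-5 years of experience, followed by those with 1-3 years.

Overall, demand is fairly strong for analysts with more than one year of experience and comparatively weak for those with less.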
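The bar chart above orders the experience levels by count. To read the distribution along the career path instead, the counts can be reindexed into a fixed order. This is only a sketch: the label list in `EXPERIENCE_ORDER` is an assumption about the values in the `workYear` column, `count_by_experience` is a hypothetical helper, and the sample values are made up:

```python
import pandas as pd

# Assumed natural order of the experience labels; adjust to the values actually present.
EXPERIENCE_ORDER = ["不限", "应届毕业生", "1年以下", "1-3年", "3-5年", "5-10年", "10年以上"]


def count_by_experience(work_year, order=EXPERIENCE_ORDER):
    """Count postings per experience label, in career order, with zeros for absent labels."""
    counts = pd.Series(list(work_year)).value_counts()
    # Labels that are not listed in `order` are dropped by reindex.
    return counts.reindex(order, fill_value=0)


# Made-up values for illustration only
sample = ["3-5年", "1-3年", "3-5年", "不限", "5-10年"]
counts = count_by_experience(sample)
print(counts)
```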

3.3. Salary distribution for data analyst positions

# Salary distribution
# Strip the k/K suffix
lagou_salary = lagou_data['salary'].str.replace('k', '').str.replace('K', '')

# Midpoint of the salary range
def avg_salary(data):
    avg = (float(data.split('-')[0]) + float(data.split('-')[1])) / 2
    return avg

lagou_salary = lagou_salary.apply(avg_salary)
lagou_salary.describe()
lagou_data['avgSalary'] = lagou_salary
# Summary statistics of the salary
lagou_data['avgSalary'].describe()
'''
count    1832.000000
mean       18.854803
std         9.756914
min         1.000000
25%        11.500000
50%        18.250000
75%        24.000000
max        75.000000
'''
# lagou_data[['city', 'salary', 'avgSalary']]
num = np.arange(1, 77, 5)
bins = pd.cut(lagou_salary, num)
salary_cut = lagou_salary.groupby(by=bins).count()
# Convert the Series into a DataFrame
dict_df = {'salary': salary_cut.index, 'count': salary_cut.values}
df_salary = pd.DataFrame(dict_df)
# type(df_salary['salary'][0])
# Turn each interval "(a, b]" into the label "a ~ b"
df_salary['salary'] = [str(x.left) + ' ~ ' + str(x.right) for x in df_salary['salary']]
# Make sure the column holds str rather than pandas Interval objects
df_salary['salary'] = df_salary['salary'].astype('str')
df_salary['salary']
# Plot
plt.style.use('seaborn-whitegrid')
plt.figure(figsize=(6.5, 4), dpi=80)
rects = plt.bar(df_salary['salary'], df_salary['count'])
# Font for Chinese text (Windows path)
my_font = font_manager.FontProperties(fname=r'C:\Windows\Fonts\msyh.ttc', size=16)
# Axis tick format
plt.xticks(fontproperties=my_font, rotation=-70, size=9)
plt.yticks(size=14)
# x label: "k per month"
plt.xlabel('K/月', fontproperties=my_font, size=12)
plt.grid(alpha=0)
# Title: "Most salaries fall in the 11~16k range"
plt.title('薪酬在11~16k分布最多', fontproperties=my_font, size=14)
plt.margins(0, 0.2)
# Get the current axes
ax = plt.gca()
# Hide the right and top borders
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
# Label each bar with its value
for rect_bar in rects:
    height = rect_bar.get_height()
    plt.text(rect_bar.get_x() + rect_bar.get_width() / 2, height + 2, str(height), ha='center', size=12)
plt.grid(alpha=0)
plt.savefig('city3.png', bbox_inches='tight')
plt.show()

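Each company states salary as a range, and the ranges overlap, so to simplify the analysis I use the midpoint of each range.
The chart shows clearly that most companies offer data analysts 11-26k per month, with 11-16k per month being the most common band; only a minority of positions pay more.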
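The midpoint calculation can be made tolerant of values that are not a plain range. Below is a minimal sketch, assuming salary strings of the form `10k-20k`; `salary_midpoint` is a hypothetical helper, and returning `None` for anything else (for example a "negotiable" entry) is a design choice of this sketch rather than something the original code does:

```python
import re

# Two numbers separated by a hyphen, each optionally followed by k or K
_RANGE = re.compile(r"(\d+(?:\.\d+)?)\s*[kK]?\s*-\s*(\d+(?:\.\d+)?)\s*[kK]?")


def salary_midpoint(text):
    """Return the midpoint in k per month of a string like '10k-20k', or None if it does not match."""
    match = _RANGE.search(str(text))
    if match is None:
        return None
    low, high = float(match.group(1)), float(match.group(2))
    return (low + high) / 2


print(salary_midpoint("10k-20k"))
print(salary_midpoint("15K-25K"))
print(salary_midpoint("面议"))
```

Applied with `lagou_data['salary'].apply(salary_midpoint)`, unparseable rows would become missing values instead of raising an exception.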

Taken together, the overall salary level for data analysts is respectable, and from that standpoint it is a good career choice.

3.4. Salary distribution by city

# Salary distribution by city
lagou_groupCity_salary = lagou_data.groupby(by='city')
lagou_groupCity_salary.get_group('北京')['avgSalary']
df_y = []
df_x = []
# Top ten cities by number of postings
for value in lagou_groupCity[0:10]['city']:
    data = lagou_groupCity_salary.get_group(value)['avgSalary'].values
    df_x.append(value)
    df_y.append(data)
# Save to Excel
# pd.DataFrame({'city': df_x, 'salary': df_y}).set_index('city').to_excel('city_salary.xlsx')
# Plot
plt.style.use('seaborn-whitegrid')
plt.figure(figsize=(6.5, 4), dpi=80)
plt.boxplot(df_y, labels=df_x)
# Font for Chinese text (Windows path)
my_font = font_manager.FontProperties(fname=r'C:\Windows\Fonts\msyh.ttc', size=16)
# Axis tick format
plt.xticks(fontproperties=my_font, rotation=0, size=9)
plt.yticks(size=14)
plt.grid(alpha=0)
# Title: "Salary distribution by city"
plt.title('不同城市薪酬分布情况', fontproperties=my_font, size=14)
plt.margins(0, 0.2)
# Get the current axes
ax = plt.gca()
# Hide the right and top borders
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
plt.grid(alpha=0.2)
plt.savefig('city4.png', bbox_inches='tight')
plt.show()

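I focus on the ten cities with the most postings and ignore cities with little demand. In the chart, salaries in these ten cities are fairly concentrated overall, consistent with the nationwide salary distribution seen earlier. Beijing has the highest median salary, at about 21k, followed by Shanghai and Shenzhen at about 20k, and then Guangzhou and Hangzhou.
In addition, Beijing, Shanghai, Guangzhou, Shenzhen and Hangzhou show many more outliers, that is very high salaries, than the other cities, which indicates that more companies in these cities are able to offer higher pay.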
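The outliers in a box plot are the points beyond the whiskers, which Matplotlib places by default at 1.5 times the interquartile range from the quartiles. A short sketch of that rule on made-up numbers; `iqr_outliers` is a hypothetical helper, and its quartiles come from `np.percentile`, so a borderline point could be classified slightly differently than in the plot:

```python
import numpy as np


def iqr_outliers(values, whis=1.5):
    """Return the values lying more than `whis` * IQR outside the first and third quartiles."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return values[(values < lower) | (values > upper)]


# Made-up monthly salaries in k, for illustration only
sample = [10, 12, 15, 18, 20, 22, 25, 60]
outliers = iqr_outliers(sample)
print(outliers)
```

Counting `len(iqr_outliers(...))` per city would turn the visual impression of "more outliers" into a number.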

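In terms of pay, staying in Beijing is a good choice for a data analyst.
For those aiming at higher salaries, Beijing, Shanghai, Guangzhou, Shenzhen and Hangzhou offer the best opportunities.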

3.5. Salary distribution by work experience

# Salary distribution by work experience
lagou_groupWork_salary = lagou_data.groupby(by='workYear', as_index=False)
lagou_groupWorkYear = lagou_data.groupby(by='workYear', as_index=False).count()
# Assign a hand-picked index so that sorting it in descending order arranges the
# groups from the least to the most required experience
lagou_groupWorkYear.index = [3, 0, 4, 2, 1, 6, 5]
# lagou_groupWorkYear.index = [2, 3, 1, 0, 5, 4]
lagou_groupWorkYear.sort_index(ascending=False, inplace=True)
lagou_groupWorkYear
df_y = []
df_x = []
for value in lagou_groupWorkYear['workYear']:
    data = lagou_groupWork_salary.get_group(value)['avgSalary'].values
    df_x.append(value)
    df_y.append(data)
# Save to Excel
# pd.DataFrame({'workYear': df_x, 'salary': df_y}).set_index('workYear').to_excel('workYear_salary.xlsx')
# Plot
plt.style.use('seaborn-whitegrid')
plt.figure(figsize=(6.5, 4), dpi=80)
plt.boxplot(df_y, labels=df_x)
# Font for Chinese text (Windows path)
my_font = font_manager.FontProperties(fname=r'C:\Windows\Fonts\msyh.ttc', size=16)
# Axis tick format
plt.xticks(fontproperties=my_font, rotation=0, size=9)
plt.yticks(size=14)
plt.grid(alpha=0)
# Title: "Salary distribution by work experience"
plt.title('不同工作经验的薪资分布', fontproperties=my_font, size=14)
plt.margins(0, 0.2)
# Get the current axes
ax = plt.gca()
# Hide the right and top borders
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
plt.grid(alpha=0.2)
plt.savefig('city5.png', bbox_inches='tight')
plt.show()

As with other jobs, salaries for data analysts rise steadily with work experience.

3.6. Education requirements for data analyst positions

# Education requirements for data analysts
lagou_group_educaiotn = lagou_data.groupby(by='education', as_index=False).count()
lagou_group_educaiotn
# Data
label_list = lagou_group_educaiotn['education']  # label of each wedge
size = lagou_group_educaiotn['companyName']      # size of each wedge
color = ["r", "g", "b", 'c', 'y']                # color of each wedge
explode = [0, 0.4, 0, 0, 0]                      # offset of each wedge
plt.style.use('seaborn-whitegrid')
# The figure size must be set before drawing
plt.figure(figsize=(6.5, 4), dpi=80)
# Draw the pie chart.
# patches: the wedge objects; l_text: the label texts; p_text: the percentage texts (autopct)
patches, l_text, p_text = plt.pie(size, explode=explode, colors=color, labels=label_list,
                                  labeldistance=1.1, autopct="%1.1f%%", shadow=False, startangle=0,
                                  textprops={'fontsize': 10}, pctdistance=0.8)
# Font for Chinese text (Windows path)
my_font = font_manager.FontProperties(fname=r'C:\Windows\Fonts\msyh.ttc', size=12)
# Title: "More than 80% of postings ask for a bachelor's degree"
plt.title('本科学历需求占比超80%', fontproperties=my_font, size=14)
for t in l_text:
    t.set_fontproperties(my_font)
# Recolor only the first wedge
for p in patches:
    p.set_color('pink')
    break
for l in l_text:
    l.set_size(10)
# Legend
# plt.legend(prop=my_font, loc='center right', bbox_to_anchor=(1.1, 0, 0.4, 1))
# help(plt.legend)
plt.savefig('city6.png', bbox_inches='tight')
plt.show()

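Across the 29 cities, 82.5% of data analyst postings ask for a bachelor's degree; there is little market demand for candidates whose education is below bachelor's level.

This shows that companies prefer data analysts with a bachelor's degree or higher, and that this is where most of the demand lies.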
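The percentages shown in the pie chart can also be computed directly, which makes it easy to add up the share for "bachelor's or above". A brief sketch on made-up values; `share_by_category` is a hypothetical helper and the sample does not reproduce the 82.5% figure:

```python
import pandas as pd


def share_by_category(values):
    """Return the percentage of rows per category, rounded to one decimal place."""
    return (pd.Series(list(values)).value_counts(normalize=True) * 100).round(1)


# Made-up education requirements for illustration only
sample = ["本科"] * 8 + ["硕士"] + ["大专"]
share = share_by_category(sample)
bachelor_or_above = share[["本科", "硕士"]].sum()
print(share)
print(bachelor_or_above)
```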

3.7. Education requirements by work experience

# Education requirements for each level of work experience
# Note: the labels '5-' and '以上' appear truncated in this copy of the article;
# they are kept exactly as found.
lagou_group_Education_Work = lagou_data.groupby(by=['workYear', 'education'], as_index=False).count()
lagou_group_Education_Work[lagou_group_Education_Work['workYear'] == '5-']['education']
lagou_workYear = lagou_group_Education_Work.groupby(by='workYear', as_index=False).count()['workYear']
df_y = []
df_x = []

def get_data(value):
    df_x.append(value)
    lagou_education = lagou_group_Education_Work[lagou_group_Education_Work['workYear'] == value][['workYear', 'education', 'companyName']]
    df_y.append(lagou_education)

lagou_workYear.apply(get_data)
# Pad the groups by hand so that each one has a row for every education level
df_y[0].loc[4] = ['1-3年', '博士', 0]
df_y[1].loc[4] = ['以上', '不限', 0]
df_y[1].loc[5] = ['以上', '大专', 0]
df_y[1].loc[6] = ['以上', '本科', 4]
df_y[1].loc[7] = ['以上', '硕士', 0]
df_y[1].loc[8] = ['以上', '博士', 0]
df_y[2].loc[9] = ['1年以下', '博士', 0]
df_y[3].loc[13] = ['3-5年', '博士', 0]
df_y[4].drop(index=14, inplace=True)
df_y[4].loc[18] = ['5-', '博士', 1]
df_y[5].loc[22] = ['不限', '博士', 0]
df_y[6].loc[26] = ['应届毕业生', '博士', 0]
# x-axis order, from the least to the most required experience
df_x = ['不限', '应届毕业生', '1年以下', '1-3年', '3-5年', '5-', '以上']
a = []
a.append(list(df_y[5]['companyName']))
a.append(list(df_y[6]['companyName']))
a.append(list(df_y[2]['companyName']))
a.append(list(df_y[0]['companyName']))
a.append(list(df_y[3]['companyName']))
a.append(list(df_y[4]['companyName']))
a.append(list(df_y[1]['companyName']))
b = pd.DataFrame(a)
# Plot
plt.style.use('seaborn-whitegrid')
plt.figure(figsize=(6.5, 4), dpi=80)
my_font = font_manager.FontProperties(fname=r'C:\Windows\Fonts\msyh.ttc', size=12)
# Stacked bar chart: one layer per education level, each drawn on top of the previous ones
x = np.array([1, 2, 3, 4, 5, 6, 7])
width = 0.6
colors = ['b', 'g', 'c', 'm', 'y']
labels = ['不限', '大专', '本科', '硕士', '博士']
bottom = np.zeros(len(x))
for col, (c, label) in enumerate(zip(colors, labels)):
    heights = np.array(b.loc[:, col])
    plt.bar(x, heights, bottom=bottom, color=c, width=width, label=label)
    bottom = bottom + heights
# Axis limits
# plt.xlim(-2, 22)
# plt.ylim(0, 280)
plt.xticks(x, df_x, fontproperties=my_font, rotation=0, size=9)
# Legend
plt.legend(loc='upper right', prop=my_font)
# Title: "A bachelor's degree is the most common requirement at every experience level"
plt.title('不同工作经验对本科学历需求最多', fontproperties=my_font, size=14)
plt.margins(0, 0.2)
# Get the current axes
ax = plt.gca()
# Hide the right and top borders
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
plt.grid(alpha=0)
plt.savefig('city7.png', bbox_inches='tight')
plt.show()

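As expected, a bachelor's degree is the main education requirement at every level of work experience.
This makes the importance of education for data analysts clear: even for candidates with work experience, companies almost always require a bachelor's degree or higher.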
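The table behind the stacked bar chart can also be built without padding the missing combinations by hand. The sketch below uses `pd.crosstab` followed by `reindex`, which is a different technique from the manual `.loc` assignments in the code above; `education_by_experience`, the label orders and the sample rows are illustrative assumptions:

```python
import pandas as pd


def education_by_experience(df, exp_order, edu_order):
    """Count postings per (workYear, education) pair, filling absent pairs with zero."""
    table = pd.crosstab(df["workYear"], df["education"])
    return table.reindex(index=exp_order, columns=edu_order, fill_value=0)


# Made-up rows for illustration only
sample = pd.DataFrame({
    "workYear": ["1-3年", "1-3年", "3-5年", "不限"],
    "education": ["本科", "大专", "本科", "硕士"],
})
table = education_by_experience(
    sample,
    exp_order=["不限", "1-3年", "3-5年"],
    edu_order=["大专", "本科", "硕士", "博士"],
)
print(table)
```

Calling `table.plot(kind="bar", stacked=True)` on the result would then draw a stacked bar chart with one layer per education level.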

3.8. Skill requirements for data analyst positions

# Skill requirements
from wordcloud import WordCloud
import jieba
import jieba.analyse
import re

a = lagou_data['describition']
# Load the custom dictionary
jieba.load_userdict('userdict.txt')
# Stop-word file
stopwords_path = 'stopwords.txt'
mywordList = []

def data_csv(value):
    # Extract the top 20 keywords from one job description
    wordcloud = jieba.analyse.extract_tags(sentence=str(value), topK=20, withWeight=False, allowPOS=())
    # Join the keywords with '/'
    listStr = '/'.join(wordcloud)
    # Read the stop-word list, one word per line
    with open(stopwords_path, encoding="utf8") as text_stop:
        text_stop_text = text_stop.read()
    text_stop_list = text_stop_text.split("\n")
    # Drop stop words and single-character tokens.
    # The original test, `not(myword.split()) in text_stop_list`, compared a list
    # against the stop words and therefore never filtered anything.
    for myword in listStr.split('/'):
        if myword.strip() not in text_stop_list and len(myword.strip()) > 1:
            mywordList.append(myword)

a.apply(data_csv)
# print(mywordList)
# Join the list into one string and replace Chinese characters with spaces,
# which leaves the skill names written in Latin script
text = re.sub('[\u4e00-\u9fa5]', ' ', '/'.join(mywordList))
text = re.sub('nan', '', text)
text = re.sub('/', ' ', text)
text = re.sub('[0-9]', '', text)
# Convert to a list
ls = text.split(' ')
# Remove empty entries
ls = [x for x in ls if x != '']
plt.figure(figsize=(6.5, 4), dpi=80)
# Pass the words as one space-separated string (the original passed str(ls), the repr of the list)
wcd = WordCloud(font_path=r'C:\Windows\Fonts\msyh.ttc', background_color='white').generate(' '.join(ls))
plt.imshow(wcd)
plt.axis("off")
plt.savefig('city8.png', bbox_inches='tight')
plt.show()

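The word cloud shows that nearly every company expects data analysts to know SQL, Python and Excel.
The most frequently requested skills are Python, SQL, Excel, SAS, SPSS, Tableau, Hive, Hadoop, BI, Spark and PPT. Java, Linux and Office also appear.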
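A word cloud conveys the ranking only roughly. To get actual counts, the Latin-script tokens in each description can be matched against a list of skill names. The sketch below uses only the standard library rather than jieba; the skill list, `count_skills` and the sample descriptions are illustrative assumptions, and each posting is counted at most once per skill:

```python
import re
from collections import Counter

# Hypothetical skill list for illustration; extend as needed.
SKILLS = ["python", "sql", "excel", "sas", "spss", "tableau", "hive", "hadoop", "spark"]

# A token is a run of ASCII letters, optionally containing + or # (for names like c++)
_TOKEN = re.compile(r"[a-z][a-z+#]*")


def count_skills(descriptions, skills=SKILLS):
    """Count in how many descriptions each skill name appears as a whole token."""
    wanted = set(skills)
    counts = Counter()
    for text in descriptions:
        tokens = set(_TOKEN.findall(str(text).lower()))
        counts.update(tokens & wanted)
    return counts


# Made-up descriptions for illustration only
sample = [
    "熟练使用SQL、Python，熟悉Excel",
    "精通 SQL 和 Hive/Spark",
    "Python, python, SQL",
]
counts = count_skills(sample)
print(counts.most_common())
```

Matching whole tokens avoids counting a short name such as `sas` inside a longer word.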

3.9. Effect of job skills on salary

# Effect of different skills on salary
from collections import Counter

lagou_groupSalary = lagou_data.groupby(by='salary', as_index=False)
lagou_group_salary = lagou_data.groupby(by='salary', as_index=False).count()
# `dictList` is not defined in the code shown. ASSUMPTION: it is a list of
# (skill, count) pairs sorted by frequency, rebuilt here from the word list `ls` of section 3.8.
dictList = Counter(x.lower() for x in ls).most_common()
dict_skill = {}
for data in dictList[:20]:
    dict_skill[data[0]] = []
for data in lagou_group_salary['salary']:
    # Convert the salary range into its midpoint
    salary = data.replace('k', '').replace('K', '')
    avg = (float(salary.split('-')[0]) + float(salary.split('-')[1])) / 2
    a = lagou_groupSalary.get_group(data)[['describition']].values
    value = re.sub(' ', '', str(a))
    # Keyword extraction
    wordcloud = jieba.analyse.extract_tags(sentence=value, topK=100, withWeight=False, allowPOS=())
    # Join the keywords and replace Chinese characters with spaces
    text = re.sub('[\u4e00-\u9fa5]', ' ', '/'.join(wordcloud))
    text = text.lower()
    text = re.sub('nan', '', text)
    text = re.sub('[0-9]', '', text)
    text = re.sub('/', ' ', text)
    word_list = text.split(' ')
    # print([x for x in word_list if x != ''])
    for key in list(dict_skill.keys()):
        for value_skill in [x for x in word_list if x != '']:
            if value_skill == key:
                dict_skill.get(key).append(avg)

# Mean salary and demand count for each skill
dict_avg = {}  # mean salary
dict_num = {}  # number of matches

def average(num):
    nsum = 0.0
    for value in num:
        nsum += value
    return nsum / len(num)

for key in list(dict_skill.keys()):
    # Skip skills that matched nothing, which would otherwise divide by zero
    if not dict_skill.get(key):
        continue
    dict_avg[key] = average(dict_skill.get(key))
    dict_num[key] = len(dict_skill.get(key))

# Plot: bubble size follows the rank of each skill's demand count
plt.style.use('seaborn-whitegrid')
plt.figure(figsize=(20, 8), dpi=80)
plt.scatter(list(dict_avg.keys()), list(dict_avg.values()), s=pd.Series(list(dict_num.values())).rank() * 50)
# Font for Chinese text (Windows path)
my_font = font_manager.FontProperties(fname=r'C:\Windows\Fonts\msyh.ttc', size=16)
# Axis tick format
plt.xticks(fontproperties=my_font, rotation=0, size=12)
plt.yticks(size=14)
plt.grid(alpha=0)
# Leave room on the left so the first bubble is not cut off
plt.xlim(-1)
# plt.ylim(0)
# Title: "Salary uplift by job skill"
plt.title('不同工作技能的薪资提升', fontproperties=my_font, size=16)
plt.margins(0, 0.2)
# Get the current axes
ax = plt.gca()
# Hide the right and top borders
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
# Move the left axis so the first bubble is not cut off
ax.spines['left'].set_position(('data', -1))
plt.grid(alpha=0.5)
plt.savefig('city9.png', bbox_inches='tight')
plt.show()

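I took the 20 most frequently requested skills and computed the mean salary associated with each one. In the chart above, the size of each bubble represents how much that skill is in demand. Among these 20 skills, Spark, Hive and Hadoop have the highest mean salaries. All three are used for distributed data processing, which indicates that companies pay more for candidates who have mastered big-data technologies. So for anyone aiming at a higher salary, learning large-scale and distributed data processing is a good direction. Mastering SQL, Python, SAS and SPSS, in turn, meets the skill requirements of a larger number of companies.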
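The mean salary per skill can also be computed posting by posting. Note that this differs from the code above, which groups postings by salary string and extracts keywords once per group. The following standard-library sketch assumes each posting is available as a (description, midpoint salary) pair; `salary_by_skill` and the sample data are hypothetical:

```python
import re
from statistics import mean

# A token is a run of ASCII letters, optionally containing + or #
_TOKEN = re.compile(r"[a-z][a-z+#]*")


def salary_by_skill(postings, skills):
    """Map each skill to (mean salary, number of postings) over the postings that mention it."""
    buckets = {skill: [] for skill in skills}
    for description, salary in postings:
        tokens = set(_TOKEN.findall(str(description).lower()))
        for skill in tokens.intersection(buckets):
            buckets[skill].append(salary)
    # Skills that never occur are left out instead of dividing by zero
    return {skill: (mean(vals), len(vals)) for skill, vals in buckets.items() if vals}


# Made-up postings for illustration only: (description, midpoint salary in k)
sample = [
    ("SQL Python", 15.0),
    ("Spark Hive SQL", 30.0),
    ("Excel SQL", 10.0),
    ("Spark Python", 25.0),
]
result = salary_by_skill(sample, ["sql", "python", "spark", "excel", "sas"])
print(result)
```

The second element of each pair plays the role of the bubble size in the chart.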

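4. Conclusions

1. Demand for data analysts is heavily concentrated in the four first-tier cities (Beijing, Shanghai, Guangzhou, Shenzhen) plus Hangzhou, and competition there is correspondingly intense.
2. Overall, demand is fairly strong for analysts with more than one year of experience, especially 3-5 years.
3. Most companies offer data analysts 11-26k per month, with 11-16k per month the most common band; a minority earn more.
4. In terms of pay, Beijing is a good place for a data analyst to build a career, followed by Shenzhen and Shanghai.
5. For those aiming at higher salaries, Beijing, Shanghai, Guangzhou, Shenzhen and Hangzhou offer the best opportunities.
6. Salaries for data analysts rise steadily with work experience.
7. Companies prefer data analysts with a bachelor's degree or higher; even for candidates with work experience, a bachelor's degree or higher is almost always required. Education matters a great deal for data analyst positions.
8. The most frequently requested skills are Python, SQL, Excel, SAS, SPSS, Tableau, Hive, Hadoop, BI, Spark and PPT; SQL and Excel are effectively mandatory.
9. Big-data skills command higher salaries.
10. Mastering SQL, Python, SAS and SPSS meets the requirements of a larger number of companies.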

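5. Discussion and summary

This analysis focused on tool-type skills and did not cover theory (such as mathematics) or business knowledge. It is not fine-grained enough, and the treatment of job skills in particular is still rough.
The code also has plenty of room for optimization, and I will keep revising it.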

Special note

This analysis is only for the author's own study and summary.
