
Java Crawler: Scraping Maoyan Movie Information and Saving It to Excel

Date: 2021-05-08 05:50:57


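Contents: Scraping targets and content; Wrapping the extracted information in a class; Getting the total number of reviewers, the want-to-watch count, and the watched count; Note on how to find the data for these counts; Extracting the image link, release time, and other fields; Saving the results to Excel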

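Scraping targets and content

Targets: the TOP100 board, the Most Anticipated board, the Hot Word-of-Mouth board, the Domestic Box Office board, and the North American Box Office board.

Content: poster image, movie title, release time, leading cast, movie link, rating, total number of reviewers, number of people who want to watch, and number of people who have watched.

Maven dependencies used: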

<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.58</version>
</dependency>
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.10</version>
</dependency>
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.11.3</version>
</dependency>
<!-- /artifact/org.apache.poi/poi -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.16</version>
</dependency>

Wrapping the extracted information in a class

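public class Mao {
    private String picLink;     // poster image link
    private String movie;       // movie title
    private String releaseTime; // release time
    private String star;        // cast
    private String movieLink;   // link to the movie page
    private String score;       // rating
    private String snum;        // total number of reviewers
    private String watched;     // number of people who have watched
    private String num;         // number of people who want to watch (the crawler passes the "wish" value here)

    public Mao(String picLink, String movie, String releaseTime, String star,
               String movieLink, String score, String snum, String watched, String num) {
        this.picLink = picLink;
        this.movie = movie;
        this.releaseTime = releaseTime;
        this.star = star;
        this.movieLink = movieLink;
        this.score = score;
        this.snum = snum;
        this.watched = watched;
        this.num = num;
    }

    // Getters used when writing the rows to Excel.
    public String getPicLink() { return picLink; }
    public String getMovie() { return movie; }
    public String getReleaseTime() { return releaseTime; }
    public String getStar() { return star; }
    public String getMovieLink() { return movieLink; }
    public String getScore() { return score; }
    public String getSnum() { return snum; }
    public String getWatched() { return watched; }
    public String getNum() { return num; }
}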

Getting the total number of reviewers, the want-to-watch count, and the watched count

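import java.util.ArrayList;
import java.util.List;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public List<String> getComment(String movieLink) {
    List<String> list = new ArrayList<>(3);
    // The movie ID is the last path segment of the movie link.
    String movieId = movieLink.substring(movieLink.lastIndexOf("/") + 1);
    // Only the path and query of the endpoint are given here; the host is not shown.
    String request = "/asgard/asgardapi/review/realtime/data.json?movieId=" + movieId;
    try (CloseableHttpClient client = HttpClients.createDefault();
         CloseableHttpResponse response = client.execute(new HttpGet(request))) {
        int status = response.getStatusLine().getStatusCode();
        if (status == 200) {
            HttpEntity entity = response.getEntity();
            if (entity != null) {
                String body = EntityUtils.toString(entity, "UTF-8");
                JSONObject data = JSON.parseObject(body).getJSONObject("data");
                list.add(data.getString("snum"));    // total number of reviewers
                list.add(data.getString("watched")); // number of people who have watched
                list.add(data.getString("wish"));    // number of people who want to watch
            }
        } else {
            System.out.println("Request " + request + " returned status code " + status);
        }
    } catch (Exception e) {
        // No response may exist when an exception is thrown, so report the exception itself.
        System.out.println("Failed to process " + request + ": " + e.getMessage());
    }
    return list;
}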
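The ID extraction at the start of getComment can be isolated and made defensive. The following is a minimal sketch, not the author's code: the class MovieIdSketch, the method extractMovieId, and the example.com links are hypothetical, and it assumes that a movie link ends in a numeric ID.

```java
import java.util.Optional;

/** Hypothetical helper: pulls the trailing numeric ID out of a movie link. */
final class MovieIdSketch {

    private MovieIdSketch() {
    }

    /**
     * Returns the text after the last '/' when it consists only of digits,
     * or an empty Optional for null, empty, or non-numeric input.
     */
    static Optional<String> extractMovieId(String movieLink) {
        if (movieLink == null) {
            return Optional.empty();
        }
        // Ignore a single trailing slash so ".../1218029/" still yields the ID.
        String trimmed = movieLink.endsWith("/")
                ? movieLink.substring(0, movieLink.length() - 1)
                : movieLink;
        String candidate = trimmed.substring(trimmed.lastIndexOf('/') + 1);
        boolean numeric = !candidate.isEmpty()
                && candidate.chars().allMatch(Character::isDigit);
        return numeric ? Optional.of(candidate) : Optional.empty();
    }
}
```

With this helper, a caller could skip the HTTP request entirely when extractMovieId returns an empty Optional, instead of sending a request with a malformed movieId.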

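Note on how to find the data for the total number of reviewers, the want-to-watch count, and the watched count

In the Opera browser, right-click and choose Inspect Element, find Devices under Audits, and change Desktop to Mobile. Alternatively, in IE press F12 and switch the desktop mode to Windows Phone. Then refresh the page.

Look for the data endpoint /asgard/asgardapi/review/realtime/data.json?movieId=1218029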
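Since the endpoint only shows up when the browser emulates a mobile device, a request to it may need to look like it comes from a mobile browser. The sketch below uses only the JDK and is not from the original post: the class MobileRequestSketch, the base URL passed in by the caller, and the User-Agent value are all assumptions, and whether the server actually checks the User-Agent is not confirmed by the text.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper: prepares a request to the review endpoint with a mobile User-Agent. */
final class MobileRequestSketch {

    // Path and query as given in the text; the host is not given there.
    static final String REVIEW_PATH = "/asgard/asgardapi/review/realtime/data.json?movieId=";

    // Illustrative mobile User-Agent string (an assumption, not taken from the source).
    static final String MOBILE_USER_AGENT =
            "Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) "
            + "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1";

    private MobileRequestSketch() {
    }

    /** Joins a caller-supplied base URL, the endpoint path, and the movie ID. */
    static String buildReviewUrl(String baseUrl, String movieId) {
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return base + REVIEW_PATH + movieId;
    }

    /** Configures, but does not send, a GET request that carries the mobile User-Agent. */
    static HttpURLConnection prepare(String baseUrl, String movieId) {
        try {
            HttpURLConnection conn = (HttpURLConnection)
                    URI.create(buildReviewUrl(baseUrl, movieId)).toURL().openConnection();
            conn.setRequestMethod("GET");
            conn.setRequestProperty("User-Agent", MOBILE_USER_AGENT);
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Nothing is sent until the caller reads from the connection, for example with getResponseCode() or getInputStream(). The same header could be set on an HttpGet with setHeader if Apache HttpClient is used instead.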

Extracting the image link, release time, and other fields

import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// lists, getComment and writeToExcel are defined outside this snippet.
public class Spildermao implements Runnable {
    String request;

    public Spildermao(String request) {
        this.request = request;
    }

    public void run() {
        try {
            Document doc = Jsoup.connect(request).get();
            // Each <dd> under .board-wrapper is one movie entry on the board.
            Elements elements = doc.select(".board-wrapper > dd");
            for (Element element : elements) {
                String src = element.select(".board-img").attr("data-src");
                String picLink = src.substring(0, src.lastIndexOf("@")); // image link without the "@..." suffix
                String st = element.select(".star").text();
                String star = st.substring(st.indexOf(":") + 1);         // text after the label
                String re = element.select(".releasetime").text();
                String releaseTime = re.substring(re.indexOf(":") + 1);  // text after the label
                String movie = element.select(".name").text();
                String movieLink = element.select(".name > a").attr("abs:href");
                String score = element.select(".score").text();
                List<String> list = getComment(movieLink);
                String snum = list.get(0);    // total number of reviewers
                String watched = list.get(1); // number of people who have watched
                String wish = list.get(2);    // number of people who want to watch
                lists.add(new Mao(picLink, movie, releaseTime, star, movieLink, score, snum, watched, wish));
            }
            writeToExcel();
        } catch (Exception e) {
            System.out.println("Link: " + request + ", processing failed");
        }
    }
}
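The string handling in run() has two edge cases: substring(0, src.lastIndexOf("@")) throws when the image URL contains no "@", and indexOf(":") finds nothing if the page labels use a full-width colon, in which case the label stays in the text. The following sketch handles both. It is an illustration only: the class FieldTextSketch and its method names are hypothetical, and the claim that the page might use full-width colons is an assumption.

```java
/** Hypothetical text helpers for the fields scraped from a board entry. */
final class FieldTextSketch {

    private FieldTextSketch() {
    }

    /**
     * Returns the trimmed text after the first colon, accepting both the ASCII
     * colon and the full-width colon (U+FF1A). If neither occurs, the trimmed
     * input is returned unchanged.
     */
    static String afterLabel(String text) {
        int ascii = text.indexOf(':');
        int fullWidth = text.indexOf('\uFF1A');
        int idx;
        if (ascii < 0) {
            idx = fullWidth;
        } else if (fullWidth < 0) {
            idx = ascii;
        } else {
            idx = Math.min(ascii, fullWidth);
        }
        return idx < 0 ? text.trim() : text.substring(idx + 1).trim();
    }

    /** Drops a trailing "@..." suffix from an image URL; a URL without '@' is returned as is. */
    static String stripImageSuffix(String src) {
        int at = src.lastIndexOf('@');
        return at < 0 ? src : src.substring(0, at);
    }
}
```

In run(), afterLabel(st) and stripImageSuffix(src) could stand in for the two substring expressions, so that one entry with an unexpected format does not abort the whole board.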

Saving the results to Excel

import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.net.URL;
import javax.imageio.ImageIO;
import org.apache.poi.hssf.usermodel.*;

public void writeToExcel() {
    HSSFWorkbook wb = new HSSFWorkbook();
    HSSFSheet sheet = wb.createSheet("zhangling");
    HSSFRow row = sheet.createRow(0);

    // Column widths. POI's character-width formula is:
    // width = (characterCount * (characterWidth - 1) + 5) / (characterWidth - 1) * 256, then rounded.
    int[] widths = {20, 30, 30, 30, 30, 15, 15, 15, 15};
    for (int col = 0; col < widths.length; col++) {
        sheet.setColumnWidth(col, widths[col] * 256);
    }

    // Header row.
    HSSFCellStyle style = wb.createCellStyle();
    style.setAlignment(HSSFCellStyle.ALIGN_CENTER); // center horizontally
    String[] headers = {"picture", "movie", "movieLink", "star", "releaseTime",
                        "score", "snum", "watched", "num"};
    for (int col = 0; col < headers.length; col++) {
        HSSFCell cell = row.createCell(col);
        cell.setCellValue(headers[col]);
        cell.setCellStyle(style);
    }

    if (!lists.isEmpty()) {
        HSSFPatriarch patriarch = sheet.createDrawingPatriarch(); // may only be created once
        HSSFCellStyle style1 = wb.createCellStyle();
        style1.setVerticalAlignment(HSSFCellStyle.VERTICAL_CENTER); // center vertically
        style1.setWrapText(true);                                   // allow line wrapping
        for (int i = 0; i < lists.size(); i++) {
            Mao mao = lists.get(i);
            try {
                row = sheet.createRow(sheet.getLastRowNum() + 1);
                row.setHeight((short) (150 * 20)); // POI row height = Excel row height * 20
                // Column 0 holds the picture; the text fields start at column 1.
                String[] values = {mao.getMovie(), mao.getMovieLink(), mao.getStar(),
                                   mao.getReleaseTime(), mao.getScore(), mao.getSnum(),
                                   mao.getWatched(), mao.getNum()};
                for (int col = 0; col < values.length; col++) {
                    HSSFCell cell = row.createCell(col + 1);
                    cell.setCellValue(values[col]);
                    cell.setCellStyle(style1);
                }

                // Download the poster and re-encode it as JPEG bytes.
                BufferedImage bufferImg = ImageIO.read(new URL(mao.getPicLink()));
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                ImageIO.write(bufferImg, "jpg", out);
                byte[] data = out.toByteArray();

                // HSSFClientAnchor(dx1, dy1, dx2, dy2, col1, row1, col2, row2):
                // dx1, dy1: x and y offsets within the starting cell
                // dx2, dy2: x and y offsets within the ending cell
                //           (without these offsets the picture did not show up in Excel at first)
                // col1, row1: column and row index of the starting cell, counted from 0
                // col2, row2: column and row index of the ending cell, counted from 0
                HSSFClientAnchor anchor =
                        new HSSFClientAnchor(0, 0, 1023, 255, (short) 0, i + 1, (short) 0, i + 1);
                patriarch.createPicture(anchor, wb.addPicture(data, HSSFWorkbook.PICTURE_TYPE_JPEG));

                // The workbook is rewritten after every row, so finished rows are on disk early.
                try (FileOutputStream fos = new FileOutputStream(new File("C:Users/Xiao Mi/Desktop/nm.xls"))) {
                    wb.write(fos);
                }
                System.out.println("Done: " + mao.getMovieLink());
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
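The factors 256 and 20 in writeToExcel are unit conversions: POI's setColumnWidth takes units of 1/256 of a character width, and setHeight takes units of 1/20 of a point. A minimal sketch of those conversions with range checks follows. The class PoiUnitsSketch and its method names are hypothetical. The 255-character limit is the one POI enforces in setColumnWidth, and the row-height limit here simply reflects the range of a short.

```java
/** Hypothetical helpers that convert human-readable sizes into POI's units. */
final class PoiUnitsSketch {

    // setColumnWidth rejects widths above 255 characters.
    static final int MAX_COLUMN_CHARS = 255;

    private PoiUnitsSketch() {
    }

    /** Width in characters to POI column-width units (1/256 of a character). */
    static int columnWidthUnits(int characters) {
        if (characters < 0 || characters > MAX_COLUMN_CHARS) {
            throw new IllegalArgumentException(
                    "column width must be between 0 and " + MAX_COLUMN_CHARS + " characters");
        }
        return characters * 256;
    }

    /** Height in points to POI row-height units (1/20 of a point), which must fit in a short. */
    static short rowHeightUnits(int points) {
        if (points < 0 || points > Short.MAX_VALUE / 20) {
            throw new IllegalArgumentException("row height out of range: " + points);
        }
        return (short) (points * 20);
    }
}
```

For the values used above, columnWidthUnits(30) gives 7680 and rowHeightUnits(150) gives 3000, the same numbers as the inline expressions 30*256 and 150*20.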

