一、计算查找重复的号码,查找长度不对的号码(应该用trim过滤下,待完善)
输入的excel表只有一列,列名为id,值是身份证号
import pandas as pdpath = r'C:\Users\BackUp\Desktop\培训人员名单(1).xlsx'data = pd.read_excel(path)df2 = pd.DataFrame()df2['id_18'] = data['id']df2['id_17'] = data['id'].apply(lambda x:x[:17])df2['id_last'] = data['id'].apply(lambda x:x[-1])df2['dup18'] = df2.id_18.duplicated()df2['dup17'] = df2.id_17.duplicated()print('重复项:')print(df2[df2.id_18.duplicated()])print('前17位存在非法字符:')print(df2[df2.iloc[:,1].str.contains(pat='\s')])print('最后一位不是数字或者不是Xx:')print(df2[df2.iloc[:,2].str.contains(pat='[^0-9Xx]')])print('长度不是18位:')print(df2[df2.iloc[:,0].str.len()!=18])
二、在将第一步查找出的长度错误的号码修改正确后,检查第18位校验位与前17位的校验和是否一致,不一致说明号码填写有误。
import numpy as nplistid17 = df2['id_17'].apply(lambda x:list(map(int,list(x))))X = np.array(listid17.tolist())X = np.matrix(X)theta = np.matrix(np.array([7,9,10,5,8,4,2,1,6,3,7,9,10,5,8,4,2]))y = X*theta.T%11laststr = ['1','0','X','9','8','7','6','5','4','3','2']ar = [laststr[int(i)] for i in y]s = pd.Series(ar)df2['check'] = sdf2['id_last'] = df2['id_last'].str.upper()df2['checkfail'] = df2['id_last']!=df2['check']print(df2[df2['id_last']!=df2['check']])df2.to_excel(r'D:\我的文档\文件\在线培训\结果1130.xlsx')
/article/7f41ececff944a593d095c8c.html