Regular Expressions: Study Notes on the re Module Functions, Part 2

Date: 2020-09-12 06:34:52

`Regular expressions: the re module`

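1. Creating a regular expression object

re.compile(pattern[, flags])

The usual procedure is to first compile the regular expression string into a regular expression object with compile(), and then process strings with the methods that object provides. This can improve efficiency when the same pattern is used repeatedly.

Here pattern is the regular expression to match and flags selects matching options. The possible values include:

re.I, re.IGNORECASE: ignore case.

re.M: multi-line mode. Changes the behavior of the metacharacters "^" and "$" so that, in addition to the start and end of the string, they also match at the start and end of each line (just after or just before a newline).

re.S: changes the behavior of the metacharacter "." so that it matches any character, including a newline.

re.X: ignore whitespace characters in the pattern string.

Flags can be combined with the "|" operator so that they take effect together, for example re.I|re.M.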
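The flags described above can be tried out in a short sketch. The sample text and the date pattern are made up for illustration; only `re.compile`, `re.search` and the flag constants come from the standard library.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "First line\nsecond LINE\nthird line"

# re.I ignores case; re.M lets ^ and $ match at every line boundary.
pattern = re.compile(r"^\w+ line$", re.I | re.M)
matches = pattern.findall(text)
print(matches)  # ['First line', 'second LINE', 'third line']

# Without re.S, "." stops at the newline; with re.S it crosses it.
without_s = re.search(r"First.+third", text)
with_s = re.search(r"First.+third", text, re.S)
print(without_s is None, with_s is not None)  # True True

# re.X ignores whitespace in the pattern and allows comments.
verbose = re.compile(r"""
    (\d{4})  # year
    -
    (\d{2})  # month
""", re.X)
year_month = verbose.match("2020-09").groups()
print(year_month)  # ('2020', '09')
```

Compiling once and reusing `pattern` is the workflow described in this section; the module-level calls are used here only to keep the flag comparison short.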

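1. Matching and searching

(1) The match() function

There are two calling forms:

1. Call the match() method of a regular expression object: match(string[, pos[, endpos]]) tries to match starting at index pos of the string. pos and endpos default to 0 and len(string).

Calling match() on a pattern object: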

>>> pattern.match(''[1:])
<_sre.SRE_Match object; span=(0, 13), match=''>

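2. Call the module-level match() function directly: re.match(pattern, string[, flags])

match() tries to match the regular expression at the beginning of the string. If the match succeeds it returns a match object; otherwise it returns None.

Example 1:

import re
m = re.match(r'^[\w]{3}', 'afb7_dgd')
if m:
    print(m.group())  # afb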
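To make the difference between the two calling forms concrete, here is a small sketch. The string `abc123def` and the digit pattern are illustrative assumptions, not part of the original notes.

```python
import re

pattern = re.compile(r"\d+")
text = "abc123def"  # hypothetical input

# Compiled-pattern form: the match is attempted at index pos (here 3).
m_pos = pattern.match(text, 3)
print(m_pos.group(), m_pos.span())  # 123 (3, 6)

# Module-level form: the match is always attempted at index 0.
m_start = re.match(r"\d+", text)
print(m_start)  # None, because the string starts with letters

# endpos limits how far the match may extend.
m_end = pattern.match(text, 3, 5)
print(m_end.group())  # 12
```

Only the compiled-pattern method accepts pos and endpos; re.match() takes flags in that position instead.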

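(2) The search() function

search() scans the whole string for a match and returns a match object for the first one found, or None if there is none.

Example:

import re
line = 'my name is allen'
searchname = re.search('allen', line, re.M | re.I)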
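The contrast with match() may be easier to see side by side. This sketch reuses a sentence like the one in the example above, with the name capitalized (an assumption made here so that re.I has a visible effect).

```python
import re

line = "my name is Allen"

# match() only looks at the start of the string.
at_start = re.match("allen", line, re.I)
# search() scans forward until it finds the first match.
anywhere = re.search("allen", line, re.I)

print(at_start)  # None
print(anywhere.group(), anywhere.span())  # Allen (11, 16)
```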

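(3) The findall() function

findall() searches the string and returns a list of all substrings that match the regular expression (when the pattern contains several groups, each item in the list is a tuple of the groups). It also has two calling forms:

re.findall(pattern, string[, flags])

pattern.findall(string[, pos[, endpos]])  # starts matching at index pos of the string; pos and endpos default to 0 and len(string)

Examples: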

>>> re.match('w{3}\.([a-z0-9]+\.)com','')
<_sre.SRE_Match object; span=(0, 13), match=''>
>>> re.match('w{3}\.([a-z0-9]+\.)com','').group(1)
'baidu.'
>>> re.findall('w{3}\.([a-z0-9]+\.)com','')  # returns a list of the captured substrings
['baidu.']
>>> re.findall('w{3}\.([a-z0-9]+\.)+com','')  # can match URLs with several domain labels, e.g. www.a.
['baidu.']
>>> re.findall('w{3}\.([a-z0-9]+\.)+com','www..com')  # there is only one () group, so it holds the last domain label matched
['cn.']
>>> re.findall('w{3}\.([a-z0-9]+\.)([a-z0-9]+\.)+com','www..com')
[('baidu.', 'cn.')]
>>> re.findall('w{3}\.([a-z0-9]+\.)([a-z0-9]+\.)([a-z0-9]+\.)?com','www..com')
[('baidu.', 'edu.', 'cn.')]
>>> re.findall('w{3}\.([a-z0-9]+\.)([a-z0-9]+\.)([a-z0-9]+\.)*?com','www..com')
[('baidu.', 'edu.', 'cn.')]
>>> re.findall('w{3}\.([a-z0-9]+\.)([a-z0-9]+\.)?com','www..com')  # the pattern cannot match the rest of the string; ? means zero or one occurrence
[]
>>> re.findall('(w{3})\.([a-z0-9]+\.)([a-z0-9]+\.)+com','www..com')
[('www', 'baidu.', 'cn.')]

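When the pattern contains groups, findall() returns a list of the strings captured by the groups, for example:

>>> re.findall('w{3}\.([a-z0-9]+\.)([a-z0-9]+\.)*com','www. ')
[('baidu.', 'edu.'), ('google.', ''), ('alibaba.', ''), ('china.', 'cnn.')]

When the pattern contains no groups, findall() returns a list of the whole matched strings.

>>> re.findall('w{3}\.[a-z0-9]+\.com','www.www687alen .com')
['', '', '']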
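The grouping rules above can be checked with a self-contained sketch. The host names under example.com are placeholders chosen for this illustration; they are not the strings used in the original transcripts.

```python
import re

# Hypothetical input; the host names are placeholders.
text = "www.example.com www.shop.example.com ftp.example.com"

# No group: each item is the whole match.
no_group = re.findall(r"w{3}\.[a-z0-9.]+\.com", text)
print(no_group)  # ['www.example.com', 'www.shop.example.com']

# One group: each item is that group's text.
one_group = re.findall(r"w{3}\.([a-z0-9]+)\.com", text)
print(one_group)  # ['example']

# Several groups: each item is a tuple of the groups.
two_groups = re.findall(r"(w{3})\.([a-z0-9]+)\.com", text)
print(two_groups)  # [('www', 'example')]

# A repeated group keeps only its last repetition.
repeated = re.findall(r"w{3}\.([a-z0-9]+\.)+com", text)
print(repeated)  # ['example.', 'example.']

# A non-capturing group (?:...) repeats without changing the result shape.
non_capturing = re.findall(r"w{3}\.(?:[a-z0-9]+\.)+com", text)
print(non_capturing)  # ['www.example.com', 'www.shop.example.com']
```

When the whole match is wanted but a repeated sub-pattern is still needed, the non-capturing form is usually the simpler choice.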

(4) The finditer() function

Like findall(), finditer() finds all matching substrings in the string, but it returns them as an iterator. The calling forms are the same as above.

>>> b=re.finditer('w{3}\.([a-z0-9]+\.)+com','www. www687alen .com')
>>> for i in b:print(i)  # the iterator yields one match object per match; www687allen is not matched. Calling i.group() returns the matched string.
<_sre.SRE_Match object; span=(0, 17), match='www.'>
<_sre.SRE_Match object; span=(18, 30), match=''>
<_sre.SRE_Match object; span=(33, 47), match=''>
<_sre.SRE_Match object; span=(60, 71), match=''>
<_sre.SRE_Match object; span=(73, 87), match='.com'>

With i.group():

>>> for i in b:print(i.group())
www..com

With i.group(1), the text captured by group 1 of each match is returned:

>>> b=re.finditer('w{3}\.([a-z0-9]+\.)+com','www.www687alen .com')
>>> for i in b:print(i.group(1))
edu.sina..
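A compact sketch of finditer() follows, again with placeholder host names that are assumptions of this example. It also shows that the iterator can be consumed only once, which is why the transcripts above call re.finditer() again before looping a second time.

```python
import re

# Hypothetical input; the host names are placeholders.
text = "www.example.com www.shop.example.com"
pattern = re.compile(r"w{3}\.([a-z0-9]+\.)+com")

# Each item yielded by finditer() is a match object.
rows = [(m.group(), m.group(1), m.span()) for m in pattern.finditer(text)]
for row in rows:
    print(row)
# ('www.example.com', 'example.', (0, 15))
# ('www.shop.example.com', 'example.', (16, 36))

# The iterator is exhausted after one pass.
it = pattern.finditer(text)
first_pass = [m.group() for m in it]
second_pass = [m.group() for m in it]
print(len(first_pass), len(second_pass))  # 2 0
```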

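2. Replacing and splitting strings

(1) Replacement: the sub() function

Calling forms:

re.sub(pattern, repl, string[, count, flags])

pattern.sub(repl, string[, count=0])

The function looks for every substring of string that matches pattern. If nothing matches, it returns string unchanged; otherwise it replaces the matched substrings with repl and returns the resulting string. count specifies the maximum number of replacements; if it is omitted, all matches are replaced. repl can be a string or a function.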

>>> import re
>>> p=re.compile(r'allen|lily')
>>> p.sub('Andy','allen is my lily big brother , lily is allen sisyter, are you okay allen, are you okay lily ?',2)
'Andy is my Andy big brother , lily is allen sisyter, are you okay allen, are you okay lily ?'

If nothing matches, the original string is returned:

>>> p=re.compile(r'sun')
>>> p.sub('Andy','allen is my lily big brother , lily is allen sisyter, are you okay allen, are you okay lily ?',2)
'allen is my lily big brother , lily is allen sisyter, are you okay allen, are you okay lily ?'

subn() works like sub(), but returns a tuple of the new string and the number of replacements made. Here p is the pattern compiled from r'allen|lily':

>>> p.subn('Andy','allen is my lily big brother , lily is allen sisyter, are you okay allen, are you okay lily ?')
('Andy is my Andy big brother , Andy is Andy sisyter, are you okay Andy, are you okay Andy ?', 6)
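The notes mention that the replacement can also be a function, without showing one. A minimal sketch, assuming a hypothetical helper named `capitalize_name` and a made-up sentence:

```python
import re

names = re.compile(r"allen|lily")

def capitalize_name(match):
    # Hypothetical helper: receives the match object and
    # returns the replacement text for that match.
    return match.group().capitalize()

text = "allen is my brother, lily is my sister, hello allen"

replaced = names.sub(capitalize_name, text)
print(replaced)
# Allen is my brother, Lily is my sister, hello Allen

# subn() also reports how many replacements were made.
limited = names.subn(capitalize_name, text, count=1)
print(limited)
# ('Allen is my brother, lily is my sister, hello allen', 1)
```

A function is useful when the replacement depends on what was matched; a fixed string cannot do that.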

(2) Splitting: the split() function

split() splits string at every substring that matches the regular expression and returns the list of resulting pieces.

Calling forms:

re.split(pattern, string[, maxsplit, flags])

pattern.split(string[, maxsplit])

Here maxsplit is the maximum number of splits.

Example:

>>> re.split('(Andy)','Andy is my Andy big brother , Andy is Andy sisyter, are you okay Andy?')
['', 'Andy', ' is my ', 'Andy', ' big brother , ', 'Andy', ' is ', 'Andy', ' sisyter, are you okay ', 'Andy', '?']

For comparison: when the pattern is wrapped in a () group, the list also contains the text matched by the group; without the group, the separators are dropped:

>>> re.split('Andy','Andy is my Andy big brother , Andy is Andy sisyter, are you okay Andy?')
['', ' is my ', ' big brother , ', ' is ', ' sisyter, are you okay ', '?']
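The maxsplit parameter is described above but not demonstrated. A short sketch with a made-up input string:

```python
import re

text = "a1b22c333d"  # hypothetical input

# Split at every run of digits.
all_parts = re.split(r"\d+", text)
print(all_parts)  # ['a', 'b', 'c', 'd']

# At most two splits; the remainder is returned unsplit.
two_splits = re.split(r"\d+", text, maxsplit=2)
print(two_splits)  # ['a', 'b', 'c333d']

# With a capturing group the separators are kept in the result.
with_group = re.split(r"(\d+)", text, maxsplit=1)
print(with_group)  # ['a', '1', 'b22c333d']
```

Passing maxsplit by keyword, as done here, keeps it from being confused with flags.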

(3) The escape() function

escape() inserts an escape character before each special character in the string and returns the result.

>>> re.escape(' 192.182.93.01')
'www\\.baidu\\.com\\ \\ 192\\.182\\.93\\.01'
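One typical use of escape() is to search for a literal string that contains metacharacters. The sketch below reuses the address from the transcript above; the text being searched is an assumption made for illustration.

```python
import re

literal = "192.182.93.01"
haystack = "192.182.93.01 192x182y93z01"  # hypothetical text to search

# Unescaped, each "." matches any character, so both items match.
unescaped = re.findall(literal, haystack)
print(unescaped)  # ['192.182.93.01', '192x182y93z01']

# Escaped, each "." is treated as a literal dot.
escaped = re.findall(re.escape(literal), haystack)
print(escaped)  # ['192.182.93.01']
```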
