Using Regular Expressions to Process Strings in Python

Date: 2024-02-09 19:40:36

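For the basic syntax of regular expressions, see the earlier articles "A Collection of Common Regular Expressions" and "Using Regular Expressions in Python". Advanced uses of the extended syntax will be compiled and published separately later.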

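The Python standard library module re provides the functionality needed for regular expression operations. Strings can be processed by calling the functions in the re module directly (see the table below).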

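The "flags" parameter of these functions can be any combination (joined with "|") of the following values: re.I (the capital letter I, not the digit 1; ignore case), re.L (locale-dependent matching), re.M (multi-line mode), re.S (makes the metacharacter "." match any character, including the newline), re.U (Unicode matching), and re.X (ignores whitespace in the pattern and allows # comments). In Python 3, Unicode matching is already the default for str patterns, and re.L can only be used with bytes patterns.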
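As a small illustration of combining flags, the sketch below applies re.I, re.M, re.S and re.X to a made-up test string. The variable names and the sample text are assumptions chosen for this example, not part of the original article.

```python
import re

# Hypothetical sample text: three lines that differ only in letter case.
text = "Python\npython\nPYTHON rocks"

# re.I: ignore case, so all three spellings are found.
ignore_case = re.findall(r"python", text, flags=re.I)

# Without re.M, '^' only matches at the very start of the string.
first_line_only = re.findall(r"^python", text, flags=re.I)

# re.I | re.M: '^' now also matches at the start of every line.
per_line = re.findall(r"^python", text, flags=re.I | re.M)

# re.S: '.' also matches the newline between the first two lines.
dot_default = re.search(r"Python.python", text)
dot_all = re.search(r"Python.python", text, flags=re.S)

# re.X: whitespace and '#' comments inside the pattern are ignored.
version = re.compile(r"""
    (\d+)   # major number
    \.      # literal dot
    (\d+)   # minor number
""", flags=re.X)
version_parts = version.search("Python 3.6").groups()

print(ignore_case, first_line_only, per_line)
print(dot_default, dot_all is not None, version_parts)
```

Flags are plain integer-like constants, which is why several of them can be merged into one argument with the bitwise OR operator.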

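The following code shows how to process strings by calling the functions in the re module directly with regular expressions. match() tries to match at the beginning of the string, while search() scans the whole string for a match; both return a match object on success and None otherwise.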

>>> import re   # import the re module

>>> text = 'alpha. beta....gamma delta'   # test string

>>> re.split(r'[\. ]+', text)   # split on runs of the given separator characters

['alpha', 'beta', 'gamma', 'delta']

>>> re.split(r'[\. ]+', text, maxsplit=2)   # split at most 2 times

['alpha', 'beta', 'gamma delta']

>>> re.split(r'[\. ]+', text, maxsplit=1)   # split at most 1 time

['alpha', 'beta....gamma delta']

>>> pat = '[a-zA-Z]+'

>>> re.findall(pat, text)   # find all words

['alpha', 'beta', 'gamma', 'delta']

>>> pat = '{name}'

>>> text = 'Dear {name}...'

>>> re.sub(pat, 'Mr.Dong', text)   # string substitution

'Dear Mr.Dong...'

>>> s = 'a s d'

>>> re.sub('a|s|d', 'good', s)   # replace any of the alternatives

'good good good'

>>> s = "It's a very good good idea"

>>> re.sub(r'\b(\w+) \1\b', r'\1', s)   # collapse a word repeated twice in a row; the trailing \b keeps "the theme" intact

"It's a very good idea"

>>> re.sub('a', lambda x: x.group(0).upper(), 'aaa abc abde')   # repl can be a callable

'AAA Abc Abde'

>>> re.sub('[a-z]', lambda x:x.group(0).upper(), 'aaa abc abde')

'AAA ABC ABDE'

>>> re.sub('[a-zA-Z]', lambda x: chr(ord(x.group(0)) ^ 32), 'aaa aBc abde')   # swap the case of English letters

'AAA AbC ABDE'

>>> re.subn('a', 'dfg', 'aaa abc abde')   # returns the new string and the number of replacements

('dfgdfgdfg dfgbc dfgbde', 5)

>>> re.sub('a', 'dfg', 'aaa abc abde')

'dfgdfgdfg dfgbc dfgbde'

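>>> re.escape('http://www.python.org')   # escape a string; output below is from Python 3.6 or earlier, Python 3.7+ returns 'http://www\\.python\\.org'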

'http\\:\\/\\/www\\.python\\.org'

>>> print(re.match('done|quit', 'done'))   # match succeeds and returns a match object (Python 3.7+ prints <re.Match object; span=(0, 4), match='done'>)

<_sre.SRE_Match object at 0x00B121A8>

>>> print(re.match('done|quit', 'done!'))   # match succeeds

<_sre.SRE_Match object at 0x00B121A8>

>>> print(re.match('done|quit', 'doe!'))   # match fails and returns None

None

>>> print(re.match('done|quit', 'd!one!'))   # match fails

None

>>> print(re.search('done|quit', 'd!one!done'))   # match succeeds

<_sre.SRE_Match object at 0x0000000002D03D98>
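To make the difference between match() and search() easier to compare side by side, here is a minimal sketch. The helper name describe is hypothetical, and re.fullmatch() (available since Python 3.4) is an addition not covered in the text above; it succeeds only when the whole string matches.

```python
import re

def describe(pattern, text):
    """Report which of match/search/fullmatch succeed (hypothetical helper)."""
    return {
        "match": re.match(pattern, text) is not None,          # start of string only
        "search": re.search(pattern, text) is not None,        # anywhere in the string
        "fullmatch": re.fullmatch(pattern, text) is not None,  # the entire string
    }

at_start = describe("done|quit", "done!")
in_middle = describe("done|quit", "d!one!done")
exact = describe("done|quit", "quit")

# A match object records what was matched and where.
m = re.search("done|quit", "d!one!done")
found = (m.group(0), m.start(), m.end(), m.span())

print(at_start)
print(in_middle)
print(exact)
print(found)
```

Because a failed match returns None, real code would normally test the result before calling group() or span() on it.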

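The following code uses several different approaches to remove superfluous spaces from a string: runs of consecutive spaces are reduced to a single space, and all whitespace at both ends of the string is removed.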

>>> import re

>>> s = 'aaa   bb  c d e    fff  '   # test string with runs of spaces and trailing whitespace

>>> ' '.join(s.split())   # string methods only

'aaa bb c d e fff'

>>> ' '.join(re.split(r'[\s]+', s.strip()))   # a re function combined with string methods

'aaa bb c d e fff'

>>> ' '.join(re.split(r'\s+', s.strip()))   # equivalent to the previous line

'aaa bb c d e fff'

>>> re.sub(r'\s+', ' ', s.strip())   # substitution with the re module directly

'aaa bb c d e fff'
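The approaches above can be wrapped into a small reusable function. This is only a sketch: the function name squeeze_spaces and the sample strings are assumptions made for illustration.

```python
import re

def squeeze_spaces(s):
    """Trim both ends and reduce every whitespace run to one space."""
    return re.sub(r"\s+", " ", s.strip())

# Hypothetical samples, including tabs, newlines and empty input.
samples = ["  aaa   bb \t c\nd  ", "", "   ", "no-extra"]
results = [squeeze_spaces(sample) for sample in samples]

# Cross-check against the pure string-method approach.
expected = [" ".join(sample.split()) for sample in samples]

print(results)
print(results == expected)
```

For these samples both approaches agree; the string-method version avoids compiling a pattern, while the re version is easier to adapt if only some whitespace characters should be collapsed.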

The following code uses several different approaches to remove specified content from a string:

>>> email = "tony@tiremove_thisger.net"

>>> m = re.search("remove_this", email)   # use the match object returned by search()

>>> email[:m.start()] + email[m.end():]   # string slicing

'tony@tiger.net'

>>> re.sub('remove_this', '', email)   # use sub() from the re module directly

'tony@tiger.net'

>>> email.replace('remove_this', '')   # use the string replace method directly

'tony@tiger.net'
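When the text to remove contains regular expression metacharacters, passing it to re.sub() unchanged can give surprising results. The sketch below combines re.sub() with re.escape(); the helper name remove_all and the sample string are hypothetical.

```python
import re

def remove_all(text, unwanted):
    """Remove every literal occurrence of unwanted from text."""
    return re.sub(re.escape(unwanted), "", text)

# Hypothetical sample containing the metacharacters '(', ')', '.' and '$'.
sample = "price: $5.00 (approx.)"

escaped = remove_all(sample, "(approx.)")

# Unescaped, the parentheses form a group and '.' matches any character,
# so only the text between the literal parentheses is removed.
unescaped = re.sub("(approx.)", "", sample)

# str.replace() always treats its argument literally.
replaced = sample.replace("(approx.)", "")

print(repr(escaped))
print(repr(unescaped))
print(repr(replaced))
```

For plain literal text, str.replace() is the simplest choice; re.sub() becomes worthwhile once the content to remove is described by a pattern.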

The following code uses special sequences that begin with "\" to perform specific kinds of searches.

>>> import re

>>> example = 'Beautiful is better than ugly.'

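>>> re.findall('\\bb.+?\\b', example)   # whole words starting with the letter b; the question mark makes the match non-greedy

['better']

>>> re.findall('\\bb.+\\b', example)   # result of the greedy match

['better than ugly']

>>> re.findall('\\bb\\w*\\b', example)

['better']

>>> re.findall('\\Bh.+?\\b', example)   # the rest of each word that contains h but does not start with it

['han']

>>> re.findall('\\b\\w.+?\\b', example)   # all words in this sentence (the pattern assumes words of at least two characters)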

['Beautiful', 'is', 'better', 'than', 'ugly']

>>> re.findall(r'\w+', example)   # all words

['Beautiful', 'is', 'better', 'than', 'ugly']

>>> re.findall(r'\b\w.+?\b', example)   # the earlier pattern written as a raw string

['Beautiful', 'is', 'better', 'than', 'ugly']

>>> re.split(r'\s', example)   # split the string on any whitespace character

['Beautiful', 'is', 'better', 'than', 'ugly.']

>>> re.findall(r'\d+\.\d+\.\d+', 'Python 2.7.13')   # find and return numbers of the form x.x.x

['2.7.13']

>>> re.findall(r'\d+\.\d+\.\d+', 'Python 2.7.13,Python 3.6.0')

['2.7.13', '3.6.0']

>>> s = '<html><head>This is head.</head><body>This is body.</body></html>'

>>> pattern = r'<html><head>(.+)</head><body>(.+)</body></html>'

>>> result = re.search(pattern, s)

>>> result.group(1)   # first group

'This is head.'

>>> result.group(2)   # second group

'This is body.'
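Greedy groups such as (.+) work in the example above because each tag occurs only once. As a hedged sketch of what happens when a tag repeats, the code below compares greedy and non-greedy groups; the string tags is a made-up sample, and regular expressions are in general not a robust way to parse real HTML.

```python
import re

# Hypothetical sample in which the same tag appears twice.
tags = '<b>one</b><b>two</b>'

# Greedy: '.+' runs to the last '</b>', swallowing the tags in between.
greedy = re.findall(r'<b>(.+)</b>', tags)

# Non-greedy: '.+?' stops at the first '</b>' it can reach.
lazy = re.findall(r'<b>(.+?)</b>', tags)

# groups() returns all captured groups of a match at once.
s = '<html><head>This is head.</head><body>This is body.</body></html>'
result = re.search(r'<head>(.+?)</head><body>(.+?)</body>', s)
captured = result.groups()

print(greedy)
print(lazy)
print(captured)
```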
