No Time to Play

Matching multiline comments with a regular expression

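My attempt

`/\*.*\*/`

But this only matches comments that fit on a single line, because `.` does not match line terminators by default. In Java, there are two ways to make it match across lines: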

Pattern.compile("(?s)/\\*.*\\*/")
Pattern.compile("/\\*.*\\*/", Pattern.DOTALL)
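A minimal sketch of the same behavior in Python. This is an assumption: the post targets Java, but Python's `re` treats these constructs the same way, with `re.DOTALL` and the inline `(?s)` flag playing the role of `Pattern.DOTALL` and `(?s)`:

```python
import re

# A comment that spans two lines.
source = "/* line one\n   line two */"

# Without DOTALL, '.' stops at the newline, so the pattern finds nothing.
plain = re.search(r"/\*.*\*/", source)

# Two equivalent ways to let '.' match newlines as well.
inline_flag = re.search(r"(?s)/\*.*\*/", source)
flag_arg = re.search(r"/\*.*\*/", source, re.DOTALL)

print(plain)                # None
print(inline_flag.group())  # the whole two-line comment
print(flag_arg.group() == inline_flag.group())  # True
```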

But then the match swallows all the code between two comments:

start_code();
/* First comment */
more_code();
/* Second comment */
end_code();
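To make the problem concrete, here is a small Python sketch (Python standing in for Java's `Pattern`, as an assumption) that runs the greedy pattern with DOTALL over the snippet above:

```python
import re

code = """start_code();
/* First comment */
more_code();
/* Second comment */
end_code();"""

# Greedy '.*' with DOTALL runs all the way to the LAST '*/' in the input.
greedy = re.findall(r"/\*.*\*/", code, re.DOTALL)

print(len(greedy))                  # 1: a single match covering both comments
print("more_code();" in greedy[0])  # True: the code in between was swallowed
```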

Solutions

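In a regular expression, `*` is a greedy quantifier and `*?` is its non-greedy (lazy) counterpart. The following regex uses a non-greedy match (the Pattern.DOTALL option is still required):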

`/\*.*?\*/`
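A sketch of the lazy version in Python (again assuming Python's `re` as a stand-in for Java), finding each comment separately and, as a usage example, stripping them from the code:

```python
import re

code = "start_code();\n/* First comment */\nmore_code();\n/* Second comment\n   spans lines */\nend_code();"

# '*?' stops at the FIRST '*/' after each '/*'; DOTALL lets '.' cross newlines.
lazy = re.compile(r"/\*.*?\*/", re.DOTALL)
comments = lazy.findall(code)

for c in comments:
    print(repr(c))  # two separate matches, one per comment

# Removing the comments leaves the surrounding code intact.
print(lazy.sub("", code))
```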

Some regex implementations do not support non-greedy matching; for those, the following regex can be used instead:

`/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/`
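A Python sketch checking that this pattern stops at the first `*/` without any lazy quantifier. Python's `re` does support `*?`; it is used here only to exercise the pattern. Inside the comment the pattern accepts any non-`*` character, or a run of `*` not followed by `/`, so it can never consume a closing `*/`:

```python
import re

# No lazy quantifiers: [^*] already matches newlines, so no DOTALL is needed.
NO_LAZY = re.compile(r"/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/")

code = "a();\n/* First comment */\nb();\n/** Second\n * comment **/\nc();"

# The pattern contains capture groups, so findall() would return the groups;
# use finditer() and group(0) to get the whole comment text instead.
comments = [m.group(0) for m in NO_LAZY.finditer(code)]
print(comments)
```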

A trick I stumbled upon that both spans lines and matches non-greedily, without needing Pattern.DOTALL:

`/\*[\w\W]*?\*/`

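Here \w\W can be replaced by any pair of complementary character classes, such as \d\D or \s\S; together they match every character, including newlines.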
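A quick Python check (assumed to behave like Java here) that all three complementary-class variants give the same lazy, multiline matches with no DOTALL flag:

```python
import re

code = "x = 1;\n/* one\n   two */\ny = 2;\n/* three */\n"

# No DOTALL: each bracket class covers every character, newline included.
patterns = [r"/\*[\w\W]*?\*/", r"/\*[\s\S]*?\*/", r"/\*[\d\D]*?\*/"]

results = {p: re.findall(p, code) for p in patterns}
for p, found in results.items():
    print(p, found)
```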

Known issue

Comment-like text inside double-quoted string literals is matched as well, so take this into account in practice.
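A Python sketch of the issue, plus one possible workaround of my own (the `strip_comments` helper is hypothetical and not from the references): match string literals and comments in a single alternation, keep the strings, and drop only the comments. It assumes simple Java-like source and does not handle char literals such as `'"'`, text blocks, or `//` comments:

```python
import re

COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

# Java source where "/* ... */" also appears inside a string literal.
java_line = 'String s = "/* not a comment */"; /* real comment */'

naive = COMMENT.findall(java_line)
print(naive)  # the string contents are matched too

# Hypothetical workaround: a string literal (with backslash escapes) OR a comment.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|/\*.*?\*/', re.DOTALL)

def strip_comments(src):
    # Keep string literals unchanged; replace comments with nothing.
    return TOKEN.sub(lambda m: m.group(0) if m.group(0).startswith('"') else "", src)

print(strip_comments(java_line))
```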

References

http://stackoverflow.com/questions/13014947/regex-to-match-a-c-style-multiline-comment

http://ostermiller.org/findcomment.html

Posted by nasta October 28, 2013 10:23

Category: Code Tag: java RegEx

The difference between repr and str

help(repr):

repr(...)
    repr(object) -> string

    Return the canonical string representation of the object.
    For most object types, eval(repr(object)) == object.

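It returns the canonical string representation of the object (one the Python interpreter can understand). The second sentence is the important one: for most object types, eval(repr(object)) == object.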
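A sketch of that round-trip property in Python 3. The help text quoted here is from Python 2, where `str` derives from `basestring`, but the repr/str distinction is the same. It also shows that "most" does not mean "all":

```python
import datetime

# Values whose repr() is valid Python source for an equal value.
values = [42, 3.5, "it's", [1, "two", (3, 4)], {"k": None}, datetime.date(2013, 10, 11)]

round_trips = []
for v in values:
    text = repr(v)  # e.g. "datetime.date(2013, 10, 11)"
    round_trips.append(eval(text) == v)
    print(text)

# The default repr of a plain object looks like "<object object at 0x...>",
# which is not valid Python, so eval() raises SyntaxError.
try:
    eval(repr(object()))
    object_round_trip_ok = True
except SyntaxError:
    object_round_trip_ok = False
print(object_round_trip_ok)  # False
```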

help(str):

class str(basestring)
| str(object) -> string
|
| Return a nice string representation of the object.
| If the argument is a string, the return value is the same object.

It returns a nice string representation of the object (a printable one that people can read).
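For your own classes the two are controlled by `__repr__` and `__str__`. A minimal sketch with a hypothetical `Point` class (not from the original post); note that `str()` falls back to `__repr__` when `__str__` is not defined, and that containers display their elements with `repr()`:

```python
class Point:
    """Hypothetical example class."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        # Unambiguous, eval-able form aimed at developers.
        return f"Point({self.x!r}, {self.y!r})"

    def __str__(self):
        # Friendly form aimed at end users.
        return f"({self.x}, {self.y})"


class OnlyRepr:
    """Hypothetical class that defines only __repr__."""

    def __repr__(self):
        return "OnlyRepr()"


p = Point(1, 2)
print(str(p))           # (1, 2)
print(repr(p))          # Point(1, 2)
print([p])              # [Point(1, 2)]: the list shows its element with repr()
print(str(OnlyRepr()))  # OnlyRepr(): str() falls back to __repr__
```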

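That makes the distinction clear. Two more examples (for the list, str and repr give the same result, because a list always shows its elements using repr):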

>>> s = "this is a string"
>>> str(s)
'this is a string'
>>> repr(s)
"'this is a string'"

>>> l = [ "l", "i", "s", "t" ]
>>> str(l)
"['l', 'i', 's', 't']"
>>> repr(l)
"['l', 'i', 's', 't']"

Posted by nasta October 11, 2013 16:07

Category: Code Tag: Python
