Ruby regex: the difference between \1 and $1, and how to use them with gsub

Posted by wxianfeng Fri, 13 Mar 2009 19:47:00 GMT

Environment: Ruby 1.9

\1 and $1 come up all the time when working with Ruby regular expressions. So what is the difference between them? Let's sort it out:

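\1 : a back-reference to the first capture group; it can appear inside the pattern itself or in the replacement string of sub / gsub
$1 : a special Ruby variable holding the first capture group of the most recent successful match (it is written like a global, but it is actually local to the current method and thread)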

Let's look at a few demos:

demo:

"ab12cd12".gsub(/(\d+)cd(\1)/,"")   # => "ab"

For this input the regex behaves like /(\d+)cd12/ , because \1 refers back to the earlier (\d+) , and that (\d+) matched 12
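As a minimal sketch of a back-reference inside the pattern itself (the helper name strip_repeated_number is invented for illustration and is not part of any library), the match only succeeds when the text after cd repeats exactly what group 1 captured:

```ruby
# Hypothetical helper, not from the original post.
# \1 inside the pattern must match the same text that group 1 captured.
def strip_repeated_number(str)
  str.gsub(/(\d+)cd\1/, "")
end

p strip_repeated_number("ab12cd12")  # => "ab"
p strip_repeated_number("ab12cd34")  # => "ab12cd34" (34 does not repeat 12, so nothing matches)
```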

"ab12cd".gsub(/(\d+)/,'34\1')  #  => "ab3412cd"

demo (run as a fresh script):

p "ab12cd".gsub(/(\d+)/,'34\1') # "ab3412cd"
p $1  # "12"
p "ab56cd".gsub(/(\d+)/,"78#{$1}") # "ab7812cd" , $1 here is still the 12 from the previous match

demo (run as a fresh script):

p "ab12cd".gsub(/(\d+)/,'34\1') # "ab3412cd"
p $1  # "12"
str = "ab56cd".gsub(/(\d+)/) do |ele|
  "78#{$1}" # here $1 is 56
end
p str # "ab7856cd"

demo (run as a fresh script):

p "ab56cd".gsub(/(\d+)/,"78#{$1}") # "ab78cd" , $1 is nil here because nothing has matched yet
str = "ab56cd".gsub(/(\d+)/) do |ele|
  "78#{$1}" # here $1 is 56
end

p str # "ab7856cd"
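The results above depend on what was matched earlier in the same script. A small sketch that makes the timing explicit, using two made-up helper methods: since $1 is local to the current method, it starts out as nil on every call, so the interpolated version never sees the digits, while the block version does.

```ruby
# Hypothetical helpers for illustration only.

# "78#{$1}" is built before gsub is called. $1 is method-local and no match
# has happened yet inside this method, so it interpolates as an empty string.
def replace_with_interpolation(str)
  str.gsub(/(\d+)/, "78#{$1}")
end

# The block runs once per match, after $1 has been set for that match.
def replace_with_block(str)
  str.gsub(/(\d+)/) { "78#{$1}" }
end

p replace_with_interpolation("ab56cd")  # => "ab78cd"
p replace_with_block("ab56cd")          # => "ab7856cd"
```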

Conclusions:

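1, \1 and $1 are two different things
2, Pay special attention: $1 inside a gsub block is not the same as $1 written into the replacement string. A replacement such as "78#{$1}" is interpolated before gsub runs, so it sees the previous match (or nil). Use \1 in the replacement and $1 in the block; the comments in the source say as much
3, Write \1 in a single-quoted string. In a double-quoted string it must be written as "\\1" , because "\1" there is an octal escape for the character \x01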
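The quoting point can be checked directly. This is a minimal sketch with made-up method names; the only assumption is the standard Ruby rule that "\1" in a double-quoted literal is an octal escape.

```ruby
# Hypothetical helpers for illustration only.

# Single quotes keep the backslash, so gsub receives the back-reference \1.
def with_single_quotes(str)
  str.gsub(/(\d+)/, '<\1>')
end

# In double quotes the backslash has to be doubled to reach gsub as \1.
def with_escaped_double_quotes(str)
  str.gsub(/(\d+)/, "<\\1>")
end

# "\1" in double quotes is an octal escape: the single control character "\x01".
def with_plain_double_quotes(str)
  str.gsub(/(\d+)/, "<\1>")
end

p with_single_quotes("ab12cd")          # => "ab<12>cd"
p with_escaped_double_quotes("ab12cd")  # => "ab<12>cd"
p with_plain_double_quotes("ab12cd") == "ab<\x01>cd"  # => true (the digits are lost)
```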

Here is the explanation from the comments in the source:

  #     str.gsub(pattern, replacement)       => new_str
  #     str.gsub(pattern) {|match| block }   => new_str
  #
  #
  # Returns a copy of <i>str</i> with <em>all</em> occurrences of <i>pattern</i>
  # replaced with either <i>replacement</i> or the value of the block. The
  # <i>pattern</i> will typically be a <code>Regexp</code>; if it is a
  # <code>String</code> then no regular expression metacharacters will be
  # interpreted (that is <code>/\d/</code> will match a digit, but
  # <code>'\d'</code> will match a backslash followed by a 'd').
  #
  # If a string is used as the replacement, special variables from the match
  # (such as <code>$&</code> and <code>$1</code>) cannot be substituted into it,
  # as substitution into the string occurs before the pattern match
  # starts. However, the sequences <code>\1</code>, <code>\2</code>, and so on
  # may be used to interpolate successive groups in the match.
  #
  # In the block form, the current match string is passed in as a parameter, and
  # variables such as <code>$1</code>, <code>$2</code>, <code>$`</code>,
  # <code>$&</code>, and <code>$'</code> will be set appropriately. The value
  # returned by the block will be substituted for the match on each call.
  #
  # The result inherits any tainting in the original string or any supplied
  # replacement string.
  #
  #    "hello".gsub(/[aeiou]/, '*')              #=> "h*ll*"
  #    "hello".gsub(/([aeiou])/, '<\1>')         #=> "h<e>ll<o>"
  #    "hello".gsub(/./) {|s| s[0].to_s + ' '}   #=> "104 101 108 108 111 "
  #
  #
  def gsub(pattern, replacement)
    # This is just a stub for a builtin Ruby method.
    # See the top of this file for more info.
  end

With a replacement string:

# If a string is used as the replacement, special variables from the match
# (such as <code>$&</code> and <code>$1</code>) cannot be substituted into it,
# as substitution into the string occurs before the pattern match
# starts. However, the sequences <code>\1</code>, <code>\2</code>, and so on
# may be used to interpolate successive groups in the match.
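A short sketch of the replacement-string form described in the quoted comment. The helper names are made up; \0 (the whole match) is a standard gsub replacement sequence that the excerpt does not list explicitly.

```ruby
# Hypothetical helpers for illustration only.

# \1 and \2 in a single-quoted replacement refer to the first and second groups.
def swap_pair(str)
  str.gsub(/(\w+), (\w+)/, '\2 \1')
end

# \0 stands for the entire matched text.
def bracket_numbers(str)
  str.gsub(/\d+/, '[\0]')
end

p swap_pair("Doe, John")       # => "John Doe"
p bracket_numbers("ab12cd34")  # => "ab[12]cd[34]"
```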

With a block:

# In the block form, the current match string is passed in as a parameter, and
# variables such as <code>$1</code>, <code>$2</code>, <code>$`</code>,
# <code>$&</code>, and <code>$'</code> will be set appropriately. The value
# returned by the block will be substituted for the match on each call.
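A matching sketch for the block form, again with made-up helper names. The block receives the matched text as its parameter, and $1 / $2 hold the groups of the match currently being replaced.

```ruby
# Hypothetical helpers for illustration only.

# $1 and $2 are set for each match before the block is called.
def swap_with_block(str)
  str.gsub(/(\w+)=(\w+)/) { "#{$2}=#{$1}" }
end

# The block parameter is the whole matched string; the block's return value
# replaces that match.
def double_numbers(str)
  str.gsub(/\d+/) { |match| (match.to_i * 2).to_s }
end

p swap_with_block("a=1 b=2")   # => "1=a 2=b"
p double_numbers("ab12cd56")   # => "ab24cd112"
```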

http://stackoverflow.com/questions/288573/1-and-1-in-ruby


Regex: multiline matching

Posted by wxianfeng Mon, 15 Nov 2010 21:41:00 GMT

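Multiline matching is also very common, for example when pulling pieces out of HTML source. To enable it, just append m to the regex (in Ruby, m makes . match newline characters as well), for example: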

/http:(.*?)\s?/m
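A quick way to check what the m flag changes, sketched with a made-up helper. It also shows that ^ already matches at the start of every line in Ruby, so m is not needed for that.

```ruby
# Hypothetical helper for illustration only.
# Returns true when the dot in the given regex is able to match "\n".
def dot_spans_newline?(regex)
  !regex.match("a\nc").nil?
end

p dot_spans_newline?(/a.c/)    # => false
p dot_spans_newline?(/a.c/m)   # => true

# ^ matches at the start of each line, with or without m.
p("x\nhttp:foo" =~ /^http:/)   # => 2
```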

Ruby demo:

1,

p "ab\r\n334cd".match(/ab(.*)cd/) # nil

p "ab\r\n334cd".match(/ab(.*)cd/m) # #<MatchData "ab\r\n334cd" 1:"\r\n334">

2,

str = "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><pre>\n<code class=\"ruby\">\nputs \"hello world\"\nputs \"hello world\"\n</code>\n</pre></body></html>\n"

p str.match(/<body>(.*)<\/body>/m)[0] # "<body><pre>\n<code class=\"ruby\">\nputs \"hello world\"\nputs \"hello world\"\n</code>\n</pre></body>"

p str.match(/<body>(.*)<\/body>/) # nil , without m the dot cannot cross the newlines; calling [0] on this nil would raise NoMethodError
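Because match returns nil when nothing matches, indexing its result directly can raise NoMethodError. One way to avoid that, sketched here with a hypothetical helper, is String#[] with a regex and a group index, which returns nil instead of raising:

```ruby
# Hypothetical helper for illustration only.
# Returns the text between <body> and </body>, or nil when there is no match.
def body_inner(html)
  html[/<body>(.*)<\/body>/m, 1]
end

p body_inner("<html><body>\nhi\n</body></html>")  # => "\nhi\n"
p body_inner("<html></html>")                     # => nil
```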