Ruby regex: the difference between \1 and $1, and how to use them with gsub

Posted by wxianfeng Fri, 13 Mar 2009 19:47:00 GMT

Environment: Ruby 1.9

\1 and $1 come up all the time when working with Ruby regular expressions. So what is the difference between them? Let's sort it out:

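\1 : a backreference, commonly used with sub and gsub
$1 : a special Ruby variable that looks like a global; it holds the first capture group of the most recent successful match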

Let's look at a few demos:

"ab12cd12".gsub(/(\d+)cd(\1)/,"")   # => "ab"

For this input the regex behaves like /(\d+)cd12/, because \1 refers back to the preceding (\d+), and that group matched 12
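To make the backreference behaviour concrete, here is a small sketch (the second input string is my own addition for contrast, not from the demo above). It shows that \1 inside the pattern only matches when the text repeats exactly what group 1 captured:

```ruby
# A backreference inside the pattern must repeat exactly what group 1 captured.
pattern = /(\d+)cd\1/

same      = "ab12cd12".gsub(pattern, "")  # group 1 = "12", followed by "cd12"
different = "ab12cd34".gsub(pattern, "")  # "34" does not repeat "12", so nothing matches

p same       # "ab"
p different  # "ab12cd34"
```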

"ab12cd".gsub(/(\d+)/,'34\1')  #  => "ab3412cd"

demo (run as a fresh script):

p "ab12cd".gsub(/(\d+)/,'34\1') # "ab3412cd"
p $1  # "12"
p "ab56cd".gsub(/(\d+)/,"78#{$1}") # "ab7812cd": "78#{$1}" is interpolated before gsub runs, so $1 is still the 12 from the call above

demo:

p "ab12cd".gsub(/(\d+)/,'34\1') # "ab3412cd"
p $1  # "12"
str = "ab56cd".gsub(/(\d+)/) do |ele|
  "78#{$1}" # here $1 is "56"
end
p str # "ab7856cd"

demo (run as a fresh script, with no earlier match):

p "ab56cd".gsub(/(\d+)/,"78#{$1}") # "ab78cd": here $1 is nil
str = "ab56cd".gsub(/(\d+)/) do |ele|
  "78#{$1}"
end

p str # "ab7856cd"

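Conclusions:

1,\1 and $1 are two different things
2,Pay special attention: $1 inside a gsub block is not the same as $1 written in the replacement string. Use \1 in the replacement and $1 in the block; the documentation in the source says exactly this
3,\1 must be written in a single-quoted string. In a double-quoted string "\1" is an octal escape (the character \x01), so there you have to write "\\1"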
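The three conclusions can be checked side by side. This is a minimal sketch with variable names of my own choosing, and it assumes an earlier match has already left $1 set to "12":

```ruby
"ab12cd" =~ /(\d+)/   # an earlier match: leaves $1 == "12"

# Double-quoted replacement: #{$1} is interpolated before gsub starts,
# so it carries the stale "12" rather than the new match "56".
eager = "ab56cd".gsub(/(\d+)/, "78#{$1}")

# Backreference in the replacement: resolved by gsub for each match.
single = "ab56cd".gsub(/(\d+)/, '78\1')
double = "ab56cd".gsub(/(\d+)/, "78\\1")   # same replacement, backslash escaped

# Block form: $1 is set for the current match before the block runs.
block = "ab56cd".gsub(/(\d+)/) { "78#{$1}" }

p eager   # "ab7812cd"
p single  # "ab7856cd"
p double  # "ab7856cd"
p block   # "ab7856cd"
```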

Here is the explanation from the source:

  #     str.gsub(pattern, replacement)       => new_str
  #     str.gsub(pattern) {|match| block }   => new_str
  #
  #
  # Returns a copy of <i>str</i> with <em>all</em> occurrences of <i>pattern</i>
  # replaced with either <i>replacement</i> or the value of the block. The
  # <i>pattern</i> will typically be a <code>Regexp</code>; if it is a
  # <code>String</code> then no regular expression metacharacters will be
  # interpreted (that is <code>/\d/</code> will match a digit, but
  # <code>'\d'</code> will match a backslash followed by a 'd').
  #
  # If a string is used as the replacement, special variables from the match
  # (such as <code>$&</code> and <code>$1</code>) cannot be substituted into it,
  # as substitution into the string occurs before the pattern match
  # starts. However, the sequences <code>\1</code>, <code>\2</code>, and so on
  # may be used to interpolate successive groups in the match.
  #
  # In the block form, the current match string is passed in as a parameter, and
  # variables such as <code>$1</code>, <code>$2</code>, <code>$`</code>,
  # <code>$&</code>, and <code>$'</code> will be set appropriately. The value
  # returned by the block will be substituted for the match on each call.
  #
  # The result inherits any tainting in the original string or any supplied
  # replacement string.
  #
  #    "hello".gsub(/[aeiou]/, '*')              #=> "h*ll*"
  #    "hello".gsub(/([aeiou])/, '<\1>')         #=> "h<e>ll<o>"
  #    "hello".gsub(/./) {|s| s[0].to_s + ' '}   #=> "104 101 108 108 111 "
  #
  #
  def gsub(pattern, replacement)
    # This is just a stub for a builtin Ruby method.
    # See the top of this file for more info.
  end

With a replacement string:

  # If a string is used as the replacement, special variables from the match
  # (such as <code>$&</code> and <code>$1</code>) cannot be substituted into it,
  # as substitution into the string occurs before the pattern match
  # starts. However, the sequences <code>\1</code>, <code>\2</code>, and so on
  # may be used to interpolate successive groups in the match.

With a block:

  # In the block form, the current match string is passed in as a parameter, and
  # variables such as <code>$1</code>, <code>$2</code>, <code>$`</code>,
  # <code>$&</code>, and <code>$'</code> will be set appropriately. The value
  # returned by the block will be substituted for the match on each call.
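Because the block runs once per match, it can compute on the capture, which a plain \1 in a replacement string cannot do. A small sketch, with an input string made up for illustration; Regexp.last_match(1) is the long-hand way of reading the same capture as $1:

```ruby
# Each call of the block sees the capture groups of the match being replaced.
doubled = "a1b22c333".gsub(/(\d+)/) { ($1.to_i * 2).to_s }

# Regexp.last_match(1) reads the same capture without the $1 shorthand.
tagged = "a1b22c333".gsub(/(\d+)/) { "<#{Regexp.last_match(1)}>" }

p doubled  # "a2b44c666"
p tagged   # "a<1>b<22>c<333>"
```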

http://stackoverflow.com/questions/288573/1-and-1-in-ruby


Extracting substrings in shell

Posted by wxianfeng Sat, 28 Feb 2009 21:04:00 GMT
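Environment: Ubuntu 10.10

Last time I wrote a shell script that installs all gems and skips any that are already installed. The heart of it is substring extraction, and I found extracting substrings in bash to be a real hassle. I asked a classmate who does embedded work, @liuqun, and he gave me a few pointers. It turns out there are plenty of ways to do it; they are just not all that convenient.

The shell script from last time: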

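#!/bin/bash
# Install every gem in the cache directory, skipping those already installed

cd /usr/local/system/ruby/lib/ruby/gems/1.8/cache || exit 1

for i in *.gem; do
        # strip the trailing -VERSION.gem to get the name
        # (awk -F'-' '{print $1}' would cut action-i18n down to action)
        gem=`echo "$i" | sed 's/-[0-9][^-]*\.gem$//'`
        version=`echo "$i" | grep -o "\-[0-9].*" | sed 's/^-//;s/\.gem$//'`
        is_gem_exist=`gem list "$gem" -v="$version"`
        if [ -z "$is_gem_exist" ]; then # note: keep the spaces just inside [ ]
              gem install "$i" # no backticks here, or the shell would try to run gem's output as a command
        else
                echo "$i is already installed"
        fi
done

Combining grep, awk, sed, cut and so on, let's see how many ways there are: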

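1,
wxianfeng@ubuntu:~$ echo "action-i18n-0.4.1.gem" | grep -o "\-[0-9].*" | sed 's/^-//;s/\.gem$//'
0.4.1
grep -o prints only the part that matched

sed 's/^-//' removes the leading -

sed 's/\.gem$//' removes the trailing .gem (with the dot unescaped, as in s/.gem//, it would match any character followed by gem)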

2,
wxianfeng@ubuntu:~$ STR="actionmailer-2.3.5.gem"; STR=${STR##*-}; echo ${STR%\.*}
2.3.5
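${STR##*-} deletes the longest prefix matching *-, leaving everything after the last -

${STR%\.*} deletes the shortest suffix matching .*, that is, the last . and everything after it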

3,
wxianfeng@ubuntu:~$ STR="actionmailer-2.3.5.gem"; echo ${STR:13:5}
2.3.5
${STR:13:5} takes 5 characters starting at offset 13 (offsets count from 0)

4,
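wxianfeng@ubuntu:~$ echo actionmailer-2.3.5.gem | sed 's/[^0-9]*\(.*\)\.gem$/\1/'
2.3.5
In sed 's/[^0-9]*\(.*\)\.gem$/\1/', \1 captures 2.3.5, and the whole of actionmailer-2.3.5.gem is what gets replaced. This relies on the gem name containing no digits: for action-i18n-0.4.1.gem it would print 18n-0.4.1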

5,
wxianfeng@ubuntu:~$ echo "action-i18n-0.4.1.gem" | awk '{print substr($0,13,5)}'
0.4.1
$0 is the whole input line, action-i18n-0.4.1.gem

13 is the start position (awk's substr counts from 1, unlike the 0-based offset in ${STR:13:5})

5 is the length

6,
wxianfeng@ubuntu:~$ echo "action-i18n-0.4.1.gem" | cut -d- -f 3 | cut -d . -f 1-3
0.4.1
-d- splits on -

-f 3 selects the third field

-f 1-3 selects fields one through three
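As a cross-check on the expected results above, the same split can be sketched in Ruby with two capture groups. split_gem_filename is a hypothetical helper of mine, not part of the original script, and it assumes the version is the last "-"-separated part and starts with a digit:

```ruby
# Hypothetical helper (not from the original script): split a .gem file name
# into [name, version], assuming the version is the last "-"-separated part
# and starts with a digit. Returns nil when the name does not fit that shape.
def split_gem_filename(filename)
  match = filename.match(/\A(.+)-(\d[^-]*)\.gem\z/)
  match && [match[1], match[2]]
end

p split_gem_filename("action-i18n-0.4.1.gem")   # ["action-i18n", "0.4.1"]
p split_gem_filename("actionmailer-2.3.5.gem")  # ["actionmailer", "2.3.5"]
p split_gem_filename("README")                  # nil
```

Unlike the fixed offsets in ${STR:13:5} and substr($0,13,5), this does not depend on the length of the gem name.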

see: http://tech.foolpig.com/2008/07/09/linux-shell-char/
http://linux.sheup.com/linux/linux5426.htm