class String
Extensions to the String class.
TODO: make riphtml() just call ircify_html() with stronger purify options.
We start by extending the String class with some IRC-specific methods.
Extension for String class.
String#% method which accepts “named arguments”. With named arguments instead of %s/%d-style placeholders, the translator can understand the meaning of the msgids.
Public Instance Methods
Source
# File lib/rbot/load-gettext.rb, line 149
def %(args)
  if args.kind_of?(Hash)
    ret = dup
    args.each {|key, value|
      ret.gsub!(/\%\{#{key}\}/, value.to_s)
    }
    ret
  else
    ret = gsub(/%\{/, '%%{')
    begin
      ret._old_format_m(args)
    rescue ArgumentError
      $stderr.puts " The string:#{ret}"
      $stderr.puts " args:#{args.inspect}"
    end
  end
end
Format - Uses str as a format specification, and returns the result of applying it to arg. If the format specification contains more than one substitution, then arg must be an Array containing the values to be substituted. See Kernel::sprintf for details of the format string. This is the default behavior of the String class.
(e.g.) "%s, %s" % ["Masao", "Mutoh"]
You can also use a Hash as the “named argument”. This is the recommended way for Ruby-GetText, because the translators can easily understand the meaning of the msgids.
- hash: {:key1 => value1, :key2 => value2, … }
- Returns: formatted String
(e.g.) "%{firstname}, %{familyname}" % {:firstname => "Masao", :familyname => "Mutoh"}
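The Hash branch can be illustrated outside rbot with a minimal sketch. The helper name `named_format_sketch` is hypothetical and not part of rbot or Ruby-GetText; it only mirrors the substitution loop shown in the source above, with the key escaped and the value inserted through a block so that backslashes in the value are not interpreted.

```ruby
# Hypothetical standalone helper mirroring the Hash branch of String#% above.
# It is an illustration only, not the rbot implementation.
def named_format_sketch(template, args)
  result = template.dup
  args.each do |key, value|
    # Replace every %{key} placeholder with the string form of its value.
    result.gsub!(/%\{#{Regexp.escape(key.to_s)}\}/) { value.to_s }
  end
  result
end

puts named_format_sketch("%{firstname}, %{familyname}",
                         { firstname: "Masao", familyname: "Mutoh" })
```

Placeholders with no matching key are left untouched, which matches the behavior of the loop in the original method.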
Source
# File lib/rbot/core/utils/extends.rb, line 338
def get_html_title
  if defined? ::Hpricot
    Hpricot(self).at("title").inner_html
  else
    return unless Irc::Utils::TITLE_REGEX.match(self)
    $1
  end
end
This method tries to find an HTML title in the string, and returns it if found
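The regex fallback can be sketched as follows. The pattern `TITLE_REGEX_SKETCH` is an assumption standing in for `Irc::Utils::TITLE_REGEX`, whose actual definition is not shown here and may differ; `html_title_sketch` is likewise a hypothetical name.

```ruby
# Hypothetical stand-in for Irc::Utils::TITLE_REGEX; the real constant may differ.
TITLE_REGEX_SKETCH = /<title[^>]*>(.*?)<\/title>/im

# Returns the text inside the first <title> element, or nil if none is found.
def html_title_sketch(html)
  match = TITLE_REGEX_SKETCH.match(html)
  match && match[1]
end

puts html_title_sketch("<html><head><title>Hi there</title></head></html>")
```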
Source
# File lib/rbot/irc.rb, line 332
def has_irc_glob?
  self =~ /^[*?]|[^\\][*?]/
end
This method checks if the receiver contains IRC glob characters
IRC has a very primitive concept of globs: a * stands for “any number of arbitrary characters”, a ? stands for “one and exactly one arbitrary character”. These characters can be escaped by prefixing them with a backslash (\).
A known limitation of this glob syntax is that there is no way to escape the escape character itself, so it’s not possible to build a glob pattern where the escape character precedes a glob.
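A small sketch of the check, using the same regular expression as the source above in a standalone helper (the name `irc_glob_sketch?` is hypothetical). The last call shows the limitation just described: a glob preceded by an escaped backslash is still treated as escaped.

```ruby
# Hypothetical standalone version of the check; same regexp as has_irc_glob?.
def irc_glob_sketch?(str)
  !(str =~ /^[*?]|[^\\][*?]/).nil?
end

p irc_glob_sketch?('foo*')      # unescaped glob at the end
p irc_glob_sketch?('*foo')      # glob at the start
p irc_glob_sketch?('foo\*')     # escaped glob: not counted
p irc_glob_sketch?('foo\\\\*')  # two backslashes then *: still seen as escaped
```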
Source
# File lib/rbot/irc.rb, line 289
def irc_downcase(casemap='rfc1459')
  cmap = casemap.to_irc_casemap
  self.tr(cmap.upper, cmap.lower)
end
This method returns a string which is the downcased version of the receiver, according to the given casemap
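To illustrate why a casemap matters, here is a minimal sketch built on `String#tr`. The table `CASEMAPS_SKETCH` and the helper `irc_downcase_sketch` are hypothetical; the character ranges follow the commonly documented IRC casemappings (under rfc1459, `[]\^` are the uppercase forms of `{}|~`) and are an assumption about what `Irc::Casemap` contains.

```ruby
# Hypothetical casemap table; assumed to resemble what Irc::Casemap provides.
CASEMAPS_SKETCH = {
  'rfc1459'        => { upper: 'A-^', lower: 'a-~' }, # A-Z plus [ \ ] ^
  'strict-rfc1459' => { upper: 'A-]', lower: 'a-}' }, # A-Z plus [ \ ]
  'ascii'          => { upper: 'A-Z', lower: 'a-z' }
}.freeze

# Downcase str according to the named casemap, defaulting to rfc1459
# for unknown names (as to_irc_casemap does).
def irc_downcase_sketch(str, casemap = 'rfc1459')
  cmap = CASEMAPS_SKETCH.fetch(casemap) { CASEMAPS_SKETCH['rfc1459'] }
  str.tr(cmap[:upper], cmap[:lower])
end

puts irc_downcase_sketch('Nick[Away]')          # brackets become braces
puts irc_downcase_sketch('Nick[Away]', 'ascii') # brackets are kept
```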
Source
# File lib/rbot/irc.rb, line 298
def irc_downcase!(casemap='rfc1459')
  cmap = casemap.to_irc_casemap
  self.tr!(cmap.upper, cmap.lower)
end
This is the same as the above, except that the string is altered in place
See also the discussion about irc_downcase
Source
# File lib/rbot/ircsocket.rb, line 14
def irc_send_penalty
  # According to eggdrop, the initial penalty is
  penalty = 1 + self.size/100
  # on everything but UnderNET where it's
  # penalty = 2 + self.size/120

  cmd, pars = self.split($;,2)
  debug "cmd: #{cmd}, pars: #{pars.inspect}"
  case cmd.to_sym
  when :KICK
    chan, nick, msg = pars.split
    chan = chan.split(',')
    nick = nick.split(',')
    penalty += nick.size
    penalty *= chan.size
  when :MODE
    chan, modes, argument = pars.split
    extra = 0
    if modes
      extra = 1
      if argument
        extra += modes.split(/\+|-/).size
      else
        extra += 3 * modes.split(/\+|-/).size
      end
    end
    if argument
      extra += 2 * argument.split.size
    end
    penalty += extra * chan.split.size
  when :TOPIC
    penalty += 1
    penalty += 2 unless pars.split.size < 2
  when :PRIVMSG, :NOTICE
    dests = pars.split($;,2).first
    penalty += dests.split(',').size
  when :WHO
    args = pars.split
    if args.length > 0
      penalty += args.inject(0){ |sum,x| sum += ((x.length > 4) ? 3 : 5) }
    else
      penalty += 10
    end
  when :PART
    penalty += 4
  when :AWAY, :JOIN, :VERSION, :TIME, :TRACE, :WHOIS, :DNS
    penalty += 2
  when :INVITE, :NICK
    penalty += 3
  when :ISON
    penalty += 1
  else # Unknown messages
    penalty += 1
  end
  if penalty > 99
    debug "Wow, more than 99 secs of penalty!"
    penalty = 99
  end
  if penalty < 2
    debug "Wow, less than 2 secs of penalty!"
    penalty = 2
  end
  debug "penalty: #{penalty}"
  return penalty
end
Calculate the penalty which will be assigned to this message by the IRCd
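As a worked example, the PRIVMSG/NOTICE branch alone can be sketched like this. The helper `privmsg_penalty_sketch` is hypothetical and covers only that one branch: a base of 1 plus one point per 100 bytes, one point per destination, clamped to the 2..99 range used in the source above.

```ruby
# Hypothetical reduction of irc_send_penalty to the PRIVMSG/NOTICE case only.
def privmsg_penalty_sketch(line)
  penalty = 1 + line.size / 100            # base penalty grows with length
  _cmd, pars = line.split(' ', 2)          # command, then everything else
  dests = pars.split(' ', 2).first         # comma-separated targets
  penalty += dests.split(',').size         # one point per target
  penalty.clamp(2, 99)                     # same bounds as the original
end

puts privmsg_penalty_sketch("PRIVMSG #a,#b :hello")
puts privmsg_penalty_sketch("PRIVMSG nick :hi")
```

A 20-byte message to two channels scores 1 + 0 + 2 = 3, while a short message to one nick scores the minimum of 2.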
Source
# File lib/rbot/irc.rb, line 307
def irc_upcase(casemap='rfc1459')
  cmap = casemap.to_irc_casemap
  self.tr(cmap.lower, cmap.upper)
end
Upcasing functions are provided too
See also the discussion about irc_downcase
Source
# File lib/rbot/irc.rb, line 316
def irc_upcase!(casemap='rfc1459')
  cmap = casemap.to_irc_casemap
  self.tr!(cmap.lower, cmap.upper)
end
In-place upcasing
See also the discussion about irc_downcase
Source
# File lib/rbot/core/utils/extends.rb, line 214
def ircify_html(opts={})
  txt = self.dup

  # remove scripts
  txt.gsub!(/<script(?:\s+[^>]*)?>.*?<\/script>/im, "")

  # remove styles
  txt.gsub!(/<style(?:\s+[^>]*)?>.*?<\/style>/im, "")

  # bold and strong -> bold
  txt.gsub!(/<\/?(?:b|strong)(?:\s+[^>]*)?>/im, "#{Bold}")

  # italic, emphasis and underline -> underline
  txt.gsub!(/<\/?(?:i|em|u)(?:\s+[^>]*)?>/im, "#{Underline}")

  ## This would be a nice addition, but the results are horrible
  ## Maybe make it configurable?
  # txt.gsub!(/<\/?a( [^>]*)?>/, "#{Reverse}")
  case val = opts[:a_href]
  when Reverse, Bold, Underline
    txt.gsub!(/<(?:\/a\s*|a (?:[^>]*\s+)?href\s*=\s*(?:[^>]*\s*)?)>/, val)
  when :link_out
    # Not good for nested links, but the best we can do without something like hpricot
    txt.gsub!(/<a (?:[^>]*\s+)?href\s*=\s*(?:([^"'>][^\s>]*)\s+|"((?:[^"]|\\")*)"|'((?:[^']|\\')*)')(?:[^>]*\s+)?>(.*?)<\/a>/) { |match|
      debug match
      debug [$1, $2, $3, $4].inspect
      link = $1 || $2 || $3
      str = $4
      str + ": " + link
    }
  else
    warning "unknown :a_href option #{val} passed to ircify_html" if val
  end

  # If opts[:img] is defined, it should be a String. Each image
  # will be replaced by the string itself, replacing occurrences of
  # %{alt} %{dimensions} and %{src} with the alt text, image dimensions
  # and URL
  if val = opts[:img]
    if val.kind_of? String
      txt.gsub!(/<img\s+(.*?)\s*\/?>/) do |imgtag|
        attrs = Hash.new
        imgtag.scan(/([[:alpha:]]+)\s*=\s*(['"])?(.*?)\2/) do |key, quote, value|
          k = key.downcase.intern rescue 'junk'
          attrs[k] = value
        end
        attrs[:alt] ||= attrs[:title]
        attrs[:width] ||= '...'
        attrs[:height] ||= '...'
        attrs[:dimensions] ||= "#{attrs[:width]}x#{attrs[:height]}"
        val % attrs
      end
    else
      warning ":img option is not a string"
    end
  end

  # Paragraph and br tags are converted to whitespace
  txt.gsub!(/<\/?(p|br)(?:\s+[^>]*)?\s*\/?\s*>/i, ' ')
  txt.gsub!("\n", ' ')
  txt.gsub!("\r", ' ')

  # Superscripts and subscripts are turned into ^{...} and _{...}
  # where the {} are omitted for single characters
  txt.gsub!(/<sup>(.*?)<\/sup>/, '^{\1}')
  txt.gsub!(/<sub>(.*?)<\/sub>/, '_{\1}')
  txt.gsub!(/(^|_)\{(.)\}/, '\1\2')

  # List items are converted to *). We don't have special support for
  # nested or ordered lists.
  txt.gsub!(/<li>/, ' *) ')

  # All other tags are just removed
  txt.gsub!(/<[^>]+>/, '')

  # Convert HTML entities. We do it now to be able to handle stuff
  # such as
  txt = Utils.decode_html_entities(txt)

  # Keep unbreakable spaces or convert them to plain spaces?
  case val = opts[:nbsp]
  when :space, ' '
    txt.gsub!([160].pack('U'), ' ')
  else
    warning "unknown :nbsp option #{val} passed to ircify_html" if val
  end

  # Remove double formatting options, since they only waste bytes
  txt.gsub!(/#{Bold}(\s*)#{Bold}/, '\1')
  txt.gsub!(/#{Underline}(\s*)#{Underline}/, '\1')

  # Simplify whitespace that appears on both sides of a formatting option
  txt.gsub!(/\s+(#{Bold}|#{Underline})\s+/, ' \1')
  txt.sub!(/\s+(#{Bold}|#{Underline})\z/, '\1')
  txt.sub!(/\A(#{Bold}|#{Underline})\s+/, '\1')

  # And finally whitespace is squeezed
  txt.gsub!(/\s+/, ' ')
  txt.strip!

  if opts[:limit] && txt.size > opts[:limit]
    txt = txt.slice(0, opts[:limit]) + "#{Reverse}...#{Reverse}"
  end

  # Decode entities and strip whitespace
  return txt
end
This method will return a purified version of the receiver, with all HTML stripped off and some of it converted to IRC formatting
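A heavily reduced sketch of the core idea, for illustration only: bold-like tags become the IRC bold control code, italic-like tags become underline, every other tag is dropped, and whitespace is squeezed. The constants `BOLD_SKETCH` and `UNDERLINE_SKETCH` use the conventional IRC control characters (0x02 and 0x1F); that they equal rbot's `Bold` and `Underline` is an assumption, and `ircify_sketch` is a hypothetical name.

```ruby
# Conventional IRC formatting codes; assumed to match rbot's Bold/Underline.
BOLD_SKETCH      = "\x02"
UNDERLINE_SKETCH = "\x1F"

# Hypothetical, much-simplified variant of ircify_html (no options,
# no entity decoding, no link or image handling).
def ircify_sketch(html)
  txt = html.dup
  txt.gsub!(/<\/?(?:b|strong)(?:\s+[^>]*)?>/im, BOLD_SKETCH)    # bold/strong
  txt.gsub!(/<\/?(?:i|em|u)(?:\s+[^>]*)?>/im, UNDERLINE_SKETCH) # i/em/u
  txt.gsub!(/<[^>]+>/, '')                                      # other tags
  txt.gsub!(/\s+/, ' ')                                         # squeeze
  txt.strip
end

p ircify_sketch("<p>Hello <b>world</b></p>")
```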
Source
# File lib/rbot/core/utils/extends.rb, line 324
def ircify_html!(opts={})
  old_hash = self.hash
  replace self.ircify_html(opts)
  return self unless self.hash == old_hash
end
As above, but modify the receiver
Source
# File lib/rbot/core/utils/extends.rb, line 349
def ircify_html_title
  self.get_html_title.ircify_html rescue nil
end
This method returns the IRC-formatted version of an HTML title found in the string
Source
# File lib/rbot/core/utils/extends.rb, line 332
def riphtml
  self.gsub(/<[^>]+>/, '').gsub(/&amp;/,'&').gsub(/&quot;/,'"').gsub(/&lt;/,'<').gsub(/&gt;/,'>').gsub(/&ellip;/,'...').gsub(/&apos;/, "'").gsub("\n",'')
end
This method will strip all HTML crud from the receiver
Source
# File lib/rbot/botuser.rb, line 119
def to_irc_auth_command
  Irc::Bot::Auth::Command.new(self)
end
Returns an Irc::Bot::Auth::Command from the receiver
Source
# File lib/rbot/irc.rb, line 275
def to_irc_casemap
  begin
    Irc::Casemap.get(self)
  rescue
    # raise TypeError, "Unkown Irc::Casemap #{self.inspect}"
    error "Unkown Irc::Casemap #{self.inspect} requested, defaulting to rfc1459"
    Irc::Casemap.get('rfc1459')
  end
end
This method returns the Irc::Casemap whose name is the receiver
Source
# File lib/rbot/irc.rb, line 1513
def to_irc_channel(opts={})
  Irc::Channel.new(self, opts)
end
We keep extending String, this time adding a method that converts a String into an Irc::Channel object
Source
# File lib/rbot/irc.rb, line 1318
def to_irc_channel_topic
  Irc::Channel::Topic.new(self)
end
Returns an Irc::Channel::Topic with self as text
Source
# File lib/rbot/irc.rb, line 915
def to_irc_netmask(opts={})
  Irc::Netmask.new(self, opts)
end
We keep extending String, this time adding a method that converts a String into an Irc::Netmask object
Source
# File lib/rbot/irc.rb, line 339
def to_irc_regexp
  regmask = Regexp.escape(self)
  regmask.gsub!(/(\\\\)?\\[*?]/) { |m|
    case m
    when /\\(\\[*?])/
      $1
    when /\\\*/
      '.*'
    when /\\\?/
      '.'
    else
      raise "Unexpected match #{m} when converting #{self}"
    end
  }
  Regexp.new("^#{regmask}$")
end
This method is used to convert the receiver into a Regular Expression that matches according to the IRC glob syntax
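The conversion can be tried in isolation with the sketch below, which follows the same steps as the source above in a standalone helper (the name `irc_glob_to_regexp_sketch` is hypothetical): escape the whole string, then turn each unescaped `*` into `.*` and each unescaped `?` into `.`, while keeping backslash-escaped globs literal.

```ruby
# Hypothetical standalone version of the glob-to-regexp conversion.
def irc_glob_to_regexp_sketch(glob)
  regmask = Regexp.escape(glob).gsub(/(\\\\)?\\[*?]/) do |m|
    case m
    when /\\(\\[*?])/ then Regexp.last_match(1) # escaped glob stays literal
    when /\\\*/       then '.*'                 # * matches any run of chars
    when /\\\?/       then '.'                  # ? matches exactly one char
    end
  end
  Regexp.new("^#{regmask}$")
end

p irc_glob_to_regexp_sketch('foo*').match?('foobar')
p irc_glob_to_regexp_sketch('fo?').match?('fooo')
p irc_glob_to_regexp_sketch('foo\*').match?('foo*')
```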
Source
# File lib/rbot/core/utils/extends.rb, line 355
def wrap_nonempty(pre, post, opts={})
  if self.empty?
    String.new
  else
    "#{pre}#{self}#{post}"
  end
end
This method is used to wrap a nonempty String by adding the prefix and postfix
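A short usage sketch, written as a standalone helper so it runs without rbot (the name `wrap_nonempty_sketch` is hypothetical). It shows the point of the method: optional fragments can be decorated without leaving stray delimiters when they are empty.

```ruby
# Hypothetical standalone equivalent of String#wrap_nonempty.
def wrap_nonempty_sketch(str, pre, post)
  str.empty? ? String.new : "#{pre}#{str}#{post}"
end

puts "Topic" + wrap_nonempty_sketch("rbot help", " (", ")")
puts "Topic" + wrap_nonempty_sketch("", " (", ")")
```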