org.knowceans.util
Class PatternString

java.lang.Object
  extended by org.knowceans.util.PatternString
All Implemented Interfaces:
java.lang.CharSequence, java.util.regex.MatchResult

public class PatternString
extends java.lang.Object
implements java.lang.CharSequence, java.util.regex.MatchResult

PatternString is a wrapper around pattern matching and substitution functionality inspired by the Perl a =~ exp functions.

This implementation puts less emphasis on performance (which would mean giving high priority to reusing Matchers) and more on ease of use, as more time is usually spent on development and porting than on operation. The idea is that String-like data can be used for matching operations much more conveniently as PatternString objects than through the final classes String, StringBuffer and Matcher of the Java API.

Further, the methods are organised a bit differently from the standard Java API. Basically, there are three operations: find(), match() and substitute(). Unlike its API counterpart Matcher.find(), which returns a boolean, find() returns this (a substring copy with an empty matcher) in order to allow chained calls. In a subsequent loop, the function found() returns the status of the last find() or matching operation, and findNext() can be called to advance the parser.
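
For comparison, the loop below uses only the standard java.util.regex API, where Matcher.find() returns a boolean and the loop is driven by that result. It is a minimal sketch and does not use PatternString itself; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindLoopSketch {
    // Counts the matches of expression in text with the plain Java API.
    // Matcher.find() returns a boolean, so it cannot be chained the way
    // PatternString.find() is described to allow.
    static int countMatches(String text, String expression) {
        Matcher m = Pattern.compile(expression).matcher(text);
        int count = 0;
        while (m.find()) { // each call advances to the next occurrence
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("a1 b22 c333", "\\d+"));
    }
}
```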

The API functions replaceFirst() and replaceAll() usually return new instances of String (as it is immutable). The corresponding substitute(), however, returns this and resets the internal matcher if it was global. For the non-global version, the current state of the matcher, i.e., its current region, is used. Therefore, it is possible to run through a string using find() or findNext() (findNext() can only be called after find()) and substitute() [TODO: this hastily written code must be thoroughly tested!].
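
The behaviour of the standard API functions mentioned above can be sketched as follows. The example only shows String.replaceFirst() and String.replaceAll() leaving the original text untouched; it makes no claim about how substitute() is implemented, and the class and method names are illustrative.

```java
class ReplaceSketch {
    // Returns the untouched input, the result of replaceFirst() and the
    // result of replaceAll(), in that order.
    static String[] replaceBoth(String text, String expression, String replacement) {
        String first = text.replaceFirst(expression, replacement);
        String all = text.replaceAll(expression, replacement);
        // text itself is unchanged: String is immutable
        return new String[] { text, first, all };
    }

    public static void main(String[] args) {
        for (String s : replaceBoth("x1 x2 x3", "x(\\d)", "u$1")) {
            System.out.println(s);
        }
    }
}
```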

TODO: make pattern string with a constant (and pre-compilable pattern). TODO: fix problems with cascading substitution.

Author:
gregor heinrich arbylon.net

Field Summary
 boolean debug
           
 
Constructor Summary
PatternString()
          Create an empty PatternString.
PatternString(java.lang.String text)
          Create a PatternString from the input, with an empty matcher.
PatternString(java.lang.StringBuffer text)
          Create a PatternString from the input, with an empty matcher.
 
Method Summary
 char charAt(int index)
           
 void configureMatcher(java.util.regex.Pattern p)
          sets the matcher with the new pattern
 PatternString copy()
          copies the pattern string content with a new matcher set to the region of the current one, but with the match result set, i.e., all references to the group information of the last match are kept.
static PatternString create(java.lang.String s)
          convenience method to get a pattern string.
 java.lang.StringBuffer debugPatternString(java.lang.String pattern)
          parses the pattern and outputs the capturing and non-capturing group positions.
 java.lang.String debugString()
          shows which groups have matched which strings.
 int end()
           
 int end(int group)
           
 PatternString find(java.lang.String expression)
          Emulates a perl find expression like: this =~ /expression/
 PatternString find(java.lang.String expression, int flags)
          emulates a perl find expression like: this =~ /expression/perlFlags
 PatternString find(java.lang.String expression, java.lang.String perlFlags)
          emulates a perl find expression like: this =~ /expression/perlFlags
 java.util.Vector<PatternString> findAll(java.lang.String expression)
          Emulates a repeated perl find like: foreach this =~ /expression/ @a += @_.
 java.util.Vector<PatternString> findAll(java.lang.String expression, java.lang.String perlFlags)
          Emulates a repeated perl find like: foreach this =~ /expression/ @a += @_.
 java.util.Vector<PatternString> findAll(java.lang.String expression, java.lang.String replacement, java.lang.String perlFlags)
          Finds all occurrences of the expression and substitutes them with the replacement.
 PatternString findNext()
          Finds the next occurrence of the pattern and returns it (i.e., return group(0)).
 boolean found()
          Returns whether the last find or matching operation has been successful, i.e., the pattern has been found.
 int getFlags()
           
 java.util.regex.Matcher getM()
           
 java.util.regex.Matcher getMatcher()
           
 java.util.regex.Pattern getPattern()
           
 java.lang.StringBuffer getText()
           
 java.lang.String group()
           
 java.lang.String group(int number)
          return the group with the number after the last match
 int groupCount()
           
 PatternString groupP(int number)
          return the group with the number after the last match
 int length()
           
static void main(java.lang.String[] args)
           
 boolean match(java.lang.String expression, int flags)
          emulates a perl match expression like: this =~ /expression/perlFlags
 boolean match(java.lang.String expression, java.lang.String perlFlags)
          Emulates a perl matching expression like: this =~ m/expression/perlFlags
 boolean matched()
          Same as found().
 boolean matcherUptodate(java.lang.String expression, int flags)
          Returns whether the matcher is up to date or must be set with new parameters.
 boolean nperl(java.lang.String patternCommand)
          Like perl(), but resets the parser beforehand.
 boolean perl(java.lang.String patternCommand)
          Performs the command given in Perl notation on this, e.g., this =~ s/exp/subs/flags will call substitute(exp, subs, flags).
 void region(int start, int end)
          sets the region for this pattern string
 int regionEnd()
          return the end of the internal matcher's region
 int regionStart()
          return the start of the internal matcher's region
 void reset()
          resets the matcher in order to allow new parsing.
 void setFlags(int flags)
           
 void setM(java.util.regex.Matcher m)
           
 void setText(java.lang.StringBuffer b)
           
 int start()
           
 int start(int group)
           
 java.lang.CharSequence subSequence(int start, int end)
           
 PatternString substitute(java.lang.String expression, java.lang.String replacement)
          emulates a perl substitution expression like: this =~ s/expression/replacement/
 PatternString substitute(java.lang.String expression, java.lang.String replacement, int flags, boolean replaceRemaining)
          emulates a perl substitution expression like: this =~ s/expression/replacement/perlFlags
 PatternString substitute(java.lang.String expression, java.lang.String replacement, java.lang.String perlFlags)
          emulates a perl substitution expression like: this =~ s/expression/replacement/perlFlags
 PatternString substituteAll(java.lang.String expression, java.lang.String replacement)
          performs global replace of the string expression with the replacement.
 java.lang.String toString()
           
 int translatePerlFlags(java.lang.String perlFlags)
          translates the optional Perl flags into Pattern flags; g means global, otherwise only the first occurrence is used.
 java.lang.String variable(java.lang.String perlVar)
          return the variable with the Perl name perlVar, e.g., $_ for last match.
 PatternString variablePattern(java.lang.String perlVar)
          return the variable with the Perl name perlVar, e.g., $_ for last match.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

debug

public boolean debug
Constructor Detail

PatternString

public PatternString(java.lang.String text)
Create a PatternString from the input, with an empty matcher.

Parameters:
text -

PatternString

public PatternString(java.lang.StringBuffer text)
Create a PatternString from the input, with an empty matcher.

Parameters:
text -

PatternString

public PatternString()
Create an empty PatternString. The object text can be changed using setText().
Method Detail

main

public static void main(java.lang.String[] args)

copy

public PatternString copy()
copies the pattern string content with a new matcher set to the region of the current one, but with the match result set, i.e., all references to the group information of the last match are kept.

Returns:

create

public static PatternString create(java.lang.String s)
convenience method to get a pattern string.

Parameters:
s -
Returns:

nperl

public boolean nperl(java.lang.String patternCommand)
Like perl(), but resets the parser beforehand.

Parameters:
patternCommand -
Returns:

perl

public boolean perl(java.lang.String patternCommand)
Performs the command given in Perl notation on this, e.g., this =~ s/exp/subs/flags will call substitute(exp, subs, flags).

FIXME: with cascaded substitution, the string of the matcher is always reset to the first value (the field text diverges from the internal state of the matcher).

Parameters:
patternCommand - -- everything that appears right of a =~ in Perl, i.e., the expression includes commands and delimiters. Examples: /abc/ for finding, s/x(\d+)/u$1/gi for substituting all x34 or X34 etc. with u34 etc.
Returns:
whether the pattern could be matched (and with substitution, whether the new string is different from the old one)
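
As a rough illustration of what the example command s/x(\d+)/u$1/gi amounts to, the sketch below spells it out with the standard java.util.regex API. How perl() parses the command string is not shown here; mapping i to Pattern.CASE_INSENSITIVE and g to replaceAll() is an assumption made for this example, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class PerlCommandSketch {
    // Standard-API counterpart of the Perl command s/x(\d+)/u$1/gi:
    // flag i becomes CASE_INSENSITIVE, flag g becomes replaceAll().
    static String substituteGlobalIgnoreCase(String text) {
        Pattern p = Pattern.compile("x(\\d+)", Pattern.CASE_INSENSITIVE);
        return p.matcher(text).replaceAll("u$1");
    }

    public static void main(String[] args) {
        // prints "u34 and u56"
        System.out.println(substituteGlobalIgnoreCase("x34 and X56"));
    }
}
```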

find

public PatternString find(java.lang.String expression,
                          java.lang.String perlFlags)
emulates a perl find expression like: this =~ /expression/perlFlags

Parameters:
expression -
perlFlags - (see translatePerlFlags)
Returns:

find

public PatternString find(java.lang.String expression,
                          int flags)
emulates a perl find expression like: this =~ /expression/perlFlags

Parameters:
expression -
flags - flags as passed to Pattern.compile
Returns:

find

public PatternString find(java.lang.String expression)
Emulates a perl find expression like: this =~ /expression/

Parameters:
expression -
Returns:

findAll

public java.util.Vector<PatternString> findAll(java.lang.String expression)
Emulates a repeated perl find like: foreach this =~ /expression/ @a += @_.

Returns the array of strings found. After this, found will be false because the search is exhaustive and the matcher of this pattern string is positioned at the end of the last match. Can be used to use find in a Java foreach construct. Use reset() to start at the beginning.
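
The exhaustive search described above can be sketched with the standard API as follows. This is not the PatternString implementation: it returns a List of String rather than a Vector of PatternString, and the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllSketch {
    // Collects every occurrence of expression in text, in order.
    static List<String> findAll(String text, String expression) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(expression).matcher(text);
        while (m.find()) {
            found.add(m.group()); // group() is the whole match, i.e. group(0)
        }
        // The matcher is now exhausted: another find() would return false.
        return found;
    }

    public static void main(String[] args) {
        // The returned list can be used directly in a foreach construct.
        for (String s : findAll("a1 b22 c333", "\\d+")) {
            System.out.println(s);
        }
    }
}
```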

Parameters:
expression -
Returns:

findAll

public java.util.Vector<PatternString> findAll(java.lang.String expression,
                                               java.lang.String perlFlags)
Emulates a repeated perl find like: foreach this =~ /expression/ @a += @_. Returns the array of strings found. After this, found will be false because the search is exhaustive and the matcher of this pattern string is positioned at the end of the last match. Can be used to use find in a Java foreach construct. Use reset() to start at the beginning.

Parameters:
expression -
Returns:

findAll

public java.util.Vector<PatternString> findAll(java.lang.String expression,
                                               java.lang.String replacement,
                                               java.lang.String perlFlags)
Finds all occurrences of the expression and substitutes them with the replacement. Does not change the internal string but resets the internal matcher and those of the generated pattern strings.

Parameters:
expression -
Returns:

findNext

public PatternString findNext()
Finds the next occurrence of the pattern and returns it (i.e., return group(0)). The success of this find() operation can be checked with found(), and the actual groups can be checked with group(...).

Returns:

found

public boolean found()
Returns whether the last find or matching operation has been successful, i.e., the pattern has been found.

Returns:

match

public boolean match(java.lang.String expression,
                     java.lang.String perlFlags)
Emulates a perl matching expression like: this =~ m/expression/perlFlags

Parameters:
expression -
perlFlags - (see translatePerlFlags)
Returns:
whether the expression could be matched

match

public boolean match(java.lang.String expression,
                     int flags)
emulates a perl match expression like: this =~ /expression/perlFlags

Parameters:
expression -
flags - flags as passed to Pattern.compile
Returns:

matched

public boolean matched()
Same as found().

Returns:

substitute

public PatternString substitute(java.lang.String expression,
                                java.lang.String replacement,
                                java.lang.String perlFlags)
emulates a perl substitution expression like: this =~ s/expression/replacement/perlFlags

Parameters:
expression -
replacement -
perlFlags - (see translatePerlFlags)
Returns:
this

substitute

public PatternString substitute(java.lang.String expression,
                                java.lang.String replacement,
                                int flags,
                                boolean replaceRemaining)
emulates a perl substitution expression like: this =~ s/expression/replacement/perlFlags

Parameters:
expression -
replacement -
flags - (see Pattern)
replaceRemaining - whether to substitute all remaining occurrences or only the next one (call reset() beforehand if you want to replace all).
Returns:
this

substitute

public PatternString substitute(java.lang.String expression,
                                java.lang.String replacement)
emulates a perl substitution expression like: this =~ s/expression/replacement/

Parameters:
expression -
replacement -
Returns:
this

substituteAll

public PatternString substituteAll(java.lang.String expression,
                                   java.lang.String replacement)
performs global replace of the string expression with the replacement. After the operation, the internal string buffer is filled with the substitute (and the original string is lost).

Parameters:
expression -
replacement -
Returns:
this

reset

public void reset()
resets the matcher in order to allow new parsing.


matcherUptodate

public boolean matcherUptodate(java.lang.String expression,
                               int flags)
Returns whether the matcher is up to date or must be set with new parameters. This allows calling find(exp, flags) etc. with parameters in a while loop, without having to call find() separately after the first match.

Note: The expression is checked by reference, i.e., a new instance of the variable expression restarts the loop! TODO: check if this makes sense in practice, e.g., with on-the-fly string concatenations. If not, change == to equals.
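
The difference the note refers to can be seen in plain Java: a string concatenated at run time has the same content as a literal but is a different object, so a check with == fails where equals succeeds. The class and helper names below are hypothetical and only illustrate the comparison, not the internals of matcherUptodate().

```java
class ReferenceCheckSketch {
    // Concatenation of non-constant operands creates a new String at run time.
    static String concat(String a, String b) {
        return a + b;
    }

    // Comparison by reference, as described in the note.
    static boolean sameByReference(String a, String b) {
        return a == b;
    }

    // Comparison by content, the alternative named in the TODO.
    static boolean sameByValue(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String literal = "\\d+";
        String built = concat("\\d", "+");
        System.out.println(sameByReference(literal, built)); // prints "false"
        System.out.println(sameByValue(literal, built));     // prints "true"
    }
}
```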

Parameters:
expression -
flags -
Returns:

configureMatcher

public void configureMatcher(java.util.regex.Pattern p)
sets the matcher with the new pattern

Parameters:
p -

getMatcher

public java.util.regex.Matcher getMatcher()

getPattern

public java.util.regex.Pattern getPattern()

length

public int length()
Specified by:
length in interface java.lang.CharSequence

charAt

public char charAt(int index)
Specified by:
charAt in interface java.lang.CharSequence

subSequence

public java.lang.CharSequence subSequence(int start,
                                          int end)
Specified by:
subSequence in interface java.lang.CharSequence

translatePerlFlags

public int translatePerlFlags(java.lang.String perlFlags)
translates the optional Perl flags into Pattern flags

Parameters:
perlFlags -
Returns:
the corresponding Pattern flags value.
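
A translation of this kind might look like the sketch below. The mapping is an assumption: this page does not list which Perl flags translatePerlFlags() supports or how it represents g, so the sketch maps only i, m, s and x to java.util.regex.Pattern constants and ignores everything else. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class PerlFlagsSketch {
    // Hypothetical mapping from Perl flag letters to Pattern flag bits.
    // The real translatePerlFlags() may support a different set of flags.
    static int translate(String perlFlags) {
        int flags = 0;
        for (char c : perlFlags.toCharArray()) {
            switch (c) {
                case 'i': flags |= Pattern.CASE_INSENSITIVE; break;
                case 'm': flags |= Pattern.MULTILINE; break;
                case 's': flags |= Pattern.DOTALL; break;
                case 'x': flags |= Pattern.COMMENTS; break;
                default: break; // e.g. g has no Pattern bit and is ignored here
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        System.out.println(translate("gi") == Pattern.CASE_INSENSITIVE); // prints "true"
    }
}
```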

group

public java.lang.String group(int number)
return the group with the number after the last match

Specified by:
group in interface java.util.regex.MatchResult
Parameters:
number -
Returns:

groupP

public PatternString groupP(int number)
return the group with the number after the last match

Parameters:
number -
Returns:

regionStart

public int regionStart()
return the start of the internal matcher's region

Returns:

regionEnd

public int regionEnd()
return the end of the internal matcher's region

Returns:

region

public void region(int start,
                   int end)
sets the region for this pattern string

Parameters:
start -
end -
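
Since the region belongs to the internal matcher, the effect of setting it can be illustrated with java.util.regex.Matcher.region() from the standard API. This is a minimal sketch on a plain Matcher, not on PatternString; the class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionSketch {
    // Returns the first match of expression within [start, end) of text,
    // or null if the region contains no match.
    static String firstMatchInRegion(String text, String expression, int start, int end) {
        Matcher m = Pattern.compile(expression).matcher(text);
        m.region(start, end); // searching is now limited to this region
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Skips "1" because the region starts at index 3; prints "22"
        System.out.println(firstMatchInRegion("a1 b22 c333", "\\d+", 3, 11));
    }
}
```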

start

public int start()
Specified by:
start in interface java.util.regex.MatchResult

end

public int end()
Specified by:
end in interface java.util.regex.MatchResult

start

public int start(int group)
Specified by:
start in interface java.util.regex.MatchResult

end

public int end(int group)
Specified by:
end in interface java.util.regex.MatchResult

group

public java.lang.String group()
Specified by:
group in interface java.util.regex.MatchResult

groupCount

public int groupCount()
Specified by:
groupCount in interface java.util.regex.MatchResult

variablePattern

public PatternString variablePattern(java.lang.String perlVar)
return the variable with the Perl name perlVar, e.g., $_ for last match.

Parameters:
perlVar -
Returns:

variable

public java.lang.String variable(java.lang.String perlVar)
return the variable with the Perl name perlVar, e.g., $_ for last match.

Parameters:
perlVar -
Returns:

debugString

public java.lang.String debugString()
shows which groups have matched which strings.

Returns:

debugPatternString

public java.lang.StringBuffer debugPatternString(java.lang.String pattern)
parses the pattern and outputs the capturing and non-capturing group positions.

Parameters:
pattern -
Returns:

toString

public java.lang.String toString()
Specified by:
toString in interface java.lang.CharSequence
Overrides:
toString in class java.lang.Object

getText

public final java.lang.StringBuffer getText()

setText

public final void setText(java.lang.StringBuffer b)

getFlags

public final int getFlags()

setFlags

public final void setFlags(int flags)

getM

public final java.util.regex.Matcher getM()

setM

public final void setM(java.util.regex.Matcher m)