PRC2

The state of things

Implementing behavior using state machines.

Systems sometimes have to exhibit some kind of behavior that is different observable output over time. This is different from a pure function, which will always produce the same output for the same input. The behavior differs depending on the 'history' the system experienced, typically in the form of 'events' or method calls that happened over time. Such method calls change the state the system is in. We call such behavior state-full and this way of operating is often implemented using a 'state machine'. In this week and in the AADE course we will be using two concepts:

  1. The state (machine) diagram
  2. The State pattern

lamp battery switch A state machine is a software concept used to implement the behavior of something whose 'reaction' or outcome depends on the events that happened over time. A very simple example is a lamp and a switch. When properly connected and none of the parts is defect, the bulb will light up when the switch is in the on state. As state machine: the switch passes current when it has been turned on, which can be expressed as a turn-on event, making the bulb light up. From the diagram it should be obvious that a current can only flow if the switch is in the on (or closed) state.

In software we model this as the path the code takes, which then depends on some condition or value.

switch The state diagram is a way to express the behavior in a graphical manner. Each state of the system is drawn as a box with rounded corners, like in the diagram above. In the GoF State pattern, which is an object oriented way of modeling state behavior, one uses different objects to realize the effect of changing the software path. The objects all implement the same interface, but have a different implementation. Then, whenever a state switching event occurs, the current state object is replaced by the state object for the new state.

Transition
Figure 1. Transitions

From AADE course, we learned that a life cycle of a system changes throw using by so called transitions. To change the state from wait to lock, trigger event was needed with a guard (condition) and then we did some activity.

The killer rabit
Figure 2. state machine diagram

railway switch As another analogy, you can think of a railway switch. The active state object will always have the same behavior. One makes the train go straight, the other makes it go right. Each state on itself always does the same, like a pure function, so with the same input (train) the same output is produced. The switching between states makes it possible to control the direction of the train traffic.

State machines or state charts can be implemented in several ways. In this week we will apply an OO approach using the well known State Pattern , combined with enums and some Java 8 features, in particular default methods.

The State Pattern has the following UML class diagram:

StatePattern
Figure 3. State Pattern

We deal with a context, an abstract state and multiple concrete state implementations that do the work and represent the state elements in the diagram. In the diagram above there would be a state instance for both ConcreteStateA and ConcreteStateB. In the implementation of the exercise of this week we will be using enum as the basic construct for our work.

Using a functional state approach by using enum has the following advantages:

  • There is no chance of creating new states. Enum ensures that the only instances there ever will be are created at class loading time. You always know all states there ever will be.
  • A state can be implemented as a pure function, making the number of tests needed minimal. The inputs to the state are the state of the context (or technically, the state stored in the context), the inputs, and the event or method call.
  • Such functions and thereby state instances can be shared between contexts, because functional states do not have any state themselves. The actual state is saved in the context. This is not because enums cannot have state, they can, but because a functional state in a state-machine shouldn’t. It IS the state and should not have any. It should always do the same. More importantly, they are immutable, meaning no one can break them by setting some wrong value.
  • A functional state is easily tested without a need for a complex context. The context can easily be mocked as can be seen in this week’s olifantysballs exercise.

The disadvantage of using functional states in this way is that they cannot keep information themselves, such as counting something. But quite often the counting involves only a few possible values, which can be states themselves, or can be realized in the context. As an example for few counting states you might consider a traffic light which counts green, yellow, red.

Warning:

A common error made by novice programmers is to call a method inside a constructor that takes the constructor’s type as a parameter.

Do not do this inside the constructor
    StateMachine(){
      this.initialState = State.IDLE;
      this.initialState.enter( this ); 1

    }
  1. Problematic call of someField.enter(this) inside constructor. Something to avoid.

The problem is that the constructor may not yet be complete, in particular when fields are defined and initialized later in the constructor or even after the constructor definition.

To avoid this problem, you typically have some init() method that should be called after construction or use a factory method that does something similar. In the gumball machine we chose the later. You can find the details in the javadoc documentation of the exercise.

Use init method.
StateMachine(){
  this.initialState = State.IDLE;

}

StateMachine init(){
  this.initialState.enter( this ); 1
  return this;
}
Caller has not much extra work.
    StateMachine sm = new StateMachine().init();

State machines and regular expressions

You can find a lot on the relation between state machines and regular expressions. In most cases, including in Java, the regex is translated or compiled into a statemachine (just) before matches are attempted.

A nice illustration is given by the REGEXPER website, that turns a regex into a railroad diagram. This make the understanding of a regular expression way easier.

A simple regex : Dutch postal code: "\d{4}\s[A-Z]{2}"

GitHub for simple regex tester

postcode regex
Figure 4. with diagram

The postcode railroad diagram reads as a digit followed by 3 more digits, a space, a letter and one more letter.

More complex example:
// note that you can break off the regex in java and intersperse it with comments.
String emailRegex=
      // quite some rubbish before the at sign
     "^[a-zA-Z0-9.!#$%&''*+/=?^_`{|}~-]+"+
     "@"+ // the at sign followed by a more restricted set of chars,
          //  with some lengths restrictions
     "[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"+ // optional non capt grp
     // any number of letter-number combos
     "(?:\\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$";
email regex
Figure 5. Produces this 'railroad' diagram. which is equivalent to a state machine diagram.

Note that there is not one TRUE regex for email addresses. If you want to know the nitty-gritty details of valid email addresses look at RFC 8398 or RFC 5322, or ask in CSAR class.

Warning:

Once you understand that regexes are state machine specifications, and that it is easy to create state machines that never 'terminate', you understand that regular expressions can also be dangerous.

Accepting a regular expression from any source may cause a denial of service attack, because such expression may put your server into an endless loop.

Advice: only accept a restricted set of regexes, maybe parse them before use. In particular avoid back tracking constructs.

Regular expression Denial of Service - ReDoS

So beware.

Regular expressions

An often heard saying is: When you have a problem that can be solved with a regular expression, you actually have two problems, the problem and how to write the expression.

There is indeed some truth in the saying: Regular expressions can be hard to read (and hence maintain), and sometimes hard to write as well.

The problem lies in the fact that the format of the most popular form, the Perl dialect, is quite terse. However you can do quite a few things to improve at least their readability.

Basic regex syntax and rules

A simple string is also a regular expression. As far as there are no meta characters involved, a string simply matches itself.

  • The first meta character you need to know is the dot '.', which matches any character. So a line with a dot will match any line that is the same as the line with any substution for the dot character (including itself).
  • The second meta character is the backslash '\'. It takes the special meaning (as in meta) of the following character away. Back to our matching line: if you want exactly a dot at the place of the dot, prefix it with a backslash, so it will be understood as a dot. Like this: [red]"\.". But since we using Java, and the backslash already has a special role in strings, you must double the backslash in most cases, thus: "\\.".
  • The next are the quantifier: how many of something do you want.
    • ? 0 or one, aka at most. So "a?" means at most one a.
    • + at least one.
    • * any number of times, including zero.
    • numberic quantifier, using braces ('{' and '}''). For instance "a{1,4}" means a between 1 and 4 times, inclusive. You can also write "a{3}" meaning exaclty 3 as. Or a lower (a{2,} for at least 2 a) or upper boundary (a{,12}) at most 12.
  • character classes
    • user specified: [aeiou] matches the English vowels, [a-p] the first 16 characters of the alphabet, [a-pA-P] the same, but ignoring case.
  • predefined :
    • \w = word, which is the set of characters in [a-zA-Z_0-9], so a through z in both upper and lower case, the underscore and the digits 0-9.
    • \W negates the class, in this case it matches the non-word characters
    • \d the decimal digits
    • \D not a digit. etc.
charMeaning
.match all
+at least one quantifier
?ate most one
{start of quantifier spec
}end of quantifier spec
[start character class
]end character class

The definition as far as java is concerned is given in the java.util.Pattern class. You do not have to know all details, the summary above suffices for the most cases. The java doc of pattern gives a complete overview.

Grouping

Sometimes you are not just interested in a match or not, but want to capture parts of the input for further processing. For that you use the parenthesis '(' and ')'. A group, when found, will get a number (or even a name, but that is not in this lesson). Group 0 is typically the whole expression, and number one the first group identified with the parenthesis.

So if you have a regular expression string line "a(e|i|o|u)", group number one will be the group containing the vowel following a, if any.

To get acquainted with regular expressions, it is very helpful to write some tests, to verify your pattern or assumptions on the pattern.

Matches, Matchers, match Groups, and group names.

The java.util.regex package contains the classes to work with regular expressions. The most important are the Pattern and Matcher classes.

A Pattern is a compiled representation of a regular expression. It is used to create a Matcher object that will match the regular expression against an input string.

In a pattern you can define groups of interest, for instance values that you want to extract from the input string. The Matcher object will then contain the groups that were found in the input string. You create a group by surrounding the part of the pattern with parenthesis.

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class Main {
    public static void main(String[] args) {
        String input = "The quick brown fox jumps over the lazy dog";
        String pattern = "The (\\w+) brown (\\w+) jumps over the (\\w+) dog"; 1
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(input);
        if (m.find()) { 2
            System.out.println("Found a match");
            System.out.println("Group 0: " + m.group(0)); 3
            System.out.println("Group 1: " + m.group(1));
            System.out.println("Group 2: " + m.group(2));
            System.out.println("Group 3: " + m.group(3));
        } else {
            System.out.println("No match found");
        }
    }
}
  1. There are three groups in the pattern. The first group matches the word after "The", the second group matches the word after "brown", and the third group matches the word after second "the".
  2. There are several methods to match a pattern against an input string. The find method is the most common one. It returns true if the pattern is found in the input string, and false otherwise. There is also a matches() method, which returns true if the pattern matches the whole input string, and false otherwise. The third method is lookingAt(), which returns true if the pattern matches the beginning of the input string, and false otherwise.
  3. Group 0 is the whole match. The other groups are the parts of the input string that match the groups in the pattern.

Named groups and comments to the regex

You can give the groups names, so the code becomes less brittle when the pattern changes. This is done by using the syntax (?<name>pattern).

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class Main {
    public static void main(String[] args) {
        String input = "The quick brown fox jumps over the lazy dog";
        String pattern =
                      "The (?<first>\\w+)" // comment 1
                      +" brown (?<second>\\w+)" // comment 2
                      +" jumps over the (?<third>\\w+) dog"; // comment 3
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(input);
        if (m.find()) {
            System.out.println("Found a match");
            System.out.println("Group 0: " + m.group(0));
            System.out.println("Group first: " + m.group("first"));
            System.out.println("Group second: " + m.group("second"));
            System.out.println("Group third: " + m.group("third"));
        } else {
            System.out.println("No match found");
        }
    }
}

The comments are not part of the pattern, but are there to explain the pattern. They are ignored by the compiler. But they are very useful to explain the pattern to others, or to yourself when you come back to the code after a while.

Useful web sites