Date Regex; Leap Years

It’s surprising what can be expressed using regular expressions. I tend to think of them as syntactic, but sometimes they can seem almost semantic.

Take dates, for instance, in the format “DD/MM/YYYY”.

The months have varying numbers of days, and February gains a 29th day only in a leap year. In the Gregorian calendar, a year is a leap year if it is divisible by 4, unless it is divisible by 100, unless it is also divisible by 400. So, for example, 2008 and 2012 are leap years; 2100 is not, but 2000 is. All of this can be expressed in a regex.
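
For reference, here is the same rule in arithmetic form as a small Python sketch (Python is simply the language chosen for these examples; is_leap and days_in_month are illustrative names, not library functions). It is handy as an oracle when testing a date regex, and it agrees with the standard library's calendar.isleap:

```python
import calendar


def is_leap(year: int) -> bool:
    """Gregorian rule: divisible by 4, unless by 100, unless by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_month(month: int, year: int) -> int:
    """Days in a month (1-12): the facts a date regex has to encode."""
    if month == 2:
        return 29 if is_leap(year) else 28
    if month in (4, 6, 9, 11):  # April, June, September, November
        return 30
    return 31


# The standard library's calendar.isleap applies the same rule.
assert all(is_leap(y) == calendar.isleap(y) for y in range(1, 10000))

for y in (2008, 2012, 2100, 2000):
    print(y, is_leap(y))
# 2008 True
# 2012 True
# 2100 False
# 2000 True
```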

Here’s some pseudo-code:

### in this pseudo-syntax
### character # to end-of-line is a comment
### "...$variable$..." denotes inline string substitution
### regexes ignore whitespace (like Python verbose regex syntax)
### even in a regex, character # to end-of-line is a comment (ditto)
### parentheses (...) are non-capturing in regexes;
###   the more-usual syntax would be (?:...)

regex_01_to_28 := "(0[1-9]|1[0-9]|2[0-8])";
regex_01_to_29 := "(0[1-9]|[12][0-9])";
regex_01_to_30 := "(0[1-9]|[12][0-9]|30)";
regex_01_to_31 := "(0[1-9]|[12][0-9]|3[01])";

#regex_00_to_96_by_4s := "([02468][048]|[13579][26])";
regex_04_to_96_by_4s := "([02468][48]|[13579][26]|[2468]0)";

regex_0001_to_9999 :=
  "(
    [0-9]{3}[1-9]|
    [0-9]{2}[1-9][0-9]|
    [0-9][1-9][0-9]{2}|
    [1-9][0-9]{3}
  )";

### pattern "DD/MM/YYYY"
### YYYY from 0001 to 9999 (as there was no year 0000)
### assume Gregorian calendar throughout this period (even before 1752)

regex_valid_date :=
  "(
    (
      # February only to the 28th
      ($regex_01_to_28$/(02))|
      # April, June, September, November have 30 days
      ($regex_01_to_30$/(04|06|09|11))|
      # 'all the rest' have 31 days
      ($regex_01_to_31$/(01|03|05|07|08|10|12))
    # any century, any year (except 0000)
    )/$regex_0001_to_9999$
  )|(
    # February 29th
    (29/02)/(
      # in any century: years 04, 08, ..., 96 (but not 00)
      ([0-9]{2}$regex_04_to_96_by_4s$)|
      # or in centuries 04, 08, ..., 96: year 00
      ($regex_04_to_96_by_4s$00)
    )
  )";
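
The cleverest helper is regex_04_to_96_by_4s. It works because 10a + b leaves the same remainder mod 4 as 2a + b: with an even tens digit a, the units digit b must be 0, 4 or 8; with an odd tens digit it must be 2 or 6. The commented-out regex_00_to_96_by_4s is exactly that, and the active version excludes 00 by splitting off [2468]0. A quick exhaustive check of every helper, sketched in Python (assuming Python 3.6+, with the pseudo-code's (...) written as (?:...); matched is an illustrative helper name):

```python
import re

# The pseudo-code's helpers, with (...) written as Python's non-capturing (?:...).
REGEX_01_TO_28 = r"(?:0[1-9]|1[0-9]|2[0-8])"
REGEX_01_TO_29 = r"(?:0[1-9]|[12][0-9])"
REGEX_01_TO_30 = r"(?:0[1-9]|[12][0-9]|30)"
REGEX_01_TO_31 = r"(?:0[1-9]|[12][0-9]|3[01])"
REGEX_00_TO_96_BY_4S = r"(?:[02468][048]|[13579][26])"  # commented-out variant
REGEX_04_TO_96_BY_4S = r"(?:[02468][48]|[13579][26]|[2468]0)"
REGEX_0001_TO_9999 = (r"(?:[0-9]{3}[1-9]|[0-9]{2}[1-9][0-9]"
                      r"|[0-9][1-9][0-9]{2}|[1-9][0-9]{3})")


def matched(pattern: str, width: int) -> set:
    """Every zero-padded number of the given width that the pattern fully matches."""
    return {n for n in range(10 ** width) if re.fullmatch(pattern, f"{n:0{width}d}")}


checks = [
    (REGEX_01_TO_28, 2, set(range(1, 29))),
    (REGEX_01_TO_29, 2, set(range(1, 30))),
    (REGEX_01_TO_30, 2, set(range(1, 31))),
    (REGEX_01_TO_31, 2, set(range(1, 32))),
    (REGEX_00_TO_96_BY_4S, 2, set(range(0, 100, 4))),
    (REGEX_04_TO_96_BY_4S, 2, set(range(4, 100, 4))),
    (REGEX_0001_TO_9999, 4, set(range(1, 10000))),
]
for pattern, width, expected in checks:
    assert matched(pattern, width) == expected, pattern
print("every helper matches exactly its intended range")
```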

It should not be difficult to translate this pseudo-code into any language supporting regular expressions.
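
As one possible translation, here is a sketch in Python. It assumes Python's re module: (...) becomes (?:...), re.VERBOSE supplies the whitespace-and-comments behaviour of the pseudo-syntax, and string.Template's ${name} stands in for the $name$ substitution. PARTS, is_valid_date and is_valid_date_reference are illustrative names. The reference function checks dates independently with datetime.date, which also uses the proleptic Gregorian calendar for years 1 to 9999, so the two can be compared:

```python
import re
from datetime import date
from string import Template

# Building blocks from the pseudo-code. Python's (...) captures, so each
# is wrapped in the non-capturing (?:...) instead.
PARTS = {
    "d01_28": r"(?:0[1-9]|1[0-9]|2[0-8])",
    "d01_30": r"(?:0[1-9]|[12][0-9]|30)",
    "d01_31": r"(?:0[1-9]|[12][0-9]|3[01])",
    "by4": r"(?:[02468][48]|[13579][26]|[2468]0)",  # 04, 08, ..., 96
    "y0001_9999": r"(?:[0-9]{3}[1-9]|[0-9]{2}[1-9][0-9]"
                  r"|[0-9][1-9][0-9]{2}|[1-9][0-9]{3})",
}

# Template's ${name} plays the part of $name$; re.VERBOSE ignores whitespace
# and treats # to end-of-line as a comment, as in the pseudo-syntax.
VALID_DATE = re.compile(Template(r"""
    (?:
        ${d01_28}/02                        # February only to the 28th
      | ${d01_30}/(?:04|06|09|11)           # April, June, September, November
      | ${d01_31}/(?:01|03|05|07|08|10|12)  # all the rest have 31 days
    )/${y0001_9999}                         # any year except 0000
  | 29/02/(?:
        [0-9]{2}${by4}                      # any century: years 04, 08, ..., 96
      | ${by4}00                            # centuries 04, 08, ..., 96: year 00
    )
""").substitute(PARTS), re.VERBOSE)


def is_valid_date(text: str) -> bool:
    """Regex check for DD/MM/YYYY, Gregorian rules, years 0001-9999."""
    # fullmatch anchors the whole pattern at both ends, so no ^...$ is needed.
    return VALID_DATE.fullmatch(text) is not None


def is_valid_date_reference(text: str) -> bool:
    """Independent check via datetime.date (proleptic Gregorian, years 1-9999)."""
    m = re.fullmatch(r"([0-9]{2})/([0-9]{2})/([0-9]{4})", text)
    if m is None:
        return False
    day, month, year = (int(g) for g in m.groups())
    try:
        date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return False
    return True


for s in ("29/02/2000", "29/02/2100", "31/04/2024", "30/04/2024", "29/02/0000"):
    print(s, is_valid_date(s), is_valid_date_reference(s))
# 29/02/2000 True True
# 29/02/2100 False False
# 31/04/2024 False False
# 30/04/2024 True True
# 29/02/0000 False False
```

Using fullmatch rather than wrapping the pattern in ^...$ also sidesteps a precedence trap: | binds more loosely than the anchors, so ^A|B$ would anchor only one end of each alternative.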

(You might wish to restrict the years more than I have, for example, from 1800 to 2200.)
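
Narrowing the years means changing both year sub-patterns, because the February 29th branch has to shrink as well. A sketch for the 1800 to 2200 example, again in Python: in that range the leap years are xx04 to xx96 (by 4s) in the 18xx, 19xx, 20xx and 21xx centuries, plus 2000, since 1800, 1900, 2100 and 2200 are not leap years. YEAR, LEAP_YEAR and is_valid_date_1800_2200 are illustrative names:

```python
import re

BY4 = r"(?:[02468][48]|[13579][26]|[2468]0)"    # 04, 08, ..., 96 (never 00)
YEAR = r"(?:1[89][0-9]{2}|2[01][0-9]{2}|2200)"   # 1800-2200
# Leap years in range: xx04..xx96 by 4s in the 18xx-21xx centuries, plus 2000.
# 1800, 1900, 2100 and 2200 are not leap years, so no other xx00 is allowed.
LEAP_YEAR = r"(?:(?:1[89]|2[01])" + BY4 + r"|2000)"

DATE_1800_2200 = re.compile(
    r"(?:(?:0[1-9]|1[0-9]|2[0-8])/02"
    r"|(?:0[1-9]|[12][0-9]|30)/(?:04|06|09|11)"
    r"|(?:0[1-9]|[12][0-9]|3[01])/(?:01|03|05|07|08|10|12))/" + YEAR
    + r"|29/02/" + LEAP_YEAR
)


def is_valid_date_1800_2200(text: str) -> bool:
    """DD/MM/YYYY with Gregorian rules, years 1800-2200 only."""
    return DATE_1800_2200.fullmatch(text) is not None


for s in ("29/02/2000", "29/02/1900", "31/12/2200", "01/01/2201"):
    print(s, is_valid_date_1800_2200(s))
# 29/02/2000 True
# 29/02/1900 False
# 31/12/2200 True
# 01/01/2201 False
```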


4 Responses to “Date Regex; Leap Years”

  1. Sirish Pathak Says:

    After searching a lot I got the pseudo-code, and it’s working fine.
    I admire you.
    ("^(((((0[1-9]|1[0-9]|2[0-8])/(02))|((0[1-9]|[12][0-9]|30)/(04|06|09|11))|((0[1-9]|[12][0-9]|3[01])/(01|03|05|07|08|10|12)))/([0-9]{3}[1-9]|[0-9]{2}[1-9][0-9]|[0-9][1-9][0-9]{2}|[1-9][0-9]{3}))|((29/02)/(([0-9]{2}([02468][48]|[13579][26]|[2468]0))|(([02468][48]|[13579][26]|[2468]0)00))))$");

    • Rob Says:

      Thanks for your comment.

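      Please note that in some implementations of regular expressions, the grouping syntax (…) is used to capture sub-string matches. If the expression is only used to check whether or not a string matches [e.g. with XSD’s <xsd:simpleType/xsd:restriction/xsd:pattern/@value> or the grep program], then the capturing (…) groups in your string do no harm. (XSD patterns, however, are implicitly anchored at both ends and treat ^ and $ as ordinary characters, so drop those there; and grep needs its extended syntax, grep -E, for |, {…} and (…) to act as operators.)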

      If, on the other hand, the matched sub-expressions are retrieved or used in a replacement string [e.g. with Python’s re.match() or re.sub(), or Boost in C++], then you might wish to make most of the groups non-capturing with the (?:…) syntax. In the extreme case of making no captures at all, your string would become

      “^(?:(?:(?:(?:(?:0[1-9]|1[0-9]|2[0-8])/(?:02))|(?:(?:0[1-9]|[12][0-9]|30)/(?:04|06|09|11))|(?:(?:0[1-9]|[12][0-9]|3[01])/(?:01|03|05|07|08|10|12)))/(?:[0-9]{3}[1-9]|[0-9]{2}[1-9][0-9]|[0-9][1-9][0-9]{2}|[1-9][0-9]{3}))|(?:(?:29/02)/(?:(?:[0-9]{2}(?:[02468][48]|[13579][26]|[2468]0))|(?:(?:[02468][48]|[13579][26]|[2468]0)00))))$”

      (The outermost group makes ^ and $ apply to both alternatives: | has the lowest precedence, so ^A|B$ would mean (^A)|(B$).)

      Capturing groups with the above is awkward, due to the rather unintuitive nesting. For example, the day of the month would appear in different captured groups, depending upon which month it is. (Groups are numbered in the order of their opening parentheses, from left to right.)

      I would recommend using a separate, simpler regex for that, once a match has been confirmed:
      “^([0-9]{2})/([0-9]{2})/([0-9]{4})$”
      or with named groups (Python’s (?P<name>…) syntax):
      “^(?P<day>[0-9]{2})/(?P<month>[0-9]{2})/(?P<year>[0-9]{4})$”

      I hope that helps.

  2. Rob Says:

    See also http://stackoverflow.com/questions/20448/what-is-the-most-brilliant-regex-youve-ever-used/ for more regex magic.

  3. Rob Says:

    By the way, I recommend The Regex Coach for experimentation with regular expressions. I also have another post about free software.
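
Rob’s two-step suggestion in the reply to the first comment (validate with the big regex, then extract the fields with a simple one) might look like this in Python. This is a sketch: the validation pattern is essentially the no-capture regex from that reply with ^ and $ dropped, because re.fullmatch anchors the whole pattern at both ends; VALID, FIELDS and parse_date are illustrative names. The last two lines show why anchors around an alternation need an enclosing group:

```python
import re

# Essentially the no-capture validation regex from the reply, minus ^ and $:
# re.fullmatch anchors the whole pattern, alternation included.
VALID = re.compile(
    r"(?:(?:(?:(?:0[1-9]|1[0-9]|2[0-8])/(?:02))"
    r"|(?:(?:0[1-9]|[12][0-9]|30)/(?:04|06|09|11))"
    r"|(?:(?:0[1-9]|[12][0-9]|3[01])/(?:01|03|05|07|08|10|12)))"
    r"/(?:[0-9]{3}[1-9]|[0-9]{2}[1-9][0-9]|[0-9][1-9][0-9]{2}|[1-9][0-9]{3}))"
    r"|(?:(?:29/02)/(?:(?:[0-9]{2}(?:[02468][48]|[13579][26]|[2468]0))"
    r"|(?:(?:[02468][48]|[13579][26]|[2468]0)00)))"
)

# Step 2: a simple regex with named groups, used only after validation.
FIELDS = re.compile(r"(?P<day>[0-9]{2})/(?P<month>[0-9]{2})/(?P<year>[0-9]{4})")


def parse_date(text: str):
    """Return (day, month, year) as ints for a valid date, else None."""
    if VALID.fullmatch(text) is None:  # step 1: validate
        return None
    m = FIELDS.fullmatch(text)         # step 2: extract
    return int(m["day"]), int(m["month"]), int(m["year"])


print(parse_date("29/02/2000"))  # (29, 2, 2000)
print(parse_date("29/02/1900"))  # None

# The anchor pitfall: | binds more loosely than ^ and $,
# so ^a|b$ means (^a)|(b$) and happily matches just a prefix.
print(bool(re.match(r"^a|b$", "a-and-more")))      # True
print(bool(re.match(r"^(?:a|b)$", "a-and-more")))  # False
```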
