Regex 101 Exercise S4 - Extract load average from a string - Discussion

18th Nov 2005, 20:30 GMT

Exercise S4 - Extract load average from a string

The shop that you work with has a server that writes a log entry every hour in the following format:

    8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13

You need to write a utility that lets you track the load average on an hourly basis. Write a regex that extracts the time and the load average from the string.

****

This is pretty close to the first thing I ever did with regular expressions. I had some logfile information I needed to process. I started writing in C++, and if you've ever tried to do lots of character manipulation in C++, you know how much fun that can be.

For this sort of thing, I like to look for good delimiters. To get the time, I'll use "up" as the delimiter, which means I can match with:

    .+\s*up

The \s is something new: it means "any whitespace character". I next need to pull out the load average. I'll use "load average:" as the delimiter, so the regex to pull that out is:

    load average:\s*[0-9.]+

and I can string them together to get:

    .+\s*up                     # match time
    .+?                         # skip middle section
    load\ average:\s*[0-9.]+    # match load average

I added the middle clause to skip the characters in the middle that I don't care about. I also split the regex across multiple lines with comments, which means that I need to use RegexOptions.IgnorePatternWhitespace, and that required me to change "load average" to "load\ average" so that the regex engine wouldn't ignore the space (after I stared at it for a minute, wondering why it wasn't working...).

If I run this in regex workbench, it will report:

    0 => 8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13

That tells me that the match worked, but not much else. What I need is a way to extract certain parts of the string, which is done with a "capture" in the regex language. The simplest form of a capture is done by enclosing part of the regex in parentheses:

    (.+)\s*up                     # match time
    .+?                           # skip middle section
    load\ average:\s*([0-9.]+)    # match load average

Executing that gives:

    0 => 8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13
    1 => 8:00 am
    2 => 0.13

The first capture (index 0) is always the entire match, and then subsequent captures correspond to the portions of the match enclosed in parentheses. In code, if I wanted to pull the time out, I would write something like:

    string time = match.Groups[1].Value;

That works fine. I could declare victory, but I don't really like the "Groups[1]" part - it doesn't tell me much. Nicely, the .NET regex variant provides (as do some others) a way to name captures. That allows me to write:

    (?<Time>.+)\s*up                             # match time
    .+?                                          # skip middle section
    load\ average:\s*(?<LoadAverage>[0-9.]+)     # match load average

Running that gives me:

    0 => 8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13
    Time => 8:00 am
    LoadAverage => 0.13

and I could now write code that looks like:

    string time = match.Groups["Time"].Value;

which is very clear - clear enough that I often will not bother with the local variable.

That gets us to where I wanted to get. You may have noticed that I didn't try to validate the time, nor did I use anchors for the beginning and end of the string. In this example, I'm dealing with well-formed text - the server log is always going to look the way that it does - and it's not worth the effort or complexity to do more than what I did.
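
To tie the steps together, here is a minimal sketch of how the named-capture pattern might be used from code. It assumes a C# compiler with top-level statements (C# 9 or later, so newer than this post), and the variable names line, pattern, and loadAverage are only illustrative. One detail worth knowing: because .+ is greedy and \s* is allowed to match nothing, the Time group also picks up the space before "up", so the sketch trims it.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample log line from the exercise.
string line = "8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13";

// The named-capture pattern from the discussion. IgnorePatternWhitespace lets
// it span several lines and carry # comments, which is why the literal space
// in "load average" has to be escaped.
string pattern = @"
    (?<Time>.+)\s*up                             # match time
    .+?                                          # skip middle section
    load\ average:\s*(?<LoadAverage>[0-9.]+)     # match load average
";

Match match = Regex.Match(line, pattern, RegexOptions.IgnorePatternWhitespace);

// The greedy .+ also swallows the space before "up", so trim the time.
string time = match.Success ? match.Groups["Time"].Value.Trim() : "";
string loadAverage = match.Success ? match.Groups["LoadAverage"].Value : "";

Console.WriteLine($"{time} -> {loadAverage}");   // prints "8:00 am -> 0.13"
```

If the trailing space bothers you, writing the first clause as (?<Time>.+?)\s*up (a lazy .+?) should let \s* absorb it instead, at the cost of one more piece of syntax to explain.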

View full story at blogs.msdn.com
