Postprocessing RegExTractor output for analysis

In order to analyze your data in a chart tool like Excel, you have to work through five steps:
  1. Define your search terms
  2. Search your files with RegExTractor
  3. Create a transformation file
  4. Transform your xml
  5. Import the transformed xml into Excel and analyze your data.
RegExTractor doesn't want to reinvent the wheel. It just closes a gap: it enables you to "convert" a text file (or a part of it) into xml. I've already explained the main principles of steps 1 and 2 in the "Getting Started" tutorials, and that is all RegExTractor does for you.

For the next steps we'd like to use mature tools and technologies instead of inventing new ones. In this post I'd like to show how to go the whole way and get an Excel chart out of your data.

Remember the example log from the "Getting Started" tutorials. As this is a log from one of my applications, I know that every message contains the class name and the method from which the log entry was written.

What I'd like to know now is how often a class or method has been called. We're not talking about a debug session; we're talking about real-life data, about real users who left their traces in our logs. From this we may learn, for example, which functions are used most.

In our search term file we will define a search term similar to the one we've used in the "Getting Started" tutorials. We'd like to extract:
  • Date
  • Time
  • Value between the first square brackets (class name)
  • Value between the second square brackets (method name)
So our search term, written as a regular expression, will look like this:
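
A minimal sketch of an expression with these four match groups, tried out in Python. The sample log line, the class and method names, and the exact pattern are assumptions made for illustration, not the search term from the original example log. The pattern sticks to syntax that Python and .NET regular expressions share.

```python
import re

# Hypothetical log line; the real layout of the example log may differ.
SAMPLE_LINE = "24.12.15 10:15:30 INFO [OrderService] [SubmitOrder] Order stored"

# Four match groups: date, time, first bracketed value, second bracketed value.
SEARCH_TERM = re.compile(
    r"(\d{2}\.\d{2}\.\d{2}) "   # group 1: date, e.g. 24.12.15
    r"(\d{2}:\d{2}:\d{2})"      # group 2: time, e.g. 10:15:30
    r".+?\[(.+?)\]"             # group 3: value in the first square brackets (class name)
    r".*?\[(.+?)\]"             # group 4: value in the second square brackets (method name)
)


def extract_fields(line):
    """Return (date, time, class_name, method_name), or None if the line does not match."""
    match = SEARCH_TERM.search(line)
    return match.groups() if match else None


print(extract_fields(SAMPLE_LINE))
```

The lazy quantifiers (`.+?`, `.*?`) keep each group from swallowing the following bracket pair, which matters when a message contains further square brackets.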

As I mentioned, this is not a tutorial on how to use regular expressions. You should look for such a tutorial to get familiar with them!

The result xml after processing the log files with RegExTractor will look like this:

This is a generic RegExTractor format. You could try to import this xml into Excel, which works in principle. But in the end, this xml is too complex for Excel to satisfy our needs. So we have to simplify our xml, using an xsl transformation:

We transform our RegExTractor format into a simpler xml.

Finally, this xml can be imported into Excel, where we can analyze our data and gain knowledge about our system.
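
The analysis itself boils down to counting entries per class and per method. A minimal Python sketch of that counting, independent of Excel and not part of RegExTractor; the rows and all names in them are made up for illustration:

```python
from collections import Counter

# Hypothetical extracted rows: (date, time, class name, method name).
rows = [
    ("24.12.15", "10:15:30", "OrderService", "SubmitOrder"),
    ("24.12.15", "10:16:02", "OrderService", "SubmitOrder"),
    ("24.12.15", "10:17:45", "OrderService", "CancelOrder"),
    ("24.12.15", "10:18:11", "UserService", "Login"),
]


def count_calls(rows):
    """Count log entries per class and per (class, method) pair."""
    per_class = Counter(class_name for _, _, class_name, _ in rows)
    per_method = Counter((class_name, method) for _, _, class_name, method in rows)
    return per_class, per_method


per_class, per_method = count_calls(rows)

# List the (class, method) pairs, most frequently logged first.
for (class_name, method), calls in per_method.most_common():
    print(f"{class_name}.{method}: {calls}")
```

These counts are the numbers a chart of the most used functions would be built from.
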
Once you've created your search terms and your xsl transformation file, the whole process can be automated to get the data from your logs within seconds.

I will show such an automation scenario in another post.
