Update: Part 2 of this article can be found here.
Using spreadsheets to analyze numerical or well-categorized data is relatively straightforward. It might not always be easy, but at least you normally know exactly what to do. If you have ever been faced with open-ended text responses, perhaps from a survey, emailed questions or feedback forms, you know how tricky they can be to make sense of.
The problems are many: non-standard formatting, variable length, and the need to manually read each response to understand its content, and those are just the first that come to mind.
What we need is a way to drill down automatically and spot common patterns, giving us an immediate starting point for interpreting the responses.
A Potential Solution
Say we had a spreadsheet containing all the website feedback from the last few weeks.
What if we could count occurrences of certain words or phrases? We could then see if there were issues most prevalent on the minds of our customers.
First of all we need to make a list of all the words contained in the feedback. We can do this using a hash table, or "Dictionary". This is like an array, but instead of the items being held against a number, each item is indexed by a keyword. Using this system we can store a word, such as "Data", and then count how many times that word occurs.
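As a minimal sketch, here is how a `Scripting.Dictionary` counts occurrences of a word (the word "Data" is just an illustrative value):

```vba
Sub DictionaryDemo()
    Dim dict As Object
    Set dict = CreateObject("Scripting.Dictionary")

    dict.Add "Data", 1                          'store the word "Data" with a count of 1
    dict.Item("Data") = dict.Item("Data") + 1   'seen again, so increment the count

    Debug.Print dict.Item("Data")               'prints 2
End Sub
```

The word itself is the key; the item held against it is the running count.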
By "Splitting" the text in each cell wherever a space is found, we can crudely grab all the words used. Obviously this is not a pure approach; there will be badly formatted responses and other delimiters that would need to be cleaned. For our purposes, though, it will work well enough.
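For example, VBA's `Split` function turns a sentence into a zero-based array of words (the sample sentence here is made up):

```vba
Dim vArray As Variant
vArray = Split("export my data to Excel", " ")
'vArray now holds: "export", "my", "data", "to", "Excel"
Debug.Print vArray(2)   'prints "data"
```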
As you can see, while this approach is encouraging in that it worked as intended, the results are not yet useful. We need to exclude superfluous words and get to the “meat”. The traditional approach to this kind of indexing issue is to compile a “stop list” of words and phrases that can be safely ignored.
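A sketch of the stop-list check, assuming the ignorable words sit in column A of a sheet named "Stoplist" (the variable name `sWord` is illustrative):

```vba
'Only count the word if it does NOT appear on the stop list
If Application.WorksheetFunction.CountIf( _
        Sheets("Stoplist").Range("A:A"), sWord) = 0 Then
    'word is not on the stop list, so include it in the count
End If
```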
Armed with this list we ought to be able to now generate a clean set of data from which we can begin to draw conclusions.
So, for a full solution we need a macro that works through the response text in our first data sheet and builds a list of words, ignoring any word we have deemed irrelevant by listing it on a "Stop List" sheet. For each remaining word, the macro counts how many times it appears. The words and their counts go into another sheet, which we will call the "Issue List", sorted in descending order of count so we can clearly see which issues are most important to our customers.
Here is what our data set looked like when we ran the macro against a set of feedback. We had to run it a couple of times to build up our specific stop list. I imagine this will be a common theme when you try it yourself.
It is probably not surprising that Data and Excel score highly in this particular result, but I am intrigued by the issues around the Security responses.
The Finished Word Counting Macro
Here is the completed macro so you can use it in your own projects.
Sub WordCount()
    Dim vArray As Variant
    Dim lngLoop As Long, lngLastRow As Long
    Dim rngCell As Range

    'the dictionary holds each word as a key, with its running count as the item
    With CreateObject("Scripting.Dictionary")
        For Each rngCell In Worksheets("Sheet1").Range("A1", _
                Worksheets("Sheet1").Cells(Rows.Count, "A").End(xlUp))
            vArray = Split(rngCell.Value, " ")
            For lngLoop = LBound(vArray) To UBound(vArray)
                'skip any word that appears on the stop list
                If Application.WorksheetFunction.CountIf( _
                        Sheets("Stoplist").Range("A1:A" & Sheets("Stoplist").UsedRange.Rows.Count), _
                        vArray(lngLoop)) = 0 Then
                    If Not .exists(vArray(lngLoop)) Then
                        .Add vArray(lngLoop), 1 'first occurrence: add the word with a count of 1
                    Else
                        .Item(vArray(lngLoop)) = .Item(vArray(lngLoop)) + 1 'seen before: increment the count
                    End If
                End If
            Next lngLoop
        Next rngCell
        Worksheets("Issue").Range("A2").Resize(.Count).Value = Application.Transpose(.keys)  'write the words to column A
        Worksheets("Issue").Range("B2").Resize(.Count).Value = Application.Transpose(.items) 'write the counts to column B
    End With

    'sort the Issue sheet by count, descending
    lngLastRow = Worksheets("Issue").Cells(Rows.Count, 1).End(xlUp).Row
    With Worksheets("Issue").Sort
        .SortFields.Clear
        .SortFields.Add Key:=Worksheets("Issue").Range("B2:B" & lngLastRow), _
            SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal
        .SetRange Worksheets("Issue").Range("A1:B" & lngLastRow)
        .Header = xlYes
        .MatchCase = False
        .Orientation = xlTopToBottom
        .SortMethod = xlPinYin
        .Apply
    End With

    Worksheets("Issue").Activate
    Cells(2, 1).Select
End Sub
By no means am I saying this is a silver-bullet solution. You still must read the responses and deal with them appropriately. What you gain by doing this is an overview of the kinds of issues your customers are talking about, and perhaps a way to “score” your feedback.
Perhaps you could spot trends over time, or overlay "sentiment" by analyzing the balance of positive and negative words and the correlations between them? I wonder if it is possible to turn this data into one of those pretty keyword charts (tag clouds) used on Web 2.0 sites?
How might you use an approach like this in your work? We would love to know your ideas…
About the author
Yoav Ezer co-authors the technology and productivity blog Codswallop. He is also the CEO of a company that produces PDF to Excel conversion software.