It's been a while since I blogged about coding! Today, with help and suggestions from Michael Earls, I came up with a little tidbit that many might find useful.
The task: parsing a string for a set of words to be omitted. This could be for filtering language or for optimizing a sentence for searching. We'll go the language route and omit names I am frequently called.
Sounds like a job for regular expressions, doesn't it?
This will be implemented in an ASP.NET application context, so adjust if necessary.
First we need a list of words. These could come from a table, a file, etc. We'll get them from web.config.
<add key="RestrictedWordsList" value="jerk,fool,punk,dumbass,loser" />
Now, let's get this out of web.config, split it on the comma into an array, then format it into a regular expression (regular expression rules are outside the scope of this article).
string[] sWords = ConfigurationSettings.AppSettings["RestrictedWordsList"].Split(new char[]{','});
string sRegExpression = string.Format("({0})",string.Join(")|(",sWords));
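One thing to be aware of: a pattern built this way matches the listed words anywhere, including inside longer words, and it treats any regex metacharacters in the list as pattern syntax. If that matters for your list, a variation might look like the sketch below. The class and method names (RestrictedWordPattern, Build) are made up for illustration and are not part of the code in this post; Regex.Escape and the \b word-boundary anchor are standard .NET regex features.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RestrictedWordPattern
{
    // Hypothetical helper (not from the original post): turns a
    // comma-separated word list into a whole-word alternation pattern.
    public static string Build(string commaSeparatedWords)
    {
        List<string> escaped = new List<string>();
        foreach (string word in commaSeparatedWords.Split(','))
        {
            string trimmed = word.Trim();
            // Skip empty entries such as the gap in "a,,b".
            if (trimmed.Length == 0) continue;
            // Escape regex metacharacters so each entry is matched literally.
            escaped.Add(Regex.Escape(trimmed));
        }
        // \b on both sides restricts matches to whole words.
        return @"\b(" + string.Join("|", escaped.ToArray()) + @")\b";
    }
}
```

With the list from web.config above, Build returns \b(jerk|fool|punk|dumbass|loser)\b, which removes "fool" but leaves "foolish" alone.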
Create a regular expression object, passing the IgnoreCase and Compiled options, and store it in Application state.
Regex regex = new Regex(sRegExpression,RegexOptions.Compiled | RegexOptions.IgnoreCase);
Application["RestrictedWordsList"] = regex;
It is not clear whether Context.Cache costs more overhead than Application state. It seems intuitive that change notification must require at least a slight amount of additional overhead. Our value comes from web.config, and changes to web.config restart the application, so the stored Regex gets rebuilt anyway and no change notification is needed. We'll use Application state rather than Cache.Insert() today.
Now that we have our Regex object for all to share, simply return the result of the replacement.
return regex.Replace(textToParse, string.Empty);
Here is the whole thing:
private string ParseStringForRestrictedWords(string textToParse)
{
    //guard against null as well as empty input
    if(textToParse==null || textToParse.Length<1) return string.Empty;
    if(ConfigurationSettings.AppSettings["RestrictedWordsList"]==null) return textToParse;
    Regex regex = null;
    string sCacheKey = "VZ.NIWCMM";//names I wouldn’t call my mother
    //get superfluous words from web.config and store in Application state as a compiled regex
    // used application rather than cache because web.config change restarts application.
    // no change notification needed
    if(Application[sCacheKey]==null)
    {
        //split the delimited words into a string array, then format them into a regular expression
        string[] sWords = ConfigurationSettings.AppSettings["RestrictedWordsList"].Split(new char[]{','});
        string sRegExpression = string.Format("({0})",string.Join(")|(",sWords));
        regex = new Regex(sRegExpression,RegexOptions.Compiled | RegexOptions.IgnoreCase);
        Application[sCacheKey] = regex;
    }
    else
    {
        regex = (Regex)Application[sCacheKey];
    }
    //replace the listed words with empty string
    return regex.Replace(textToParse, string.Empty);
}
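If you want to try the same idea outside ASP.NET, something like the following sketch should work in a plain console project. It is only an approximation of the method above: the word list is hard-coded instead of read from web.config, a static field stands in for Application state, and the names RestrictedWordsDemo and Strip are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RestrictedWordsDemo
{
    // Stand-in for the web.config value (assumption: same list as above).
    private const string WordList = "jerk,fool,punk,dumbass,loser";

    // Built once and shared, playing the role of the Application-state entry.
    private static readonly Regex Filter = new Regex(
        string.Format("({0})", string.Join(")|(", WordList.Split(','))),
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    public static string Strip(string textToParse)
    {
        // Null or empty input yields an empty string, as in the method above.
        if (string.IsNullOrEmpty(textToParse)) return string.Empty;
        // Replace every listed word with nothing.
        return Filter.Replace(textToParse, string.Empty);
    }
}
```

For example, RestrictedWordsDemo.Strip("Hey Jerk, nice work") gives "Hey , nice work". Because the pattern has no word boundaries, Strip("foolish") gives "ish", which may or may not be what you want.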
Note: If you care about spaces, you'll have to follow up by collapsing multiple spaces, since each removed word leaves behind the spaces that surrounded it. Just as there are many ways to skin a cat, there are many ways to accomplish this. How would you have done it?
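For the space cleanup, one possible approach is a second regex pass that collapses runs of spaces and trims the ends. This is just a sketch; SpaceCleanup and CollapseSpaces are names I made up here, not part of the method above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpaceCleanup
{
    // Hypothetical follow-up step: tidy the spaces left behind by removed words.
    public static string CollapseSpaces(string text)
    {
        // Replace each run of two or more spaces with one space, then trim the ends.
        return Regex.Replace(text, " {2,}", " ").Trim();
    }
}
```

Removing "jerk" and "fool" from "You jerk are a fool" leaves "You  are a ", and CollapseSpaces turns that into "You are a". It does not fix a space left in front of punctuation, so that case would need its own rule.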