Sometimes we need to count the words, characters, and so on in a piece of content. Let's say we are creating a Rails app like ghostbloggers, in which users buy and sell content and are charged based on the number of words in an article. This is easy to achieve with the words_counted gem, which gives you the word count, character count, longest word, and more for the content passed as input.

To use it in our Rails app, we just need to add it to our Gemfile.

gem 'words_counted'

After that, install the bundle.

$ bundle install

Then, in your controller, pass the input to WordsCounted.count and keep the result in the @counter_with_stop_words instance variable.

Here the @sentence instance variable stores the input; assign it whatever content you want to analyze.

When you analyze the most frequent word, you will want words such as ‘is’, ‘not’, ‘how’ etc. to be excluded from the analysis. The gem also provides a way to exclude these irrelevant words: pass a space-separated list of them with the exclude keyword. You can also pass a regular expression, a lambda, and so on to describe what to exclude. For more details on exclusion, see https://github.com/abitdodgy/words_counted#excluding-tokens-from-the-tokeniser

@counter = WordsCounted.count(@sentence, exclude: ApplicationHelper::STOP_WORDS)

In this example I have stored all the stop words in the STOP_WORDS constant in ApplicationHelper.
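As a rough illustration of what this exclusion amounts to, the sketch below defines a STOP_WORDS constant as a space-separated string and filters a sentence in plain Ruby, without the gem. The short word list, the sample sentence, and the content_tokens helper are assumptions made for this example; the gem's own tokeniser may split text differently.

```ruby
# Hypothetical stop-word list; the list used in the original app is not shown.
module ApplicationHelper
  STOP_WORDS = 'is not how the a an of and to in'.freeze
end

# Hypothetical helper (not part of words_counted): lowercases the text,
# splits it into word tokens, and drops every token that appears in the
# space-separated exclusion list.
def content_tokens(text, exclude: '')
  excluded = exclude.downcase.split
  text.downcase.scan(/[[:alpha:]']+/).reject { |token| excluded.include?(token) }
end

sentence = 'How is the weather in Pune and how is the food'
tokens = content_tokens(sentence, exclude: ApplicationHelper::STOP_WORDS)
puts tokens.inspect # prints ["weather", "pune", "food"]
```

Only the words that carry meaning survive the filter, which is why the most frequent word is more useful once the stop words are gone.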

So now we have two instance variables: @counter_with_stop_words, which contains data with stop words, and @counter, which contains data without stop words.

Using these instance variables, we can find various things such as the total word count, the most frequent word, the longest word etc. We can also find the keyword density of words in the content. These values can then be displayed from the view file.
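To make these figures concrete, here is a gem-free sketch that computes them by hand for a tiny token list. It is only an approximation of the idea, not the gem's implementation: the sample tokens are made up, density is assumed here to mean each word's share of all tokens as a percentage, and Enumerable#tally needs Ruby 2.7 or later.

```ruby
# Sample tokens, assumed to be lowercased already with stop words removed.
tokens = %w[rails gem counts words rails counts rails]

total = tokens.size                            # total word count
frequency = tokens.tally                       # occurrences of each word
most_frequent = frequency.max_by { |_, n| n }  # [word, count] pair with the highest count
longest = tokens.max_by(&:length)              # first word with the greatest length

# Density: the share of all tokens taken up by each word, as a percentage.
density = frequency.transform_values { |n| (n * 100.0 / total).round(2) }

puts "Total words: #{total}"
puts "Most frequent: #{most_frequent.first} (#{most_frequent.last} times)"
puts "Longest: #{longest}"
puts "Density of 'rails': #{density['rails']}%"
```

In the app itself the counter objects supply these figures, so the view only has to display them.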

Let us assume we need to publish content at a price of $50 per 100 words, rounding up to the next block of 100. Since we know the number of words (233 in this example), we can easily calculate the publishing price with

@price = (@counter_with_stop_words.token_count.to_f / 100).ceil * 50

which evaluates to $150. Below is the snapshot of the output.

[Screenshot: content word count]
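The pricing rule can be checked with plain numbers. The sketch below wraps the same formula in a helper; the publishing_price name and its rate and block_size parameters are made up for this example, and the word count is passed in as an integer instead of being read from the counter.

```ruby
# Hypothetical helper: charges `rate` dollars for every started block
# of `block_size` words, so a partial block costs as much as a full one.
def publishing_price(word_count, rate: 50, block_size: 100)
  (word_count.to_f / block_size).ceil * rate
end

puts publishing_price(233) # prints 150 (2.33 blocks round up to 3)
puts publishing_price(200) # prints 100 (exactly 2 blocks)
puts publishing_price(201) # prints 150 (one extra word starts a third block)
```

The to_f call matters: with integer division, 233 / 100 would already be 2 before ceil runs, and the price would come out as $100 instead of $150.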

Thanks for reading!


References:

abitdodgy/words_counted: A Ruby natural language processor. https://github.com/abitdodgy/words_counted