These are my notes on Chapter 4 of the book Beginning Ruby: From Novice to Professional, a book I highly recommend for complete beginners. I found it via A Path to Learn Rails 4 properly.
Description
This code will read in text supplied in a separate file, analyze it for various patterns and statistics, and print out the results for the user.
Required Basic Features
- Character count
- Character count (excluding spaces)
- Line count
- Word count
- Sentence count
- Paragraph count
- Average number of words per sentence
- Average number of sentences per paragraph
Building the Basic Application
Let’s outline the basic steps as follows:
- Obtain some dummy text
- Load in a file containing the text or document you want to analyze.
- As you load the file line by line, keep a count of how many lines there were.
- Put the text into a string and measure its length to get your character count.
- Temporarily remove all whitespace and measure the length of the resulting string to get the character count excluding spaces.
- Split out all the whitespace to find out how many words there are.
- Split on full stops to find out how many sentences there are.
- Split on double newlines to find out how many paragraphs there are.
- Perform calculations to work out the averages.
Create a new, blank Ruby source file and save it as analyzer.rb in your Ruby folder. As you work through the next few sections, you’ll be able to fill it out.
1. Obtain some dummy text
The dummy text file must be in the same folder as analyzer.rb. Name it text.txt.
2. Load in a file containing the text
The file name is taken from the command line via ARGV[0] or ARGV.first (both mean exactly the same thing: the first element of the ARGV array). File.readlines then returns the contents of that file as an array of lines.
lines = File.readlines(ARGV[0])
To process text.txt, run the script like this:
ruby analyzer.rb text.txt
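If the script is run without a file name, ARGV[0] is nil and File.readlines raises an error. Here is a minimal sketch of a more defensive loader. The load_lines helper is hypothetical (it is not in the book), and the temporary file only stands in for text.txt.

```ruby
require "tempfile"

# Hypothetical helper (not from the book): returns the file's lines as an
# array, or nil when no path was given or the path is not a regular file.
def load_lines(path)
  return nil if path.nil? || !File.file?(path)
  File.readlines(path)
end

# Demonstrate with a throwaway file instead of a real text.txt.
Tempfile.create("sample") do |file|
  file.write("first line\nsecond line\n")
  file.flush
  p load_lines(file.path)   # prints ["first line\n", "second line\n"]
end

p load_lines(nil)           # prints nil
```

In analyzer.rb itself you could call load_lines(ARGV[0]) and print a usage message when it returns nil.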
3. Count how many lines are in the file.
line_count = lines.size
4. Put the text into a string and measure its length.
The join method joins the array of lines back into a single string, and length then gives the character count.
text = lines.join
character_count = text.length
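To see steps 3 and 4 on concrete data, here is a small sketch. The sample array and the line_and_character_counts helper are made up for illustration. The array mimics what File.readlines returns, where each line keeps its trailing newline.

```ruby
# Hypothetical helper (not from the book): returns [line count, character count].
def line_and_character_counts(lines)
  text = lines.join            # e.g. "Hello\nWorld\n"
  [lines.size, text.length]    # newlines count as characters too
end

# Sample data standing in for File.readlines(ARGV[0]).
lines = ["Hello\n", "World\n"]

line_count, character_count = line_and_character_counts(lines)
puts "#{line_count} lines, #{character_count} characters"
# prints "2 lines, 12 characters"
```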
5. Temporarily remove all whitespace and measure the length
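The gsub method, called as string.gsub(pattern, replacement), returns a copy of the string in which every part that matches the regular expression pattern is replaced with replacement. The original string is not modified, which is why the whitespace is only removed "temporarily".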
character_count_nospaces = text.gsub(/\s+/, '').length
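A quick sketch of this step on a sample string. The count_non_whitespace helper name is my own, not the book's. Note that \s matches tabs and newlines as well as spaces.

```ruby
# Hypothetical helper (not from the book): counts characters that are not
# whitespace by deleting every run of whitespace from a copy of the text.
def count_non_whitespace(text)
  text.gsub(/\s+/, '').length
end

text = "Hello, big world!\n"

puts text.length                  # prints 18
puts count_non_whitespace(text)   # prints 15
p text                            # prints "Hello, big world!\n" (unchanged)
```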
6. Find out how many words there are.
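Called with no arguments, the split method splits a string on runs of whitespace and returns an array of the pieces. It can also split on a specific character, a fixed sequence of characters, or a regular expression.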
word_count = text.split.length
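The difference between splitting with no argument and splitting on a pattern is easy to miss, so here is a small sketch with made-up sample strings. The count_words helper is hypothetical.

```ruby
# Hypothetical helper (not from the book): counts whitespace-separated words.
def count_words(text)
  text.split.length
end

text = "  one two\n\nthree  "

# No argument: leading/trailing whitespace is ignored and runs of
# whitespace (spaces, tabs, newlines) act as a single separator.
p text.split          # prints ["one", "two", "three"]
puts count_words(text) # prints 3

# A regular expression for one space treats every space as a separator,
# so two spaces in a row produce an empty string between them.
p "a  b".split(/ /)   # prints ["a", "", "b"]
```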
7. Split on full stops to find out how many sentences there are.
sentence_count = text.split(/\.|\?|!/).length
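One caveat: this pattern splits at every single punctuation mark, so an ellipsis or "?!" produces empty pieces that are counted as sentences (split only drops empty strings at the very end). The sketch below shows the effect and one possible alternative. The count_sentences helper is my own assumption, not the book's code.

```ruby
# Hypothetical alternative (not from the book): treat a run of ., ? or !
# as one sentence break and ignore blank pieces.
def count_sentences(text)
  text.split(/[.?!]+/).reject { |piece| piece.strip.empty? }.length
end

sample = "Wait... what?!"

# The pattern used in analyzer.rb: each "." is its own separator.
p sample.split(/\.|\?|!/)          # prints ["Wait", "", "", " what"]
puts sample.split(/\.|\?|!/).length # prints 4

# The alternative counts the two visible sentences.
puts count_sentences(sample)        # prints 2
```

For ordinary text with single full stops both approaches give the same count, so the book's simpler version is fine for the exercise.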
8. Split on double newlines to find out how many paragraphs there are
paragraph_count = text.split(/\n\n/).length
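A similar caveat applies here: if paragraphs are separated by more than one blank line, splitting on exactly two newlines yields extra empty pieces. A minimal sketch with made-up sample text follows. The count_paragraphs helper is hypothetical and only one way to handle this.

```ruby
# Hypothetical alternative (not from the book): treat two or more newlines
# in a row as a single paragraph break and ignore empty pieces.
def count_paragraphs(text)
  text.split(/\n{2,}/).reject(&:empty?).length
end

normal = "Para one.\n\nPara two.\nstill two.\n\nPara three.\n"
puts normal.split(/\n\n/).length   # prints 3
puts count_paragraphs(normal)      # prints 3

# Two blank lines between paragraphs: four newlines in a row.
spaced = "A\n\n\n\nB"
p spaced.split(/\n\n/)             # prints ["A", "", "B"]
puts count_paragraphs(spaced)      # prints 2
```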
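9. Perform calculations to work out the averages.
The averages are plain divisions of the counts collected so far. The complete program computes them inline when it prints the results:
puts "#{sentence_count / paragraph_count} sentences per paragraph (average)"
puts "#{word_count / sentence_count} words per sentence (average)"
The program also works out what percentage of the words are not stop words (common "fluff" words such as "the" or "and"). The stopwords array is defined at the top of the complete program:
stopwords = %w{the a by on for of are with just but and to the my I has some in}
all_words = text.scan(/\w+/)
good_words = all_words.select{ |word| !stopwords.include?(word) }
good_percentage = ((good_words.length.to_f / all_words.length.to_f) * 100).to_i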
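Since the counts are integers, dividing them with / in Ruby discards the fractional part, and dividing by a count of zero raises ZeroDivisionError. The sketch below shows one way to get a decimal average safely. The average helper and the sample numbers are assumptions of mine, not part of the book's program.

```ruby
# Hypothetical helper (not from the book): average rounded to one decimal
# place, or 0.0 when there is nothing to divide by.
def average(total, count)
  return 0.0 if count.zero?
  (total.to_f / count).round(1)
end

word_count = 7       # made-up sample values
sentence_count = 2

puts word_count / sentence_count          # prints 3 (integer division)
puts average(word_count, sentence_count)  # prints 3.5
puts average(word_count, 0)               # prints 0.0
```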
Code
# analyzer.rb -- Text Analyzer
stopwords = %w{the a by on for of are with just but and to the my I has some in}
lines = File.readlines(ARGV[0])
line_count = lines.size
text = lines.join

# Count the characters
character_count = text.length
character_count_nospaces = text.gsub(/\s+/, '').length

# Count the words, sentences, and paragraphs
word_count = text.split.length
sentence_count = text.split(/\.|\?|!/).length
paragraph_count = text.split(/\n\n/).length

# Make a list of words in the text that aren't stop words,
# count them, and work out the percentage of non-stop words
# against all words
all_words = text.scan(/\w+/)
good_words = all_words.select{ |word| !stopwords.include?(word) }
good_percentage = ((good_words.length.to_f / all_words.length.to_f) * 100).to_i

# Summarize the text by cherry picking some choice sentences
sentences = text.gsub(/\s+/, ' ').strip.split(/\.|\?|!/)
sentences_sorted = sentences.sort_by { |sentence| sentence.length }
one_third = sentences_sorted.length / 3
ideal_sentences = sentences_sorted.slice(one_third, one_third + 1)
ideal_sentences = ideal_sentences.select { |sentence| sentence =~ /is|are/ }

# Give the analysis back to the user
puts "#{line_count} lines"
puts "#{character_count} characters"
puts "#{character_count_nospaces} characters (excluding spaces)"
puts "#{word_count} words"
puts "#{sentence_count} sentences"
puts "#{paragraph_count} paragraphs"
puts "#{sentence_count / paragraph_count} sentences per paragraph (average)"
puts "#{word_count / sentence_count} words per sentence (average)"
puts "#{good_percentage}% of words are non-fluff words"
puts "Summary:\n\n" + ideal_sentences.join(". ")
puts "-- End of analysis"
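The summary part of the listing is not covered in the steps above, so here is how I read it, as a sketch on made-up sample text. It sorts the sentences by length, takes a slice starting one third of the way in (so the shortest ones are skipped), and keeps the sentences containing "is" or "are". The summarize wrapper and the sample text are mine. The logic inside is the same as in the listing, plus a strip to tidy the leading spaces.

```ruby
# Hypothetical wrapper (not from the book) around the listing's summary logic.
def summarize(text)
  sentences = text.gsub(/\s+/, ' ').strip.split(/\.|\?|!/)
  sentences_sorted = sentences.sort_by { |sentence| sentence.length }
  one_third = sentences_sorted.length / 3
  # slice(start, length): start at index one_third, take one_third + 1 items
  ideal = sentences_sorted.slice(one_third, one_third + 1)
  ideal.select { |sentence| sentence =~ /is|are/ }.map(&:strip)
end

sample = "Ruby is fun. I like it. Programming languages are tools for thought. " \
         "Yes. The weather is nice today. Cats are great."

p summarize(sample)
# prints ["Ruby is fun", "Cats are great", "The weather is nice today"]
```

Keep in mind that /is|are/ matches anywhere in a sentence, so words like "this" or "care" also count as matches.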