Character Frequency Graph Problem II
To analyze language, you need a reasonably large sample. In the first Character Frequency Graph Problem, you hard-coded the text in an array. To keep data entry from becoming overly onerous, we used a very short text, O Canada. In this follow-up problem, you will read the text from a file, one line at a time.
Because the program holds only one line in memory at any moment, it can process very large texts.
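Below is a minimal sketch of the line-at-a-time reading loop, written in Turbo/Free Pascal style. The function name CountLines and the file name 'alice.txt' are hypothetical; your own program would get the name from the user instead. The sketch uses the standard Assign, Reset, Readln, Eof and Close routines. It also turns off automatic I/O checking with {$I-}, so a missing file returns -1 instead of halting the program. With the default short string type, any line longer than 255 characters would be truncated. That should not matter for a typical Gutenberg text.

```pascal
program CountLinesDemo;
{$I-}  { turn off automatic I/O error checking so we can test IOResult ourselves }

{ Read a text file one line at a time and return how many lines it has.
  Returns -1 if the file cannot be opened. }
function CountLines(const fileName: string): longint;
var
  inFile: Text;
  line  : string;   { only the current line is held in memory }
  count : longint;
begin
  Assign(inFile, fileName);
  Reset(inFile);
  if IOResult <> 0 then
  begin
    CountLines := -1;   { file missing or unreadable }
    Exit
  end;
  count := 0;
  while not Eof(inFile) do
  begin
    Readln(inFile, line);   { the next line replaces the previous one }
    count := count + 1
  end;
  Close(inFile);
  CountLines := count
end;

begin
  { 'alice.txt' is a hypothetical name: use whatever name you saved the file under }
  Writeln('Lines read: ', CountLines('alice.txt'))
end.
```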
In this problem, you will tally the frequencies of the letters in Lewis Carroll's Alice's Adventures in Wonderland. The file, which you can download using this link, was originally obtained from Project Gutenberg, a repository of over 20,000 free books. The file is 153 KB long.

Use the following pseudocode to guide your program development.
- Initialize frequency array
- Introduce program
- Get file name from user
- Process file
- Find largest frequency
- Determine ratio to be used for bar graph
- Print stats
- Print graph
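One possible shape for the "Initialize frequency array", "Process file" and "Find largest frequency" steps is sketched below. It is an illustration, not the intended solution. The procedure names InitFrequencies, TallyLine and LargestFrequency, and the sample line, are illustrative assumptions. The counters are longint because integer is only 16 bits in Turbo Pascal and in Free Pascal's default mode. A 16-bit counter would overflow on a text with more than 32,767 occurrences of a single letter.

```pascal
program TallyDemo;

type
  FreqArray = array['A'..'Z'] of longint;  { one counter per capital letter }

{ Set every counter to zero. }
procedure InitFrequencies(var freq: FreqArray);
var
  ch: char;
begin
  for ch := 'A' to 'Z' do
    freq[ch] := 0
end;

{ Add the letters in one line of text to the tallies. }
procedure TallyLine(const line: string; var freq: FreqArray);
var
  i : integer;
  ch: char;
begin
  for i := 1 to Length(line) do
  begin
    ch := UpCase(line[i]);      { 'a'..'z' become 'A'..'Z' }
    if ch in ['A'..'Z'] then    { skip spaces, digits and punctuation }
      freq[ch] := freq[ch] + 1
  end
end;

{ Return the largest tally in the array. }
function LargestFrequency(const freq: FreqArray): longint;
var
  ch     : char;
  largest: longint;
begin
  largest := 0;
  for ch := 'A' to 'Z' do
    if freq[ch] > largest then
      largest := freq[ch];
  LargestFrequency := largest
end;

var
  freq: FreqArray;
  ch  : char;
begin
  InitFrequencies(freq);
  { In the full program this call sits inside the Readln loop, once per line }
  TallyLine('Alice was beginning to get very tired', freq);
  for ch := 'A' to 'Z' do
    if freq[ch] > 0 then
      Writeln(ch, ': ', freq[ch]);
  Writeln('Largest frequency: ', LargestFrequency(freq))
end.
```

Using the characters themselves as subscripts means freq['E'] is the tally for E. No arithmetic is needed to convert a letter into an array position.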
Notes
- Since you will use each line of text only once, there is no need to store the text in an array; a simple string variable that holds the current line is enough.
- To tally the frequencies of the various letters, set up a 26-element array using capital letters as the subscripts.
- Use the graphing technique that was developed when analyzing O Canada.
- While the much longer text should give a better indication of the actual frequency distribution of letters, it also exposes a difficulty with the program you wrote for O Canada. We can no longer use one asterisk for each occurrence of a letter, because the graph will not fit on the screen. To handle longer texts, you will have to calculate (actually, have your program calculate) a scaling factor that is appropriate for the text being analyzed, for example the number of occurrences each asterisk represents, based on the largest frequency.
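The sketch below shows one way to compute the scaling factor. The largest tally is mapped to a fixed bar width, and every other bar is scaled in proportion to it. The constant MaxBarWidth = 60, the names BarLength and PrintGraph, and the made-up tallies in the main program are all assumptions. Choose a width that suits your screen.

```pascal
program BarGraphDemo;

const
  MaxBarWidth = 60;   { assumed: columns available for the longest bar }

type
  FreqArray = array['A'..'Z'] of longint;

{ Scale a tally so that the largest tally gets MaxBarWidth asterisks. }
function BarLength(count, largest: longint): longint;
begin
  if largest = 0 then
    BarLength := 0      { empty text: nothing to draw }
  else
    BarLength := Round(count / largest * MaxBarWidth)
end;

{ Find the largest tally, report the scale, then draw one bar per letter. }
procedure PrintGraph(const freq: FreqArray);
var
  ch     : char;
  largest: longint;
  i      : longint;
begin
  largest := 0;
  for ch := 'A' to 'Z' do
    if freq[ch] > largest then
      largest := freq[ch];
  if largest > 0 then
    Writeln('Each * stands for about ', largest / MaxBarWidth:0:2, ' occurrences');
  for ch := 'A' to 'Z' do
  begin
    Write(ch, ' ');
    for i := 1 to BarLength(freq[ch], largest) do
      Write('*');
    Writeln
  end
end;

var
  freq: FreqArray;
  ch  : char;
begin
  { Made-up tallies, only to exercise the graph }
  for ch := 'A' to 'Z' do
    freq[ch] := 0;
  freq['E'] := 120;
  freq['T'] := 90;
  freq['Z'] := 4;
  PrintGraph(freq)
end.
```

Mapping the largest tally to the full width guarantees that the graph fits, whatever the length of the text. The trade-off is that a rare letter may round to zero asterisks. For that reason, it is worth printing the actual count beside each bar as part of "Print stats".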
Compare these frequencies with the points on the tiles used in the game Scrabble.
- What surprises do you find?
- Why do you think there are discrepancies?
Compare the two graphs you have generated, one for O Canada and one for Alice's Adventures in Wonderland.
- What surprises do you find?
© DFStermole 2008
Created 25 Mar 08
Modified 30 Mar 08