The programs you wrote for Problem Set V, even though they utilized dynamic memory allocation, had one major shortcoming; you had to establish an array in which to hold the pointers to the memory allocated. This meant that you had to allow for a maximum number of pointers for the structures that were to be used.
There are many occasions in programming when we as programmers have no idea how much data our programs will be asked to handle or how much RAM the machine our program is running on might have (or how much is still available to us because other programs are running). Linked lists of data allow us to use as little or as much memory as necessary (as long as it is available).
Before attempting this problem, read the Dynamic Memory Allocation page. Then save, compile, and trace through the sample program, making sure you understand how the pointers are used.
For the program below, you are to use data which you find on the Internet. The file must be at least 100K in length. A good place to look is Project Gutenberg, which is where I obtained Alice's Adventures in Wonderland.
Using the following global struct declaration
struct word_s {
int origpos,
char *word,
int freq,
struct word_s *nx; };
and these variable declarations in main()
struct word_s *head_word_ptr = NULL; struct word_s *current_word_ptr = NULL;
you are to write a program which reads in the data from a disk file, stores it in a linked list, and provides at least the following sort of statistical information about the file.
The file being processed is alice30.txt. There are 155741 characters. There are 27346 total words. There are 2704 unique words. The greatest frequency for any word is 1634. The ratio to be used for the bar graph is 0.029.
Your program will then utilize a loop to permit the user to select the method of displaying data, providing a menu such as the following.
How would you like the data displayed? 1: Order of first occurrence 2: Alphabetical 3: Descending by frequency Q: Quit
The structure of your program should be as follows:
main
|initialize
|introduction
|get_filename
|process_file
|word_already
|largest_freq
|bar_ratio
|print_stats
|menu
|print_orig_order
|print_table
|print_alpha_order
|sort_alpha
|print_table
|print_num_order
|sort_numeric
|print_table
The outputs will take the following forms.
This is a frequency chart for the words in alice30.txt, listed in order of first occurrence. 1 ALICE : 3 2 S :****** 202 3 ADVENTURES : 1 4 IN : 2 5 WONDERLAND : 1 6 Lewis : 1 7 Carroll : 1 8 THE : 9 9 MILLENNIUM : 1 10 FULCRUM : 1 11 EDITION : 1 12 3 : 1 13 0 : 1 14 CHAPTER : 12 15 I :**************** 543 16 Down :*** 102 17 The :*********************************************** 1634 18 Rabbit :* 50 19 Hole : 5 20 Alice :************ 395 21 Was :********** 353 22 Beginning : 14 23 To :********************* 726 24 Get :* 46 25 Very :**** 131 26 Tired : 7 27 Of :*************** 511 28 Sitting : 10 29 By :** 58 30 Her :******* 246 31 Sister : 9 32 On :****** 193 33 Bank : 3 34 And :************************** 869 35 Having : 10 36 Nothing :* 34 37 Do :** 81 38 Once :* 34 39 Or :** 77 40 Twice : 5 41 She :**************** 548 42 Had :***** 177 43 Peeped : 3 44 Into :** 67 45 Book : 11 Do you want to continue? (Y/N)
This is a frequency chart for the words in alice30.txt, listed alphabetically. 1 0 : 1 2 3 : 1 3 A :******************* 632 4 Abide : 1 5 Able : 1 6 About :*** 94 7 Above : 3 8 Absence : 1 9 Absurd : 2 10 Acceptance : 1 11 Accident : 2 12 Accidentally : 1 13 Account : 1 14 Accounting : 1 15 Accounts : 1 16 Accusation : 1 17 Accustomed : 1 18 Ache : 1 19 Across : 5 20 Act : 1 21 Actually : 1 22 Ada : 1 23 Added :* 23 24 Adding : 1 25 Addressed : 2 26 Addressing : 1 27 Adjourn : 1 28 Adoption : 1 29 Advance : 3 30 Advantage : 3 31 Adventures : 6 32 ADVENTURES : 1 33 Advice : 2 34 Advisable : 2 35 Advise : 1 36 Affair : 1 37 Affectionately : 1 38 Afford : 1 39 Afore : 1 40 Afraid : 12 41 After :* 43 42 Afterwards : 2 43 Again :** 83 44 Against : 9 45 Age : 4 Do you want to continue? (Y/N)
This is a frequency chart for the words in alice30.txt, listed in decreasing order of frequency. 1 The :*********************************************** 1634 2 And :************************** 869 3 To :********************* 726 4 A :******************* 632 5 It :***************** 591 6 She :**************** 548 7 I :**************** 543 8 Of :*************** 511 9 Said :************** 460 10 You :************ 396 11 Alice :************ 395 12 In :*********** 367 13 Was :********** 353 14 That :********* 302 15 As :******** 263 16 Her :******* 246 17 T :****** 218 18 At :****** 211 19 S :****** 202 20 On :****** 193 21 With :***** 179 22 All :***** 178 23 Had :***** 177 24 But :***** 170 25 For :**** 153 26 So :**** 151 27 They :**** 150 28 Be :**** 147 29 Not :**** 138 30 What :**** 135 31 Very :**** 131 32 This :**** 130 33 Little :**** 126 34 He :**** 121 35 Out :*** 116 36 Down :*** 102 37 One :*** 100 38 Is :*** 100 39 Up :*** 100 40 There :*** 98 41 His :*** 95 42 About :*** 94 43 If :*** 94 44 Then :*** 93 45 No :*** 89 Do you want to continue? (Y/N)
Computers are very useful for sifting through vast amounts of information. However, it still takes the human touch to interpret what has been gleaned from the data.
Using declarations similar to those above, cannibalize your source code for Program I and add the following features:
Compose a report about the similarities/differences between the two files. This 1-2 page submission should utilize statistics and graphs which your program generates.