Analyzing a Large Text II

Sorting Parallel Arrays

Your previous program introduced you to text processing. You are now ready to use the computer to help you analyze text files.

As you discovered, printing out a frequency chart of all the words in a book does not present the data in a manner conducive to analysis. The words and their frequencies of occurrence were saved in a pair of arrays and printed in the order in which the words were first encountered in the text. While a great deal of work was saved by having the computer find the words and do the counting, we are confronted with a tedious and time-wasting task if we want to use the data.

Consider these questions:

  1. How many times does a particular word, e.g., rabbit, appear in Alice's Adventures in Wonderland?
  2. What word appears most frequently?
  3. What are the ten most frequently used words?
  4. How many words are used only once?
  5. Can we tell anything about the subject matter of the text by investigating words with a certain range of frequencies?

None of these questions is quickly answered given the way our printout of the data is organized. However, the first four questions can easily be answered if we sort the data, i.e., put it in alphabetical or numerical order. The last question requires analysis and would probably entail comparing information gleaned from using your program on a number of different texts.
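To make the benefit of sorting concrete, here is a minimal sketch of how questions 2 through 4 might be answered once the data is in descending order of frequency. It assumes a Turbo Pascal or Free Pascal dialect (for the string type), and the names PrintTop, CountSingles, WordArray, FreqArray and MaxWords are hypothetical, chosen only for this illustration. The seven words and their counts are taken from the sample output on this page and are entered by hand, already sorted.

```pascal
program SortedQuestionsSketch;

const
  MaxWords = 7;   { tiny size, for illustration only }

type
  WordArray = array[1..MaxWords] of string;
  FreqArray = array[1..MaxWords] of integer;

{ Questions 2 and 3: when the data is in descending order of frequency, }
{ the first entry is the most frequent word and the first HowMany       }
{ entries are the HowMany most frequent words.                          }
procedure PrintTop(var Words : WordArray; var Freqs : FreqArray;
                   Count, HowMany : integer);
var
  i : integer;
begin
  if HowMany > Count then
    HowMany := Count;
  for i := 1 to HowMany do
    writeln(i:4, ' ', Words[i], ' ', Freqs[i])
end;

{ Question 4: count the entries whose frequency is exactly 1. }
function CountSingles(var Freqs : FreqArray; Count : integer) : integer;
var
  i, Singles : integer;
begin
  Singles := 0;
  for i := 1 to Count do
    if Freqs[i] = 1 then
      Singles := Singles + 1;
  CountSingles := Singles
end;

var
  Words : WordArray;
  Freqs : FreqArray;

begin
  { Hand-entered sample, assumed to be sorted already. }
  Words[1] := 'The';     Freqs[1] := 1634;
  Words[2] := 'And';     Freqs[2] := 869;
  Words[3] := 'Alice';   Freqs[3] := 395;
  Words[4] := 'Across';  Freqs[4] := 5;
  Words[5] := 'Absurd';  Freqs[5] := 2;
  Words[6] := 'Abide';   Freqs[6] := 1;
  Words[7] := 'Able';    Freqs[7] := 1;

  PrintTop(Words, Freqs, MaxWords, 3);
  writeln('Words used only once: ', CountSingles(Freqs, MaxWords))
end.
```

The counting loop works on data in any order. In a list sorted by descending frequency, the words used only once are also grouped together at the end, so they can be read straight off the printout.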

This, then, is your task. Give the user two further choices in the menu:

How would you like the data displayed?
1: Order of first occurrence
2: Alphabetical
3: Descending by frequency
Q: Quit
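One way the menu might be coded is sketched below. This is only an outline under stated assumptions: the three Print procedures are stubs standing in for the real ones, the "Goodbye" and "Unrecognized choice" messages are invented for the illustration, and the else branch of the case statement is a Turbo Pascal / Free Pascal extension that other compilers may spell differently.

```pascal
program MenuSketch;

{ Stubs standing in for the real display procedures. }
procedure PrintOrigOrder;
begin
  writeln('(table in order of first occurrence goes here)')
end;

procedure PrintAlphaOrder;
begin
  writeln('(alphabetical table goes here)')
end;

procedure PrintNumOrder;
begin
  writeln('(table in descending order of frequency goes here)')
end;

{ Show the choices once, read one character, and dispatch on it. }
procedure Menu;
var
  Choice : char;
begin
  writeln('How would you like the data displayed?');
  writeln('1: Order of first occurrence');
  writeln('2: Alphabetical');
  writeln('3: Descending by frequency');
  writeln('Q: Quit');
  readln(Choice);
  case UpCase(Choice) of      { UpCase lets the user type q or Q }
    '1' : PrintOrigOrder;
    '2' : PrintAlphaOrder;
    '3' : PrintNumOrder;
    'Q' : writeln('Goodbye.')
  else
    writeln('Unrecognized choice.')
  end
end;

begin
  Menu
end.
```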

You are to write the additional procedures required by the program structure chart in order to print the alphabetical and numerical frequency charts. The subprograms marked with f are to be functions, while the others are to be procedures.

  main
      |Initialize
      |Introduction
      |GetFilename
      |ProcessFile
                  |f WordAlready : integer
      |f LargestFreq : integer
      |f BarRatio : real
      |PrintStats
      |Menu
           |PrintOrigOrder
                          |PrintTable
           |PrintAlphaOrder
                          |SortAlpha
                          |PrintTable
           |PrintNumOrder
                          |SortNumeric
                          |PrintTable

Two of the new procedures require sort routines. Before you attempt these programming tasks, you should read the Sorting page and investigate the Traces of Sorting Algorithms. Choose two different algorithms for the sort routines. Your task is complicated somewhat by the fact that you are using parallel arrays: you will do the usual comparisons, but each swap must be performed on both arrays, not on just one as in the examples, so that every word stays paired with its frequency.
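The sketch below illustrates the parallel-array swap with two different algorithms: a selection sort for the alphabetical order and a bubble sort for the descending numerical order. It is one possible arrangement, not the required solution. It assumes a Turbo Pascal or Free Pascal dialect; the helper names SwapPair, UpperCopy and Show, the type names, and the parameter lists are hypothetical. The alphabetical comparison ignores case, which is an assumption suggested by the sample output, where ADVENTURES is listed next to Adventures.

```pascal
program ParallelSortSketch;

const
  MaxWords = 5;   { tiny size, for illustration only }

type
  WordArray = array[1..MaxWords] of string;
  FreqArray = array[1..MaxWords] of integer;

{ Return an upper-case copy of s so that comparisons can ignore case. }
function UpperCopy(s : string) : string;
var
  i : integer;
begin
  for i := 1 to Length(s) do
    s[i] := UpCase(s[i]);
  UpperCopy := s
end;

{ Exchange entries i and j in BOTH arrays so each word keeps its count. }
procedure SwapPair(var Words : WordArray; var Freqs : FreqArray;
                   i, j : integer);
var
  TempWord : string;
  TempFreq : integer;
begin
  TempWord := Words[i];
  Words[i] := Words[j];
  Words[j] := TempWord;
  TempFreq := Freqs[i];
  Freqs[i] := Freqs[j];
  Freqs[j] := TempFreq
end;

{ Selection sort into ascending alphabetical order, ignoring case. }
procedure SortAlpha(var Words : WordArray; var Freqs : FreqArray;
                    Count : integer);
var
  Pass, Scan, Smallest : integer;
begin
  for Pass := 1 to Count - 1 do
  begin
    Smallest := Pass;
    for Scan := Pass + 1 to Count do
      if UpperCopy(Words[Scan]) < UpperCopy(Words[Smallest]) then
        Smallest := Scan;
    if Smallest <> Pass then
      SwapPair(Words, Freqs, Pass, Smallest)
  end
end;

{ Bubble sort into descending order of frequency. }
procedure SortNumeric(var Words : WordArray; var Freqs : FreqArray;
                      Count : integer);
var
  Pass, i : integer;
begin
  for Pass := 1 to Count - 1 do
    for i := 1 to Count - Pass do
      if Freqs[i] < Freqs[i + 1] then
        SwapPair(Words, Freqs, i, i + 1)
end;

{ Print each word beside its frequency. }
procedure Show(var Words : WordArray; var Freqs : FreqArray;
               Count : integer);
var
  i : integer;
begin
  for i := 1 to Count do
    writeln(Words[i], ' ', Freqs[i])
end;

var
  Words : WordArray;
  Freqs : FreqArray;

begin
  { Sample words and counts taken from the charts on this page. }
  Words[1] := 'The';    Freqs[1] := 1634;
  Words[2] := 'About';  Freqs[2] := 94;
  Words[3] := 'And';    Freqs[3] := 869;
  Words[4] := 'A';      Freqs[4] := 632;
  Words[5] := 'Alice';  Freqs[5] := 395;

  SortAlpha(Words, Freqs, MaxWords);
  Show(Words, Freqs, MaxWords);      { order: A, About, Alice, And, The }
  writeln;
  SortNumeric(Words, Freqs, MaxWords);
  Show(Words, Freqs, MaxWords)       { order: The, And, A, Alice, About }
end.
```

Putting the double swap in one procedure means neither sort routine can move a word without also moving its count.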

Separate runs of your program should produce the following output.

Alphabetical

This is a frequency chart for the words in alice30.txt,
listed alphabetically.

   1 0                   :                                                   1
   2 3                   :                                                   1
   3 A                   :*******************                              632
   4 Abide               :                                                   1
   5 Able                :                                                   1
   6 About               :***                                               94
   7 Above               :                                                   3
   8 Absence             :                                                   1
   9 Absurd              :                                                   2
  10 Acceptance          :                                                   1
  11 Accident            :                                                   2
  12 Accidentally        :                                                   1
  13 Account             :                                                   1
  14 Accounting          :                                                   1
  15 Accounts            :                                                   1
  16 Accusation          :                                                   1
  17 Accustomed          :                                                   1
  18 Ache                :                                                   1
  19 Across              :                                                   5
  20 Act                 :                                                   1
  21 Actually            :                                                   1
  22 Ada                 :                                                   1
  23 Added               :*                                                 23
  24 Adding              :                                                   1
  25 Addressed           :                                                   2
  26 Addressing          :                                                   1
  27 Adjourn             :                                                   1
  28 Adoption            :                                                   1
  29 Advance             :                                                   3
  30 Advantage           :                                                   3
  31 Adventures          :                                                   6
  32 ADVENTURES          :                                                   1
  33 Advice              :                                                   2
  34 Advisable           :                                                   2
  35 Advise              :                                                   1
  36 Affair              :                                                   1
  37 Affectionately      :                                                   1
  38 Afford              :                                                   1
  39 Afore               :                                                   1
  40 Afraid              :                                                  12
  41 After               :*                                                 43
  42 Afterwards          :                                                   2
  43 Again               :**                                                83
  44 Against             :                                                   9
  45 Age                 :                                                   4
Do you want to continue? (Y/N)

Descending by frequency

This is a frequency chart for the words in alice30.txt,
listed in decreasing order of frequency.

   1 The                 :*********************************************** 1634
   2 And                 :**************************                       869
   3 To                  :*********************                            726
   4 A                   :*******************                              632
   5 It                  :*****************                                591
   6 She                 :****************                                 548
   7 I                   :****************                                 543
   8 Of                  :***************                                  511
   9 Said                :**************                                   460
  10 You                 :************                                     396
  11 Alice               :************                                     395
  12 In                  :***********                                      367
  13 Was                 :**********                                       353
  14 That                :*********                                        302
  15 As                  :********                                         263
  16 Her                 :*******                                          246
  17 T                   :******                                           218
  18 At                  :******                                           211
  19 S                   :******                                           202
  20 On                  :******                                           193
  21 With                :*****                                            179
  22 All                 :*****                                            178
  23 Had                 :*****                                            177
  24 But                 :*****                                            170
  25 For                 :****                                             153
  26 So                  :****                                             151
  27 They                :****                                             150
  28 Be                  :****                                             147
  29 Not                 :****                                             138
  30 What                :****                                             135
  31 Very                :****                                             131
  32 This                :****                                             130
  33 Little              :****                                             126
  34 He                  :****                                             121
  35 Out                 :***                                              116
  36 Down                :***                                              102
  37 One                 :***                                              100
  38 Is                  :***                                              100
  39 Up                  :***                                              100
  40 There               :***                                               98
  41 His                 :***                                               95
  42 About               :***                                               94
  43 If                  :***                                               94
  44 Then                :***                                               93
  45 No                  :***                                               89
Do you want to continue? (Y/N)

It is not necessary to have a loop in the menu procedure. In fact, once the user has requested a sort, the parallel arrays have been rearranged and the order of first occurrence is lost, so it is not possible to return to that listing. This limitation is addressed in Sorting an Array of Records.


© DFStermole 2002-2005
Created 15 Mar 02
Modified 7 July 05