When literary researchers wish to study an author's style, or grammarians want to investigate current or past usage, they have traditionally turned to books called concordances. A concordance is an alphabetical listing of all the words in a text; for each occurrence, the neighboring context is given. Being a book, a concordance is a static presentation of data, and it has various limitations that result from its original design and production considerations.
Some concordances are sentence-based. This means that even if the word being cited is the first word in the sentence, no words from the previous sentence will be provided. Can you think of anything that is missed using this method?
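To make the definition concrete, here is a minimal sketch of a sentence-based concordance in Python. The function name `sentence_concordance`, the sample text, and the naive rule for splitting sentences (break after `.`, `!` or `?`) are assumptions made for illustration; they are not part of the workshop materials.

```python
import re
from collections import defaultdict

def sentence_concordance(text):
    """Map each lowercased word to the sentences in which it occurs."""
    # Naive sentence split (an assumption): break on whitespace
    # that follows ., ! or ?.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    index = defaultdict(list)
    for sentence in sentences:
        # Keep internal apostrophes so "Alice's" stays a single word.
        for word in re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", sentence):
            # One entry per occurrence; the context is the whole sentence.
            index[word.lower()].append(sentence)
    # Alphabetical order, as in a printed concordance.
    return dict(sorted(index.items()))

# Made-up sample text, not a quotation from the book.
sample = "Alice was tired. She looked at her sister. Alice saw a rabbit."
concordance = sentence_concordance(sample)
for word, contexts in concordance.items():
    for context in contexts:
        print(f"{word:10} {context}")
```

Each printed line pairs a headword with one sentence of context, and the context never extends past the sentence boundary.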
You are to provide access to an English-language book in a similar fashion. You can download the data for free from Project Gutenberg. For this problem, you should use Alice's Adventures in Wonderland by Lewis Carroll (1832-1898). The book is available in zipped format: alice30h.zip. However, the text is already unzipped here as alice30h.html.
Have a look at the beseda text corpus (a body of word data) at the Institute of Slovenian Language. Type in the word visokost to see what is found. You could use this web site as a model if you had enough time.
However, for this workshop, you will create a "quick and dirty" concordance generator. The text is broken into lines. Your web page programming will have the following characteristics:
| | Alice | started to her feet, for it flashed across her mind that |
| cats eat bats, I wonder?' And here | Alice | began to get rather |
| It was all very well to say 'Drink me,' but the wise little | Alice | |
The first line presents an occurrence where "Alice" is the first word on the line; the second has an occurrence of "Alice" in the middle; and the third has it in line-final position. The item searched for is highlighted in red.
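The three cases described above can be sketched in Python as follows. This is only one possible approach, not the workshop's reference solution: the function names `kwic_rows` and `rows_to_html`, the case-sensitive whole-word match, and the inline red style are all assumptions. The sample lines are reassembled from the table above.

```python
import html
import re

def kwic_rows(lines, keyword):
    """Yield (left, match, right) for each whole-word hit, line by line."""
    # Case-sensitive whole-word match (an assumption for this sketch).
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b")
    for line in lines:
        for m in pattern.finditer(line):
            # Context never crosses the line boundary, so a line-initial
            # hit has an empty left side and a line-final hit an empty right.
            yield line[:m.start()].strip(), m.group(), line[m.end():].strip()

def rows_to_html(rows):
    """Render rows as a three-column HTML table with the keyword in red."""
    out = ["<table>"]
    for left, match, right in rows:
        out.append(
            "<tr>"
            f'<td align="right">{html.escape(left)}</td>'
            f'<td style="color:red">{html.escape(match)}</td>'
            f"<td>{html.escape(right)}</td>"
            "</tr>"
        )
    out.append("</table>")
    return "\n".join(out)

lines = [
    "Alice started to her feet, for it flashed across her mind that",
    "cats eat bats, I wonder?' And here Alice began to get rather",
    "It was all very well to say 'Drink me,' but the wise little Alice",
]
rows = list(kwic_rows(lines, "Alice"))
print(rows_to_html(rows))
```

Right-aligning the left context keeps the keyword in a single vertical column, which makes the table easy to scan.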
To make your programming task easier, you should
Your end product should behave much like my Quick & Dirty Concordance.
How is a web-based concordance program better than a concordance in a book?
Is the book concordance still useful? Does it fulfill a need for a researcher better than a computer program?
Do one or more of the following: