Lab 1 - Bash
Due Sunday, April 2, at 11:00pm
This first lab is relatively quick, and should get you up to speed working with the command line, basic shell commands, an editor, and a small shell program.
Materials
You will need these materials for this assignment.
- Alice in Wonderland, which you will find as plaintext in file
alice-gutenberg.txton canvas.Credit: Project Gutenberg
- Shakespeare’s sonnets, which you will find as plaintext files,
are available in the tar file
sonnets.taron canvas. This file unpacks into a subdirectorysonnets.Credit: Project Gutenberg
Curious how the sonnets directory from Gutenberg’s sonnets.html file?
Look at the bottom of this page.
Preparation
Set up for your work in this course, if you have not already:
$ cd
$ mkdir -p cs50/labs/lab1
$ chmod go-rwx cs50
$ cd cs50/labs/lab1
These commands create a directory ~/cs50/labs/lab1, prevent others
from peeking at your work, and change the working directory to lab1
so you’re ready for the work below.
Assignment
-
Write a single bash command/pipeline that will extract the text of Alice in Wonderland into a file
alice.txtin yourlab1directory. (Note that the provided file has header and footer material added by Project Gutenberg. We want only the stuff Lewis Carroll wrote.) -
Write a single bash command/pipeline that will read
alice.txtand print a crude table of contents. -
Write a single bash command/pipeline that will read
alice.txtand print the words, in order, exactly one word per line, intoalicewords.txt. (A word is a sequence of letters.) -
Write a single bash command/pipeline that will read
alicewords.txtand print the number of times the word “Alice” appears, regardless of how it is capitalized. -
Write a single bash command/pipeline that will read
alicewords.txtand print the number of times the word “wonder” appears, regardless of how it is capitalized; be careful not to count other words! -
Write a single bash command/pipeline that will read
alicewords.txtand print the top-10 most-common words, regardless of capitalization. - Write a single bash command/pipeline that will read
alicewords.txtand print the number of unique words, regardless of capitalization, that are not stop words. Use the sorted list of stop words instopwords.txt(available on canvas).We’re going to keep things simple, here; the words “account”, “accounts”, and “accounting” are all unique words for our purposes, though a fancier solution would stem them all to “account”. Given our definition of word, above, the word “account’s” would appear in
alicewords.txtas “account” and “s”, and the latter will be stripped out as a stopword.You may find
commuseful here. - Write a bash script called
shake.shthat allows the user to search for a word in all of Shakespeare’s sonnets.- For each matching sonnet, the script prints one line: the sonnet number, a colon, a space, the first line of the sonnet, then elipses.
- If the user provides too few or too many arguments, it should print “usage: shake.sh searchword” and exit with status 1.
- If the script cannot find the
sonnetsdirectory in the expected location, it should print “cannot find sonnets directory” and exit with status 2. - Write three test cases; write each test’s command in a separate single-line file
shaketest#. Run each test and save its output in a separate file, i.e.,bash shaketest1 > shaketest1.out.
For example,
$ ./shake.sh spring CII: My love is strengthen'd, though more weak in seeming;... CIV: To me, fair friend, you never can be old,... I: From fairest creatures we desire increase,... LIII: What is your substance, whereof are you made,... LXIII: Against my love shall be as I am now,... XCVIII: From you have I been absent in the spring,... $ ./shake.sh computer $How many hits will you get if you shake
love? :-)
What to hand in, and how
For the command/pipeline questions, write all your answers in a single file commands.txt.
Include each command and its output.
When finished, you should have the following files:
$ ls -1
alice.txt
alicewords.txt
commands.txt
shake.sh
shaketest1
shaketest1.out
shaketest2
shaketest2.out
shaketest3
shaketest3.out
Bursting an html file into many sonnets
This section is brought to you by curiosity.
How was the files in the sonnets directory produced from Gutenberg’s
sonnets.html file? A little sed and a little awk:
mkdir sonnets
sed \
-e '1,/^\*\*\* START/d' \
-e '/^End of Project Gutenberg/,$d' \
-e 's/<br \/>//' \
-e 's/ / /g' \
-e '/<\/p>/d' \
-e '/<\/h3>/d' \
-e '/class="poem"/d' \
-e '/<pre>/d' \
-e '/^$/d' sonnets.html \
| awk '\
/<h3>/ {printing=0; header=1; next;}\
header {sonnet="sonnets/" $1; print sonnet; printing=1; header=0; next;}\
printing {print > sonnet}'
You don’t need to learn awk and you will not need it for this
assignment. But it’s a interesting language!