Lab 2 - C and Bash
This second lab is focused on three small C programs, and a regression-testing bash script. They start easy and get progressively more difficult.
Reminder
Grading will focus on CS50 coding style - including consistent formatting, selection of identifier names, and use of meaningful comments - in addition to correctness and testing.
Your C code must compile without producing any compiler warnings. You will lose points if the compiler produces warnings when using our CS50-standard compiler flags.
If your submitted code fails to compile, or triggers a segmentation fault, we’ll notify you and give you an opportunity to repair and resubmit. (See programs that crash.)
Preparation
Set up four directories for this lab:
$ cd
$ mkdir -p cs50/labs/lab2
$ cd cs50/labs/lab2
$ mkdir regress chill histo words
Assignment
Please follow the CS50 coding style.
Design, write, document, and fully test the following three separate C programs and one bash script.
Point breakdown:
- (25 points) regress.sh
- (25 points) chill.c
- (25 points) words.c
- (25 points) histo.c
regress.sh
Regression testing is important to any quality software-development process. As a software project evolves, each new revision is tested against a thorough suite of test cases to ensure that the new revision still performs correctly where it had before. As new functionality is added, new tests are added to the suite.
Write a bash script regress.sh to perform regression testing. Its command line looks like
./regress.sh dirname testfilename...
- where dirname is the name of a directory containing test results, and
- where each testfilename is the name of a file containing bash command(s) for one test case.
The script verifies the validity of its arguments (exit with non-zero status on any error):
- there must be at least two arguments;
- if something by the name dirname exists, it must be a directory;
- each testfilename must be a regular file and be readable.
After checking its arguments, the script creates a new directory in the current directory whose name has the form YYYYMMDD.HHMMSS, representing the current date and time (for example, 20170402.143702). If any error occurs, the script exits with non-zero status.
The script then runs each test case with bash, redirecting stdin from /dev/null, producing four files for each case:
- YYYYMMDD.HHMMSS/testfilename.test - a copy of testfilename
- YYYYMMDD.HHMMSS/testfilename.status - the exit status of bash testfilename
- YYYYMMDD.HHMMSS/testfilename.stdout - the stdout from bash testfilename
- YYYYMMDD.HHMMSS/testfilename.stderr - the stderr from bash testfilename
If the directory dirname does not exist, YYYYMMDD.HHMMSS is renamed dirname.
Exit 0 if success, non-zero if any error.
If the directory dirname already exists, YYYYMMDD.HHMMSS is compared with dirname to provide a brief listing of any differences - or the simple statement “no differences”.
Exit 0 if no differences, non-zero if differences.
In typical usage, the first time the developer runs regress.sh, the script creates a directory by name dirname; in subsequent runs, it compares the new test results with those from the prior run.
Over time, directories YYYYMMDD.HHMMSS accumulate, providing a historical record of test results.
exit
Non-zero if any error, or any differences from the earlier dirname directory.
Zero if dirname created successfully, or there were no differences from an existing dirname.
Example
Suppose we used regress.sh to support development of the shake.sh solution.
The directory contains the script and four test files, each with a one-line command.
(Here regress.sh is elsewhere in the PATH.) We start out by listing the four test cases, then run regress.sh twice, then add a test case, then change a test case.
Finally, we test some erroneous cases.
[lab2]$ ls
shake.sh* test0 test1 test2 test3
[lab2]$ cat test?
cat shake.sh
./shake.sh
./shake.sh love
./shake.sh computer
[lab2]$ regress.sh base test?
20170402.143657
saved test results in base
[lab2]$ regress.sh base test?
20170402.143702
comparing 20170402.143702 with base...
no differences.
[lab2]$ echo ./shake.sh two words > test4
[lab2]$ regress.sh base test?
20170402.143721
comparing 20170402.143721 with base...
Only in 20170402.143721: test4.status
Only in 20170402.143721: test4.stderr
Only in 20170402.143721: test4.stdout
Only in 20170402.143721: test4.test
[lab2]$ echo ./shake.sh rose > test2
[lab2]$ cat test?
cat shake.sh
./shake.sh
./shake.sh rose
./shake.sh computer
./shake.sh two words
[lab2]$ regress.sh base test?
20170402.143806
comparing 20170402.143806 with base...
Files base/test2.stdout and 20170402.143806/test2.stdout differ
Files base/test2.test and 20170402.143806/test2.test differ
Only in 20170402.143806: test4.status
Only in 20170402.143806: test4.stderr
Only in 20170402.143806: test4.stdout
Only in 20170402.143806: test4.test
[lab2]$ ls
20170402.143702/ shake.sh* test3
20170402.143721/ test0 test4
20170402.143806/ test1
base/ test2
[lab2]$
######## now some error cases
[lab2]$ regress.sh
usage: regress.sh dirname testfilename...
[lab2]$ regress.sh base
usage: regress.sh dirname testfilename...
[lab2]$ regress.sh test?
first argument ('test0') is not a directory
[lab2]$ regress.sh /base test?
20170402.144605
mv: rename 20170402.144605 to /base: Permission denied
failed to save test results in /base; they remain in 20170402.144605
[lab2]$ regress.sh base testing
test case 'testing' is not a file (or not readable)
[lab2]$ regress.sh base base
test case 'base' is not a file (or not readable)
[lab2]$ chmod -r test?
[lab2]$ regress.sh base test?
test case 'test0' is not a file (or not readable)
[lab2]$
Note above how we use the bash globbing syntax ? to indicate a wildcard that matches any single character; thus, test? expands to
test0 test1 test2 test3 test4
Note how test0 simply prints the current copy of shake.sh, which adds nicely to the historic record.
The names of test files are not important to regress.sh, but a development team may want to agree on a naming convention. For example, suppose you chose to name them all with extension .test. If you had saved the first run of regress.sh in a directory named base, you could then run future tests as ./regress.sh base *.test
Just to be clear, each testfile contains bash command(s), and your regress.sh script should execute those commands by running bash and providing testfilename as an argument. (Recall how we did the test cases for shake.sh.) But you should run each test only once within any given run of regress.sh - not only is that more efficient, it’s possible that the commands within some test files might actually not be amenable to being run multiple times (e.g., if they have side effects like creating or removing files).
It’s easily possible to redirect the stdin, stdout, and stderr, all in one run of a test - and to catch the exit status of that run.
Assumptions
Please expect that dirname names a directory that could be anywhere - not necessarily a subdirectory of the current working directory. (This assumption should not complicate your work.)
Please assume that the testfilenames name files that are in the current directory. (In principle, we should support testfiles from anywhere, but it’s a bit trickier; this assumption simplifies your work. Your script need not validate the assumption - just assume it.)
Please do not assume that the script itself, regress.sh, is in the current working directory. (This assumption should not complicate your work.)
Hints
- Check out the date command and its + option.
- Check out the diff --brief command form.
- Check out the shift built-in bash command and this example SHIFTING file.
chill.c
Write a program to calculate “wind chill” based on the current temperature and wind speed. The standard formula for this calculation is:
Wind Chill = 35.74 + 0.6215T - 35.75(V^0.16) + 0.4275T(V^0.16)
where T is the temperature in degrees Fahrenheit (when less than 50) and V is the wind velocity in miles per hour. The ^ character denotes exponentiation. Note that the above formula is not in C programming language syntax.
input
No input files; stdin is ignored.
The user may run your program with no arguments, one argument, or two arguments as explained below.
output: no arguments
If the user provides no arguments to your program, it should print out a table of temperatures (from -10 to +40 by 10’s) and wind speeds (from 5 to 15 by 5’s). Your output should look similar to the following, with nice columns and titles:
$ ./chill
Temp Wind Chill
---- ---- -----
-10 5 -22.3
-10 10 -28.3
-10 15 -32.2
0 5 -10.5
0 10 -15.9
0 15 -19.4
10 5 1.2
10 10 -3.5
10 15 -6.6
20 5 13.0
20 10 8.9
20 15 6.2
30 5 24.7
30 10 21.2
30 15 19.0
40 5 36.5
40 10 33.6
40 15 31.8
output: one argument
If the user provides one argument, it is assumed to be a temperature (expressed as a floating-point number).
If that temperature is less than 50, it is acceptable; chill then prints a table of wind speeds (from 5 to 15 by 5’s) and the calculated wind chills for that temperature only.
Your program’s output for one argument should look like this:
$ ./chill 32
Temp Wind Chill
----- ---- -----
32 5 27.1
32 10 23.7
32 15 21.6
output: two arguments
If the user provides two arguments, they will be temperature and velocity, respectively (expressed as floating-point numbers). The temperature must be less than 50 and the velocity must be greater than or equal to 0.5.
If the arguments are acceptable, then your program should calculate and print the wind chill for that temperature and velocity only.
Your program’s output for two arguments should look like this:
$ ./chill 5 20
Temp Wind Chill
----- ---- -----
5 20 -15.4
If either argument is out of range, your program should issue a message and exit. Here are some examples:
$ ./chill 55
Temperature must be less than 50 degrees Fahrenheit
$ ./chill -150
Temperature must be greater than -139 Fahrenheit
(coldest recorded earth temperature, see https://en.wikipedia.org/wiki/Lowest_temperature_recorded_on_Earth)
$ ./chill 10 0
Wind velocity must be greater than or equal to 0.5 MPH
In the preceding examples some values were printed as integers and some as decimal fractions. You may print everything in the format “x.y”, if you wish, but do not print more than one decimal place. Indeed, it may be wise to use this format when the user specifies temperature or windspeed, because the user may specify a non-integral value and it may be misleading to print it as an integer.
output: more than two arguments
If the user provides more than two arguments, your program should print a “usage” line and exit with error status.
exit
If the program terminates normally, it exits with a return code of 0.
Otherwise, it terminates with a documented, non-zero return code.
Compiling
You will likely need the math library.
To use it, add #include <math.h> to your chill.c file, and add -lm to your mygcc command.
(That is “dash ell emm”, which is short for “library math”.)
mygcc chill.c -lm -o chill
words.c
Write a C program reminiscent of one of the pipelines we explored in Lab 1; specifically, a program called words that breaks its input into a series of words, one per line.
It may take input from stdin, or from files whose names are listed as arguments.
usage
words [filename]...
input
When no filenames are given on the command line, words reads from stdin.
When one or more filenames are given on the command line, words reads from each file in sequence.
If the special filename - is given as one of the filenames, stdin is read at that point in the sequence.
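One possible way to treat the special filename - uniformly with ordinary filenames is to map each name to a stream before processing it. The sketch below is only an illustration; the helper name open_input is hypothetical and not part of the assignment.

```c
#include <stdio.h>
#include <string.h>

/* Return the stream to read for a command-line name: stdin for "-",
 * otherwise the named file opened for reading (NULL if it cannot be
 * opened). The name open_input is hypothetical. */
FILE *open_input(const char *name)
{
    if (strcmp(name, "-") == 0) {
        return stdin;
    }
    return fopen(name, "r");
}
```

A caller using this approach would report an error to stderr when the result is NULL, and would call fclose() only on streams other than stdin.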
output
In any case, the stdout should consist of a sequence of lines, with exactly one word on each output line (i.e., each output line contains exactly one word and no other characters). A “word” is a sequence of letters.
Although you may be tempted to think of the input as a sequence of lines, it may be helpful to think of it as a sequence of characters.
Note it is possible for the output to be empty, if there are no words in any of the input files.
Any error messages are written to stderr.
exit
If the program terminates normally, it exits with a return code of 0.
Otherwise, it terminates with a documented, non-zero return code.
hints
- Check out man ctype on a Mac, or man ctype.h on a Linux system.
- Consider a function that processes a file, given a FILE* as parameter.
- Would it be easier to process the file(s) one character at a time?
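To make the character-at-a-time idea from the hints concrete, here is a minimal sketch of a function that takes a FILE* and writes each run of letters on its own line. The name print_words and the choice to pass the output stream as a parameter are assumptions made for illustration; this is one possible shape, not a required design.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* Copy each maximal run of letters from in to out, one run per line.
 * The name print_words is hypothetical. */
void print_words(FILE *in, FILE *out)
{
    int c;                  /* int, not char, so EOF is distinguishable */
    bool in_word = false;   /* true while we are inside a run of letters */

    while ((c = fgetc(in)) != EOF) {
        if (isalpha(c)) {
            fputc(c, out);
            in_word = true;
        } else if (in_word) {
            fputc('\n', out);   /* first non-letter after a word ends it */
            in_word = false;
        }
    }
    if (in_word) {
        fputc('\n', out);       /* input ended in the middle of a word */
    }
}
```

Tracking whether the previous character was a letter avoids printing empty lines when several non-letters appear in a row.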
histo.c
Write a program that reads a series of non-negative integers from stdin, and prints out a histogram. There should be 16 bins in your histogram. The catch? You don’t know in advance the range of input values; assume the integers range from 0 to some unknown positive maximum. Thus, you will need to dynamically scale the bin size for your histogram. An example is below.
Usage:
There are no command-line arguments.
Requirements:
You must begin with bin size 1, and double it as needed so all non-negative integers observed on input fit within the histogram.
You must have 16 bins. The number ‘16’ should appear only once in your code.
input:
Input is read from stdin, whether from the keyboard, redirected from a file, or piped in from another command. You can use a simple scanf() for this. Assume the input contains only integers, separated by white space (space, tab, newline). Assume the smallest integer is zero; ignore any negative integers. If there is non-integer non-space content in the file, it is ok for the program to treat that as the end of input; the program should not crash, or enter an infinite loop - it should just silently behave as if there are no more integers. (These assumptions make it easy to use scanf for your input.)
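The input rules above map naturally onto a loop that continues only while an integer was successfully converted. The sketch below uses fscanf() on a FILE* parameter so it can be exercised on any stream; scanf("%d", &value) behaves the same way on stdin. The function name count_nonnegative is hypothetical, and a real histo would update its bins where this sketch increments a counter.

```c
#include <stdio.h>

/* Read whitespace-separated integers from in until EOF or the first
 * non-integer token, and return how many of them were non-negative.
 * The name count_nonnegative is hypothetical. */
int count_nonnegative(FILE *in)
{
    int value;
    int count = 0;

    /* fscanf returns 1 only when it converted an integer; it returns 0
     * on non-integer input and EOF at end of input, so both cases end
     * the loop without crashing or spinning forever. */
    while (fscanf(in, "%d", &value) == 1) {
        if (value >= 0) {
            count++;    /* negative integers are silently ignored */
        }
    }
    return count;
}
```

Comparing the return value with 1, rather than with EOF, is what keeps non-integer content from causing an infinite loop.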
As always, any other assumptions you make should be documented in your README file and your testing procedure should be documented in your TESTING file.
output:
See examples below.
exit
This program has no arguments and does not check its input for errors, so it should always exit with zero status.
Examples:
Here we compile and run the program, and type a set of numbers (spread over three lines, but it doesn’t matter as long as we put space or newline between numbers), ending with ctrl-D at the beginning of a line. (That sends EOF to the program.) The program then prints a histogram, nicely labeling each line with the range of values assigned to that bin, and printing a row of * to represent the count of values that fell into that bin.
$ mygcc histo.c -o histo
$ ./histo
16 bins of size 1 for range [0,16)
3 -4 5 1 7 0
8 0 15 12 3 5
3 3 3 3 3
^D
[ 0: 0] **
[ 1: 1] *
[ 2: 2]
[ 3: 3] *******
[ 4: 4]
[ 5: 5] **
[ 6: 6]
[ 7: 7] *
[ 8: 8] *
[ 9: 9]
[10:10]
[11:11]
[12:12] *
[13:13]
[14:14]
[15:15] *
$
The notation [a,b) includes all values x such that a <= x < b, that is, the range includes a but does not include b. For example, [0,4) = {0, 1, 2, 3}.
Now watch what happens if we input a number outside the original range of [0,16).
$ ./histo
16 bins of size 1 for range [0,16)
3 -4 5 1 7 0
8 0 15 12 3 5
18
16 bins of size 2 for range [0,32)
19 20 30 7 12
50
16 bins of size 4 for range [0,64)
34
32
19
44
^D
[ 0: 3] *****
[ 4: 7] ****
[ 8:11] *
[12:15] ***
[16:19] ***
[20:23] *
[24:27]
[28:31] *
[32:35] **
[36:39]
[40:43]
[44:47] *
[48:51] *
[52:55]
[56:59]
[60:63]
$
Each time it sees a number outside the current range, it doubles the range and doubles the size of each bin. (Notice also the [low:high] labels in the histogram; this notation includes both low and high and everything in between.) It might have to repeat the doubling if we put in a number well past the current range:
$ ./histo
16 bins of size 1 for range [0,16)
150
16 bins of size 2 for range [0,32)
16 bins of size 4 for range [0,64)
16 bins of size 8 for range [0,128)
16 bins of size 16 for range [0,256)
^D
[ 0: 15]
[ 16: 31]
[ 32: 47]
[ 48: 63]
[ 64: 79]
[ 80: 95]
[ 96:111]
[112:127]
[128:143]
[144:159] *
[160:175]
[176:191]
[192:207]
[208:223]
[224:239]
[240:255]
$
Here’s an example using bash syntax to generate a list of numbers, and piping the output to histo:
$ echo {1..16} 150 | ./histo
16 bins of size 1 for range [0,16)
16 bins of size 2 for range [0,32)
16 bins of size 4 for range [0,64)
16 bins of size 8 for range [0,128)
16 bins of size 16 for range [0,256)
[ 0: 15] ***************
[ 16: 31] *
[ 32: 47]
[ 48: 63]
[ 64: 79]
[ 80: 95]
[ 96:111]
[112:127]
[128:143]
[144:159] *
[160:175]
[176:191]
[192:207]
[208:223]
[224:239]
[240:255]
$
Although we scale the bin size (x axis), I’m not asking you to scale the bin count (y axis) for printing, which can lead to some really wide output:
$ echo {1..100} 1500 | ./histo
16 bins of size 1 for range [0,16)
16 bins of size 2 for range [0,32)
16 bins of size 4 for range [0,64)
16 bins of size 8 for range [0,128)
16 bins of size 16 for range [0,256)
16 bins of size 32 for range [0,512)
16 bins of size 64 for range [0,1024)
16 bins of size 128 for range [0,2048)
[ 0: 127] ****************************************************************************************************
[ 128: 255]
[ 256: 383]
[ 384: 511]
[ 512: 639]
[ 640: 767]
[ 768: 895]
[ 896:1023]
[1024:1151]
[1152:1279]
[1280:1407]
[1408:1535] *
[1536:1663]
[1664:1791]
[1792:1919]
[1920:2047]
$
We took some pains to format the [low:high] range indicators for each row, using a fixed-width field just wide enough to hold the biggest number. It’s a nice touch (read man printf for some clues) but it’s ok if you make a simpler assumption (e.g., always use 6-digit field width).
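For the fixed-width labels, one approach is to count the digits of the largest value in the range and pass that count as a run-time field width using the * width specifier of printf. The sketch below assumes that approach; the names digit_count and format_label are hypothetical.

```c
#include <stdio.h>

/* Number of decimal digits needed to print a non-negative integer n. */
int digit_count(int n)
{
    int digits = 1;
    while (n >= 10) {
        n /= 10;
        digits++;
    }
    return digits;
}

/* Write a "[low:high] " label into buf, right-aligning both numbers in
 * fields of the given width. The '*' in "%*d" takes the field width
 * from the argument list. Returns what snprintf returns. */
int format_label(char *buf, size_t size, int low, int high, int width)
{
    return snprintf(buf, size, "[%*d:%*d] ", width, low, width, high);
}
```

A caller might compute the width once, from the largest value the histogram can hold, and reuse it for every row so the columns line up.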
Representing a histogram
You will need an array of 16 bins to represent the number of integers observed in each bin. You’ll need to keep track of the bin size and the range of the histogram. If you observe a value outside the range, you should double the bin size and range - but first you need to compress the current 16 bins into the first 8 bins. You’ll likely need one loop to compute the new values for the lower half of the bins (each bin receiving the sum of two bins’ counts), and then another to assign the new value (0) to the upper half of the bins.
(Again: the number ‘16’ may only occur once in your code; scattering hard-coded numbers around your code is bad style.)
Notice that the number of bins, bin size, and histogram range are all powers of 2.
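The compression step described above can be sketched as a small in-place function. This is a minimal illustration under stated assumptions: the name compress_bins is hypothetical, the bin count is passed as a parameter rather than hard-coded, and the caller is assumed to double its own bin size and range afterward.

```c
/* Halve the resolution of a histogram in place: bin i of the lower half
 * receives the counts of old bins 2i and 2i+1, and the upper half is
 * cleared. Assumes nbins is even. The name compress_bins is hypothetical. */
void compress_bins(int bins[], int nbins)
{
    int half = nbins / 2;

    /* Ascending order is safe: bins[i] is overwritten only after old
     * bins 2i and 2i+1 (both at index >= i) have been read. */
    for (int i = 0; i < half; i++) {
        bins[i] = bins[2 * i] + bins[2 * i + 1];
    }

    /* The upper half now covers values not yet observed. */
    for (int i = half; i < nbins; i++) {
        bins[i] = 0;
    }
}
```

Since a single input value may lie far outside the current range, a caller would likely repeat this step in a loop until the value fits.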
What to hand in, and how
Make sure to compile and test your solutions on one of the machines in M210. If you choose to develop your solutions on some other system, you are responsible for ensuring that the work you submit runs correctly on a machine in M210, which is where we will test it!
In addition to your code, each of the four subdirectories of lab2 must include two simple text files:
a. README, which describes how your program should be compiled and executed, along with an explanation of any assumptions you made about the assignment. See the example README file.
b. TESTING, which describes how you tested the program, including test inputs and test scripts; these test inputs and test scripts should be included as files in the same directory.
See the example TESTING file; in that case we would not need to include testing inputs because they were CS50-provided files, or generated within the testing file itself. (We created this file in MacOS Terminal by choosing “Export Text As…” from the Shell menu when we had finished all the testing commands; we then edited it with emacs to add the comments.)
When finished, you should have the following files:
lab2
├── chill
│ ├── README
│ ├── TESTING
│ └── chill.c
├── histo
│ ├── README
│ ├── TESTING
│ └── histo.c
├── regress
│ ├── README
│ ├── TESTING
│ └── regress.sh
└── words
├── README
├── TESTING
└── words.c
This listing was produced by the tree command. Neat, huh?