Modules and Makefiles
Goals
- how to create a re-usable “module” by separating its code into a pair of .c and .h files.
- the Make command and its Makefiles - never run mygcc again!
For a deeper dive into make, see the lecture extra.
For a look at the C preprocessor, see another lecture extra.
For more about C header files, see another lecture extra.
Extracting bag from names
Recall our names9 program from the prior lecture, in which we converted the linked list data structure, specific to that particular application’s needs, into a generic bag data structure. Our next step is to extract that bag into a separate set of files, with a clearly defined interface, so it can be used by this program and easily re-used by other programs. Given its design, it’s also easy to create multiple bags, unlike the original design (names5.c) that was constrained to a single application-specific linked list.
Today we divide the code of names9 into three files:
- namesA.c - a new version of the application;
- bag.h - declarations that form the interface to the bag module;
- bag.c - functions that define the implementation of the bag module.
Since we also use the readlinep module, we must now compile the program with a command like
mygcc names.c bag.c readlinep.c -o names
Make and Makefiles
As our programs become more complex, it’s increasingly tedious (and error-prone) to type the commands to compile the code into an executable. Indeed, for a large codebase, it can take minutes or hours to compile all the code. Fortunately, there is a good tool to make an executable from a collection of source files, incrementally recompiling only those files that have been updated.
The make command reads a configuration file called Makefile – written in its own little language – and executes shell commands as needed to compile all the source files and link them into an executable.
Before we look at a specific example, let’s first take a closer look at the compilation process.
From now on we will compile all our programs with Make.
Compiling and linking
The compiler reads “source code” (from a file like bag.c containing C code) and outputs “assembly code” (in a file bag.s containing assembly language).
Assembly code is still human-readable text, but now lists machine-level instructions instead of C statements.
In other words, the compiler translates source code into assembly code.
The Unix command for the C compiler is cc, though we use one from GNU called gcc.
The assembler reads assembly code and outputs “object code”, translating the textual machine instructions into binary machine instructions.
It’s still not a complete program, not yet “executable” by the operating system.
The Unix command for the assembler is as.
The linker reads object code (from files like names.o, bag.o, and readlinep.o, and from libraries) and links them all together into an “executable” file, ready for the OS to run (execute) when we ask bash to run it.
It does not translate the code further – both input and output are machine instructions in a binary format – it “links” the pieces together, resolving references in one file to functions and variables in other files.
For example, names.o may reference a function bag_new(), which the linker finds in bag.o.
The linker assigns memory addresses for all the code and global variables, ready for loading into memory when the program is executed.
The Unix command for the linker is ld (short for “load”, not to be confused with ln, which creates links between files in the file system).
However, the linker is not a magical program.
The gcc command is actually pretty sophisticated, and in a form like
mygcc names.c bag.c readlinep.c -o names
will actually run five commands, something like this:
gcc -c names.c
gcc -c bag.c
gcc -c readlinep.c
ld names.o bag.o readlinep.o libraries... -o names
rm names.o bag.o readlinep.o
Each of the first three commands asks gcc to “compile” (-c) the listed C source file; by default, the compiler writes the object code to a file of the same name with the extension .o.
(Under the hood, it’s actually running both the compiler and then the assembler, briefly creating the intermediate .s file.)
The fourth line links those three object files together with a collection of common libraries, to produce the executable program; the original -o names command-line argument is placed here.
Finally, the fifth line cleans up by removing the intermediate object files.
Again, for a large code base, it is a waste of effort to recompile every C source file every time, even if only one of them has changed since the last compilation.
Dependencies
To optimize this process it helps to know which files depend on each other. Clearly, each object file depends on its source file:
names.o: names.c
bag.o: bag.c
readlinep.o: readlinep.c
and the program depends on all the object files that will be linked together:
names: names.o bag.o readlinep.o
(It also depends on the standard libraries, but we assume they never change and we won’t worry about them.)
Actually, if we look deeper, the object files also depend on the include files, because those files are virtually included in the C source files – the compiler reads the include file bag.h when it sees #include "bag.h" at some point while reading the C source code.
Thus we also have dependencies
names.o: bag.h readlinep.h
bag.o: bag.h
readlinep.o: readlinep.h
(They also include .h files for standard libraries, but again, we assume those never change and thus we don’t worry about them as dependencies.)
The above dependencies are actually written in make language.
The name to the left of each colon is the target, and the name(s) to the right of the colon are the dependencies.
The target depends on the dependencies, and should be rebuilt if any of the dependencies change.
Notice that dependencies are transitive: names depends on bag.o which depends on bag.c and bag.h, and so forth.
Make builds the dependency tree and rebuilds files as needed.
But how does it know how to rebuild one file from another?
We add a command (or list of commands), indented with a tab, on the line(s) that immediately follow the target.
We put all this together in a file called Makefile:
# executable depends on object files
names: names.o bag.o readlinep.o
	gcc names.o bag.o readlinep.o -o names

# object files depend on source files
names.o: names.c
bag.o: bag.c
readlinep.o: readlinep.c

# object files also depend on include files
names.o: bag.h readlinep.h
bag.o: bag.h
readlinep.o: readlinep.h

# how to compile source into object files
names.o:
	gcc -c names.c
bag.o:
	gcc -c bag.c
readlinep.o:
	gcc -c readlinep.c
Important: commands must be indented with a tab character, not spaces, or Make will not recognize them as commands. Some editors are savvy and insert a tab when you type a tab, and some try to be clever and change the tab to spaces. Make requires a tab.
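To make the tab requirement concrete, here is a minimal sketch of a single rule. The file hello.c is hypothetical, used only for illustration; the error text in the comment is what GNU Make typically prints, and other versions of Make may word it differently.

```makefile
# A minimal rule, for illustration only (hello.c is a hypothetical file).
# The command line below the target begins with exactly one TAB character.
hello: hello.c
	gcc hello.c -o hello

# If that command line began with spaces instead of a tab, GNU Make would
# stop with an error along the lines of:
#   Makefile:4: *** missing separator.  Stop.
```

If you see a “missing separator” message, the first thing to check is whether your editor replaced the tab with spaces.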
Now, all we have to do is type make, and let it do all the work!
$ ls
Makefile bag.h names.txt readlinep.h
bag.c names.c readlinep.c
$ make
gcc -c names.c
gcc -c bag.c
gcc -c readlinep.c
gcc names.o bag.o readlinep.o -o names
$ make
make: `names' is up to date.
$ touch bag.h
$ make
gcc -c names.c
gcc -c bag.c
gcc names.o bag.o readlinep.o -o names
$ ls
Makefile bag.o names.o readlinep.h
bag.c names* names.txt readlinep.o
bag.h names.c readlinep.c
$
Notice that, the first time I ran make, it compiled each source file into an object file, and then linked them all together.
(We use gcc to link object files, because it’s very complicated to set up a proper ld command line, and gcc is smart enough to realize it need only run the linker when it’s given only object files.)
On my second run of make, it checked the dependency tree and decided there was nothing to be rebuilt.
Time saved!
I pretended to edit bag.h by “touching” it - the touch command just updates the modification date of the file(s) named on its command line, creating files if needed but making no changes to the content of existing files.
On my third run of make, it checked the dependency tree and saw that both bag.o and names.o depend on bag.h, so it recompiled both.
Because those two object files changed, it had to rebuild names.
In the end, notice that the .o files stick around - which is why make does not need to rebuild them every time.
Make actually knows about C and the relationship between .c and .o files, so several of the dependencies we listed were not necessary to list.
Indeed, some of the commands we listed are also unnecessary: they are implicit rules already known to make.
We thus streamline our Makefile:
# executable depends on object files
names: names.o bag.o readlinep.o
	gcc names.o bag.o readlinep.o -o names

# object files also depend on include files
names.o: bag.h readlinep.h
bag.o: bag.h
readlinep.o: readlinep.h
Compare with the prior Makefile and you’ll see two whole sections gone.
This new Makefile works just the same:
$ ls
Makefile bag.c names.c readlinep.c
Makefile~ bag.h names.txt readlinep.h
$ make
cc -c -o names.o names.c
cc -c -o bag.o bag.c
cc -c -o readlinep.o readlinep.c
gcc names.o bag.o readlinep.o -o names
$ ls
Makefile bag.h names.c readlinep.c
Makefile~ bag.o names.o readlinep.h
bag.c names* names.txt readlinep.o
$
There are two important details, however.
First, you’ll note that its implicit rules run cc instead of gcc.
Second, note that neither our explicit rules nor Make’s implicit rules pass the CS50-standard command-line arguments to the C compiler.
We need to tell Make about those.
We can assign to some Make variables to inform Make about our preferred compiler and command-line options.
(Some versions of Make call them “macros”, and we use that term below; a macro’s value is simply text that Make substitutes wherever the macro is referenced.)
CC=gcc
CFLAGS=-Wall -pedantic -std=c11 -ggdb

# executable depends on object files
names: names.o bag.o readlinep.o
	$(CC) $(CFLAGS) names.o bag.o readlinep.o -o names

# object files also depend on include files
names.o: bag.h readlinep.h
bag.o: bag.h
readlinep.o: readlinep.h
The first two lines assign values to the macros CC and CFLAGS.
These two particular macros are known to Make; it will use them in its implicit rules.
To use them in our explicit rules, too, we have to refer to them explicitly - see the command to rebuild names.
Notice that the macro-substitution syntax in Make uses a dollar sign - like bash - but requires parentheses around the macro name - unlike bash.
$ make
gcc -Wall -pedantic -std=c11 -ggdb -c -o names.o names.c
gcc -Wall -pedantic -std=c11 -ggdb -c -o bag.o bag.c
gcc -Wall -pedantic -std=c11 -ggdb -c -o readlinep.o readlinep.c
gcc -Wall -pedantic -std=c11 -ggdb names.o bag.o readlinep.o -o names
This time, Make used the gcc compiler and all our flags.
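To see why setting CC and CFLAGS changes the commands Make runs on its own, it may help to picture the implicit rule as an ordinary pattern rule. The sketch below is only an approximation of GNU Make's built-in rule for C, not its literal definition; CPPFLAGS is another conventional macro (for preprocessor options) that this lecture does not otherwise use.

```makefile
# Approximately what GNU Make's built-in rule for C does:
# build any X.o from the matching X.c.
#   $@ expands to the target          (e.g., bag.o)
#   $< expands to the first dependency (e.g., bag.c)
%.o: %.c
	$(CC) $(CFLAGS) $(CPPFLAGS) -c -o $@ $<
```

Because the rule refers to $(CC) and $(CFLAGS) rather than naming a compiler, assigning those two macros at the top of a Makefile is enough to change every implicit compilation.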
Which target?
So far, we’ve just typed make at the command line.
We can actually tell Make to build a particular target by naming it:
$ make readlinep.o
gcc -Wall -pedantic -std=c11 -ggdb -c -o readlinep.o readlinep.c
When given no target(s) on the command line, Make builds the first target listed in the Makefile.
In our Makefile, that was names - which is why we put it first.
In a complex Makefile, it is common to place a target named all first, with a dependency (list) that indicates which target (list) is actually desired when one types make all or just make.
This is a phony target, because the result of make all is never to produce a file called all. That’s ok - its purpose is to direct Make toward the target(s) we want it to build.
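As a sketch of the all convention, suppose the directory also held a second program. The name bagtest (built from a hypothetical bagtest.c that reuses the bag module) is an assumption made up for this illustration, and the include-file dependencies are omitted for brevity.

```makefile
# 'all' comes first, so it is what a bare 'make' builds.
# It has no commands; it only names the targets we really want.
.PHONY: all
all: names bagtest

# the program from this lecture
names: names.o bag.o readlinep.o
	gcc names.o bag.o readlinep.o -o names

# hypothetical second program that also uses the bag module
bagtest: bagtest.o bag.o
	gcc bagtest.o bag.o -o bagtest
```

With this layout, make or make all would build both programs, while make bagtest would build only the second.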
Cleaning up
We’ve taught Make how to build our program - now let’s teach it how to clean up.
By convention, every good Makefile has a target called clean.
Let’s extend the above Makefile (by convention, the clean rule goes at the bottom of the file):
clean:
	rm -f names
	rm -f *~ *.o
	rm -rf *.dSYM
The -f flag (“force”) asks rm not to complain, and to exit with success, when a named file does not exist.
The -r flag (“recursive”) removes a directory and its contents.
These commands remove the names executable, the object files *.o, the emacs backup files *~, and the *.dSYM directory that gcc creates in support of gdb.
Look how well it cleans up!
$ ls
Makefile bag.h names.c readlinep.c
Makefile~ bag.o names.o readlinep.h
bag.c names* names.txt readlinep.o
$ make clean
rm -f names
rm -f *~ *.o
rm -rf *.dSYM
$ ls
Makefile bag.h names.txt readlinep.h
bag.c names.c readlinep.c
In some Makefiles, you’ll see the author explicitly tell Make that this rule is “phony”:
.PHONY: clean
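The reason for that declaration is easiest to see in context. Here is a sketch of the same clean rule with the declaration attached and the reasoning written as comments.

```makefile
# 'clean' has no dependencies, so if a file or directory named 'clean'
# ever appeared in this directory, Make would consider the target up to
# date and run none of the commands below.
# Declaring it phony tells Make that 'clean' is not a file at all,
# so the commands run every time we type 'make clean'.
.PHONY: clean

clean:
	rm -f names
	rm -f *~ *.o
	rm -rf *.dSYM
```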
More on macros
Any symbol that is defined in a Makefile like this
NAME = VALUE
is called a macro. Once defined, it can be expanded using
... $(NAME) ...
It is also common to define macros for the name of the program being built, and the list of object files that go into it.
Below, our final Makefile declares PROG and OBJS for this purpose, and uses them in the build command.
I also included a list of library files needed by this program as LIBS; although names does not need the math library, it’s harmless to ask gcc to link with the math library and it serves as a good example here.
(The standard C library, which provides the functions declared in stdio.h and stdlib.h, is linked without you asking.)
PROG = names
OBJS = names.o bag.o readlinep.o
LIBS = -lm

# executable depends on object files
$(PROG): $(OBJS)
	$(CC) $(CFLAGS) $(OBJS) $(LIBS) -o $(PROG)
Notice how we were able to rewrite the build rule to be fully generic - it names no specific files. Indeed, you could use this rule in all your Makefiles!
Complete Makefile
With the addition of the phony targets and a nice header comment, we’re done:
# Makefile for the "names" program that uses the "bag" module.
#
# David Kotz - April 2017

CC=gcc
CFLAGS=-Wall -pedantic -std=c11 -ggdb
PROG = names
OBJS = names.o bag.o readlinep.o
LIBS = -lm

.PHONY: all clean

all: names

# executable depends on object files
$(PROG): $(OBJS)
	$(CC) $(CFLAGS) $(OBJS) $(LIBS) -o $(PROG)

# object files depend on include files
names.o: bag.h readlinep.h
bag.o: bag.h
readlinep.o: readlinep.h

clean:
	rm -f $(PROG)
	rm -f *~ *.o
	rm -rf *.dSYM
The macro definitions at the top make it easy for a reader to quickly see what program will be built and from what files. The gnarly compilation rule is the same for every Makefile! Written once, and not likely needing to be changed. The dependencies at bottom are the only other part that varies with the specific program.
I saved this Makefile as names-Makefile in our examples directory.
Tips
Some other things to know about Make.
Errors
If Make encounters an error while compiling, it stops and does not proceed with the rest of the build.
If, for example, a C compilation step fails and prints its error messages, Make notices this, prints a message, and exits with error status 1.
It will not proceed to compile other files, or build the overall program.
If you want to ask Make to proceed even after encountering errors, run make -k.
Make command line
Make supports many command-line arguments.
See man make.
-f file
use file instead of the default Makefile.
-n
determine which commands need to be run, but don’t actually execute any of them, just print them.
-k
even if a particular target, and the things that depend on it, cannot be built, continue trying to make the other dependencies of these targets anyway.
(My emacs command uses this flag.)
-C dir or --directory=dir
tells Make to change its working directory (as if it internally runs cd dir) and then look for a Makefile (i.e., it will read from dir/Makefile).
This option is useful for running Make in a subdirectory.
Automatic variables
The make utility also provides some useful automatic variables:
$@
name of the current target
$?
the list of dependencies that are newer than the target
$^
the list of dependencies for this target
For example, we could rewrite our names target as follows:
names: names.o bag.o readlinep.o
	$(CC) $(CFLAGS) $^ -o $@
Although useful, I find the style we used in our complete example to be more readable.
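To watch these variables expand, one option is to echo them from inside a rule. The following is a small sketch, assuming CC and CFLAGS are defined as in our complete Makefile; the echo line is added purely for illustration.

```makefile
names: names.o bag.o readlinep.o
	@echo "relinking $@; newer dependencies: $?"
	$(CC) $(CFLAGS) $^ -o $@
```

After touching bag.h and running make, you would expect the message to list only the object files that were just recompiled, while the link command still receives the full list from $^.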
Documentation
You can view the GNU documentation, the manual pages for make, Steve Talbott’s “oldie but goodie” book “Managing projects with make” updated in 2004 by Robert Mecklenburg for GNU make, or any of the variety of online tutorials for make.
For emacs users
If you use emacs, you can install our CS50 customization:
cp ~cs50/.emacs ~
Whenever you start emacs, it loads this file.
This file defines a new keystroke ^X-c (ctrl-X c), which tells emacs to run make -k in a separate subwindow.
If you get any compiler errors, type the keystroke ^X-` (ctrl-X backquote) and emacs jumps to the right file and right line.
It’s handy!
If cp warns you about over-writing an existing .emacs you might want to reconsider - perhaps you’ve customized emacs before and you should edit your ~/.emacs to include commands from ~cs50/.emacs rather than simply overwriting it! Best to diff ~cs50/.emacs ~/.emacs to see what’s different before you decide.
Non-compilation makefiles
You can use makefiles to solve many of your daily challenges involving sequences of dependent actions:
- running test shell scripts during development and regression testing
- creating documents with LaTeX
- maintaining webpages (staging and live directories)
- automated documentation generation
- source code management
Here, for example, is the Makefile I use on my MacBook to compile and push the CS50 website to the webserver:
# Makefile for deploying Jekyll site.
# David Kotz, Spring 2017, borrowing from Charles Palmer and Ira Jenkins

DEST = cs50@green.cs.dartmouth.edu:public_html
DESTURL = http://www.cs.dartmouth.edu/~cs50/
LOCALURL = http://127.0.0.1:4000

# files *not* to push to the destination site
EXCLUDES = --exclude=/Makefile \
	--exclude=/Comics/future \
	--exclude=/Comics/unused \
	--exclude=/examples/future \
	--exclude=/examples/unused \
	--exclude=/Labs/future \
	--exclude=/Labs/unused \
	--exclude=/Sections/future \
	--exclude=/Sections/unused \
	--exclude=/Reading/future \
	--exclude=/Reading/unused \
	--exclude=/Lectures/future \
	--exclude=/Lectures/unused \
	--exclude=\*-draft\* \
	--exclude=\*~ \
	--exclude=\*.o \
	--exclude=a.out \
	--exclude=\*.dSYM \
	--delete-excluded

.PHONY: build deploy local clean

# just for testing the build
build:
	@echo "\nBuilding..."
	cat _config-{base,cs} > _config.yml
	jekyll build

# build and push to CS server
deploy:
	cat _config-{base,cs} > _config.yml
	@echo "\nBuilding for CS..."
	jekyll build --destination _site
	-rm _config.yml
	@echo "\nFixing permissions..."
	chmod -R go-w _site/*
	chmod -R go+rX _site/*
	@echo "\nAdding data..."
	ln -s ../data _site/data
	@echo "\nDeploying..."
	rsync -e ssh -alcvz $(EXCLUDES) --delete _site/ $(DEST)
	open $(DESTURL)

# build and open in a local server (jekyll serve runs jekyll build first)
local:
	@echo "\nBuilding for local viewing..."
	cat _config-{base,local} > _config.yml
	(sleep 5; open $(LOCALURL)) &
	jekyll serve --incremental
	-rm _config.yml

clean:
	jekyll clean
	-rm -f _config.yml
The leading @ on some command lines tells make not to print that command out for the user to see - just to execute the command.
Here I use it for echo commands because, of course, those print their arguments for the user to see.
The leading - on some command lines tells make to ignore the exit status of that command - so even a non-zero exit status will allow make to continue and not exit with an error message.
Activity
In today’s activity we extend the Makefile for names to add a phony test target.