Extra - Introduction to C
This “lecture extra” is meant to complement today’s lecture notes.
Warning: the examples in (or cited by) this “lecture extra” may not follow CS50 programming style.
We cover
- Operators
- Precedence
- Base data types
- Storage modifiers
- Scope rules of global and local variables
- Flow of control in C programs
Let’s C !
Basic Operators
Nearly all operators in C are identical to those of Java. However, the role of C in system programming exposes us to much more use of the shift and bit-wise operators than in Java. Here are the basic operators:
- Assignment: =
- Arithmetic: +, -, *, /, %, unary -. Priorities may be overridden with ( )’s.
- Relational (all of these have the same precedence): >, >=, <, <=
- Equality: ==, !=
- Logical: && (and), || (or), ! (not)
Pre- and post- decrement and increment operators
Any (integer, character or pointer) variable may be either incremented or decremented before or after its value is used in an expression.
For example:
--fred will decrement fred before its value is used
++fred will increment fred before its value is used
fred-- will get (old) value and then decrement fred
fred++ will get (old) value and then increment fred
Let’s write some C code to look at pre and post increment and decrement.
C code: increment.c
/*
file: increment.c
Description: Illustrate pre and post increment and decrement.
*/
#include <stdio.h>
int main() {
int fred = 3, a=3;
printf("Start; fred = %d and a = %d\n", fred, a);
a = --fred;
printf("a = --fred; fred = %d and a = %d\n", fred, a);
a = ++fred;
printf("a = ++fred; fred = %d and a = %d\n", fred, a);
a = fred--;
printf("a = fred--; fred = %d and a = %d\n", fred, a);
a = fred++;
printf("a = fred++; fred = %d and a = %d\n", fred, a);
return 0;
}
Once we have the C code we have to compile it with gcc, using the various compiler switches we discussed in Lecture 1.
To see ALL of the options, try
gcc --help -v
# or
man gcc
We’ll use mygcc -o filename filename.c as our compilation command for now.
Later, we’ll use the more powerful make utility.
In .bashrc we already defined mygcc as an alias:
$ alias mygcc
alias mygcc='gcc -Wall -pedantic -std=c11 -ggdb'
$ mygcc -o increment increment.c
$ ls -l increment
-rwxr-xr-x 1 atc admin 13344 Jan 14 21:51 increment
$ ./increment
Start; fred = 3 and a = 3
a = --fred; fred = 2 and a = 2
a = ++fred; fred = 3 and a = 3
a = fred--; fred = 2 and a = 3
a = fred++; fred = 3 and a = 2
The compiler produces an executable file named increment.
You do not have to use chmod to make it executable; the compiler creates the file increment with the correct permissions.
Try it yourself.
Save the file in your directory (e.g., cs50/code/), and compile and run the code.
Verify that the output matches the above.
More about gcc and mygcc
A word of warning: Whether using gcc -o hello hello.c or mygcc -o hello hello.c you must take care to avoid getting the order of the files wrong with the -o switch, which tells the compiler that the name of the file following the -o switch will be the name of the executable.
One student compiled the correct way, mygcc -o hello hello.c (producing an executable hello correctly), and then recompiled but got the order wrong: mygcc -o hello.c hello.
What the gcc compiler did wasn’t pleasant.
It took the executable hello as the source file and hello.c as the name of the executable to be created.
The result was that the real source file hello.c disappeared!
Well, it didn’t actually disappear; it was just erased by the compiler as it got ready to produce a new output file of that same name.
So please be careful: the -o tells the compiler that the executable it creates should be given the name that follows the -o.
You can always use the default a.out and not use the -o option if concerned.
The other gcc switches are important to use because they make sure we write good clean code.
Let’s discuss what these gcc options are (-Wall -pedantic -std=c11):
- -Wall turns on all optional warnings which are desirable for normal code.
- -pedantic checks programs for strict ISO C conformance and issues all the warnings demanded by strict ISO C.
- -std=c11 indicates the language standard we want.
By using these options the compiler forces us to resolve various warnings that you would not see if you just used gcc without any switches.
Always use these options (-Wall -pedantic -std=c11) from now on in this course.
You can do a man gcc and look at the option meanings for the nitty gritty details on these settings.
Bitwise operators and masking
& (bitwise and), | (bitwise or), ~ (bitwise negation).
You can use these to check if certain bits are on, as in (nextchar & 0x30).
Do you see how this works?
Shift operators: << (shift left), >> (shift right)
Shifting bits to the left or right. Note: results may vary based upon whether the type of the variable being shifted is “signed” or “unsigned”. See H&S pp.231-233.
Combined operators and assignment
Example combined.c
#include <stdio.h>
#include <stdlib.h>
int main()
{
int a,b,c,d;
a=10; b=20; c=30; d=4;
// these lines ...
a=a+2;
b=b-12;
c=c*(d-2);
printf( "a=%d, b=%d, c=%d\n", a, b, c);
a=10; b=20; c=30; d=4;
// could have been written this way
a += 2;
b -= 12;
c *= d-2;
printf( "a=%d, b=%d, c=%d\n", a, b, c);
exit (0);
}
Type coercion
C permits assignments and parameter passing between variables of different types using type casts or coercion. A cast in C is always written explicitly, and is used where some languages require a ‘transfer function’; conversions between arithmetic types in assignments and function calls happen implicitly. We will see more examples of C’s cast operator later in the course.
Precedence of operators
Operators of equal precedence are mostly applied from left to right (the unary, conditional, and assignment operators group from right to left), and the default precedence may be overridden with parentheses.
Operator | Precedence |
---|---|
( ) | highest |
++ -- ! | |
* / % | |
+ - | |
== != | |
& | |
| | |
&& | |
|| | |
?: | |
= | |
, | lowest |
Variable names
Variable names (and type and function names as we shall see later) must commence with an alphabetic or the underscore character (A-Z a-z _) and be followed by zero or more alphabetic, underscore or digit characters (A-Z a-z _ 0-9).
Most C compilers, such as gcc, accept and support variable, type, and function names up to 256 characters in length.
(Some older C compilers only supported variable names with up to 8 unique leading characters and keeping to this limit may be preferred to maintain portable code.) It is also preferred that you do not use variable names consisting entirely of uppercase characters.
All-uppercase variable names are typically reserved for constants (such as MAXBUFSIZE, AVOGADROS_NUMBER, MAXUSERS).
Importantly, C variable names are case sensitive, so MYLIMIT, mylimit, Mylimit and MyLimit are four different variable names.
There are some specific variable/function naming styles that you may encounter. The major ones are
- camelCase: writing compound words with the first letter of each word capitalized, except for the first word’s first letter, which is not capitalized.
- PascalCase: writing compound words just as in camelCase, with the first letter of the first word also capitalized. (In Java it is common to use this case for class names, but camelCase for member names.)
- snake_case: writing compound words with an underscore between each word with little, if any, capitalization.
For a short history of how these came about and some discussion, see Brad Abrams’ blog post.
Any programming project, including all of your assignments, should pick a variable/function naming style and stick with it.
Base types
Variables are declared to be of a certain type. This type may be either a base type supported by the C language itself, or a user-defined type consisting of elements drawn from C’s set of base types. C’s base types and their representation on our lab’s systems are:
type | description |
---|---|
char | the character type, 8 bits long |
short | the short integer type, 16 bits long |
int | the standard integer type, 32 bits long |
long | the longer integer type, 64 bits long |
float | the standard floating point (real) type, 32 bits long (about 7 decimal digits of precision) |
double | the extra precision floating point type, 64 bits long (about 15 to 17 decimal digits of precision) |
We can determine the number of bytes required for datatypes (and other things, as we will see later) with the sizeof operator.
In contrast, Java defines how long each datatype may be.
In C, the sizes vary from machine to machine, with the details managed by the compiler.
C’s only guarantee is that:
sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)
Let’s write some C code to look at these base data types.
We will use the sizeof operator and the printf function.
We will also define variables of each of the base types and print the initialized values as part of the data-types.c code.
Example data-types.c
The contents of data-types.c look like this:
/*
file: data-types.c
Description: Sets up variables for different base data types, initialises them
and prints the data and the size of the base data types in number of bytes.
Revised version of code pg. 96 (Program 2.10) (Bronson) "First Book on ANSI C"
*/
#include <stdio.h>
int main() {
char ch = 'a';
int in = 2;
short sh = 3;
long lo = 4;
long long llo = 1234567890;
float fl = 1000.256734;
double db = 11e+23;
long double ld = 10e+30;
printf("-------contents ------- sizeof()------\n\n");
printf("contents of char is %c --- sizeof %zu bytes\n\n", ch, sizeof(char));
printf("contents int is %d --- sizeof %zu bytes\n\n", in, sizeof(int));
printf("contents short is %d --- sizeof %zu bytes\n\n", sh, sizeof(short));
printf("contents long is %ld --- sizeof %zu bytes\n\n", lo, sizeof(long));
printf("contents long long is %lld --- sizeof %zu bytes\n\n", llo, sizeof(long long));
printf("contents float is %f --- sizeof %zu bytes\n\n",fl, sizeof(float));
printf("contents double is %e --- sizeof %zu bytes\n\n",db, sizeof(double));
printf("contents long double is %LG --- sizeof %zu bytes\n",ld, sizeof(long double));
return 0;
}
Let’s compile and run the code.
$ mygcc -o data-types data-types.c
$ ./data-types
-------contents ------- sizeof()------
contents of char is a --- sizeof 1 bytes
contents int is 2 --- sizeof 4 bytes
contents short is 3 --- sizeof 2 bytes
contents long is 4 --- sizeof 8 bytes
contents long long is 1234567890 --- sizeof 8 bytes
contents float is 1000.256714 --- sizeof 4 bytes
contents double is 1.100000e+24 --- sizeof 8 bytes
contents long double is 1E+31 --- sizeof 16 bytes
Check it out: Save the file in your directory; compile and run the code. Check the output.
Storage modifiers of variables
Base types may be preceded with one or more storage modifiers:
storage modifier | meaning |
---|---|
auto | the variable is placed on the stack (default, deprecated) |
const | the variable is a constant, and can/will never change value |
extern | the variable is defined outside of the current file |
register | request that the variable be placed in a register (mostly ignored) |
static | the variable is placed in global storage with limited visibility |
typedef | introduces a user-defined type |
unsigned | stored and used only as non-negative integers |
Initialization of variables
All scalar auto and static variables may be initialized immediately after their definition, typically with constants or simple expressions that the compiler can evaluate at compile time.
The C99 language defines that all uninitialized global variables, and all uninitialized static local variables, will have the starting values resulting from their memory locations being filled with zeroes: conveniently the value of 0 for an integer, and 0.0 for a floating point number.
Scope rules of global variables
Scope is defined as the section (e.g., function, block) of the program where the variable is valid and known.
In Java, a variable is simply used as a name by which we refer to an object. A newly created object is given a name for later reference, and that name may be re-used to refer to another object later in the program. In C, a variable more strictly refers to a memory address (or a contiguous block of memory starting at that address) and the type of the variable declares how that memory’s contents should be interpreted and modified.
C only has two true lexical levels, global and function, though sub-blocks of variables and statements may be introduced in many places, seemingly creating new lexical levels.
As such, variables are typically defined globally (at lexical level 0), or at the start of a statement block, where a function’s body is understood to be a statement block.
Variables defined globally in a file are visible until the end of that file.
They need not be declared at the top of a file, but typically are.
If a global variable has a storage modifier of static, it means that the variable is only available from within that file.
If the static modifier is missing, that variable may be accessed from another file if part of a program compiled and linked from multiple source files.
The extern modifier is used (within a given file) to declare that the indicated variable is defined in another file.
The variable may be declared as extern in all files, but must be defined (and not as a static!) in only a single file.
Scope rules of local variables
Variables may also be declared inside a statement block. Older versions of C (before C99) required these declarations to appear at the top of the block; C99 and later (including the C11 we compile with) allow a declaration anywhere in the block, though declaring variables near the top remains common. Such variables are visible from their declaration until the end of that block; most variables are declared at the top of a function’s block, and are thus visible until the end of the current function. A local variable’s name may shadow that of a global variable, making that global variable inaccessible within the block. Blocks do not have names, and so shadowed variables cannot be named.
Local variables are implicitly preceded by the auto modifier; as control flow enters the block, memory for the variable is allocated on the run-time stack.
The memory is automatically deallocated (or simply becomes inaccessible) as control flow leaves the block.
The implicit auto modifier facilitates recursion in C, as each entry into a new block allocates memory for new local variables, and these unique instances are accessible only while in that block.
If a local variable is preceded by the static modifier, its memory is not allocated on the run-time stack, but in the same memory as for global variables.
When control flow leaves the block, the memory is not deallocated, and remains for the exclusive use by that local variable.
The result is that a static local variable retains its value between entries to its block.
Whereas the starting value of an auto local variable (sitting on the stack) cannot be assumed (or more correctly, should be considered indeterminate until it is assigned), the starting value of a static local variable is as it was when the variable was last used.
Examples of global and local variables
Let’s look at some code snippets to reinforce the ideas of local and global variables and the issue of the scope of these variables in a section of code.
Example scope.c
/*
File: scope.c
Description: Illustrates the use of global and local variables and
global function prototypes.
Revised code taken from pg. 330 (Program 7.1) (Bronson) "First Book on ANSI C"
*/
#include <stdio.h>
/* firstnum is a global variable not defined on the main() stack. It has full scope
of all functions in the file scope.c. Any code in the file can read and write to it.
Once main() terminates the variable is deallocated and no longer exists
*/
int firstnum; /* create a global variable named firstnum */
void valfun(); /* global function prototype */
int main()
{
int secnum; /* create a local variable named secnum */
firstnum = 10; /* store a value into the global variable */
secnum = 20; /* store a value into the local variable */
printf("\nFrom main(): firstnum = %d",firstnum);
printf("\nFrom main(): secnum = %d\n",secnum);
valfun(); /* call the function valfun */
printf("\nFrom main() again: firstnum = %d",firstnum);
printf("\nFrom main() again: secnum = %d\n",secnum);
return 0;
}
void valfun() /* no values are passed to this function */
{
/* secnum is a local variable created on the stack when valfun() executes.
When valfun() exits the stack is deallocated and the variable no
longer exists. It is local and its scope is valfun() */
int secnum; /* create a second local variable named secnum */
secnum = 30; /* this only affects this local variable's value */
printf("\nFrom valfun(): firstnum = %d",firstnum);
printf("\nFrom valfun(): secnum = %d\n",secnum);
firstnum = 40; /* this changes firstnum for both functions */
}
If we run the code the output is as follows:
$ mygcc -o scope scope.c
$ ./scope
From main(): firstnum = 10
From main(): secnum = 20
From valfun(): firstnum = 10
From valfun(): secnum = 30
From main() again: firstnum = 40
From main() again: secnum = 20
Study the output. Is it what you expected?
The first thing to note about the source code is that it defines a global variable firstnum whose scope is the complete file and which is therefore accessible from main() and valfun().
Note scope.c has a main() and a valfun() function.
The prototype for valfun() is declared at the top of the file, giving it global scope in the file scope.c.
(We will discuss prototypes more later.) Both main() and valfun() update and print the value of firstnum, which lives in global storage (space is not allocated on the stack as in the case of auto variables such as secnum).
Note that main() and valfun() both have local variables named secnum.
This name collision is not a problem because each of these two local variables (that happen to have the same name) has scope only inside its own function, main() or valfun().
Their instances are private to main() and valfun(), respectively.
They have no association other than having the same name.
They are auto variables created on the stack and no longer exist when the function exits.
For example, valfun() creates a variable secnum of integer type on its local stack when it executes, but when it returns control to main() the stack is deallocated and the variable no longer exists.
In contrast, the global variable firstnum and its current value persist after valfun() exits.
Examples of auto and static local variables
Consider another storage modifier that is impacted by scope: static.
Here the variable is placed in global storage with limited visibility depending on where it is defined.
Let’s look at two code snippets that illustrate the use of local auto and static variables.
These represent two important cases in C.
First, let’s look at the case of auto local variables.
Example: auto.c
/*
File: auto.c
Description: Illustrates the auto local variables
Code taken from pg. 336 (Program 7.2) (Bronson) "First Book on ANSI C"
*/
#include <stdio.h>
void testauto(); /* function prototype */
int main()
{
int count; /* create the auto variable count */
for(count = 1; count <= 3; count++)
testauto();
return 0;
}
void testauto()
{
int num = 0; /* create the auto variable num */
/* and initialize to zero */
printf("The value of the automatic variable num is %d\n", num);
num++;
}
If we run the code the output is as follows:
$ ./auto
The value of the automatic variable num is 0
The value of the automatic variable num is 0
The value of the automatic variable num is 0
Study the output. Is it what you expected?
Now let’s look at the case when num is defined as static inside the scope of the function teststat().
Note that the value of num is now persistent across multiple invocations of the function.
This is in direct contrast to the auto local variable of the last code snippet, auto.c.
In essence, the static modifier allocates memory for the variable of type int outside the stack, just like a global variable such as firstnum in scope.c.
However, the distinction here is that a static local variable is not global.
It is only accessible in the function teststat().
We hope that clarifies the issue of scope, local and global variables and the issue of auto variables and static variables.
Here is the case of static local variables.
Example static.c
/*
File: static.c
Description: Illustrates the use of static local variables.
Revised code taken from pg. 336 (Program 7.2) (Bronson) "First Book on ANSI C"
*/
#include <stdio.h>
void teststat(); /* function prototype */
int main()
{
int count; /* count is a local auto variable */
for(count = 1; count <= 3; count++)
teststat();
return 0;
}
/* Note that the variable num in teststat() is only set to zero once. The
local static variable num retains its value when teststat() returns.
*/
void teststat()
{
static int num=0; /* num is a local static variable */
printf("The value of the static variable num is now %d\n", num);
num++;
}
If we run the code the output is as follows:
$ ./static
The value of the static variable num is now 0
The value of the static variable num is now 1
The value of the static variable num is now 2
Is this what you expected?
Question: If I had defined static int num; at the top of static.c, how would that change the scope of the static variable?
Is it different from int num; defined as a global variable (like firstnum in scope.c)?
Flow of control in a C program
Control flow within C programs is almost identical to the equivalent constructs in Java.
However, C provides no exception mechanism, and so C has no try, catch, and finally constructs.
Conditional execution
if ( expression )
statement1;
if ( expression ) {
statement1;
statement2;
......
}
if ( expression )
statement;
else
statement;
Of significance, and a very common cause of errors in C programs, is that C (before C99) has no Boolean datatype. Instead, any expression that evaluates to the integer value of 0 is considered false, and any nonzero value as true (the opposite of bash!). A conditional statement’s controlling expression is evaluated and if non-zero (i.e., true) the following statement is executed. Most errors are introduced when programmers (accidentally) use embedded assignment statements in conditional expressions:
if (loop_index = MAXINDEX )
statement;
/* instead of ... */
if (loop_index == MAXINDEX )
statement;
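A good habit to get into is to place constants on the left of comparisons. Then an accidental = in place of == is rejected by the compiler, because a constant cannot be assigned to:
if (0 = value )   /* compile-time error, unlike (value = 0) */
statement;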
When compiling with gcc -std=c11 -Wall -pedantic the compiler will warn about the use of assignment operators inside if conditions. When the assignment is intentional, the way to ‘shut up the compiler’ is to use extra parentheses:
if ( ( loop_index = MAXINDEX ) )
statement;
C’s other control flow statements are very unsurprising:
while ( conditional-expression ) {
statement1;
statement2;
......
}
do {
statement1;
statement2;
......
} while ( conditional-expression );
for( initialization ; conditional-expression ; adjustment ) {
statement1;
statement2;
......
}
Examples of code snippets:
/* Loops: using the for statement */
#define ARRAY_LENGTH 100
long array[ARRAY_LENGTH];
int i;
for ( i = 0; i < ARRAY_LENGTH; i++) {
array[i] = 4 * i;
printf("Value of i is %d\n", i);
}
Any of the components of the for statement’s for-expressions may be missing. If the conditional-expression is missing, it is always true.
Infinite loops may be requested in C with for( ; ; ) … or with while(1) …
The equivalence of for and while
for ( expression1 ; expression2 ; expression3 ) {
statement1;
}
expression1;
while ( expression2 ) {
statement1;
expression3;
}
Example of equivalence using the while statement
i = 0;
while (i < ARRAY_LENGTH) {
array[i] = 4 * i;
printf("Value of i is %d\n", i);
i++;
}
The switch statement
switch ( expression ) {
case const1 : statement1; break;
case const2 : statement2; break;
case const3 :
case const4 : statement4;
default : statementN; break;
}
One of the few differences here between C and Java is that C permits control to “drop down” to following case constructs, unless there is an explicit break statement.
Example operator.c
/*
File: operator.c
Description: Implements basic operations (addition, multiplication, division, modulus).
Revised version of code pg. 191 (Program 4.6) (Bronson) "First Book on ANSI C"
*/
#include <stdio.h>
int main() {
int opselect;
float fnum, snum;
printf("Please type in two numbers: ");
scanf("%f %f", &fnum, &snum);
printf("Enter a select code:");
printf("\n 1 for addition");
printf("\n 2 for multiplication");
printf("\n 3 for division : ");
printf("\n 4 for modulus : ");
scanf("%d", &opselect);
// The line below has an intentional bug. It is commented out.
// it causes a segmentation fault (segfault for short)
//scanf("%d", opselect);
switch (opselect) {
case 1:
printf("The sum of the numbers entered is %6.3f\n", fnum + snum);
break;
case 2:
printf("The product of the numbers entered is %6.3f\n", fnum * snum);
break;
case 3:
if (snum != 0.0)
printf("The first number divided by the second is %6.3f\n",fnum / snum);
else
printf("Division by zero is not allowed\n");
break;
case 4:
printf("The modulus of the numbers entered is %d\n", (int)fnum % (int)snum);
break;
default:
printf("Need to enter a number between 1-4\n\n");
}
return 0;
}
The break statement
for ( expression1 ; expression2 ; expression3 ) {
statement1 ;
if( ... )
break;
statementN ;
}
while ( expression1 ) {
statement1 ;
if( ... )
break;
statementN ;
}
switch ( expression1 ) {
case const1:
statement 1;
break;
case const2:
statement 2;
break;
case const3:
statement 3;
break;
default:
statement n;
break;
}
The continue statement
for ( expression1 ; expression2 ; expression3 ) {
statement1 ;
if( ... )
continue;
statementN ;
}
while ( expression1 ) {
statement1 ;
if( ... )
continue;
statementN ;
}