Why the auxiliary lab?

Many of you are still struggling with C, as evidenced by the midterm results. We provide this lab to give you more C programming practice and (potentially) to improve your grade if you did poorly on the midterm. If you did well on the midterm and are comfortable with C, you need not complete this lab.

Like the other labs, this lab counts for 7% of your total grade. However, this 7% comes out of the 25% allocated to the midterm. In other words, there are two ways of computing the final grade:

  1. The midterm counts for 25% and this lab counts for 0%.
  2. The midterm counts for 18% and this lab counts for 7%.

We use the higher of the two scores as your actual final grade.

C exercises

As usual, run git pull upstream master in your cso-labs directory. This lab's files are located in the auxlab/ subdirectory, which contains three subdirectories: auxlab/avgcsv, auxlab/wordcount, and auxlab/gameoflife. They correspond to the three C exercises in this lab. You will populate these directories with your source code.

In this lab, we provide no crutches (no skeleton code, no Makefile, no grading testers) so that you get comfortable with the overall experience of writing C programs and testing them on UNIX. Hence, the subdirectories for each exercise are completely empty.

Exercise 1: Averaging columns in a *.csv file

Put all your files for this exercise in the auxlab/avgcsv directory.

Write a C program that parses a *.csv file in which fields are separated by the ';' character and every field is an integer, and prints the average of each column, separated by the ';' character.

Your executable file must be named avgcsv. It should take as its argument the name of a csv file and output the average of each column, separated by ';'. In the example below, each average is printed with one digit after the decimal point.

For example, suppose the contents of the example.csv are given below:

$ cat example.csv   
10;25;56
22;10;100

Then, the expected output of the ./avgcsv example.csv command should be:

$ ./avgcsv example.csv
16.0;17.5;78.0

We expect you to write a Makefile to generate the avgcsv executable file from your source code. Please refer to this simple tutorial (under the compilation section) on how to write a Makefile.

Your program needs to read from a file. Refer to [Kernighan and Ritchie] Section 7.5 (File Access) on how to do this.
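A minimal sketch of the core of avgcsv follows; it is one possible approach, not a reference solution. It assumes every row has the same number of integer fields and that lines fit in a fixed-size buffer, and the names avg_columns, MAX_LINE, and MAX_COLS are hypothetical. It reads lines with fgets and splits them with strtok. Note that strtok silently skips empty fields (as in 10;;5), which is only acceptable if such input cannot occur.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 4096   /* hypothetical limit on line length */
#define MAX_COLS 256    /* hypothetical limit on number of columns */

/* Reads ';'-separated integer rows from `in` and writes the per-column
 * averages to `out`, separated by ';', with one digit after the point.
 * Returns 0 on success, -1 on malformed or empty input. */
int avg_columns(FILE *in, FILE *out)
{
    char line[MAX_LINE];
    long sums[MAX_COLS] = {0};
    int ncols = -1;            /* column count, fixed by the first row */
    long nrows = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        if (line[0] == '\0')
            continue;                          /* skip blank lines */

        int col = 0;
        for (char *tok = strtok(line, ";"); tok != NULL; tok = strtok(NULL, ";")) {
            if (col >= MAX_COLS)
                return -1;
            char *end;
            long v = strtol(tok, &end, 10);
            if (end == tok)
                return -1;                     /* field is not a number */
            sums[col++] += v;
        }
        if (ncols == -1)
            ncols = col;
        else if (col != ncols)
            return -1;                         /* row has a different width */
        nrows++;
    }
    if (nrows == 0)
        return -1;

    for (int c = 0; c < ncols; c++)
        fprintf(out, "%s%.1f", c > 0 ? ";" : "", (double)sums[c] / nrows);
    fputc('\n', out);
    return 0;
}
```

A main function would open argv[1] with fopen, call avg_columns(fp, stdout), and print an error message if either step fails. Keeping the parsing in a function that takes FILE pointers also makes it easy to unit test with tmpfile().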

We do not provide any tests; during grading, we will test your program on a few *.csv files of our own choosing. We highly encourage you to write unit tests for your program.

Exercise 2: Find the ten most frequent words

Put all your files for this exercise in the auxlab/wordcount directory.

Write a C program to find the ten most frequently used words in a given file. Your executable file must be named wordcount. It should take as its argument the name of a file and output ten lines, one for each of the ten most frequently used words in the file, sorted in increasing order of occurrence count. If two words w1 and w2 have the same occurrence count and w1 is alphabetically smaller than w2, then w1 precedes w2. Each line shows the occurrence count, followed by a whitespace, followed by the corresponding word.

For example, suppose the contents of the example.txt are given below:

$ cat example.txt
A potential victory for Donald J. Trump may hinge on one important (and large) 
group of Americans: whites who did not attend college.

Polls have shown a deep division between whites of different education levels and 
economic circumstances. A lot rides on how large these groups will be on Election Day: 
All pollsters have their own assessment of who will show up, and their predictions 
rely on these evaluations.

Then, the expected output of the ./wordcount example.txt command should be:

$ ./wordcount example.txt
2 large
2 their
2 these
2 whites
2 who
2 will
3 a
3 and
3 of
4 on

In order to count the words, your program needs to transform the given text file into a collection of words. First, you need to split the text file into words. Each word is a consecutive sequence of alphanumeric characters; words are separated by one or more non-alphanumeric characters. (Hint: use the C library function isalnum; for details, type man isalnum.) Second, you need to "normalize" the words by converting every uppercase letter to lowercase.
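The two steps above (splitting on non-alphanumeric characters, then lowercasing) can be done in a single pass over the file. The sketch below uses a hypothetical helper name, next_word, and assumes that words longer than the caller's buffer may be truncated, which the handout does not specify. It relies only on getc, isalnum, and tolower from the standard library.

```c
#include <ctype.h>
#include <stdio.h>

/* Reads the next word from `fp` into `buf` (at most cap-1 characters,
 * lowercased, NUL-terminated). A word is a maximal run of alphanumeric
 * characters; anything else is a separator. Returns the stored length,
 * or 0 at end of file. Requires cap >= 2. Longer words are truncated
 * (an assumption; the handout does not say how to treat them). */
size_t next_word(FILE *fp, char *buf, size_t cap)
{
    int c;
    size_t len = 0;

    /* Skip separators until the first alphanumeric character or EOF. */
    while ((c = getc(fp)) != EOF && !isalnum(c))
        ;
    /* Copy the word, folding uppercase letters to lowercase. */
    while (c != EOF && isalnum(c)) {
        if (len + 1 < cap)
            buf[len++] = (char)tolower(c);
        c = getc(fp);
    }
    buf[len] = '\0';
    return len;
}
```

Because getc returns either EOF or a character value converted to unsigned char, passing its result straight to isalnum and tolower is safe.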

There are several strategies for counting word occurrences:

  1. You can implement a hash table that maps each word (a C string) to its occurrence count. C's standard library has neither hash tables nor dictionaries, so you'll have to implement your own. After you've counted all the words, you'll need to sort them by their occurrence counts.
  2. You can sort all words first. For sorting, you should learn to use the library function qsort (type man qsort to learn how to use it). You can then count each word by counting its run of consecutive identical copies in the sorted list. Last, you need to sort the words by their occurrence counts.
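Strategy 2 can be sketched with two qsort calls, as below. The names struct wc, count_words, and print_top10 are hypothetical, and the sketch assumes the caller has already collected pointers to separately allocated copies of every word (for example, each word copied with malloc and strcpy). The second sort orders entries by increasing count, with ties broken alphabetically, and the program prints the last ten entries. By our reading of the handout's example, this reproduces the expected output exactly: "have" also occurs twice in example.txt, but it sorts before "large" and so falls just outside the last ten.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One distinct word and how many times it occurred. */
struct wc {
    const char *word;
    int count;
};

/* qsort comparator for an array of char *: alphabetical order. */
static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* qsort comparator for struct wc: increasing count, then alphabetical. */
static int cmp_wc(const void *a, const void *b)
{
    const struct wc *x = a, *y = b;
    if (x->count != y->count)
        return x->count < y->count ? -1 : 1;
    return strcmp(x->word, y->word);
}

/* Sorts `words` in place, collapses runs of equal words into `out`
 * (which must have room for n entries), sorts `out` by (count, word),
 * and returns the number of distinct words. */
size_t count_words(const char **words, size_t n, struct wc *out)
{
    size_t m = 0;
    qsort(words, n, sizeof *words, cmp_str);
    for (size_t i = 0; i < n; i++) {
        if (m > 0 && strcmp(out[m - 1].word, words[i]) == 0)
            out[m - 1].count++;               /* same run: bump the count */
        else
            out[m++] = (struct wc){ words[i], 1 };   /* start a new run */
    }
    qsort(out, m, sizeof *out, cmp_wc);
    return m;
}

/* Prints the last (at most) ten entries, i.e. the ten most frequent
 * words, in increasing order of count as in the handout's example. */
void print_top10(FILE *fp, const struct wc *wc, size_t m)
{
    size_t start = m > 10 ? m - 10 : 0;
    for (size_t i = start; i < m; i++)
        fprintf(fp, "%d %s\n", wc[i].count, wc[i].word);
}
```

The C standard does not specify qsort's complexity, but common implementations sort in O(n log n) comparisons, which is what the performance requirement below asks for.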

As in Exercise 1, you should write a Makefile to compile your program and write your own unit tests to check its correctness. During grading, we'll test your program using text files of our own choosing.

Note that we will test your program using large text files. Your program must run in O(n log n) time or better in the size of the input; otherwise, it will not pass the tests.

Exercise 3: Game of Life

Put all your files for this exercise in the auxlab/gameoflife directory.

Write a C program to simulate Conway's Game of Life.

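In a game of life of size n by n, the universe is two-dimensional and consists of n by n cells. Each cell is in one of two possible states, "live"/"dead" (or "populated"/"unpopulated"). Every cell interacts with its eight neighbors, which are the cells that are horizontally, vertically, or diagonally adjacent. At each step in time, the following transitions occur:

  1. Any live cell with fewer than two live neighbors dies (underpopulation).
  2. Any live cell with two or three live neighbors lives on to the next generation.
  3. Any live cell with more than three live neighbors dies (overpopulation).
  4. Any dead cell with exactly three live neighbors becomes a live cell (reproduction).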

The initial pattern constitutes the seed of the system. The first generation is created by applying the above rules simultaneously to every cell in the seed; births and deaths occur simultaneously, and the discrete moment at which this happens is sometimes called a tick. The rules continue to be applied repeatedly to create further generations.

What about boundary conditions? We treat the borders of the 2D world as if they wrap around. In a universe of n x n cells, let's refer to the top-left cell as position (0,0) and the bottom-right cell as position (n-1,n-1), so that in (i,j) the index i increases to the right and j increases downward. If borders wrap around, then the 8 neighbors of (i,j) are:

  1. ((i+1)%n,j) right neighbor
  2. ((i-1+n)%n,j) left neighbor
  3. (i,(j+1)%n) bottom neighbor
  4. (i,(j-1+n)%n) top neighbor
  5. ((i+1)%n,(j+1)%n) bottom-right neighbor
  6. ((i-1+n)%n,(j+1)%n) bottom-left neighbor
  7. ((i+1)%n,(j-1+n)%n) top-right neighbor
  8. ((i-1+n)%n,(j-1+n)%n) top-left neighbor

Adding n before taking % matters in C: when the left operand is negative, % yields a negative result (for example, (0-1)%n is -1), which is not a valid index.
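One tick of the simulation can be sketched as follows, assuming the universe is stored row by row in a single char array that uses the seed file's characters ('x' for live, '.' for dead). The helper names is_live, live_neighbors, and step are hypothetical. Neighbor indices wrap around by adding n before taking %, and the next generation is written to a separate buffer so that all births and deaths happen simultaneously. The update applies Conway's standard rules: a live cell survives with two or three live neighbors, and a dead cell becomes live with exactly three.

```c
#include <assert.h>

/* Cell (row r, column c) of an n x n universe stored row by row. */
static int is_live(const char *g, int n, int r, int c)
{
    return g[r * n + c] == 'x';
}

/* Counts the live neighbors of (r, c) with wrap-around borders.
 * Adding n before % keeps indices non-negative, since in C
 * (0 - 1) % n is -1 rather than n - 1. */
int live_neighbors(const char *g, int n, int r, int c)
{
    int count = 0;
    assert(n > 0);
    for (int dr = -1; dr <= 1; dr++)
        for (int dc = -1; dc <= 1; dc++) {
            if (dr == 0 && dc == 0)
                continue;            /* skip the cell itself */
            count += is_live(g, n, (r + dr + n) % n, (c + dc + n) % n);
        }
    return count;
}

/* Computes one tick: reads generation `cur` and writes the next one
 * into `next`, a separate buffer, so every cell updates simultaneously. */
void step(const char *cur, char *next, int n)
{
    for (int r = 0; r < n; r++)
        for (int c = 0; c < n; c++) {
            int k = live_neighbors(cur, n, r, c);
            int alive = is_live(cur, n, r, c);
            /* Survive with 2 or 3 live neighbors; be born with exactly 3. */
            next[r * n + c] = (k == 3 || (alive && k == 2)) ? 'x' : '.';
        }
}
```

Running several ticks then amounts to calling step repeatedly and swapping the two buffers after each call.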

Your executable file must be named gameoflife. It should take two arguments. The first argument is the name of a seed pattern file. The second argument is the number of ticks to run for the simulation. The seed pattern file contains one line per row of the universe. If the cell is "dead", its position is marked with the '.' character. If the cell is "live", the position is marked with the 'x' character. The seed file also effectively specifies the size of the universe to simulate.
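Reading the seed file can be sketched as below. The sketch assumes the universe is square (n lines of n characters each, matching the n by n universe described above), ignores blank lines, and rejects any character other than '.' and 'x'. The names load_seed and MAX_N are hypothetical. It returns a heap-allocated grid stored row by row, which the caller must free.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_N 1024   /* hypothetical upper bound on the universe size */

/* Reads a seed pattern ('.' = dead, 'x' = live, one line per row) from fp.
 * On success returns a malloc'd n*n array stored row by row and sets *n;
 * returns NULL if the pattern is empty, not square, or has a bad character. */
char *load_seed(FILE *fp, int *n)
{
    char line[MAX_N + 2];              /* room for '\n' and '\0' */
    char *grid = NULL;
    int size = 0, rows = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        size_t len = strcspn(line, "\r\n");
        line[len] = '\0';
        if (len == 0)
            continue;                  /* ignore blank lines */
        if (rows == 0) {               /* the first row fixes the size */
            size = (int)len;
            grid = malloc((size_t)size * size);
            if (grid == NULL)
                return NULL;
        }
        if ((int)len != size || rows >= size || strspn(line, ".x") != len) {
            free(grid);                /* wrong width, too many rows, or bad char */
            return NULL;
        }
        memcpy(grid + (size_t)rows * size, line, len);
        rows++;
    }
    if (rows == 0 || rows != size) {   /* empty file or too few rows */
        free(grid);
        return NULL;
    }
    *n = size;
    return grid;
}
```

The number of ticks (the second argument) could then be parsed with strtol and validated before running the simulation.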

For example, the contents of an example seed file example_seed are as follows:

$ cat example_seed
.....
.....
.xxx.
.....
.....

Then, the expected output of the command ./gameoflife example_seed 1 should be:

$ ./gameoflife example_seed 1
.....
..x..
..x..
..x..
.....

The expected output of the command ./gameoflife example_seed 2 should be:

$ ./gameoflife example_seed 2
.....
.....
.xxx.
.....
.....

As in Exercises 1 and 2, you should write a Makefile to compile your program and write your own unit tests to check its correctness. During grading, we'll test your program using seed files of our own choosing.

Handin Procedure

To hand in your files, simply commit and push them to github.com:

$ git commit -am "Submit Lab-xxx"
$ git push origin 

We will fetch your lab files from github.com at the specified deadline and grade them. You should commit frequently. By default, we will use your latest commit before the deadline to evaluate your lab. Your grade is determined by the score your solution reliably achieves when we run the tester on our test machines.