Friday, September 21, 2012

Summer Project: CSV Parser

Welcome to the third and final installation of the Summer Project series. This project, although isn't having me fix bugs 3 months after release, was the biggest project and did require the most knowledge of certain servers (Mostly LDAP and Active Directory LDAP). This application takes in several .csv files (a file that has a list of students and certain attributes) and creates an Engineering account for them. This application has many parts to it that made it possible:

  • CSV Parser
  • Create Account
  • Create Cards and allow to download
  • Setting up the csv files from the user
These were originally a series of perl files that a person would have to do manually in a linux terminal. My task was to take these perl files and port them into php and have the user (my boss) just select which csv files he wants to add and the code does all the rest. Note: porting perl to php can be an excruciating task (especially when translating regular expressions).

Setting up the .csv file:

In order to parse and process the csv files, the user will have needed to provide some. The first thing that needed to be done was to allow the user to open up a file explorer and select the csv file. This can be done using this html tag

<form action="formatfile.php" method="POST" enctype="multipart/form-data">

     <input type="hidden" name="MAX_FILE_SIZE" value="1000000" />

     <input type="file" name="inputFile" />

     <input type="submit" value="Add File" />
</form>

It has to be in the from because when the user does select "Add File", then it will upload that file to the server into a tmp folder. Adding a file to the tmp folder proved to be difficult because all folder have a default mod of 755 with the user as root and the group as apache. Some people online suggested to chmod it to 777 to allow apache to add. This would be disastrous and I was surprised that they would suggest it. With a security setting of 777, anyone can write anything in the tmp folder whether it be javascript files or other scripts. I needed to find a way for apache to write to the folder without anyone else. Since apache belongs to the group apache, a security setting of 775 would allow apache to write to the folder and no one else. 

When the file is uploaded, it is stored in a temporary location in $_FILES, that if left will be deleted. Therefore, I had to create a helper script to move the file from the tmp location to my more permanent tmp folder on the server. The script would also check to see if the file was really a valid csv file before adding it to the folder. 

After the file had been uploaded, the index page then moves on to show all the files that are uploaded where the user can add more files, remove some files, or parse the files which leads me to:

CSV Parser:

The CSV Parser goes through each line of the .csv file and extracts the relevent student information (name, uid, major, class). From there it puts it in the Create Student script (to be covered below). From there it will print the student information out and how much it completed (in percent). When testing, this process can take about 2 - 3 hours just to parse and add some 2000 students. I have no idea why it takes so long (though I do expect it has to do with adding to AD LDAP (connecting to Windows). After it does this for each .csv file, it will then create cards for each class they are in (i.e. CS1410-001) so that the teacher can hand out their account information at the first or second day of class (for those who request it). Part of the information that is pulled out is what engineering class they are taking. This is able to work too since each .csv file is just a department (i.e. MSE, BIOEN, CH EN, CS, ECE, ME EN). (The card generating algorithm will be covered below). 

This was the first script that I made that I had to port from perl. The way the perl script extracted the information from the line was that each information was at a certain point (uid took up the first eight, name had the next 50, etc). The perl script was able to use regular expressions to get it. 


($uuid,                 # (8)

   $fullname,             # (50) LAST, FIRST MIDDLE|M.I.

   $adm_stat,             # (4)  admission stat   (??)

   $major,                # (4)
   $degree,               # (5)  degree
   $enroll,               # (2)  enrollment class (??)
   $flag,                 # (2)  ???
   $colldes,              # (2)  college descriptor
   $coursedept,           # (5)  course department
   $coursenum,            # (4)  course number
   $coursesect,           # (3)  course section
#  $credithrs,            # (3)  credit hours
#  $grade,                # (2)
#  $crnocrf,              # (1)  credit no/credif flag
#  $courseti,             # (8)  course title INDEX (numeric)
#  $coursetitle,          # (30) course title       (descriptive name)
#  $addline1,             # (35) address line 1
#  $addline2,             # (35) address line 2 (seems to be empty)
#  $city,                 # (14) 
#  $state,                # (2)
#  $zip,                  # (5) ) = /^(\d{8})(.{50})(.{4})(.{4})(.{5})(.{2})(.{2})(.{2})  (.{5})  (.{4})(.{3})/;

Since in php, regular expression are a pain to do, I decided to use a substring(currentLine, index of first char, length of string). This took some fine tuning since my reading of regular expressions isn't the best. From the information extracted, the script would then translate some of it into what the Create Student needed. $degree would be used to determine class (bs, bhs => ugrad AND phd, ms, men => grad AND all else => class). All trailing and leading whitespace would be removed. Then their major would be translated into the department they belonged to (CS => cs, ME EN => me, etc). From there, all the information would be put into an array to be processed by the Create Student algorithm. 


        $MAJOR = array(

            'BIOE' => 'be',

            'CECS' => 'ce',

            'CFEN' => 'ch',

            'CHEN' => 'ch',
            'CPSC' => 'cs',
            'CVEN' => 'cv',
            'MEEN' => 'me',
            'ECE' => 'ee',
            'MSE' => 'ma'
        );

        $COE = array(
            'CS   ' => 'cs',
            'ME EN' => 'me',
            'ECE  ' => 'ee',
            'CHFEN' => 'ch',
            'BIOEN' => 'be',
            'CVEEN' => 'cv',
            'MSE  ' => 'ma',
            'CH EN' => 'ch'
        );

        $uid = trim(substr($line, 0, 8));
        $fullname = trim(substr($line, 8, 50));
        $major = trim(substr($line, 63, 4));
        $degree = trim(substr($line, 67, 5));
        $coursedept = substr($line, 79, 5);
        $coursenum = trim(substr($line, 87, 4));
        $coursesect = trim(substr($line, 91, 3));

        $name = explode(',', $fullname);
        $name1 = explode(' ', $name[1]);

        $last = $name[0];
        $first = $name1[0];
        $middle = count($name1) == 2 ? $name1[1] : '';

        $first = ucfirst($first);
        $middle = ucfirst($middle);
        $last = ucfirst($last);

        if(array_key_exists($major, $MAJOR))
        {
            if($degree === "BS" || $degree === "BHS" || $degree === "MIN")
            {
                $rank = 'ugrad';
            }
            else if($degree === "MEN" || $degree === "MS" || $degree === "PHD")
            {
                $rank = 'grad';
            }
            else
            {
                $rank = 'class';
            }
            $dept = $MAJOR[$major];
        }
        else
        {
            $dept = $COE[$coursedept];
            $rank = 'class';
        }
        
        $student = array(
            'First' => $first,
            'Middle' => $middle,
            'Last' => $last,
            'Uid' => $uid,
            'Dept' => $dept,
            'Rank' => $rank,
            'CourseNumber' => $coursenum,
            'CourseSect' => $coursesect,
            'CourseDept' => $coursedept
        );
        
        return $student;

Create Student:

This algorithm was originally created by a coworker who created this so that an oper can create an account for a student who did not get one in this process. While his algorithm was good, I had to do a lot of work to make it work for my code. For instance, he used a framework made by code igniter. I tried to get it to work at first, but there were a lot of errors. I gave up and made the code from scratch. This was a long process, but I was able to get it to work at the end. 

The first thing the code does is to create the attributes for the user to insert on LDAP (Linux side). The attribute is using the information extracted from the CSV Parser and some new information using it such as UID (user id for chmod), GID (group id for chgrp), a username, location of their home directory, and a random password that is encrypted. 

The username is generated by using a combination of the users full name (usually the first name with last initial, or first initial with last name). It will test each try and go on to the next if the suggested username is already taken. It will then see if the user already exists. If the user already exists, then the code will terminate, tell the user, and go on to the next student. 

If the student is valid, then it will connect to ldap and add the student using ldap_add. If at any point, ldap connection or ldap_add should fail, it will tell the user and go on to the next student. 

When the student has been added to LDAP, it will then create the student on Active Directory LDAP on the windows side. This required a third party library (ADLDAP).

AD LDAP:
Author Scott Barnett, Richard Hyland
Copyright (c) 2006-2010 Scott Barnett, Richard Hyland
License http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html LGPLv2.1
Revision $Revision: 91 $
Version 3.3.2
Link http://adldap.sourceforge.net/

In order to create an account on the windows side, this required me to redo the attributes and add a few categories. These changes were to revamp where the home directory was located on the windows server, add a description (their uid), create .win_profile path, and tell which OU the student belonged to. 

After that, AD LDAP was able to add the student to the server and add them to the ENG Student group. Of course if any of that failed, then the code will tell the user and go on to the next student.

After creating the account on both the windows and linux side of LDAP, the script will then create the students home directory where they will have access to all their work. This is a very simple script that calls another unix script that does all the dirty work. 

      if(preg_match('/[^a-z]/', $user))
            {
                throw new Exception("Illegal Username: " . $user);
            }
            $cmd = "$sudo $homedir " . "$user $uid $gid" . ' 2>&1';
            exec($cmd, $output, $returncode);
            //var_dump($output);
            //var_dump($returncode);
            if($returncode != 0)
            {
                throw new Exception("Could nt create homedir: " . implode("\n", $output));
            }
            return true;

That is all the Create Student account needs to do, the student is ready to use his/her account. The scrip then returns the users password and username for the Card Generation algorithm.

Card Generator:

After the student has been created, then the array that was returned by the Create Student algorithm is added to the $student array that was generated by the CSV Parser. The student is added to a 2D array where the row are Class numbers and the column is the Student's uid where the $student array is stored at that location.

$course = $student['CourseNumber'] . "." . $student['CourseSect'];
     
        if(!array_key_exists($course, $courses))
        {
            $courses[$course] = array();
            $courses[$course][$student['Uid']] = $student;
        }
        else
        {
            $courses[$course][$student['Uid']] = $student;
        }

When the parser is finished with each .csv file. The 2D array is then parsed over in a double array where each class and the students in each class is added to a txt file with the class name one it. That file is then temporarily stored in the cards folder in the server.


foreach($courses as $course)
        {
            $tmp = array_values($course);
            $title = $tmp[0]['CourseDept'];
            $courseNum = $tmp[0]['CourseNumber'] . "." . $tmp[0]['CourseSect'];
         
            $ourFileName = "cards/accounts." . $title . "." . $courseNum . ".txt";
            $file = fopen($ourFileName, 'w')
                    or die("Can't open file " . $ourFileName);
         
            foreach($course as $student)
            {
                $fullname = $student['Last'] . ", " . $student['First'] . " " . $student['Middle'];
                $lineBreak = '-----------------------------';
             
                fwrite($file, $lineBreak . "\n" );
                fwrite($file, $fullname . "   CADE Lab Login: ". $student['Username'] . "   Password: " . $student['Password'] . "\n");
                fwrite($file, "\n");
                fwrite($file, $student['Uid'] . "          To change your lab password, type 'passwd' at a UNIX prompt.\n");
                fwrite($file, "                  Use 'remote_passwd' to set your password for remote services,\n");
                fwrite($file, "                  such as POP mail or FTP.\n");
                fwrite($file, $lineBreak . "\n");
                fwrite($file, "\n");
            }
         
            fclose($file);
        }
     
        Zip_And_Download();



function Zip_And_Download()
{
    exec("zip -r cards/cards cards/");     
    echo "<a href='download.php'>Download cards.zip</a><br />";
}


After each file has been made, then the user can download each file in a .zip file; however, if they were to refresh the page or clear it, then the cards folder would be wiped clean and could never be gotten again (excluding .snapshot) because the script won't add the same students twice once they've been created.

Downloading a zip file is a lot simpler than it seems.


    header('Content-Type: application/zip');
    header('Content-Disposition: attachment; filename=cards.zip');
    readfile("/home/kray/public_html/UserCreateTemp/cards/cards.zip");


That is all for the CSV parser. I couldn't post as much of the code as I would have liked to due to security and me liking my job. This concludes the Summer Projects series. I am now in full development in my Game Dev class and I will be focusing on that for this semester.

No comments:

Post a Comment