Category Archives: Programming

Codebook

Lately I’ve been doing a lot of coding competitions, google code jam, Project Euler, and at my university (UTEP). I’m starting a coding book, which is a collection of commonly used simple implementations of algorithms that are useful in coding competitions. Hopefully this can be a collaborative effort with the UTEP team.

Others are free to use it and contribute!

https://github.com/awknaust/codebook

Tracking Dynamic IP

So it is nice to be able to remotely connect to your home computer. Many (included my) ISPs, give dynamic IP addresses, and although these don’t change often, they do change and are a pain to remember anyway. My solution is to have my computer e-mail me whenever it discovers it has a new IP address! You can easily configure exim4 to send emails from a linux machine through gmail, a great guide is here.

Once you have that you just need a little script, mine does this with python by asking http://icanhazip.com. and writing a temporary file with the old ip in /var/tmp. It only sends you a message when the IP changes, no need to spam your email box!

 

#!/usr/bin/python

''' A super simple script to email changes in a dynamic IP to
an email address. If you drop this in /etc/cron.* , it will periodically
poll a website to check if your iP has changed, and if it has it well
send you an e-mail with the new one!

Alex Knaust - 2/28/2013
'''

import os, smtplib, urllib2, re
from email.mime.text import MIMEText

# file to save the last IP address in
IPLOC = '/var/tmp/ip'

# site from which to get IP, if you change this you might need to alter get_live_ip
IPSITE = 'http://icanhazip.com'

# Message to send yourself with the IP
MAILSTR = '''Hi, My new IP address is {0}\n'''

# Address from which the email is being sent (this should probably match your SMTP setup
FROMADDR = 'you@host.com'

# Address to which to send the message
TOADDR = 'you@host.com'


#####################################################
############### CODE ################################
#####################################################
def save_ip(ipstr):
    '''Save the Ip to IPLOC, return true if successful, false otherwise'''
    try:
        with open(IPLOC, 'w') as f:
            f.write(ipstr)
    except:
        return False
    return True
    
def get_old_ip():
    '''Gets the old IP by reading IPLOC, returns None if it could not be read'''
    try:
        with open(IPLOC, 'r') as f:
            ip = f.read()
        return ip.strip()
    except IOError:
        return None
    
def get_live_ip():
    '''Return the IP address of this computer by polling IPSITE
    
    returns None if not successful
    '''
    try:
        f = urllib2.urlopen(IPSITE)
        return f.read().strip()
    except urllib2.URLError:
        return None

def send_mail(ipstr):
    '''Tries to send an email about the change of IP to the TOADDR
    
    returns true if successful, false otherwise
    '''
    msg = MIMEText(MAILSTR.format(ipstr))
    msg['Subject'] = 'IP Address change'
    msg['From'] = FROMADDR
    msg['To'] = TOADDR
    
    try:
        s = smtplib.SMTP('localhost')
        s.sendmail(FROMADDR, [TOADDR, ], msg.as_string())
        s.quit()
    except:
        return False
    return True

def main():
    oldip = get_old_ip()
    newip = get_live_ip()
    
    if newip is not None:
        if oldip is not None:
            if newip != oldip:
                send_mail(newip)
                save_ip(newip)
        else:
            send_mail(newip)
            save_ip(newip)

if __name__ == '__main__':
    main()

Yet another backup script

I’m publishing a simple script I wrote a year or so ago to do automatic mysql backups with 7zip on a linux server. It is inspired by the original mysqlbackup script, in that it can rotate the databases on a daily/weekly etc. basis, but it also provides stuff like exporting to an ftp server, or simpler backups. It isn’t completely polished, so any forks/tips are welcome!

I’ve made a permanent page here with a download link and some more about it.

#!/usr/bin/python

#################################
# created by Alex Knaust 06/2011
#################################

# mysql login credentials
MYSQL_USERNAME = ''
MYSQL_PASSWORD = ''
MYSQL_HOST = 'localhost'
MYSQL_PORT = 3306

# directory that directories (daily, weekly, monthly, etc.) will be created in
BACKUP_DIR = '/backup/mysql'

# filename pattern to archive files as, will be formatted with datetime.strftime
FILENAME_PATTERN='%d-%m-%Y_database-dump.7z'

# arguments for the p7zip program
P7ZIP_ARGS = '-t7z -bd -m0=LZMA2 -mx=9'

#file to write the intermediate dump to
TMPFILE = '/var/tmp/dbdump.sql'

# arguments for the mysqldump command (specify which databases here)
MYSQLDUMP_ARGS = '--all-databases --force'

# octal permissions for the database dump
PERMS = 0o640



###############################################################################
# DO NOT EDIT BELOW HERE
###############################################################################
from datetime import datetime, timedelta
import shutil, os, subprocess
from ftplib import FTP
from syslog import syslog

def dumpmysqldb(user, password, port, host, args='--all-databases --force', 
            filename=TMPFILE):
    '''Runs mysqldump to backup all databases to filename, args are passed to mysqldump
        raises an Exception if the filename does not exist afterwards
    '''
    
    # set permissions of PERMS file to be written to
    with open(filename, 'w') as f:
        pass
    os.chmod(filename, PERMS)
    
    mysqldumpcommand = '''mysqldump --user="{user}" --password="{passwd}" --host={host} --port={port} {args} > {output}'''.format(
        user=user, passwd=password, output=filename, args= args, host=host, port=port)
    
    pipe = subprocess.Popen(mysqldumpcommand, shell=True, stdout=subprocess.PIPE)
    pipe.communicate()
    
    if not os.path.isfile(filename):
        syslog("{0} doesnt seem to exist although it should".format(filename))
        raise Exception("{0} doesnt seem to exist, although it should".format(filename))


def compress_7z(path, args='-bd', outfile=None):
    '''compresses a file with p7zip, outputting as .7z file, returns the name of the
        compressed file
    '''
    if not outfile:
        outfile = os.path.join(os.path.split(path)[0], 'dump.7z')
        
    p7zipcommand = '''7z a {args} "{outfile}" "{filename}"'''.format(
            args=args, filename=path, outfile=outfile)
    pipe2 = subprocess.Popen(p7zipcommand, shell=True, stdout=subprocess.PIPE)
    pipe2.communicate()
    
    if not os.path.isfile(outfile):
        syslog("{0} was not created correctly".format(outfile))
        raise Exception("{0} was not created correctly".format(outfile))
    else:
	os.chmod(outfile, PERMS)
        return outfile


class BackupUpdater:
    '''Class that handles deleting old files'''
    def __init__(self):
        raise NotImplementedError
    
    def update(self, time, file):
        '''This will be called when a new dump is created'''
        raise NotImplementedError

    
class BasicBackupUpdater(BackupUpdater):
    '''Just saves the files in the directory given'''
    def __init__(self, directory, fileformat='%d-%m-%Y_database-dump'):
        self.directory = directory
        self.fileformat = fileformat
        if not os.path.isdir(self.directory):
            os.makedirs(self.directory)
        
    def update(self, time, file):
        newfilename = time.strftime(self.fileformat)
        shutil.move(file, os.path.join(self.directory, newfilename))


class FTPBasicBackupUpdater(BackupUpdater):
    '''Saves the file to a directory on an FTP Server'''
    def __init__(self, directory, user, passwd, host, fileformat='%d-%m-%Y_database-dump'):
        self.fileformat = fileformat
        self.FTP = FTP(host=host, user=user, passwd=passwd)
        self.ftp.cwd(directory)
        
    def update(self, time, file):
        newfilename = time.strftime(self.fileformat)
        with open(newfilename, 'rb') as tfile:
            self.FTP.storbinary("STOR " + newfilename, tfile)
        ftp.quit()

    
class AdvancedBackupUpdater(BasicBackupUpdater):
    '''Works by rotating the backups into a folder hierarchy based on their
    creation time, i.e. after 1 week the daily folder will be emptied and the
    oldest file will be moved to weekly, and the newest file will now be in daily
    , does the same for weekly, monthly, yearly.
    '''
    folders = (
            ('daily', timedelta(weeks=1)),
            ('weekly', timedelta(weeks=5)),
            ('monthly', timedelta(weeks=52)),
            ('yearly', timedelta.max), #if this program works for more than 2 million years, we're in trouble
        )
        
            
    def __init__(self, directory, fileformat):
        BasicBackupUpdater.__init__(self, directory, fileformat)


    def _getAge(self, directory):
        '''Returns a datetime object representing the oldest creation date in a directory'''
        filelist = [os.path.join(directory, f) for f in os.listdir(directory) if os.path.isfile(os.path.join(directory, f))]
        if filelist:
            filelist.sort(key= lambda f: os.path.getmtime(f))
            return datetime.fromtimestamp(os.path.getmtime((filelist[0]))), filelist[0]
        else: return datetime.now(), None


    def _updatePushing(self, time, filename, index=0):
            folder = os.path.join(self.directory, self.folders[index][0])
            delta = self.folders[index][1]
            
            if not os.path.isdir(folder):
                os.makedirs(folder)
            oldesttime, oldestfile = self._getAge(folder)
            
            #if its time to rotate, push them up with a recursive call
            if time - oldesttime > delta:
                self._updatePushing(oldesttime, os.path.join(folder, oldestfile), index+1)
                for file in [os.path.join(folder, f) for f in os.listdir(folder) if os.path.isfile(os.path.join(folder, f))]:
                    os.remove(file)
            try:
                shutil.move(filename, folder)
            except shutil.Error:
                syslog("Error : file {0} already exists".format(os.path.join(folder, filename)))
                print "Error : file {0} already exists".format(os.path.join(folder, filename))
                pass #probably already existed
    
    
    def update(self, time, filename):
        nf = os.path.join(os.path.split(filename)[0], 
            time.strftime(self.fileformat))
        shutil.move(filename, nf)
        self._updatePushing(time, nf, index=0)
        
        
if __name__=='__main__':
    print 'Dumping databases...'
    syslog('Dumping databases...')
    dumpmysqldb(user=MYSQL_USERNAME, password=MYSQL_PASSWORD,
             host=MYSQL_HOST, args=MYSQLDUMP_ARGS, port=MYSQL_PORT, filename=TMPFILE)
    
    print 'Compressing {0} with 7z...'.format(TMPFILE)
    syslog('Compressing {0} with 7z...'.format(TMPFILE))
    cmpfile = compress_7z(TMPFILE, args=P7ZIP_ARGS)
    
    print 'Updating backups with {0} in {1}'.format(cmpfile, BACKUP_DIR)
    syslog('Updating backups with {0} in {1}'.format(cmpfile, BACKUP_DIR))
    b = AdvancedBackupUpdater(BACKUP_DIR, FILENAME_PATTERN)
    b.update(datetime.now(), cmpfile)
    
    print'Success'
        

Reversing Chunks of a Linked List

One of my fellow researchers asked me this question, which he said was an interview question for amazon.

Write an algorithm to reverse blocks of length k- in a list. i.e. if k=3 and the list is originally \{0,1,2,3,4,5,6,7,8\} ; it should produce \{2,1,0,5,4,3,8,7,6\}.

There’s already a clever trick to reverse a linked list without requiring any extra space. The idea is that we just iterate through the list and reverse all the pointers. In java this would look something like this

	public static Node reverse(Node list){
		Node prev = null;
		while(list != null){
			Node next = list.next;
			list.next = prev;
			prev = list;
			list = next;
		}
		return prev;
	}

This will reverse the entire list (by going until the next link is null), so if we want to reverse blocks of length k instead, we should add an additional parameter and condition, to obtain below.

	public static Node reverse(Node list, int k){
		Node prev = null;
		while(list != null && k-- > 0){
			Node next = list.next;
			list.next = prev;
			prev = list;
			list = next;
		}
		return prev;
	}

So what happens when we call reverse on a list? Imagine we have the list 0 \rightarrow 1 \rightarrow 2 \rightarrow 3 \rightarrow 4\rightarrow \emptyset after a call to reverse with k=2 on the first node (0), the list will be \emptyset \leftarrow 0 \leftarrow 1\quad 2\rightarrow 3 \rightarrow 4, and the call returns a pointer to ‘1’. Now we see that the trick we need to overcome is losing our place in the list as we do the block-wise reversal. I did this by keeping track of basically three positions : the first element of the block you are going to reverse, the last element of this block (which is returned by the call to reverse) and the first element of the next block to be reversed.

This way, after we have reversed the first j-1 blocks, we only need to reverse block j, point the end of block j-1 to the new beginning of block j, and then update positions to go to block j+1. In code this looks like… (I handle the first block specially so we know how to return the start of the list)

	public static Node kreverse(Node list, int k){
		Node prev = list;
		int i = 0;

		//first block special case?
		while(list != null && i++ < k){
			list = list.next;
		}
		Node retval = reverse(prev, k);

		while(list != null){
			i = 0;
			Node me = list;
			while(list != null && i++ < k)
				list = list.next;
			Node end = reverse(me, k);
			prev.next = end;

			prev = me;
		}

		return retval;
	}

What is great is that we have only visited each element in the list about twice, (once in the inner while loop of kreverse, and once in reverse) and at most constant extra space. In fact, we can even avoid visiting each element twice if we are clever with the reverse method, and return multiple values (Python!!!!). So this method is O(n) in both time and complexity. One other consideration is what will happen if the length of the list is not divisible by k.

Drawing a Boundary Tag Allocator

Some code that can be used in conjunction with the malloc code here. It adds an additional function to draw what the memory sort of looks like according to the boundary-tag allocator. It also tries to make the size of the regions appropriate.

Sample Output :

************
*ssssssssss*  1048568  ( 0x239dff8 )
*          *
*          *
*          *
*          *
*pppppppppp*  55952  ( 0x22aba90 )
*ssssssssss*  55944  ( 0x22aba88 )
*//////////*
*//////////*
*//////////*
*pppppppppp*  3600  ( 0x229ee10 )
*ssssssssss*  3592  ( 0x229ee08 )
*          *
*pppppppppp*  3248  ( 0x229ecb0 )
*ssssssssss*  3240  ( 0x229eca8 )
*//////////*
*//////////*
*pppppppppp*  0  ( 0x229e000 )
************

Additional code, just put this in myAllocator.c, maybe somewhere near the bottom? It will add some warnings about unsafe casting etc.


#define FREE ' '
#define ALLOC '/'
#define PREFIX 'p'
#define SUFFIX 's'
#define EDGE '*'
#define mem0(x) (void*)(x) - (void*)arenaBegin
#define MAXBLOCKS 1000

typedef struct RLine_s{
    int lines, sz;
    BlockPrefix_t *pfx;
}RLine;

//cannot dynamically allocate memory...assumes less than 1000 blocks will be allocated
RLine lpp[MAXBLOCKS];

void printMemLine(char type, int width, void* address){
    assert(width > 2);
    putc(EDGE, stderr);
    int i;
    for(i = 0; i < width - 2; i++) putc(type, stderr);
    putc(EDGE, stderr);
    if(type == PREFIX || type == SUFFIX)
        printf("  %d  ( 0x%x )", mem0((int)address), address);
    printf("\n");
}

int compRLineLoc(void *b, void *a){
    return (void*)((RLine*)a)->pfx - (void*)((RLine*)b)->pfx;
}

int compRLineSize(void *b, void *a){
    return ((RLine*)a)->sz - ((RLine*)b)->sz;
}

/* Draws the arena vertically, with each line representing a pseudo-address
 * The prefixes and suffixes for each block are always drawn along with their
 * addresses, however the size of the regions is somewhat arbitrary. It is true,
 * however that |A| > |B| ==> lines(A) >= lines(B) where A, B are two regions
 * and lines denotes how much space is left between them.
 * 
 * The leftmost address is base-10 and 0 is at arenaBegin, the rightmost is the
 * actual address in memory. 
 * 
 * Example Output: 
 *  ************
 *  *ssssssssss*  1048568  ( 0x225aff8 )
 *  *          *
 *  *          *
 *  *pppppppppp*  3248  ( 0x215bcb0 )
 *  *ssssssssss*  3240  ( 0x215bca8 )
 *  *xxxxxxxxxx*
 *  *pppppppppp*  0  ( 0x215b000 )
 *  ************ 
*/
void visualizeMemory(){
    assert(arenaBegin); //this would be boring if you havent allocated memory
    
    int LINES = 25, WIDTH = 12, rcount = 1, i,j;
    int MAXSZ = 5, smallr;
    int lines_left = LINES - 2;
    
    //count the number of regions
    BlockPrefix_t* p = arenaBegin;
    while((p = getNextPrefix(p)) && ++rcount);
    assert(rcount < MAXBLOCKS); //We do not have enough memory to do this
    lines_left -= 2*rcount;
    
    p = arenaBegin;
    for(i = 0; i < rcount; i++){
        lpp[i].pfx = p;
        lpp[i].sz = computeUsableSpace(p);
        lpp[i].lines = 1;
        p = getNextPrefix(p);
    }
    
    /* An algorithm to make larger regions look larger */
    if (lines_left >= 0){
        qsort(lpp, rcount, sizeof(RLine), compRLineSize);
        smallr = rcount - 1;
        while(lines_left >= 0 && smallr > 0){
            for(i = 0; i < smallr && lines_left>=0; i++){
                if(lpp[i].lines < MAXSZ) lpp[i].lines++;
            }
            smallr--;
        }
    }
    
    //sort them in reverse order of their location, so we can print them backwards
    qsort(lpp, rcount, sizeof(RLine), compRLineLoc);
    
    //do the printing!!!
    printMemLine(EDGE, WIDTH, 0);
    for(i = 0; i < rcount; i++){
        p = lpp[i].pfx;
        printMemLine(SUFFIX, WIDTH, p->suffix);
        for(j = 0; j < lpp[i].lines; j++)
            printMemLine(p->allocated ? ALLOC : FREE, WIDTH, 0);
        printMemLine(PREFIX, WIDTH, p);
    }
    printMemLine(EDGE, WIDTH, 0);
}

Unhiding Subdirectories

Made a terrible mistake in easytag and this script was necessary. It will unhide all subfiles of a directory (at any subdirectory level) by simply renaming them to not start with ‘.’ (linux)

#!/usr/bin/python
import os, shutil, sys

def unhide_subfiles(directory):
    for dirpath, dirnames, filenames in os.walk(directory):
        for f in filenames:
            if f[0] == '.':

                fpath = os.path.join(dirpath, f)
                nfpath = os.path.join(dirpath, f[1:])
                shutil.move(fpath, nfpath)

if __name__=='__main__':
    if len(sys.argv) != 2:
        print('Pass directory such that all subfiles of directory should be unhidden')
        exit(1)

    unhide_subfiles(sys.argv[1])

Script to Delete Directories without Music

After moving all my music with EasyTag creator I was left with a directory structure with tons of directories which had no music (but maybe .cue or .m3u files or maybe even subdirs). If you run this on the root of your music folder, it will try to fix it!

#!/usr/bin/python
import re, os, sys, shutil
"""Script to delete all subfolders of a folder that do not
contain music (at any subdirectory level).
 - Alex Knaust 10/25/2012
"""

musicRegex = re.compile(r"\.(flac|mp3|ogg|mp4|aac|alac|ape|wav)", re.IGNORECASE)
def music_match(fname):
    """Determine if a file is music (according to musicRegex)"""
    return musicRegex.match(os.path.splitext(fname)[-1])

def has_any_music(files):
    """Check if a list of filenames contains any music files (via music_match)"""
    for f in files:
        if music_match(f):
            return True
    return False
                
def get_nomusic_directories(srcdir):
    """Returns a list of subdirectories with no music in them" (at any level)"""
    results = []
    for f in os.listdir(srcdir):
        f = os.path.join(srcdir, f)
        if os.path.isdir(f):
            hasmusic = False
            for dirpath, dirnames, filenames in os.walk(f):
                if has_any_music(filenames):
                    hasmusic = True
                    break
            
            if not hasmusic:
                results.append(f)            
    return results
    
    
def main():
    if len(sys.argv) != 2:
        print('Pass the directory to be cleaned')
        exit(1)
    
    nomusic = get_nomusic_directories(sys.argv[1])
    ans = raw_input('Will delete %d Directories OK? y/n : ' % len(nomusic)) 
    
    if not ans.lower()[0] == 'y':
        print 'OK! exiting...'
        exit(0)
    
    for directory in nomusic:
        def printfails(func, path, errinf):
            print('Failed to remove "%s"' % path)
            
        shutil.rmtree(directory, onerror=printfails)
    
if __name__ == '__main__':
    main()