Last Updated: 20-August-2015

When working with large datasets I tend to use Python, as it's a lot faster than Excel for file manipulation and doesn't crash on large inputs.

One thing I struggled with in the past was column selection. I have spoken to lots of different people and done plenty of reading on the subject, and I think I have found the most elegant solution. Even better, it uses the standard library rather than Pandas, which everyone seems to suggest.

Here it is...


import csv # no pip needed

with open("file.csv") as f:
    data = csv.reader(f)
    for line in data:
        print(" ".join([line[0], line[1], line[2], line[3]]))  # each index picks a column, counting from 0

Beautiful, isn't it?
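
One thing to watch: indexing like line[3] raises an IndexError on any row that has fewer than four columns. Here is a rough sketch of one way around that, written for Python 3. The select_columns helper, its missing parameter and the sample data are made up for illustration and are not part of the csv module.

```python
import csv
import io


def select_columns(rows, indexes, missing=""):
    """Yield only the requested column positions from each row."""
    for row in rows:
        # Fall back to `missing` when a row is shorter than expected.
        yield [row[i] if i < len(row) else missing for i in indexes]


# Hypothetical sample data standing in for file.csv.
sample = io.StringIO("a,b,c,d\n1,2,3\n")

selected = list(select_columns(csv.reader(sample), [0, 3]))
for row in selected:
    print(row)
```

For a real file you would pass csv.reader(f) from inside the with block instead of the StringIO sample.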

Now let's say you want to label each column as you print it...


import csv

with open("file.csv") as f:
    data = csv.reader(f)
    for line in data:
        print("column1: {0}, column2: {1}, column3: {2}, column4: {3}".format(line[0], line[1], line[2], line[3]))
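
If your file already has a header row, the standard library's csv.DictReader can do the labelling for you by using the header names as keys. This is only a sketch for Python 3, and the column names url and anchor plus the sample row are assumptions for the sake of the example.

```python
import csv
import io

# Hypothetical sample with a header row; for a real file, use
# open("file.csv", newline="") in a with block instead.
sample = io.StringIO("url,anchor\nhttp://example.com,home\n")

labelled = []
for row in csv.DictReader(sample):
    # Each row is a mapping from header name to that row's value.
    labelled.append(
        ", ".join("{0}: {1}".format(name, row[name]) for name in ("url", "anchor"))
    )

for line in labelled:
    print(line)
```

The upside is that the labels follow the file: if the columns get reordered, the code still picks the right values.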

Or maybe you want to search for something within a column to check it exists?


import csv

with open("file.csv") as f:
    data = csv.reader(f)
    for line in data:
        if 'something' in line[2]:
            print("found " + line[2])
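
The check above is case-sensitive and assumes every row has a third column. If that is a problem for your data, something along these lines might work in Python 3. The find_in_column name and the sample rows are purely illustrative.

```python
import csv
import io


def find_in_column(rows, index, term):
    """Return the rows whose column at `index` contains `term`, ignoring case."""
    term = term.lower()
    # Rows that are too short to have the column are skipped, not treated as errors.
    return [row for row in rows if len(row) > index and term in row[index].lower()]


# Hypothetical sample data standing in for file.csv.
sample = io.StringIO("1,x,Something here\n2,y,nothing\n3,z\n")

matches = find_in_column(csv.reader(sample), 2, "something")
for row in matches:
    print("found", row[2])
```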

Here is a more job-specific example: say you wanted to count the anchor text frequency of a backlink profile...


import csv
import collections

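anchors = []

with open("file.csv") as f:
    data = csv.reader(f)
    for line in data:
        anchors.append(line[3])  # anchor text column

# Build the counter once, after every row has been read
counter = collections.Counter(anchors)

for word, value in counter.items():
    print("{0} {1}".format(word, value))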
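
If you only care about the most frequent anchors, Counter has a most_common method that returns them sorted by count. A minimal Python 3 sketch, where the sample rows are invented and the anchor text is assumed to sit in the fourth column as above:

```python
import collections
import csv
import io

# Hypothetical backlink rows; the fourth column is assumed to hold the anchor text.
sample = io.StringIO(
    "u1,d1,t1,click here\n"
    "u2,d2,t2,brand\n"
    "u3,d3,t3,click here\n"
)

# Feed the counter straight from the reader, skipping rows that are too short.
counter = collections.Counter(row[3] for row in csv.reader(sample) if len(row) > 3)

# most_common(n) returns the n highest counts as (value, count) pairs.
top = counter.most_common(2)
for anchor, count in top:
    print(anchor, count)
```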

About the author

Craig Addyman @craigaddyman
Head of Digital Marketing. Python Coder.