Last Updated: 20-August-2015

I wanted to keep track of a ranking position today, so I repurposed some existing code and came up with something similar to the script below. I have made it more general here, but really the only difference is that the original only printed to the screen when the ranking domain was the one I was interested in (it was a new page and I was testing how long it would take to rank, if at all), and it just re-checked the same query every 10 minutes or so.
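
That monitoring version went roughly like this - I've reconstructed it here, so treat it as a sketch; the target domain and query are just placeholders, and it only checks the first page of results:


from bs4 import BeautifulSoup
from urlparse import urlparse
import requests
import time

target = 'www.example.com'    # placeholder for the domain I was tracking
query = 'some search term'    # placeholder for the query I was checking

while True:
    url = ('http://www.google.com/search?q=' + query.replace(' ', '+')
           + '&start=0&num=10&pws=0')
    r = requests.get(url, timeout=10, headers={'User-agent': 'Mozilla/5.0'})
    soup = BeautifulSoup(r.text, 'html.parser')
    # Only print when the domain I care about shows up in the results
    for position, result in enumerate(soup.findAll('cite'), start=1):
        domain = result.text.replace('https://', '').replace('http://', '')
        netloc = urlparse('http://' + domain).netloc
        if netloc == target:
            print position, netloc
    time.sleep(600)    # check again in 10 minutes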

So without further ado, here is the more general code...


from bs4 import BeautifulSoup
from urlparse import urlparse
import requests
import time

query = raw_input('Enter a Search Query... ')
pages_deep = int(raw_input('How many pages of results... '))


def google_rank_checker(query, pages_deep):
    '''
    The user is prompted to enter a search term just as they would in Google,
    along with the number of pages deep into the pagination they would like
    to go.
    '''
    
    number = 0    # keeps track of the ranking position
    
    for start in range(pages_deep):
        # Opens a search result page and iterates through the SERP
        # pagination; each iteration requests 10 results at a time
        # (pws=0 switches off personalised results).
        url = ('http://www.google.com/search?q=' + query.replace(' ', '+')
               + '&start=' + str(start * 10) + '&num=10&pws=0')

        headers = {'User-agent': 'Mozilla/5.0'}
        r = requests.get(url, timeout=10, headers=headers)
        soup = BeautifulSoup(r.text, 'html.parser')
        time.sleep(5)    # be polite between requests
        
        for result in soup.findAll('cite'):
            # The ranking URL is held within the 'cite' tags; strip any
            # scheme, then re-add http:// so urlparse can pull the netloc.
            number += 1
            domain = result.text.replace('https://', '').replace('http://', '')
            domain = urlparse('http://' + domain)
            print number, domain.netloc

Here is the usage and output...


google_rank_checker(query, pages_deep)

# Enter a Search Query... python blogs
# How many pages of results... 2
# 1 freepythontips.wordpress.com
# 2 www.pythonblogs.com
# 3 realpython.com
# 4 jessenoller.com
# 5 inventwithpython.com
# 6 www.quora.com
# 7 wiki.python.org
# 8 www.ianbicking.org
# 9 archive.oreilly.com
# 10 www.fullstackpython.com
# 11 www.toptal.com
# 12 developers.google.com
# 13 developers.google.com
# 14 lucumr.pocoo.org
# 15 www.jeffknupp.com
# 16 www.jeffknupp.com
# 17 blogs.msdn.com
# 18 www.stat.washington.edu
# 19 altinukshini.wordpress.com
# 20 blog.getpelican.com
# 21 www.google.co.uk

I actually originally had it running with enumerate() wrapped around the soup.findAll('cite') section, but I couldn't get it to work properly because there can be a different number of results per page/query, so the count wouldn't carry over cleanly between pages.
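
If you did want enumerate(), one way to make it work is to restart the count from wherever the previous page left off. A quick sketch, with dummy pages standing in for the real requests:


from bs4 import BeautifulSoup

# Two fake 'pages' of results standing in for successive SERP requests
pages = ['<cite>a.com</cite><cite>b.com</cite><cite>c.com</cite>',
         '<cite>d.com</cite><cite>e.com</cite>']

number = 0    # running rank, carried across pages
for page in pages:
    soup = BeautifulSoup(page, 'html.parser')
    # Restart enumerate from wherever the previous page left off
    for number, result in enumerate(soup.findAll('cite'), start=number + 1):
        print number, result.text

# 1 a.com
# 2 b.com
# 3 c.com
# 4 d.com
# 5 e.com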

The whole faff with the domain is simply for cleaner output and isn't really needed (I kept it in from the original code I had written for another task).
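
To show what that cleanup is actually doing, here is the same dance on a made-up cite string:


from urlparse import urlparse

cite_text = 'https://www.example.com/some/ranking/page'    # made-up example
domain = cite_text.replace('https://', '').replace('http://', '')
print urlparse('http://' + domain).netloc
# www.example.com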

About the author

Craig Addyman @craigaddyman
Head of Digital Marketing. Python Coder.