Pause and restart a Python script using pickle

You’ve just quickly crafted a Python script, and you’re ready to let it run all night while you sleep. Not long after you’ve fallen asleep, an error occurs and the script fails. Turns out, you’ll have to wait until the next night to run that script again…

Here is a quick way to save a script’s state and resume it roughly where it stopped. The idea is to use a serialization library to save your variables’ states when an error happens, and to deserialize them when the script starts.

This doesn’t prevent errors, but it’ll save most of the work done. Here is how I’m using it.

Declaring and saving variables

When I declare a new variable, I first check whether a serialized backup exists, using the makevar function:

# instead of:
to_visit = {}

# use:
to_visit = makevar('to_visit', {})

To serialize the variable when an error occurs, I use the savevar function:

savevar('to_visit', to_visit)

Using the pickle library

pickle ships with Python’s standard library, so there is nothing to install: importing it is enough.

Here are my functions to quickly serialize and deserialize a variable.

import pickle
import os.path

def makevar(varname, value):
    # Return the pickled backup if one exists, otherwise the default value
    filename = varname + '.obj'
    if os.path.exists(filename):
        with open(filename, 'rb') as file:
            return pickle.load(file)
    return value

def savevar(varname, value):
    filename = varname + '.obj'
    with open(filename, 'wb') as file:
        pickle.dump(value, file)
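
One caveat with savevar as written: if the script is killed while the file is being written, the backup can end up truncated. Here is a minimal sketch of a safer variant, assuming it is acceptable to write to a temporary file in the same directory first. The name savevar_atomic and its directory parameter are my own illustration, not part of the functions above.

```python
import os
import pickle
import tempfile

def savevar_atomic(varname, value, directory='.'):
    """Pickle value to <varname>.obj via a temporary file, then swap it in."""
    filename = os.path.join(directory, varname + '.obj')
    # Write to a temporary file in the same directory first
    fd, tmpname = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'wb') as file:
            pickle.dump(value, file)
        # os.replace swaps the file in one step, so the previous backup
        # stays intact until the new one is completely written
        os.replace(tmpname, filename)
    except BaseException:
        # Clean up the partial temporary file, then re-raise the error
        os.remove(tmpname)
        raise
    return filename
```

The file it produces has the same name and format as the one written by savevar, so makevar can load it unchanged.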

Save program state on error

Finally, here’s how I generally use it:

try:
    # my program logic here....
finally:
    save_state()

where the save_state function calls savevar for each variable of interest.

What I like about the finally block is that it runs even when you interrupt the script with Ctrl+C, or send an interrupt signal in a Jupyter notebook. So you can stop the script, make a quick fix, and resume its execution without remorse.
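
To make the whole pattern concrete, here is a small end-to-end sketch. The run function, the done list and the fail_on parameter are hypothetical names made up for this illustration, and the interruption is simulated by raising KeyboardInterrupt by hand. The first call stops halfway, and the second one picks up where the first left off.

```python
import os
import pickle
import tempfile

def makevar(varname, value):
    filename = varname + '.obj'
    if os.path.exists(filename):
        with open(filename, 'rb') as file:
            return pickle.load(file)
    return value

def savevar(varname, value):
    filename = varname + '.obj'
    with open(filename, 'wb') as file:
        pickle.dump(value, file)

def run(items, fail_on=None):
    """Process items, skipping those a previous run already handled."""
    done = makevar('done', [])

    def save_state():
        # Call savevar for each variable of interest (only one here)
        savevar('done', done)

    try:
        for item in items:
            if item in done:
                continue
            if item == fail_on:
                raise KeyboardInterrupt  # stands in for ctrl+c or a crash
            done.append(item)
    finally:
        save_state()
    return done

# Work in a scratch directory so the .obj file does not clutter anything
os.chdir(tempfile.mkdtemp())

try:
    run(['a', 'b', 'c', 'd'], fail_on='c')  # first run stops at 'c'
except KeyboardInterrupt:
    pass

result = run(['a', 'b', 'c', 'd'])  # second run resumes after 'b'
print(result)  # ['a', 'b', 'c', 'd']
```

Remember to delete the .obj files once a run has completed successfully, otherwise the next fresh run will start from the old state.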

Apply this technique in your next Python scraping script!