How to Handle Errors When Using pickle
pickle is a Python module for object serialization. With it, we can save Python objects to a binary file and load them back later.
When working in an object-oriented style, it is natural to organize related data and functions into a custom class and work with objects of that class. At some point it dawns on you that it can be more efficient to store an object as a binary file and load it when needed than to create the object from scratch every time. pickle is the standard-library tool for exactly this need.
ModuleNotFoundError in Working with pickle
When I was new to Python and pickle, I ran into a painful ModuleNotFoundError. I had been happily using pickle to store objects of my custom class (defined in a module I will call your_module), but one day, out of nowhere, the file could no longer be read:
ModuleNotFoundError: No module named 'your_module'
Starting from the next section, I will show by example why this error occurs and how I resolved it.
Illustration with Example Project
Let me explain using the project structure below.
package
├─ __init__.py
├─ dog.py
├─ dogpickler.py
└─ dogunpickler.py
A custom class Dog is defined in dog.py. It is a simple class that can bark.
class Dog:
    def __init__(self):
        self.bark_sound = "Waloo!"

    def bark(self):
        print(self.bark_sound)
The dogpickler.py script creates an instance of Dog, lets it bark once, and stores that Dog as a binary file using pickle. The binary file is named pickled-dog, with no extension. This process of saving an object to a binary file is called ‘pickling’; in other words, I pickled my my_dog instance and saved it as the pickled-dog file.
import pickle
from dog import Dog

my_dog = Dog()  # Create a 'Dog' instance
my_dog.bark()   # Bark!

# Open the file in binary write mode ('wb') and serialize the instance.
with open('pickled-dog', 'wb') as f:
    pickle.dump(my_dog, f)

# Execution output:
# Waloo!
After running dogpickler.py, you will find that a new file, pickled-dog, has been created.
package
├─ __init__.py
├─ dog.py
├─ dogpickler.py
├─ dogunpickler.py
└─ pickled-dog
Now let us move on to another script, dogunpickler.py. This script also produces a my_dog object, but by loading it from pickled-dog rather than by creating a new instance.
import pickle

# Load 'pickled-dog', opened in binary read mode ('rb').
# pickle imports the 'dog' module by name to find the Dog class.
with open('pickled-dog', 'rb') as f:
    my_dog = pickle.load(f)

my_dog.bark()

# Execution output:
# Waloo!
ModuleNotFoundError When Using pickle.load()
Changed Module Name
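It is important to know that a pickle file does not contain the class itself, only the name of the module and the name of the class that the object was created from. When pickle restores the object, it imports that module by name (using the copy in sys.modules if the module is already loaded) and looks up the class in it.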
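To see this for yourself, the following minimal sketch lists every string stored in a pickle. It uses the standard pickletools module; the Dog class is redefined inline only so that the snippet is self-contained, and the variable names are my own.

```python
import pickle
import pickletools


class Dog:
    def __init__(self):
        self.bark_sound = "Waloo!"

    def bark(self):
        print(self.bark_sound)


data = pickle.dumps(Dog())

# Walk the pickle opcodes and keep every string argument stored in the stream.
stored_strings = [arg for _opcode, arg, _pos in pickletools.genops(data)
                  if isinstance(arg, str)]

# Expect the module name, the class name, and the instance attributes,
# but nothing from the body of bark().
print(stored_strings)
```

The class is referenced only by name, which is why the module must still be importable under that name when the file is loaded.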
With the Dog project above in mind, here is a case that produces an error. Suppose I renamed dog.py to animal.py after pickling my_dog into the pickled-dog binary file.
In this scenario, running dogunpickler.py results in ModuleNotFoundError, because pickle cannot find the module dog that was originally used when creating pickled-dog.
package
├─ __init__.py
├─ animal.py # renamed: dog -> animal
├─ dogpickler.py
├─ dogunpickler.py
└─ pickled-dog
# When running dogunpickler:
>>>ModuleNotFoundError: No module named 'dog'
The best way to avoid this problem is to keep module names and locations stable once you have pickled objects of classes defined in them. If renaming is unavoidable, however, there is a workaround for loading the now-obsolete pickled-dog file.
The solution is simple to state: register the animal module in sys.modules under the old name dog, so that pickle's lookup of dog is redirected to animal.
To do this, we change dogunpickler.py as follows:
import pickle
import sys

import animal  # the renamed module that now defines Dog

# Redirect the old module name 'dog' to the 'animal' module.
sys.modules['dog'] = animal

with open('pickled-dog', 'rb') as f:
    my_dog = pickle.load(f)

my_dog.bark()

# Execution output:
# Waloo!
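If you would rather not modify sys.modules globally, another option is to subclass pickle.Unpickler and override its find_class method, which pickle calls with the module name and class name stored in the file. The sketch below is a hypothetical variant, not the approach used above: the names RenamingUnpickler, RENAMED_MODULES and make_module are my own, and the dog and animal modules are simulated in memory so that the snippet runs on its own.

```python
import io
import pickle
import sys
import types

DOG_SOURCE = '''
class Dog:
    def __init__(self):
        self.bark_sound = "Waloo!"
'''


def make_module(name):
    """Simulate a file such as dog.py by building a module in memory."""
    module = types.ModuleType(name)
    exec(DOG_SOURCE, module.__dict__)
    sys.modules[name] = module
    return module


# Pickle a Dog while the class still lives in a module named 'dog'.
dog = make_module("dog")
data = pickle.dumps(dog.Dog())

# Simulate the rename: 'dog' is gone and the class now lives in 'animal'.
del sys.modules["dog"]
animal = make_module("animal")


class RenamingUnpickler(pickle.Unpickler):
    # Hypothetical mapping from old module names to new ones.
    RENAMED_MODULES = {"dog": "animal"}

    def find_class(self, module, name):
        module = self.RENAMED_MODULES.get(module, module)
        return super().find_class(module, name)


my_dog = RenamingUnpickler(io.BytesIO(data)).load()
print(type(my_dog).__module__, my_dog.bark_sound)  # animal Waloo!
```

With a real file, you would pass the open file object to RenamingUnpickler instead of the io.BytesIO wrapper. The redirection then applies only to that one load rather than to every import of dog in the process.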
Changed Class Structure
In the example above, the dog module was renamed but the Dog class defined in it was left intact. What happens, then, if the module is still named dog but the Dog class itself changes?
Change in Existing Instance Variables
For example, suppose that we modify dog.py as follows after pickled-dog has been created.
class Dog:
    def __init__(self):
        # Dog got mad
        self.bark_sound = "Grrr..."  # originally this was "Waloo!"

    def bark(self):
        print(self.bark_sound)
What happens if we now execute dogunpickler.py? The Dog class has changed, but pickled-dog was created from the ‘waloo-barking’ dog.
In this case, no error occurs, but the unpickled (loaded) dog barks ‘Waloo!’, not ‘Grrr...’. Unpickling restores the saved instance attributes without calling __init__ again.
# execution output
>>>Waloo!
We can even rename the instance variable bark_sound. For example, we can rename it to barkSound in the class, and dogunpickler.py can still load the file without error. But the object loaded from pickled-dog still carries the old bark_sound attribute and has no barkSound attribute, so any method that reads self.barkSound fails with AttributeError. pickled-dog knows nothing about the changes made to the Dog class and does not try to reorganize itself to match them.
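A minimal sketch of this behaviour, assuming both versions of the class are defined one after the other in a single script (rebinding the name Dog stands in for editing dog.py):

```python
import pickle


class Dog:  # original version
    def __init__(self):
        self.bark_sound = "Waloo!"


data = pickle.dumps(Dog())  # pickled before the rename


class Dog:  # edited version: the attribute is now called barkSound
    def __init__(self):
        self.barkSound = "Grrr..."


loaded = pickle.loads(data)  # __init__ is not called here
print(vars(loaded))                  # {'bark_sound': 'Waloo!'}
print(hasattr(loaded, "barkSound"))  # False
```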
New Instance Variable Added
What about the case where we add a new instance variable to the Dog class?
class Dog:
    def __init__(self):
        # Dog got mad
        self.bark_sound = "Grrr..."
        # Now Dog has hair color
        self.color = "brown"

    def bark(self):
        print(self.bark_sound)
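As in the example above, no ModuleNotFoundError arises. You can unpickle pickled-dog with dogunpickler.py, and the saved dog still barks ‘Waloo!’. But an AttributeError occurs if we try to access the color variable, because the loaded object was saved before color existed and __init__ is not run again on load.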
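One way to make old pickles tolerate a new attribute is to define __setstate__, a hook that pickle calls with the saved attribute dictionary when it restores an object. The sketch below is an illustration rather than part of the original project: the default value for color is an assumption, and the two class versions are defined in one script to stand in for editing dog.py.

```python
import pickle


class Dog:  # original version, without color
    def __init__(self):
        self.bark_sound = "Waloo!"


old_data = pickle.dumps(Dog())


class Dog:  # edited version
    def __init__(self):
        self.bark_sound = "Grrr..."
        self.color = "brown"

    def __setstate__(self, state):
        # Start from a default for attributes that old pickles lack,
        # then overwrite it with whatever was actually saved.
        self.__dict__.update({"color": "brown"})
        self.__dict__.update(state)


loaded = pickle.loads(old_data)
print(loaded.bark_sound, loaded.color)  # Waloo! brown
```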
New Method Added
Now let’s try adding new methods to Dog.
class Dog:
    def __init__(self):
        # Dog got mad
        self.bark_sound = "Grrr..."
        # Now Dog has hair color
        self.color = "brown"

    def bark(self):
        print(self.bark_sound)

    # Dog can bark twice now
    def bark_twice(self):
        for _ in range(2):
            self.bark()

    # They even can tell their hair color
    def show_color(self):
        print(f"I am a {self.color} dog!")
Now, to test the new methods, let’s change dogunpickler.py as follows and execute it (do not run dogpickler.py again).
import pickle

with open('pickled-dog', 'rb') as f:
    my_dog = pickle.load(f)

my_dog.bark_twice()
my_dog.show_color()
One might expect the output below when dogunpickler.py is executed, as if a brand-new Dog had been created:
Dog().bark_twice()
>>>Grrr...
>>>Grrr...
Dog().show_color()
>>>I am a brown dog!
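But the result is an AttributeError. The new methods themselves are found, because methods belong to the class and the class is looked up when the file is loaded. The failure happens inside show_color, when it reads self.color, which the old pickle never stored.
# execution output
>>>Waloo!
>>>Waloo!
>>>AttributeError: 'Dog' object has no attribute 'color'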
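The split between methods (taken from the current class) and attributes (taken from the pickle) can be reproduced in a single script. In this minimal sketch the methods return strings instead of printing, so the results are easy to inspect; that change and the one-file layout are my own simplifications.

```python
import pickle


class Dog:  # original version
    def __init__(self):
        self.bark_sound = "Waloo!"

    def bark(self):
        return self.bark_sound


data = pickle.dumps(Dog())


class Dog:  # edited version with a new attribute and new methods
    def __init__(self):
        self.bark_sound = "Grrr..."
        self.color = "brown"

    def bark(self):
        return self.bark_sound

    def bark_twice(self):
        return [self.bark(), self.bark()]

    def show_color(self):
        return f"I am a {self.color} dog!"


loaded = pickle.loads(data)

# The new method exists on the class, but it reads the old attribute value.
barks = loaded.bark_twice()
print(barks)  # ['Waloo!', 'Waloo!']

# show_color is found too; it fails only when it reads self.color.
error_message = None
try:
    loaded.show_color()
except AttributeError as exc:
    error_message = str(exc)
print(error_message)
```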
So, this is the summary:
- If you face ModuleNotFoundError when unpickling (using pickle.load), check whether you have renamed or moved any of your modules. If so, you can add an entry to sys.modules that redirects the old module name to the new module.
- Successful unpickling without ModuleNotFoundError does not guarantee that a script using the unpickled object will work. If you have changed a custom class whose instances you have been pickling, AttributeError can occur, because the unpickled object knows nothing about the changes made to your module and classes.
Using pickle Well in Your Project
With these observations, I now believe that it is not a good idea to use pickle directly to save a custom class instance. If I am going to pickle something, I try to make sure it is composed of more universal, basic types (for example, a dictionary). Rather than pickling a Dog instance that defines self.bark_sound, I build a dictionary with a ‘bark_sound’ key and pickle that. You may well find a better method as you work with pickle in your own project.
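As a rough sketch of that idea, the class below converts itself to and from a plain dictionary, and only the dictionary is pickled. The method names to_dict and from_dict and the default values are hypothetical choices of mine, not an established API.

```python
import pickle


class Dog:
    def __init__(self, bark_sound="Waloo!", color="brown"):
        self.bark_sound = bark_sound
        self.color = color

    def to_dict(self):
        return {"bark_sound": self.bark_sound, "color": self.color}

    @classmethod
    def from_dict(cls, data):
        # Keys missing from older dictionaries fall back to the defaults.
        known = {k: data[k] for k in ("bark_sound", "color") if k in data}
        return cls(**known)


# The pickle holds only built-in types, so it names no custom module or class.
payload = pickle.dumps(Dog(bark_sound="Grrr...").to_dict())
restored = Dog.from_dict(pickle.loads(payload))

# A dictionary saved before 'color' existed still loads cleanly.
old_dog = Dog.from_dict({"bark_sound": "Waloo!"})
print(restored.bark_sound, old_dog.color)  # Grrr... brown
```

Because the file no longer refers to Dog by name, renaming the module or the class does not break loading, and missing keys are handled in one place.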