Parse Multipart Request String In Python

November 19, 2023

I have a string like this '--5b34210d81fb44c5a0fdc1a1e5ce42c3\r\nContent-Disposition: form-data; name=\'author\'\r\n\r\nJohn Smith\r\n--5b34210d81fb44c5a0fdc1a1e5ce42c3\r\nContent-

Solution 1:

It can be parsed with MultipartDecoder from requests_toolbelt, for example (on Python 3 the body has to be a bytes object, as in Solution 2):

from requests_toolbelt.multipart import decoder

multipart_string = "--ce560532019a77d83195f9e9873e16a1\r\nContent-Disposition: form-data; name=\"author\"\r\n\r\nJohn Smith\r\n--ce560532019a77d83195f9e9873e16a1\r\nContent-Disposition: form-data; name=\"file\"; filename=\"example2.txt\"\r\nContent-Type: text/plain\r\nExpires: 0\r\n\r\nHello World\r\n--ce560532019a77d83195f9e9873e16a1--\r\n"
content_type = "multipart/form-data; boundary=ce560532019a77d83195f9e9873e16a1"
decoder.MultipartDecoder(multipart_string, content_type)
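
To see how the boundary in content_type relates to the delimiter lines in multipart_string, it can help to build the same body by hand. The following is only an illustrative sketch: build_multipart is a hypothetical helper (not part of requests_toolbelt) and it assumes each part is given as a list of header lines plus a bytes payload.

```python
def build_multipart(parts, boundary):
    """Join (header_lines, body_bytes) pairs into a multipart body.

    Hypothetical helper for illustration only.
    """
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for header_lines, body in parts:
        lines.append(delimiter)  # every part starts with --boundary
        lines.extend(h.encode("utf-8") for h in header_lines)
        lines.append(b"")        # a blank line ends the part headers
        lines.append(body)
    lines.append(delimiter + b"--")  # closing delimiter: --boundary--
    lines.append(b"")                # trailing CRLF after the closing delimiter
    return b"\r\n".join(lines)


boundary = "ce560532019a77d83195f9e9873e16a1"
body = build_multipart(
    [
        (['Content-Disposition: form-data; name="author"'], b"John Smith"),
        (
            [
                'Content-Disposition: form-data; name="file"; filename="example2.txt"',
                "Content-Type: text/plain",
                "Expires: 0",
            ],
            b"Hello World",
        ),
    ],
    boundary,
)
content_type = "multipart/form-data; boundary=" + boundary
```

The resulting body is byte-for-byte the example string above, which shows that the decoder only needs the boundary from the Content-Type header to split the parts again.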

Solution 2:

Expanding on sam-anthony's answer (I had to make some fixes for it to work on Python 3.6.8):

from requests_toolbelt.multipart import decoder

multipart_string = b"--ce560532019a77d83195f9e9873e16a1\r\nContent-Disposition: form-data; name=\"author\"\r\n\r\nJohn Smith\r\n--ce560532019a77d83195f9e9873e16a1\r\nContent-Disposition: form-data; name=\"file\"; filename=\"example2.txt\"\r\nContent-Type: text/plain\r\nExpires: 0\r\n\r\nHello World\r\n--ce560532019a77d83195f9e9873e16a1--\r\n"
content_type = "multipart/form-data; boundary=ce560532019a77d83195f9e9873e16a1"

for part in decoder.MultipartDecoder(multipart_string, content_type).parts:
    print(part.text)

John Smith
Hello World

To use this in a Lambda function, install the library into your project directory with pip install requests-toolbelt --target=. and upload it together with your Lambda script.

Here's a working example:

from requests_toolbelt.multipart import decoder

def lambda_handler(event, context):
    content_type_header = event['headers']['Content-Type']
    body = event["body"].encode()

    # Concatenate the text of every part, one part per line
    response = ''
    for part in decoder.MultipartDecoder(body, content_type_header).parts:
        response += part.text + "\n"

    return {
        'statusCode': 200,
        'body': response
    }

This should be enough for your dependencies to be recognized. If they aren't, try using the "/python/lib/python3.6/site-packages" directory structure inside the zip, with your Python script at the root.
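
The handler above indexes event['headers']['Content-Type'] directly, while Solution 3 reads 'content-type'. HTTP header names are case-insensitive, so the key's casing may not match what the code expects. A minimal sketch of a case-insensitive lookup, assuming headers is a plain dict (get_header is a hypothetical helper, not part of requests_toolbelt):

```python
def get_header(headers, name, default=None):
    """Look up an HTTP header by name, ignoring case (hypothetical helper)."""
    wanted = name.lower()
    for key, value in (headers or {}).items():
        if key.lower() == wanted:
            return value
    return default


# Example event with a lowercase header key
event = {"headers": {"content-type": "multipart/form-data; boundary=abc"}}
content_type_header = get_header(event["headers"], "Content-Type")
```

The returned value can then be passed to the decoder in place of the direct dictionary access.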

Solution 3:

I ran into a number of odd encoding issues, as well as inconsistent behavior from API Gateway: the request body originally arrived as bytes, but after redeploying it started to arrive base64-encoded. This is the code that ended up working for me:

import base64
import boto3
from requests_toolbelt.multipart import decoder

s3client = boto3.client("s3")

def lambda_handler(event, context):
    content_type_header = event['headers']['content-type']
    # The body arrives base64-encoded; decoding as iso-8859-1 maps every byte
    # to exactly one character, so binary data survives the round trip
    postdata = base64.b64decode(event['body']).decode('iso-8859-1')
    lst = []
    for part in decoder.MultipartDecoder(postdata.encode('utf-8'), content_type_header).parts:
        lst.append(part.text)
    # Upload the first part, encoded back to its original bytes
    s3client.put_object(Body=lst[0].encode('iso-8859-1'), Bucket='test', Key='mypicturefinal.jpg')
    return {'statusCode': '200', 'body': 'Success', 'headers': {'Content-Type': 'text/html'}}
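
The switch between raw and base64-encoded bodies described above can be handled explicitly rather than by redeploying and checking. A minimal sketch, assuming the event follows the API Gateway proxy format, where an isBase64Encoded flag marks encoded bodies (get_body_bytes is a hypothetical helper name):

```python
import base64


def get_body_bytes(event):
    """Return the request body as bytes, whatever form it arrived in."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        # Body was base64-encoded before being handed to the function
        return base64.b64decode(body)
    if isinstance(body, bytes):
        return body
    # Plain text body: encode it so a multipart decoder can split on bytes
    return body.encode("utf-8")
```

The bytes returned here can be passed straight to decoder.MultipartDecoder together with the Content-Type header.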

Solution 4:

If you want to use Python's cgi module:

import base64
from cgi import parse_multipart, parse_header
from io import BytesIO

c_type, c_data = parse_header(event['headers']['Content-Type'])
assert c_type == 'multipart/form-data'
decoded_string = base64.b64decode(event['body'])
# For Python 3: these two lines of bugfixing are mandatory
# see also: https://stackoverflow.com/questions/31486618/cgi-parse-multipart-function-throws-typeerror-in-python-3
c_data['boundary'] = bytes(c_data['boundary'], "utf-8")
c_data['CONTENT-LENGTH'] = event['headers']['Content-length']
form_data = parse_multipart(BytesIO(decoded_string), c_data)

for image_str in form_data['file']:
    ...

Solution 5:

If using CGI, I recommend using FieldStorage:

from cgi import FieldStorage

fs = FieldStorage(fp=event['body'], headers=event['headers'], environ={'REQUEST_METHOD':'POST', 'CONTENT_TYPE':event['headers']['Content-Type'], })['file']
originalFileName = fs.filename
binaryFileData = fs.file.read()

If the event body contains multiple files, the same call:

fs = FieldStorage(fp=event['body'], headers=event['headers'], environ={'REQUEST_METHOD':'POST', 'CONTENT_TYPE':event['headers']['Content-Type'], })['file']

returns a list of FieldStorage objects, so you can do:

for f in fs:
    originalFileName = f.filename
    binaryFileData = f.file.read()

Altogether, my solution handles a single file, multiple files, and a body containing no file, and asserts that the request is multipart/form-data:

from cgi import parse_header, FieldStorage

# see also: https://stackoverflow.com/a/56405982/10913265
c_type, c_data = parse_header(event['headers']['Content-Type'])
assert c_type == 'multipart/form-data'

# see also: https://stackoverflow.com/a/38718958/10913265
fs = FieldStorage(fp=event['body'], headers=event['headers'], environ={'REQUEST_METHOD':'POST', 'CONTENT_TYPE':event['headers']['Content-Type'], })['file']

# If fs contains a single file or no file: wrap the FieldStorage object in a list so it is iterable
if not isinstance(fs, list):
    fs = [fs]

for f in fs:
    originalFileName = f.filename
    # no file:
    if originalFileName == '':
        continue
    binaryFileData = f.file.read()
    # Do something with the data
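
The cgi module used in Solutions 4 and 5 was deprecated in Python 3.11 and removed in Python 3.13. On newer interpreters, one alternative that needs no extra dependency is the standard library's email package. This is a different technique from the answers above, shown only as a sketch: parse_form_data is a hypothetical helper, and it assumes the body is already available as bytes and the Content-Type header value as a string.

```python
from email import message_from_bytes


def parse_form_data(body, content_type):
    """Return a list of (name, filename, content_bytes), one entry per part."""
    # The email parser expects headers before the body, so prepend the
    # request's Content-Type header to the raw multipart payload.
    prefix = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = message_from_bytes(prefix + body)
    if not message.is_multipart():
        raise ValueError("body is not a multipart message")

    fields = []
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        filename = part.get_filename()  # None for ordinary form fields
        fields.append((name, filename, part.get_payload(decode=True)))
    return fields


multipart_bytes = (
    b"--ce560532019a77d83195f9e9873e16a1\r\n"
    b'Content-Disposition: form-data; name="author"\r\n\r\n'
    b"John Smith\r\n"
    b"--ce560532019a77d83195f9e9873e16a1\r\n"
    b'Content-Disposition: form-data; name="file"; filename="example2.txt"\r\n'
    b"Content-Type: text/plain\r\nExpires: 0\r\n\r\n"
    b"Hello World\r\n"
    b"--ce560532019a77d83195f9e9873e16a1--\r\n"
)
fields = parse_form_data(
    multipart_bytes,
    "multipart/form-data; boundary=ce560532019a77d83195f9e9873e16a1",
)
```

Each tuple carries the form field name, the uploaded file name (if any), and the raw bytes, which covers the same cases as the FieldStorage loop above.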
