unpack requires a buffer of 16 bytes #274

akr1991 · 2022-07-06T12:12:28Z

In order to get your bug addressed in a timely manner, or at all 😃, please fill out the below bug report. Please try to make it as easy as possible for us to understand what is going on. We may close out any bugs or issues without warning that are not complete or coherent.

In the bug template below, anything in [square brackets] should be filled out or removed if the item doesn't apply.

Should you encounter an error that has not already been reported, please do the following when reporting it:
Bug Metadata

Version of extract_msg: [0.34.3]
Your python version: Python [3.6.7]
How did you launch extract_msg?
- My command line or
- I used the extract_msg package

Describe the bug
I have hundreds of emails from which I want to extract the message details, but for some of them I encounter the error below:
struct.error: unpack requires a buffer of 16 bytes

[ If applicable ]
What code did you use or can we use to reproduce this error?

I ran the command below from the command line.
I cannot share the email file because of compliance concerns, but I can share its size, which is 55 KB.
I have also observed that some emails larger than 100 KB are extracted successfully, so I don't think this is related to email size.

python -m extract_msg "error-email.msg"

Is there a message.msg file you want to share to help us reproduce this?

Uploaded message (drag and drop on this window)
Emailed message as an attachment to admins: [Enter Subject Line Here]

Traceback

[Put your traceback here]

Screenshots

Additional context
[Add any other context about the problem here.]


TheElementalOfDestruction · 2022-07-06T13:25:50Z

Yeah, this isn't related to the size of the msg file. Looks like something went wrong with the main properties stream, most likely because it's misaligned with what was expected.

Let's confirm what the problem is by simply logging the size of the properties stream before it errors so we know why it broke. Unfortunately you'll have to edit one of the files for this test, but you can revert the change immediately after. In your traceback there is a path for properties.py. On line 54 you'll see

streams = divide(self.__stream[skip:], 16)

Insert the following line after that, right before the for loop:

logger.warning(len(self.__stream))

When you run extract_msg again, this will add a log message immediately before the traceback that contains a number. Let me know what that number is.

Thanks

akr1991 · 2022-07-07T07:42:09Z

I added the _logger.warning(len(self._stream)) code right before the for loop and below is the output:
2022-07-07 13:07:17,855 - extract_msg.properties - WARNING - 628

TheElementalOfDestruction · 2022-07-07T07:44:09Z

Yep, the alignment was off. For a message it should be divisible by 16 but yours was only divisible by 8. I'll check to see if I got the details wrong but I believe my implementation was right.

TheElementalOfDestruction · 2022-07-07T08:01:29Z

Confirmed, it's parsing the header correctly. Looks like the data in your msg file is blatantly malformed, and I don't know why.

Can you tell me anything about it like what program made it and if outlook can open it properly?

akr1991 · 2022-07-07T08:06:52Z

It is an email chain conversation between our executive and a client. It also opens properly in Outlook.

TheElementalOfDestruction · 2022-07-07T08:11:43Z

Did outlook make the file?

Anyways, you should probably change that log to output the stream itself instead of the size and send that. The properties stream doesn't contain sensitive info. The most it has is random date properties. I need to see what format it is using and why.

Also, to confirm, the number for the log, did that print more than once for the email or did it error immediately after the first log?

akr1991 · 2022-07-07T08:55:51Z

Yes, the file was made by Outlook.

Please find the stream output below:
2022-07-07 14:08:48,600 - extract_msg.properties - WARNING - b'\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x01\xff\x0f\x06\x00\x00\x00H\x00\x00\x00\x00\x00\x00\x00\x02\x01\xf6\x0f\x06\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x03\x00\r4\x02\x00\x00\x008\x00\x05\x00\x00\x00\x00\x00\x03\x00\x0f4\x02\x00\x00\x008\x00\x05\x00\x00\x00\x00\x00\x0b\x00\x02\x00\x02\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x0b\x00\x1b\x0e\x06\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00\xde?\x06\x00\x00\x00\xe9\xfd\x00\x00\x00\x00\x00\x00\x02\x01\x13\x10\x06\x00\x00\x00\xad\xb8\x00\x00\x00\x00\x00\x00\x0b\x00\x1f\x0e\x06\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00@\x00\x06\x0e\x06\x00\x00\x00\x00\x9f\xc7\x8e\xcdI\xd8\x01@\x009\x00\x06\x00\x00\x00\x00\x9f\xc7\x8e\xcdI\xd8\x01\x03\x00\xf4\x0f\x06\x00\x00\x00\x07\x00\x00\x00\x00\x00\x00\x00\x03\x00\xf7\x0f\x06\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x03\x00\xfe\x0f\x06\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x1f\x007\x00\x06\x00\x00\x00\xb8\x00\x00\x00\x00\x00\x00\x00\x1f\x00\x1d\x0e\x06\x00\x00\x00\xb0\x00\x00\x00\x00\x00\x00\x00\x1f\x00=\x00\x06\x00\x00\x00\n\x00\x00\x00\x00\x00\x00\x00\x1f\x00p\x00\x06\x00\x00\x00\xb0\x00\x00\x00\x00\x00\x00\x00@\x00\x070\x06\x00\x00\x00\x17\xb8\xdf\x97\xcdI\xd8\x01@\x00\x080\x06\x00\x00\x00\x17\xb8\xdf\x97\xcdI\xd8\x01\x03\x00&\x00\x06\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x03\x00\x17\x00\x06\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x03\x00\x08?\x06\x00\x00\x00\t\x04\x00\x00\x00\x00\x00\x00\x03\x00\x07\x0e\x06\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x80\x10\x06\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x1f\x00#@\x06\x00\x00\x004\x00\x00\x00\x00\x00\x00\x00\x1f\x008@\x06\x00\x00\x004\x00\x00\x00\x00\x00\x00\x00\x1f\x00"@\x06\x00\x00\x00\n\x00\x00\x00\x00\x00\x00\x00\x02\x01\x19\x0c\x06\x00\x00\x00\x8a\x00\x00\x00\x00\x00\x00\x00\x1f\x00\x1f\x0c\x06\x00\x00\x004\x00\x00\x00\x00\x00\x00\x00\x1f\x00\x1a\x0c\x06\x00\x00\x004\x00\x00\x00\x00\x00\x00\x00\x1f\x00\x1e\x0c\x06\x00\x00\x00\n\x00\x00\x00\x00\x00\x00\x00\x1f\x00\x04\x0e\x02\x00\x00\x00<\x00\x00\x00\x00\x00\x00\x00\x1f\x00\x03\x0e\x02\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x1f\x00\x02\x0e\x02\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x1f\x00\x1a\x00\x06\x00\x00\x00\x12\x00\x00\x00\x00\x00\x00\x00\x03\x00\x08\x0e\x06\x00\x00\x00y\xbe\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

The log number was printed only once. A snapshot is attached. I added logging for both the size and the stream output.
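For readers following along, here is a minimal sketch that decodes the last complete 16-byte entry of the dump. It assumes the fixed-length property layout described in the MS-OXMSG specification (2-byte type, 2-byte property ID, 4-byte flags, 8-byte value, all little-endian). `decode_entry` and its field names are hypothetical and are not part of extract_msg.

```python
import struct

# Last complete 16-byte entry copied from the stream dump above.
ENTRY = b'\x03\x00\x08\x0e\x06\x00\x00\x00y\xbe\x00\x00\x00\x00\x00\x00'


def decode_entry(entry: bytes) -> dict:
    """Split one fixed-length property entry into its fields (assumed layout)."""
    prop_type, prop_id, flags, raw_value = struct.unpack('<HHI8s', entry)
    return {
        'type': prop_type,
        'id': prop_id,
        'flags': flags,
        'raw_value': raw_value,
    }


fields = decode_entry(ENTRY)

# Type 0x0003 is a 32-bit integer, so only the first 4 value bytes matter.
value = struct.unpack('<I', fields['raw_value'][:4])[0]

print(f"type=0x{fields['type']:04X} id=0x{fields['id']:04X} "
      f"flags={fields['flags']} value={value}")
```

This prints `type=0x0003 id=0x0E08 flags=6 value=48761`.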

TheElementalOfDestruction · 2022-07-07T09:06:10Z

Sorry, apparently I need to make a correction because I screwed up. 628 actually aligns to 4 bytes, not 8 or 16, making this file weird as all heck.

I actually checked it manually, and I can see that it isn't misaligned (the properties are exactly where they should be, the header is valid, etc.). It just, for whatever reason, has 4 extra null bytes at the end. I'm looking into what might cause this and whether the standard considers it acceptable, so I know how best to handle it.
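The arithmetic behind this can be checked in a few lines. The 628 comes from the log output earlier in the thread. The 32-byte header size is an assumption taken from the MS-OXMSG layout for a top-level message, not something stated in this thread.

```python
STREAM_LEN = 628  # size reported in the log output earlier in this thread
HEADER_LEN = 32   # assumption: header size for a top-level message
ENTRY_LEN = 16    # each property entry is 16 bytes

# Remainders when checking alignment to 16, 8, and 4 bytes.
remainders = {n: STREAM_LEN % n for n in (16, 8, 4)}
print(remainders)

# Whole entries after the header, plus any leftover bytes.
entries, leftover = divmod(STREAM_LEN - HEADER_LEN, ENTRY_LEN)
print(entries, leftover)
```

This prints `{16: 4, 8: 4, 4: 0}` and then `37 4`: under that assumption the stream holds 37 complete entries followed by 4 stray bytes.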

TheElementalOfDestruction · 2022-07-07T09:13:53Z

Nothing is mentioned in the docs, so my guess is that, because everything is aligned properly, Outlook manages to read the properties, fails silently on the trailing bytes, and by then has already parsed all the data it needs, so everything looks fine. So that's what I'll do: I'll add a check to make sure each chunk is 16 bytes, and if it isn't then I'll just pretend it doesn't exist. I'll bundle this fix into 0.35.0, which is pretty close to being done and has a lot of improvements and bug fixes.

No idea why outlook did this tbh, and I'd actually recommend you try to report it to Microsoft as it seems like a bug in outlook.

TheElementalOfDestruction · 2022-07-07T09:20:25Z

If you want to have a fix immediately, you can replace the following lines:

for st in streams:
    prop = createProp(st)
    self.__props[prop.name] = prop

With this:

for st in streams:
    if len(st) == 16:
        prop = createProp(st)
        self.__props[prop.name] = prop
    else:
        logger.warning(f'Found stream from divide that was not 16 bytes: {st}. Ignoring.')
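The effect of that length check can be seen in isolation with the sketch below. `divide` here is a hypothetical stand-in for the library's helper, and the `'<HHI8s'` format is only an illustrative 16-byte layout, not necessarily what `createProp` uses internally.

```python
import struct

ENTRY_FORMAT = '<HHI8s'  # illustrative 16-byte layout, not the library's parser


def divide(data: bytes, size: int) -> list:
    """Hypothetical stand-in: split data into size-byte chunks (last may be short)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def parse_entries(data: bytes, guard: bool) -> list:
    """Unpack every chunk; with guard=True, skip chunks that are not 16 bytes."""
    parsed = []
    for chunk in divide(data, 16):
        if guard and len(chunk) != 16:
            continue  # ignore the trailing partial chunk
        parsed.append(struct.unpack(ENTRY_FORMAT, chunk))
    return parsed


# Two complete entries followed by 4 stray null bytes.
data = bytes(32) + bytes(4)

try:
    parse_entries(data, guard=False)
except struct.error as exc:
    print('without guard:', exc)

print('with guard:', len(parse_entries(data, guard=True)), 'entries')
```

Without the guard, the 4-byte tail reaches `struct.unpack` and raises `struct.error`. With it, the two complete entries are parsed and the tail is skipped.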

akr1991 · 2022-07-07T16:05:53Z

Thanks for sharing the fix.
I added this fix to properties.py. The error is gone now, but the email is not extracted correctly.
It extracted only the 7 lines below, not the body of the email:
From:
To:
Cc:
Bcc:
Subject:
Date:
---------------

TheElementalOfDestruction · 2022-07-07T21:52:03Z

Odd. I'd like to turn on the debug logging and have you send me a copy of the set of log messages. To do this from the command line, simply add --verbose as an option somewhere and it will print out a lot more messages. Of course, I recommend you take a cursory glance at it to strip any sensitive information it might have before sending it, but I don't think there should be any. These log messages will tell me a lot about the structure of the file and what the module was trying to access that it couldn't find (as it looks like the body and header properties were not found at all).

In addition, there are 2 other things I would like to check. The first is whether using a save type other than the default (I would recommend either RTF or HTML) causes data to show up at all (I just need to know if it does, not the full details of the data). The second is which fields of the header (things like To, From, etc.) appear at the top when you open it in Outlook and go to the print preview. I don't need the data in those fields, just which ones are shown. If Outlook shows a field, it means it has accessible data that the module is failing to access.

Thanks

akr1991 · 2022-07-08T06:55:51Z

Output from --verbose log

Also, as requested:

- I saved the file as HTML, and everything shows up in the HTML file.
- I opened the file in Outlook and went to print preview, and everything shows up there too. A snapshot is attached.

In the earlier messages of the email trail I see an image link that is not showing up and appears as below. Could this cause any issue?

TheElementalOfDestruction · 2022-07-08T07:06:18Z

Unlikely that that caused any issue. To be clear, the html contained the header that looked correct?

Additionally, to be clear, was the header section of the output from extract-msg populated with the actual data when you saved plain text?

Also, I see why your output looked so bad. Two streams were completely absent from the file: plain text body and compressed RTF body. If the plain text body isn't found, the program may try to generate it from the RTF if possible. But the RTF body wasn't there. As such, plain text just doesn't output anything.

akr1991 · 2022-07-08T07:40:14Z

> Unlikely that that caused any issue. To be clear, the html contained the header that looked correct?

Yes

> Additionally, to be clear, was the header section of the output from extract-msg populated with the actual data when you saved plain text?

Yes

TheElementalOfDestruction · 2022-07-08T07:43:00Z

Alright, I misunderstood the issue a bit. I thought the header just contained the field names but no data.

Yeah, just a case of no plain text body being available and no current method for extracting plain text out of the HTML. In addition, I've added a bit of code in the last commit that will improve the error handling for such a scenario where the body stream doesn't exist and can't be generated.
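As a rough illustration of what extracting plain text from an HTML body could look like, here is a sketch that uses only the standard library's `html.parser`. It is not how extract_msg handles this. `html_to_text` and `_TextExtractor` are hypothetical names, and the tag lists are arbitrary choices for the example.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping head/script/style content."""

    SKIPPED = {'script', 'style', 'head'}
    BLOCK = {'p', 'div', 'br', 'tr', 'li'}

    def __init__(self):
        super().__init__()
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._parts.append('\n')  # start block-level content on a new line

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self) -> str:
        lines = (line.strip() for line in ''.join(self._parts).splitlines())
        return '\n'.join(line for line in lines if line)


def html_to_text(html: str) -> str:
    """Return a crude plain-text rendering of an HTML document."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = ('<html><head><style>p{}</style></head>'
          '<body><p>Hello</p><p>World</p></body></html>')
print(html_to_text(sample))
```

For the sample input this prints `Hello` and `World` on separate lines. Real email HTML (nested tables, quoted replies, entities in attributes) would need considerably more care.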

akr1991 · 2022-07-08T14:17:08Z

Can we expect a fix for this issue in the upcoming release, or will the fix be limited to improved error handling?

TheElementalOfDestruction · 2022-07-08T17:23:34Z

The fix for the properties stream is there, as well as better error handling. Aside from that, nothing else. Adding plain text generation from the HTML, once I figure out the best way to do it, will be easy, as only MessageBase actually needs to be changed and then all of the saveable classes that use a body will be updated with that code.

For better tracking, I recommend making that a specific feature request as it is separate from the original issue of this post.

TheElementalOfDestruction · 2022-07-08T23:02:47Z

Next release now contains what may be the finalized code for version 0.35.0 if you would like to try that out and see if it works properly. I think everything should be working correctly, I'm just still running some tests on it to make sure everything is in working order.

TheElementalOfDestruction · 2022-07-11T21:56:35Z

All of the fixes for this are now done in 0.35.0. I created a new feature request for generating the plain text body from the HTML body where possible, #278. Let me know if the main bug from this was not resolved.

TheElementalOfDestruction added the In Progress label Jul 7, 2022

TheElementalOfDestruction added a commit that referenced this issue Jul 7, 2022

Protection for #274

aad7a54

TheElementalOfDestruction added a commit that referenced this issue Jul 8, 2022

Another improvement based on #274. Reverted last commit.

765afb8

TheElementalOfDestruction mentioned this issue Jul 11, 2022

Version 0.35.0 #276

Merged

TheElementalOfDestruction closed this as completed Jul 11, 2022
