Merge pull request #276 from TeamMsgExtractor/next-release
Version 0.35.0
TheElementalOfDestruction authored Jul 11, 2022
2 parents f493f36 + 53d915e commit 0b58ed4
Showing 51 changed files with 7,522 additions and 1,553 deletions.
1 change: 1 addition & 0 deletions .github/FUNDING.yml
@@ -0,0 +1 @@
custom: ['https://www.buymeacoffee.com/DestructionE']
54 changes: 54 additions & 0 deletions CHANGELOG.md
@@ -1,8 +1,61 @@
**v0.35.0**
* [[TeamMsgExtractor #206](https://github.com/TeamMsgExtractor/msg-extractor/issues/206)] Implemented full support for Post objects, including the ability to save them.
* [[TeamMsgExtractor #212](https://github.com/TeamMsgExtractor/msg-extractor/issues/212)] Implemented full support for Task objects, including the ability to save them. This also includes TaskRequest objects.
* [[TeamMsgExtractor #110](https://github.com/TeamMsgExtractor/msg-extractor/issues/110)] Implemented full support for Contact objects, including the ability to save them.
* [[TeamMsgExtractor #143](https://github.com/TeamMsgExtractor/msg-extractor/issues/143)] Rewrote the system used for `Appointment` objects to include all of the objects specified in [MS-OXOCAL]. Name changed to `AppointmentMeeting`. Completed support for Appointment objects, including the ability to save them.
* [[TeamMsgExtractor #243](https://github.com/TeamMsgExtractor/msg-extractor/issues/256)] Added optional dependency `mimetype-magic` (installable using the `mime` extra) which helps to identify attachments that do not give a mime-type.
* [[TeamMsgExtractor #274](https://github.com/TeamMsgExtractor/msg-extractor/issues/274)] Apparently the properties stream can have random garbage at the end of it (Outlook generated the file that showed this), so code was added to ensure it wouldn't break everything.
* [[TeamMsgExtractor #274](https://github.com/TeamMsgExtractor/msg-extractor/issues/274)] Made sure that the save function would report if it failed to find or generate a valid body. Specifically, if you were just trying to use the plain text body but it didn't exist (the stream didn't exist, not that the stream was empty) it would silently pass, which was bad behavior. Additionally, `allowFallback` will change the message to specify that current options were not usable for getting a valid body.
* [[TeamMsgExtractor #207](https://github.com/TeamMsgExtractor/msg-extractor/issues/207)] Changed behavior for max date. It appears that it is supposed to be August 31, 4500 at 11:59 PM. However, in case this needs to change, we have created a constant called `extract_msg.constants.NULL_DATE` to represent it, which you can use so that you do not have to change your own code if we change it.
* Moved a few more minor constants to `enums`.
* Added support for many internal data structures, specifically Entry ID structures.
* Refactored classes from `extract_msg.data` to submodule `extract_msg.structures`.
* Added `python_requires` to setup.py as I noticed that it was missing.
* Due to new saving requirements, adjusted the way header injection worked all around. Functions are now built-in to `MessageBase`. `getSaveXBody` functions have also been moved down to be defined in `MessageBase`. If the extension class needs to specify custom behaviors for creating the save bodies, these functions will need to be overridden.
* For saving, `MessageBase` (being the lowest one to currently contain bodies) has a few new properties. These properties represent the injection strings that will be injected into the bodies for the header, with an additional property to specify what properties map to what part of the format string. See `MessageBase.headerFormatProperties` for more information and an example of how to implement this in your own class.
* Injection strings in constants have been removed in favor of dynamic generation, which only creates what is needed. You will no longer see an empty Bcc field in your messages when you save them.
* Plain text bodies now also use this injection, making it easy to change the header in all bodies by overwriting a single property that tells the program what data to put where.
* Fixed issue in encapsulated RTF header that caused the "To" field to not be present. I had to write them by hand, so it was bound to happen. They are now dynamically generated by each instance, so these fields should always appear.
* All save code has been moved down from `Message` into `MessageBase` for convenience. `Message` exists now for specific checking and for future specializations. This also means that anything that is a `MessageBase` now has the entire framework for saving built-in, with an easy way to change details.
* Fixed bad property in `Contact`.
* Created save function for `Contact`. Saving, though it exists, is rather minimal and is limited to plain text and HTML.
* Significantly extended the `Contact` class's properties.
* Adjusted the naming of a few `Contact` properties to better match the Microsoft names.
    * `firstName` -> `givenName`.
    * `lastName` -> `surname`.
    * `businessPhone` -> `businessTelephoneNumber`.
    * Etc.
* Changed existing fax properties to give a dictionary of the properties they actually contain rather than just the number. This makes them behave like the newly added email properties.
* Fixed issues with `Task` properties being incorrect.
* Added implementation for PtypErrorCode.
* Changed behavior of `Properties.date` to *only* return the submit time. This is to ensure messages that were never sent do not have a sent date.
* Changed behavior of `MessageBase.date` to only return a send date if the message has been sent. For messages with no flags, it assumes `True`.
* Generally brought saving behavior closer to the way Outlook handles it.
* Made `SignedAttachment` and `BaseAttachment` more similar by adding properties to each that are shared. `BaseAttachment` now has a `name` property and `SignedAttachment` now has `longFilename` and `shortFilename`.
* Fixed issue in HTML saving that would cause some characters to be dropped when rendering them due to how the header injection worked.
* Removed `__init__` methods from MSG classes that don't change it. This ensures notes are easily passed down.
* Correction to last comment, *one* max date was supposed to be at that date, but another max date is at a different date of the same year.
* Changed the way that `PtypTime` is handled, making it a single function in `utils`.
* Upgraded dependencies to newer versions (some really need to be newer, like `tzlocal`, for best results). Included dependencies are `beautifulsoup4` and `tzlocal`.
* Fixed an issue where zip file naming conflicts *always* failed in zip files for attachments. Both the embedded msg and the plain attachments would fail.
* *Actually* fixed the issue that would break the main loop.
* MSGFile no longer inherits directly from `OleFileIO`. While I would prefer to do that, the `__init__` method for it is rather expensive, and allowing embedded msg files to directly share each other's instances of `OleFileIO` would improve speed immensely.
* Fixed attachments not being preemptively loaded when `delayAttachments` was `False`.
* `utils.openMsg` now delays attachments while loading the file to get the class type. This means all time for attachments is cut in half as they are only ever loaded once. It also means that files that won't open due to attachments will error a little later, but this shouldn't be a problem.
* Changed named properties `Guid` back to constants. This has to do with the next entry.
* Fixed a major issue in named properties. Apparently the ID is not enough and you *must* have the GUID as well, as multiple properties can share the exact same ID.
* Added option `--no-folders` to the command line allowing you to save all attachments from a set of MSG files into a single folder.
* Added option `--skip-embedded` to the command line to skip saving embedded MSG files.
* Added option `skipEmbedded` to `Attachment.save` (and all other related save methods that call it) to skip saving an embedded MSG file.
* Changed `__main__` so that it opens the zip file there instead of relying on everything it calls to do it again and again.
* Changed the behavior of `--verbose` to allow it to be stacked for more verboseness. Specifying it once turns on warnings, twice for info, and three times for debug. Not specifying it only turns on error logging.
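
The stacked `--verbose` behavior described in the last entry can be illustrated with a minimal sketch using only the standard library. This is not the code in `extract_msg.__main__`; the names `LEVELS`, `build_parser`, and `level_for` are hypothetical, and the sketch only assumes the mapping stated above (no flag: errors, one: warnings, two: info, three: debug).

```python
import argparse
import logging

# Hypothetical mapping from the number of --verbose flags to a logging level,
# following the order given in the changelog entry.
LEVELS = [logging.ERROR, logging.WARNING, logging.INFO, logging.DEBUG]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog='verbose_demo')
    # action='count' increments the stored value each time the flag appears.
    parser.add_argument('--verbose', action='count', default=0)
    return parser


def level_for(count: int) -> int:
    # Clamp the index so that more than three flags still means DEBUG.
    return LEVELS[min(count, len(LEVELS) - 1)]


# Two flags select the INFO level under this mapping.
args = build_parser().parse_args(['--verbose', '--verbose'])
level = level_for(args.verbose)
```

Using a counter rather than separate flags keeps the command line compatible with the older single `--verbose` usage, since one occurrence still enables console logging.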

**v0.34.3**
* Fixed issue that may have caused other olefile types to raise the wrong type of error when passed to `openMsg`.
* Fixed issues with changelog format.
* Fixed issue that caused progress to sometimes break the main loop when a file had Unicode characters if the console it was writing to didn't support them.
* Added option to `MessageBase` (and subsequently `openMsg`) that allows you to override the code being used for deencapsulation. See `MessageBase.__init__` for details on how to create an override function.
* Added additional msg class types that I found to the list of known class types.
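
The Unicode progress issue mentioned above is a common failure mode when writing file names to a console with a limited encoding. The following is a generic sketch of one way to avoid it, not the fix that was applied in this release; `safe_write` is a hypothetical helper, and the ASCII-only console is simulated with `io.TextIOWrapper`.

```python
import io


def safe_write(stream, text: str) -> None:
    """Write text, replacing characters the stream's encoding cannot represent."""
    encoding = getattr(stream, 'encoding', None) or 'utf-8'
    # Round-trip through the target encoding so unsupported characters
    # become '?' instead of raising UnicodeEncodeError.
    stream.write(text.encode(encoding, errors='replace').decode(encoding))


# Simulate a console that only supports ASCII. newline='' disables
# platform-specific newline translation so the bytes are predictable.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding='ascii', newline='')
safe_write(console, 'Processing résumé.msg\n')
console.flush()
```

Writing the same string directly with `console.write` would raise `UnicodeEncodeError`, which is the kind of exception that can interrupt a processing loop if it is not handled.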

**v0.34.2**
* [[TeamMsgExtractor #267](https://github.com/TeamMsgExtractor/msg-extractor/issues/267)] Fixed issue that caused signed messages that were .eml files to have their data field *not* be a bytes instance. This field will now *always* be bytes. If a problem making it bytes occurs, an exception will be raised to give you brief details.
@@ -45,6 +98,7 @@
* Actually removed the exception I meant to remove in 0.31.0.
* Changed `Attachment.type` to an enum instead of a string. This makes it easier to see all of the possible values.
* Added option to allow attachment saving to be skipped when calling `Message.save`.
* Added `type` property to all attachment types. `AttachmentBase` uses `AttachmentType.UNKNOWN`, `BrokenAttachment` uses `AttachmentType.BROKEN`, and `UnsupportedAttachment` uses `AttachmentType.UNSUPPORTED`.
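
To show why an enum makes the possible values easier to inspect than a string, here is a small stand-in sketch. The class below is hypothetical and is not the library's definition: only the member names `UNKNOWN`, `BROKEN`, and `UNSUPPORTED` come from the entries above, and the real `AttachmentType` may define other members and values. The helper `is_problematic` is likewise an assumption made for illustration.

```python
import enum


class AttachmentType(enum.Enum):
    # Hypothetical stand-in; values are arbitrary and chosen by enum.auto().
    UNKNOWN = enum.auto()
    BROKEN = enum.auto()
    UNSUPPORTED = enum.auto()


def is_problematic(att_type: AttachmentType) -> bool:
    # Identity-based membership is safe because enum members are singletons.
    return att_type in (AttachmentType.BROKEN, AttachmentType.UNSUPPORTED)


# Unlike free-form strings, every possible value can be listed directly.
all_names = [member.name for member in AttachmentType]
```

A misspelled member such as `AttachmentType.BROKE` raises `AttributeError` immediately, whereas a misspelled string comparison would silently evaluate to `False`.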

**v0.31.1**
* Updated signed attachment mimetype property from `mime` to `mimetype` to match with the regular attachment property.
71 changes: 36 additions & 35 deletions README.rst
@@ -54,45 +54,47 @@ refer to the usage information provided from the program's help dialog:

usage: extract_msg [-h] [--use-content-id] [--dev] [--validate] [--json] [--file-logging] [--verbose] [--log LOG] [--config CONFIGPATH] [--out OUTPATH] [--use-filename]
[--dump-stdout] [--html] [--pdf] [--wk-path WKPATH] [--wk-options [WKOPTIONS ...]] [--prepared-html] [--charset CHARSET] [--raw] [--rtf]
[--allow-fallback] [--zip ZIP] [--attachments-only] [--out-name OUTNAME | --glob] [--ignore-rtfde] [--progress]
[--allow-fallback] [--zip ZIP] [--attachments-only] [--no-folders] [--skip-embedded] [--out-name OUTNAME | --glob] [--ignore-rtfde] [--progress]
msg [msg ...]

extract_msg: Extracts emails and attachments saved in Microsoft Outlook's .msg files. https://github.com/TeamMsgExtractor/msg-extractor

positional arguments:
msg An MSG file to be parsed.
msg An MSG file to be parsed.

optional arguments:
-h, --help show this help message and exit
--use-content-id, --cid
Save attachments by their Content ID, if they have one. Useful when working with the HTML body.
--dev Changes to use developer mode. Automatically enables the --verbose flag. Takes precedence over the --validate flag.
--validate Turns on file validation mode. Turns off regular file output.
--json Changes to write output files as json.
--file-logging Enables file logging. Implies --verbose.
--verbose Turns on console logging.
--log LOG Set the path to write the file log to.
--config CONFIGPATH Set the path to load the logging config from.
--out OUTPATH Set the folder to use for the program output. (Default: Current directory)
--use-filename Sets whether the name of each output is based on the msg filename.
--dump-stdout Tells the program to dump the message body (plain text) to stdout. Overrides saving arguments.
--html Sets whether the output should be HTML. If this is not possible, will error.
--pdf Saves the body as a PDF. If this is not possible, will error.
--wk-path WKPATH Overrides the path for finding wkhtmltopdf.
--wk-options [WKOPTIONS ...]
Sets additional options to be used in wkhtmltopdf. Should be a series of options and values, replacing the - or -- in the beginning with + or ++,
respectively. For example: --wk-options "+O Landscape"
--prepared-html When used in conjunction with --html, sets whether the HTML output should be prepared for embedded attachments.
--charset CHARSET Character set to use for the prepared HTML in the added tag. (Default: utf-8)
--raw Sets whether the output should be raw. If this is not possible, will error.
--rtf Sets whether the output should be RTF. If this is not possible, will error.
--allow-fallback Tells the program to fallback to a different save type if the selected one is not possible.
--zip ZIP Path to use for saving to a zip file.
--attachments-only Specify to only save attachments from an msg file.
--out-name OUTNAME Name to be used with saving the file output. Cannot be used if you are saving more than one file.
--glob, --wildcard Interpret all paths as having wildcards. Incompatible with --out-name.
--ignore-rtfde Ignores all errors thrown from RTFDE when trying to save. Useful for allowing fallback to continue when an exception happens.
--progress Shows what file the program is currently working on during it's progress.
-h, --help show this help message and exit
--use-content-id, --cid
Save attachments by their Content ID, if they have one. Useful when working with the HTML body.
--dev Changes to use developer mode. Automatically enables the --verbose flag. Takes precedence over the --validate flag.
--validate Turns on file validation mode. Turns off regular file output.
--json Changes to write output files as json.
--file-logging Enables file logging. Implies --verbose level 1.
--verbose Turns on console logging.
--log LOG Set the path to write the file log to.
--config CONFIGPATH Set the path to load the logging config from.
--out OUTPATH Set the folder to use for the program output. (Default: Current directory)
--use-filename Sets whether the name of each output is based on the msg filename.
--dump-stdout Tells the program to dump the message body (plain text) to stdout. Overrides saving arguments.
--html Sets whether the output should be HTML. If this is not possible, will error.
--pdf Saves the body as a PDF. If this is not possible, will error.
--wk-path WKPATH Overrides the path for finding wkhtmltopdf.
--wk-options [WKOPTIONS ...]
Sets additional options to be used in wkhtmltopdf. Should be a series of options and values, replacing the - or -- in the beginning with + or ++,
respectively. For example: --wk-options "+O Landscape"
--prepared-html When used in conjunction with --html, sets whether the HTML output should be prepared for embedded attachments.
--charset CHARSET Character set to use for the prepared HTML in the added tag. (Default: utf-8)
--raw Sets whether the output should be raw. If this is not possible, will error.
--rtf Sets whether the output should be RTF. If this is not possible, will error.
--allow-fallback Tells the program to fallback to a different save type if the selected one is not possible.
--zip ZIP Path to use for saving to a zip file.
--attachments-only Specify to only save attachments from an msg file.
--no-folders When used with --attachments-only, stores everything in the location specified by --out. Incompatible with --out-name.
--skip-embedded Skips all embedded MSG files when saving attachments.
--out-name OUTNAME Name to be used with saving the file output. Cannot be used if you are saving more than one file.
--glob, --wildcard Interpret all paths as having wildcards. Incompatible with --out-name.
--ignore-rtfde Ignores all errors thrown from RTFDE when trying to save. Useful for allowing fallback to continue when an exception happens.
--progress Shows what file the program is currently working on during it's progress.

**To use this in your own script**, start by using:

@@ -182,7 +184,6 @@ Here is a list of things that are currently on our todo list:
* Tests (ie. unittest)
* Finish writing a usage guide
* Improve the intelligence of the saving functions
* Improve handling of named properties
* Improve README
* Create a wiki for advanced usage information

@@ -219,8 +220,8 @@ your access to the newest major version of extract-msg.
.. |License: GPL v3| image:: https://img.shields.io/badge/License-GPLv3-blue.svg
:target: LICENSE.txt

.. |PyPI3| image:: https://img.shields.io/badge/pypi-0.34.3-blue.svg
:target: https://pypi.org/project/extract-msg/0.34.3/
.. |PyPI3| image:: https://img.shields.io/badge/pypi-0.35.0-blue.svg
:target: https://pypi.org/project/extract-msg/0.35.0/

.. |PyPI2| image:: https://img.shields.io/badge/python-3.6+-brightgreen.svg
:target: https://www.python.org/downloads/release/python-367/
12 changes: 8 additions & 4 deletions extract_msg/__init__.py
@@ -1,4 +1,4 @@
#!/usr/bin/env python
#!/usr/bin/env python3
# -*- coding: latin-1 -*-
# Date Format: YYYY-MM-DD

@@ -27,21 +27,25 @@
# along with this program. If not, see <http://www.gnu.org/licenses/>.

__author__ = 'Destiny Peterson & Matthew Walker'
__date__ = '2022-06-15'
__version__ = '0.34.3'
__date__ = '2022-07-11'
__version__ = '0.35.0'

import logging

from . import constants, enums
from .appointment import Appointment
from .appointment import AppointmentMeeting
from .attachment import Attachment
from .contact import Contact
from .exceptions import UnrecognizedMSGTypeError
from .meeting_forward import MeetingForwardNotification
from .meeting_request import MeetingRequest
from .meeting_response import MeetingResponse
from .message import Message
from .message_base import MessageBase
from .message_signed import MessageSigned
from .message_signed_base import MessageSignedBase
from .msg import MSGFile
from .post import Post
from .prop import createProp
from .properties import Properties
from .recipient import Recipient