GitHub - eliask/pdfssa4met: PDF Structure and Syntactic Analysis for Metadata Extraction and Tagging

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
LICENSE.txt		LICENSE.txt
README.txt		README.txt
config.py		config.py
headings.py		headings.py
openCalais.py		openCalais.py
pdf2xml.py		pdf2xml.py
references.py		references.py
socialtags.py		socialtags.py
utils.py		utils.py

Repository files navigation

===============================================================================
PDF Structure and Syntactic Analysis for Metadata Extraction and Tagging
===============================================================================

PDFSSA4MET attempts to provide metadata extraction and tagging based on
structural and syntactic analysis of content in XML.

PDFSSA4MET depends on pdf2xml which is available in binary form at
https://github.com/eliask/pdf2xml/tags. The master branch of the
repository also contains the source.

PDFSSA4MET is written in Python.

The following scripts can be used directly to output results to STDOUT:
pdf2xml.py
headings.py
references.py
socialtags.py

Help, options and arguments for each are available using -h or --help option
e.g.
python references.py --help