Cobb Tracker

Tool that pulls meeting minutes for the local governments in Cobb County.

The Problem

Each city in Cobb County, and the county itself, stores the minutes they hold for meetings in different ways and the data is also presented very differently. Almost every city in the county uses a different provider for the websites where they present this information to the public.

This makes it hard for average citizens and journalists to look through all of this info. There's no way to search for things you care about or follow up on events that took place without just flat out remembering or manually searching by looking through a bunch of PDFs that may or may not have searchable text.

A Solution

This tool allows us to take a bunch of PDF file links, feed them into a "downloader" and then convert those PDFs into text that are put into an sqlite3 database. The design allows websites to be pluggable.

Take for example Marietta's website and Smyrna's website. They both are run on completely different platforms, but if we can get all of the links to all of the minutes files they have available we can treat them like the same site.

The PDFs are then converted into plain text with Tesseract OCR, and stored in an SQLite database. You can then search through this database with key terms that appear in the minutes text, filter for certain dates, municipalities and types of meetings.

We're currently using Datasette to present this information, you can see it here

Requirements

Python 3.11
Tesseract
Linux or MacOS (May work in WSL).
Docker

Installation

git clone https://github.com/ABetterCobb/cobb-tracker.git
cd cobb-tracker
poetry install && poetry shell

Usage

usage: cobb-tracker [-h] [-m MUNICIPALITY] [-p] [-a] [-f] [-v]

options:
  -h, --help            show this help message and exit
  -m MUNICIPALITY, --municipality MUNICIPALITY
                        The city that you want to download minutes for. This
                        includes Cobb.
  -p, --push-to-database
                        The existing minutes that you have will be converted
                        to text and pushed to a database
  -a, --pull-all-cities
                        All cities will have their minutes downloaded
  -f, --force           Force rewriting of minutes files that already exist
  -v, --verbose         More information will be printed

Current Limitations

This program is in early stages, there are a few things that are not yet implemented or may never be.

Laserfische WebLink

Laserfische WebLink doesn't have permanent links to files, and instead download links are generated upon user request. In order for Laserfische to be scraped we essentially need to implemented a way to "walk down" the psuedo file system they have with a persistent user session.

PDFs need to have proper metadata associated with them

If for a given PDF and there is no date data, no meeting data, etc, you will have to make up data or the PDFs will not be unique when they are written to the filesystem.

Authors

Sam Foster
Tyler Bigler

License

cobb-tracker is licensed under GPL 3.0.

Acknowledgments

Philip James for presenting the core idea in this video

Name		Name	Last commit message	Last commit date
Latest commit History 134 Commits
.github/workflows		.github/workflows
docker		docker
src/cobb_tracker		src/cobb_tracker
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Cobb Tracker

The Problem

A Solution

Requirements

Installation

Usage

Current Limitations

Laserfische WebLink

PDFs need to have proper metadata associated with them

Authors

License

Acknowledgments

About

Releases

Packages

Languages

License

tyler-8/cobb-tracker

Folders and files

Latest commit

History

Repository files navigation

Cobb Tracker

The Problem

A Solution

Requirements

Installation

Usage

Current Limitations

Laserfische WebLink

PDFs need to have proper metadata associated with them

Authors

License

Acknowledgments

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages