linkedin2md

This repository shows a minimalist approach to building a simple resume/profile database that stores structured profile information in Markdown format.

Basic concepts:

Resumes and profiles come from various sources, such as LinkedIn or job boards, and the input data arrives in various formats. To unify it, I took two formats as a basis.

  • The first format is the "original" PDF, for example one saved from LinkedIn.

Open a person's LinkedIn profile (Diana Ponomareva's profile is the example here):

[Screenshot: LinkedIn profile page]

and save it as the original PDF:

[Screenshot: saving the profile as a PDF]

  • The second format is JSON based on the unified jsonresume schema.

To get it from LinkedIn, I use the LinkedIn Profile to JSON Resume Browser Tool by Joshua Tzucker as a simple, versatile scraper.

[Screenshot: LinkedIn Profile to JSON Resume Browser Tool]

And for Diana:

[Screenshot: the tool applied to Diana's profile]

Using this extension, you can also get a person's contact card in VCF format.

The saved data then appears in my downloads folder:

[Screenshot: downloads folder with the saved files]
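For orientation, here is a minimal sketch of what a JSON Resume file contains and how to read it. It is written in Python rather than the bash and jq used by the repository scripts. The person and all values are made-up placeholders, and only a few of the fields defined by the jsonresume schema are shown.

```python
import json

# A made-up, heavily trimmed JSON Resume document (placeholder data, not a real profile).
SAMPLE_RESUME = """
{
  "basics": {
    "name": "Jane Doe",
    "label": "Data Engineer",
    "email": "jane@example.com",
    "profiles": [
      {"network": "LinkedIn", "username": "janedoe",
       "url": "https://www.linkedin.com/in/janedoe"}
    ]
  },
  "skills": [{"name": "Python"}, {"name": "SQL"}]
}
"""


def summarize(resume: dict) -> dict:
    """Pull out the handful of fields a profile database usually needs first."""
    basics = resume.get("basics", {})
    return {
        "name": basics.get("name", ""),
        "label": basics.get("label", ""),
        "email": basics.get("email", ""),
        "skills": [s.get("name", "") for s in resume.get("skills", [])],
    }


summary = summarize(json.loads(SAMPLE_RESUME))
print(summary["name"], "-", summary["label"])
```

Missing sections fall back to empty values, so a sparse export does not raise an error.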

Next, I use a few utilities to generate unified Markdown and PDF files for storage.

The solution is very simple. The HackMyResume utility creates polished resumes and CVs in multiple formats from your command line or shell. Author in clean Markdown and JSON, export to Word, HTML, PDF, LaTeX, plain text, and other arbitrary formats. Fight the power, save trees. It is compatible with FRESH and JRS resumes and can convert between the FRESH and JSON Resume formats.
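As a rough sketch of this step, the snippet below assembles a HackMyResume call of the form `hackmyresume build <input> to <outputs>`. It is written in Python for illustration. Whether `update.sh` invokes the tool in exactly this way is an assumption, the paths and the helper names `build_command` and `build_resume` are hypothetical, and PDF output typically needs an external PDF engine installed alongside HackMyResume.

```python
import subprocess
from pathlib import Path


def build_command(resume_json: str, out_dir: str, stem: str) -> list:
    """Return the argv for: hackmyresume build <input> to <stem>.md <stem>.pdf"""
    out = Path(out_dir)
    return [
        "hackmyresume", "build", resume_json,
        "to", str(out / f"{stem}.md"), str(out / f"{stem}.pdf"),
    ]


def build_resume(resume_json: str, out_dir: str, stem: str) -> None:
    """Run HackMyResume; it must be installed and available on PATH."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_command(resume_json, out_dir, stem), check=True)


# Only print the command here; call build_resume(...) to actually run it.
cmd = build_command("input/Jane_Doe.resume.json", "output/resume/Jane_Doe", "Jane_Doe")
print(" ".join(cmd))
```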

Run:

bash install.sh

to install the example bash scripts and their directory structure. The script uses the jq command-line JSON processor to parse JSON, prepares the directory structure for parsing, and installs all necessary dependencies.

This creates the project structure:

[Screenshot: project structure]

with input data:

[Screenshot: input data]

and output structure:

[Screenshot: output structure]

Then run the update script with your own path parameters:

sudo bash update.sh "/Volumes/Transcend/projects/linkedin2md/input" "/Volumes/Transcend/projects/linkedin2md/output" "/Volumes/Transcend/projects/linkedin2md/backup"

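You can follow the transformation of every profile in your terminal:

[Screenshot: transformation process in the terminal]

The script creates Diana's profile folder in /output/resume/:

[Screenshot: Diana's profile folder]

Diana's profile folder contains the following files:

[Screenshot: files in Diana's profile folder]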

  • Diana_Ponomareva.summary.md - generated summary Markdown file for manually added information
  • Diana_Ponomareva.md - resume generated in Markdown format
  • Diana_Ponomareva.pdf - resume generated in PDF format
  • Diana_Ponomareva.original.pdf - original resume PDF file
  • Diana_Ponomareva.vcf - contact VCF file
  • Diana_Ponomareva.resume.json - source JSON file
  • artifacts - folder for other manually added artifacts
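The naming convention in the list above can be written down as a small Python sketch. The assumptions here are that the folder is named after the person with spaces replaced by underscores and that it sits under `<output>/resume/`; the helper name `profile_layout` is hypothetical and is not part of the repository scripts.

```python
from pathlib import PurePosixPath

# File suffixes taken from the list above.
SUFFIXES = [".summary.md", ".md", ".pdf", ".original.pdf", ".vcf", ".resume.json"]


def profile_layout(full_name: str, output_root: str = "output") -> list:
    """List the paths expected for one profile, plus its artifacts folder."""
    stem = "_".join(full_name.split())
    folder = PurePosixPath(output_root) / "resume" / stem
    files = [str(folder / f"{stem}{suffix}") for suffix in SUFFIXES]
    files.append(str(folder / "artifacts") + "/")
    return files


for path in profile_layout("Diana Ponomareva"):
    print(path)
```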

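This is how the generated Markdown files look in my Obsidian:

Diana_Ponomareva.md

[Screenshot: Diana_Ponomareva.md]

Diana_Ponomareva.summary.md

[Screenshot: Diana_Ponomareva.summary.md]

And the profiles list file (profiles.md):

[Screenshot: profiles.md]

And also:

Diana_Ponomareva.vcf

[Screenshot: Diana_Ponomareva.vcf]

Diana_Ponomareva.pdf

[Screenshot: Diana_Ponomareva.pdf]

Diana_Ponomareva.original.pdf

[Screenshot: Diana_Ponomareva.original.pdf]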

The profiles.md file contains short data and links for all profiles. You can rebuild it by running the following command with your own output path:

sudo bash build.sh "/Volumes/Transcend/projects/linkedin2md/output"

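[Screenshot: build.sh output]

Running the scripts again adds more profiles:

[Screenshot: processing more profiles]

Results in the profiles folder:

[Screenshot: profiles folder]

profiles.md:

[Screenshot: profiles.md]

and resumes in Markdown:

[Screenshot: resume in Markdown]

[Screenshot: resume in Markdown]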

*Note: In the examples above, I use sudo because my project lives in a secure container with restricted access. In general, sudo is not required.
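The profiles.md index described above could be rebuilt along these lines. This is a Python sketch and not the logic of `build.sh`: the location of profiles.md in the output root, the `# Profiles` heading, and the entry format are all assumptions, and `build_index` and `write_index` are hypothetical helper names.

```python
import json
from pathlib import Path


def build_index(output_root: str) -> str:
    """Collect every */*.resume.json under <output_root>/resume into a Markdown list."""
    lines = ["# Profiles", ""]
    for path in sorted(Path(output_root, "resume").glob("*/*.resume.json")):
        basics = json.loads(path.read_text(encoding="utf-8")).get("basics", {})
        name = basics.get("name") or path.parent.name
        label = basics.get("label", "")
        stem = path.name[: -len(".resume.json")]
        # Relative link from profiles.md to the generated Markdown resume.
        entry = f"- [{name}](resume/{path.parent.name}/{stem}.md)"
        lines.append(f"{entry} - {label}" if label else entry)
    return "\n".join(lines) + "\n"


def write_index(output_root: str) -> Path:
    """Write the index to <output_root>/profiles.md and return its path."""
    target = Path(output_root) / "profiles.md"
    target.write_text(build_index(output_root), encoding="utf-8")
    return target
```

Calling `write_index` with your output path regenerates the list from whatever resume JSON files are present, so it can be rerun after each batch of new profiles.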

Why Markdown and the JSON Resume format?

  • Because it's simple and flexible
  • Because it's open source and an open format
  • Because it's widely accepted
  • Because it doesn't depend on a cloud service or an ATS
  • Because it can be imported into other systems, such as an ATS
  • Because it's compatible with scrapers and services like Nubela and others
  • Because it's fully compatible with Markdown, Obsidian, and other Markdown processors

Data Shaping Approach:

In the same way, you can get JSON Resume data from other sources, for example from various job boards, then transform it and add it to the database.

The JSON Resume format is compatible with scraping services such as Nubela, so you can build an automated scraping system. I posted an example of how this is done here: LinkedIn Scrapper worked from Nubela Service, written on Python
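A transformation of this kind is mostly field mapping. The Python sketch below maps a record from an imaginary job board onto a few JSON Resume fields. The source field names (`first_name`, `headline`, `skills` as a comma-separated string, and so on) are invented for illustration; a real job board will need its own mapping.

```python
def to_json_resume(record: dict) -> dict:
    """Map a hypothetical job-board record onto a few JSON Resume fields."""
    full_name = f"{record.get('first_name', '')} {record.get('last_name', '')}".strip()
    raw_skills = record.get("skills", "")
    return {
        "basics": {
            "name": full_name,
            "label": record.get("headline", ""),
            "email": record.get("email", ""),
            "location": {"city": record.get("city", "")},
        },
        # Split the comma-separated skills string and drop empty items.
        "skills": [{"name": s.strip()} for s in raw_skills.split(",") if s.strip()],
    }


resume = to_json_resume({
    "first_name": "Jane",
    "last_name": "Doe",
    "headline": "Data Engineer",
    "city": "Berlin",
    "skills": "Python, SQL, ",
})
print(resume["basics"]["name"], resume["skills"])
```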

People search and building mind maps:

Storing resumes in Markdown makes them easy to export to relational and document-oriented databases. It also lets you apply search tools and tools that build links between notes, in particular graph view and mind map systems.

Graph view:

[Screenshot: graph view]

And a bigger graph view:

[Screenshot: bigger graph view]

You can read more about graph view here:

https://help.obsidian.md/Plugins/Graph+view

And this is how a fully structured system looks:

[Screenshot: fully structured system]
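Obsidian's graph view draws connections from internal links between notes, so one possible link-building scheme is to render shared attributes as `[[wikilinks]]`. The Python sketch below does this for skills only. Linking by skill is an illustrative assumption and not necessarily the scheme used in this project, and `skills_section` is a hypothetical helper.

```python
def skills_section(resume: dict) -> str:
    """Render skills as Obsidian wikilinks so profiles sharing a skill link to the same note."""
    names = [s.get("name", "").strip() for s in resume.get("skills", [])]
    links = [f"[[{name}]]" for name in names if name]
    return "## Skills\n\n" + " ".join(links) + "\n"


print(skills_section({"skills": [{"name": "Python"}, {"name": "SQL"}]}))
```

Two profiles that both contain `[[Python]]` then point at the same note, which is what connects them in the graph.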

Real use example:

I am currently experimenting with graph view mappings on a base of more than 50,000 resumes of real (not synthetic) people, loaded from one or more ATS systems, and will post the results of these experiments.

In this concept, I make extensive use of simple search approaches over Markdown with Linux commands (grep, sed, awk and others, plus specific utilities for searching data in PDF, DOCX and other file formats), because this is very simple and useful.
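For readers who prefer a script to a shell one-liner, the search over Markdown can be sketched in Python. This is roughly what a recursive, case-insensitive GNU grep over `*.md` files with line numbers does; it is an illustrative alternative to the Linux commands mentioned above and not part of the project, and `search_profiles` is a hypothetical helper.

```python
import re
from pathlib import Path


def search_profiles(root: str, pattern: str) -> list:
    """Return (file, line number, line) for each Markdown line matching the pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(Path(root).rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

For example, `search_profiles("output/resume", r"kubernetes|docker")` would list every resume line that mentions either term, assuming the output folder layout shown earlier.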

Restrictions:

*General application:

I haven't tested any of the above on Windows systems as I only use Linux systems. In general, it is assumed that the readers of this manual have the skills of advanced Linux users.

*Application area:

This development is not commercial or industrial and is intended primarily for the rapid handling of large datasets of people profiles, based on the concepts of working with markdown, without resorting to commercial systems, such as ATS and relational databases.

*About LinkedIn scraping:

LinkedIn is quite difficult to scrape, and scraping profiles directly brings a number of problems, such as account bans, captchas, and other anti-scraping countermeasures. For scraping LinkedIn profiles, I recommend using specialized services such as Nubela.

*About profile photo saving:

This has come up before and I did not categorize it as a bug that can be "fixed", and I'm still reluctant to do so. The URL that LinkedIn provides for the profile picture is not actually a permanent URL - it looks like some variant of a temporary signed URL. If I make a guess that the e={int} portion of that URL corresponds with expiration, and the integer is a unix timestamp, then it looks like that image should expire in a few months (November). I'm also not sure how consistent LinkedIn is in setting the future expiration - it almost looks like they might batch generate these instead of dynamically generating them on-demand, as loading up my own profile today showed that same e={int} value.

With the knowledge that the profile picture URL might expire at a time outside my control, I'm still torn on whether or not this would be good to include in the JSON output. On the one hand, resumes are often used in the same time period they are generated, so maybe expiring images aren't that big of a deal. On the other hand, I don't want to get blamed if someone uses the image URL as part of their job application or website and it suddenly fails to load.

The easiest solution here, which I mentioned the last time this came up, is for everyone to just not rely on LinkedIn for image hosting; upload your profile picture to something like imgur and then manually fill in the basics.image field.
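The guess about the `e={int}` parameter can be checked with a few lines of Python. The sketch assumes, as the note does, that `e` holds a Unix timestamp; that is a guess and not documented LinkedIn behavior. The URL below is a made-up placeholder and `guess_expiry` is a hypothetical helper.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


def guess_expiry(image_url: str):
    """Interpret the e= query parameter as a Unix timestamp, or return None."""
    values = parse_qs(urlparse(image_url).query).get("e")
    if not values or not values[0].isdigit():
        return None
    return datetime.fromtimestamp(int(values[0]), tz=timezone.utc)


# Placeholder URL, not a real LinkedIn image link.
expiry = guess_expiry("https://media.example.com/profile.jpg?e=1700000000&v=beta")
if expiry is not None:
    print("Image URL may expire on", expiry.date().isoformat())
```

If the guessed date is close, that is one more reason to host the picture elsewhere and fill in `basics.image` by hand.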

TODOs:

  • Fix bugs
  • Add new cosmetic features
  • Automatically publish resumes in Telegram channel(s)
  • Update resumes and tags
  • Update photos
  • Show ATS integrations
  • Show how to really parse and crack some job boards
  • Show big graph views and mind maps

UPD:

In the latest version of the scripts, I added DEBUG mode support, plus verification and analysis of resume errors.

[Screenshot: DEBUG mode output]

[Screenshot: DEBUG mode output]

[Screenshot: DEBUG mode output]
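What the scripts verify is not spelled out here, so the following Python sketch only illustrates the kind of check such a verification step might perform on a resume JSON file. The specific rules (valid JSON, a `basics.name`, list-typed sections) are assumptions, and `check_resume` is a hypothetical helper.

```python
import json


def check_resume(text: str) -> list:
    """Return human-readable problems; an empty list means none were found."""
    try:
        resume = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(resume, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    basics = resume.get("basics")
    if not isinstance(basics, dict):
        problems.append("missing 'basics' section")
    elif not str(basics.get("name", "")).strip():
        problems.append("missing 'basics.name'")
    # These sections are arrays in the JSON Resume schema.
    for section in ("work", "education", "skills"):
        if section in resume and not isinstance(resume[section], list):
            problems.append(f"'{section}' should be a list")
    return problems


for problem in check_resume('{"basics": {}, "skills": {}}'):
    print("resume error:", problem)
```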

Cooperation:

I am open to collaborating on transforming and enriching your resume databases. I have about 7 years of experience in this field. You can read more about my experience in my GitHub profile: bormaxi8080

Donations:

I will be grateful for donations to this project:

Ethereum: 0xe29685d6f0032bccac08b0e745a1a69ef9803973

*Note: Diana gave her permission to show her public LinkedIn profile in this project.