Commit

Updated docs
AvnerCohen committed Nov 6, 2012
1 parent 216431e commit 0771074
Showing 2 changed files with 46 additions and 3 deletions.
2 changes: 0 additions & 2 deletions package.json
@@ -6,10 +6,8 @@
"dependencies": {
"ecstatic": "~0.1.6",
"node.io": "~0.4.12",
"async": "~0.1.22",
"solr": "~0.2.2",
"cheerio": "~0.10.1",
"request": "~2.11.4",
"director": "~1.1.6",
"union": "~0.3.4"
},
47 changes: 46 additions & 1 deletion readme.md
@@ -1 +1,46 @@
A node js + solr + request + node.io + cheerio example for web scraping, data indexing and search
Birds Search
====

This is a sample repo demonstrating work with the following libs/frameworks:

* Node.js - server side
* flatiron/director - routing lib
* ecstatic - serves static files
* cheerio - a jQuery core selector implementation
* Node.IO - scraping lib (can be removed)
* node-solr - Apache Solr client
* Apache Solr - search engine


The flow of setting up the project is as follows:

### 1 ###


node scrape

This will start a process to scrape the list of birds recorded in Israel from http://www.israbirding.com/checklist/

Then, with a minor tweak to the bird names, it will scrape the relevant bird pages from Wikipedia. A hedged sketch of this step follows.
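
For orientation, here is a minimal sketch of what this step might look like with `request` and `cheerio`; the CSS selector and the `birds.json` output file are assumptions, not necessarily what `scrape.js` actually does.

    // Minimal scraping sketch with request + cheerio.
    // Assumption: bird names sit in table cells; the real selector may differ.
    var fs = require('fs');
    var request = require('request');
    var cheerio = require('cheerio');

    request('http://www.israbirding.com/checklist/', function (err, res, body) {
      if (err) throw err;
      var $ = cheerio.load(body);
      var birds = [];
      $('td').each(function () {
        var name = $(this).text().trim();
        if (name) birds.push({ name: name });
      });
      // Persist the raw list; a later pass tweaks each name and fetches
      // the matching Wikipedia article.
      fs.writeFileSync('birds.json', JSON.stringify(birds, null, 2));
    });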

### 2 ###

solr /path/to/config

Make sure Solr is up before continuing.
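
One quick way to check that Solr is reachable, assuming the default port 8983 and the stock admin ping handler (adjust for your own config):

    curl http://localhost:8983/solr/admin/ping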

### 3 ###

node solr-index

This starts a process that picks up the files scraped from the web and creates the Solr documents. A rough sketch of the idea is shown below.
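
A rough sketch of the indexing step, assuming node-solr's createClient/add/commit calls; the input file and field names are made up for illustration, see `solr-index.js` for the real document layout.

    // Indexing sketch with the node-solr client.
    // Assumption: birds.json holds the scraped data; field names are invented.
    var fs = require('fs');
    var solr = require('solr');

    var client = solr.createClient(); // defaults to localhost:8983

    var birds = JSON.parse(fs.readFileSync('birds.json', 'utf8'));
    birds.forEach(function (bird) {
      // hypothetical fields: name from the checklist, wikiText from Wikipedia
      client.add({ id: bird.name, name: bird.name, text: bird.wikiText },
        function (err) { if (err) console.error(err); });
    });

    client.commit(function (err) {
      if (err) console.error(err);
    });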

### 4 ###

node app

This starts the web server.

You should now be able to browse to http://127.0.0.1:8080/ locally and play around with the data. A sketch of how the server pieces fit together follows.
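
For reference, a minimal sketch of how union, director and ecstatic typically fit together; the `/search` route, the `./public` directory and the handler body are assumptions, the real wiring lives in `app.js`.

    // Minimal union + director + ecstatic server sketch.
    var union = require('union');
    var director = require('director');
    var ecstatic = require('ecstatic');

    var router = new director.http.Router();

    router.get('/search', function () {
      // the real handler would query Solr and return matching birds
      this.res.writeHead(200, { 'Content-Type': 'application/json' });
      this.res.end(JSON.stringify({ results: [] }));
    });

    union.createServer({
      before: [
        ecstatic(__dirname + '/public'),   // serve static files first
        function (req, res) {
          var found = router.dispatch(req, res);
          if (!found) { res.emit('next'); }
        }
      ]
    }).listen(8080);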
