Spider

A web crawler and scraper that provides building blocks for data curation workloads.

Concurrent
Streaming
Decentralization
Headless Chrome Rendering
HTTP Proxies
Cron Jobs
Subscriptions
Smart Mode
Anti-Bot Mitigation
Privacy and Efficiency through Ad, Analytics, and Custom Tiered Network Blocking
Blacklisting, Whitelisting, and Budgeting Depth
Dynamic AI Prompt Scripting Headless with Step Caching
CSS/XPath Scraping with spider_utils
HTML to Markdown, text, and other transformations with spider_transformations
Changelog
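The blacklisting, whitelisting, and depth-budgeting rules listed above can be pictured with a small, std-only Rust sketch. This is not spider's API: `CrawlRules`, `allows`, and `crawl` are hypothetical names, URL patterns are assumed to be plain substrings (spider's own matching may differ), depth is assumed to count link hops from the start URL, and an in-memory link graph stands in for real HTTP fetching.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical crawl rules for illustration; not spider's configuration types.
struct CrawlRules {
    blacklist: Vec<String>, // skip any URL containing one of these substrings
    whitelist: Vec<String>, // if non-empty, only URLs containing one of these are allowed
    max_depth: usize,       // maximum number of link hops from the start URL
    max_pages: usize,       // total page budget for the crawl
}

impl CrawlRules {
    /// Blacklist wins over whitelist; an empty whitelist allows everything else.
    fn allows(&self, url: &str) -> bool {
        if self.blacklist.iter().any(|p| url.contains(p.as_str())) {
            return false;
        }
        self.whitelist.is_empty() || self.whitelist.iter().any(|p| url.contains(p.as_str()))
    }
}

/// Breadth-first crawl over an in-memory link graph (assumed stand-in for fetching pages).
fn crawl(start: &str, graph: &HashMap<&str, Vec<&str>>, rules: &CrawlRules) -> Vec<String> {
    let mut visited: Vec<String> = Vec::new();
    let mut seen: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();

    if rules.allows(start) {
        seen.insert(start.to_string());
        queue.push_back((start.to_string(), 0));
    }

    while let Some((url, depth)) = queue.pop_front() {
        if visited.len() >= rules.max_pages {
            break; // page budget exhausted
        }
        visited.push(url.clone());
        if depth == rules.max_depth {
            continue; // do not follow links beyond the depth limit
        }
        let links: &[&str] = graph.get(url.as_str()).map(|v| v.as_slice()).unwrap_or(&[]);
        for &next in links {
            // Filter before enqueueing so rejected URLs never consume the page budget.
            if rules.allows(next) && seen.insert(next.to_string()) {
                queue.push_back((next.to_string(), depth + 1));
            }
        }
    }
    visited
}

fn main() {
    let mut graph: HashMap<&str, Vec<&str>> = HashMap::new();
    graph.insert(
        "https://example.com/",
        vec!["https://example.com/blog", "https://example.com/admin", "https://other.org/"],
    );
    graph.insert("https://example.com/blog", vec!["https://example.com/blog/post-1"]);
    graph.insert("https://example.com/blog/post-1", vec!["https://example.com/blog/post-2"]);

    let rules = CrawlRules {
        blacklist: vec!["/admin".to_string()],
        whitelist: vec!["example.com".to_string()],
        max_depth: 2,
        max_pages: 10,
    };

    for url in crawl("https://example.com/", &graph, &rules) {
        println!("{url}");
    }
}
```

Running it prints the start page, `/blog`, and `/blog/post-1`: `/admin` is blacklisted, `other.org` falls outside the whitelist, and `/blog/post-2` lies beyond the depth limit of 2.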

Getting Started

The simplest way to get started is to use the Spider Cloud hosted service. For local installation, see the spider or spider_cli directory. You can also use spider from Node.js with spider-nodejs and from Python with spider-py.

Benchmarks

See BENCHMARKS.

Examples

See EXAMPLES.

License

This project is licensed under the MIT license.

Contributing

See CONTRIBUTING.
