-
Notifications
You must be signed in to change notification settings - Fork 2
/
README
116 lines (78 loc) · 4.25 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
= Introduction =
Spynner is a stateful programmatic web browser module for Python. It is based upon [http://www.qtsoftware.com/ Qt] and [http://webkit.org/ WebKit], so it supports Javascript, AJAX, and every other technology that !WebKit is able to handle (Flash, SVG, ...). Spynner takes advantage of [http://jquery.com jQuery], a powerful Javascript library that makes the interaction with pages and event simulation really easy.
Using Spynner you would able to simulate a web browser with no GUI (though a browsing window can be opened for debugging purposes), so it may be used to implement crawlers or acceptance testing tools.
= Dependencies =
* [http://www.python.org Python] (>=2.5)
* [http://www.riverbankcomputing.co.uk/software/pyqt/download PyQt] (>=4.4.3): Python wrappers for the [http://www.qtsoftware.com/ Qt] framework.
= Install =
* A [http://code.google.com/p/spynner/downloads/list release] version (recommended):
{{{
$ wget http://spynner.googlecode.com/files/spynner-VERSION.tgz
$ tar xvzf spynner-VERSION.tgz
$ cd spynner-VERSION
$ sudo python setup.py install
}}}
or
{{{
$ sudo easy_install spynner
}}}
* The bleeding edge version:
{{{
$ svn checkout http://spynner.googlecode.com/svn/trunk/ spynner-trunk
$ cd spynner-trunk
$ sudo python setup.py install
}}}
= API =
http://tokland.freehostia.com/googlecode/spynner/api/
You can generate the API locally (will create docs/api directory):
$ python setup.py gen_doc
= Usage =
A basic example:
{{{
import spynner
browser = spynner.Browser()
browser.load("http://www.wordreference.com")
browser.runjs("console.log('I can run Javascript')")
browser.runjs("console.log('I can run jQuery: ' + jQuery('a:first').attr('href'))")
browser.select("#esen")
browser.fill("input[name=enit]", "hola")
browser.click("input[name=b]")
browser.wait_page_load()
print browser.url, browser.html
browser.close()
}}}
Sometimes you'll want to see what is going on:
{{{
browser = spynner.Browser()
browser.debug_level = spynner.DEBUG
browser.create_webview()
browser.show()
...
}}}
See more examples in the repository:
http://code.google.com/p/spynner/source/browse/#svn/trunk/examples
= Running Javascript =
Spynner uses jQuery to make Javascript interface easier. By default, two modules are injected to every loaded page:
* [http://docs.jquery.com/Downloading_jQuery jQuery core]: Amongst other things, it adds the powerful [http://docs.jquery.com/Selectors jQuery selectors], which are used internally by some Spynner methods (_fill_, _select_, _click_, _check_, ...). Of course you can also use jQuery when you inject your own code into a page.
* [http://code.google.com/p/jqueryjs/source/browse/trunk/plugins/simulate jQuery _simulate_ plugin]: Makes it possible to simulate mouse and keyboard events (for now spynner uses it only in the _click_ action). Look up the library code to see which kind of events you can fire.
Note that you must use __jQuery(...)_ instead of _jQuery(...)_ or the common shortcut _$(...)_. That prevents name clashing with the jQuery library used by the page.
= Cook your soup: parsing the HTML =
You can parse the HTML of a webpage with your favorite parsing library ([http://www.crummy.com/software/BeautifulSoup BeautifulSoup], [http://codespeak.net/lxml/ lxml], ...). Since we are already using Jquery for Javascript, it feels just natural to work with [http://pypi.python.org/pypi/pyquery pyquery], its Python counterpart:
{{{
import spynner
import pyquery
browser = spynner.Browser()
...
d = pyquery.Pyquery(browser.html)
d.make_links_absolute(browser.get_url())
href = d("#somelink").attr("href")
browser.download(href, open("/path/outputfile", "w"))
}}}
= Running Spynner without X11 ==
Spynner needs a X11 server to run. If you are running it in a server without X11 you must install the virtual [http://en.wikipedia.org/wiki/Xvfb Xvfb server]. Debian users can use the small wrapper (xvfb-run). If you are not using Debian, you can download it here:
http://www.mail-archive.com/[email protected]/msg69632/x-run
{{{
$ xvfb-run python myscript_using_spynner.py
}}}
= Feedback =
Open an [http://code.google.com/p/spynner/issues/list issue] to report a bug or request a new feature. Other comments and suggestions can be directly emailed to me: [mailto://[email protected] [email protected]].