v2.0.0 - Move from Twisted to native AsyncIO + major overhaul
The entire project has been migrated from Twisted Reactor to native Python AsyncIO. This affects
almost every file in the project, so I'm not going to go into detail on the individual changes
for that migration.

This release is a major overhaul, with many new features, improvements, fixes and more. Not everything
is listed below, but the most important changes are covered in detail.

**Key Changes**

 - Many previously hardcoded settings have been changed so that they can be specified via either
   environment variables, CLI arguments, or both.
 - Various new environment variables and CLI arguments
 - Argument parsing in `app.py` and `health.py` has been re-formatted, and now respects
   the values in `rpcscanner.settings` as defaults.

 - Node list file
    - The default node list file has been changed from `nodes.txt` to `nodes.conf` - this
        allows for IDEs and text editors to apply some syntax highlighting, especially for comments,
        along with being able to use more assistive IDE features such as keyboard shortcuts to quickly
        comment / uncomment nodes.
    - `core.py` now uses regex to extract nodes from the node list, instead of a simple `.readlines()`
      and a `.strip()` loop. This means in-line comments next to listed RPC nodes are now possible,
      without causing a problem with the file parsing.
    - `nodes.txt.example` is now `example.nodes.conf` - which has had many RPC nodes added and removed,
      plus fancy comment blocks to separate sections of nodes in the file, and inline comments
      demarcating who runs each node (if known).
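
A minimal sketch of how regex-based node extraction can tolerate in-line comments. The pattern and the `parse_nodes` helper are assumptions for illustration only, not the actual code in `core.py`:

```python
import re
from typing import List

# Hypothetical pattern: an http(s) URL at the start of a line, optionally
# followed by whitespace and a trailing '#' comment.
NODE_LINE = re.compile(r'^\s*(https?://[^\s#]+)\s*(?:#.*)?$')


def parse_nodes(text: str) -> List[str]:
    """Return the node URLs found in a node list, skipping comments and blank lines."""
    nodes = []
    for line in text.splitlines():
        match = NODE_LINE.match(line)
        if match:
            nodes.append(match.group(1))
    return nodes


example = """
# ---- Example section header ----
https://hived.privex.io   # in-line comment next to a node
#https://disabled.example.com
https://api.example.com
"""
print(parse_nodes(example))
```

Lines that start with `#` never match, so commenting a node out is enough to disable it.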

 - `rpcscanner/MethodTests.py`
    - The instance attribute `MethodTests.METHOD_MAP` has now been refactored into a module level
      attribute. This allows external RPC method testing functions/methods to be configured, alongside
      the ones built into the `MethodTests` class.
    - API Method testing functions now take a first argument `host` - this is to compensate for
      the newly added capability for adding external RPC method testing functions/methods, by
      making the pre-existing methods consistent with how an external method would receive the
      host URL being tested.
    - Added new `test_all` method, which works similarly to the original `test` method, but
      tests all supported API methods, or a subset if you specify a whitelist/blacklist.
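
The module-level map and the `host`-first convention could look roughly like the sketch below. The `register` decorator, the `check_*` functions and `run_all_tests` are hypothetical stand-ins (the real `test_all` lives on the `MethodTests` class and may work differently):

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Optional

# Hypothetical module-level registry: RPC method name -> async test function.
METHOD_MAP: Dict[str, Callable[[str], Awaitable[bool]]] = {}


def register(method: str):
    """Decorator that adds a test function to METHOD_MAP under `method`."""
    def decorator(func):
        METHOD_MAP[method] = func
        return func
    return decorator


@register('condenser_api.get_accounts')
async def check_get_accounts(host: str) -> bool:
    # A real test would query `host`; this stub only checks the URL scheme.
    return host.startswith('http')


@register('example_api.always_fails')
async def check_always_fails(host: str) -> bool:
    # External test functions receive the host URL in the same way.
    return False


async def run_all_tests(host: str, whitelist: Optional[Iterable[str]] = None,
                        blacklist: Optional[Iterable[str]] = None) -> Dict[str, bool]:
    """Run every registered test, or only the subset allowed by the lists."""
    allowed = set(whitelist) if whitelist is not None else set(METHOD_MAP)
    blocked = set(blacklist or ())
    names = [m for m in METHOD_MAP if m in allowed and m not in blocked]
    results = await asyncio.gather(*(METHOD_MAP[m](host) for m in names))
    return dict(zip(names, results))


print(asyncio.run(run_all_tests('https://hived.privex.io')))
```

Because the map lives at module level, external code can add entries without subclassing anything.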

 - `rpcscanner/rpc.py`
    - Classes and functions in this file which previously expected a `reactor` instance to be passed
      to them no longer take a `reactor` argument. This is a breaking change, as the order of
      arguments for various functions/methods/constructors has changed.
    - General cleanup of `NodePlug._ident_jussi`, including refactoring the server type identification
      code into a static method which parses a dict/str response.

 - `health.py`
    - The default `MAX_SCORE` is now `50` instead of `20`
    - The scoring algorithm used in `score_node` has been tweaked, and also has a new scoring metric based
      on whether a node is out of sync, applying a varying score penalty based on how badly out of sync
      the node is.
    - New fields `Network` and `PassedStages` have been added to the individual RPC health output
    - The `Time` field for individual RPC health has been adjusted to show how far behind the node is.
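
To illustrate the out-of-sync metric, here is a rough sketch that computes how far behind a node is and maps that lag to a penalty. The thresholds and the `sync_penalty` function are invented for this example and are not the values used by `score_node`:

```python
from datetime import datetime, timedelta


def time_behind(block_time: str, now: datetime) -> timedelta:
    """How far a node's head block time lags behind `now` (both naive UTC)."""
    return now - datetime.strptime(block_time, '%Y-%m-%dT%H:%M:%S')


def sync_penalty(behind: timedelta) -> int:
    """Hypothetical penalty that grows with the lag (thresholds are assumptions)."""
    if behind < timedelta(minutes=1):
        return 0
    if behind < timedelta(hours=1):
        return 5
    if behind < timedelta(days=1):
        return 15
    return 30


now = datetime(2020, 5, 29, 0, 31, 37)
behind = time_behind('2020-05-20T13:59:57', now)
print(f"{behind} ago, penalty {sync_penalty(behind)}")
```

Printing a `timedelta` directly gives the same `8 days, 10:31:40` style shown in the `Time` field of the health output.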

 - `rpcscanner/RPCScanner.py`
    - The server scanning stages have been adjusted to provide partial compatibility for older Steem-based networks
      such as Whaleshares, by using `database_api` instead of `condenser_api`, along with some other small adjustments.
    - The scanning stages now detect the `network` that a node is on, based on the native currency of the network
      returned within the dynamic global props. By network, I mean `Steem`, `Hive`, `Whaleshares` etc.
    - Added various new helper methods, such as `add_tasks` and `rpc_tasks`, for dealing with
      AsyncIO and reducing code duplication.
    - `filter_badnodes` and `identify_nodes` have both been cleaned up, and have some new features added to them, related
      to the new `network` detection.
    - `plugin_test` now records the timing + retries for individual plugin testing.
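
The network detection can be sketched as follows. This assumes the dynamic global props contain a `current_supply` string in `"amount SYMBOL"` form and that the symbols are `HIVE`, `STEEM` and `WLS`; both the field choice and the mapping are assumptions, and the real scanner may inspect different fields:

```python
from typing import Optional

# Assumed mapping of native currency symbols to network names.
SYMBOL_TO_NETWORK = {'HIVE': 'Hive', 'STEEM': 'Steem', 'WLS': 'Whaleshares'}


def detect_network(props: dict) -> Optional[str]:
    """Guess the network from the currency symbol in `current_supply`."""
    supply = props.get('current_supply', '')
    if not isinstance(supply, str):
        return None  # e.g. a dict-style asset, not handled in this sketch
    parts = supply.split()
    if len(parts) != 2:
        return None
    return SYMBOL_TO_NETWORK.get(parts[1].upper())


print(detect_network({'current_supply': '350000000.000 HIVE'}))
```

Unknown symbols return `None`, so a caller can fall back to a generic label.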

 - App-wide changes
    - `verbose` and `quiet` have been refactored to behave differently. `verbose` is now more respected,
      while `quiet` is stricter on the log messages that it allows through
    - Various changes to the logging system, including the addition of log files, with automatic
      log folder creation to avoid issues.
    - Migrated from standard `requirements.txt` to **pipenv**, as pipenv handles both dependency management
      and creation/maintenance of virtualenvs.
    - Added `run.sh` runner script, to make it easier to install, update, and run rpcscanner via pipenv.
    - Added a `Dockerfile`, so that rpc-scanner can easily be run within a Docker container on any platform.
      Pre-built images coming soon.
    - General reliability and user experience improvements across the application

**and various other fixes, changes and improvements...**
Someguy123 committed May 29, 2020
1 parent d50228e commit 56102a6
Showing 21 changed files with 2,808 additions and 330 deletions.
10 changes: 10 additions & 0 deletions .dockerignore
@@ -0,0 +1,10 @@
/venv
/env/
/.env
/test.py
/logs
/.vscode
/.idea
__pycache__
/nodes.conf
/nodes.txt
12 changes: 12 additions & 0 deletions .github/FUNDING.yml
@@ -0,0 +1,12 @@
# These are supported funding model platforms

github: [Someguy123] # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
# patreon: # Replace with a single Patreon username
# open_collective: # Replace with a single Open Collective username
# ko_fi: # Replace with a single Ko-fi username
# tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
# community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
# liberapay: # Replace with a single Liberapay username
# issuehunt: # Replace with a single IssueHunt username
# otechie: # Replace with a single Otechie username
custom: ['https://wallet.hive.blog/~witnesses', 'https://www.privex.io'] # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']
11 changes: 8 additions & 3 deletions .gitignore
@@ -1,4 +1,9 @@
venv/
.vscode
/venv/
/env/
/.vscode/
/.idea/
/.env
__pycache__
test.py
nodes.txt
nodes.txt
nodes.conf
14 changes: 14 additions & 0 deletions Dockerfile
@@ -0,0 +1,14 @@
FROM python:3.8
VOLUME /app
WORKDIR /app

RUN pip3 install -U pipenv wheel pip && \
curl -fsS https://cdn.privex.io/github/shell-core/install.sh | bash >/dev/null

COPY Pipfile Pipfile.lock /app/

RUN pipenv install --ignore-pipfile

COPY . /app

ENTRYPOINT [ "/app/run.sh" ]
19 changes: 19 additions & 0 deletions Pipfile
@@ -0,0 +1,19 @@
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]
nest-asyncio = "*"
jupyter = "*"

[packages]
httpx = "*"
attrs = "*"
colorama = "*"
privex-helpers = "*"
python-dateutil = "*"
python-dotenv = "*"

[requires]
python_version = "3.8"
1,573 changes: 1,573 additions & 0 deletions Pipfile.lock

Large diffs are not rendered by default.

192 changes: 176 additions & 16 deletions README.md
@@ -1,37 +1,69 @@
# Steem node RPC scanner
# Hive / Steem-based RPC node scanner

by [@someguy123](https://steemit.com/@someguy123)
by [@someguy123](https://peakd.com/@someguy123)

![Screenshot of RPC Scanner](https://i.imgur.com/B9EShPn.png)
![Screenshot of RPC Scanner app.py](https://cdn.privex.io/github/rpc-scanner/rpcscanner_list_may2020.png)

A fast and easy to use Python script which scans [Steem](https://www.steem.io) RPC nodes
asynchronously using request-threads and Twisted's Reactor.
A fast and easy to use Python script which scans [Hive](https://www.hive.io), [Steem](https://www.steem.io),
and other forks' RPC nodes asynchronously using [HTTPX](https://github.com/encode/httpx) and
native Python AsyncIO.

**Features:**

- Colorized output for easy reading
- Tests a node's reliability during data collection, with multiple retries on error
- Reports the average response time, and average amount of retries needed for basic calls
- Detects a node's Steem version
- Detects a node's Blockchain version
- Show the node's last block number and block time
- Can determine whether a node is using Jussi, or if it's a raw steemd node
- Can scan a list of 10 nodes in as little as 20 seconds thanks to Twisted Reactor + request-threads
- Can determine whether a node is using Jussi, or if it's a raw `steemd` node
- Can scan a list of 20 nodes in as little as 10 seconds thanks to native Python AsyncIO plus
the [HTTPX AsyncIO requests library](https://github.com/encode/httpx)

Python 3.7.0 or higher recommended
Python 3.8.0 or higher strongly recommended

Python 3.7.x may or may not work

# Install

### Easy way

```sh
git clone https://github.com/Someguy123/steem-rpc-scanner.git
cd steem-rpc-scanner

./run.sh install
```

### Manual install (if the easy way isn't working)

```sh
# You may need to install the default python version for your distro, for newer python versions
# to work properly (e.g. 'pip' and 'venv' may only be available as python3-pip and python3-venv)
apt install -y python3 python3-dev
apt install -y python3-pip python3-venv
# Python 3.8+ is recommended, if available on your system.
apt install -y python3.8 python3.8-dev
# If you don't have 3.8 available, python 3.7 may work.
apt install -y python3.7 python3.7-dev

# Install pipenv using the newest version of Python on your system
python3.8 -m pip install -U pipenv

# Clone the repo
git clone https://github.com/Someguy123/steem-rpc-scanner.git
cd steem-rpc-scanner
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
cp nodes.txt.example nodes.txt
# Create a virtualenv + install dependencies using pipenv
pipenv install
# Activate the virtualenv
pipenv shell
# Copy the example nodes.conf file into nodes.conf
cp example.nodes.conf nodes.conf
```

# Usage

### Scan a list of nodes and output their health info as a colourful table

For most people, the defaults are fine, so you can simply run:

```
@@ -58,8 +90,136 @@ optional arguments:
-f NODEFILE specify a custom file to read nodes from (default: nodes.txt)
```

# License
### Scan an individual node with UNIX return codes

![Screenshot of RPC Scanner health.py](https://cdn.privex.io/github/rpc-scanner/rpcscanner_health_may2020.png)

RPCScanner can easily be integrated with monitoring scripts by using `./health.py scan`, which returns a standard UNIX
error code based on whether that RPC is working properly or not.

**Example 1** - Scanning fully functioning RPC node

```
user@host ~/rpcscanner $ ./run.sh health -q scan "https://hived.privex.io/"
Node: http://hived.privex.io/
Status: PERFECT
Network: Hive
Version: 0.23.0
Block: 43810613
Time: 2020-05-29T00:30:24 (0:00:00 ago)
Plugins: 8 / 8
PluginList: ['condenser_api.get_followers', 'bridge.get_trending_topics', 'condenser_api.get_accounts', 'condenser_api.get_witness_by_account', 'condenser_api.get_blog', 'condenser_api.get_content', 'condenser_api.get_account_history', 'account_history_api.get_account_history']
PassedStages: 3 / 3
Retries: 0
Score: 50 (out of 50)
user@host ~/rpcscanner $ echo $?
0
```

As you can see, `hived.privex.io` got a perfect score of `50`, and thus it signalled the UNIX return code `0`, which means
"everything was okay".

**Example 2** - Scanning a misbehaving RPC node

```
user@host ~/rpcscanner $ ./run.sh health -q scan "https://steemd.privex.io/"
Node: http://steemd.privex.io/
Status: BAD
Network: Steem
Version: error
Block: 43536277
Time: 2020-05-20T13:59:57 (8 days, 10:31:40 ago)
Plugins: 4 / 8
PluginList: ['condenser_api.get_account_history', 'condenser_api.get_witness_by_account', 'condenser_api.get_accounts', 'account_history_api.get_account_history']
PassedStages: 2 / 3
Retries: 0
Score: 2 (out of 50)
user@host ~/rpcscanner $ echo $?
8
GNU AGPL 3.0
```

Unfortunately, `steemd.privex.io` didn't do anywhere near as well as `hived.privex.io` - it scored a rather low `2 / 50`, with
only 4 of the 8 tested RPC calls working properly.

This resulted in `health.py` signalling return code `8` instead (non-zero), which tells a calling program / script that
something went wrong during execution of this script.

In this case, `8` is the default setting for `BAD_RETURN_CODE`, giving a clear signal to the caller that it's trying to tell it
"the passed RPC node's score is below the threshold and you should stop using it!".

You can change the numeric return code used for both "good" and "bad" results from the individual node scanner by setting
`GOOD_RETURN_CODE` and/or `BAD_RETURN_CODE` respectively in `.env`:

```env
# There isn't much reason to change GOOD_RETURN_CODE from the default of 0. But the option is there if you want it.
GOOD_RETURN_CODE=0
# We can change BAD_RETURN_CODE from the default of 8, to 99 for example.
# Any integer value from 0 to 254 can generally be used.
BAD_RETURN_CODE=99
```
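
As a rough illustration of how such settings could be consumed, the sketch below reads both codes from a mapping of environment variables and falls back to the documented defaults. `return_codes` is a hypothetical helper, not part of rpcscanner:

```python
import os
from typing import Mapping, Tuple


def return_codes(env: Mapping[str, str] = os.environ) -> Tuple[int, int]:
    """Return (good, bad) exit codes, using the documented defaults of 0 and 8."""
    good = int(env.get('GOOD_RETURN_CODE', 0))
    bad = int(env.get('BAD_RETURN_CODE', 8))
    return good, bad


# Simulate a .env file that only overrides BAD_RETURN_CODE.
print(return_codes({'BAD_RETURN_CODE': '99'}))
```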

#### Making use of these return codes in an external script

![Screenshot of extras/check_nodes.sh and py_check_nodes.py running](https://i.imgur.com/cm4DPVN.png)

Included in the [extras folder of the repo](https://github.com/Someguy123/steem-rpc-scanner/tree/master/extras) are two
example scripts - one in plain old Bash (the default terminal shell of most Linux distros), and one in Python,
intended for use on Python 3.

Both scripts do effectively the same thing - they load `nodes.txt`, skipping any commented out nodes, then check whether each
one is fully functional or not by calling `health.py scan NODE` and checking for a non-zero return code. They then output
either a green `UP NODE http://example.com` or a red `DOWN NODE http://example.com`.

Pictured above is a screenshot of both the bash example, and the python example - running with the same node list, and same
version of this RPC Scanner.

Handling program return codes is generally easiest in **shell scripting languages**, including Bash, as most
shell scripting languages are built around the UNIX methodology: everything is a file, the language syntax is really just
executing programs with arguments, and the return codes from those programs drive the control flow.

The most basic shell script would be a simple ``if`` call, using ``/path/to/health.py scan http://somenode`` as the ``if`` test.
Most shells such as Bash will read the return (exit) code of the program, treating 0 as "true" and everything else as "false".

#### Basic shell script example

```shell script
#!/usr/bin/env bash

if /opt/rpcscanner/health.py scan "https://hived.privex.io" &> /dev/null; then
echo "hived.privex.io is UP :)"
else
echo "hived.privex.io is DOWN!!!"
fi
```
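
The same check can be sketched in Python with the standard `subprocess` module. This is a minimal illustration rather than the script shipped in `extras/`; `node_is_up` is a hypothetical helper, and the demo uses stand-in commands so it runs without rpcscanner installed:

```python
import subprocess
import sys
from typing import Sequence


def node_is_up(command: Sequence[str], good_code: int = 0) -> bool:
    """Run a health check command and compare its exit code to `good_code`."""
    result = subprocess.run(command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == good_code


# Real usage would look something like this (the path is an assumption):
#   node_is_up(['/opt/rpcscanner/health.py', 'scan', 'https://hived.privex.io'])

# Stand-in commands which exit with 0 and 8, mimicking a good and a bad node.
print(node_is_up([sys.executable, '-c', 'raise SystemExit(0)']))
print(node_is_up([sys.executable, '-c', 'raise SystemExit(8)']))
```

If `GOOD_RETURN_CODE` has been changed in `.env`, pass the same value as `good_code`.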


# License

See file LICENSE
[GNU AGPL 3.0](https://github.com/Someguy123/steem-rpc-scanner/blob/master/LICENSE)

See file [LICENSE](https://github.com/Someguy123/steem-rpc-scanner/blob/master/LICENSE)

# Common environment settings

- `RPC_TIMEOUT` (default: `3`) Number of seconds to wait for a response from an RPC node before giving up.
- `MAX_TRIES` (default: `3`) Maximum number of attempts to run each call against an RPC node. Note that this
number includes the initial try - meaning that setting `MAX_TRIES=1` will disable automatic retries for RPC calls.

DO NOT set this to `0` or the scanner will simply think all nodes are broken. Setting `MAX_TRIES=0` may however be useful
if you need to simulate how an external application handles "DEAD" results from the scanner.
- `RETRY_DELAY` (default: `2.0`) Number of seconds to wait between retrying failed RPC calls. Can be a decimal number of seconds,
e.g. `0.15` would result in a 150ms retry delay.
- `PUB_PREFIX` (default: `STM`) The first 3 characters at the start of a public key on the network(s) you're testing. This
is used by `rpcscanner.MethodTests.MethodTests` for thorough "plugin tests" which validate that an account's public
keys look correct.
- `GOOD_RETURN_CODE` (default: `0`) The integer exit code returned by certain parts of RPCScanner, e.g. `health.py scan [node]`
when the given RPC node(s) are functioning fully.
- `BAD_RETURN_CODE` (default: `8`) The integer exit code returned by certain parts of RPCScanner, e.g. `health.py scan [node]`
when the given RPC node(s) are severely unstable or missing vital plugins.
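
To make the `MAX_TRIES` and `RETRY_DELAY` semantics concrete, here is a small sketch of a retry loop in which the try count includes the initial attempt. `call_with_retries` is a hypothetical helper written for this example; the scanner's own retry code may be structured differently:

```python
import asyncio


async def call_with_retries(func, max_tries: int = 3, retry_delay: float = 2.0):
    """Await `func()` up to `max_tries` times; return (result, retries_used)."""
    last_error = None
    for attempt in range(max_tries):
        try:
            return await func(), attempt
        except Exception as exc:
            last_error = exc
            if attempt + 1 < max_tries:
                await asyncio.sleep(retry_delay)
    # With max_tries=0 the loop never runs, so every call is reported as failed.
    raise RuntimeError(f'call failed after {max_tries} tries') from last_error


async def demo():
    calls = {'n': 0}

    async def flaky():
        # Fails on the first two attempts, then succeeds.
        calls['n'] += 1
        if calls['n'] < 3:
            raise ConnectionError('simulated failure')
        return 'ok'

    return await call_with_retries(flaky, max_tries=3, retry_delay=0.01)


print(asyncio.run(demo()))
```

With `max_tries=1` the call is attempted once and never retried, matching the note above about disabling automatic retries.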
57 changes: 27 additions & 30 deletions app.py
@@ -8,27 +8,29 @@
Python 3.7.0 or higher recommended
"""
from os.path import join
import dotenv
dotenv.load_dotenv()

from twisted.internet.defer import inlineCallbacks
from twisted.internet.task import react
from privex.loghelper import LogHelper
import asyncio
from privex.helpers import ErrHelpParser
from rpcscanner import RPCScanner, settings, BASE_DIR, load_nodes
from rpcscanner import RPCScanner, settings, load_nodes, set_logging_level
import logging
import signal

log = logging.getLogger('rpcscanner.app')


parser = ErrHelpParser(description='Scan RPC nodes from a list of URLs to determine their last block, '
'version, reliability, and response time.')
parser.add_argument('-v', dest='verbose', action='store_true', default=False, help='display debugging')
parser.add_argument('-q', dest='quiet', action='store_true', default=False, help='only show warnings or worse')
parser.add_argument('-f', dest='nodefile', default='nodes.txt',
help='specify a custom file to read nodes from (default: nodes.txt)')
parser.add_argument('--account', dest='account', default='someguy123',
help='Steem username used for tests requiring an account to lookup')
parser.add_argument('--plugins', action='store_true', dest='plugins', default=False,
help='Run thorough plugin testing after basic filter tests complete.')
parser.set_defaults(verbose=False, quiet=False, plugins=False, account='someguy123')
parser.add_argument('-v', dest='verbose', action='store_true', help='display debugging')
parser.add_argument('-q', dest='quiet', action='store_true', help='only show warnings or worse')
parser.add_argument('-f', dest='nodefile', help=f'specify a custom file to read nodes from (default: {settings.node_file})')
parser.add_argument('--account', dest='account', help='Steem username used for tests requiring an account to lookup')
parser.add_argument('--plugins', action='store_true', dest='plugins', help='Run thorough plugin testing after basic filter tests complete.')
parser.set_defaults(
verbose=settings.verbose, quiet=settings.quiet, plugins=settings.plugins,
account=settings.test_account, nodefile=settings.node_file
)
args = parser.parse_args()

# Copy values of command line args into the application's settings.
@@ -37,33 +39,28 @@

debug_level = logging.INFO

if settings.verbose:
if settings.quiet:
debug_level = logging.CRITICAL
elif settings.verbose:
print('Verbose mode enabled.')
debug_level = logging.DEBUG
elif settings.quiet:
debug_level = logging.WARNING
else:
print("For more verbose logging (such as detailed scanning actions), use `./app.py -v`")
print("For less output, use -q for quiet mode (display only warnings and errors)")
print("For less output, use -q for quiet mode (display only critical errors)")

f = logging.Formatter('[%(asctime)s]: %(funcName)-18s : %(levelname)-8s:: %(message)s')
lh = LogHelper(handler_level=debug_level, formatter=f)
lh.add_console_handler()
log = lh.get_logger()
set_logging_level(debug_level)

# s = requests.Session()


@inlineCallbacks
def scan(reactor):
async def scan():
node_list = load_nodes(settings.node_file)
rs = RPCScanner(reactor, nodes=node_list)
yield from rs.scan_nodes()
rs = RPCScanner(nodes=node_list)
await rs.scan_nodes()
rs.print_nodes()


if __name__ == "__main__":
# Make CTRL-C work properly with Twisted's Reactor
# Make CTRL-C work properly with Twisted's Reactor / AsyncIO
# https://stackoverflow.com/a/4126412/2648583
signal.signal(signal.SIGINT, signal.default_int_handler)
react(scan)
asyncio.run(scan())
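
For readers unfamiliar with the AsyncIO model that replaces Twisted here, the sketch below shows the general pattern of scanning several nodes concurrently with `asyncio.gather`. `fake_scan` and `scan_all` are hypothetical and only simulate network latency; they are not the methods of `RPCScanner`:

```python
import asyncio
import time
from typing import List, Tuple


async def fake_scan(node: str, delay: float) -> Tuple[str, float]:
    """Stand-in for a real RPC scan: waits `delay` seconds, then reports back."""
    await asyncio.sleep(delay)
    return node, delay


async def scan_all(nodes: List[str]) -> List[Tuple[str, float]]:
    # Schedule every scan at once, so the waits overlap instead of adding up.
    tasks = [asyncio.create_task(fake_scan(node, 0.1)) for node in nodes]
    return await asyncio.gather(*tasks)


start = time.perf_counter()
results = asyncio.run(scan_all(['https://a.example', 'https://b.example', 'https://c.example']))
elapsed = time.perf_counter() - start
print(f"scanned {len(results)} nodes in {elapsed:.2f}s")
```

`asyncio.gather` returns results in the same order as the tasks it was given, regardless of which scan finishes first.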
