shengjh/arctern

README in Chinese

Notice: Arctern is still in development and the 0.1.0 version is expected to be released in April 2020.

Overview

Arctern is a geospatial analytics engine for massive-scale data. Compared with other geospatial analytics tools, Arctern has the following advantages:

  1. Provides domain-specific APIs to improve the development efficiency of upper-level applications.
  2. Provides extensible, low-cost distributed solutions.
  3. Provides GPU acceleration for geospatial analytics algorithms.
  4. Provides hybrid analysis with GIS, SQL, and ML.

Architecture

The following figure shows the architecture of Arctern 0.1.0.

Arctern includes two components: GIS and Visualization. Arctern 0.1.0 includes the 35 most frequently used GIS APIs in the OGC standard, covering construction, access, correlation analysis, and measurement of geometric objects. The visualization component renders geometric objects and provides APIs that follow the Vega standard. Unlike traditional web rendering, Arctern uses server-side rendering and can render choropleths, heatmaps, and scatter plots for massive-scale data. In 0.1.0, both geospatial data analytics and visualization have CPU-based and GPU-based implementations. Arctern provides a unified set of APIs, so you can decide whether to use GPU acceleration based on your own needs.
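
To illustrate why server-side rendering suits heatmaps over massive-scale data, the sketch below aggregates points into a fixed-size pixel grid, so the amount of data sent to the client depends on the image size rather than on the number of points. This is not Arctern's renderer; the function name `bin_points` and its parameters are hypothetical and exist only for this example.

```python
def bin_points(points, bounds, width, height):
    """Count how many points fall into each cell of a width x height grid."""
    min_x, min_y, max_x, max_y = bounds
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        # Skip points outside the visible bounding box.
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            continue
        # Map the coordinate to a cell index; clamp points on the max edge.
        col = min(int((x - min_x) / (max_x - min_x) * width), width - 1)
        row = min(int((y - min_y) / (max_y - min_y) * height), height - 1)
        grid[row][col] += 1
    return grid


points = [(0.1, 0.1), (0.2, 0.15), (0.9, 0.9), (2.0, 2.0)]
grid = bin_points(points, (0.0, 0.0, 1.0, 1.0), 2, 2)
print(grid)  # [[2, 0], [0, 1]]
```

However many input points there are, the result is always `width * height` counts, which a renderer could then map to colors.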

For data interfaces, Arctern supports standard numeric types, the WKB format, and files in JSON, CSV, and Parquet formats. Arctern organizes data in memory in a columnar layout according to the Arrow standard. In this way, Arctern supports zero-copy data exchange with external systems.
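
For readers unfamiliar with WKB (well-known binary), the following minimal sketch encodes and decodes a 2D point using only the Python standard library. It shows the byte layout defined by the OGC standard (a byte-order flag, a geometry type code, then the coordinates) and does not use any Arctern API; the helper names `point_to_wkb` and `wkb_to_point` are hypothetical.

```python
import struct


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    # '<' = little-endian without padding, 'B' = byte-order flag (1 means
    # little-endian), 'I' = geometry type (1 means Point), 'dd' = x and y.
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(wkb):
    """Decode a WKB point and return its (x, y) coordinates."""
    # The first byte tells us how to read the rest of the buffer.
    fmt = "<" if wkb[0] == 1 else ">"
    (geom_type,) = struct.unpack(fmt + "I", wkb[1:5])
    if geom_type != 1:
        raise ValueError("this sketch only handles Point geometries")
    return struct.unpack(fmt + "dd", wkb[5:21])


wkb = point_to_wkb(1.0, 2.0)
print(len(wkb))           # 21
print(wkb_to_point(wkb))  # (1.0, 2.0)
```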

For invocation interfaces, Arctern includes three column-based interfaces: the C++ API, the Python API, and the Spark API. The C++ API passes and returns arguments in the Arrow format, while the Python and Spark APIs pass arguments as dataframes. Because Spark supports GPU resource management only from version 3.0 onward, the Spark interface of Arctern supports only Spark 3.0.

Code sample

# Invoke Arctern API in PySpark

from pyspark.sql import SparkSession
import arctern

if __name__ == "__main__":
    spark = SparkSession \
        .builder \
        .appName("Arctern-PySpark example") \
        .getOrCreate()

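    # Enable Arrow-based columnar data transfer between Spark and Python.
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
    # Register the Arctern functions (such as ST_Within_UDF) with this session.
    arctern.pyspark.register(spark)

    # Load the input rows and expose them to Spark SQL as the "within" view.
    within_df = spark.read.json('./example.json').cache()
    within_df.createOrReplaceTempView("within")
    # Evaluate ST_Within_UDF on each (geo0, geo1) pair and print the result.
    spark.sql("select ST_Within_UDF(geo0, geo1) from within").show()
    spark.stop()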
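
The sample above reads `./example.json`, which is not shown here. By default, Spark's JSON reader expects one JSON object per line, so a compatible input file could be produced as sketched below. The column names `geo0` and `geo1` come from the query above; the assumption that they hold WKT strings, the sample values, and the helper `write_json_lines` are illustrative only.

```python
import json

# Hypothetical rows: geo0 is the candidate geometry, geo1 the containing one.
# The WKT encoding of the values is an assumption, not taken from the source.
rows = [
    {"geo0": "POINT (1 1)", "geo1": "POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))"},
    {"geo0": "POINT (5 5)", "geo1": "POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))"},
]


def write_json_lines(path, records):
    """Write one JSON object per line, the layout spark.read.json expects."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


write_json_lines("example.json", rows)
```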

Visualization

Arctern will be open sourced in sync with Sulidae, a front-end visualization system developed by ZILLIZ that provides hybrid visualization solutions combining web frontend and server-side rendering. Sulidae combines the speed and flexibility of web frontend rendering with the massive-scale data rendering capability of the backend.

Arctern 0.1.0 is compatible with Sulidae. The following figures show the visualization effects of a heatmap and a choropleth with 10 million data points.

Arctern roadmap

v0.1.0

  1. Support the 35 most frequently used GIS APIs in the OGC standard.
  2. Support rendering choropleths, heatmaps, and scatter plots for massive-scale datasets.
  3. Provide C++, Python, and Spark APIs based on the Arrow standard.
  4. Arctern engine with CPU-based implementation.
  5. Arctern engine with GPU-based implementation.
  6. Compatibility with Sulidae.
  7. Documentation for installation, deployment, and API reference.

v0.2.0

  1. Domain-specific API for trace analysis and geospatial data statistics.
  2. Geospatial indexes for domain-specific API.
  3. Performance optimization for Spark 3.0.
  4. Support more GIS APIs.
  5. Continuously improve system stability.

In progress:

Completed by 2020.03.10

  1. Support the 35 most frequently used GIS APIs in the OGC standard.
  2. Support rendering choropleths, heatmaps, and scatter plots for massive-scale datasets.
  3. Support C++, Python, and Spark APIs based on the Arrow standard.
  4. Arctern engine with CPU support.
  5. Arctern engine with GPU support.

Contact us

Email

[email protected]

ZILLIZ WeChat
