anselmevignon/mist

 
 

Mist

Mist is a service for exposing analytics jobs and machine learning models as web services.

Mist provides an API for Scala & Python Apache Spark jobs and for machine learning models.

It implements Spark as a Service and creates a unified API layer for building enterprise solutions and services on top of a Big Data lake.

Mist use cases

Discover more Hydrosphere Mist use cases.

Features

  • Real-time, low-latency model serving/scoring
  • Exposing Apache Spark jobs through a REST API
  • Spark 2.0.0 support!
  • Spark context orchestration
  • Super parallel mode: multiple Spark contexts in separate JVMs or Docker containers
  • HTTP & Messaging (MQTT) API
  • Support for Scala and Python Spark jobs
  • Support for Spark SQL and Hive
  • High availability and fault tolerance
  • Self-healing after driver program failure
  • Powerful logging
  • Clear end-user API

Getting Started with Mist

Dependencies

  • JDK 8
  • Apache Spark >= 1.5.2 (earlier versions have not been tested)
  • MQTT server (optional)

Run Mist

docker run -p 2003:2003 -v /var/run/docker.sock:/var/run/docker.sock -d hydrosphere/mist:master-2.0.0 mist

More about docker image

Run example

sbt "project examples" package

curl --header "Content-Type: application/json" -X POST http://localhost:2003/api/simple-context --data '{"digits": [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]}'
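
For callers that prefer a script to curl, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not part of Mist itself: the helper names `build_request` and `run_job` are hypothetical, the URL assumes Mist is reachable on localhost:2003 as in the docker command above, and the reply is assumed to be JSON.

```python
import json
import urllib.request

# Assumption: Mist listens on localhost:2003, as in the docker command above.
MIST_URL = "http://localhost:2003/api/simple-context"


def build_request(digits, url=MIST_URL):
    """Build the same POST request that the curl command sends (hypothetical helper)."""
    body = json.dumps({"digits": digits}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_job(digits, url=MIST_URL, timeout=30):
    """Send the request and return the decoded reply (hypothetical helper).

    Assumption: the reply body is JSON. Its exact shape is not described
    in this README, so it is returned unchanged.
    """
    request = build_request(digits, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With Mist running and the examples packaged, calling `run_job([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])` sends the same payload as the curl command.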

Check out the complete Getting Started guide.

Building from source

  • Build the project
    git clone https://github.com/hydrospheredata/mist.git
    cd mist
    sbt -DsparkVersion=2.0.0 assembly
  • Run
    ./bin/mist start master

Development mode

# clone mist repo 
git clone https://github.com/Hydrospheredata/mist

# available spark versions: 1.5.2, 1.6.2, 2.0.0
export SPARK_VERSION=2.0.0
docker create --name mist-${SPARK_VERSION} -v /usr/share/mist hydrosphere/mist:tests-${SPARK_VERSION}
docker run --name mosquitto-${SPARK_VERSION} -d ansi/mosquitto
docker run --name hdfs-${SPARK_VERSION} --volumes-from mist-${SPARK_VERSION} -d hydrosphere/hdfs start

# run tests
docker run -v /var/run/docker.sock:/var/run/docker.sock --link mosquitto-${SPARK_VERSION}:mosquitto --link hdfs-${SPARK_VERSION}:hdfs -v $PWD:/usr/share/mist hydrosphere/mist:tests-${SPARK_VERSION} tests
# or run mist
docker run -v /var/run/docker.sock:/var/run/docker.sock --link mosquitto-${SPARK_VERSION}:mosquitto --link hdfs-${SPARK_VERSION}:hdfs -v $PWD:/usr/share/mist hydrosphere/mist:tests-${SPARK_VERSION} mist
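
The two `docker run` lines above differ only in their final argument, so a small wrapper can assemble either one and reject Spark versions that have no test image. The following is a sketch under stated assumptions: `docker_run_command` is a hypothetical helper, and the version list is copied from the comment in the shell snippet rather than queried from a registry.

```python
import os

# Versions listed in the comment of the shell snippet above.
AVAILABLE_SPARK_VERSIONS = ("1.5.2", "1.6.2", "2.0.0")


def docker_run_command(spark_version, action, workdir=None):
    """Return the `docker run` argument list for `tests` or `mist` (hypothetical helper)."""
    if spark_version not in AVAILABLE_SPARK_VERSIONS:
        raise ValueError("unsupported Spark version: %s" % spark_version)
    if action not in ("tests", "mist"):
        raise ValueError("action must be 'tests' or 'mist'")
    # Mirrors $PWD in the shell snippet when no directory is given.
    workdir = workdir or os.getcwd()
    return [
        "docker", "run",
        "-v", "/var/run/docker.sock:/var/run/docker.sock",
        "--link", "mosquitto-%s:mosquitto" % spark_version,
        "--link", "hdfs-%s:hdfs" % spark_version,
        "-v", "%s:/usr/share/mist" % workdir,
        "hydrosphere/mist:tests-%s" % spark_version,
        action,
    ]
```

The returned list can be passed to `subprocess.run(...)` once the `mist`, `mosquitto` and `hdfs` containers from the setup step exist.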

What's next

Version Information

| Mist Version | Scala Version  | Python Version | Spark Version |
|--------------|----------------|----------------|---------------|
| 0.1.4        | 2.10.6         | 2.7.6          | >=1.5.2       |
| 0.2.0        | 2.10.6         | 2.7.6          | >=1.5.2       |
| 0.3.0        | 2.10.6         | 2.7.6          | >=1.5.2       |
| 0.4.0        | 2.10.6, 2.11.8 | 2.7.6          | >=1.5.2       |
| 0.5.0        | 2.10.6, 2.11.8 | 2.7.6          | >=1.5.2       |
| 0.6.5        | 2.10.6, 2.11.8 | 2.7.6          | >=1.5.2       |
| 0.7.0        | 2.10.6, 2.11.8 | 2.7.6          | >=1.5.2       |
| 0.8.0        | 2.10.6, 2.11.8 | 2.7.6          | >=1.5.2       |
| master       | 2.10.6, 2.11.8 | 2.7.6          | >=1.5.2       |

Roadmap


  • Persist job state for self healing
  • Super parallel mode: run Spark contexts in separate JVMs
  • Powerful logging
  • RESTification
  • Support streaming contexts/jobs
  • Reactive API
  • Realtime ML models serving/scoring
  • CLI
  • Web Interface
  • Apache Kafka support
  • Bi-directional streaming API
  • AMQP support

Docs Index

Contact

Please report bugs/problems to: https://github.com/Hydrospheredata/mist/issues.

http://hydrosphere.io/

About

Model serving middleware on top of Apache Spark
