Microsoft has revamped its MMLSpark open source project, the better to integrate “many deep learning and data science tools to the Spark ecosystem,” according to the notes on the project repository.

MMLSpark, originally released last year, is a collection of projects intended to make Spark more useful in many contexts—mainly machine learning, but also in some general-purpose ways.

Some of MMLSpark’s features integrate Spark with Microsoft machine learning offerings such as the Microsoft Cognitive Toolkit (CNTK) and LightGBM, as well as with third-party projects such as OpenCV. Others turn Spark into a service or a client. For example, they allow Spark computations (including machine learning predictions) to be served over the web, or let Spark call other web services via HTTP. One function, LIME on Spark, annotates the predictions made by an image classifier to show which parts of each image influenced the result, giving an at-a-glance way to check whether the classifier is working correctly.

To read this article in full, please click here