Innovation and Business Driven Outcomes
Solr Search Gives an "All Clear" for IoT-Based Railway Analytics
Capturing and Monitoring IoT Data is Critical to the Positive Train Control (PTC) Mandate
by Venkatesh Sekar
The Federal Railroad Administration (FRA), one of 10 agencies within the U.S. Department of Transportation concerned with intermodal transportation, has mandated the implementation of "Positive Train Control (PTC)" across a significant portion of the rail industry.
PTC uses communication-based/processor-based train control technology that provides a system capable of reliably and functionally preventing:
Incursions into established work zone limits that would impact rail worker safety
The movement of a train through a main line switch in the wrong position
What Is Being Monitored?
As part of PTC enablement, various segments of the railway (trains, tracks, stations, etc.) are fitted with monitoring and control systems that send signals to the control center over various transport mechanisms (Wi-Fi, radio, etc.). The messages and signals vary; some examples include:
Current train location (spatial)
Dimensions of the train (locomotives, weight, load type, origin/destination, etc.)
Current engineer operating the train
Messages sent from the control center to the train, and the corresponding responses
Health of the various control and monitoring systems on the train
The ability to capture and monitor these datasets provides a complete history of end-to-end train movement, which can be used to determine the likely causes of failures, incidents, etc. Once PTC is fully implemented, the control center regularly receives these IoT-based sensor readings and messages from across the entire railway network.
The FRA provides the latest status of PTC implementation by both freight and passenger railroads in the graphic below:
Some Significant Data Challenges
Consider an individual railway company and the requirements of PTC, and some very challenging aspects become evident:
Large volumes of real-time streaming IoT data must be captured, transmitted, processed, and potentially acted upon
The historical dataset grows very large over time (to Big Data scale)
Interacting with these datasets (both real-time and historical) requires several capabilities, including:
Real-Time Dashboards
Real-Time Alerting and Notifications
Rapid Search and Drill Down
With that background established, I will present how I’m leveraging Solr (a key part of the Big Data search ecosystem) to enable a variety of use cases in this space with a large railway company.
You’ve Probably Heard of Solr
As the lucene.apache.org site very concisely states, “Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene”. It also comes packaged as part of leading big data platform distributions like Hortonworks, Cloudera, and DataStax. Solr can also be installed independently from a big data platform distribution.
Solr Uniquely Addresses the FRA PTC Search Problem
In the context of the FRA and PTC, IoT messages are mostly enumeration-based (e.g., train type, direction of travel) and span a variety of devices. Some messages, however, are free-form text (e.g., a message displayed to the engineer, or the engineer's response to a message).
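One way to picture this mix is as a single flat document per message, whatever device produced it. The sketch below is a minimal illustration, not the schema used on the project: the raw message shape, the field names (`train`, `messageType`, `deviceType`, `eventTime`, `direction`, `messageText`), and the sample values are all hypothetical. Enumerated values are kept as exact-match strings, free text goes into its own field, and the output is a JSON array of documents, a format Solr's `/update` handler accepts.

```python
import json
from datetime import datetime, timezone

# Hypothetical raw messages from two different device types.
RAW_MESSAGES = [
    {"seq": 1, "device": "onboard", "train": "XP17639", "type": "0100",
     "ts": 1500000000, "direction": "EAST"},
    {"seq": 2, "device": "display", "train": "XP17639", "type": "0205",
     "ts": 1500000060, "text": "Signal failure ahead, reduce speed"},
]


def to_solr_doc(msg: dict) -> dict:
    """Flatten one raw message (hypothetical shape) into a single Solr document."""
    # Solr date fields expect ISO-8601 in UTC, e.g. 2017-07-14T02:40:00Z.
    event_time = datetime.fromtimestamp(msg["ts"], tz=timezone.utc)
    doc = {
        "id": f"{msg['device']}-{msg['seq']}",
        # Enumerated values: intended for exact-match string fields.
        "train": msg["train"],
        "messageType": msg["type"],
        "deviceType": msg["device"],
        "eventTime": event_time.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    if "direction" in msg:
        doc["direction"] = msg["direction"]
    # Free-form text: intended for a tokenized text field.
    if "text" in msg:
        doc["messageText"] = msg["text"]
    return doc


def update_payload(messages: list) -> str:
    """Serialize messages as a JSON array of documents for Solr's /update handler."""
    return json.dumps([to_solr_doc(m) for m in messages], indent=2)


if __name__ == "__main__":
    print(update_payload(RAW_MESSAGES))
```

Because every device's messages land in one collection with shared field names, a single query can cut across message types without joins.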
To understand what happened, operations personnel must move between many messages from many devices to establish a timeline of events. This means an operator must be able to quickly and accurately search for the particular time, train, or segment where an individual alert was generated.
Designing a NoSQL schema for these scenarios is very difficult, because ad hoc searches like these almost always lead to full table scans. A standard SQL query against Hive tables would also be slow, since it could require multiple table joins.
Solr's free-form queries provide an easier way to search across all of the messages received from all IoT devices, and across the varying message types. Standard Solr features such as boosting and proximity search provide a decided advantage that SQL queries just can't match.
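Boosting and proximity are both query-syntax features of Solr's standard query parser: `field:value^N` raises the score of matching documents, and `field:"a b"~N` matches a phrase whose terms may be up to N positions out of place. A minimal sketch of helpers that build these clauses, assuming a hypothetical `messageText` field and field values that need no further escaping:

```python
def boosted(field: str, value: str, boost: float) -> str:
    """Optional clause; documents matching field:value get their score raised."""
    return f"{field}:{value}^{boost:g}"


def proximity(field: str, phrase: str, slop: int) -> str:
    """Phrase query whose terms may be up to `slop` positions out of place."""
    # Escape backslashes and double quotes so the phrase stays one quoted term.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"~{slop}'


if __name__ == "__main__":
    # Matches "signal failure" and also looser wordings like "signal reported failure".
    q = " ".join([
        proximity("messageText", "signal failure", 3),
        boosted("messageType", "0205", 2),
    ])
    print(q)  # messageText:"signal failure"~3 messageType:0205^2
```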
The following is a pseudo example of one such query:
ptcrt/query?q=*:* messageType:0100^3 messageType:0205^2&fq=train:XP17639
Here is how the query breaks down:
Match all rows (across all messages and devices) in the collection ptcrt
Boost records whose messageType is 0100 (to the very top of the result set)
Boost records whose messageType is 0205 (to the top of the result set, after the 0100 records)
Filter the result set to only those records related to train "XP17639"
Yes! One single, simple line. Try writing that as a SQL query that spans multiple tables; it will be much harder and could run to more than a page.
Add a UI (like Google's search box), and the operator's or user's life becomes that much easier, since they don't necessarily have to learn SQL.
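In a real request the query string has to be URL-encoded, and it helps to write the boost clauses without a `+` prefix: in Solr's query syntax a leading `+` makes a clause mandatory (so a document would need both message types), and in a URL query string a literal `+` is conventionally decoded as a space. A minimal Python sketch that lets `urllib.parse.urlencode` handle the encoding; the host and port are assumptions, and the default OR query operator is assumed:

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumption: adjust for your cluster


def ptc_query_url(collection: str, train: str) -> str:
    """Build the boosted, train-filtered query as an encoded GET URL."""
    params = {
        # *:* matches every document; with the default OR operator the two
        # boosted clauses are optional and only add score, pushing 0100 and
        # then 0205 messages toward the top. Plain ^ boosts still interact
        # with term statistics; the standard query parser also supports
        # constant-score ^= boosts if a strict ordering is needed.
        "q": "*:* messageType:0100^3 messageType:0205^2",
        # fq narrows the result set without affecting scores.
        "fq": f"train:{train}",
    }
    return f"{SOLR_BASE}/{collection}/query?{urlencode(params)}"


if __name__ == "__main__":
    print(ptc_query_url("ptcrt", "XP17639"))
```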
What About Spatial?
Some of the IoT messages contain spatial information, and these messages sometimes carry alerts or warnings such as signal failures or track failures. Solr's spatial search natively provides the "location" and "location_rpt" field types, which are specialized for storing latitude/longitude information.
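As a rough sketch of how such a field might be added and populated, the helpers below build a Schema API `add-field` command (POSTed to `/solr/<collection>/schema`) and format a point as the `"lat,lon"` string these field types accept. The field name `trainLocation` and the coordinates are hypothetical, and the sketch assumes the collection's schema already defines the `location_rpt` field type.

```python
import json


def add_location_field(name: str, field_type: str = "location_rpt") -> str:
    """Schema API command body to POST to /solr/<collection>/schema."""
    return json.dumps({
        "add-field": {"name": name, "type": field_type,
                      "indexed": True, "stored": True}
    })


def latlon(lat: float, lon: float) -> str:
    """Format a point as the "lat,lon" string Solr's location fields accept."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: {lat}, {lon}")
    return f"{lat},{lon}"


if __name__ == "__main__":
    print(add_location_field("trainLocation"))
    # A hypothetical message document carrying the train's position.
    print(json.dumps({"id": "onboard-1", "trainLocation": latlon(41.8781, -87.6298)}))
```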
With the help of Solr spatial functions like "geodist", we can immediately address some common scenarios, such as:
Binning alerts and warnings by location across locomotives, trains, and railway segments on the entire network, which helps identify faulty sectors.
Combining a locomotive's current location with city/town/subdivision information to perform impact analysis for potentially highly populated zones.
Analyzing a locomotive's current location together with weather information to help determine whether the train should be stopped because of environmental conditions (an approaching hurricane or a potential wildfire zone).
Providing real-time warnings of approaching trains to personnel on the tracks for added safety.
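For the approaching-train warning, a sketch of the two halves might look like this. `nearby_params` builds a request using Solr's `{!geofilt}` filter with the `sfield`, `pt`, and `d` (kilometers) parameters, sorted by `geodist()`; `haversine_km` computes a great-circle distance comparable to what `geodist()` reports, which a client could use for a quick local check. The field name, the 2 km threshold, and the Earth-radius constant are assumptions for illustration, not values from the project.

```python
import math

# Mean Earth radius in km; an assumption for this sketch.
EARTH_RADIUS_KM = 6371.0087714


def nearby_params(sfield: str, lat: float, lon: float, radius_km: float) -> dict:
    """Solr params: documents within radius_km of (lat, lon), nearest first."""
    return {
        "q": "*:*",
        "fq": "{!geofilt}",          # reads the sfield, pt and d params below
        "sfield": sfield,
        "pt": f"{lat},{lon}",
        "d": str(radius_km),         # distance in kilometers
        "sort": "geodist() asc",
        "fl": "*,dist:geodist()",    # return each document's distance as 'dist'
    }


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing the argument just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def should_warn(worker: tuple, train: tuple, threshold_km: float = 2.0) -> bool:
    """Hypothetical rule: warn track personnel when a train is within threshold_km."""
    return haversine_km(*worker, *train) <= threshold_km
```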
How Does Faceting Play a Part in IoT Enablement?
Faceting is a particularly useful Solr feature that arranges search results into categories based on indexed terms, with a count for each category. This matters for IoT data:
IoT devices are updated and replaced sporadically, so not every enumeration of a particular event type is known in advance. As newer IoT devices emit new event types, those events can be missed unless the aggregation or dashboard is explicitly updated (hardcoded) to include them. Faceting eliminates the need to hardcode these values in the Solr query, hardcoding that is often required in the SQL world.
For example, take a look at the following query:
ptcrt/query?q=*:*&facet=true&facet.field=alert
Here is how it breaks down:
Get all rows (across all messages and devices) from the collection ptcrt
Provide a facet on the "alert" field
If new alert types are emitted, the "alert" facet will include them, with their counts, without any change to the query. One caveat: how a particular UI implements alerting is implementation-specific and independent of Solr's query response.
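The faceting request described above can be sketched in Python as follows. Solr's default JSON response returns each facet field as a flat `[term, count, term, count, ...]` list, so a small helper turns it into a dictionary; comparing those terms with the alert types a dashboard already knows about is one way to surface new ones. The sample response and alert names are made up for illustration.

```python
from urllib.parse import urlencode


def facet_params(field: str) -> dict:
    """Match everything and facet on `field`; no alert values are hardcoded."""
    return {"q": "*:*", "facet": "true", "facet.field": field,
            "facet.mincount": "1"}  # skip terms with zero matches


def parse_facet_counts(response: dict, field: str) -> dict:
    """Turn Solr's flat [term, count, term, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[::2], flat[1::2]))


def new_alert_types(counts: dict, known: set) -> list:
    """Facet terms the dashboard has not seen before, e.g. from a new device."""
    return sorted(set(counts) - known)


# Made-up response fragment, shaped like Solr's default JSON facet output.
SAMPLE_RESPONSE = {
    "facet_counts": {
        "facet_fields": {
            "alert": ["SIGNAL_FAILURE", 12, "TRACK_FAILURE", 4, "GPS_DRIFT", 1]
        }
    }
}

if __name__ == "__main__":
    print(urlencode(facet_params("alert")))
    counts = parse_facet_counts(SAMPLE_RESPONSE, "alert")
    print(new_alert_types(counts, {"SIGNAL_FAILURE", "TRACK_FAILURE"}))
```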
Bringing It All Together with Visualization – GUIs and Dashboards
Solr provides REST-based services, so the client implementation is free to choose whatever GUI and dashboard solutions it needs. There are also some basic visualization tools for Solr that clients can use or extend if they choose; these are handy for quick prototyping.
Solritas is the name of a contrib module that integrates Solr with Apache Velocity. It is essentially a response writer (the VelocityResponseWriter) that uses the Apache Velocity template engine to render Solr responses as a graphical user interface.
Banana is a data visualization tool that uses Solr for data analysis and display. Banana presents data in dashboards, which contain rows of panels that implement the required analysis; you get some basic dashboarding capabilities.
Develop a Custom GUI for Even More Functionality
A custom GUI can address more specific needs, and Solr's REST API supports this. A custom GUI can also showcase some of the powerful features Solr provides, such as:
More Like This
Spell check / relevancy
Combining these features with a "Relevant Search" implementation creates a very powerful tool for operators and users and provides significant response-time benefits.
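A sketch of the request parameters behind More Like This and spell checking, assuming a request handler with the MoreLikeThis and SpellCheckComponent components available, a hypothetical `messageText` field, and a hypothetical document id. The `{!term f=id}` local-params prefix looks up the source document by exact id, so ids containing query-syntax characters need no escaping.

```python
from urllib.parse import urlencode


def more_like_this_params(doc_id: str, fields=("messageText",), count: int = 5) -> dict:
    """Fetch one message by id and ask for similar messages."""
    return {
        "q": "{!term f=id}" + doc_id,   # exact id lookup
        "mlt": "true",
        "mlt.fl": ",".join(fields),     # fields to mine for "interesting" terms
        "mlt.count": str(count),        # similar documents per result
        # Short messages rarely repeat terms, so relax the term and document
        # frequency thresholds (the defaults are higher).
        "mlt.mintf": "1",
        "mlt.mindf": "1",
    }


def spellcheck_params(user_text: str) -> dict:
    """Search as typed, and ask for suggestions plus a corrected collation."""
    return {
        "q": user_text,
        "spellcheck": "true",
        "spellcheck.q": user_text,
        "spellcheck.collate": "true",
    }


if __name__ == "__main__":
    print(urlencode(more_like_this_params("onboard-1")))
    print(urlencode(spellcheck_params("signl failur")))
```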
Where Can You Go From Here?
I've just scratched the surface of what's possible when you apply Solr to IoT-based railway analytics. You should also check out some of Solr's other features, such as security (Solr can be configured with technologies such as Kerberos, Ranger, and Sentry), scaling with distributed indexing and search (SolrCloud), collection aliases, and the SQL and graph query capabilities in Solr 6.
If you need help with IoT-based applications, Solr, or anything else Big Data (including Hashmap's Tempus IIoT/IoT Framework for Cloud-Edge-ML), feel free to connect with me on the channels below, and be sure to share this post as well.
. . .
You can keep up with all new content from Hashmap at https://medium.com/hashmapinc.
Venkatesh Sekar is a Big Data Consultant, Architect, and Developer at Hashmap working across industries with a group of innovative technologists and domain experts accelerating the value of connected data for the open source community and customers. You can connect with him on LinkedIn at https://www.linkedin.com/in/venkatesh-sekar-6367b71/.