Innovation and Business Driven Outcomes

Solr Search Gives an "All Clear"
for IoT-Based Railway Analytics
Capturing and Monitoring IoT Data is Critical to the Positive Train Control (PTC) Mandate
by Venkatesh Sekar

The Federal Railroad Administration (FRA), one of 10 agencies within the U.S. Department of Transportation concerned with intermodal transportation, has mandated the implementation of "Positive Train Control (PTC)" across a significant portion of the rail industry.


PTC uses communication-based/processor-based train control technology that provides a system capable of reliably and functionally preventing:


  • Train-to-train collisions

  • Overspeed derailments

  • Incursions into established work zone limits that would impact rail worker safety

  • The movement of a train through a main line switch in the wrong position


What Is Being Monitored?


As part of PTC enablement, various segments of railways (trains, tracks, stations, etc.) are fitted with monitoring and control systems that send signals to the control center via various transport mechanisms (WiFi, radio, etc.). The messages and signals vary, but some examples are below:


  • Current train location (spatial)

  • Dimensions of the train (locomotives, weight, load type, start / end destinations, etc.)

  • Current engineer operating the train

  • Various messages that are sent from control center to train and corresponding responses

  • Health of various controls and monitoring systems on the train


An ability to capture and monitor these types of datasets allows for a complete history of end-to-end train movement and can be used to determine likely causes of failure, incidents, etc. Once PTC is fully implemented, the control center receives these IoT-based sensor readings and messages regularly across the entire railway system network.
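To make the shape of such a message concrete, here is a minimal sketch of how one record might be modeled before it is indexed. The field names `train` and `messageType` mirror the queries used later in this post; every other name (`engineer`, `location`, `text`, the `PtcMessage` class itself) is a hypothetical choice for illustration, not the actual schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass
class PtcMessage:
    """One IoT message as it might be indexed (hypothetical field names)."""
    id: str
    train: str                       # train identifier, e.g. "XP17639"
    messageType: str                 # enumeration code, e.g. "0100"
    timestamp: datetime              # timezone-aware emission time
    engineer: Optional[str] = None   # engineer operating the train
    location: Optional[str] = None   # "lat,lon" string for spatial search
    text: Optional[str] = None       # free-form text, when present

    def to_doc(self) -> dict:
        """Return a flat dict, dropping empty fields and using an ISO-8601 UTC timestamp."""
        doc = {k: v for k, v in asdict(self).items() if v is not None}
        utc = self.timestamp.astimezone(timezone.utc)
        doc["timestamp"] = utc.strftime("%Y-%m-%dT%H:%M:%SZ")
        return doc


msg = PtcMessage(
    id="m1",
    train="XP17639",
    messageType="0100",
    timestamp=datetime(2017, 3, 1, 12, 0, tzinfo=timezone.utc),
    location="41.25,-95.93",
)
print(msg.to_doc())
```

Keeping enumerations, free text, and coordinates in one flat document is what lets a single search span all message types.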


PTC Progress


The latest status on PTC Implementation by both freight and passenger rail is provided by the FRA in the graphic below:

To give you a perspective on how an individual railway company is progressing with the PTC initiative, here is a chart released by Union Pacific in March 2017 with some of their key milestones:

Some Significant Data Challenges


If you consider an individual railway company and the requirements of PTC, some very challenging aspects become evident:


  1. Significant amounts of real-time streaming IoT data must be captured, transmitted, processed, and potentially acted upon

  2. The historical dataset grows very large over time (into Big Data territory)

  3. To interact with the datasets (both real time and historical), several data interaction capabilities are required including:


  • Real Time Dashboards

  • Real Time Alerting and Notifications

  • Rapid Search and Drill Down

  • Batch Reporting


With that background established, I will present how I’m leveraging Solr (a key part of the Big Data search ecosystem) to enable a variety of use cases in this space with a large railway company.

You’ve Probably Heard of Solr

As the Solr site very concisely states, “Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene”. It also comes packaged as part of leading big data platform distributions like Hortonworks, Cloudera, and DataStax. Solr can also be installed independently of a big data platform distribution.

Solr Uniquely Addresses the FRA PTC Search Problem

IoT messages in the context of FRA and PTC are mostly enumeration based (e.g., train type, direction of travel) and span a variety of devices. Some messages, though, are free-form text, e.g., a message displayed to the engineer or an engineer’s response to a message.


For operations personnel to understand what has happened, they must shift between various messages from various devices to establish a timeline of events. This means an operator must be able to quickly and accurately search for a particular time, train, or segment where an individual alert was generated.


Designing a NoSQL schema geared towards these types of scenarios is very difficult as it would almost always lead to table scans. Writing a standard SQL query against a Hive table would also be slow, as there could be multiple table joins.


Enter Solr.


Solr’s free-form query syntax provides an easier way to search across all of the messages received from all IoT devices AND across the varying message types. Standard Solr features such as boosting and proximity search provide a decided advantage that SQL queries just can’t match.


The following is a pseudo example of one such query:


ptcrt/query?q="*:* +(messageType:0100)^3 +(messageType:0205)^2"&fq="train:XP17639"


Here is how the above query breaks down…


  • get all rows (across all messages and devices) from the collection ptcrt

  • boost records whose messageType is 0100 (to the very top of the result set)

  • boost records whose messageType is 0205 (toward the top of the result set, after the 0100 records)

  • filter the result set down to only those records related to train "XP17639"
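The breakdown above can be sketched in Python. This is an illustration under stated assumptions rather than the production query. It assumes the boosted message types are optional clauses, so every record still matches through `*:*`, and that the request goes to the `/query` handler of the `ptcrt` collection as in the pseudo example. The helper name `build_boost_query` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_boost_query(train_id: str, boosts: dict) -> str:
    """Build a relative request URL against the ptcrt collection.

    Each message type becomes an optional clause with a boost, so all
    documents match via *:* while boosted types score higher.
    """
    clauses = ["*:*"] + [
        f"messageType:{code}^{weight}" for code, weight in boosts.items()
    ]
    params = {
        "q": " ".join(clauses),
        "q.op": "OR",                 # assumption: clauses are optional
        "fq": f"train:{train_id}",    # filter query: one train only
    }
    # urlencode percent-encodes characters such as ':' and '^' for us
    return "ptcrt/query?" + urlencode(params)


url = build_boost_query("XP17639", {"0100": 3, "0205": 2})
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0])
print(decoded["fq"][0])
```

Letting `urlencode` handle the escaping avoids hand-writing quotes and plus signs in the URL, which are easy to get wrong.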


Yes!!! One single, simple line. Try that with a SQL query that spans multiple tables: it would be very tough and could run longer than a page.


Throw in a UI (like Google’s search box) and it makes the operator’s / user’s life that much easier and they don’t have to necessarily learn SQL.

What About Spatial?

Some of the IoT messages contain spatial information. These messages sometimes indicate alerts or warnings for signal failure, track failure, etc. Spatial Search in Solr natively provides the "location" and "location_rpt" field types, which are specialized for storing latitude/longitude information.


With the assistance of Solr spatial functions like "geodist" we can immediately address some common scenarios such as:


  • Binning alerts and warnings across locomotives, trains, and railway segments on the entire network, helping us to identify faulty sectors.

  • Combining a locomotive’s current spatial information with city/town/sub-division information, enabling us to perform impact analysis for potentially affected, highly populated zones.

  • Analyzing a locomotive’s current spatial information combined with weather information to help determine whether a train should be stopped based on particular environmental conditions (a pending hurricane or potential wildfire zone).

  • Providing real-time warnings of approaching trains to personnel on tracks for added safety
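As a rough sketch of the last scenario, the parameters below ask Solr for messages within a given radius of a point, nearest first. `geofilt`, `geodist()`, `sfield`, `pt`, and `d` are standard Solr spatial parameters. The field name `location`, the coordinates, and the helper name are assumptions made for illustration.

```python
from urllib.parse import urlencode


def nearby_params(lat: float, lon: float, radius_km: float,
                  field: str = "location") -> dict:
    """Request parameters for 'everything within radius_km of (lat, lon)'."""
    return {
        "q": "*:*",
        "fq": "{!geofilt}",            # circle filter using sfield, pt, and d
        "sfield": field,               # assumed lat/lon field name
        "pt": f"{lat},{lon}",          # center point as "lat,lon"
        "d": str(radius_km),           # radius in kilometers
        "sort": "geodist() asc",       # nearest results first
        "fl": "*,_dist_:geodist()",    # also return the computed distance
    }


params = nearby_params(41.25, -95.93, 10)
print("ptcrt/query?" + urlencode(params))
```

Because `sfield` and `pt` are set at the top level of the request, both the filter and the `geodist()` sort pick them up without repeating them.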

How Does Faceting Play a Part in IoT Enablement?

Faceting is a really special feature of Solr which enables the arrangement of search results into categories based on indexed terms and provides capabilities such as:


  • Aggregation

  • Enumeration discovery

  • Drilldown


IoT devices get updated and replaced sporadically, meaning not all enumerations of a particular event type are known in advance. As new events are emitted (from a more recent IoT device), they could get lost if the aggregation/dashboard relies on a hardcoded list of values. Faceting eliminates the need to "hardcode" values in the Solr query (this “hardcoding” is often required in the SQL world).


For example, take a look at the following query:




  • get all rows (across all messages, devices) from the collection ptcrt

  • provide a facet on "alert" field
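A request matching the two steps above might look like the sketch below, which also shows how a client could read the facet counts back. `facet` and `facet.field` are standard Solr parameters, and by default Solr's JSON response returns field facets as a flat list of alternating values and counts. The sample response and its alert names are invented purely for illustration.

```python
from urllib.parse import urlencode


def facet_params(field: str = "alert") -> dict:
    """Parameters for 'match everything, facet on one field'."""
    return {"q": "*:*", "facet": "true", "facet.field": field}


def parse_facet_counts(response: dict, field: str = "alert") -> dict:
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[::2], flat[1::2]))


print("ptcrt/query?" + urlencode(facet_params()))

# Hypothetical response fragment; the alert names are made up.
sample_response = {
    "facet_counts": {
        "facet_fields": {
            "alert": ["SIGNAL_FAILURE", 12, "TRACK_FAILURE", 4, "NEW_ALERT_TYPE", 1]
        }
    }
}
counts = parse_facet_counts(sample_response)
print(counts)
```

Note that `NEW_ALERT_TYPE` shows up in the counts without any change to the query, which is the point of faceting over a hardcoded list.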


If newer alerts are emitted, the "alert" facet will include them automatically. A quick caveat: how a particular UI is implemented for alerting is specific to the implementation and is independent of Solr’s query response.

Bringing It All Together with Visualization – GUIs and Dashboards

Solr provides REST-based services, so the client implementation has the flexibility to select GUI and dashboard solutions as required. Solr also provides some base visualization software that clients can use or enhance if they choose to. These options provide quick prototyping functionality.


  • Solritas is the name of a contribution module that integrates Solr with Apache Velocity. It is basically a response writer that uses the Apache Velocity template engine to render Solr responses with a graphical user interface.

  • Banana is a data visualization tool that uses Solr for data analysis and display. Data display in Banana is based on dashboards, which contain rows of panels that implement the analysis required, so you’ll get some basic dashboarding capabilities.


Develop a Custom GUI for Even More Functionality


Implementing a custom GUI can help address some more specific needs, and Solr provides a REST API to support this. Developing a custom GUI could help showcase some of the powerful features that Solr provides such as…


  • More like this

  • Highlighting

  • Spell check / relevancy


Using these features and coupling them with a “Relevant Search” implementation makes for a very powerful tool for operators/users and provides significant response time benefits.
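As a hedged sketch of how a custom GUI might switch these features on, the helper below builds the request parameters for one search-box query. `hl`, `hl.fl`, `mlt`, `mlt.fl`, `mlt.count`, `spellcheck`, and `spellcheck.q` are standard Solr parameters, but spell checking only works if the request handler has the spellcheck component configured. The field name `messageText` and the helper name are hypothetical.

```python
def assisted_search_params(user_text: str,
                           text_field: str = "messageText") -> dict:
    """Parameters for a search-box query with highlighting, MLT, and spell check.

    Note: user_text is passed through unescaped here; a real GUI should
    escape Solr query syntax characters before building the query.
    """
    return {
        "q": f"{text_field}:({user_text})",
        "hl": "true",                 # highlight matching snippets
        "hl.fl": text_field,
        "mlt": "true",                # "more like this" for each result
        "mlt.fl": text_field,
        "mlt.count": 3,
        "spellcheck": "true",         # needs the spellcheck component configured
        "spellcheck.q": user_text,
    }


params = assisted_search_params("signal failure")
print(params["q"])
```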


Where Can You Go From Here?

I’ve just scratched the surface of what’s possible when you apply Solr to IoT-based railway analytics. You should also check out some of the other Solr features such as Security (Solr can be configured with technologies such as Kerberos, Ranger, and Sentry), Scaling/Distributed Indexing and Search (SolrCloud), Collection Aliases, and the SQL and graph query support introduced in Solr 6.


Download Solr and the latest Solr Reference Guide today and start applying it to your own IoT use cases and challenges – I think you’ll be very pleased!


If you need help with IoT-based applications, Solr, or anything else Big Data (including Hashmap’s Tempus IIoT / IoT Framework for Cloud-Edge-ML), feel free to connect with me on the channels below, and be sure to share this post as well.


Venkatesh Sekar is a Big Data Consultant, Architect, and Developer at Hashmap working across industries with a group of innovative technologists and domain experts accelerating the value of connected data for the open source community and customers. You can connect with him on LinkedIn at


© 2020 by Hashmap, Inc.