Industrial IoT: Blog Feed Post

SDR – Streamlined Data Refinery


Yesterday I attended a session in Palo Alto on the subject of Data Refinery; the speaker was Will Gorman of Pentaho. I had not realized that Pentaho was acquired by Hitachi Data Systems a couple of months ago. The term “data lake” was coined by James Dixon of Pentaho. I wrote a blog post on this subject last year. As soon as the term started to appear in the data lexicon, other interesting terms such as “data swamp” appeared.

The term data lake was coined to convey the concept of a centralized repository containing virtually inexhaustible amounts of raw (or minimally curated) data that is readily made available anytime to anyone authorized to perform analytical activities. The often unstated premise of a data lake is that it relieves users from dealing with data acquisition and maintenance issues, and guarantees fast access to local, accurate and updated data without incurring the development costs (in terms of time and money) typically associated with structured data warehouses. According to IBM, “However appealing this premise, practically speaking, it is our experience, and that of our customers, that ‘raw’ data is logistically difficult to obtain, quite challenging to interpret and describe, and tedious to maintain. Furthermore, these challenges multiply as the number of sources grows, thus increasing the need to thoroughly describe and curate the data in order to make it consumable.” I completely agree.

During the early days of data warehousing, the term ETL covered all the data preparation stages: extract, transform, and load the curated data for query and reporting. I used to jokingly call this the “answer to 25 years of sin”. In my understanding, Pentaho’s SDR (Streamlined Data Refinery) is a modern form of ETL that deals with both internal structured data and external unstructured data, including machine-generated data. In Pentaho’s own words, “The big data stakes are higher than ever before. No longer just about quantifying ‘virtual’ assets like sentiment and preference, analytics are starting to inform how we manage physical assets like inventory, machines and energy. This means companies must turn their focus to the traditional ETL processes that result in safe, clean and trustworthy data. However, for the types of ROI use cases we’re talking about today, this traditional IT process needs to be made fast, easy, highly scalable, cloud-friendly and accessible to business. And this has been a stumbling block – until now. Streamlined Data Refinery, a market-disrupting innovation that effectively brings the power of governed data delivery to ‘the people’ unlocks big data’s full operational potential.”

Earlier I wrote about data curation and how new companies such as Tamr are addressing the issue. Pentaho’s SDR is another form of data curation. IBM calls it the data wrangling process.

As usual, we love to confuse matters with a variety of terms describing the same thing.

More Stories By Jnan Dash

Jnan Dash is Senior Advisor at EZShield Inc., Advisor at ScaleDB and Board Member at Compassites Software Solutions. He has lived in Silicon Valley since 1979. Formerly he was the Chief Strategy Officer (Consulting) at Curl Inc., before which he spent ten years at Oracle Corporation and was the Group Vice President, Systems Architecture and Technology till 2002. He was responsible for setting Oracle's core database and application server product directions and interacted with customers worldwide in translating future needs to product plans. Before that he spent 16 years at IBM. He blogs at http://jnandash.ulitzer.com.
