The Wayback Machine - https://web.archive.org/web/20171130213810/http://xml.sys-con.com/node/40724

Seven Ways to Mess Up with XML

A successful XML publishing project inspired this article. The project's leader, who claims that the financial return gained for his company "made his career" there, achieved success for two reasons: he focused on the right goals and executed the project in the right way.

This article focuses on two things: how to establish the right goals for an XML-based publishing project, and the most common mistakes people make. We explore the topic by discussing how to go about it the wrong way.

Mistake #1: Plan too little
Everyone knows the importance of upfront planning, right? Yet, even though "everyone knows," we regularly see projects marred by inadequate and superficial planning.

Why does this happen? Two common reasons emerge. First, most people responsible for planning grew up with word-processing and desktop-publishing software. As a result, they typically think that implementing an XML-based system primarily involves a substitution of technologies and file formats.

In reality, using XML for publishing involves new and unfamiliar concepts - it's a true paradigm shift. Unless someone with XML publishing experience helps with the planning, you will likely invest too little in the upfront work.

Second, the decision to launch an XML publishing project can take too long (doesn't it always?). But because the deadline doesn't change, planning gets squeezed to leave more time for implementing the wrong thing. Dilbert cartoons routinely illustrate this problem quite effectively.

Complicating this problem, it's also possible to go overboard on planning. This occurs much less often, but it's still costly because it delays the realization of benefits. Six to eight weeks for planning is about right. If that's not sufficient, then you're probably making mistake #2.

Mistake #2: Try to do too much at once
Once bitten by the XML publishing bug, it's easy to identify opportunities for dramatic improvement everywhere in your organization. So much waste! So much redundancy! So much inaccuracy! How could we have been so blind?

But you must resist trying to change everything at once. Too many people, too many processes, and too many document types exist to tackle everything at once. Instead, start with one group, one process, and one set of related document types.

Some words of caution: make sure you take the long view when planning so that phase VII of your project works well with phase I. You don't want every phase to require going back and changing previously completed phases.

Mistake #3: Try to change too little
Here's a surefire way to fail: start with the aim of creating "minimum disruption." Sounds good - won't work. You want to leave the same tools and processes in place and get a different result? You don't want to affect anyone or change anything but you want to achieve great benefits?

No magic beans exist. If you want to achieve dramatic results, expect to make dramatic changes. Since people naturally resist change, you will need to sell them on the organizational and individual benefits of the changes.

Mistake #4: Try to automatically convert all existing content to XML
Here's one of the most dangerous misunderstandings in publishing: existing processes and tools produce information that is sufficiently consistent to allow automatic conversion to XML. No matter how many times we have encountered that belief - and no matter how insistently it is expressed - it is always wrong.

Word-processing and desktop-publishing tools survive precisely because of the flexibility and freedom they provide to authors. These product attributes are diametrically opposed to the primary purpose of creating XML content, which involves constraining the author to create content according to a set of rules.

Is it hopeless to convert existing content to XML? Not at all. Tools are available that can convert existing content to XML. But you must accept that manual cleanup will be required, so design your process accordingly.

If you're contemplating a one-time conversion of existing information to XML, that's a subject for another article. In this article, we're focusing on building a new system that uses ongoing conversions from word processors.

In such cases, for simple documents or simple content, the manual cleanup may be minimal and, therefore, reasonable. But for long, complex documents, the cleanup cost may be excessive.

You should carefully avoid presenting a cost justification for your system that depends on ongoing, fully automatic conversion of long, complex information to XML.
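One way to design a process around manual cleanup is to have the converter set aside whatever it cannot map instead of guessing. The following is a minimal sketch, not a real conversion tool: it assumes a hypothetical input of (style name, text) pairs exported from a word processor, and the STYLE_TO_TAG mapping and element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from word-processor style names to XML element names.
STYLE_TO_TAG = {
    "Heading 1": "title",
    "Body Text": "paragraph",
    "Warning": "warning",
}


def convert(paragraphs):
    """Convert (style, text) pairs to XML; return the tree and a cleanup queue.

    Paragraphs whose style has no mapping are not guessed at. They are
    collected with their position so a person can resolve them by hand.
    """
    root = ET.Element("document")
    needs_cleanup = []
    for index, (style, text) in enumerate(paragraphs):
        tag = STYLE_TO_TAG.get(style)
        if tag is None:
            needs_cleanup.append((index, style, text))
            continue
        ET.SubElement(root, tag).text = text
    return root, needs_cleanup


sample = [
    ("Heading 1", "Installing the pump"),
    ("Body Text", "Remove the cover plate."),
    ("Normal + Bold 14pt", "CAUTION"),  # ad hoc formatting, no mapping
]
root, needs_cleanup = convert(sample)
xml_text = ET.tostring(root, encoding="unicode")
cleanup_ratio = len(needs_cleanup) / len(sample)
```

Tracking something like cleanup_ratio across a sample of real documents gives a rough basis for estimating the ongoing cleanup cost before it goes into a cost justification.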

Mistake #5: Try to convert word-processing tools to XML editors
We have seen companies waste millions of dollars building applications on top of word processors in an attempt to force authors to conform consistently to a set of rules. Why? Because the tools do not provide the architecture that absolute conformance to a data model requires.

Fortunately, word processors and desktop-publishing software are becoming increasingly XML-aware and a few are even XML-capable. These tools offer a greater chance of success, especially if you arm yourself with expert assistance to dissect vendors' claims.

We'll explore this topic in greater detail in a future article.

Mistake #6: Set up too many rules
We're referring to the data model - the DTD or schema - that guides the author in creating and editing content. The problem of "too many rules" has two dimensions: first, the data model can be too restrictive, and second, it can have too many tags.

Many novices begin by designing highly restrictive data models with lots of tags. Such data models involve too many subsequent changes, which cost time and money, and require authors to spend a long time learning them.

To make a model overly restrictive, you would be very careful about limiting where tags can be used and how they can be used. For example, you may decide that a <partnumber> tag can appear only in a <paragraph> tag. But later you may realize that you have to allow a <title> tag to contain <partnumber> as well. And then you'll find still more places where you need to be able to use <partnumber>.
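The part-number scenario can be sketched with a toy content model. This is an illustration under stated assumptions, not a DTD or schema validator: the RESTRICTIVE and REVISED dictionaries and the violations function are hypothetical, and because XML element names cannot contain spaces the part-number element is spelled partnumber.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: parent element -> child elements it may contain.
RESTRICTIVE = {
    "section": {"title", "paragraph"},
    "title": set(),
    "paragraph": {"partnumber"},
    "partnumber": set(),
}

# The same model after the first revision: titles may hold part numbers too.
REVISED = dict(RESTRICTIVE, title={"partnumber"})


def violations(xml_text, model):
    """Return (parent, child) pairs that the content model does not allow."""
    root = ET.fromstring(xml_text)
    found = []
    for parent in root.iter():
        allowed = model.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                found.append((parent.tag, child.tag))
    return found


doc = (
    "<section>"
    "<title>Replacing <partnumber>A-100</partnumber></title>"
    "<paragraph>Order <partnumber>A-100</partnumber> first.</paragraph>"
    "</section>"
)
before = violations(doc, RESTRICTIVE)  # the title's part number is rejected
after = violations(doc, REVISED)       # accepted once the model is loosened
```

Each newly discovered place where a part number belongs means another revision like REVISED, which is the cost of starting out too strict.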

To create a problem of too many tags, give authors somewhere between 200 and 300 tags to learn so that they reach their maximum productivity just about the time that they move on to another job. If you want an overly broad generalization, shoot for 30 tags.

Mistake #7: Use too many moving parts
The problem with too many moving parts is that you must do a lot of work to choose them, integrate them, test them, and keep them all working.

In traditional publishing processes involving a lot of manual work, this usually doesn't become a problem. Many moving parts may exist, but human intervention integrates them and keeps the whole machine working. For example, contributing authors may use word processors while the technical publications department uses desktop-publishing software and manually imports the word-processor files as needed.

In an XML publishing system, however, one of the goals is to eliminate human intervention and make everything work together automatically. Fulfilling this goal requires tight integration among the various software products.

XML publishing systems must also deliver more functionality and productivity than the traditional systems they replace, so a key project requirement usually includes the implementation of a content management system as well.

No single vendor offers a complete system that delivers all of the functionality needed in support of every type of content. That leaves customers with the task of selecting vendors for each piece of functionality needed.

The short answer is to limit the number of vendors involved - choose enough to accomplish your goals (both immediate and future!) but no more. The long answer is to get some expert assistance to help you match your current and future needs with the products available.

More Stories By PG Bartlett

PG Bartlett is vice president of product marketing at Arbortext, where he is responsible for corporate positioning, marketing strategy, and product direction. Bartlett joined Arbortext in 1994, bringing more than 18 years of experience in both technical and marketing positions at leading-edge high technology companies. He is a frequent presenter at major industry events and has been invited to speak and chair sessions at Comdex, Seybold Seminars, XML conferences, AIIM conferences, and others.

Most Recent Comments
mukhtar 12/19/03 11:34:12 PM EST
James Fuller 11/19/03 06:51:24 PM EST

A few ruminations on your article:

Planning instead of building is an age-old concept in software (as well as buildings), which with every passing month seems to be reiterated in one passing methodology fad or another.

Most of the points you raise are generally applicable to 'all things' software. I would respectfully point out that there are a few other, possibly more important issues when designing with XML.

I will list some further alternate ways of messing up with XML:

- not recognizing the differences in relational vs hierarchical data; for 20+ years RDBMS have been king....

- not identifying document-centric vs data-centric data in one's usage of XML

- XML should be human readable; the moment it becomes opaque to human inspection is the moment it becomes hard to debug, read, or see whether it's correctly doing its job

- don't be afraid to cook your own XML vocabulary, but always look around to see if someone else has done it before you. We see too many people replicating effort, where enhancing an existing XML vocabulary is much less effort

- just because you like XML, don't force a declarative processing model on all your publishing processes; sometimes it's easier to just pass a filter through all of your data using classic parser techniques. Hybrid approaches tend to be more successful than the 'golden hammer'

- don't force XML on domain experts; if they are comfortable with existing methods, then just take their output and XML'ify it at the end of the publishing workflow

- recognize that the biggest impact of XML is Unicode, ubiquitous usage, and the sheer utility of an easily understandable short-term data format

- early taxonomisation of XML is a pitfall; there is little need to initially absolutely define a vocabulary with all the expressive power of XML Schema.

- Publishing can reflect pipelines of processing; take a look at existing XML application servers...I see many people replicating functionality where Cocoon, AxKit, or Ant may be appropriate.

and lastly, use xml:lang.

regards,
