With Strata NYC coming up next week, vendors are busy promoting their Big Data Analytics capabilities; even SAS will be at Strata, although Jim Goodnight says Big Data is rubbish. Here is a quick review of the capabilities for advanced analytics in Hadoop for five Strata sponsors:
Revolution Analytics (my employer) is also a sponsor, but I can’t talk about our offering until next week.
Also worth mentioning: Datameer, Karmasphere and Platfora offer BI and visualization tools that leverage a Hadoop back end. Datameer partners with Zementis for PMML import capability and maintains an active app marketplace.
0xData
- H2O (open source project)
- h2o (R package)
Smart people from Stanford with VC backing and a social media program. Services business model with open source software. H2O is an open source library of algorithms designed for deployment in Hadoop or free-standing clusters; aggressive vision, but currently available functionality is limited to GLM, k-Means and Random Forests. Update: 0xData just announced H2O 2.0, which adds distributed trees and regression: Gradient Boosting Machine (GBM), Random Forest (RF), Generalized Linear Modeling (GLM), k-Means and Principal Component Analysis (PCA). They also claim to run “100X faster than other predictive analytics providers”, although this claim is not supported by evidence. R users can interface through the h2o package (a minimal sketch follows the list below). Limited customer base. Partners with Cloudera and MapR.
- True open source model
- Comprehensive roadmap
- Limited functionality
- Limited user base
- Performance claims undocumented
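For R users, the h2o package exposes these algorithms through ordinary R function calls. Here is a minimal sketch, illustrative only; it assumes a recent release of the h2o package, and function signatures in the 0xData-era releases discussed above may differ.

```r
library(h2o)

h2o.init()                            # start (or connect to) a local H2O instance
iris_hex <- as.h2o(iris)              # copy an R data frame into the H2O cluster

# Generalized linear model (multinomial regression on Species)
glm_fit <- h2o.glm(x = 1:4, y = "Species",
                   training_frame = iris_hex, family = "multinomial")

# k-means on the four numeric columns
km_fit <- h2o.kmeans(training_frame = iris_hex, x = 1:4, k = 3)

h2o.shutdown(prompt = FALSE)          # shut down the local instance when done
```

The point is that model training runs in the H2O cluster (in Hadoop or free-standing), not in the R session; R holds only handles to the remote frames and models.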
Alpine Data Labs
- Alpine 2.8
Alpine targets a business user persona with a visual, workflow-oriented interface (comparable to SAS Enterprise Miner or SPSS Modeler). Supports a reasonably broad range of analytic features. Claims to run “in” a number of databases and Hadoop distributions, but the company is opaque about how this works (it appears to be SQL/HiveQL push-down). In practice, most customers seem to use Alpine with Greenplum. Thin sales and a small customer base relative to the claimed feature mix suggest uncertainty about product performance and stability. Partners with Pivotal, Cloudera and MapR.
- Reasonable option for users already committed to Greenplum Database
- Limited partner and user ecosystem
- Performance and stability should be vetted thoroughly in POC
Oracle
- Oracle Advanced Analytics Option
- Oracle R Distribution 2.15.2
- Oracle R Enterprise 1.3
- Oracle R Connector for Hadoop v2.2
- Oracle Big Data Appliance
Oracle R Distribution (ORD) is a free distribution of R with bug fixes and performance enhancements; Oracle R Enterprise is a supported version of ORD with additional enhancements (detailed below).
Oracle Advanced Analytics (an option of Oracle Database Enterprise Edition) bundles Oracle Data Mining, a distributed data mining engine that runs in Oracle Database, and Oracle R Enterprise. Oracle Advanced Analytics provides an R-to-SQL transparency layer that maps R functions and algorithms to native in-database SQL equivalents. When in-database equivalents are not available, Oracle Advanced Analytics can run R scripts in embedded R execution mode.
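To make the transparency layer concrete, here is a rough sketch of a typical Oracle R Enterprise session. It is illustrative only: the connection arguments and the SALES table are placeholders, and exact signatures should be checked against the ORE documentation for your release.

```r
library(ORE)

# Connect to the database; all arguments here are placeholders
ore.connect(user = "analyst", password = "secret",
            host = "dbhost", sid = "orcl", all = TRUE)

ore.sync()                    # expose database tables as ore.frame proxy objects
sales <- ore.get("SALES")     # a proxy; no rows are pulled to the client

# This looks like ordinary R, but ORE translates it to SQL that runs in-database
by_region <- aggregate(sales$AMOUNT, by = list(sales$REGION), FUN = sum)
head(by_region)               # only the small result set comes back to R
```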
Oracle R Connector for Hadoop (ORCH) is an R interface to Hadoop; it enables the user to write MapReduce tasks in R and interface with Hive. As of ORCH 2.1.0, there is also a fairly rich collection of machine learning algorithms for supervised and unsupervised learning that can be pushed down into Hadoop.
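The MapReduce interface is programmatic. A rough sketch of the pattern follows, written against ORCH’s hadoop.run() API as I understand it; treat the function names, signatures and the HDFS path as approximations to be verified against the ORCH documentation.

```r
library(ORCH)

# Attach an existing HDFS file as an ORCH data object (path is a placeholder)
flights <- hdfs.attach("/user/analyst/ontime.csv")

# mapper and reducer are plain R functions that ORCH runs as MapReduce tasks
res <- hadoop.run(flights,
  mapper = function(key, value) {
    orch.keyval(value$carrier, value$delay)          # emit carrier -> delay
  },
  reducer = function(key, values) {
    orch.keyval(key, mean(values, na.rm = TRUE))     # mean delay per carrier
  }
)

hdfs.get(res)   # pull the (small) result back into the R session
```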
- Good choice for Oracle-centric organizations
- Oracle Data Mining is a mature product with an excellent user interface
- Must move data from Hadoop to Oracle Database to leverage OAA
- Hadoop push-down from R requires expertise in MapReduce
SAS
- SAS/ACCESS Interface to Hadoop
- SAS Scoring Accelerator for Cloudera
- SAS Visual Analytics/SAS LASR Server
- SAS High Performance Analytics Server
SAS/ACCESS Interface to Hadoop enables SAS users to pass Hive, Pig or MapReduce commands to Hadoop through a connection and move the results back to the SAS server. With SAS/ACCESS you can haul your data out of Hadoop, plug it into SAS and use a bunch of other SAS products, but that architecture is pretty much a non-starter for most Strata attendees. Update: SAS has announced SAS/ACCESS for Impala.
Visual Analytics is a Tableau-like visualization tool with limited predictive analytic capabilities; LASR Server is the in-memory back end for Visual Analytics. High Performance Analytics is a suite of distributed in-memory analytics. LASR Server and HPA Server can be co-located in a Hadoop cluster, but require special hardware. Partners with Cloudera and Hortonworks.
- Legacy SAS connects to Hadoop, does not run in Hadoop
- SAS/ACCESS users must know exact Hive, Pig or MapReduce syntax
- Visual Analytics cannot work with “raw” data in Hadoop
- Minimum hardware requirements for LASR and HPA significantly exceed standard Hadoop worker node specs
- High TCO, proprietary architecture for all SAS products
Skytree
- Skytree Server
Academic machine learning project (FastLab, at Georgia Tech) that launched as a commercial software vendor in January 2013 with VC backing. Server-based technology; can connect to a range of data sources, including Hadoop. Programming interface; claims the ability to run from R, Weka, C++ and Python. Good library of algorithms. Partners with Cloudera, Hortonworks and MapR. Skytree is opaque about technology and performance claims.
- Limited customer base, no announced sales since company launch
- Hadoop integration is a connection, not an “inside” architecture
- Performance claims should be carefully vetted
How do you know that SAS has a scalability problem? When Jim Goodnight says you don’t need it.
“Most of our customers don’t have big data, it’s not that big,” he said in response to a question posed during an on-stage Q&A hosted by Walter Isaacson, president and CEO of the Aspen Institute – and the author of a Steve Jobs biography.
No doubt this is literally true; like most companies, SAS’ customer base is a pyramid, with a few enterprise customers at the top and a much larger number of small customers, who use SAS or JMP on a desktop.
Of course, those enterprise customers account for a disproportionate share of SAS revenue. And yes, they have Big Data.
Before introducing Goodnight, Isaacson told the audience the SAS CEO had all of the positive attributes of Steve Jobs, but without the slightly darker side the late Apple boss had sometimes displayed.
The news report does not say whether the audience rolled on the floor, laughing.
Goodnight argued that “big data” is just another buzzword following on from other recent trends in the IT industry, even suggesting that analysts promote them in order to generate business.
SAS, of course, would never stoop to such practices.
Goodnight’s remarks may come as a surprise to SAS’ largest customers, all of whom struggle to manage Big Data. Many of these customers say they have suffered for years with SAS TCO, proprietary architecture and performance issues.
“The term big data is being used today because computer analysts and journalists got tired of writing about cloud computing,” he said.
“Before cloud computing it was data warehousing or ‘software as a service’. There’s a new buzzword every two years and the computer analysts come out with these things so that they will have something to consult about.”
For Goodnight, big data is really about machine data which involves millions of transactions per hour and he gave the audience his views on what he believes big data actually is.
“Really big data is what we see machines are creating with sensors everywhere; all over the electric grid, all over the railroad tracks across the country, there are sensors that are measuring movements of trains.
“That’s where really big data is coming from; it is all of the machine generated data and logging information, which has moved from one machine or router to another,” Goodnight continued, adding: “All of that is captured and there are literally millions of these transactions an hour.”
So is Big Data a Thing, or is it hype? Goodnight can’t decide.
Of course, we all know that managing machine data is a Big Data use case; what matters is whether SAS can help you draw insight from it. There are folks out there who are getting on top of the avalanche, and they’re not necessarily using SAS.
When asked what should be the next buzzword for the computing industry, Goodnight responded with the rather un-snappy “I think they should go with high performance analytics,” drawing chuckles from the audience.
Keep this in mind when your SAS rep wants to speak with you about SAS High Performance Analytics.
According to SAS’ press release,
SAP and SAS will partner closely to create a joint technology and product roadmap designed to leverage the SAP HANA® platform and SAS analytics capabilities. By incorporating the in-memory SAP HANA platform into SAS applications and enabling SAS’ industry-proven advanced analytics algorithms to run on SAP HANA, decision makers will have the opportunity to leverage the value of real-time data analysis within their existing SAS and SAP HANA environments.
SAS and SAP plan to execute a co-sell pilot program to engage select joint customers to validate SAS applications running on SAP HANA. The goal of this program is to build and prioritize the two firms’ joint technology throughout 2014, in particular for industries such as financial services, telecommunications, retail, consumer products and manufacturing. The applications are expected to target business areas that require a combination of advanced analytics running on an in-memory platform that will be designed to yield high value results. Such opportunities exist in customer intelligence, risk management, asset management and anti-money laundering, among others.
How soon we forget; just six months ago, SAS leadership trashed SAP HANA from the stage at SAS Global Forum.
SAS and SAP share a commitment to in-memory computing, but they take fundamentally different approaches to the technology. SAP HANA is a standards-based persistent in-memory database with a strong vendor ecosystem. SAS, on the other hand, builds its in-memory analytics on a proprietary architecture and has a vendor ecosystem of one. HANA succeeds because it is an easy decision for SAP-centric companies to adopt the product for small, high-concurrency databases with one data source. Meanwhile, even the most loyal SAS customers choke at the TCO of High Performance Analytics.
In-memory databases make economic sense when (a) you don’t have much data, (b) usage is read-only, (c) users want small random packets of data, and (d) there are lots of users. The NBA’s statistics website (powered by SAP HANA) is a perfect example: less than a terabyte of data, but up to 20,000 concurrent users seeking information about how many free throws Hal Greer hit in 1968 against the Celtics. That’s a great application for BI tools, but not for high-end predictive analytics. SAP’s HANA Predictive Analytics Library may be toylike, but it’s likely good enough for that use case.
SAS Visual Analytics makes more sense coupled to an in-memory database like HANA than to its existing LASR Server architecture. It doesn’t do anything that can’t be done in Business Objects, but there are likely a few customers in the market who are SAS bigots and also run an all-SAP back end.
The announcement sparks the usual speculation that Goodnight has finally found a buyer. Don’t hold your breath, Jim; SAP may be willing to pay ten times sales for KXEN, but it seems unlikely they will pay ten times sales for SAS.
Also, you may want to ask your people to clam up about using Hadoop to catch aliens. That kind of talk makes the boys in Walldorf nervous.
SAS’ recent announcement of an alliance with Hortonworks marks a good opportunity to summarize SAS’ Hadoop capabilities. I get a lot of questions about this, because enterprises are increasingly serious about using Hadoop as an analytics platform, and Hadoop’s “native” analytics projects (e.g. Apache Mahout) are immature.
Prior to January 2012, a search for the words “Hadoop” or “MapReduce” returned no results on the SAS marketing and support websites, which tells you something about SAS’ leadership in this area. (Jim Goodnight actually told a reporter that SAS has experience with Big Data because his software runs on a mainframe.) In March 2012, SAS announced support for Hadoop connectivity; since then, SAS has gradually expanded the features it supports with Hadoop.
As of today, there are four primary ways that a SAS user can leverage Hadoop:
- Legacy SAS users can connect to Hadoop through the SAS/ACCESS Interface to Hadoop
- SAS Enterprise Miner users can export scoring models to Hadoop with SAS Scoring Accelerator
- SAS LASR Server, the back end for SAS Visual Analytics, can be co-located in Hadoop
- The SAS High Performance Analytics suite can be co-located in Hadoop
Let’s take a look at each option.
“Legacy SAS” is a convenient term for Base SAS, SAS/STAT and various packages (GRAPH, ETS, OR, etc) that are used primarily from a programming interface. SAS/ACCESS Interface to Hadoop provides SAS users with the ability to connect to Hadoop, pass through Hive, Pig or MapReduce commands, extract data and bring it back to the SAS server for further processing. It works pretty much the same way as all of the SAS/ACCESS engines, but there are some inherent differences between Hadoop and commercial databases that impact the SAS user. For more detailed information, read the manual.
SAS/ACCESS also supports six “Hadoop-enabled” PROCs (FREQ, MEANS, RANK, REPORT, SUMMARY, TABULATE), so it can claim to run “inside” Hadoop; that sounds impressive until you consider that Legacy SAS includes some 300 PROCs, which leaves 294 that don’t run inside Hadoop. If all you need to do is run frequency distributions, simple statistics and summary reports, then knock yourself out and buy this product. Of course, if that’s all you want to do, you can use Datameer or Big Sheets and save a bundle on SAS licensing fees.
The ability to pass commands to Hadoop sounds cool, but is overrated (in my opinion). A SAS programmer who also knows how to write Hive, Pig or MapReduce can make this puppy do tricks, but an ordinary SAS user will be left in the lurch, since the SAS software provides minimal support and does not “translate” SAS DATA steps. SAS users who work with the SAS Pass-Through SQL Facility know that in practice one must submit explicit SQL to the database, because “implicit SQL” only works in certain circumstances (which SAS does not document); if SAS can’t implicitly turn your DATA Step into SQL, it hauls your data back to the SAS server, without warning, and performs the operation there. You may not have minded that back in 1990, when you worked with data that fit on a floppy, but with today’s data you will sit there for a while waiting for the network connection to time out.
SAS/ACCESS Interface to Hadoop works with HiveQL, but the user experience is similar to working with SQL Pass-Through. Limited as “implicit HiveQL” may be, SAS does not claim to offer “implicit Pig” or “implicit MapReduce”. Bottom line: since you need to know how to program in Hive, Pig or MapReduce to use SAS/ACCESS Interface to Hadoop, you might as well submit your jobs directly to Hive, Pig or MapReduce and save a bundle on SAS licensing fees.
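For comparison, “submitting your jobs directly to Hive” from R might look like the sketch below. This is a generic illustration, not a SAS workflow; the driver jar path, host name and table are placeholders, and it assumes the RJDBC package plus the standard Hive JDBC driver.

```r
library(RJDBC)

# Driver jar path, host and table below are placeholders
drv <- JDBC(driverClass = "org.apache.hive.jdbc.HiveDriver",
            classPath  = "/opt/hive/lib/hive-jdbc-standalone.jar")
conn <- dbConnect(drv, "jdbc:hive2://hadoop-edge:10000/default")

# Explicit HiveQL, with no implicit translation layer in the way
daily <- dbGetQuery(conn,
  "SELECT event_date, COUNT(*) AS n FROM weblogs GROUP BY event_date")

dbDisconnect(conn)
```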
Moving on to Scoring Accelerator, this product actually works pretty well. The problem for most SAS users is that it only works with SAS Enterprise Miner, and it doesn’t work with “code nodes”. SAS does not publish how many of its customers use each product, but of the two hundred or so SAS customers I’ve worked with in the past couple of years, no more than a dozen use Enterprise Miner — and many of those make liberal use of code nodes.
Of course, if you’re using SAS Enterprise Miner, you can simply export the models in PMML and use them in any PMML-enabled database or decision engine and save a bundle on SAS licensing fees.
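As an aside, PMML export is not unique to Enterprise Miner; the open source world supports the same workflow. A minimal sketch in R, using the pmml and XML packages (the model and file name are purely illustrative):

```r
library(pmml)   # converts fitted R models to PMML
library(XML)    # provides saveXML()

# Train a simple model on a built-in dataset
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Convert to PMML and write it out; any PMML-enabled scoring engine can consume it
saveXML(pmml(fit), file = "mpg_model.pmml")
```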
Which brings us to the two relatively new in-memory products, SAS Visual Analytics/SAS LASR Server and SAS High Performance Analytics Server. These products were originally designed to run in specially constructed appliances from Teradata and Greenplum; with SAS 9.4 they are supported in a co-located Hadoop configuration that SAS calls a Distributed Alongside-HDFS architecture. That means LASR and HPA can be installed on Hadoop nodes next to HDFS and, in theory, distributed throughout the Hadoop cluster with one instance of SAS on each node.
That looks great on a PowerPoint, but don’t try running that on your standard Hadoop hardware, at least not if you want to also run MapReduce tasks on the cluster. SAS’ hardware partners recommend 16-core machines with 256-512GB RAM for each HPA/LASR node; that hardware costs five or six times as much as a standard Hadoop worker node machine. Since even the most loyal SAS bigot isn’t willing to replace the hardware in a 400-node Hadoop cluster, most customers will stand up a few high-end machines next to the Hadoop cluster and run the in-memory analytics in what SAS calls Asymmetric Distributed Alongside-HDFS mode, which translates into English as “Hello! We’re still moving data around!”.
While HPA can work directly with HDFS data, VA/LASR Server requires data to be in SAS’ proprietary SASHDAT format. For that, you will need to license SAS Data Integration Server.
A single in-memory node on a 16-core/256GB machine can load a 75-100GB table, so if you’re working with a terabyte-sized dataset you’re going to need 10-12 nodes. This will set you back upwards of $2 million in first-year fees for the software alone, although if you’re willing to be a reference customer Jim Goodnight may just give you the software: almost two years after general availability, SAS has no public reference customers for HPA. I’m guessing that SAS reps have to fall back on that old start-up trick: you know, where the rep tells a prospective buyer that customers get such a strategic advantage from the software they don’t want anyone to know they have it. (The other trick is to hint darkly that the CIA uses it.)
SAS seems to be doing a little better selling SAS VA/LASR Server; they have a big push on in 2013 to sell 2,000 copies of VA, and they heavily promote a one-node version on a big H-P machine for $100K. Not sure how they’re doing against that target of 2,000 copies, but they have announced thirteen sales this year to smaller SAS-centric organizations, all but one outside the US.
First, some background.
- SAS is a privately held company. Founder and CEO Jim Goodnight owns a controlling interest.
- Goodnight is 70 years old. He seems to be in good health, but nobody lives forever.
- Goodnight’s children are not engaged in management of the business.
Succession is a problem for any business; it is especially so for a founder-managed business, where ownership must change as well as management. Goodnight may be interested in SAS as a going concern, but his heirs are more likely to want its cash value, especially when the IRS calls to collect estate taxes.
Large founder-managed firms typically struggle with two key issues. First, the standards of corporate governance in public companies differ markedly from those that apply to private companies. The founder’s personal business may be closely intermingled with corporate business in a manner that is not acceptable in a public company.
For example, suppose (hypothetically) that Goodnight or one of his personal entities owns the land occupied by SAS headquarters in Cary, North Carolina; as a transaction between related parties, such a relationship is problematic for a public company. Such interests must be unwound before an IPO or sale to a public company can proceed; failure to do so can lead to serious consequences, as the Rigas brothers discovered when Adelphia Communications went public.
The other key issue is that founders may clash with senior executives who demonstrate independent thought and leadership. Over the past fifteen years, a number of strong executives with industry and public company experience have joined SAS through acquisition or hire; most exited within two years. The present SAS management team consists primarily of long term SAS employees whose leadership skills are well adapted to survival under Goodnight’s management style. How well this management team will perform when out from under Goodnight is anyone’s guess.
SAS flirted with an IPO in 1999, at the height of the tech-driven stock market boom, and hired ex-Oracle executive Andre Boisvert as COO to lead the transition. Preparations for the IPO proceeded slowly; Boisvert clashed with Goodnight and left; SAS shelved the IPO soon thereafter.
Subsequent to this episode, Goodnight told USA Today that talk about an IPO was never serious, that he had pursued an IPO for the benefit of the employees, and abandoned the move because employees were against it. In the story, USA Today noted that this claim appeared to be at odds with Goodnight’s previous public statements. The reader is left to wonder whether the real reason has something to do with Goodnight’s personal finances, or if he simply did not want to let go of the company. In any case, it’s not surprising that many SAS employees opposed an IPO, since Boisvert reportedly told employees at a company meeting that headcount reduction would follow public ownership.
Since then, there have been opportunities to sell the company in whole or in part. IBM tried to acquire the company twice. Acquisition by IBM made a lot of sense at the time; SAS built its business on the strength of its IBM technology partnership, and SAS earned a large share of its revenue from software running on IBM hardware. (It still does). Both companies have a conservative approach to technology, preferring to wait until innovations are proven before introducing them to their blue chip customers.
But Goodnight rebuffed IBM’s overtures and bragged about doing so, claiming an exaggerated value for SAS of $20 billion, around ten times sales at the time. It’s not unknown for two parties to disagree about the value of a company. But according to a SAS insider, Goodnight demanded that IBM agree to his price “without due diligence”, which no acquiring company can ever agree to do. That seems like the behavior of a man who simply does not want to sell to anyone, under any circumstances.
Is SAS really worth ten times revenue? Certainly not. SAS’ compound annual revenue growth rate over the past twenty years is around 10%, which suggests a revenue multiplier of a little under 4 (see graph below). Of course, that assumes SAS’ past revenue growth rate is a good indicator of its future revenue growth, which is a stretch when you consider the saturation of its market, increased competition and limited customer response to “game-changing” new products like SAS High Performance Analytics Server.
One obstacle to sale of the company is Goodnight’s stated unwillingness to sell to buyers who might cut headcount. SAS’ culture is the subject of business school articles and the like, but the unfortunate truth is that SAS’ revenue per employee badly lags the IT industry, as shown in the table below. SAS appears to be significantly overstaffed relative to revenue compared to other companies in the industry, and markedly so compared to any likely acquirer.
One could speculate about the causes of this relatively low revenue per employee — I won’t — but an acquiring company will expect this to improve. Flogging the business for more sales seems like pushing on a string — according to company insiders, SAS employs more people in its Marketing organization than in its Research and Development organization. An acquirer will likely examine SAS’ product line, which consists of a few strong performers — the “Legacy” SAS software, such as Base and STAT — and a long list of other products, many of which do not seem to be widely used. Rationalization of the SAS product line — and corresponding headcount — will likely be Job One for an acquirer.
So what’s ahead for SAS?
One option: Goodnight can simply donate his ownership interest in SAS to a charitable trust, which would continue to manage the business much the way Hershey Trust manages Hershey Foods. This option would be least disruptive to customers and employees, and the current management team would likely stay in place (if the Board is stacked with insiders, locals and friends). It’s anyone’s guess how likely this is; such a move would be consistent with Goodnight’s public statements about philanthropy, but unlike Larry Ellison, Goodnight hasn’t signed Warren Buffett’s Giving Pledge.
But if Goodnight needs the cash, or wants his heirs to inherit something, a buyer must be found. Another plausible option consistent with Goodnight’s belief in the virtues of private ownership would be a private equity led buyout. The problem here is that while private equity investors might be willing to put up with either low sales growth or low employee productivity, they won’t tolerate both at the same time. A private equity investor would likely treat the Legacy SAS software as a cash cow, kill off or spin off the remaining products, and shed assets. The rock collection and the culinary farm will be among the first to go.
There are a limited number of potential corporate buyers. IBM, H-P, Oracle, Dell and Intel all sell hardware that supports SAS software, and all have a vested interest in SAS, but it seems unlikely that any of these will step up and buy the company. Twice rebuffed, IBM has moved on from SAS, reporting double-digit growth in business analytics revenue while SAS struggles to put up single digits. H-P and Dell have other issues at the moment. Oracle could easily put up $10 billion in cash to buy SAS, and Oracle’s analytic story would benefit if SAS were added to the mix, but I suspect that Oracle doesn’t think it needs a better analytics story.
SAP might have the resources to acquire SAS, and such a transaction would add to SAP’s credibility in analytics, which is pretty low (the recently announced acquisition of KXEN notwithstanding). There is, however, no existing formal alliance between the companies, and SAS executives spent the better part of the last SAS Global Forum strutting around the stage sniping at SAP HANA, which is not a great way to get a possible alliance underway.
Anyway, it will be interesting to see what happens.
Tick, tick, tick…
A reader on Twitter asks: what about employee ownership? Well, yes, but if Goodnight wants to sell the company, the employees would need to come up with the market price of $10-11 billion. That works out to about $750,000 for each employee. There are investors who would consider lending the capital necessary for an employee-led buyout, but they would subject the business and its management to the same level of scrutiny as an independent buyer.
SAP announced today that it plans to acquire KXEN in a deal that will close in the fourth quarter. No purchase price was announced. Since one recently laid-off employee characterized the company’s prospects as “circling the toilet”, this seems like a case of bottom-feeding by SAP.
KXEN has struggled to position and sell its InfiniteInsight analytic software. The vendor’s black-boxy approach has little appeal for hard-core analysts, who prefer tooling that offers greater control over the analytics process. At the other end of the value chain, business executives are not interested in analytics, but in business solutions.
Hence, KXEN is neither fish nor fowl as a standalone company, but its technology is worth something to an enterprise vendor such as SAP, which says it will embed KXEN in applications for managing operations, customer relationships, supply chains, risk and fraud.
KXEN has never been terribly forthcoming about details of its technology. The software is server-based, with database integration primarily through ODBC and PMML. KXEN has an established partnership with SAP Sybase, but for model scoring only in a “run-beside” architecture. SAP says it will integrate KXEN with HANA, but I suspect that will also be in a run-beside architecture, since KXEN adds little to SAP’s in-database Predictive Analytics Library.
Update: Several analysts have commented on SAP’s move, including Curt Monash. Monash correctly distinguishes between analytic programming languages (such as SAS or R) and analytic applications such as KXEN’s InfiniteInsight. (There is a third category, which I call the analytic workbench, designed for users who have some understanding of analytics but would rather not program; SPSS Modeler is an example.)
Monash also rightly throws cold water on SAP’s ability to embed KXEN in business solutions, pointing out InfiniteInsight’s lack of tooling needed for risk applications. I’d go farther to say that KXEN has no credibility outside of Marketing Campaign Management, where SAP CRM is sadly stuck behind IBM/Unica, SAS, Neolane, Teradata Aprimo, Oracle and Pitney Bowes.
SAS has charged its sales force with selling 2,000 licenses for Visual Analytics this year. There’s lots of marketing action lately from SAS about this product, so here’s an FAQ.
What is SAS Visual Analytics?
Visual Analytics is an in-memory visualization and reporting tool.
What does Visual Analytics do?
SAS Visual Analytics creates reports and graphs that look nice. You can view them on mobile devices.
VA is now in its third release. Why do they call it Release 6.1?
Someone in SAS Worldwide Marketing thinks that if they call it Release 6.1, you will think it’s a mature product.
Is Visual Analytics an in-memory database, like SAP HANA?
No. HANA is a standards-based in-memory database that runs on many different brands of hardware and supports a range of end-user tools. VA is a proprietary architecture available on a limited choice of hardware platforms. It cannot support anything other than the end-user applications SAS chooses to develop.
What does VA compete with?
SAS claims that Visual Analytics competes with Tableau, Qlikview and Spotfire.
How well does it compare?
You will have to decide for yourself whether VA reports are prettier than those produced by Tableau, Qlikview or Spotfire. On paper, Tableau has more functionality.
VA runs in memory. Does that make it better than conventional BI?
All analytic applications perform computations in memory. Tableau runs in memory, and so does Base SAS. There’s nothing unique about that.
What makes VA different from conventional BI applications is that it loads the entire fact table into memory. By contrast, BI applications like Tableau query a back-end database to retrieve the necessary data, then perform computations on the result set.
Performance of a conventional BI application depends on how fast the back-end database can retrieve the data. With a high-performance database the performance is excellent, but in most cases it won’t be as fast as it would if the data were held in memory.
So VA is faster? Is there a downside?
There are two.
First, since conventional BI systems don’t need to load the entire fact table into memory, they can support usage with much larger datastores. The largest H-P ProLiant box for VA maxes out at about 10 terabytes; the smallest Netezza appliance supports 30 terabytes, and scales to petabytes.
The other downside is cost; memory is still much more expensive than other forms of storage, and the machines that host VA are far more expensive than data warehouse appliances that can host far more data.
VA is for Big Data, right?
SAS and H-P appear to be having trouble selling VA in larger sizes, and are positioning a small version that can handle 75-100 Gigabytes of data. That’s tiny.
The public references SAS has announced for this product don’t seem particularly large. See below.
How does data get into VA?
VA can load data from a relational database or from a proprietary SASHDAT file. SAS cautions that loading data from a relational database is only a realistic option when VA is co-located in a Teradata Model 720 or Greenplum DCA appliance.
To use SASHDAT files, you must first create them using SAS.
Does VA work with unstructured data?
VA works with structured data, so unstructured data must be structured first, then loaded either to a co-located relational database or to SAS’ proprietary SASHDAT format.
Unlike products like Datameer or IBM Big Sheets, VA does not support “schema on read”, and it lacks built-in tools for parsing unstructured text.
But wait, SAS says VA works with Hadoop. What’s up with that?
A bit of Marketing sleight of hand. VA can load SASHDAT files that are stored in the Hadoop File System (HDFS); but first, you have to process the data in SAS, then load it back into HDFS. In other words, you can’t visualize and write reports from the data that streams in from machine-generated sources — the kind of live BI that makes Hadoop really cool. You have to batch the data, parse it, structure it, then load it with SAS to VA’s staging area.
Can VA work with streaming data?
SAS sells tools that can capture streaming data and load it to a VA data source, but VA works with structured data at rest only.
With VA, can my users track events in real time?
Don’t bet on it. Data requires significant pre-processing before it can be loaded into VA’s memory. Moreover, once it is loaded it can’t be updated; updating the data in VA requires a full truncate and reload. Thus, however fast VA is in responding to user requests, your users won’t be tracking clicks on their iPads in real time; they will be looking at yesterday’s data.
Does VA do predictive analytics?
While SAS claims that VA is better than SAP HANA because “HANA is just a database”, the reality is that SAP supports more analytics through its Predictive Analytics Library than SAS supports in VA.
Has anyone purchased VA?
A SAS executive recently claimed 200 customers, a figure that should be taken with a grain of salt. If there are that many customers for this product, they are hiding.
There are four public references, all of them outside the US:
SAS has also recently announced selection (but not implementation) by
OfficeMax has also purchased the product, according to this SAS blog.
What about implementation? This is an appliance, right?
Wrong. SAS considers an implementation that takes a month to be wildly successful. Implementation tasks include the same tasks you would see in any other BI project, such as data requirements, data modeling, ETL construction and so forth. All of the back-end feeds must be built to put data into a format that VA can load.
According to this paper by an H-P engineer, implementations haven’t exactly been trouble-free. That’s sometimes the case with new products, but the issues described in the paper seem attributable to rushing a product to market before it’s ready for prime time. SAS has removed the paper from its website, so ask your rep for a copy of Paper 466-2013. Then get written guarantees from SAS and H-P that the problems described in the paper are solved.
Bottom line, does it make sense to buy SAS Visual Analytics?
Again, you will have to decide for yourself whether the SAS VA reports are prettier than Tableau or the many other options in this space. BI beauty shows are inherently subjective.
You should also demand that SAS prove its claims to performance in a competitive POC. Despite the theoretical advantage of an in-memory architecture, actual performance is influenced by many factors. Visitors to the recent Gartner BI Summit who witnessed a demo were unimpressed; one described it to me as “dog slow”. She didn’t mean that as a compliment.
The high cost of in-memory platforms means that VA and its supporting hardware will be much more expensive for any given quantity of data than Tableau or equivalent products. Moreover, its proprietary architecture means you will be stuck with a BI silo in your organization unless you are willing to make SAS your exclusive BI provider. That makes this product very good for SAS; the question is whether it is good for you.
The early adopters for this product appear to be very SAS-centric organizations (with significant prior SAS investment). They also appear to be fairly small. If you have very little data, money to burn and are willing to experiment with a relatively new product, VA may be for you.