The open-source Apache Hadoop platform for big data management has had a meteoric rise since its release in 2006 with an advanced distributed file system, but an industry consultant warns its security protection is equivalent to that of software released in 1993.
Kevvie Fowler, a risk consulting partner at KPMG Canada, compared the security in Hadoop to that of Windows for Workgroups 3.11 in a talk to an audience at the SecTor security conference in Toronto on Wednesday.
“There’s not a lot of security in the operating system for Windows for Workgroups … and that’s similar to Apache Hadoop.”
And given that Hadoop clusters can hold huge amounts of data, he said the risks are significant.
In fact, Fowler said he was baffled that organizations put up with software that’s so unprotected. But he suggested it follows a pattern.
“Business, to try to improve itself, in a lot of cases trumps security. It’s not the correct approach but if you look at it what business did is took a technology that had no business being in an enterprise and said, ‘You know what? I’m going to become smarter and more agile and make better decisions about my business. I’m going to take this technology, this nuclear waste, and stick it in my organization because it’s going to help me in the immediate future.’ Not looking at the security ramifications.”
Often companies initiate a small big data project, and when it demonstrates business value it is expanded, Fowler said, and by that time it’s too late if there are security holes.
Security professionals need to alert management when projects are at an early stage, he said.
Apache Hadoop isn’t the only version of the platform. A number of software companies have taken it and added capabilities. Intel Hadoop, for example, comes with encryption built in.
Meanwhile, Fowler offered eight steps to better secure Apache Hadoop clusters:
– If you don’t need sensitive financial or personal information, don’t put it in Apache Hadoop. Once in, it’s hard to erase data in the clusters. Obfuscate sensitive data that has to go in, and do so before it goes in;
– Use a configuration management tool to deploy and manage nodes and clusters in a consistent way. If necessary, there are free tools such as Puppet;
– Lock the front door. “It’s almost comical” that Hadoop doesn’t have default user authorization. Set that up before allowing any users to access data. He advises using Kerberos: it’s not easy, but it offers secure authentication;
– Secure the underlying operating system by hardening servers, and encrypt data at rest. If you don’t do this, then when anyone logs into the system Apache Hadoop looks like a group of files;
– Use transmission-level security. Otherwise, data from Hadoop goes through your infrastructure in plain text;
– Have a choke point to stop intruders, such as a VPN to log and control users’ access before they reach the cluster;
– Secure Hadoop-related applications, such as Apache Hive for creating data warehouses and Apache HBase, a NoSQL database. A lot of the SQL injection vulnerabilities in SQL databases are present in HiveQL, he said. And a number of other databases connect directly to Hadoop, he added, so attacks can be layered.
“You can spend all the time in the world securing your Hadoop without securing your applications and you’re going to have a huge disaster on your hands.”
Fowler also noted that the latest versions of Hive (Server 2) have the ability to revoke access to the warehouse, but any version of Hive Server 1.x only secures metadata and not the underlying data.
– Ensure your incident response and forensics program incorporates big data technology.
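For readers wondering what the first step, obfuscating sensitive data before it goes into the cluster, might look like in practice, here is a minimal Python sketch. It is not something Fowler presented. It assumes keyed hashing (HMAC-SHA256) is an acceptable form of obfuscation for the data in question, and the field names, the sample record and the key are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical field names; a real pipeline would take these from
# the organization's own data classification policy.
SENSITIVE_FIELDS = {"name", "account_number"}


def pseudonymize(value: str, key: bytes) -> str:
    """Return the hex HMAC-SHA256 digest of a value under a secret key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def obfuscate_record(record: dict, key: bytes) -> dict:
    """Copy a record, replacing sensitive fields with keyed digests."""
    return {
        field: pseudonymize(str(value), key) if field in SENSITIVE_FIELDS else value
        for field, value in record.items()
    }


if __name__ == "__main__":
    # Assumption: the key is stored outside the cluster, never alongside the data.
    key = b"replace-with-a-secret-key"
    row = {"name": "Jane Doe", "account_number": "12345678", "province": "ON"}
    print(obfuscate_record(row, key))
```

Because the same input and key always yield the same digest, obfuscated records can still be joined or counted by the hashed field, while the original values cannot be read back from the cluster without the key and a guess at the input.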