Hortonworks accelerates Spark at scale for the enterprise

Hortonworks has detailed upcoming enhancements to Hortonworks Data Platform (HDP™) that build on the in-memory analytic capabilities of Apache Spark. The inclusion of Apache Spark 1.5.2 in HDP will bring support for Spark SQL and Spark Streaming. Hortonworks’ commitment to Spark is focused on helping customers accelerate data science, maintain seamless data access, drive innovation at the core and, ultimately, scale for the enterprise.

“We continue to see customers across all industries derive real value from using Spark with Hortonworks Data Platform,” said Tim Hall, vice president of product management at Hortonworks. “Our customers rely on us to guide them on their Spark journey, and our ability to scale Spark against massive datasets is remarkable. With the inclusion of Spark 1.5.2 on HDP, customers can get new Spark capabilities and maximize its value for the enterprise.”

“Webtrends is working with Hortonworks to take Spark, Hive and Hadoop and execute these jobs in parallel,” said Peter Crossley, director of architecture, Webtrends. “This capability is critical because it allows us to combine the power of Big Data with the speed and flexibility of an ad hoc system. This means marketers will be able to ask any question of their unlimited data, no matter how structured it may be.”

Accelerating Apache Spark for Enterprise Scale  

Hortonworks is providing customers the easiest path for adopting Spark with Hadoop while enabling innovation at scale. Customers can deploy modern, Spark-based applications alongside Hadoop workloads in a consistent, predictable and reliable way. To meet the requirements of enterprise customers, Hortonworks’ three main areas of focus for Spark are:

Data Science Acceleration

  • Improving data science productivity by enhancing Apache Zeppelin, currently available as a technical preview, and by contributing additional Spark algorithms and packages that ease the development of key solutions. One example is Project Magellan, an open source library built on Spark that supports geospatial queries and tackles hard problems in analyzing geospatial data at scale.

Seamless Data Access

  • Hortonworks is improving Spark’s integration with YARN, HDFS, Hive, HBase and ORC, because customers run Spark on YARN alongside many of the other popular data access engines. Specifically, Hortonworks is working to further optimize data access through the new Data Source API, which will allow Spark SQL users to take full advantage of the following capabilities:

o   ORC File instantiation as a table

o   Column pruning

o   Language integrated queries

o   Predicate pushdown
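Column pruning and predicate pushdown both mean the data source does less work before handing rows to Spark SQL. The toy model below is a deliberately simplified, hypothetical sketch in plain Python. It is not Spark code, and the names `Stripe`, `scan` and the `(column, operator, literal)` filter format are invented for illustration.

It loosely mimics an ORC-like columnar file that keeps min/max statistics per stripe. Filters are used to skip whole stripes whose statistics rule out a match, and only the columns needed for output or filtering are read. In Spark itself, the same ideas surface through Data Source API interfaces such as `PrunedFilteredScan`, whose `buildScan` receives the required columns and the pushed-down filters.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical toy "stripe": column data plus per-column min/max statistics,
# loosely modeled on how ORC keeps statistics for each stripe.
@dataclass
class Stripe:
    columns: Dict[str, List[Any]]

    def min_max(self, col: str) -> Tuple[Any, Any]:
        values = self.columns[col]
        return min(values), max(values)

# Hypothetical pushed-down filter format: (column, operator, literal).
Filter = Tuple[str, str, Any]

OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}

def stripe_may_match(stripe: Stripe, f: Filter) -> bool:
    """Use min/max statistics to decide whether a stripe could contain matches."""
    col, op, lit = f
    lo, hi = stripe.min_max(col)
    if op == "=":
        return lo <= lit <= hi
    if op == ">":
        return hi > lit
    if op == "<":
        return lo < lit
    raise ValueError(f"unsupported operator: {op}")

def scan(stripes: List[Stripe], required_columns: List[str], filters: List[Filter]):
    """Return only the required columns of matching rows, plus read statistics."""
    stats = {"stripes_read": 0, "values_read": 0}
    rows = []
    for stripe in stripes:
        # Predicate pushdown (coarse): skip stripes the statistics rule out.
        if not all(stripe_may_match(stripe, f) for f in filters):
            continue
        stats["stripes_read"] += 1
        # Column pruning: touch only columns used for output or filtering.
        needed = set(required_columns) | {f[0] for f in filters}
        data = {c: stripe.columns[c] for c in needed}
        stats["values_read"] += sum(len(v) for v in data.values())
        n_rows = len(next(iter(stripe.columns.values())))
        for i in range(n_rows):
            # Predicate pushdown (fine): evaluate filters row by row at the source.
            if all(OPS[op](data[c][i], lit) for c, op, lit in filters):
                rows.append({c: data[c][i] for c in required_columns})
    return rows, stats

# Example data: two stripes of a hypothetical spending table.
stripes = [
    Stripe({"user": ["a", "b", "c"], "country": ["US", "US", "FR"], "spend": [10, 55, 7]}),
    Stripe({"user": ["d", "e"], "country": ["DE", "UK"], "spend": [3, 4]}),
]

rows, stats = scan(stripes, ["user"], [("spend", ">", 50)])
print(rows, stats)
```

For the query "users with spend above 50", the second stripe is skipped because its maximum `spend` is 4. The `country` column is never read, so only 6 values are touched rather than 15.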

Innovation at the Core

  • Enhancing Spark’s enterprise security, governance, operations and overall readiness for real-world production deployment.

Fostering Community Innovation

Hortonworks has launched Hortonworks Community Connection (HCC), a new online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub. This new online community is an extension of Hortonworks’ open source roots and underscores its commitment to engaging with the community, and fostering community innovation and knowledge. More information about Project Magellan can be found within HCC.