Tag: data

Docker ported into Hadoop as benchmarks show SCREAMING FAST performance

Code committers hope unholy union of open source tech will spawn speedy gonzalez virtualization

The Hadoop community is working on patches that will bring the popular app-containerization technology Docker into the data management system, and independent benchmarks show the tech delivers a huge speedup over traditional virtualization approaches. Docker is an open source Linux containerization technology that uses underlying kernel elements like namespaces, LXC, and cgroups to let an admin run multiple apps with all their dependencies in secure sandboxes on the same underlying Linux OS, making it an attractive alternative to typical virtualization, which bundles a copy of the OS with each app. In a set of benchmarks an IBM employee released on Thursday, the company showed that Docker containerization has some huge advantages …

Original Article Can Be Found Here:

Docker ported into Hadoop as benchmarks show SCREAMING FAST performance

Amazon uses Blu-ray Disc to Archive Data, Just Like Facebook

A few years back, Amazon introduced a somewhat bizarre-sounding file backup service. Unlike an ordinary S3 bucket, Glacier was designed to protect data that you didn’t constantly need access to. As the name implies, it’s a sort of digital cold storage. It also moves slowly, like a glacier would. If you need to retrieve some files, it can take three to five hours to “thaw” them. Ever since the service was announced, people (geeky ones, anyway) have been wondering what kind of hardware Amazon Glacier uses that lets the company charge such ridiculously low rates. Storage is never cheap when you’re talking about petabytes of data, but if Amazon’s only charging 1 cent per gigabyte of geo-distributed secure storage they must be …

See original article taken from here:

Amazon uses Blu-ray disc to archive data, just like Facebook

Amazon’s Glacier secret: BDXL

Remember when Amazon Web Services (AWS) announced Glacier, a data archiving service, almost 2 years ago? Long-term, slow-retrieval (3-5 hours) storage for 1¢/GB while maintaining several copies across geographies. Pretty amazing. Less amazing now that disk prices are reaching 3¢/GB, but there are still power, cooling, mounting and replacement costs to consider in addition to multiple copies. Tape? Amazon denied that. Plus the long-term storage requirements for tape require a level of climate control that their data centers may not support. Not tape.

Hard drives to the rescue?

That left disk. Perhaps Shingled Magnetic Recording (SMR) drives that, in theory, could double existing drive density at the cost of expensive rewrites. Which an archive wouldn’t have. Seagate announced they’d sold a …

See original article taken from here:

Amazon’s Glacier secret: BDXL

What is Apache Tez?

You might have heard of Apache Tez, a new distributed execution framework that is targeted towards data-processing applications on Hadoop. But what exactly is it? How does it work? Who should use it and why? In their presentation, Apache Tez: Accelerating Hadoop Query Processing, Bikas Saha and Arun Murthy discuss Tez’s design, highlight some of its features and share some of the initial results obtained by making Hive use Tez instead of MapReduce.

Presentation transcript edited by Roopesh Shenoy

Tez generalizes the MapReduce paradigm to a more powerful framework based on expressing computations as a dataflow graph. Tez is not meant directly for end-users – in fact it enables developers to build end-user applications with much better performance and flexibility. Hadoop has …
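To make the idea of a dataflow graph concrete, here is a minimal sketch in plain Python. It does not use the Tez API; the vertex names, the `run_dataflow` helper and the sample input are all hypothetical and exist only for illustration. Each vertex is a function, each edge names the upstream vertices whose output it consumes, and vertices run in dependency order, so one stage can feed several downstream stages instead of being forced into a fixed map-then-reduce shape.

```python
from collections import Counter
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def tokenize(inputs):
    # Single upstream input: a list of text lines.
    (lines,) = inputs
    return [word.lower() for line in lines for word in line.split()]


def count(inputs):
    # Single upstream input: the list of words.
    (words,) = inputs
    return Counter(words)


def total(inputs):
    # Single upstream input: the list of words.
    (words,) = inputs
    return len(words)


def summarize(inputs):
    # Two upstream inputs, in the order listed in EDGES["summary"].
    counts, n_words = inputs
    return {"distinct": len(counts), "total": n_words}


def run_dataflow(vertices, edges, source):
    """Run every vertex once, after all of its upstream vertices.

    vertices: vertex name -> function taking a list of upstream outputs
    edges:    vertex name -> ordered list of upstream vertex names
    source:   data bound to the hypothetical "source" vertex
    """
    results = {"source": source}
    for name in TopologicalSorter(edges).static_order():
        if name in results:
            continue  # the source has no function to run
        upstream = [results[dep] for dep in edges[name]]
        results[name] = vertices[name](upstream)
    return results


VERTICES = {
    "tokenize": tokenize,
    "count": count,
    "total": total,
    "summary": summarize,
}
EDGES = {
    "tokenize": ["source"],
    "count": ["tokenize"],
    "total": ["tokenize"],
    "summary": ["count", "total"],
}

results = run_dataflow(
    VERTICES, EDGES, ["tez runs on hadoop", "hive runs on tez"]
)
print(results["summary"])
```

Here `tokenize` feeds both `count` and `total`, and `summary` joins the two, which is the kind of shape a single map stage followed by a single reduce stage cannot express directly.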

See original article taken from here:

What is Apache Tez?


Uncover Google Not Provided Data: An Interview with Chris Adams

In a recent interview from Marketo’s Marketing Nation Summit in downtown San Francisco, Murray Newlands talks with gShift Labs’ Chris Adams about how to uncover Google’s not provided data and the benefits that can have for online publishers and advertisers. To find out more, watch the full interview below.

These are the key takeaways from the video:

In the interview, Chris explains that people are looking for more and more ways to uncover not provided data. He says not provided data is becoming an increasingly important topic for marketing executives and agencies, who are struggling because their clients are asking where their traffic has gone and they are unable to give greater detail on the “not provided” segment. Chris says that over …

Read Original Article Here:

Uncover Google Not Provided Data: An Interview with Chris Adams

© 2024 Paul Parisi
