Microsoft's drawn-out bid to buy Yahoo! has probably left a lot of people with news fatigue. Still, one piece of Yahoo! technology news I saw today is worth a read: "Yahoo claims 2-petabyte database is world's biggest, busiest". In the article, Yahoo! VP Waqar Hasan discloses that Yahoo!'s data warehouse currently holds 2 PB.

It is used to analyze the behavior of 500 million monthly visitors and processes 24 billion events every day, which makes it the largest and busiest single database known in the world.
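To get a feel for those figures, here is a rough back-of-the-envelope calculation. It uses only the two numbers quoted above and assumes the events are spread evenly over the day, which real traffic certainly is not; the per-visitor ratio divides daily events by monthly visitors, so treat it as an order-of-magnitude figure only.

```python
# Figures quoted in the article; everything derived below is a rough estimate.
EVENTS_PER_DAY = 24_000_000_000   # 24 billion events per day
MONTHLY_VISITORS = 500_000_000    # 500 million visitors per month
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: events arrive at a constant rate (no daily peaks).
events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY

# Daily events divided by monthly visitors: a loose ratio, not a true
# per-user daily average, since not every visitor shows up every day.
events_per_visitor = EVENTS_PER_DAY / MONTHLY_VISITORS

print(f"about {events_per_second:,.0f} events per second on average")
print(f"about {events_per_visitor:.0f} daily events per monthly visitor")
```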

Some data warehouses do have a larger capacity than Yahoo!'s. But those databases either just store relationships or hold data that was compressed from the original, so they cannot be analyzed in real time; Yahoo! itself previously had several hundred terabytes of data of that kind. The data in the Yahoo! data warehouse is structured and can be analyzed directly. It is expected to swell to dozens of PB next year. eBay claims 6 PB of data in total, but according to some reports its single largest database is only 1.4 PB.

In 2005 Yahoo! bought a startup called Mahat (reportedly at Waqar Hasan's request). The company had taken the PostgreSQL database as its foundation and developed a new kind of database whose defining characteristic is that it is column-based rather than row-based.

The consequence is not hard to understand: writing data gets slower, but reading it gets much faster. (In a talk last year, a speaker described an optimization he had done while at Baidu. The idea was very similar, which is why I said at the time that I found it "inspiring".)
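The trade-off between row-based and column-based layouts can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not how the Yahoo! system works internally: the record fields (`user_id`, `page`, `ms`) and all function names are made up for the example.

```python
# Hypothetical event records; the field names are invented for illustration.
events = [
    {"user_id": 1, "page": "mail", "ms": 120},
    {"user_id": 2, "page": "news", "ms": 340},
    {"user_id": 1, "page": "news", "ms": 95},
]

# Row-based layout: each record is kept together as one unit.
def row_append(row_store, record):
    row_store.append(record)  # a single write per record

def row_sum(row_store, field):
    # Must walk every whole record even though only one field is needed.
    return sum(record[field] for record in row_store)

# Column-based layout: each field lives in its own list.
def to_columns(records):
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns

def column_append(columns, record):
    # One write per column: inserting a record touches every column.
    for field, value in record.items():
        columns[field].append(value)

def column_sum(columns, field):
    # Reads only the one column the query asks about.
    return sum(columns[field])

row_store = list(events)
col_store = to_columns(events)

new_event = {"user_id": 3, "page": "mail", "ms": 45}
row_append(row_store, new_event)
column_append(col_store, new_event)

print(row_sum(row_store, "ms"), column_sum(col_store, "ms"))
```

Both layouts give the same answer. The difference is in how much data each operation touches: an analytical query over one field scans a single list in the column layout, while an insert has to update every list.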

After the acquisition, Yahoo! kept improving the product (internal code name: ELCARO?), adding optimizations such as compression, parallel processing, and stronger query capabilities. The interface presented to users is still PostgreSQL, so this should also count as a PostgreSQL success story at a top-tier enterprise.

A database this large does not use a traditional SMP architecture; it is built on a cluster of PCs (fewer than 1,000 machines). Evidently this is a database cluster rather than a storage cluster. This distinctive design makes it possible to analyze data at this scale effectively, which is quite a technical innovation, and it is a completely different model of computation from Google's MapReduce.

It reminds me of the published lists of the world's largest databases, whose numbers no longer look so stunning. People have long talked about the information explosion, but that age has only just arrived.