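Last edited by 冰蚁 on 2016-1-26 09:44
Big data is still at a very primitive stage and is not markedly different from the statistics and data mining that came before it. Most companies put a big data label on what they were already doing, because they simply cannot handle that much data. A few years ago people were already saying that big data is a basket you can throw anything into. A friend at another company says that in their industry nobody uses the term "big data" anymore.
I think the current situation resembles the bubble that followed the rise of the internet: after it bursts once or twice, a clearer model will probably emerge. Artificial intelligence also has to catch up. Only then will big data really become workable.
PS: here is a definition of big data. I think the definition used in the posts above is starting to drift.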
Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data within a tolerable elapsed time.[13] Big data "size" is a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data. Big data requires a set of techniques and technologies with new forms of integration to reveal insights from datasets that are diverse, complex, and of a massive scale.[14]
In a 2001 research report[15] and related lectures, META Group (now Gartner) analyst Doug Laney defined data growth challenges and opportunities as being three-dimensional, i.e. increasing volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources). Gartner, and now much of the industry, continue to use this "3Vs" model for describing big data.[16] In 2012, Gartner updated its definition as follows: "Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization."[17] Gartner's definition of the 3Vs is still widely used, and in agreement with a consensual definition that states that "Big Data represents the Information assets characterized by such a High Volume, Velocity and Variety to require specific Technology and Analytical Methods for its transformation into Value".[18] Additionally, a new V "Veracity" is added by some organizations to describe it,[19] revisionism challenged by some industry authorities.[20] The 3Vs have been expanded to other complementary characteristics of big data:[21][22]
Volume: big data doesn't sample; it just observes and tracks what happens
Velocity: big data is often available in real-time
Variety: big data draws from text, images, audio, video; plus it completes missing pieces through data fusion
Machine Learning: big data often doesn't ask why and simply detects patterns[23]
Digital footprint: big data is often a cost-free byproduct of digital interaction[22]
The growing maturity of the concept more starkly delineates the difference between big data and Business Intelligence:[24]
Business Intelligence uses descriptive statistics on data with high information density to measure things, detect trends, etc.
Big data uses inductive statistics and concepts from nonlinear system identification[25] to infer laws (regressions, nonlinear relationships, and causal effects) from large sets of data with low information density[26] to reveal relationships and dependencies, or to perform predictions of outcomes and behaviors.[25][27]
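To make the contrast in the two items above concrete, here is a rough sketch of my own (not part of the quoted definition). `describe` only summarizes what was observed, which is the descriptive side; `fit_line` infers a relationship by ordinary least squares and uses it to predict an unseen case, which is the inductive side. The function names and the toy `visits`/`spend` numbers are hypothetical, and a plain linear fit is only a stand-in for the nonlinear methods the definition mentions.

```python
from statistics import mean, median


def describe(values):
    """Descriptive statistics: summarize the data that was observed."""
    return {"mean": mean(values), "median": median(values), "max": max(values)}


def fit_line(xs, ys):
    """Inductive step: ordinary least squares fit of y ~ slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical toy data: site visits per user and the amount each user spent.
visits = [1, 2, 3, 4, 5]
spend = [3.0, 5.0, 7.0, 9.0, 11.0]

summary = describe(spend)                   # what happened
slope, intercept = fit_line(visits, spend)  # inferred relationship
predicted = slope * 6 + intercept           # prediction for an unseen case
print(summary, slope, intercept, predicted)
```

On real data with low information density the fit would be noisy and the model far more complex, but the split between summarizing and inferring stays the same.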
In a popular tutorial article published in IEEE Access Journal,[28] the authors classified existing definitions of big data into three categories: Attribute Definition, Comparative Definition and Architectural Definition. The authors also presented a big-data technology map that illustrates its key technological evolutions.