Lecture 5
Objectives:
- What is Variety in Big data?
- What are structured, unstructured, and multi-structured data?
Variety
Variety refers to the many sources and types of data, both structured and unstructured. We used to store data mainly from sources like spreadsheets and databases; now data also arrives as emails, photos, videos, readings from monitoring devices, PDFs, audio, and more. This variety of unstructured data creates challenges for storing, mining, and analyzing it.
Structured data: data organized into a predefined format of rows and columns, such as the data contained in relational databases and spreadsheets.
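To make "a predefined format" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `sales` table, its columns, and its rows are hypothetical, invented only for illustration. Because every row has the same named, typed columns, the database can answer a question such as "total revenue per product" with a single query.

```python
import sqlite3

# Structured data: every record follows the same fixed schema (named, typed columns).
# The table and rows below are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, product TEXT, quantity INTEGER, price REAL)"
)
conn.executemany(
    "INSERT INTO sales (product, quantity, price) VALUES (?, ?, ?)",
    [("keyboard", 3, 25.0), ("mouse", 5, 10.0), ("keyboard", 1, 25.0)],
)

# Because the structure is known in advance, aggregation is a single SQL query.
rows = conn.execute(
    "SELECT product, SUM(quantity * price) FROM sales GROUP BY product ORDER BY product"
).fetchall()
print(rows)  # [('keyboard', 100.0), ('mouse', 50.0)]
```

Nothing comparable is possible with a photo or a free-form email: there are no columns to group or sum until some structure has been extracted first.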
Unstructured data: all data that cannot be readily classified and fitted into a neat box, such as photos and graphic images, videos, streaming instrument data, webpages, PDF files, PowerPoint presentations, emails, blog entries, tweets, wikis, and word-processing documents.
Multi-structured data refers to a variety of data formats and types, and can be derived from interactions between people and machines, such as web applications or social networks. A good example is web log data, which combines text and visual images with structured data such as form or transactional information.
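The web log example can be illustrated with a small parsing sketch. It assumes the common Apache/NCSA "Combined Log Format"; real servers are often configured differently, and the sample line, the `parse_log_line` helper, and the `q` search parameter are all hypothetical. It also covers only the text part of a log, not images. The fixed fields (IP address, timestamp, status code, response size) are structured, while the search text typed by a person inside the URL is free-form and must be extracted before it can be analyzed.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumed format: Combined Log Format. Adjust the pattern for other log layouts.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line):
    """Hypothetical helper: split one log line into structured fields plus free text."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # the line does not follow the assumed format
    record = match.groupdict()
    # Structured part: fields with a fixed position and type.
    record["status"] = int(record["status"])
    record["size"] = None if record["size"] == "-" else int(record["size"])
    # Unstructured part: free text a person typed, carried in the query string.
    query = parse_qs(urlsplit(record["path"]).query)
    record["search_text"] = query.get("q", [""])[0]
    return record


# Hypothetical sample line.
line = ('203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
        '"GET /search?q=big+data HTTP/1.1" 200 2326 "-" "Mozilla/5.0"')
record = parse_log_line(line)
print(record["ip"], record["status"], record["search_text"])  # 203.0.113.7 200 big data
```

Once parsed this way, the structured fields can be loaded into a table, while the extracted search text still needs text-mining techniques to be analyzed.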