State the three characteristic properties of Big Data.
Volume, Velocity and Variety.
Explain what is meant when big data is said to have a large volume.
- There is too much data to fit on a single server, even one with many hard drives.
- The data must be distributed across multiple servers, each with many hard drives.
Explain what is meant when big data is said to have a high velocity.
- New data is created rapidly.
- Data sets can change frequently, sometimes every millisecond.
Explain what is meant when big data is said to have a wide variety.
Data comes in multiple formats, such as photos, videos and text, and much of it is unstructured.
Give three examples of big data.
- Video surveillance.
- Traffic data.
- Bank transaction monitoring.
Explain why relational databases are not suited for analysing big data.
- Big data is largely unstructured, so it cannot fit within the rigid row-and-column format of a relational database.
- Big data is hosted across many servers, and traditional relational database management systems slow down greatly when scaled across many machines.
State three features of functional programming languages which make them ideal for processing big data.
- Immutable data structures.
- Statelessness.
- Higher order functions.
Explain what is meant by a higher order function in functional programming.
A function which takes one or more functions as input and/or returns a function as output.
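As a minimal sketch of this definition, assuming Python as the illustration language: the names `apply_twice`, `add_three` and `add_six` are hypothetical and chosen only for this example. `apply_twice` takes a function as input and returns a function, while the built-in `map` takes a function as input.

```python
from typing import Callable


def apply_twice(f: Callable[[int], int]) -> Callable[[int], int]:
    """Higher-order: takes a function as input and returns a new function."""
    def twice(x: int) -> int:
        return f(f(x))
    return twice


def add_three(x: int) -> int:
    return x + 3


# A new function built from an existing one.
add_six = apply_twice(add_three)

# The built-in map is also higher-order: it takes a function as input.
doubled = list(map(lambda x: x * 2, [1, 2, 3]))
```

Here `add_six(1)` evaluates to 7 and `doubled` is `[2, 4, 6]`.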
Explain what is meant by statelessness in functional programming.
A function call with the same arguments will always return the same result, and has no side effects.
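The contrast can be sketched as follows, assuming Python and two hypothetical functions written only for this example. `add_to_total` depends on, and changes, a global variable, so it is stateful. `add` depends only on its arguments, so it is stateless.

```python
running_total = 0


def add_to_total(amount: int) -> int:
    """Stateful: reads and modifies a global, so repeated calls differ."""
    global running_total
    running_total += amount
    return running_total


def add(total: int, amount: int) -> int:
    """Stateless: the result depends only on the arguments, no side effects."""
    return total + amount
```

Calling `add(10, 5)` gives 15 every time, whereas the result of `add_to_total(5)` depends on how many times it has been called before.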
Explain what is meant by immutable data structures in functional programming.
Data structures, such as arrays and lists, cannot be changed during execution; instead, new data structures are made from existing ones.
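A rough illustration, assuming Python, where tuples and frozen dataclasses stand in for the immutable structures of a functional language. The names `readings` and `Reading` are hypothetical.

```python
from dataclasses import dataclass, replace

# A tuple cannot be changed after it is created.
readings = (3, 1, 2)

# "Changing" it means building new tuples from the existing one.
extended = readings + (4,)
sorted_readings = tuple(sorted(readings))


@dataclass(frozen=True)
class Reading:
    """Immutable record: assigning to a field raises an error."""
    sensor: str
    value: int


r1 = Reading("s1", 10)
# replace() returns a new object and leaves r1 untouched.
r2 = replace(r1, value=11)
```

After this runs, `readings` is still `(3, 1, 2)` and `r1.value` is still 10; only the new objects hold the changed values.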
Explain why the three essential features of the functional programming paradigm make it ideal for processing big data.
- Allows for distributed, parallel processing, where many computers perform operations on the same data set at once to provide a very quick response.
- Makes it easier to write correct code, which is already optimised for distributed parallel processing.
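The idea above can be sketched as a small map-and-reduce job, assuming Python. This is only an illustration: a thread pool stands in for separate servers, and the data and function names (`chunks`, `count_words`, `add`) are made up for the example. Because `count_words` is stateless and the data is immutable, each chunk can be processed by any worker in any order without affecting the result.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(chunk: str) -> int:
    """Stateless: depends only on its input, so any worker can run it."""
    return len(chunk.split())


def add(a: int, b: int) -> int:
    return a + b


# Immutable input, split into chunks as if stored on different servers.
chunks = ("big data is large", "it arrives quickly", "in many formats")

# Map step: each chunk is processed independently by a worker.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(count_words, chunks))

# Reduce step: combine the partial results into one answer.
total = reduce(add, partial_counts, 0)
```

Here `partial_counts` is `[4, 3, 3]` and `total` is 10.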
State the three essential features of the fact-based model.
- No index: new data is simply appended with a timestamp.
- Each fact captures a single piece of information.
- Each fact is immutable (no risk of losing data due to human error).
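These three features can be sketched as an append-only store, assuming Python. Everything here (`Fact`, `append_fact`, `latest`, and the sample data) is hypothetical and is not a description of any particular big data system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class Fact:
    """One immutable fact holding a single piece of information."""
    entity: str
    attribute: str
    value: str
    timestamp: datetime


def append_fact(store: Tuple[Fact, ...], fact: Fact) -> Tuple[Fact, ...]:
    """Return a new store with the fact appended; old facts are never edited."""
    return store + (fact,)


def latest(store: Tuple[Fact, ...], entity: str, attribute: str) -> Optional[str]:
    """No index: scan every fact and pick the one with the newest timestamp."""
    matches = [f for f in store if f.entity == entity and f.attribute == attribute]
    if not matches:
        return None
    return max(matches, key=lambda f: f.timestamp).value


store: Tuple[Fact, ...] = ()
store = append_fact(store, Fact("user42", "city", "Leeds",
                                datetime(2023, 1, 1, tzinfo=timezone.utc)))
store = append_fact(store, Fact("user42", "city", "York",
                                datetime(2024, 1, 1, tzinfo=timezone.utc)))
```

The old fact is kept rather than overwritten, so `latest(store, "user42", "city")` returns `"York"` while `store[0].value` is still `"Leeds"`.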