Data formats

Data represents a variety of useful information that often needs to be stored, sorted, categorized and analyzed to inform decision-making. Data is organized into data structures that represent entities (such as customers) and their attributes or characteristics.

Data can be classified as structured, semi-structured or unstructured.

Structured data

Structured data has a fixed schema: every entity has the same fields, and each field always holds the same data type. The schema is usually tabular, with a column for each field and a row for each entity. Structured data is often stored in relational databases, where multiple tables reference each other through key values.

ID | Name      | Surname  | Email
1  | Naiomi    | Naidoo   | Naiomi.Naidoo@technology.online
2  | Firstname | Lastname | Firstname@yahoo.com
Structured data in a table
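
The fixed schema and key references described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production design: the table names (customer, customer_order) and the order rows are hypothetical, and only the two customer rows come from the table above.

```python
import sqlite3

# In-memory SQLite database; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
# SQLite only enforces REFERENCES constraints when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")

# Fixed schema: every customer row has the same four fields and types.
conn.execute(
    """CREATE TABLE customer (
           id      INTEGER PRIMARY KEY,
           name    TEXT NOT NULL,
           surname TEXT NOT NULL,
           email   TEXT
       )"""
)
# Hypothetical second table that points back to customer through its key.
conn.execute(
    """CREATE TABLE customer_order (
           order_id    INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customer(id)
       )"""
)

conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        (1, "Naiomi", "Naidoo", "Naiomi.Naidoo@technology.online"),
        (2, "Firstname", "Lastname", "Firstname@yahoo.com"),
    ],
)
# Made-up orders: two for customer 1, one for customer 2.
conn.executemany(
    "INSERT INTO customer_order VALUES (?, ?)",
    [(100, 1), (101, 1), (102, 2)],
)

# Follow the key from each order back to its customer.
rows = conn.execute(
    """SELECT c.name, COUNT(o.order_id)
       FROM customer AS c
       JOIN customer_order AS o ON o.customer_id = c.id
       GROUP BY c.id
       ORDER BY c.id"""
).fetchall()
print(rows)  # [('Naiomi', 2), ('Firstname', 1)]

# A row that references a customer who does not exist is rejected.
rejected = False
try:
    conn.execute("INSERT INTO customer_order VALUES (103, 99)")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

Because the schema is declared up front, the database can reject data that does not fit it, such as an order pointing at a customer ID that was never inserted.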

Semi-structured data

Semi-structured data has some structure, but individual entity instances can differ in which fields they include.

Scenario: Some customers may have one email address, while others may have several or none at all.

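JavaScript Object Notation (JSON) is a common format for semi-structured data because of its flexibility: each object can include or omit fields as needed. In the example below, both customers have contact details, but only the second has a location.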

[
  {
    "id": "1",
    "name": "Naiomi",
    "surname": "Naidoo",
    "contact":
    {
      "email": "naiomi@naidoo.com",
      "phone": "+27121231234"
    }
  },
  {
    "id": "2",
    "name": "Firstname",
    "surname": "Lastname",
    "contact":
    {
      "email": "firstname@yahoo.com",
      "phone": "+27987654321"
    },
    "location":
    {
      "city": "Sandton"
    }
  }
]
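
Code that reads semi-structured data has to cope with fields that may be missing or shaped differently. The sketch below uses Python's standard json module on the two customers from the example, plus a hypothetical third customer (id "3") that stores several email addresses as a JSON array. Storing multiple addresses as an array is an assumption made for illustration, and emails_of is a hypothetical helper, not part of any library.

```python
import json

# The two customers from the JSON example, plus a hypothetical customer "3"
# whose "email" field holds a list instead of a single string.
raw = """
[
  {"id": "1", "name": "Naiomi", "surname": "Naidoo",
   "contact": {"email": "naiomi@naidoo.com", "phone": "+27121231234"}},
  {"id": "2", "name": "Firstname", "surname": "Lastname",
   "contact": {"email": "firstname@yahoo.com", "phone": "+27987654321"},
   "location": {"city": "Sandton"}},
  {"id": "3", "name": "Example", "surname": "Customer",
   "contact": {"email": ["example@work.test", "example@home.test"]}}
]
"""

customers = json.loads(raw)


def emails_of(customer: dict) -> list[str]:
    """Hypothetical helper: return a customer's email addresses as a list."""
    email = customer.get("contact", {}).get("email")
    if email is None:  # no email address at all
        return []
    if isinstance(email, str):  # a single address
        return [email]
    return list(email)  # already several addresses


for c in customers:
    # "location" is optional, so read it with .get() and a default
    # rather than c["location"], which would raise KeyError for customer 1.
    city = c.get("location", {}).get("city", "unknown")
    print(c["id"], city, emails_of(c))
```

Running this prints one line per customer; customer 1 reports its city as "unknown" because it has no location field. A record with no contact details at all, such as {"id": "4"}, would make emails_of return an empty list.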

Unstructured data

Unstructured data has no predefined schema or fields. Documents, images, audio, video and binary files are common examples.
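
To contrast this with the earlier examples, the sketch below holds similar customer details in a hypothetical free-text note. There is no email field to read, so the address has to be found by searching the content. The regular expression is a rough assumption for illustration and is not a complete email validator.

```python
import re

# Hypothetical free-text note: customer details with no fields or schema
# indicating where each value is.
note = (
    "Spoke to Naiomi Naidoo today. She asked us to use "
    "naiomi@naidoo.com from now on, or call +27121231234."
)

# Without a schema, values must be located by scanning the text itself.
# This simple pattern is an illustrative assumption, not a full validator.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

emails = EMAIL_PATTERN.findall(note)
print(emails)  # ['naiomi@naidoo.com']
```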

Figure: Types of unstructured data
