Database – My Microsoft Azure Journey

Data represents a variety of useful information that often needs to be stored, sorted, categorized and analyzed to inform decision-making. Data is organized in data structures which represent the data as entities with attributes or characteristics.

Data can be classified as structured, semi-structured or unstructured.

Structured Data

Structured data has a fixed schema where all the data share the same fields and data type for each field. The schema for structured data is usually tabular with columns for the fields and rows for each entity. Structured data is often stored in databases with multiple tables that can reference each other with key values in a relational model.

ID	Name	Surname	Email
1	Naiomi	Naidoo	Naiomi.Naidoo@technology.online
2	Firstname	Lastname	Firstname@yahoo.com

Structured data in a table
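
As a rough illustration of the relational model described above, the sketch below uses Python's built-in sqlite3 module to hold the customer table and link it to a second table through a key value. The orders table, its rows and the function names are hypothetical and exist only to show one table referencing another.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables."""
    conn = sqlite3.connect(":memory:")
    # Fixed schema: every customer row has the same four typed fields.
    conn.execute(
        "CREATE TABLE customer ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " surname TEXT NOT NULL,"
        " email TEXT)"
    )
    # Hypothetical second table that references customer.id as a key.
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customer(id),"
        " item TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?, ?)",
        [
            (1, "Naiomi", "Naidoo", "Naiomi.Naidoo@technology.online"),
            (2, "Firstname", "Lastname", "Firstname@yahoo.com"),
        ],
    )
    # Made-up order rows, both belonging to customer 1.
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, "Laptop"), (11, 1, "Mouse")],
    )
    return conn


def items_ordered_by(conn, surname):
    """Follow the key from orders back to customer and list the items."""
    rows = conn.execute(
        "SELECT o.item FROM orders AS o"
        " JOIN customer AS c ON c.id = o.customer_id"
        " WHERE c.surname = ? ORDER BY o.id",
        (surname,),
    )
    return [row[0] for row in rows]
```

Calling items_ordered_by(build_demo_db(), "Naidoo") walks the key from the orders table back to the customer row, which is the kind of cross-table reference the relational model relies on.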

Semi-structured data

Semi-structured data is information that has some structure, but the fields present can vary between entity instances.

Scenario: Some customers may have an email address while others may have multiple email addresses or no email address at all.

JavaScript Object Notation (JSON) is a common data format for representing semi-structured data because of its flexible nature.

//Customer 1
{
  "id": "1",
  "name": "Naiomi",
  "surname": "Naidoo",
  "contact":
  {
    "email": "naiomi@naidoo.com",
    "phone": "+27121231234"
  }
}

//Customer 2
{
  "id": "2",
  "name": "Firstname",
  "surname": "Lastname",
  "contact":
  {
    "email": "firstname@yahoo.com",
    "phone": "+27987654321"
  },
  "location":
  {
    "city": "Sandton"
  } 
}
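
Code that reads semi-structured data has to tolerate fields that are missing or shaped differently from one entity to the next. The sketch below is one possible way to do that in Python with the standard json module. Customers 3 and 4, and the helper names emails_of and city_of, are made up for illustration and cover the "multiple email addresses" and "no email address" cases from the scenario.

```python
import json

# Customers 1 and 2 mirror the samples above; 3 and 4 are hypothetical.
CUSTOMERS_JSON = """
[
  {"id": "1", "name": "Naiomi", "surname": "Naidoo",
   "contact": {"email": "naiomi@naidoo.com", "phone": "+27121231234"}},
  {"id": "2", "name": "Firstname", "surname": "Lastname",
   "contact": {"email": "firstname@yahoo.com", "phone": "+27987654321"},
   "location": {"city": "Sandton"}},
  {"id": "3", "name": "Example", "surname": "Person",
   "contact": {"email": ["a@example.com", "b@example.com"]}},
  {"id": "4", "name": "No", "surname": "Contact"}
]
"""


def emails_of(customer):
    """Return the customer's emails as a list, whatever shape was stored."""
    email = customer.get("contact", {}).get("email")
    if email is None:
        return []          # no contact block, or no email inside it
    if isinstance(email, str):
        return [email]     # a single address stored as a string
    return list(email)     # several addresses stored as an array


def city_of(customer, default=None):
    """Return the city when a location block exists, else the default."""
    return customer.get("location", {}).get("city", default)
```

Normalizing every variation into a list means the calling code never has to check which shape a given customer used.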

Unstructured data

Documents, images, audio, video and binary files can be considered unstructured data.

Azure Cosmos DB is a globally distributed, multi-model database service. Its core features are available whichever data model or API is used.

Features of Cosmos DB

Turnkey global distribution

Cosmos DB enables global data distribution and availability as a configuration setting in the portal, via the command line or an ARM template, making data replication to an additional region as seamless as possible. Both manual and automatic failover are supported, as are multi-region reads and multi-region writes.

Elastic storage and throughput

Cosmos DB scales database storage and throughput elastically in a pay-for-consumption model, so there is no need to pre-provision resources to account for future growth. Throughput is measured in a standardized unit called the Request Unit (RU), which can be considered an abstraction of the physical resources needed to serve a request. RUs are provisioned per second, e.g. 2000 RU/s.

Throughput is provisioned at a database or container level.

Container Level	Database Level
Isolated throughput	Containers share throughput
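
To get a feel for RU/s, it can help to add up what a workload would consume each second and compare it with the provisioned figure. The sketch below does that arithmetic. The per-operation RU costs and the workload numbers are assumptions chosen for the example; real costs depend on item size, indexing and the query being run.

```python
def required_ru_per_second(operations):
    """Sum ops/second * RU cost for each (ops_per_second, ru_per_op) pair."""
    return sum(ops * cost for ops, cost in operations)


def headroom(provisioned_ru_per_second, operations):
    """RU/s left over; a negative value means the workload exceeds it."""
    return provisioned_ru_per_second - required_ru_per_second(operations)


# Hypothetical workload: 500 reads/s at 1 RU each, 100 writes/s at 5 RU each.
workload = [(500, 1), (100, 5)]
```

With these assumed numbers the workload needs 500 × 1 + 100 × 5 = 1000 RU/s, so a container provisioned at 2000 RU/s would have 1000 RU/s to spare. When throughput is provisioned at the database level instead, the same sum would have to cover every container sharing it.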

Low latency

Microsoft’s financially backed SLA covers read and write latency of less than 10 ms at the 99th percentile.

Flexible consistency model

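Five consistency levels are available on a sliding scale from strongest to weakest (strong, bounded staleness, session, consistent prefix and eventual), so replication behavior can be optimized for a specific workload. A default consistency level is configured on the account and can be overridden per client connection or request.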

Credit: https://docs.microsoft.com/en-us/azure/cosmos-db/consistency-levels
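
The sliding scale can be modelled as an ordered enumeration. The sketch below is not SDK code. It lists the five levels from the documentation linked above and encodes one assumption: that a request may keep or relax the account's default level but not strengthen it. The names Consistency and can_override are hypothetical.

```python
from enum import IntEnum


class Consistency(IntEnum):
    """The five levels, ordered from strongest (0) to weakest (4)."""
    STRONG = 0
    BOUNDED_STALENESS = 1
    SESSION = 2
    CONSISTENT_PREFIX = 3
    EVENTUAL = 4


def can_override(account_default, requested):
    """Assumed rule: a request may keep or relax the default, not tighten it."""
    return requested >= account_default
```

For example, with an account default of SESSION, a request for EVENTUAL passes the check while a request for STRONG does not.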

Enterprise-grade security

A unified security model exists across all APIs, providing built-in encryption at rest and in transit. IP-based access control is supported.

Access to a Cosmos DB account and its data is controlled by two pairs of keys managed by the service: a read-write pair and a read-only pair.

APIs

Cosmos DB exposes data through a variety of models and APIs. When you request data using a specific API, Cosmos DB will automatically handle the translation of data from the underlying data format to the data model required for the API.

API	Description
SQL API	Core API with many unique features. Supports JavaScript logic and SQL queries.
MongoDB API	Compatible with MongoDB v3.2 protocol. Supports aggregation pipeline.
Gremlin API	Compatible with the Apache TinkerPop graph traversal language (Gremlin). Returns results in GraphSON (extended JSON) format.
Table API	Service-level compatibility with Azure Storage Tables. Migrate applications with no code changes.
Cassandra API	Supports Cassandra Query Language (CQL) v4 protocol. Works out of the box with CQL shell.
etcd API	Implements etcd wire protocol. Can be used as a backing store for Azure Kubernetes Service.

Resource Model

Data in Azure Cosmos DB is stored in a hierarchy of resources: an account contains databases, a database contains containers, and a container holds items.
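
One way to picture the hierarchy is as nested objects. The sketch below assumes the usual nesting of account, database, container and item, and models it with plain Python dataclasses. The class names, the upsert method and the slash-separated path it returns are illustrative only and are not the service's API or its real resource URIs.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    name: str
    items: dict = field(default_factory=dict)       # item id -> item body


@dataclass
class Database:
    name: str
    containers: dict = field(default_factory=dict)  # name -> Container


@dataclass
class Account:
    name: str
    databases: dict = field(default_factory=dict)   # name -> Database

    def upsert(self, database, container, item):
        """Store an item, creating the parent levels on first use."""
        db = self.databases.setdefault(database, Database(database))
        coll = db.containers.setdefault(container, Container(container))
        coll.items[item["id"]] = item
        # Illustrative path only, showing where the item sits in the tree.
        return f"{self.name}/{database}/{container}/{item['id']}"
```

Every item is reached by walking down from the account, which is why operations on an item need the database and container it belongs to.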

Indexing

Cosmos DB automatically indexes all fields within all items or documents by default. While indexing can be useful for many workloads, indexing all fields and items can have a performance impact on more complex data sets.

Indexing can be controlled and tuned to balance the trade-off between write performance and query performance.

Index policies can be created to configure indexes by specifying the following:

List of paths to index
Different types of indexing to perform
List of paths to exclude
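
An index policy is expressed as a JSON document. The sketch below builds one as a Python dictionary using the includedPaths and excludedPaths fields. The helper name make_indexing_policy and the excluded phone path are assumptions for illustration, and the exact fields accepted (including how index types are declared per path) should be checked against the current policy schema.

```python
import json


def make_indexing_policy(included, excluded):
    """Build a policy dict; reject a path listed on both sides."""
    overlap = set(included) & set(excluded)
    if overlap:
        raise ValueError(f"paths both included and excluded: {sorted(overlap)}")
    return {
        "indexingMode": "consistent",
        "automatic": True,
        "includedPaths": [{"path": p} for p in included],
        "excludedPaths": [{"path": p} for p in excluded],
    }


# Index everything except the (hypothetical) phone number under contact.
policy = make_indexing_policy(["/*"], ["/contact/phone/?"])
policy_json = json.dumps(policy, indent=2)
```

Excluding paths that are never queried is one way to reduce the write cost mentioned above while keeping the default behavior for everything else.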

Types of indexes

Range	Hash	Spatial
Provides comparison functionality	Quick lookup for exact match information	Used for geographical information
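
The difference between exact-match and comparison lookups can be shown with ordinary Python data structures. This is a conceptual sketch only and says nothing about how Cosmos DB implements its indexes: a dictionary stands in for a hash index, a sorted list searched with bisect stands in for a range index, and spatial indexes are not covered. The customer ages are made-up data.

```python
from bisect import bisect_left, bisect_right

# Made-up data: customer id -> age.
ages = {"c1": 31, "c2": 45, "c3": 27, "c4": 45}

# Hash-style index: value -> ids, good for exact matches only.
by_age = {}
for cid, age in ages.items():
    by_age.setdefault(age, []).append(cid)

# Range-style index: (value, id) pairs kept in sorted order.
sorted_pairs = sorted((age, cid) for cid, age in ages.items())
sorted_ages = [age for age, _ in sorted_pairs]


def exact(age):
    """Exact-match lookup through the hash-style index."""
    return by_age.get(age, [])


def between(low, high):
    """Inclusive range lookup through the sorted index."""
    lo = bisect_left(sorted_ages, low)
    hi = bisect_right(sorted_ages, high)
    return [cid for _, cid in sorted_pairs[lo:hi]]
```

The dictionary answers "age equals 45" directly but cannot answer "age between 30 and 45" without scanning every key, which is the kind of comparison a range index is meant for.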
