Overview
In the study note series, this post covers the broad Google Database options available. All details are accurate at the time of writing, please refer to Google for current details.
There are many existing GCP Database comparisons already out there, however I haven’t found any that include details like ACID compliance, SQL vs NoSQL, ingestion methods, etc. I have therefore put together a blog that:
- Differentiates between them
- Describes how to load data into them
- And in the case of the NoSQL, include examples of how to query them
Comparison
The following are the key features that will determine when to use a given option.
Option | SQL | Category | Indexing | Workload | Storage Limits | ACID[i] |
Cloud SQL | Yes | RDBMS | Yes | OLTP | 30,720 GB | Yes |
Cloud Spanner | Yes | RDBMS | Yes | OLTP[ii] | 2TB per Node | Yes |
Cloud Datastore | No | Document | Yes | Document | Entities <= 1 Mb | Partial |
Cloud Bigtable | No | key/value | Yes | HTAP[iii] | Petabytes | No |
BigQuery | Yes[iv] | RDBMS | No | OLAP[v] | Petabytes | Yes |
Cloud Firestore | No | Document | Yes | Document | Document <= 1 MiB | Yes |
Ingestion
Ok, assuming you have chosen a Database option, the next question is how to ingest data.
Tool | Options |
Cloud SQL | GCP Console from a CSV file Google Cloud Storage (GCS)Cloud Dataflow from any compatible source |
Cloud Spanner | GCP Console from an AVRO file GCSCloud Dataflow from any compatible source |
Cloud Datastore | Managed Export and Import service via Command line from GCSFrom BigQuery via consoleCloud Dataflow from any compatible source |
Cloud Bigtable | Cloud Dataflow from any compatible source |
BigQuery | GCP Console, command line, API, or client library from Avro, CSV, JSON, ORC or Parquet files in GCSGCP Console from Cloud Datastore exports in GCSGCP Console from Cloud Firestore exports in GCSCloud Dataflow from any compatible source |
Cloud Firestore | GCP Console from Cloud Firestore exports in GCSCloud Dataflow from any compatible source |
NoSQL Querying
The final question is how do we extract the data we have ingested. No doubt we are all familiar with SQL but less so with NoSQL, the following are example Python queries
Option | Examples |
Cloud Datastore | query = client.query(kind=’Task’) query.add_filter(‘done’, ‘=’, False) query.add_filter(‘priority’, ‘>=’, 4) query.order = [‘-priority’] query = client.query(kind=’Task’) query.add_filter(‘done’, ‘=’, False) From https://cloud.google.com/datastore/docs/concepts/queries |
Cloud Bigtable | table = instance.table(table_id) row_key = ‘r1’row = table.read_row(row_key.encode(‘utf-8’))column_family_id = ‘cf1’column_id = ‘c1’.encode(‘utf-8’)value = row.cells[column_family_id][column_id][0].value.decode(‘utf-8’)print(‘Row key: {}\nData: {}’.format(row_key, value))From https://cloud.google.com/bigtable/docs/reference/libraries |
Cloud Firestore | var citiesRef = db.collection(“cities”); var query = citiesRef.where(“state”, “==”, “CA”);From https://firebase.google.com/docs/firestore/query-data/queries |
[i] ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties of database transactions[ii] Online Transactional Processing[iii] Hybrid Transactional/Analytical Processing[iv] Two versions, Legacy and Standard[v] Online Analytical Processing