Debugging Meltano: When UUID Fields Break Your Data Pipeline
Jan 29, 2025 · 327 words · 2 minutes read
It’s early and the coffee machine is still broken, but thankfully my wife ordered us coffee and pastries to kick-start the day.
I’m tackling updates to our data warehouse this morning. We take a selective approach to BigQuery replication, choosing specific tables rather than mirroring our entire database. Today’s focus is on tables needed for an internal report.
Our data replication stack centers on Meltano, with all configuration living in a single meltano.yml file. Usually, adding new tables is straightforward - just a few lines of config. Usually.
I’ve got my development environment ready: PyCharm for the data-warehouse project (admittedly overkill for YAML editing), Sublime for notes, k9s for monitoring logs, and cloud-sql-proxy running for local testing.
One of Meltano’s pain points is its logging. Almost any error produces a Python traceback, and frustratingly, that traceback is split across multiple log messages, one per line of output, even with JSON logging. After some digging, I found the relevant error:
{"consumer": false, "producer": true, "string_id": "<REDACTED>", "cmd_type": "elb", "run_id": "493311be-c472-4ca3-8df4-70b45b4b73e2", "job_name": "<REDACTED>", "stdio": "stderr", "name": "<REDACTED>", "event": "singer_sdk.exceptions.StreamMapConfigError: Invalid key properties for '<REDACTED>': [customer_id,id]. Property 'id' was not detected in schema.", "level": "info", "timestamp": "2025-01-29T14:33:41.442927Z"}
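Because the traceback arrives in pieces, it can help to stitch it back together before reading it. The sketch below assumes one JSON object per log line carrying the `run_id`, `stdio` and `event` keys seen in the entry above, and that lines for a run appear in order. The helper name `reassemble_stderr` is hypothetical and not part of Meltano.

```python
import json
from collections import defaultdict


def reassemble_stderr(log_lines):
    """Group stderr events by run_id and rejoin them into multi-line text.

    Hypothetical helper: assumes each line is a JSON object shaped like the
    Meltano log entry above, with "run_id", "stdio" and "event" keys.
    """
    runs = defaultdict(list)
    for raw in log_lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            # Skip anything that isn't a JSON log record.
            continue
        # Keep only stderr output, which is where tracebacks end up.
        if not isinstance(entry, dict) or entry.get("stdio") != "stderr":
            continue
        runs[entry.get("run_id")].append(entry.get("event", ""))
    # Lines keep their original order within each run.
    return {run_id: "\n".join(events) for run_id, events in runs.items()}
```

Usage could be as simple as passing an open file handle, for example `reassemble_stderr(open("meltano.log"))`, where the file name is only a placeholder for wherever the logs were saved.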
Meltano claims it can’t find the id column, even though it definitely exists in the source table. I’ve seen this before, and it’s not always what it seems. While this error can indicate a genuinely missing column, it can also mean Meltano can’t handle the column’s type. In this case, id is a Postgres UUID column, and BigQuery has no native UUID type.
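To make that failure mode concrete, here is a simplified model, not singer-sdk’s actual code. It assumes a column whose type can’t be mapped is left out of the stream’s JSON schema, then checks the key properties against that schema the way the error message describes. The type table, the helper names and the `orders` stream are all hypothetical, and the model says nothing about where in the tap or target the column really goes missing.

```python
# Hypothetical mapping from Postgres column types to JSON-schema types.
# "uuid" is deliberately absent to model a type the pipeline can't handle.
SUPPORTED_TYPES = {
    "integer": {"type": ["integer", "null"]},
    "bigint": {"type": ["integer", "null"]},
    "text": {"type": ["string", "null"]},
}


def build_schema(columns):
    """Build a JSON schema, silently dropping columns with unsupported types."""
    properties = {
        name: SUPPORTED_TYPES[pg_type]
        for name, pg_type in columns.items()
        if pg_type in SUPPORTED_TYPES
    }
    return {"type": "object", "properties": properties}


def check_key_properties(stream, key_properties, schema):
    """Raise if a key property is absent from the schema (mirrors the log message)."""
    for prop in key_properties:
        if prop not in schema["properties"]:
            raise ValueError(
                f"Invalid key properties for '{stream}': [{','.join(key_properties)}]. "
                f"Property '{prop}' was not detected in schema."
            )


columns = {"id": "uuid", "customer_id": "bigint", "note": "text"}
schema = build_schema(columns)
try:
    check_key_properties("orders", ["customer_id", "id"], schema)
except ValueError as exc:
    print(exc)
```

The column exists in the source, but from the schema’s point of view it doesn’t, which is why the message reads like a missing column.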
Fortunately, there’s an easy fix: add an inline stream map that casts the id column to a string. After adding it and testing against our dev environment, everything works as expected.
stream_maps:
  TABLE_NAME:
    id: str(id)
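Conceptually, the map passes each record’s id through str() and declares the property as a string, so the key properties line up with the schema again. The sketch below emulates that with the standard library only. It is an illustration under assumptions, not how singer-sdk evaluates stream map expressions, and `apply_str_map` and the sample record are hypothetical.

```python
import uuid


def apply_str_map(record, schema, field):
    """Return copies of record and schema with `field` cast to a string."""
    new_record = dict(record)
    # This sketch leaves nulls untouched; a primary key shouldn't be null anyway.
    if new_record.get(field) is not None:
        # str() on a uuid.UUID yields the canonical hyphenated form.
        new_record[field] = str(new_record[field])
    new_schema = {
        **schema,
        "properties": {**schema["properties"], field: {"type": ["string", "null"]}},
    }
    return new_record, new_schema


record = {"id": uuid.UUID("12345678-1234-5678-1234-567812345678"), "customer_id": 42}
schema = {"type": "object", "properties": {"customer_id": {"type": ["integer", "null"]}}}
mapped_record, mapped_schema = apply_str_map(record, schema, "id")
print(mapped_record["id"])  # 12345678-1234-5678-1234-567812345678
```

After the mapping, both customer_id and id appear in the schema, which is exactly what the key-property check in the original error was looking for.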
I wish Meltano had a smarter way to handle default conversions when the destination doesn’t support the source data type. Unfortunately, I don’t have time today to file a feature request or dig into the BigQuery target code myself.