Firefox-CI ETL
The Firefox-CI ETL gathers data related to the Firefox-CI Taskcluster instance and stores it in a series of BigQuery tables. When making changes to the ETL, it’s important to test them before pushing to production.
Most likely, changes will be in the docker-etl repository.
There are staging versions of all the components:
A GCS bucket (fxci-etl-dev) in moz-fx-dev-releng
A BigQuery dataset (fxci_etl_dev) in moz-fx-dev-releng
A service account with access to both, as well as read access to the L1 and L3 workers projects (for obtaining metrics from Google Cloud Monitoring).
You must supply your own pulse account.
Setup
Before you begin testing, you’ll need to:
Create a new user via PulseGuardian (if you don't already have one).
Clone mozilla/docker-etl.
Then run:
cd jobs/fxci-taskcluster-export
cat << EOF > config.dev.toml
[pulse]
user = "<pulse username>"
password = "<pulse password>"
[storage]
project = "moz-fx-dev-releng"
bucket = "fxci-etl-dev"
[bigquery]
project = "moz-fx-dev-releng"
dataset = "fxci_etl_dev"
EOF
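Before running anything, it can help to sanity-check the generated file. The following is a minimal sketch, not part of docker-etl: check_dev_config is a hypothetical shell function that only uses grep (it does not parse TOML). It confirms that the three sections exist, that the Pulse placeholders were replaced, and that every project key points at moz-fx-dev-releng, so a test run cannot accidentally write to another project.

```shell
#!/usr/bin/env sh
# Hypothetical pre-flight check for config.dev.toml; not part of fxci-etl.
# Uses grep only, so it checks the lines written by the heredoc rather than
# fully parsing TOML.
check_dev_config() {
  local config="${1:-config.dev.toml}"
  local section

  if [ ! -f "$config" ]; then
    echo "missing $config" >&2
    return 1
  fi

  # All three sections written by the heredoc must be present.
  for section in pulse storage bigquery; do
    if ! grep -q "^\[$section\]" "$config"; then
      echo "missing [$section] section in $config" >&2
      return 1
    fi
  done

  # The heredoc writes placeholder credentials that must be replaced by hand.
  if grep -q '<pulse ' "$config"; then
    echo "replace the <pulse username>/<pulse password> placeholders" >&2
    return 1
  fi

  # Every project key should point at the staging project, never production.
  if grep -E '^project[[:space:]]*=' "$config" | grep -vq '"moz-fx-dev-releng"'; then
    echo "a project in $config is not moz-fx-dev-releng" >&2
    return 1
  fi

  return 0
}
```

For example, check_dev_config config.dev.toml && uv run fxci-etl --help only reaches the ETL once the file looks like a staging config.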
Finally, log in to GCP with the gcloud CLI:
gcloud auth login --update-adc
This requires your personal credentials to have access to all the necessary resources. If they don't, there's also a service account you can use.
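To confirm which credentials the Google client libraries will pick up, a sketch like the following can help. It assumes the standard Application Default Credentials (ADC) lookup: GOOGLE_APPLICATION_CREDENTIALS if set, otherwise application_default_credentials.json in the gcloud config directory ($CLOUDSDK_CONFIG, or ~/.config/gcloud on Linux and macOS). The function names are hypothetical, and the metadata-server fallback used on GCP hosts is not covered.

```shell
#!/usr/bin/env sh
# Hypothetical helpers (not part of fxci-etl) for checking Application Default
# Credentials (ADC). Lookup order assumed here: GOOGLE_APPLICATION_CREDENTIALS
# if set, otherwise the gcloud config dir ($CLOUDSDK_CONFIG or the Linux/macOS
# default ~/.config/gcloud). The GCE metadata-server fallback is not covered.
adc_file() {
  if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "$GOOGLE_APPLICATION_CREDENTIALS"
  else
    echo "${CLOUDSDK_CONFIG:-$HOME/.config/gcloud}/application_default_credentials.json"
  fi
}

# Succeeds if the ADC file exists; otherwise prints the login hint and fails.
check_adc() {
  local file
  file="$(adc_file)"
  if [ -f "$file" ]; then
    echo "using ADC from $file"
  else
    echo "no ADC found at $file; run: gcloud auth login --update-adc" >&2
    return 1
  fi
}
```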
Running the ETL
Set up the virtualenv:
uv venv
uv pip install -r requirements/test.txt
uv pip install -e .
Run fxci-etl:
uv run fxci-etl --help
Depending on what you’re testing, there are currently two subcommands you’ll be interested in:
# processes the pending messages in the Pulse queues and inserts them into BigQuery
uv run fxci-etl pulse drain --config config.dev.toml
# ingests data from Google Cloud Monitoring and exports it to BigQuery
uv run fxci-etl metric export --config config.dev.toml
Important
The ETL does not currently clean up the Pulse queues after itself (by design), and running fxci-etl pulse drain automatically creates all the necessary queues if they don't exist. If you run it, you MUST manually delete those queues via PulseGuardian when you are done; otherwise they will grow indefinitely and cause issues for the Pulse server.
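Because the queues are left behind, it may help to wrap the drain command so that a reminder is printed even when the run fails. This is a hypothetical helper, not part of fxci-etl; FXCI_ETL_CMD is an assumption introduced here only so the command can be overridden, and it defaults to uv run fxci-etl. It does not delete the queues for you; that still has to happen in PulseGuardian.

```shell
#!/usr/bin/env sh
# Hypothetical wrapper around `fxci-etl pulse drain`: it runs the drain and then
# always prints a reminder to delete the Pulse queues, even if the drain failed.
# FXCI_ETL_CMD is an assumption added here so the command can be overridden.
drain_pulse_dev() {
  local config="${1:-config.dev.toml}"
  local status=0

  # Left unquoted on purpose so "uv run fxci-etl" splits into separate words.
  ${FXCI_ETL_CMD:-uv run fxci-etl} pulse drain --config "$config" || status=$?

  echo "REMINDER: delete the queues created by this run via PulseGuardian." >&2
  return "$status"
}
```

The wrapper preserves the drain's exit status, so it can still be used in scripts that check for failure.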
After running, inspect the moz-fx-dev-releng.fxci_etl_dev dataset in the GCP console and verify that it contains data. The ETL automatically re-creates the tables in this dataset if they don't exist, so feel free to delete them if you want a fresh start.
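From the command line, bq ls moz-fx-dev-releng:fxci_etl_dev lists the tables in the dev dataset using the bq CLI that ships with the Google Cloud SDK. For a fresh start, the sketch below wraps bq rm in a hypothetical helper that only ever targets the dev dataset, so a typo cannot delete a table elsewhere. Pass bare table names as shown by bq ls; the table names themselves are not listed in this document.

```shell
#!/usr/bin/env sh
# Hypothetical helper (not part of fxci-etl): delete a table from the *dev*
# dataset so the ETL re-creates it on the next run. It refuses anything that
# looks like a qualified name, so it can never target another dataset.
DEV_DATASET="moz-fx-dev-releng:fxci_etl_dev"

reset_dev_table() {
  local table="${1:-}"
  case "$table" in
    "" | *.* | *:*)
      echo "usage: reset_dev_table <table-name> (bare name, no project/dataset)" >&2
      return 2
      ;;
  esac
  # -f skips the confirmation prompt, -t marks the target as a table.
  bq rm -f -t "${DEV_DATASET}.${table}"
}
```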