Using the Arbimon Python SDK

The Python SDK lets Arbimon and Rainforest Connection (RFCx) users access the platform programmatically. For example, you can integrate it into your own workflow to upload or download files automatically, or use it to download a dataset manually to train your own model offline.

Before you begin

Ensure you have Python 3.8 or later and pip
Download the latest version of the RFCx Python SDK, or use the version published on PyPI
Install the SDK and its dependencies using pip:

pip install requests
pip install rfcx-0.2.3-py3-none-any.whl
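Before installing, you may want to confirm that your interpreter meets the Python 3.8 minimum. The following is a minimal sketch that uses only the standard library. The function name `python_version_ok` is hypothetical and is not part of the SDK.

```python
import sys

def python_version_ok(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum` (major, minor)."""
    return sys.version_info[:2] >= minimum

if __name__ == "__main__":
    if python_version_ok():
        print(f"Python {sys.version.split()[0]} meets the 3.8 minimum")
    else:
        print("Python 3.8 or later is required for the RFCx SDK")
```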

Authentication

Before you can access the Arbimon platform from Python, you must authenticate with your Arbimon login. You will only have access to the projects, sites and recordings that you can see in your Arbimon account. To authenticate, create a file named example.py with the following code:

example.py

import rfcx

# Authentication
client = rfcx.Client()
client.authenticate()

# List the projects you have access to
for project in client.projects():
    print(f"{project['name']} ({project['id']})")

The first time you run this example (python example.py), the authenticate method will prompt you to log in by opening a URL in your browser. After you log in successfully, you should see a list of your projects. Your credentials are saved in a file named .rfcx_credentials and reused on subsequent runs, so you are not prompted to log in again.

Downloading audio

A common use of the SDK is to download selected files (e.g. to train a new model). The download_audio_files method downloads the audio files from a specific site (stream) between two timestamps (min_date and max_date). The following script downloads the files from the site with id "48spsgsxzf5l" between 1 May 2022, 00:00:00 UTC and 31 Aug 2022, 23:59:59 UTC.
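Setting max_date to datetime.datetime(2022, 8, 31) means midnight at the *start* of 31 August, which leaves the last day out of the range. The sketch below builds a range that covers whole days. `whole_day_range` is a hypothetical helper and is not part of the SDK. Like the examples in this guide, it returns naive datetime values, on the assumption that the SDK interprets them as UTC.

```python
import datetime

def whole_day_range(first_day, last_day):
    """Return (min_date, max_date) spanning first_day 00:00:00 to last_day 23:59:59.

    Both values are naive datetimes; this guide treats them as UTC (an assumption).
    """
    min_date = datetime.datetime.combine(first_day, datetime.time(0, 0, 0))
    max_date = datetime.datetime.combine(last_day, datetime.time(23, 59, 59))
    return min_date, max_date

min_date, max_date = whole_day_range(datetime.date(2022, 5, 1),
                                     datetime.date(2022, 8, 31))
print(min_date, max_date)  # 2022-05-01 00:00:00 2022-08-31 23:59:59
```

You can then pass the two values as the min_date and max_date arguments of download_audio_files.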

example-download-site.py

import datetime
import rfcx

client = rfcx.Client()
client.authenticate()

# Download all audio from one site (stream) between two timestamps (UTC)
client.download_audio_files(dest_path='./audio',
                            stream='48spsgsxzf5l',
                            min_date=datetime.datetime(2022, 5, 1, 0, 0, 0),
                            max_date=datetime.datetime(2022, 8, 31, 23, 59, 59),
                            parallel=False)

You might not know the ids of the sites (streams) in advance, so the SDK provides methods for querying your projects and sites (streams). The following script searches for a project named "My project" and downloads audio from all the sites in that project between 1 May 2022, 00:00:00 UTC and 31 Aug 2022, 23:59:59 UTC.

Be sure to replace "My project" with the name of your project (or another project you have access to in Arbimon). Narrow the min and max dates to limit the amount of data downloaded.
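The script below uses the first result of the keyword search (`[0]`). If the search returns several projects, the first one may not be the project you meant. If it returns none, indexing raises an IndexError. The sketch below picks the project whose name matches exactly. It assumes that each project is a dict with 'name' and 'id' keys, as in the authentication example. `find_project` and the sample data are hypothetical and are not part of the SDK.

```python
def find_project(projects, name):
    """Return the single project dict whose 'name' equals `name` exactly.

    `projects` is assumed to be a list of dicts with 'name' and 'id' keys,
    e.g. the result of client.projects(keyword=...).
    Raises ValueError if there is no exact match or more than one.
    """
    matches = [p for p in projects if p['name'] == name]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one project named {name!r}, "
                         f"found {len(matches)}")
    return matches[0]

# Hypothetical sample data standing in for an API response
sample = [
    {'id': 'p1', 'name': 'My project 2021'},
    {'id': 'p2', 'name': 'My project'},
]
print(find_project(sample, 'My project')['id'])  # p2
```

In the script, this would replace the indexing step: project = find_project(client.projects(keyword='My project'), 'My project').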

example-download-project.py

import datetime
import rfcx

client = rfcx.Client()
client.authenticate()

# Get project information from the first project matching the name
project = client.projects(keyword='My project')[0]

# Get information on all streams (sites) in the project
streams = client.streams(projects=project['id'])

# Loop over the streams to download audio (31 Aug is included up to 23:59:59)
for stream in streams:
    print(f"id: {stream['id']}, name: {stream['name']}")
    client.download_audio_files(dest_path='./audio',
                                stream=stream['id'],
                                min_date=datetime.datetime(2022, 5, 1),
                                max_date=datetime.datetime(2022, 8, 31, 23, 59, 59),
                                parallel=False)

Uploading audio

You can upload audio into Arbimon programmatically using the SDK. This is a two-step process: first the audio is uploaded, then it is processed or "ingested" (after which it appears as a "Recording" in Arbimon). Ingestion fails if invalid, corrupted or duplicate files are detected, so we recommend that scripts check whether ingestion succeeded and log any failures.

example-ingest.py

import datetime
import rfcx

client = rfcx.Client()
client.authenticate()

# Example values: replace with your own site (stream) id, timestamp and file
stream_id = 'abc123ef'
timestamp = datetime.datetime(2023, 1, 15, 12, 45)
filepath = 'test-file.wav'

# Step 1: upload
print('Upload starting')
ingest_id = client.ingest_file(stream_id, filepath, timestamp)
print('Uploaded')

# Step 2: check status (returns immediately; ingestion may still be in progress)
status, status_name, failure_message = client.check_ingest(ingest_id)
print(f'Ingest status: {status_name} {failure_message if status >= 30 else ""}')

# Check again, waiting for ingestion to complete
status, status_name, failure_message = client.check_ingest(ingest_id, wait_for_completion=True)
print(f'Ingest status after wait: {status_name} {failure_message if status >= 30 else ""}')
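To follow the recommendation to check and log every ingest, the two steps can be wrapped in a loop over several files. The following is a minimal sketch. `ingest_files` is a hypothetical helper, and the list of (filepath, timestamp) pairs is an assumed input format. The helper calls only client.ingest_file and client.check_ingest, with the arguments shown in the example. It assumes that wait_for_completion=True returns a final status. Any status other than 20 (success) is logged and returned so that you can follow up on it.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ingest")

def ingest_files(client, stream_id, files):
    """Upload and ingest each (filepath, timestamp) pair into one stream.

    Returns a list of (filepath, status_name, failure_message) tuples for every
    file whose final status was not 20 (success).
    """
    unsuccessful = []
    for filepath, timestamp in files:
        # Step 1: upload
        ingest_id = client.ingest_file(stream_id, filepath, timestamp)
        # Step 2: wait for ingestion to finish and read the outcome
        status, status_name, failure_message = client.check_ingest(
            ingest_id, wait_for_completion=True)
        if status == 20:
            log.info("%s ingested (%s)", filepath, status_name)
        else:
            # 30+ means failed; anything else did not reach success
            log.error("%s not ingested: %s %s", filepath, status_name,
                      failure_message if status >= 30 else "")
            unsuccessful.append((filepath, status_name, failure_message))
    return unsuccessful
```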

Ingest status codes

10 → queued/processing
20 → success (recording is available in Arbimon)
30+ → failed
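Scripts that handle many uploads may find it convenient to turn these codes into named outcomes. The following is a minimal sketch based only on the three codes listed above. `ingest_outcome` is a hypothetical helper. Rejecting unlisted codes is a design choice, so that an unexpected status is not silently treated as success.

```python
def ingest_outcome(status):
    """Map an ingest status code to 'pending', 'success' or 'failed'.

    10 = queued/processing, 20 = success, 30 or more = failed.
    Raises ValueError for codes not covered by this list.
    """
    if status >= 30:
        return 'failed'
    if status == 20:
        return 'success'
    if status == 10:
        return 'pending'
    raise ValueError(f"unrecognised ingest status code: {status}")

for code in (10, 20, 30):
    print(code, ingest_outcome(code))
```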

See the GitHub repository for a full ingest example.

Further information

Please see the SDK documentation for a complete reference of methods and parameters.