Dry Voyages Parquet Export
The Dry Voyages Parquet Export API provides access to daily snapshot data in Parquet format for the DRY market segment (dry bulk, MPP, and OBO vessels). This API allows you to download ZIP archives containing Parquet files for a specific date, where each archive represents a snapshot of voyage data as it existed on that date.
Parquet is a columnar storage format optimized for analytics workloads, making it ideal for large-scale data processing and analysis. The API supports HTTP caching via ETag semantics, allowing efficient incremental updates.
Key Features
📦 Bulk Data Export – Download complete daily snapshot datasets for a specific date in Parquet format, optimized for analytics and data processing.
🗜️ ZIP Archive Format – Data is delivered as a ZIP archive containing one or more Parquet files, making it easy to download and process.
🔄 HTTP Caching – Support for ETag-based caching allows clients to efficiently check if data has changed without re-downloading unchanged archives.
📅 Daily Snapshots – Retrieve daily snapshots of voyage data for any date. Each snapshot represents the state of all voyages as they existed on that specific date.
⚡ Efficient Processing – Parquet format enables fast columnar queries and efficient compression, reducing storage and transfer costs.
S3 Redirection
The API returns an HTTP redirect (302/307) to a signed S3 URL where the Parquet archive is stored. Clients must follow redirects to download the actual file. Most HTTP clients (including curl and requests) follow redirects automatically, but ensure your client is configured to do so.
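If you want to inspect the signed S3 URL before downloading (for example, to hand it off to another download tool), you can disable automatic redirects and read the Location header yourself. A minimal sketch, assuming the endpoint and bearer-token scheme described above; the function name is illustrative:

```python
import requests

def resolve_signed_url(token: str, date: str) -> str:
    """Return the signed S3 URL the API redirects to, without downloading the archive."""
    response = requests.get(
        "https://apihub.axsmarine.com/dry/voyage/parquet/v1",
        headers={"Authorization": f"Bearer {token}"},
        params={"date": date},
        allow_redirects=False,  # stop at the 302/307 so the Location header is visible
    )
    response.raise_for_status()  # raises on 4xx/5xx; 3xx responses pass through
    if response.status_code not in (302, 307):
        raise RuntimeError(f"Expected a redirect, got {response.status_code}")
    return response.headers["Location"]
```

Note that signed S3 URLs typically expire after a short time, so resolve and download promptly.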
Example Requests
Basic Request
curl -X GET "https://apihub.axsmarine.com/dry/voyage/parquet/v1?date=2025-12-17" \
-H "Authorization: Bearer YOUR_API_TOKEN" \
-L \
-o voyages_2025-12-17.zip
Request with ETag Caching
First request:
curl -X GET "https://apihub.axsmarine.com/dry/voyage/parquet/v1?date=2025-12-17" \
-H "Authorization: Bearer YOUR_API_TOKEN" \
-L \
-D headers.txt \
-o voyages_2025-12-17.zip
Extract the ETag from headers.txt (e.g., ETag: "abc123def456"), then use it in subsequent requests:
curl -X GET "https://apihub.axsmarine.com/dry/voyage/parquet/v1?date=2025-12-17" \
-H "Authorization: Bearer YOUR_API_TOKEN" \
-H "If-None-Match: \"abc123def456\"" \
-L \
-v
If the archive hasn't changed, you'll receive a 304 Not Modified response with no body.
Python Example with ETag Caching
import requests
import os

url = "https://apihub.axsmarine.com/dry/voyage/parquet/v1"
headers = {
    "Authorization": "Bearer YOUR_API_TOKEN"
}
params = {"date": "2025-12-17"}

# Check if we have a cached ETag
etag_file = "etag_2025-12-17.txt"
if os.path.exists(etag_file):
    with open(etag_file, "r") as f:
        etag = f.read().strip()
    headers["If-None-Match"] = etag

# requests.get() follows redirects automatically by default
response = requests.get(url, headers=headers, params=params, allow_redirects=True)

if response.status_code == 304:
    print("Archive has not changed since last download")
elif response.status_code == 200:
    # Save the ETag for next time
    if "ETag" in response.headers:
        with open(etag_file, "w") as f:
            f.write(response.headers["ETag"])
    # Save the ZIP file
    with open("voyages_2025-12-17.zip", "wb") as f:
        f.write(response.content)
    print("Archive downloaded successfully")
elif response.status_code == 404:
    print("Archive not found for the specified date")
    print(response.json())
else:
    print(f"Error: {response.status_code}")
    print(response.json())
Usage Patterns
Daily Snapshot Sync
For daily synchronization of voyage snapshot data:
- Download Today's Snapshot – Download the daily snapshot archive for the target date
- Store ETag – Save the ETag from the response headers
- Periodic Checks – On subsequent days, use the stored ETag with the If-None-Match header to check if the snapshot has been updated
- Re-download if Changed – If you receive a 200 response, the snapshot has changed and should be processed
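The ETag-handling part of this sync can be sketched as two small helpers; build_headers and store_etag are illustrative names, not part of the API:

```python
import os

def build_headers(token: str, etag_file: str) -> dict:
    """Build request headers for a snapshot check; adds If-None-Match when an ETag is cached."""
    headers = {"Authorization": f"Bearer {token}"}
    if os.path.exists(etag_file):
        with open(etag_file, "r") as f:
            headers["If-None-Match"] = f.read().strip()
    return headers

def store_etag(etag_file: str, etag: str) -> None:
    """Persist the ETag from a 200 response for the next check."""
    with open(etag_file, "w") as f:
        f.write(etag)
```

On a 304 response there is nothing to do; on a 200, call store_etag with the new ETag header and process the downloaded archive.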
Multiple Daily Snapshots Analysis
To analyze voyage data across multiple dates using daily snapshots:
- Date Range Loop – Iterate through dates
- Download Snapshots – Download each date's snapshot archive
- Process Parquet Files – Extract and process Parquet files using tools like Pandas, Apache Spark, or DuckDB
- Compare Snapshots – Compare snapshots across dates to track changes in voyage states over time
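The date-range loop in step 1 can be sketched with a small generator; the helper name is illustrative:

```python
from datetime import date, timedelta

def date_range(start: str, end: str):
    """Yield ISO dates (YYYY-MM-DD) from start to end, inclusive."""
    current = date.fromisoformat(start)
    stop = date.fromisoformat(end)
    while current <= stop:
        yield current.isoformat()
        current += timedelta(days=1)
```

Each yielded date can then be passed as the date query parameter to download that day's snapshot, as shown in the examples above.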
Batch Processing
For batch processing workflows with multiple daily snapshots:
- Download Multiple Snapshots – Download snapshot archives for multiple dates in parallel
- Extract Parquet Files – Extract Parquet files from ZIP archives
- Load into Analytics Engine – Load Parquet files into your analytics platform (e.g., Apache Spark, DuckDB, BigQuery)
- Snapshot Analysis – Analyze and compare snapshots to understand voyage state changes over time
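The parallel-download step can be sketched with a thread pool. Here fetch is any callable that downloads one snapshot (for example, a function wrapping the requests-based download shown earlier); the function name and signature are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def download_snapshots(dates, fetch, max_workers=4):
    """Run fetch(date) for each date in parallel; returns a {date: result} mapping."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fetch, dates))
    return dict(zip(dates, results))
```

Keep max_workers modest to avoid hitting API rate limits when downloading many snapshots at once.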
Parquet File Structure
The ZIP archive contains one or more Parquet files representing a daily snapshot. The exact structure may vary, but typically includes:
- Voyage data with all fields from the Dry Voyages API, representing the state of voyages on the specified date
- Columnar format optimized for analytics queries
- Compression for efficient storage
Daily Snapshots
Each archive represents a daily snapshot of voyage data. This means:
- Snapshot Content: The archive contains the state of all voyages as they existed on the specified date
- Not Historical Records: Each snapshot is independent and does not contain historical changes or updates
- State at Date: Voyages are included with their state (status, location, etc.) as of the snapshot date
- Time Series Analysis: To track changes over time, download and compare multiple daily snapshots
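Comparing two snapshots reduces to an outer join on a stable key. A sketch with pandas, assuming the snapshots share an identifier column (voyage_id here is a hypothetical column name, not confirmed by the export schema):

```python
import pandas as pd

def diff_snapshots(old: pd.DataFrame, new: pd.DataFrame, key: str = "voyage_id") -> pd.DataFrame:
    """Outer-merge two daily snapshots and label each row as added, removed, or present in both."""
    merged = old.merge(new, on=key, how="outer", indicator=True, suffixes=("_old", "_new"))
    merged["change"] = merged["_merge"].map(
        {"left_only": "removed", "right_only": "added", "both": "present_in_both"}
    )
    return merged.drop(columns="_merge")
```

Rows labeled present_in_both can then be compared column by column (e.g., status_old vs status_new) to see which voyage attributes changed between the two snapshot dates.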
Market Segment
This API automatically filters results to the DRY market segment, which includes:
- dry – Dry bulk carriers
- mpp – Multi-purpose vessels
- obo – Oil/Bulk/Ore carriers
Header Parameters
If-None-Match (optional) – ETag value received from a previous download. If the archive has not changed, the server responds with 304 Not Modified.
Authorization – Bearer token used for authentication.
Query Parameters
date (required) – Date for which to retrieve the archive (e.g., 2025-12-17). Must be in YYYY-MM-DD format.