Dry Voyages Parquet Export
The Dry Voyages Parquet Export API provides access to daily snapshot data in Parquet format for the DRY market segment (dry bulk, MPP, and OBO vessels). This API allows you to download ZIP archives containing Parquet files for a specific date, where each archive represents a snapshot of voyage data as it existed on that date.
Parquet is a columnar storage format optimized for analytics workloads, making it ideal for large-scale data processing and analysis. The API supports HTTP caching via ETag semantics, allowing efficient incremental updates.
Key Features
📦 Bulk Data Export – Download complete daily snapshot datasets for a specific date in Parquet format, optimized for analytics and data processing.
🗜️ ZIP Archive Format – Data is delivered as a ZIP archive containing one or more Parquet files, making it easy to download and process.
🔄 HTTP Caching – Support for ETag-based caching allows clients to efficiently check if data has changed without re-downloading unchanged archives.
📅 Daily Snapshots – Retrieve daily snapshots of voyage data for any date. Each snapshot represents the state of all voyages as they existed on that specific date.
⚡ Efficient Processing – Parquet format enables fast columnar queries and efficient compression, reducing storage and transfer costs.
S3 Redirection
The API returns an HTTP redirect (302/307) to a signed S3 URL where the Parquet archive is stored. Clients must follow redirects to download the actual file. Most HTTP clients (including curl and requests) follow redirects automatically, but ensure your client is configured to do so.
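If you want to inspect the signed S3 URL before downloading (for example, to hand it off to another download tool), you can disable automatic redirects and read the Location header yourself. A minimal sketch, assuming the endpoint and bearer-token scheme described above; the function name is illustrative:

```python
import requests

def resolve_signed_url(token: str, date: str) -> str:
    """Return the signed S3 URL the API redirects to, without downloading the archive."""
    response = requests.get(
        "https://apihub.axsmarine.com/dry/voyage/parquet/v1",
        headers={"Authorization": f"Bearer {token}"},
        params={"date": date},
        allow_redirects=False,  # stop at the 302/307 so the Location header is visible
    )
    response.raise_for_status()  # raises on 4xx/5xx; 3xx responses pass through
    if response.status_code not in (302, 307):
        raise RuntimeError(f"Expected a redirect, got {response.status_code}")
    return response.headers["Location"]
```

Note that signed S3 URLs typically expire after a short time, so resolve and download promptly.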
Example Requests
Basic Request
curl -X GET "https://apihub.axsmarine.com/dry/voyage/parquet/v1?date=2025-12-17" \
-H "Authorization: Bearer YOUR_API_TOKEN" \
-L \
-o voyages_2025-12-17.zip
Request with ETag Caching
First request:
curl -X GET "https://apihub.axsmarine.com/dry/voyage/parquet/v1?date=2025-12-17" \
-H "Authorization: Bearer YOUR_API_TOKEN" \
-L \
-D headers.txt \
-o voyages_2025-12-17.zip
Extract the ETag from headers.txt (e.g., ETag: "abc123def456"), then use it in subsequent requests:
curl -X GET "https://apihub.axsmarine.com/dry/voyage/parquet/v1?date=2025-12-17" \
-H "Authorization: Bearer YOUR_API_TOKEN" \
-H "If-None-Match: \"abc123def456\"" \
-L \
-v
If the archive hasn't changed, you'll receive a 304 Not Modified response with no body.
Python Example with ETag Caching
import requests
import os

url = "https://apihub.axsmarine.com/dry/voyage/parquet/v1"
headers = {
    "Authorization": "Bearer YOUR_API_TOKEN"
}
params = {"date": "2025-12-17"}

# Check if we have a cached ETag
etag_file = "etag_2025-12-17.txt"
if os.path.exists(etag_file):
    with open(etag_file, "r") as f:
        etag = f.read().strip()
    headers["If-None-Match"] = etag

# requests.get() follows redirects automatically by default
response = requests.get(url, headers=headers, params=params, allow_redirects=True)

if response.status_code == 304:
    print("Archive has not changed since last download")
elif response.status_code == 200:
    # Save the ETag for next time
    if "ETag" in response.headers:
        with open(etag_file, "w") as f:
            f.write(response.headers["ETag"])
    # Save the ZIP file
    with open("voyages_2025-12-17.zip", "wb") as f:
        f.write(response.content)
    print("Archive downloaded successfully")
elif response.status_code == 404:
    print("Archive not found for the specified date")
    print(response.json())
else:
    print(f"Error: {response.status_code}")
    print(response.json())
Usage Patterns
Daily Snapshot Sync
For daily synchronization of voyage snapshot data:
- Download Today's Snapshot – Download the daily snapshot archive for the target date
- Store ETag – Save the ETag from the response headers
- Periodic Checks – On subsequent days, use the stored ETag with the If-None-Match header to check if the snapshot has been updated
- Re-download if Changed – If you receive a 200 response, the snapshot has changed and should be processed
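The ETag-handling part of this sync can be sketched as two small helpers; build_headers and store_etag are illustrative names, not part of the API:

```python
import os

def build_headers(token: str, etag_file: str) -> dict:
    """Build request headers for a snapshot check; adds If-None-Match when an ETag is cached."""
    headers = {"Authorization": f"Bearer {token}"}
    if os.path.exists(etag_file):
        with open(etag_file, "r") as f:
            headers["If-None-Match"] = f.read().strip()
    return headers

def store_etag(etag_file: str, etag: str) -> None:
    """Persist the ETag from a 200 response for the next check."""
    with open(etag_file, "w") as f:
        f.write(etag)
```

On a 304 response there is nothing to do; on a 200, call store_etag with the new ETag header and process the downloaded archive.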
Multiple Daily Snapshots Analysis
To analyze voyage data across multiple dates using daily snapshots:
- Date Range Loop – Iterate through dates
- Download Snapshots – Download each date's snapshot archive
- Process Parquet Files – Extract and process Parquet files using tools like Pandas, Apache Spark, or DuckDB
- Compare Snapshots – Compare snapshots across dates to track changes in voyage states over time
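The date-range loop in step 1 can be sketched with a small generator; the helper name is illustrative:

```python
from datetime import date, timedelta

def date_range(start: str, end: str):
    """Yield ISO dates (YYYY-MM-DD) from start to end, inclusive."""
    current = date.fromisoformat(start)
    stop = date.fromisoformat(end)
    while current <= stop:
        yield current.isoformat()
        current += timedelta(days=1)
```

Each yielded date can then be passed as the date query parameter to download that day's snapshot, as shown in the examples above.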
Batch Processing
For batch processing workflows with multiple daily snapshots:
- Download Multiple Snapshots – Download snapshot archives for multiple dates in parallel
- Extract Parquet Files – Extract Parquet files from ZIP archives
- Load into Analytics Engine – Load Parquet files into your analytics platform (e.g., Apache Spark, DuckDB, BigQuery)
- Snapshot Analysis – Analyze and compare snapshots to understand voyage state changes over time
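The parallel-download step can be sketched with a thread pool. Here fetch is any callable that downloads one snapshot (for example, a function wrapping the requests-based download shown earlier); the function name and signature are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def download_snapshots(dates, fetch, max_workers=4):
    """Run fetch(date) for each date in parallel; returns a {date: result} mapping."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fetch, dates))
    return dict(zip(dates, results))
```

Keep max_workers modest to avoid hitting API rate limits when downloading many snapshots at once.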
Parquet File Structure
The ZIP archive contains one or more Parquet files representing a daily snapshot. The exact structure may vary, but typically includes:
- Voyage data with all fields from the Dry Voyages API, representing the state of voyages on the specified date
- Columnar format optimized for analytics queries
- Compression for efficient storage
Daily Snapshots
Each archive represents a daily snapshot of voyage data. This means:
- Snapshot Content: The archive contains the state of all voyages as they existed on the specified date
- Not Historical Records: Each snapshot is independent and does not contain historical changes or updates
- State at Date: Voyages are included with their state (status, location, etc.) as of the snapshot date
- Time Series Analysis: To track changes over time, download and compare multiple daily snapshots
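Comparing two snapshots reduces to an outer join on a stable key. A sketch with pandas, assuming the snapshots share an identifier column (voyage_id here is a hypothetical column name, not confirmed by the export schema):

```python
import pandas as pd

def diff_snapshots(old: pd.DataFrame, new: pd.DataFrame, key: str = "voyage_id") -> pd.DataFrame:
    """Outer-merge two daily snapshots and label each row as added, removed, or present in both."""
    merged = old.merge(new, on=key, how="outer", indicator=True, suffixes=("_old", "_new"))
    merged["change"] = merged["_merge"].map(
        {"left_only": "removed", "right_only": "added", "both": "present_in_both"}
    )
    return merged.drop(columns="_merge")
```

Rows labeled present_in_both can then be compared column by column (e.g., status_old vs status_new) to see which voyage attributes changed between the two snapshot dates.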
Market Segment
This API automatically filters results to the DRY market segment, which includes:
- dry – Dry bulk carriers
- mpp – Multi-purpose vessels
- obo – Oil/Bulk/Ore carriers
Header Parameters
If-None-Match (optional) – ETag value received from a previous download. If the archive has not changed, the server responds with 304 Not Modified.
Authorization – Bearer token used for authentication.
Query Parameters
date (required) – Date for which to retrieve the archive (e.g., 2025-12-17). Must be in YYYY-MM-DD format.