Tanker Ship Status Parquet Export
The Ship Status Parquet Export API provides access to historical ship status data in Parquet format for the TANKER market segment (tanker, chemical, LNG, LPG, FSO, and OBO vessels). This API allows you to download ZIP archives containing Parquet files for a specific year, representing the consolidated historical records for that period.
Parquet is a columnar storage format optimized for analytics workloads, making it ideal for large-scale data processing and analysis. The API supports HTTP caching via ETag semantics, allowing efficient incremental updates when historical data is adjusted.
Each archive contains the historical state of ship statuses for the requested year. To ensure data accuracy, archives are updated every weekend to include any retrospective adjustments made to the historical records.
Key Features
📦 Historical Data Export – Download the complete dataset for a specific year (from 2013 onwards) in Parquet format, optimized for long-term trend analysis.
🗜️ ZIP Archive Format – Data is delivered as a ZIP archive containing one or more Parquet files, encompassing all data for the requested year.
🔄 HTTP Caching – Support for ETag-based caching allows clients to check if a year's archive has been updated (e.g., after weekend adjustments) without re-downloading the entire file.
📅 Yearly Archives – Retrieve data by year. The system maintains the most up-to-date version of the data for each year; older versions are replaced by the latest weekend update.
⚡ Efficient Processing – Parquet format enables fast columnar queries and efficient compression, significantly reducing memory and bandwidth usage compared to CSV or JSON.
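To illustrate the columnar access, here is a minimal pandas sketch that reads only the columns a query needs; the file and column names are placeholders, not the documented schema:

```python
import pandas as pd

# Read only selected columns from an extracted Parquet file.
# "ship_status_2023.parquet" and the column names are illustrative --
# inspect the real schema first (see "Parquet File Structure" below).
df = pd.read_parquet("ship_status_2023.parquet", columns=["imo", "status_code"])
print(df.head())
```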
This API is designed for bulk historical retrieval and deep analytics. For real-time status updates, consider using the standard Ship Status API endpoints.
Important: The ZIP archives are updated every weekend. If you are maintaining a local copy of historical data, we recommend performing a weekly check using ETags to ensure your local files include the latest adjustments.
S3 Redirection
The API returns an HTTP redirect (302/307) to a signed Amazon S3 URL where the Parquet archive is stored. Clients must follow the redirect to download the actual file. Most HTTP clients (including curl and requests) follow redirects automatically, but make sure yours is configured to do so: the initial API response is only a pointer to the S3 URL, and you must follow it to retrieve the archive.
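If you want to observe the redirect itself rather than follow it, you can disable automatic redirect handling and inspect the Location header; a minimal sketch with requests (the token and year values are placeholders):

```python
import requests

URL = "https://apihub.axsmarine.com/tanker/ship-status/parquet/v1"
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}

# Disable automatic redirects to observe the 302/307 response directly.
resp = requests.get(URL, headers=headers, params={"year": 2023}, allow_redirects=False)
print(resp.status_code)              # e.g. 302 or 307
print(resp.headers.get("Location"))  # signed S3 URL hosting the archive

# In normal use, leave allow_redirects at its default (True) so the
# client follows the redirect and downloads the archive in one call.
```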
Example Requests
Basic Request
```bash
curl -X GET "https://apihub.axsmarine.com/tanker/ship-status/parquet/v1?year=2023" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -L \
  -o ship_status_2023.zip
```

The -L flag ensures curl follows redirects to the S3 URL.
Request with ETag Caching
First request:
```bash
curl -X GET "https://apihub.axsmarine.com/tanker/ship-status/parquet/v1?year=2023" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -L \
  -D headers.txt \
  -o ship_status_2023.zip
```

Subsequent request (extract the ETag from headers.txt):

```bash
curl -X GET "https://apihub.axsmarine.com/tanker/ship-status/parquet/v1?year=2023" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "If-None-Match: \"abc123def456\"" \
  -L \
  -v
```

If the archive hasn't changed since the last weekend update, you'll receive a 304 Not Modified response.
Python Example with ETag Caching
```python
import os

import requests

# Configuration
SEGMENT = "tanker"
YEAR = 2023
URL = f"https://apihub.axsmarine.com/{SEGMENT}/ship-status/parquet/v1"
API_TOKEN = "YOUR_API_TOKEN"

headers = {
    "Authorization": f"Bearer {API_TOKEN}"
}
params = {"year": YEAR}

# Check if we have a cached ETag locally
etag_file = f"etag_{SEGMENT}_{YEAR}.txt"
if os.path.exists(etag_file):
    with open(etag_file, "r") as f:
        etag = f.read().strip()
    headers["If-None-Match"] = etag

# requests.get() follows redirects automatically by default (allow_redirects=True)
response = requests.get(URL, headers=headers, params=params, stream=True)

if response.status_code == 304:
    print(f"Archive for {YEAR} has not changed since last weekend.")
elif response.status_code == 200:
    # Save the new ETag
    if "ETag" in response.headers:
        with open(etag_file, "w") as f:
            f.write(response.headers["ETag"])
    # Save the ZIP file using streaming to handle large files efficiently
    filename = f"ship_status_{SEGMENT}_{YEAR}.zip"
    with open(filename, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    print(f"Archive for {YEAR} downloaded successfully: {filename}")
else:
    print(f"Error: {response.status_code}")
    print(response.text)
```

Usage Patterns
Weekly Historical Sync
Since historical data can be adjusted, it is recommended to sync your local data weekly:
- Weekend Check: Run your export script every Monday.
- ETag Validation: Use the stored ETag to check if the specific year has been updated.
- Refresh Local Data: If a 200 OK is returned, replace your local Parquet files with the new ones from the ZIP (a sketch of this routine follows the list).
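A minimal sketch of this weekly routine, packaging the ETag logic from the Python example above into a reusable function (the file names and layout are assumptions):

```python
import os

import requests

def sync_year(segment: str, year: int, token: str) -> None:
    """Re-download a year's archive only if its ETag has changed."""
    url = f"https://apihub.axsmarine.com/{segment}/ship-status/parquet/v1"
    headers = {"Authorization": f"Bearer {token}"}
    etag_file = f"etag_{segment}_{year}.txt"
    if os.path.exists(etag_file):
        with open(etag_file) as f:
            headers["If-None-Match"] = f.read().strip()

    resp = requests.get(url, headers=headers, params={"year": year}, stream=True)
    if resp.status_code == 304:
        return  # Local copy already reflects the latest weekend update.
    resp.raise_for_status()

    # Stream the archive to disk, then persist the new ETag.
    with open(f"ship_status_{segment}_{year}.zip", "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)
    if "ETag" in resp.headers:
        with open(etag_file, "w") as f:
            f.write(resp.headers["ETag"])

# Schedule this (e.g., via cron every Monday) for each year you track.
```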
Full History Analysis
To build a complete historical database:
- Year Loop: Iterate through all years (from 2013 to the current year).
- Download & Extract: Extract the Parquet files from each ZIP.
- Load into Engine: Use tools like DuckDB, Pandas, or Apache Spark to query across multiple years, as sketched below.
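A minimal sketch of that workflow using DuckDB (the directory layout is an assumption, and the query is illustrative, since the actual column names depend on the schema described below):

```python
import zipfile
from datetime import date
from pathlib import Path

import duckdb

data_dir = Path("parquet_history")
data_dir.mkdir(exist_ok=True)

# Assumes the yearly ZIPs were already downloaded (e.g., with the
# sync routine sketched above).
for year in range(2013, date.today().year + 1):
    archive = Path(f"ship_status_tanker_{year}.zip")
    if archive.exists():
        # Extract each year's Parquet files into its own subdirectory.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir / str(year))

# Query every Parquet file across all years in a single pass.
count = duckdb.sql(
    f"SELECT count(*) FROM read_parquet('{data_dir}/**/*.parquet')"
).fetchone()[0]
print(f"Total ship status records: {count}")
```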
Parquet File Structure
The ZIP archive contains one or more Parquet files.
- Content: Ship status records including vessel identifiers, status codes, timestamps, and locations.
- Schema: Self-describing Parquet schema including field names and data types (Integer, String, Timestamp, etc.).
- Optimized for: Fast filtering by vessel, status type, or specific date ranges within the year.
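Because the schema is self-describing, you can inspect field names and types before loading any data; a minimal PyArrow sketch (the file name is a placeholder):

```python
import pyarrow.parquet as pq

# Print the embedded schema (field names and types) of an extracted file.
schema = pq.read_schema("ship_status_2023.parquet")
print(schema)
```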
Note: The archives can be quite large (hundreds of megabytes for recent years). Ensure sufficient disk space and use streaming downloads to avoid memory issues.
Header Parameters
- If-None-Match (optional) – ETag value received from a previous download. If the archive has not changed, the server will respond with 304 Not Modified.
- Authorization (required) – Bearer token used for authentication.
Query Parameters
- year (required) – Year for which to retrieve the archive (e.g., 2023). Must be an integer >= 2013.