Tanker Ship Status Parquet Export
The Ship Status Parquet Export API provides access to historical ship status data in Parquet format for the TANKER market segment (tanker, chemical, LNG, LPG, FSO, and OBO vessels). This API allows you to download ZIP archives containing Parquet files for a specific year, representing the consolidated historical records for that period.
Parquet is a columnar storage format optimized for analytics workloads, making it ideal for large-scale data processing and analysis. The API supports HTTP caching via ETag semantics, allowing efficient incremental updates when historical data is adjusted.
Each archive contains the historical state of ship statuses for the requested year. To ensure data accuracy, archives are updated every weekend to include any retrospective adjustments made to the historical records.
Key Features
📦 Historical Data Export – Download the complete dataset for a specific year (from 2013 onwards) in Parquet format, optimized for long-term trend analysis.
🗜️ ZIP Archive Format – Data is delivered as a ZIP archive containing one or more Parquet files, encompassing all data for the requested year.
🔄 HTTP Caching – Support for ETag-based caching allows clients to check if a year's archive has been updated (e.g., after weekend adjustments) without re-downloading the entire file.
📅 Yearly Archives – Retrieve data by year. The system maintains the most up-to-date version of the data for each year; older versions are replaced by the latest weekend update.
⚡ Efficient Processing – Parquet format enables fast columnar queries and efficient compression, significantly reducing memory and bandwidth usage compared to CSV or JSON.
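To illustrate the columnar access, here is a minimal pandas sketch that reads only the columns a query needs; the file and column names are placeholders, not the documented schema:

```python
import pandas as pd

# Read only selected columns from an extracted Parquet file.
# "ship_status_2023.parquet" and the column names are illustrative --
# inspect the real schema first (see "Parquet File Structure" below).
df = pd.read_parquet("ship_status_2023.parquet", columns=["imo", "status_code"])
print(df.head())
```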
This API is designed for bulk historical retrieval and deep analytics. For real-time status updates, consider using the standard Ship Status API endpoints.
Important: The ZIP archives are updated every weekend. If you are maintaining a local copy of historical data, we recommend performing a weekly check using ETags to ensure your local files include the latest adjustments.
S3 Redirection
The API returns an HTTP redirect (302/307) to a signed Amazon S3 URL where the Parquet archive is stored. Clients must follow the redirect to download the actual file. Most HTTP clients (including curl and requests) follow redirects automatically, but make sure yours is configured to do so: the initial API response is only a pointer to the S3 URL, and you must follow it to retrieve the archive.
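If you want to observe the redirect itself rather than follow it, you can disable automatic redirect handling and inspect the Location header; a minimal sketch with requests (the token and year values are placeholders):

```python
import requests

URL = "https://apihub.axsmarine.com/tanker/ship-status/parquet/v1"
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}

# Disable automatic redirects to observe the 302/307 response directly.
resp = requests.get(URL, headers=headers, params={"year": 2023}, allow_redirects=False)
print(resp.status_code)              # e.g. 302 or 307
print(resp.headers.get("Location"))  # signed S3 URL hosting the archive

# In normal use, leave allow_redirects at its default (True) so the
# client follows the redirect and downloads the archive in one call.
```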
Example Requests
Basic Request
```bash
curl -X GET "https://apihub.axsmarine.com/tanker/ship-status/parquet/v1?year=2023" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -L \
  -o ship_status_2023.zip
```

The -L flag ensures curl follows redirects to the S3 URL.
Request with ETag Caching
First request:
```bash
curl -X GET "https://apihub.axsmarine.com/tanker/ship-status/parquet/v1?year=2023" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -L \
  -D headers.txt \
  -o ship_status_2023.zip
```

Subsequent request (extract the ETag from headers.txt):

```bash
curl -X GET "https://apihub.axsmarine.com/tanker/ship-status/parquet/v1?year=2023" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "If-None-Match: \"abc123def456\"" \
  -L \
  -v
```

If the archive hasn't changed since the last weekend update, you'll receive a 304 Not Modified response.
Python Example with ETag Caching
```python
import os

import requests

# Configuration
SEGMENT = "tanker"
YEAR = 2023
URL = f"https://apihub.axsmarine.com/{SEGMENT}/ship-status/parquet/v1"
API_TOKEN = "YOUR_API_TOKEN"

headers = {
    "Authorization": f"Bearer {API_TOKEN}"
}
params = {"year": YEAR}

# Check if we have a cached ETag locally
etag_file = f"etag_{SEGMENT}_{YEAR}.txt"
if os.path.exists(etag_file):
    with open(etag_file, "r") as f:
        etag = f.read().strip()
    headers["If-None-Match"] = etag

# requests.get() follows redirects automatically by default (allow_redirects=True)
response = requests.get(URL, headers=headers, params=params, stream=True)

if response.status_code == 304:
    print(f"Archive for {YEAR} has not changed since last weekend.")
elif response.status_code == 200:
    # Save the new ETag
    if "ETag" in response.headers:
        with open(etag_file, "w") as f:
            f.write(response.headers["ETag"])
    # Save the ZIP file using streaming to handle large files efficiently
    filename = f"ship_status_{SEGMENT}_{YEAR}.zip"
    with open(filename, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    print(f"Archive for {YEAR} downloaded successfully: {filename}")
else:
    print(f"Error: {response.status_code}")
    print(response.text)
```

Usage Patterns
Weekly Historical Sync
Since historical data can be adjusted, it is recommended to sync your local data weekly:
- Weekend Check: Run your export script every Monday.
- ETag Validation: Use the stored ETag to check if the specific year has been updated.
- Refresh Local Data: If a 200 OK is returned, replace your local Parquet files with the new ones from the ZIP (a sketch of this routine follows the list).
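A minimal sketch of this weekly routine, packaging the ETag logic from the Python example above into a reusable function (the file names and layout are assumptions):

```python
import os

import requests

def sync_year(segment: str, year: int, token: str) -> None:
    """Re-download a year's archive only if its ETag has changed."""
    url = f"https://apihub.axsmarine.com/{segment}/ship-status/parquet/v1"
    headers = {"Authorization": f"Bearer {token}"}
    etag_file = f"etag_{segment}_{year}.txt"
    if os.path.exists(etag_file):
        with open(etag_file) as f:
            headers["If-None-Match"] = f.read().strip()

    resp = requests.get(url, headers=headers, params={"year": year}, stream=True)
    if resp.status_code == 304:
        return  # Local copy already reflects the latest weekend update.
    resp.raise_for_status()

    # Stream the archive to disk, then persist the new ETag.
    with open(f"ship_status_{segment}_{year}.zip", "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)
    if "ETag" in resp.headers:
        with open(etag_file, "w") as f:
            f.write(resp.headers["ETag"])

# Schedule this (e.g., via cron every Monday) for each year you track.
```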
Full History Analysis
To build a complete historical database:
- Year Loop: Iterate through all years (from 2013 to the current year).
- Download & Extract: Extract the Parquet files from each ZIP.
- Load into Engine: Use tools like DuckDB, Pandas, or Apache Spark to query across multiple years, as sketched below.
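A minimal sketch of that workflow using DuckDB (the directory layout is an assumption, and the query is illustrative, since the actual column names depend on the schema described below):

```python
import zipfile
from datetime import date
from pathlib import Path

import duckdb

data_dir = Path("parquet_history")
data_dir.mkdir(exist_ok=True)

# Assumes the yearly ZIPs were already downloaded (e.g., with the
# sync routine sketched above).
for year in range(2013, date.today().year + 1):
    archive = Path(f"ship_status_tanker_{year}.zip")
    if archive.exists():
        # Extract each year's Parquet files into its own subdirectory.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir / str(year))

# Query every Parquet file across all years in a single pass.
count = duckdb.sql(
    f"SELECT count(*) FROM read_parquet('{data_dir}/**/*.parquet')"
).fetchone()[0]
print(f"Total ship status records: {count}")
```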
Parquet File Structure
The ZIP archive contains one or more Parquet files.
- Content: Ship status records including vessel identifiers, status codes, timestamps, and locations.
- Schema: Self-describing Parquet schema including field names and data types (Integer, String, Timestamp, etc.).
- Optimized for: Fast filtering by vessel, status type, or specific date ranges within the year.
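Because the schema is self-describing, you can inspect field names and types before loading any data; a minimal PyArrow sketch (the file name is a placeholder):

```python
import pyarrow.parquet as pq

# Print the embedded schema (field names and types) of an extracted file.
schema = pq.read_schema("ship_status_2023.parquet")
print(schema)
```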
Note: The archives can be quite large (hundreds of megabytes for recent years). Ensure sufficient disk space and use streaming downloads to avoid memory issues.
Header Parameters
- If-None-Match (optional) – ETag value received from a previous download. If the archive has not changed, the server will respond with 304 Not Modified.
- Authorization (required) – Bearer token used for authentication.
Query Parameters
- year (required) – Year for which to retrieve the archive (e.g., 2023). Must be an integer >= 2013.