Changelog

November 21

Recently added models: gpt-4o-2024-11-20, step-2-16k, grok-vision-beta
Qwen 2.5 Turbo Million Context Model: qwen-turbo-2024-11-01

November 7

Compatible with Claude native SDK, v1/messages interface is now supported;
Cache and computer usage functions for Claude native interface are not yet supported (prompt caching and computer use). We will continue to improve these features in the next two weeks.

November 5

Added new model: claude-3-5-haiku-20241022
Added Elon Musk's x.ai latest model grok-beta

October 23

Added new model: claude-3-5-sonnet-20241022

October 10

OpenAI's latest caching feature is now online. This feature currently supports the following models:

GPT-4o
GPT-4o-mini
o1-preview
o1-mini

Please note, the gpt-4o-2024-05-13 version is not officially supported.

If a request hits the cache, you will be able to see the relevant cache token data in the backend logs.

For more detailed information and usage rules, please visit the OpenAI official website: OpenAI Caching Feature Details

October 3

Reduced backend billing for the gpt-4o model, synchronized with the official pricing
Added new models: aihubmix-Llama-3-2-90B-Vision, aihubmix-Llama-3-70B-Instruct
Added Cohere's latest models aihubmix-command-r-08-2024, aihubmix-command-r-plus-08-2024

September 19

Added new models: whisper-large-v3 and distil-whisper-large-v3-en
Note: Whisper model billing is based on input seconds, but there is currently a display issue with the page pricing which will be fixed in the future. Backend billing is accurate, fully synchronized with OpenAI's official charges.

September 13

Added models o1-mini and o1-preview; Note: These latest models require changes in input parameters. Some wrapper software may report errors if default parameters are not updated.

Important Notes

Testing shows that the 01 models do not support the following and will report errors:

system field: 400 error
tools field: 400 error
image input: 400 error
json_object output: 500 error
structured output: 400 error
logprobs output: 403 error
stream output: 400 error
o1 series: 20 RPM, 150,000,000 TPM, very low, 429 errors may occur at any time
Others: temperature, top_p, and n are fixed at 1; presence_penalty and frequency_penalty are fixed at 0

September 10

Added new model: mattshumer/Reflection-Llama-3.1-70B; reportedly the strongest fine-tuned version of llama3.1-70b
Increased Claude-3 model pricing to maintain stable supply, currently 10% more expensive than directly using the official API; prices will gradually decrease in the future.
Enhanced concurrent capabilities of OpenAI series models, theoretically supporting unlimited concurrency.

August 11

Added new models: Phi3medium128k, ahm-Phi-3-medium-4k, ahm-Phi-3-small-128k
Improved stability of Llama-related models
Further optimized compatibility of Claude models

August 7

Added OpenAI's newly updated 4o version gpt-4o-2024-08-06, see https://platform.openai.com/docs/guides/structured-outputs
Added Google's latest model: gemini-1.5-pro-exp-0801

August 4

Added online direct payment recharge
Fixed Claude multi-turn conversation format error: 1. messages: roles must alternate between "user" and "assistant", but found multiple "user" roles in a row;
Optimized the index issue when using function features of Claude models
The backup server https://orisound.cn will be fully decommissioned on September 7; please switch to the main server https://aihubmix.com or backup server https://api.aihubmix.com

July 27

Added support for Mistral Large 2, model name: Mistral-large-2407 or aihubmix-Mistral-large-2407;
System optimization

July 24

Added latest llama-3.1 models llama-3.1-405b-instruct, llama-3.1-70b-versatile, and llama-3.1-8b-instant; feel free to try them out.

July 20

Fixed pricing calculation issue for gpt-4o-mini model. Details are as follows: Text input price: The price for text input of OpenAI's official gpt-4o-mini model is only 1/33 of the gpt-4o model price. Image input price: The price for image input of OpenAI's official gpt-4o-mini model is equal to the gpt-4o model price.
To ensure accuracy in price calculation, we multiply the token count for image input of the gpt-4o-mini model by 33 times to align with the official price.
More details can be found in OpenAI Official Pricing

July 19

Added support for gpt-4o-mini model, backend billing synchronized with official

July 15 Announcement

Supports the official API parameter include_usage, passing in parameters can return usage in stream mode, details see Official Documentation

July 14 Announcement

New version of nextweb added support for calling non-OpenAI models Call Non-OpenAI Models on Our Site
Added backend billing for Alibaba Qwen models, overall cost is about 10% higher than calling Alibaba Cloud official
Optimized Azure OpenAI output compatibility with OpenAI interface
Supports Claude-3's tool calling
Added many new models, see Settings for available models

July 3 Announcement

Overall backend interface optimized
Each log request record now shows the model unit price at the time of the request
Added model and pricing page Model/Pricing

June 20 Announcement

The latest claude-3-5-sonnet-20240620 is now supported, for usage instructions see How to Call Non-OpenAI Models on Our Site

June 18 Announcement

The backend log page now supports downloading usage request records

June 16 Announcement

Reduced the probability of randomly hitting Azure OpenAI, now it's almost negligible

June 13 Announcement

Lowered fees for Claude-3 related models (Claude 3 Haiku, Claude 3 Sonnet, Claude 3 Opus) to align with official backend charges; thus, the current retail price for using our API is approximately 14% off the official price.

June 10 Announcement

Overall service architecture upgrade, all servers and data migrated to Microsoft Azure;
In the future, I will conduct secondary deep development and optimization based on the open-source version of one API (we have already obtained a commercial license for the one API project through sponsorship)
Log data is too large (over 100 million request logs) and cannot be migrated temporarily; please contact customer service for previous log queries
Optimized token billing for gpt-4o, tokenizer cI100k_base changed to 0200k_base, previously gpt-4 series used cI100k_base; this results in a decrease in token count for streaming requests in Chinese, Korean, and Japanese.

June 8 Announcement

Added Alibaba's latest open-source model Qinwen2
alibaba/Qwen2-7B-Instruct, alibaba/Qwen2-57B-A14B-Instruct, alibaba/Qwen2-72B-Instruct

May 20 Announcement

Added new model gemini-1.5-flash
Added new model gpt-4o
Error accessing recharge page in Jiangsu region due to telecom hijacking of recharge domain, please contact customer service for recharge.
Added llama3 (llama3-70b-8192, llama3-8b-8192) gemini-1.5-pro, command-r, command-r-plus, feel free to try
Claude-3 model supply restored; we are currently connecting to Claude-3 endpoints deployed on AWS and Google Cloud.
To cover server costs and team expenses, Claude-3 model and price backend billing is 10% more expensive than the official
If call volume increases, prices will gradually decrease to around 5% or even lower,
Current concurrency needs testing and will apply for higher concurrent calls as usage increases.

Changelog

November 21​

November 7​

November 5​

October 23​

October 10​

October 3​

September 19​

September 13​

Important Notes​

September 10​

August 11​

August 7​

August 4​

July 27​

July 24​

July 20​

July 19​

July 15 Announcement​

July 14 Announcement​

July 3 Announcement​

June 20 Announcement​

June 18 Announcement​

June 16 Announcement​

June 13 Announcement​

June 10 Announcement​

June 8 Announcement​

May 20 Announcement​