The model, which spent two months atop OpenRouter under a stealth identity, is now publicly available under an MIT license with aggressive promotional pricing.

Chinese food-delivery and super-app company Meituan has officially released LongCat-2.0, a 1.6-trillion-parameter Mixture-of-Experts (MoE) model optimized for autonomous software engineering, publishing it on GitHub, Hugging Face, and its own platform under a commercially permissive MIT license. [1]

The release also revealed that LongCat-2.0 was the model behind “Owl Alpha,” an anonymous stealth model that had been running on the OpenRouter model-routing platform for approximately two months before Meituan claimed it. [1]

During that unbranded period, Owl Alpha processed roughly 10.1 trillion tokens a month, an average of 559 billion tokens per day, after a 242% month-over-month increase in volume that placed it among OpenRouter’s top three models globally. [1] By the time Meituan identified itself as the developer, the model ranked first on the Hermes Agent workspace, second in Claude Code deployments, and third across international OpenClaw environments. [1]

Architecture and Context Window

LongCat-2.0 scales to 1.6 trillion total parameters while limiting active computation to an average of 48 billion parameters per token, with dynamic activation ranging from 33 billion to 56 billion parameters depending on query complexity. [1] The model supports a native 1-million-token context window. [1]
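To put that sparsity in perspective, the short Python calculation below divides the reported active-parameter figures by the 1.6-trillion total. It is plain arithmetic on the published numbers and says nothing about how Meituan's router actually chooses which experts to activate.

```python
# Parameter counts reported for LongCat-2.0, in billions.
TOTAL_PARAMS_B = 1_600
ACTIVE_PARAMS_B = {"minimum": 33, "average": 48, "maximum": 56}


def active_fraction(active_b: float, total_b: float = TOTAL_PARAMS_B) -> float:
    """Share of all parameters that participate in computing a single token."""
    return active_b / total_b


for label, active in ACTIVE_PARAMS_B.items():
    print(f"{label:>7}: {active} B active = {active_fraction(active):.2%} of total")
```

Each token touches roughly 2% to 3.5% of the weights, so per-token compute scales with the active count, while memory must still hold all 1.6 trillion parameters.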

To sustain that context window without severe memory overhead, Meituan developed LongCat Sparse Attention (LSA), described as an evolution of DeepSeek Sparse Attention. [1] LSA addresses the quadratic memory costs typical of fine-grained sparse attention through three mechanisms. Streaming-aware Indexing restructures token selection into sequential memory blocks to improve High Bandwidth Memory (HBM) utilization. Cross-Layer Indexing amortizes attention-calculation costs across adjacent model layers. Hierarchical Indexing applies two-stage, coarse-to-fine scoring to filter token candidates before fine-grained selection. [1]
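The cited source does not publish LSA's scoring functions, so the Python sketch below only illustrates the generic coarse-to-fine idea behind Hierarchical Indexing: cheaply score whole blocks of keys, then score individual tokens only inside the best blocks. The mean-key block score, the block size, and the `hierarchical_select` helper are assumptions made for illustration, not LSA's actual design.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def hierarchical_select(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    block_size: int,
    top_blocks: int,
    top_tokens: int,
) -> List[int]:
    """Hypothetical helper: pick the key positions one query attends to."""
    # Coarse stage: split the sequence into contiguous blocks and give each
    # block a single score (query . mean key of the block).
    blocks = [
        range(start, min(start + block_size, len(keys)))
        for start in range(0, len(keys), block_size)
    ]
    block_scores = [dot(query, mean_vector([keys[i] for i in blk])) for blk in blocks]
    kept = sorted(range(len(blocks)), key=lambda b: block_scores[b], reverse=True)
    kept = kept[:top_blocks]

    # Fine stage: score individual tokens, but only inside the kept blocks.
    candidates = [i for b in kept for i in blocks[b]]
    candidates.sort(key=lambda i: dot(query, keys[i]), reverse=True)

    # Return the winners in sequence order.
    return sorted(candidates[:top_tokens])


keys = [[0.1, 0.0], [0.0, 0.1], [0.9, 0.8], [1.0, 0.9],
        [0.2, 0.1], [0.1, 0.3], [0.7, 0.2], [0.0, 0.0]]
query = [1.0, 1.0]
print(hierarchical_select(query, keys, block_size=2, top_blocks=2, top_tokens=3))  # [2, 3, 6]
```

Only tokens in the surviving blocks receive a fine-grained score, so the per-token pass shrinks from the full sequence to at most `top_blocks * block_size` candidates.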

The architecture also incorporates an N-gram Embedding module that adds 135 billion parameters built on a 5-gram token-combination framework. The module expands the core embedding space roughly 100-fold, capturing local token relationships and reducing memory input/output bottlenecks during large-batch inference. [1]
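The source does not describe how the N-gram Embedding module maps token combinations to parameters. One common technique for this kind of module is hashed n-gram embeddings, sketched below in plain Python: each token's embedding is augmented with vectors looked up, through a hash, for the 2- to 5-token sequences that end at that position. The table sizes, the hash function, and the summation are all assumptions, not Meituan's implementation.

```python
import random
from typing import List, Sequence, Tuple

DIM = 4               # hypothetical embedding width
VOCAB = 100           # hypothetical vocabulary size
NGRAM_BUCKETS = 1009  # hypothetical number of rows in the hashed n-gram table
MAX_N = 5             # longest token combination, following the "5-gram" framing

rng = random.Random(0)  # fixed seed so the toy tables are reproducible
token_table = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]
ngram_table = [[rng.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in range(NGRAM_BUCKETS)]


def ngram_bucket(ngram: Tuple[int, ...]) -> int:
    """Map an n-gram of token ids to a row of the shared n-gram table."""
    h = len(ngram)  # seed with the length so n-grams of different n hash differently
    for token in ngram:
        h = (h * 1_000_003 + token + 1) % NGRAM_BUCKETS
    return h


def embed(tokens: Sequence[int]) -> List[List[float]]:
    """Token embedding plus one hashed vector per n-gram ending at each position."""
    out = []
    for pos, token in enumerate(tokens):
        vec = list(token_table[token])
        for n in range(2, MAX_N + 1):
            start = pos - n + 1
            if start < 0:
                break  # not enough preceding tokens for this n
            row = ngram_table[ngram_bucket(tuple(tokens[start : pos + 1]))]
            vec = [v + e for v, e in zip(vec, row)]
        out.append(vec)
    return out


vectors = embed([4, 8, 15, 16, 23, 42])
print(len(vectors), len(vectors[-1]))  # 6 4
```

Because an n-gram lookup is a table read rather than a matrix multiply, capacity can in principle grow by adding rows to `ngram_table` without adding per-token compute, which is the general appeal of placing parameters on the embedding side.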

Post-Training and Benchmarks

LongCat-2.0’s post-training uses a framework Meituan calls Multi-Teacher Optimization via Mixture of Specialized Experts (MOPD), which separates optimization into three independent expert clusters: Agent Experts focused on tool invocation and self-correcting execution loops; Reasoning Experts optimized for multi-hop logic and STEM problem-solving; and Interaction Experts focused on instruction-following, factual grounding, and safety. [1] A dynamic gate-routing mechanism combines these behaviors at runtime. [1]
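The source does not specify how the gate weighs the three clusters at runtime. As a generic illustration only, the sketch below uses a softmax gate: one linear score per cluster, normalized into weights that blend the clusters' outputs. The toy experts, the gate weights, and `gated_mixture` are hypothetical stand-ins, not MOPD's actual routing.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Expert = Callable[[Sequence[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_mixture(
    x: Sequence[float],
    experts: Dict[str, Expert],
    gate: Dict[str, Sequence[float]],
) -> Tuple[List[float], Dict[str, float]]:
    """Blend expert outputs with input-dependent softmax weights."""
    names = list(experts)
    # One linear gate score per expert cluster, computed from the input.
    logits = [sum(w * xi for w, xi in zip(gate[name], x)) for name in names]
    weights = softmax(logits)
    outputs = [experts[name](x) for name in names]
    mixed = [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(outputs[0]))
    ]
    return mixed, dict(zip(names, weights))


# Toy stand-ins for the three clusters named in the source.
experts: Dict[str, Expert] = {
    "agent": lambda v: [2 * a for a in v],
    "reasoning": lambda v: [a + 1 for a in v],
    "interaction": lambda v: list(v),
}
gate = {"agent": [0.5, 0.0], "reasoning": [0.0, 0.5], "interaction": [0.0, 0.0]}

mixed, weights = gated_mixture([1.0, 2.0], experts, gate)
print(weights)
print(mixed)
```

With these toy gate weights the input `[1.0, 2.0]` tilts the blend toward the "reasoning" stand-in; a real gate would be learned rather than hand-set.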

On the SWE-bench Pro software engineering benchmark, LongCat-2.0 scores 59.5, compared to 58.6 for OpenAI’s GPT-5.5. [1] The model also scores 70.8 on Terminal-Bench 2.1, 77.3 on SWE-bench Multilingual, and 73.2 on the general corporate workflow simulator FORTE. [1] It still trails premium frontier systems such as Claude Opus 4.8 on broader general-agent benchmarks like FORTE and BrowseComp. [1]

Pricing and Commercial Model

Standard pay-as-you-go API pricing is set at $0.75 per million input tokens and $2.95 per million output tokens. [1] A limited-time promotional discount reduces those rates to $0.30 per million input tokens and $1.20 per million output tokens. [1] All context-cache hits are processed at no charge under either pricing tier. [1]
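The billing rule is easy to model. The Python sketch below prices a single request under both tiers, treating cache-hit input tokens as free. It assumes cache hits are counted as a subset of a request's input tokens; the `request_cost` helper and the example token counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per million billable input tokens
    output_per_million: float  # USD per million output tokens


STANDARD = Pricing(input_per_million=0.75, output_per_million=2.95)
PROMO = Pricing(input_per_million=0.30, output_per_million=1.20)


def request_cost(
    pricing: Pricing, input_tokens: int, cache_hit_tokens: int, output_tokens: int
) -> float:
    """USD cost of one request; cache-hit input tokens are billed at zero."""
    if not 0 <= cache_hit_tokens <= input_tokens:
        raise ValueError("cache hits must be between 0 and input_tokens")
    billable_input = input_tokens - cache_hit_tokens
    return (
        billable_input * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


# Hypothetical agent turn: 200k-token prompt, 180k of it already cached.
for name, tier in (("standard", STANDARD), ("promo", PROMO)):
    print(name, round(request_cost(tier, 200_000, 180_000, 4_000), 4))
```

In this example the turn costs about $0.027 at standard rates and about $0.011 at promotional rates; the 180,000 cached tokens contribute nothing to either figure.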

Meituan also offers structured “Token Packs,” fixed-volume token allocations valid for 30 days, released through flash sales four times daily at 10:00, 16:00, 21:00, and 23:00 Beijing Time on a first-come, first-served basis. [1] Because cache hits are free, developers running long agentic sessions in which the model repeatedly references the same large codebase are billed only for cache misses and new output tokens. [1]
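For developers outside China, the sale times are easiest to handle in UTC. Beijing Time is a fixed UTC+8 offset with no daylight saving time, so the sketch below finds the next sale slot after a given moment using only the Python standard library. It assumes sales open exactly on the hour; `next_flash_sale` is a hypothetical helper, not part of any Meituan API.

```python
from datetime import datetime, timedelta, timezone

# Beijing Time is UTC+8 year-round (China does not observe daylight saving).
BEIJING = timezone(timedelta(hours=8), "Beijing")
SALE_HOURS = (10, 16, 21, 23)  # flash-sale start times, Beijing Time


def next_flash_sale(now: datetime) -> datetime:
    """Return the first sale slot strictly after `now` (a timezone-aware datetime)."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    local = now.astimezone(BEIJING)
    for day_offset in (0, 1):  # today's slots, then tomorrow's
        day = local.date() + timedelta(days=day_offset)
        for hour in SALE_HOURS:
            slot = datetime(day.year, day.month, day.day, hour, tzinfo=BEIJING)
            if slot > local:
                return slot
    raise AssertionError("unreachable: tomorrow always has a 10:00 slot")


now = datetime(2026, 1, 1, 15, 30, tzinfo=timezone.utc)  # 23:30 in Beijing
slot = next_flash_sale(now)
print(slot.isoformat(), "=", slot.astimezone(timezone.utc).isoformat())
```

A moment that falls exactly on a sale hour returns the following slot, since the helper looks for slots strictly later than `now`.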

Trained Entirely on Domestic Chinese Chips

Meituan states that LongCat-2.0 was trained on a cluster of more than 50,000 domestic Chinese Application-Specific Integrated Circuits (ASICs), without relying on Nvidia GPUs. [1] VentureBeat frames this as evidence that near-frontier models can be scaled on non-U.S. silicon. It also notes the broader context: the U.S. government has pressured American AI labs to restrict access to their latest models, with OpenAI limiting access to its GPT-5.6 models and Anthropic taking its Claude Fable 5 / Mythos 5 models entirely offline following government orders. [1]

Licensing and Company Background

The MIT license allows developers to modify, integrate, and redistribute LongCat-2.0 within closed-source commercial products without any obligation to release the source code of derivative works, provided the original copyright and license notice is retained. This contrasts with copyleft licenses such as the GNU General Public License (GPL). [1]

Meituan was founded in March 2010 by Wang Xing and reports more than 770 million annual transacting users and more than 14.5 million merchants. [1] The company trades on the Hong Kong Stock Exchange. [1] Its AI push began in late 2025 with LongCat-Flash, a 560-billion-parameter MoE model, followed by the reasoning-focused LongCat-Flash-Thinking. [1]


Sources

  1. VentureBeat — Meituan open sources LongCat-2.0, the 1.6T, near-frontier agentic coding model that's been leading OpenRouter — trained entirely on Chinese chips

This article was drafted with AI from the cited sources and checked against them before publication. Spot an error? Let us know.