OpenAI has released Operator, an AI-powered agent that can independently perform tasks using a browser. It is now available to Pro users in the United States. (My Pro subscription finally feels worth it!)
Operator
01
What is Operator?
Operator is powered by the Computer-Using Agent (CUA) model. It integrates the visual capabilities of GPT-4o with advanced reasoning abilities optimized through reinforcement learning, enabling it to simulate human-like browser operations, including clicking, scrolling, and entering text.
For example, suppose you want to book a highly rated day tour in Rome. With Operator, you only need to describe what you want, and it will complete the entire process, from searching to selecting the best itinerary recommended by TripAdvisor. The core of this ability is that Operator can "see" web pages and interact with them without relying on API integrations.
OpenAI is collaborating with several well-known companies, including DoorDash, Uber, and Instacart, to make Operator's task execution more efficient. It is also exploring applications in the public sector, such as helping residents register for city services more conveniently.
Main functions of Operator
- It is suited to repetitive browser tasks, such as filling out forms, placing orders, or even creating memes.
- It can run multiple tasks simultaneously, such as customizing a mug on Etsy while booking a campsite.
- Users can set custom instructions for specific websites, for example, prioritizing a particular airline when booking flights.
- Operator asks the user to take over when login or payment information needs to be entered, keeping the operation both secure and intuitive.
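To make the last two points concrete, here is a minimal sketch of how per-site instructions and user takeover could be modeled. Everything in it is an assumption for illustration: the `SITE_INSTRUCTIONS` table, the `flights.example.com` host, and the `instructions_for` and `who_acts` helpers are hypothetical names, not part of Operator.

```python
# Hypothetical per-site instructions; the host and wording are made up for illustration.
SITE_INSTRUCTIONS = {
    "flights.example.com": "Prefer my usual airline when booking flights.",
}

# Step kinds that should be handed back to the user (assumed from the text: login and payment).
SENSITIVE_STEPS = {"login", "payment"}


def instructions_for(host: str) -> str:
    """Return the user's custom instruction for a site, or an empty string if none is set."""
    return SITE_INSTRUCTIONS.get(host, "")


def who_acts(step_kind: str) -> str:
    """Decide whether the agent continues or the user takes over for this step."""
    return "user" if step_kind in SENSITIVE_STEPS else "agent"


plan = ["search", "select", "login", "payment"]
handoffs = [(step, who_acts(step)) for step in plan]
print(handoffs)
```

The point of the sketch is only that the sensitive steps are decided by a simple rule up front, so credentials and card details never have to pass through the agent.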
CUA
02
What is CUA?
Operator is the first practical application of CUA technology.
CUA (Computer-Using Agent) is a universal interface model that combines the visual capabilities of GPT-4o with advanced reasoning abilities trained through reinforcement learning. It can interact with graphical user interfaces by observing and manipulating elements such as buttons, menus, and text boxes on the screen. This capability does not require specific API support, allowing CUA to directly use the digital tools and web pages that humans commonly use.
The key capabilities of CUA include:
- Perception: understanding the current interface state through screenshots.
- Reasoning: generating multi-step task plans via "chain-of-thought" and dynamically adjusting the steps.
- Action: completing operations such as clicking, scrolling, and typing via a virtual mouse and keyboard.
This enables CUA to perform complex tasks in a variety of digital environments, such as filling out forms and handling web navigation, greatly expanding the application scenarios of AI.
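The observe, plan, act cycle described above can be sketched as a small loop. This is only a toy under stated assumptions: `ToyScreen`, `Action`, `plan_next_action`, and `run_agent` are hypothetical names, the "screenshot" is a plain dictionary rather than pixels, and a hand-written rule stands in for the model's chain-of-thought. It is not how CUA is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "type", "click", or "done"
    target: str = ""   # name of the UI element the action applies to
    text: str = ""     # text to enter for "type" actions


@dataclass
class ToyScreen:
    """Hypothetical stand-in for a browser page: one text box and one submit button."""
    fields: dict = field(default_factory=lambda: {"name": ""})
    submitted: bool = False

    def screenshot(self) -> dict:
        # Perception: a real agent would receive pixels; here the state is returned as a dict.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        # Action: emulate the virtual keyboard and mouse.
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def plan_next_action(observation: dict, goal_name: str) -> Action:
    # Reasoning: a fixed rule replaces the model's planning in this sketch.
    if observation["submitted"]:
        return Action("done")
    if observation["fields"]["name"] != goal_name:
        return Action("type", target="name", text=goal_name)
    return Action("click", target="submit")


def run_agent(screen: ToyScreen, goal_name: str, max_steps: int = 10) -> list:
    history = []
    for _ in range(max_steps):
        observation = screen.screenshot()                  # observe
        action = plan_next_action(observation, goal_name)  # plan
        history.append(action.kind)
        if action.kind == "done":
            break
        screen.apply(action)                               # act
    return history


screen = ToyScreen()
steps = run_agent(screen, "Alice")
print(steps)  # ['type', 'click', 'done']
```

Because the plan is recomputed from a fresh observation on every step, the loop adjusts naturally if the page ends up in an unexpected state, which is the "dynamic adjustment" idea in miniature.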
Technical Highlights
The development of CUA embodies years of research achievements in the field of multi-modal understanding and reasoning:
- CUA can switch between different task scenarios, such as handling web forms or performing complex cross-platform operations.
- When it runs into problems, CUA can adjust dynamically to find a better path to completing the task.
- CUA performs well on multiple benchmarks: on OSWorld, its success rate on full computer-use tasks reaches 38.1%; on WebVoyager, it is as high as 87%.
Evaluation and performance
CUA has set new records on multiple industry benchmarks, demonstrating its broad adaptability:
Browser tasks:
- On WebArena (simulated real-world web environments), CUA's success rate reached **58.1%**.
- On WebVoyager (tests on live websites such as Amazon and GitHub), the success rate was as high as **87%**.
Operating system tasks:
- On OSWorld, CUA achieved a **38.1%** success rate, compared with **72.4%** for humans.
Below 👇🏻 is the comparison with peers. (It can be less capable than humans, but it must not be worse than its peers.)
CUA's performance improves when it is allowed more steps, but there is still a gap compared to human performance, especially on more complex tasks.
Have a Try!
03
Trial use
My Pro account has finally come in handy!!!
I asked the AI to help me arrange a one-day trip~

First, roughly outline the plan

Operator started searching online and putting together a travel guide

Help me book a hotel

Interact with me and ask for my opinions

Help me book a flight ticket.

Call me over to take control when manual verification is required.
Thinking of me at this moment..

It searches for flight tickets for me, and when necessary I can take control and just make the payment. That is still very convenient.
The overall experience is quite good. My versatile and powerful assistant Wuna can finally stop being constantly bothered by me and do something more meaningful for the company.