Into Asia
Hosted by writers Chang Che and Ian Buruma, Into Asia explores how China, Japan, and Korea are reshaping the world. From memory politics to AI and demographic decline, they connect history and current affairs to reveal the new role Asia will play in the twenty-first century.
Editing by Sydney Watson
A Year Inside ByteDance's AI Lab
From 2025 to 2026, Zhang Chi was an AI researcher at ByteDance Seed, the division behind Doubao, China's most-used chatbot. Now an assistant professor at Peking University, he joins Chang for a rare insider account of life at one of China's top AI labs. He talks about ByteDance's two-hour nap culture, the shortcuts Chinese labs take to catch up to the US, the ubiquity of Claude Code, and why he thinks the gap with American frontier labs is widening.
Chang Che: How did you get into AI?
Zhang Chi: I spent four years in Hangzhou — I studied at Zhejiang University. I majored in computer science, and I sort of started getting interested in AI after Andrew Ng’s Coursera course on machine learning. I found it pretty interesting, especially the digit recognition, the very introductory digit recognition project. I moved to the University of California, Los Angeles, and spent sort of four or five years doing my PhD in AI with Zhu Songchun. Then in 2021, after I graduated, I moved back to Beijing and spent, I guess, 2.5 wonderful years at BIGAI, the Beijing Institute for General Artificial Intelligence, and then I joined ByteDance and became officially a large language model engineer. After a year there, I decided that I would like to do more of the research part of AI, and then I moved back to Peking University and started this tenure-track assistant professorship.
Chang Che: For listeners who don’t know, Zhu Songchun is a really renowned AI scientist. He was involved in a lot of the early — what in the AI field is known as the data-driven approach, which is the approach that current large language models are built on: this idea that you can sort of derive patterns [and intelligence] from data. He moved back to China around 2020 to start his own institute at Peking University, which is where Zhang Chi — is it fair to say that you sort of followed him back to China?
Zhang Chi: Yeah, I think it’s fair.
Chang Che: Zhu Songchun is well known for being opposed to this paradigm of drawing on neural networks — which the field calls the scaling hypothesis: the idea that you can use a large amount of computing to improve intelligence. And he doesn’t think that this is the right path towards artificial general intelligence. Do you agree with that view, and has your time at ByteDance, where you were working primarily on building large language models, changed your view on that?
Zhang Chi: I sort of agree with Zhu Songchun that this data-scaling thing is not the whole picture of AI, but to be fair, or from a practical standpoint, I think this data-driven scaling thing is the only method that works across this entire field. So it’s something that — it’s not enough, it’s not everything, but it’s sort of like a very important component in building an AI system that actually works. So the short answer is, I agree with his standpoint, but to be fair, I believe it’s also one way towards this general AI, this goal of general AI.
Chang Che: When you were at BIGAI, what were you primarily doing? Because this was before the rise of ChatGPT, right?
Zhang Chi: Yeah, before ChatGPT came out, no one really knew a unified approach to everything. The entire thing was divided into multiple subgroups. There was a group focusing on CV — basically computer vision — natural language processing, cognitive reasoning, simulation, multi-agent systems.
Chang Che: So basically different branches of AI.
Zhang Chi: I forgot to mention a robotics division. It was like six divisions, and we were trying to use his theory to develop models and methods that could actually work in the specific field.
Chang Che: And so what changed after ChatGPT?
Zhang Chi: Yeah. So basically, after ChatGPT came out, initially people didn’t believe it. People believed that it was something that worked but would not work as well as expected. So initially, at BIGAI, people just realized that this thing called ChatGPT could be useful in some ways, but people still didn’t believe that it could be useful in reasoning, robotics, multi-agent systems — basically, it wouldn’t work in every area. But as time went by, people gradually realized that it might be a way, and people started to use this transformer model for everything. For robotics, there was this visual language model; basically, everyone uses transformers or this large language model as the foundation for multi-agent systems. So people gradually realized that it could be something that works in almost every area. And so basically after a few years, every team used that model. People still have their tasks of interest, but basically everything was based on transformers at —
Chang Che: BIGAI.
Zhang Chi: Yeah, exactly.
Chang Che: Tell me, just chronologically, what were the years that you were at BIGAI?
Zhang Chi: I think I joined initially as an intern before I officially graduated from UCLA — that was 2020. I joined in 2020 and I left in 2025. So basically a year of internship, and then 2.5 years as a research scientist.
Chang Che: So in 2025 you joined ByteDance. Tell me a little bit about what ByteDance was looking for at that time.
Zhang Chi: Yeah, so I think I can talk a little bit about the history. Before the ChatGPT era, there was this AI lab at ByteDance called ByteDance AI Lab. I didn't know fully what they were doing, but they were definitely doing some kind of research. Some members of AI Lab later left and joined this Seed department. Seed was officially launched in, I guess, 2024, or maybe late 2023. Now it's the main AI department. Before that, it was this AI Lab, but it was basically dismantled later, and everyone in the lab joined Seed in one way or another. In late 2024, there was this DeepSeek moment. Before it, ByteDance — the Seed department — was basically trying to catch up with GPT-4o, and by December 2024, the team, by their own standard, believed they'd caught up to GPT-4o. But then came the DeepSeek moment, and the team realized their model was not good enough. So when I joined in early 2025, basically the entire department was trying to implement this reinforcement learning thing and build their own thinking model. That's kind of the history.
Chang Che: So you’re saying that within ByteDance, they thought that they had already reached the same level as GPT-4o — one of OpenAI’s leading models at the time — and then a rival Chinese model really showed that they were not there yet.
Zhang Chi: Same level by their own standard.
Chang Che: Right. Because at the time, as I recall, at least a lot of the benchmarking and AI leaders in the United States were saying that China was significantly behind. I remember the numbers used to be like one to two years, or one year, or six months — these were some of the things coming out. But then the DeepSeek moment really changed that. And so you joined ByteDance at the DeepSeek moment. What was your job at ByteDance to improve in the reinforcement learning direction?
Zhang Chi: Yeah, I focused on math and coding for the past year. So we basically used a programming language called Lean. It’s a formal language that can express math in code. The good thing about Lean is that it’s automatically verified. So as long as you can express math in Lean and it passes the compiler and is taken as a correct solution, you can basically think of it as a right proof of the problem. So I was working on this project — the goal of the team back then was to create their own model that could reach gold-medal performance at the IMO, the International Math Olympiad. So my job was basically making that happen.
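The verification property Zhang Chi describes — a Lean proof that compiles is correct by construction — can be seen in a toy example. This is a minimal sketch in Lean 4 using only the core library, not anything from ByteDance's actual codebase:

```lean
-- If this file compiles, the statement is machine-verified:
-- the compiler itself checks that the proof term is valid,
-- so no human needs to re-check it.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

IMO-level proofs work the same way, just with far larger statements and proof terms; the appeal for training is that the compiler provides an automatic, unambiguous reward signal.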
Chang Che: One question I had about this point was: why did ByteDance care so much about math? I understand that math and these kinds of brute-logic questions were one of the weak points of early large language models. Another way to ask this question is: how much of the resources were diverted towards making large language models good at math versus others at ByteDance?
Zhang Chi: Okay. So there was a more significant effort in using informal language to prove math problems, or solve reasoning problems or logic-based problems. But there was also a comparably smaller part devoted to this formal side. So it was more like a side project — not completely a side project, but back then the thinking was: imagine ByteDance suddenly has a math model that can solve the IMO — it would be a perfect thing to brag about, for their capability in large language models. I guess that was basically the logic of the team leader and the division leader for our team. So it was more for publicity.
Chang Che: Can you give us a sense of where your department sat within the broader — how many divisions there were in Seed working on things beyond the math improvement section, and then where Seed sits in the broader ByteDance ecosystem, because ByteDance is massive, right?
Zhang Chi: Definitely one of the hundreds of departments at ByteDance, but currently the company invests most of its money in AI research, and Seed is basically the foundation of its consumer-facing apps — it provides the models behind them. Everyone knows Seedance, right?
Chang Che: In case people don’t know what Seedance is, it’s the video generator. Recently it blew up in China because it was used to build a lot of microdramas — these short dramas you can watch on your smartphone. It’s very good at generating lifelike video. So that was also a model built at Seed — there were people building that at Seed.
Zhang Chi: Yeah, so in the Seed department, I think the largest part of Seed is the LLM subdivision, and we also have a world model subdivision — that was also a legacy of AI Lab, like the AI-for-science part in Seed. I believe there are others — I cannot enumerate every one of them, but the team I was affiliated with is Seed LLM. There’s also a VLM part of this Seed division. And in LLM there are many teams, each one specializing in a specific stage of LLM training. For example, there’s a pre-training team, a post-training team, and there was this math team, which was the team I was in. I believe there’s also an evaluation team. I guess basically each team is devoted to different stages of LLM training and deployment.
Chang Che: And was Liang Rubo, the CEO of ByteDance, involved in the Seed team? Or — ByteDance is so big — who was the leader of the Seed team?
Zhang Chi: Liang Rubo is the CEO, and then Wu Yonghui, who joined roughly at the same time as I did, is now the official leader of the entire Seed.
Chang Che: I’m curious how the Seed team thought of this competition. Did they conceive of their competition mainly as domestic players like DeepSeek?
Zhang Chi: I guess the team leaders would like to position Seed models as a top-tier model across the globe — that basically involves ChatGPT, Claude, and Gemini. But very unfortunately, I don’t think we caught up. So the more realistic expectation is to be number one in the domestic market, which I don’t think is realized either.
Chang Che: How intense is it at ByteDance to be an LLM researcher in 2025, 2026? How stressful is it? Are you guys meeting in the morning every day, and the leaders go, “We have to beat OpenAI”?
Zhang Chi: From my perspective, the environment or atmosphere is pretty chilled out in one sense. The one catch is that everyone is basically benchmaxxing. The team leaders focus primarily on how the model performs on specific benchmarks, particularly the benchmark a given leader is responsible for. So everyone is under a very — I think — implicit pressure. If you can’t match or reach top performance on the benchmark, that basically means you didn’t do a great job, and that affects your evaluation in the future. No one really pushes you into doing anything, but you sort of have to make sure that you did something useful for the company.
Chang Che: So you’re saying that there were clear benchmarks, and what was success? Success was basically that you were beating Google’s Gemini and OpenAI and Claude on these benchmarks, and failure was that you guys weren’t?
Zhang Chi: Basically. But it’s okay to just catch up — don’t be too far behind.
Chang Che: And within those benchmarks, how was ByteDance doing while you were there?
Zhang Chi: Yeah, I think everyone was doing sort of benchmaxxing, but it doesn’t translate to good performance in real-world use cases.
Chang Che: Right. So you’re saying that benchmaxxing is this idea that you’re kind of playing to the scoreboard rather than actually having a good, capable model overall.
Zhang Chi: Yeah. I think on paper, every big tech company in China has a good model that can sort of catch up with the latest frontier models in the US, but it turns out that from my understanding — from my experience interacting with those models — I don’t think they’re good enough.
Chang Che: Do you have a sense that this kind of benchmaxxing is happening across the board in China?
Zhang Chi: Personally, I don’t have much firsthand experience working in other companies, but I guess, as far as I know from friends working in those companies, everyone is sort of doing the same thing. On one hand, everyone is trying to benchmaxx, but on the other, they should have their own private evaluation set that really tells them how they’re doing. I guess for those frontier models like Claude and Gemini and GPT, one very good thing is that they have rich interactions with users across the globe, and they have some ways to ask their users to tag their answers, and they can use those responses as additional training data and gradually improve their model. So I guess that’s one really big advantage for them.
Chang Che: Can you walk us through what a typical week at ByteDance looks like?
Zhang Chi: From my own experience working at ByteDance, particularly in the math group, there really isn’t much pressure on anyone. So basically we show up at 10:30 — some people show up at 11:00 — and we leave at around 9:00, 9:30. And then we have around two hours of sleep time at noon. So it was roughly nine hours of work, but that also includes dinner time.
Chang Che: You guys have naps? You guys have afternoon naps?
Zhang Chi: Within the company we have naps — a two-hour window. Basically from 12:00 to 2:00 PM, no one will arrange any meetings, so you can do whatever you want: have a nice sleep, or go out for a walk. The schedule also includes dinner — ByteDance offers free lunch and dinner. You basically spend an hour or an hour and a half at dinner, then get back to work at around 7:30. You spend another two hours on work and leave at 9:30, and after 9:30 you get a free ride home. And we don’t really have many meetings — basically one group meeting a week; other meetings you just arrange for yourself. The rest of the time, you work on your own, do really useful things, and basically meet your OKRs or KPIs.
Chang Che: Sometimes I wonder about the differences between ByteDance and, say, a company like DeepSeek. One difference I’ve noticed is that ByteDance is such a commercialized, such a big company that’s actually churning out products for customers. You also mentioned that Seed is not independent of this — the models it pushes out, like Seedance, the video generator, are integrated with a lot of these products. How much does that impact your day-to-day work at ByteDance? These kinds of pressures don’t exist at, for example, DeepSeek, which isn’t really so integrated with consumer-facing products. So: how much of your work involves keeping in mind the kinds of products the large language models will have to integrate with downstream? And is that an annoyance — do you find it a problem, or is it a good thing?
Zhang Chi: I think, except for the math team — for the pre-training team, for the post-training team, they should be fully responsible for the models that they deliver. So for them, it would be great pressure to catch up with the frontier model. So for them, I guess the pressure will be completely different from ours. Those teams need to be responsible for the models that are deployed online — also how the model performs, where the failure cases are, they need to fix them. But for the math team, it was formed initially as a research-oriented team, and the single goal of the team was to reach good performance at IMO last year. After we achieved silver-medal performance in 2025 — that was basically July or August — after that, I started to find that the team was becoming more responsible for delivery. Like, we needed to sort of join the main forces in pushing the model’s performance in math. The team started to carry some responsibility for the model’s performance in specific areas, and we started to do things that are not that research-y. I think from then, things got a little boring and not that interesting.
Chang Che: Who won the gold when you said silver?
Zhang Chi: Gemini won the gold. OpenAI claimed that they won the gold, but it was sort of an unverified thing.
Chang Che: What do you feel explains the difference between your models and Gemini and models like OpenAI’s in these kinds of performances? Is it a scaling thing? Is it something that just takes time? Or is there some kind of ingenuity that you guys don’t know about?
Zhang Chi: I don’t think it was about ingenuity. I think it was more about engineering and scaling. Let me tell you: rumor has it that Google can perform a full round of LLM training, both pre-training and post-training, in three months — one iteration every three months. But ByteDance — we can probably only do one iteration in half a year. So it’s quite some difference. The good thing about Google is that they have many more researchers and cards — basically TPUs. They outsource some of their data-labeling efforts to experts around the globe. And having more TPUs also translates to faster iteration on larger models. So to be fair, I don’t think any Chinese company can catch up with them soon.
Chang Che: So what you’re referring to there is the quality. So when you say that the training takes longer, that presumably has to do with the chips, right — the semiconductors that you guys have at ByteDance? Can you tell us a little bit more about that? Like, what’s the current stack of chips that ByteDance uses? Are they using primarily Nvidia chips, or what is the current stack?
Zhang Chi: We basically use Nvidia chips — the H20, the version made for the Chinese market. That’s definitely legal. And then I knew the company was procuring the B300, something like that — maybe in international markets, but definitely not in mainland China. So that part I don’t have too much to say about, honestly.
Chang Che: Do they have H100s, the sort of premier chips?
Zhang Chi: They definitely have some — I think before the ban, they definitely had some H100s.
Chang Che: Right. And do you know where they are, physically? Like, do they have clusters in different parts of China and abroad?
Zhang Chi: I don’t have any specific information about how many cards, like H100s or H200s, they have and where they are. But ByteDance has many data centers, and most of the data centers — including Alibaba and, I guess, Tencent — these data centers are in Inner Mongolia, or basically northern China, because it’s cold enough.
Chang Che: Because of the difficulty of procuring chips, was there a sense of scarcity — a scarcity mentality — at ByteDance?
Zhang Chi: Yeah, I think the fastest chips are reserved for the most important teams — for example, pre-training and post-training. Other teams, like ours, use the H20. I think the company is trying to buy more chips, but as far as I know, every tech company in China faces this challenge of where to get more chips, including Alibaba and Tencent.
Chang Che: Did you hear about the news that companies are being pressured by the Chinese government to use domestic chips? Is that something that you’re privy to? Is that true?
Zhang Chi: I knew that Seed has these domestic chips, but I don’t think any team looking for faster iteration is actually using them, because it’s really hard to make full use of them, and there are many challenges. When we talk about model deployment, I guess some teams would consider those chips — but definitely not for training.
Chang Che: I see. So basically for the cream-of-the-crop models, you don’t use the domestic chips, but for other use cases and commercializing or whatever, that’s potentially true. But isn’t that what DeepSeek is doing? Isn’t DeepSeek trying to build its next version of its large language model on Huawei chips? That’s what I have heard.
Zhang Chi: Yeah, the news said that they’re going to release their next version of the model later this month, and presumably it’s going to be deployed partly on Huawei chips, but I’m not sure about it.
Chang Che: Do you get pressure from the government to go in a certain direction in AI development? Is there discussion within the company like, “Sorry, that’s kind of the direction we have to take because the government says so”? Do you feel that pressure, or do you feel like it’s pretty insulated and you guys are mostly working on just benchmarks and trying to make the best large language model?
Zhang Chi: We didn’t face that challenge. I think the company is basically a fully commercialized one. Although it’s in China, it definitely receives some pressure from the government, but I guess the primary goal right now is to catch up. The fastest way is to use the most advanced chips.
Chang Che: I think it was a quote — I don’t know who said it — but basically the idea is that so far the Chinese models have been really good at catching up to the American versions; they haven’t really provided many new innovations. Would you agree with that?
Zhang Chi: Yeah, I definitely do. I don’t even agree with the assumption that Chinese models are catching up — I believe we’re still far behind. I guess the gap is getting larger, very sadly.
Chang Che: So say more about that. Why do you think that is?
Zhang Chi: I’ve been using Claude lately, and also Codex — Claude Code, Codex, vibe coding. I find it to be extremely useful. Part of me doesn’t want to take on any PhD students right now — I can just use Claude Code and Codex in their place, tutoring them and giving them instructions on doing what I want. And by doing that, I can save quite a large amount of money. But on the other hand, if I don’t train this new generation of students, who will do the research after I get older? That’s kind of a dilemma. But when I try to use the Chinese models for the jobs I’m interested in, I don’t find them to be really practical — I don’t find them to be useful at all. So it’s sad news.
Chang Che: So you’re saying that the difference is significant in coding, in these kinds of agentic versions of AI that are coming out now.
Zhang Chi: Yeah.
Chang Che: Why do you think that is? Because I remember like a year ago we were saying China was a few months, or maybe six months, behind, but now it seems like with these agentic versions of AI, there seems to be less optimism in the Chinese community that they’ll catch up — or maybe that’s just you. But I have heard some pessimism as well from other people. Why do you think that is?
Zhang Chi: I think it’s primarily because of this healthy loop of user feedback. You use more of their model, and they get more of your feedback, and then train a better model. So it’s like a healthy iteration.
Chang Che: So it’s like a first-mover advantage.
Zhang Chi: But Chinese models started out not as good, so no one really uses them for really important things, so they don’t get good feedback, so they can’t get better data, and the models continue to be not that good. I guess that’s the primary reason. On the other hand, since those companies are really on the frontier, they can spend more resources on getting more useful data. But Chinese companies are basically in a catching-up position, so they don’t really have the time and resources to do other advanced things in building models. They have to catch up first, which is still some distance. So I guess that’s going to be some pressure for them.
Chang Che: I don’t know if you had heard about the Claude Code leak. ByteDance and some companies started to analyze that.
Zhang Chi: Yeah, so actually one of my colleagues was doing a modified version of Claude Code. We called it TongTong — Tong was like the brand of BIGAI, the Beijing Institute for General Artificial Intelligence. So we basically had a modified version of that. But I guess the most important thing is not this framework of Claude Code — the most significant part of it is the API that it really calls. So you can change the API for Qwen or DeepSeek, but I don’t think it’s going to be really useful compared to using Claude’s model or maybe OpenAI’s model.
Chang Che: So you’re saying that it’s not enough that the Claude Code source code was leaked?
Zhang Chi: It was far from enough. People basically call it a harness. I think the most important thing is still the model in the backbone, the foundation model that it calls.
Chang Che: So recently you have decided to leave ByteDance to take a position at Peking University. Was there something about the industry that had disillusioned you, or did you just feel that you wanted to do more research-focused stuff rather than commercial?
Zhang Chi: After spending a year at ByteDance, I gradually realized that LLM engineering in a company isn’t really that interesting. You basically use an off-the-shelf framework for either supervised fine-tuning or reinforcement learning. Once those frameworks were built, what was left to do was basically getting your data clean enough and good enough for the model — there wasn’t really a lot to do in general. So I started to feel that my time was being wasted at the company, because I was not doing many interesting things, and I thought maybe I should transition back to research. So I left, and I joined the university roughly a month ago, and I’ve started to find it much more interesting. I talk to students about new ideas and how we can implement them. In numerous discussions, I find there are so many things we can do to improve the current LLM landscape. The sad thing is that the companies are not investing in those areas yet — but maybe after you get something to work, they would try to buy you out.
Chang Che: What do you mean by that?
Zhang Chi: I mean, after you show some promising results, they might try to hire you. They might try to ask you to implement that system or implement that idea in the company’s internal framework.
Chang Che: So the picture that you were drawing basically is that the Chinese LLM companies, including ByteDance, Alibaba — but mostly let’s focus on ByteDance — they’re so focused on catching up to the leading LLMs that they’re sort of laser-focused on one road, and they’re not really opening their eyes to maybe other potential avenues of approach. Is that a fair characterization?
Zhang Chi: Yeah, I think it’s fair.
Chang Che: So at the risk of getting really technical, what are you doing at Peking University that you feel is innovative and is something that maybe LLMs are missing and can improve on?
Zhang Chi: We are basically doing algorithmic improvements in LLM inference. I think most people in this industry know that it’s not LLM training that costs the most money — it’s LLM inference. Inference can be slow, inference can take a lot of resources, inference needs to handle concurrency, and all of those problems. But what companies are really doing is just adding more cards to solve the problem, which is fast and effective. From my understanding, though, there are many aspects you can improve on the algorithm side. If your algorithm can increase inference efficiency — let’s say not by a lot, maybe five percent — that should save a lot of money for a company.
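Zhang Chi's five-percent figure is easy to make concrete. A back-of-the-envelope sketch in Python, where the annual inference budget is an assumed number for illustration, not a figure from the interview:

```python
# Back-of-the-envelope sketch: all numbers are assumptions for
# illustration, not figures from the interview.
annual_inference_spend = 1_000_000_000  # assumed: $1B/year spent on inference
efficiency_gain = 0.05                  # the ~5% algorithmic improvement mentioned

# A 5% efficiency gain translates directly into 5% of the bill saved.
savings = annual_inference_spend * efficiency_gain
print(f"${savings:,.0f}")  # $50,000,000
```

At fleet scale, even a single-digit-percent algorithmic gain is worth more than most research teams cost, which is the economic case he is making.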
Chang Che: It’s a kind of efficiency. Isn’t this kind of the same argument that DeepSeek had purported to make — that we were able to sort of improve our algorithm in a way that we don’t actually need that many chips to build such a strong, powerful model?
Zhang Chi: At least they showed this perspective in their V3 model and their R1 model. For them, it was basically some improvement in model architecture, but I guess there are other areas where you can do this improvement. And I sort of think that’s the future for LLMs if people really want to commercialize them, because currently chip prices are so high, and companies are basically spending money to buy users and they’re not really profiting from it.
Chang Che: So I think you basically had suggested that you feel like the gap in capabilities between the US and China is actually growing rather than closing. Is that a fair assessment? And do you feel that view is shared by others in the Chinese AI world who are involved in building LLMs, or is that something that you hold?
Zhang Chi: I think students around me and colleagues around me would agree with that. But I guess for those leaders in the recently publicly listed companies like Zhipu or MiniMax, I think leaders in those companies wouldn’t agree with me. I guess they would say that they’re catching up.
Chang Che: So why do you think the gap is widening? I’m sure the chip sanctions is certainly one of them. Do you feel like that’s the driving reason?
Zhang Chi: I think there are many aspects of it, but I think the general answer is that we’re not iterating fast enough, which is definitely tied to chips.
Chang Che: And anything else? Any other factors you feel are pertinent?
Zhang Chi: I don’t think we’re getting high-quality data, and I guess many people in these companies try to do it in a fast way instead of investing in their data pipeline. By “fast,” I can’t be too explicit about how they’re fast, but — Anthropic recently posted that it detected many distillation attempts. I guess that’s one way some companies try to be fast. By doing that, they’re trying to circumvent the challenge of building their own data pipeline, which is a very big problem right now. So I guess that’s one part. The other part is definitely chips. And also, I guess we still lag behind in infra — that includes chips but also training frameworks and all of the basic infrastructure in a company. For example, when I interned at Google, their infra was just so good. You can use a graphical interface for writing code, you don’t really have to care about what the infrastructure looks like, and you just run the code and it’s really smooth. There’s a huge difference between the infrastructure at Google and at ByteDance.
Chang Che: Can you explain a little bit what distillation is?
Zhang Chi: Basically, when you want to have high-quality data, there are two ways. One is that you find a really good data labeler, someone that can write a really great explanation for, let’s say, a math problem. They can write a very long, very detailed, correct explanation, and also how they derive it and how the explanation leads to the answer. The other way, which is simpler, is that you just try to ask Gemini or Claude or GPT, and most of the time, if the question is not hard enough, they will give you the right answer, and you just copy that answer. You put their answer into your training data and try to train the model on those data. So it’s kind of the easy way, because as you can imagine, calling ChatGPT or Claude or Gemini doesn’t really take you a lot of time and money. But if you try to ask the data labeler, it costs some money and it costs some time, and it’ll be much slower in iteration. So I guess a lot of companies in China go the easy way.
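The "easy way" Zhang Chi describes can be sketched as a small pipeline: query a stronger teacher model, keep its answers, and reuse them as supervised training data. This is a minimal illustration, with `ask_teacher` as a stand-in for what would in practice be an API call to a frontier model:

```python
def ask_teacher(question: str) -> str:
    """Stand-in for an API call to a frontier model (Gemini/Claude/GPT).

    Here it just returns canned answers so the sketch is self-contained.
    """
    canned = {
        "What is 2 + 2?": "2 + 2 = 4: adding two and two gives four.",
        "What is 3 * 5?": "3 * 5 = 15: three groups of five total fifteen.",
    }
    return canned.get(question, "I don't know.")

def build_distillation_set(questions):
    """Collect (prompt, completion) pairs for supervised fine-tuning."""
    dataset = []
    for q in questions:
        answer = ask_teacher(q)
        if answer != "I don't know.":  # crude quality filter
            dataset.append({"prompt": q, "completion": answer})
    return dataset

data = build_distillation_set(["What is 2 + 2?", "What is 3 * 5?"])
print(len(data))  # 2
```

The resulting pairs would then feed into ordinary supervised fine-tuning — which is why this route is so much cheaper and faster than hiring expert labelers, and why frontier providers try to detect it.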
Chang Che: I had a question about the idea of being sort of AGI-pilled. How prevalent is the idea that scaling LLMs will lead to some kind of general artificial intelligence or superintelligence? How prevalent is that belief, do you think, among the ByteDance leadership, for example?
Zhang Chi: Speaking of AGI, I don’t think this data-scaling thing will lead us to general artificial intelligence, but I do believe that it’s going to be very useful. I started to think about the difference between Claude Code and engineers. My understanding is that Claude Code, or basically these coding agents, are good enough to build most of the aspects of, let’s say, your app. But what’s most important, what’s left for programmers, is this creativity — that you can really find the demands of people and try to build a useful thing for them. On execution, you can delegate to your coding agents. I’m very concerned about Chinese companies’ positions in these areas. I don’t find them, including ByteDance, to be good enough in any of the areas that I’m interested in — especially their coding agents. But it doesn’t mean that it’s not going to be used by some people, because there are white-collar workers who basically do paperwork. So I guess that could be useful.
Chang Che: What would you say is an advantage, if there is one, of the Chinese ecosystem versus the American?
Zhang Chi: I think China’s advantage is in manufacturing. So we can really build into embodied systems, where the intelligence of language models, I guess, will be really useful. On the other hand, I initially believed that China has lower labor costs, so we can get more data, but it turns out that might not be true, because those American companies can outsource their data-labeling jobs to people around the globe. And I don’t know how they did it, but their data quality seems to be higher than ours — I guess because they built their own data pipelines, while we spent too much time on distillation. We may have the top technique in distillation, but it doesn’t really translate to an advantage one way or another.
Chang Che: We were sort of blurring the lines between large language models, like ChatGPT, and the embodied systems that are currently being developed for robotics. So help us draw that line.
Zhang Chi: Yeah, so currently you hear a lot of news about Chinese embodied AI — basically embodied robots. Especially Unitree, which is going to go public. Most of the companies are doing locomotion; some of them are doing manipulation. Locomotion basically means moving, and manipulation means grasping — hand movement to do a human’s everyday tasks. But they don’t really think about building intelligence into these embodied robots. I think some companies are thinking about it, but I don’t have a clear picture of who is doing what. From my understanding, it’s still in its early stage. I guess gradually, if you want to differentiate yourself from others, sooner or later someone will try to claim that they’ve done it.
Chang Che: My understanding is that robots like Unitree’s — videos of them are everywhere in the US; they’re very sleek, all a little bit silvery and short — actually don’t have a brain, in the sense that they aren’t equipped with an LLM, or what they call a VLA, a vision-language-action model, right? They’re mostly moving — at least when we see them at the Spring Festival, dancing, that’s not really done by a large language model or the kind of conventional AI that we’re talking about. That’s being done by strictly coded, pre-programmed rules. So what you’re saying is that the next step is: how do we imbue these machines with intelligence, with dexterity, so that they know how to pick up a cup in whatever scenario the cup is in? My instinct is that China has an advantage here just because — you mentioned this too — there are so many use cases and environments in which Chinese robots are already quite ubiquitous, so there are a lot of ways of collecting the data for this. Do you think that China has an advantage in embodied intelligence?
Zhang Chi: From my understanding — on one hand, China is really good at manufacturing. So Unitree has this huge advantage in building their motors. And on the other hand, I guess currently Chinese language models are good enough for these simple tasks. They might not specifically involve actions — it’s currently still language models — but I guess it wouldn’t take very long for people to add this action part into their transformers. So that’s not going to be a very hard thing. But I’m a bit more skeptical about the use cases of these humanoid robots. If you want to perform some specific task, it’s not necessary to have a humanoid version of a robot. You can have some other, simpler form of robot that can also do the task — a dog, or something else. But basically, I guess it’s not going to take very long for people to think about that.
Chang Che: So who do you think is leading that? Who’s the DeepSeek — who’s going to have the DeepSeek moment of embodied AI in China in the next few years?
Zhang Chi: I really don’t have an answer, because most people know Unitree, but Unitree is only good at building the motors, which translates to their advantage in locomotion. But I don’t think anyone is very good at manipulation, and we probably need robots that are specifically designed for specific tasks.
Chang Che: But that’s the direction that you’re working in, right, at Peking University?
Zhang Chi: Some researchers at BIGAI, at the institute, are working in that direction.
Chang Che: If 2025 was the DeepSeek moment, what do you think is the lesson of 2026? We’ve just started, it’s April, and agentic AI is huge now. OpenClaw is everywhere in China, and I’d love to hear your thoughts on OpenClaw and that sort of frenzy in China.
Zhang Chi: Yeah, I actually had my own OpenClaw, and to be honest, it was not really useful for me — I don’t think I care too much about it, and I think privacy is still an issue. I’m not really comfortable handing all of my personal information to the agent. But on the other hand, some of my students are more optimistic about it. I can tell this story — one of my students basically uses OpenClaw as a supervisor of their model training process. It tries to debug the program itself if it runs into problems. For example, if there are GPU issues, or the training just stalls, it kills the process, finds the bug, and reruns it. So it was like a magic moment. The group is basically thinking about vibe research: we can just give the agent instructions and ask it to, let’s say, set up a cron job — basically a monitoring job that every hour or every 30 minutes checks the status of your training program. Then we can just go to sleep.
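The monitoring job Zhang describes might look roughly like the sketch below. This is a hypothetical illustration, not the student’s actual setup: the stall heuristic (log file going quiet) and the function names are assumptions, and an agent or cron job would call something like `monitor_tick` on a schedule and act on the result.

```python
import os
import time

def is_stalled(log_path: str, timeout_s: float = 1800) -> bool:
    """Heuristic: the run is stalled if its log hasn't been touched recently."""
    try:
        age = time.time() - os.path.getmtime(log_path)
    except FileNotFoundError:
        return True  # no log file yet also counts as a problem
    return age > timeout_s

def monitor_tick(process_alive: bool, log_path: str) -> str:
    """One periodic check (e.g. run from cron every 30 minutes).

    Returns the action to take: 'restart' if the training process died
    or its log went quiet, otherwise 'ok'.
    """
    if not process_alive or is_stalled(log_path):
        return "restart"
    return "ok"

# Demo: a freshly written log file makes the run look healthy.
with open("demo_train.log", "w") as f:
    f.write("step 100 loss 2.31\n")

print(monitor_tick(True, "demo_train.log"))   # ok
print(monitor_tick(False, "demo_train.log"))  # restart
```

The interesting part in the anecdote is that the agent, not a human, writes and reacts to a check like this — killing the stalled process, reading the traceback, and relaunching the run.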
Chang Che: So is everybody in AI in China now using Claude Code?
Zhang Chi: I don’t think so, because for me, I’m more comfortable using Copilot. I’m coding, and Copilot suggests some edits, but I also use this agent mode.
Chang Che: But they’re using American agent modes. Everybody is using agentic AI from the US, basically.
Zhang Chi: Yes.
Chang Che: So is that going to mean that in the future, whenever ByteDance rolls out another large language model, it’s going to be partially built by Claude Code? Like, at Anthropic, they say they use Claude Code to build some part of Claude. Is the future of Chinese LLMs going to be partly Claude-Code-coded?
Zhang Chi: I think so, because there are international divisions of ByteDance, and they can use Cursor. That basically means it can be built by Claude Code. That’s legal — it’s possible for the international divisions of the company.
Chang Che: Great. Okay. Best of luck, and thank you so much for doing this, Chi. I really appreciate it.
Zhang Chi: Thank you for having me again.