Can ChatGPT help me at the office? We put the AI chatbot to the test

If ChatGPT, the buzzy new chatbot from OpenAI, wrote this story, it would say:

“As companies look to streamline their operations and increase productivity, many are turning to artificial intelligence tools like ChatGPT to assist their employees in completing tasks. But can workers truly rely on these AI programs to take on more and more responsibilities, or will they ultimately fall short of expectations?”

Not great, but not bad, right?

Workers are experimenting with ChatGPT for tasks like writing emails, producing code or even completing a year-end review. The bot uses data from the internet, books and Wikipedia to produce conversational responses. But the technology isn’t perfect. Our tests found that it sometimes offers responses that potentially include plagiarism, contradict themselves, are factually incorrect or have grammatical errors, to name a few problems – all of which could cause trouble at work.

ChatGPT is basically a predictive-text system, similar to but better than those built into text-messaging apps on your phone, says Jacob Andreas, assistant professor at MIT’s Computer Science and Artificial Intelligence Laboratory, who studies natural language processing. While that often produces responses that sound good, the content may have some problems, he said.

“If you look at some of these really long ChatGPT-generated essays, it’s very easy to see places where it contradicts itself,” he said. “When you ask it to generate code, it’s mostly correct, but often there are bugs.”

We wanted to know how well ChatGPT could handle everyday office tasks. Here’s what we found after tests in five categories.

Responding to messages

We prompted ChatGPT to respond to several different types of inbound messages.

In most cases, the AI produced relatively suitable responses, though most were wordy. For example, when responding to a colleague on Slack asking how my day is going, it was repetitious: “@[Colleague], Thanks for asking! My day is going well, thanks for inquiring.”

The bot often left phrases in brackets when it wasn’t sure what or who it was referring to. It also assumed details that weren’t included in the prompt, which led to some factually incorrect statements about my job.

In one case, it said it couldn’t complete the task, saying it doesn’t “have the ability to receive emails and respond to them”. But when prompted by a more generic request, it produced a response.

Surprisingly, ChatGPT was able to generate sarcasm when prompted to respond to a colleague asking if Big Tech is doing a good job.

Idea generation

One way people are using generative AI is to come up with new ideas. But experts warn that people should be cautious if they use ChatGPT for this at work.

“We don’t understand the extent to which it’s just plagiarising,” Andreas said.

The possibility of plagiarism was clear when we prompted ChatGPT to develop story ideas on my beat. One pitch, in particular, was for a story idea and angle that I had already covered. Though it’s unclear whether the chatbot was pulling from my previous stories, others like it or just generating an idea based on other data on the internet, the fact remained: the idea was not new.

“It’s good at sounding human-like, but the actual content and ideas tend to be well known,” said Hatim Rahman, an assistant professor at Northwestern University’s Kellogg School of Management, who studies artificial intelligence’s impact on work. “They’re not novel insights.”

Another idea was outdated, exploring a story that would be factually incorrect today. ChatGPT says it has “limited knowledge” of anything after the year 2021.

Providing more details in the prompt led to more focused ideas. However, when I asked ChatGPT to write some “quirky” or “fun” headlines, the results were cringeworthy and some nonsensical.

Navigating tough conversations

Ever have a co-worker who speaks too loudly while you’re trying to work? Maybe your boss hosts too many meetings, cutting into your focus time?

We tested ChatGPT to see if it could help navigate sticky workplace situations like these. For the most part, ChatGPT produced suitable responses that could serve as great starting points for workers. However, they were often a little wordy and formulaic, and in one case completely contradictory.

“These models don’t understand anything,” Rahman said. “The underlying tech looks at statistical correlations…So it’s going to give you formulaic responses.”

A lay-off memo that it produced could easily stand up to, and in some cases do better than, notices companies have sent out in recent years. Unprompted, the bot cited “current economic climate and the impact of the pandemic” as reasons for the lay-offs and communicated that the company understood “how difficult this news may be for everyone”. It suggested laid-off workers would have support and resources and, as prompted, motivated the team by saying they would “come out of this stronger”.

In handling tough conversations with colleagues, the bot greeted them, gently addressed the issue, softened the delivery by saying “I understand” the person’s intention and ended the note with a request for feedback or further discussion.

But in one case, when asked to tell a colleague to lower his voice on phone calls, it completely misunderstood the prompt.

Team communications

We also tested whether ChatGPT could generate team updates if we fed it key points that needed to be communicated.

Our initial tests once again produced suitable answers, though they were formulaic and somewhat monotone. However, when we specified an “excited” tone, the wording became more casual and included exclamation marks. But each memo sounded very similar even after changing the prompt.

“It’s both the structure of the sentence, but more so the connection of the ideas,” Rahman said. “It’s very logical and formulaic…it resembles a high school essay.”

As before, it made assumptions when it lacked the necessary information. This became problematic when it didn’t know which pronouns to use for my colleague – an error that could signal to colleagues that either I didn’t write the memo or that I don’t know my team members very well.

Self-assessment reports

Writing self-assessment reports at the end of the year can cause dread and anxiety for some, resulting in a review that sells them short.

Feeding ChatGPT clear accomplishments, including key data points, led to a rave review of myself. The first attempt was problematic, as the initial prompt asked for a self-assessment for “Danielle Abril” rather than for “me”. This led to a third-person review that sounded like it came from Sesame Street’s Elmo.

Switching the prompt to ask for a review for “me” and “my” accomplishments led to complimentary phrases like “I consistently demonstrated a strong ability”, “I am always willing to go the extra mile”, “I have been an asset to the team”, and “I am proud of the contributions I have made”. It also included a nod to the future: “I am confident that I will continue to make valuable contributions.”

Some of the highlights were a bit generic, but overall, it was a beaming review that might serve as a good rubric. The bot produced similar results when asked to write cover letters. However, ChatGPT did have one major flub: it incorrectly assumed my job title.

Takeaways

So was ChatGPT helpful for common work tasks?

It helped, but sometimes its errors caused more work than doing the task manually.

ChatGPT served as a great starting point in most cases, providing helpful verbiage and initial ideas. But it also produced responses with errors, factually incorrect information, excess words, plagiarism and miscommunication.

“I can see it being useful…but only insofar as the user is willing to check the output,” Andreas said. “It’s not good enough to let it off the rails and send emails to your colleagues.”

The Washington Post