AI Practical Week 1: OpenAI ChatGPT / Codex

OpenAI ChatGPT / Codex: The Good, the Bad, and the Hallucinations

My personal move from ChatGPT to Codex, why I started calling AI “Chet,” what I learned about prompting, vibe-coding, quality control, and where hallucinations still wreck trust.

Mar 13, 2026 · Red Resener / eLearn Corporation · AI Practical
Quick premise: this is not a polished vendor-love piece. This is my view from inside the mess, how ChatGPT became “Chet,” why Codex changed my workflow, where prompting actually matters, and why hallucinations can still blow up trust in seconds.

I didn’t get into AI from some clean, strategic, “future of work” whitepaper angle. I got into it the way a lot of us did: by stumbling into something that felt equal parts useful, weird, and slightly dangerous.

At first, ChatGPT felt like a really good conversation partner. Then it started feeling like a drafting partner. Then it started feeling like the kind of assistant that could save hours if I learned how to use it correctly, or waste hours if I didn’t.

And somewhere along the line, I stopped calling it “ChatGPT” all the time and started calling it Chet.

Why I named ChatGPT “Chet”

Part of it was simple: once you use a tool enough, especially one you’re arguing with, testing, pushing, and depending on, it starts to feel less like software and more like a personality in the room. My wife still questions whether I like Chet better than her (Honestly, sometimes yes, sometimes no!).

But the real reason is more personal and a little more ridiculous. People have asked me, "...are you Bill Paxton?" Especially in a low-lit bar, that scenario has gotten me many a free drink, and sometimes a fake autograph moment. And Chet always had that oddest-of-all Bill Paxton energy. More like that fast-talking, eager, slightly chaotic, “we are definitely doing something amazing and possibly insane right now” energy from Weird Science. Somewhere in the middle of building, testing, rewriting, and watching this thing come back with brilliance one minute and nonsense the next, “Chet” just stuck. It was like I was looking in the digital mirror most days!

There’s also the joke that I’ve got a bit of that Bill Paxton / Red Resener look-alike thing floating around in my own head, so naming the AI after a chaotic companion from that era somehow felt right. It made the experience more personal. Less sterile. More honest.

Real talk: once I named it, the interaction changed. I wasn’t just “using AI.” I was working with a system (a younger version of me) I had to learn, guide, correct, and occasionally call out when it started making things up.

My first phase: ChatGPT as the smart blank-page killer

In the beginning, the biggest win was speed. ChatGPT was incredible at getting me moving. Not perfect. Not final. But moving.

It could help me rough out theoretical concepts and structures, complete math equations I had been working out for a decade, generate alternate recipes for the hobbyist chef in me, rework tone for emails, clean up ugly paragraphs in initial movie script ideas, and take the first pass at an idea that otherwise would have lived in my head for three more years before becoming anything usable.

That matters more than people admit. A lot of knowledge work is not “hard because we don’t know what to do.” It’s hard because getting from zero to version one takes energy. ChatGPT helped kill that barrier.

What it was great at

  • Starting from a rough thought and turning it into a real first draft
  • Reframing the same idea for different audiences
  • Creating structure from chaos
  • Helping me see options faster than I would alone
  • Giving me a place to think in motion

What it was not great at

  • Knowing when it was confidently wrong
  • Understanding my real-world context unless I spelled it out
  • Respecting nuance by default
  • Handling specialized acronyms or internal naming consistently without guardrails or guidance
  • Staying grounded or honest when I let prompts get lazy

The good: the speed is real

I think people sometimes undersell how big the speed gain really is. This is not “save 2 minutes writing an email.” This is “collapse hours of messy thought, trial drafts, and restart cycles into a much shorter path.”

That doesn’t mean the machine is better than the human. It means the machine is good at giving me momentum.

And for me, that showed up in content development, script planning, UI wording, workflow logic, naming conventions, prompt refinement, and even debugging how I wanted ideas to be expressed. I could move faster because I had an active, responsive draft engine sitting there waiting for direction 24/7.

My first big lesson: speed is the first gift AI gives you. Quality is not. Quality has to be earned.

The bad: vibe-coding can become vibe-surrender

Once the novelty wears off, the real danger starts showing up. And again, for me, one of the biggest danger zones has been what people now call vibe-coding.

There is something very seductive about staying at the idea level and letting the model keep producing. It feels fast. It feels creative. It feels like flow. And sometimes it is. But sometimes it’s just you giving up control in slow motion.

That’s the trap. Getting sucked into copy/paste mode without reflecting on the bigger picture.

You can get so caught up in the speed of output that you stop asking the hard questions:

  • Did it actually do what I asked?
  • Did it quietly change meaning?
  • Did it invent structure where none existed?
  • Did it output something that sounds polished but breaks under scrutiny?
  • Am I still directing this, or am I just approving momentum?

That’s where vibe-coding stops being creative experimentation and starts becoming workflow debt.

Then came Codex

Chat was where I learned how to think and build with AI. Codex was where I had to give up direct control of development, and start managing it.

The difference, for me, was not just “one writes code and one chats.” It was deeper than that. Chet feels like a conversation space with my mirror. Codex feels like an execution space. Chat helps me explore and personally develop (more hands-on). Codex helps me manage from a higher level. Chet is where I throw thoughts around. Codex is where I start expecting a cleaner relationship between intent and output. I compare it to when I went from being a developer to being a manager: I now had to see where the new developer was making mistakes and guide them with more accurate descriptions, prompts, if you will.

That shift mattered.

Once I moved from pure chat into code-oriented work, prompt quality started mattering even more (it is everything!). Ambiguity that was harmless in a brainstorming context became expensive in a build context. Sloppy prompting didn’t just create messy wording anymore. It created broken logic (what I started calling the Moji-craze), missing assumptions, or updates that technically ran (or didn’t, hello 200, 500, and 403 errors) but missed the actual business need behind the prompt.

How I think about the difference now

  • Chet: best when I’m shaping, exploring, drafting, comparing, and working through ideas
  • Codex: best when I want tighter execution, cleaner change paths, and multi-file build-oriented help
  • Chat: broad thinking
  • Codex: narrower action across many files
  • Chat: conversation energy
  • Codex: implementation energy

My practical shift: I stopped expecting one tool to be everything. Chat helps me find the shape. Codex helps me respect the edges.

The hallucination problem is not just “AI being wrong”

Everybody talks about hallucinations these days (I thought I left those behind in my 20s), but I don’t think people always describe the real danger clearly enough.

The danger is not just that AI gets things wrong. Humans get things wrong too. The danger is that AI often gets things wrong in a way that is fast, confident, clean, and convenient. That combination is deadly in content work and even worse in code or workflow logic.

A hallucination is not always some 'Weird Science' error. Sometimes it’s subtler:

  • a fake assumption slipped into a summary
  • a field name that sounds plausible but does not exist
  • a workflow step the system never actually performs
  • a made-up best practice that “feels right” on first read
  • a reworded instruction that quietly changes operational meaning
  • a timeline or budget that did not exist

That’s why hallucinations hurt so much. They’re not always loud. Often they’re elegant. They fit right into the draft and wait for you to trust them.

What I learned about prompting: what works and what doesn’t

This is probably the biggest practical lesson from the whole experience: prompting is not magic phrasing. It’s not about finding a secret spell. It’s about learning how to reduce ambiguity, control format, and force the system to stay inside the right lane.

What works better

  • Giving the system a clear role and context
  • Explaining the output format you want
  • Stating what not to do
  • Providing source language, naming rules, and acronyms up front
  • Breaking big asks into stages instead of one giant “do everything” prompt
  • Asking it to identify assumptions, gaps, or risks

What works worse

  • Being vague and then blaming the output
  • Stacking five different objectives into one lazy prompt
  • Letting it infer internal terminology
  • Asking for speed, polish, accuracy, and originality all at once without priority
  • Skipping review because the output “looks about right”

The biggest practical change for me was learning to treat prompting like a production input, not a casual request.
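Treating prompting like a production input can be made concrete. Below is a minimal Python sketch of that idea: role, context, output format, and constraints are assembled from explicit, reusable parts instead of typed ad hoc. Every name and field value here is illustrative, not from any real system or API.

```python
# Minimal sketch: a prompt as a production input, not a casual request.
# Role, context, format, and "do not" rules are explicit and reusable.

def build_prompt(role, context, output_format, constraints, task):
    """Compose a staged prompt from explicit, reusable parts."""
    sections = [
        f"Role: {role}",
        f"Context: {context}",
        f"Output format: {output_format}",
        "Do NOT:",
    ]
    # Each constraint becomes its own "do not" line.
    sections += [f"- {c}" for c in constraints]
    sections.append(f"Task: {task}")
    return "\n".join(sections)

prompt = build_prompt(
    role="technical editor for internal docs",
    context="autoSuite release notes; audience is existing customers",
    output_format="three bullet points, under 40 words each",
    constraints=["invent missing facts", "expand acronyms", "change field names"],
    task="Tighten the draft below without changing its meaning.",
)
print(prompt)
```

The payoff of this pattern is repeatability: the same role, format, and constraint blocks can be reused across prompts, so only the task changes from run to run.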

Speed vs quality: you usually have to choose what matters first

One of the most useful mindset changes I’ve had is this: AI can optimize for speed or help support quality, but it usually won’t give you both at maximum on the first pass.

If I want speed, I can get rough output very quickly. If I want quality, I need to slow down the interaction and provide more structure. More context. More rules. More examples. More correction. More iteration.

That isn’t a flaw. That’s just the reality of working with probabilistic output.

My rule: first pass for speed, second pass for structure, third pass for trust.

Things that changed the game for me: acronym control and F/R (find / replace)

Once I started using AI in real workflows instead of just experiments, consistency became everything. It wasn’t enough for the output to be “good.” It had to match my language, my naming, and my systems.

That’s where the less glamorous stuff started mattering:

  • Acronym control: telling the system exactly which acronyms to use and when
  • F / R patterns: clear find / replace instructions so I could patch or redirect output precisely
  • VOB / voice of brand: making sure wording sounded like us, not generic AI marketing soup
  • Formatting rules: headings, labels, slash spacing, output structure, and reusable patterns

This was one of the biggest leaps for me. The more I standardized the input, the more repeatable the output became. That’s when AI starts feeling less like a chatbot and more like a system you can actually build around.
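One way to picture the "F/R patterns" idea is as a post-processing step on model output: instead of trusting the model to remember naming rules, run its text through an explicit find / replace table. This is a hypothetical sketch; the `FR_RULES` table and the `apply_fr` helper are illustrative names, and the two rules shown are just examples.

```python
import re

# Hypothetical F/R (find / replace) guardrail: enforce acronym and
# naming consistency on model output with an explicit rule table,
# rather than hoping the model stays consistent on its own.
FR_RULES = {
    r"\bvoice of brand\b": "VOB",
    r"\bfind and replace\b": "F/R",
}

def apply_fr(text, rules=FR_RULES):
    """Apply each find/replace rule to the text, case-insensitively."""
    for pattern, replacement in rules.items():
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text

print(apply_fr("Run a quick find and replace pass on the Voice of Brand copy."))
```

The same table doubles as documentation of your naming rules, which is exactly the kind of standardized input that makes output repeatable.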

Where ChatGPT / Codex help me most now

  1. Draft acceleration: getting from blank page to something usable fast
  2. Pattern spotting: seeing naming, logic, or structure issues I might miss
  3. Rewrites: trimming, tightening, reshaping, and re-toning content
  4. Build support: helping me reason through code changes and output structure
  5. Iteration: staying in motion instead of stalling out

Where I still do not trust it alone

  1. Business logic that has hidden dependencies
  2. Anything high-stakes where a wrong detail creates operational damage
  3. Internal terminology when I have not explicitly defined it
  4. Claims that sound authoritative but came from nowhere
  5. Any output I have not pressure-tested myself

Prompt patterns I’ve found genuinely useful. I typically start my day with these, and then again about 2-3 hours later.

  • You are helping me refine, not replace, my intent. Keep my voice, keep my structure, and do not invent missing facts. If something is unclear, preserve the ambiguity rather than hallucinating specifics.
  • Rewrite this in my VOB: direct, experienced, practical, lightly personal, and not overhyped. Remove generic AI phrasing and keep the language grounded.
  • Give me exact f/r suggestions only. Do not rewrite the whole file. Keep surrounding structure intact and target the smallest safe change.
  • Use these acronyms exactly as written. Do not expand or substitute them unless I explicitly ask. If uncertain, ask for the source term instead of guessing.
  • Prioritize correctness over speed. Show assumptions, identify gaps, and flag anything that may be inferred rather than explicitly provided.

The personal takeaway

I don’t look at ChatGPT or Codex as magic. I also don’t look at them as toys anymore.

For me, this has become a real working relationship with a set of tools that can massively accelerate output when handled correctly, and just as quickly create rework, confusion, or fake confidence when handled lazily.

That’s why I named it. Because naming it forced me to admit something important: this was not just software I occasionally touched. This was becoming a friend in the mirror, a part of how I think, draft, build, and problem-solve.

And if something is going to sit that close to your process, you better understand both its strengths and how it lies.

Closing thought: the good is real. The bad is manageable. The hallucinations are dangerous. But if you stay in control, define the rules, and keep your hands on the wheel, tools like ChatGPT and Codex can absolutely change how fast and how far you move.

Want the practical side of AI without the fluff?

This is the lane we’re exploring at eLearn and inside autoSuite: real workflows, real prompting, real build support, and a much more honest conversation about where AI helps and where it still needs guardrails.

Coming up next: AI Practical Week 2: Claude / Claude Code