When I set out to get n8n working with Whisper, ChatGPT and WordPress, I had little idea of the complexity that lay ahead. On the face of it, everything seemed fairly straightforward. After all, there should be a node for it, right? To my surprise, nodes for Whisper and ChatGPT were hard to come by. I’ve heard rumblings about recent developments in this direction, but they apparently hadn’t reached my release channel yet. The fast-paced evolution of technology throws us such curveballs, doesn’t it?
My aim was to explore the exciting domain of agentic AI. The idea was not just to have the AI supplying code samples or engaging in a casual back-and-forth. I wanted it to actually ‘do’ something for me. And I must say, the mission was a success. The AI took commands about my interests and needs, and transformed them into a blog post in language that felt natural and less AI-ish.
However, this wasn’t without its fair share of hurdles. n8n’s UI came with some stringent limitations. It gave me no way to send the binary data of the recording and the parameters for the Whisper model together in a single request. On top of this, navigation was a major pain, since the UI was virtually inaccessible to screen readers, which made for a frustratingly counterintuitive experience. With essential functions buried behind unlabeled buttons, it was like finding a needle in a haystack.
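For readers wondering what such a request looks like: a transcription call generally needs the recording and the model parameters together in one multipart/form-data body. Here is a minimal sketch in JavaScript, assuming Node 18 or later (for the built-in `FormData` and `Blob`) and assuming OpenAI's hosted transcription API, which takes a `file` field and a `model` field such as `whisper-1`. The helper name `buildWhisperForm` and the file name are hypothetical, and this is an illustration rather than the exact setup used here.

```javascript
// Hypothetical helper: packs the audio bytes and the Whisper parameters
// into a single multipart/form-data body.
function buildWhisperForm(audioBytes, filename, model = 'whisper-1') {
  const form = new FormData();
  // The recording travels as a file part. The MIME type is an assumption.
  form.append('file', new Blob([audioBytes], { type: 'audio/mpeg' }), filename);
  // Model parameters travel as ordinary text parts in the same body.
  form.append('model', model);
  return form;
}

// Three placeholder bytes stand in for a real recording.
const exampleForm = buildWhisperForm(new Uint8Array([1, 2, 3]), 'memo.mp3');
console.log(exampleForm.get('model'));
console.log(exampleForm.get('file').name);
```

To actually send it, the form would be passed as the `body` of a POST to `https://api.openai.com/v1/audio/transcriptions` with a bearer token in the `Authorization` header. The `Content-Type` header is best left unset so that the multipart boundary gets filled in automatically.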
To counter these obstacles, I did some sleuthing into the JSON for the node I was writing. Once I had figured out its structure, I could upload the JSON directly to the workflow for testing. Again, testing was no cakewalk, given that the test button wasn’t always available. Sometimes I had to run the entire workflow to test a single node, which, as you can imagine, was quite taxing.
Besides, the error-check button wasn’t exposed via ARIA. By the way, ARIA, or Accessible Rich Internet Applications, is a W3C standard that lets a page describe the roles, names and states of its controls, so that screen readers can make sense of HTML elements that lack text alternatives. So there you have it, another bump in the road.
Despite these setbacks, I persisted and was finally able to demonstrate the sheer potential of n8n, proving it to be a truly powerful tool. Mind you, someone with less technical experience might find this a tough nut to crack. For instance, to properly encode the body parameter for Whisper, I had to write my own server. Additionally, I had to write JavaScript to unpack the JSON sent back from ChatGPT, because everything came through packed in the content field. To map those attributes onto the inbound parameters of the next node, I had to build a new JSON object that exposed each of them as an explicit top-level field.
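To give a feel for that unpacking step, here is a minimal JavaScript sketch. It assumes the reply follows the chat completions shape, where the model's text sits at `choices[0].message.content`, and that the model was asked to answer with a JSON object. The function name `liftContentFields` and the field names `title`, `body` and `tags` are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper: parses the JSON string packed into the content
// field and returns a new object with explicit top-level fields.
function liftContentFields(chatResponse) {
  const content = chatResponse?.choices?.[0]?.message?.content;
  if (typeof content !== 'string') {
    throw new Error('No content string found in the response');
  }
  // JSON.parse throws if the model did not return valid JSON.
  const parsed = JSON.parse(content);
  // Name each field explicitly so a later step can map it by key.
  return {
    title: parsed.title ?? '',
    body: parsed.body ?? '',
    tags: Array.isArray(parsed.tags) ? parsed.tags : [],
  };
}

// A made-up response in the assumed shape.
const sampleResponse = {
  choices: [
    {
      message: {
        role: 'assistant',
        content: '{"title":"Hello","body":"First post","tags":["ai"]}',
      },
    },
  ],
};
console.log(liftContentFields(sampleResponse));
```

Inside an n8n Code node, the returned object would typically be wrapped as `[{ json: result }]` so that the next node sees `title`, `body` and `tags` as separate mappable fields.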
In a nutshell, while this journey was fraught with challenges, it was rewarding in that it made me realize the immense potential of agentic AI in action. Do bear in mind, however, that navigating through this terrain requires a solid skillset, along with heaps of patience, resilience, and problem-solving ability.