How Close is AI to Acting As a Drupal Developer?
Last December, I wrote about AI as a tool for Drupal development – a concept that, for many, sounds like a huge leap forward, and it is. But it’s just the beginning. We’re not far from AI acting as an actual Drupal developer.
AI tools capable of doing developer-like work already exist in various forms, among them GPT-4 and Google Bard. Tools such as these can analyze code, enabling conversations in which errors and concerns are discussed and solutions are recommended. The result is often a significant increase in efficiency.
AI can also be leveraged as a development tool to generate code from scratch, seek suggested revisions, receive helpful feedback, and a lot more. At some point, AI may drive the entire Drupal web development process.
We’re not there yet, and here are five hurdles that need to be cleared.
Current Limitations to AI-Driven Web Development
1. There are limits on the amount of input AI models can handle without having to compress the conversation.
Code creation involves a lengthy back-and-forth conversation between the developer and the AI-powered Large Language Model (LLM). When the maximum input size supported by an LLM is reached, the conversation must be automatically compressed so that it can continue.
During a lengthy code-creation conversation in tools such as ChatGPT and Bard, developers are not alerted when the maximum conversation size has been reached. Instead, the conversation up to that point is automatically compressed and the process moves forward. This results in the AI "forgetting" details, which can reduce the quality of its feedback.
While the compression algorithm used by various LLM apps may not be known, we know that it basically takes chunks of content from the conversation and reduces them to a manageable size. In many cases, this might not be a problem. There is, however, the potential for details to be lost in the process – particularly in an activity such as code creation, where there's no room for ambiguity.
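The vendors' actual compression algorithms are not public, but the chunk-and-reduce behavior described above can be sketched. In this minimal, hypothetical illustration, `summarize()` is a stand-in for a real summarization call, and the token counting is deliberately crude; nothing here reflects any specific tool's implementation:

```python
MAX_TOKENS = 4096  # hypothetical context limit

def rough_token_count(text):
    # Crude approximation: roughly 4 characters per token.
    return len(text) // 4

def summarize(messages):
    # Stand-in for an LLM summarization call. Note that detail is
    # irreversibly lost here: only the first 40 characters survive.
    return "Summary of earlier conversation: " + " | ".join(
        m["content"][:40] for m in messages
    )

def compress(history, max_tokens=MAX_TOKENS):
    # Fold the oldest messages into a summary until the whole
    # conversation fits under the token budget again.
    total = sum(rough_token_count(m["content"]) for m in history)
    while total > max_tokens and len(history) > 2:
        oldest, history = history[:2], history[2:]
        summary = {"role": "system", "content": summarize(oldest)}
        history = [summary] + history
        total = sum(rough_token_count(m["content"]) for m in history)
    return history
```

Running this over a long conversation shows exactly the failure mode from the essay example: the compressed history still "works," but the specifics of the earliest messages are gone.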
To illustrate this limitation, here's a simple example created with GPT-3.5-Turbo that asks for essays about food from each country:
User: Please generate essays on foods for each country of the world. When I say continue, please generate an essay on the next country until you have generated essays for all countries. It is okay if you can't fit the whole essay in a single response; I will say continue and then you can proceed with the current essay. Please order the countries alphabetically and then start with the first country.
GPT: Let's start our culinary journey around the world with Afghanistan.
Afghanistan, a land of rich cultural heritage and breathtaking landscapes, offers a cuisine that reflects its history, geography, and diverse influences. Afghan cuisine is a fusion of flavors from various regions, including Central Asia, the Middle East, and South Asia. It's characterized by hearty dishes, aromatic spices, and a focus on fresh ingredients.
One of the most iconic dishes of Afghan cuisine is Kabuli Pulao...
After generating additional essays in the same conversation up to "China", I asked GPT to remember the first dish it listed in the first country's essay, but it was not able to remember that far back and gave an incorrect response:
User: What was the first dish in the first country listed?
GPT: The first dish listed in the essay for the first country, which was Burkina Faso, is Tô, a thick porridge made from millet or sorghum flour, cooked with water and stirred until it reaches a smooth consistency.
One approach to avoiding the potential loss of detail due to compression is to start a new conversation. The challenge, however, is that the new conversation might not accommodate all relevant information without also requiring compression.
Another approach is to ask one-off questions in separate conversations; if the AI does not need the full context from the main conversation, asking those questions elsewhere helps avoid excess clutter, thus reducing the amount of compression needed to maintain the conversation.
In the near future, it's likely that new versions of GPT and other LLMs will be able to manage larger inputs. As longer, non-compressed conversations become possible, we will move from AI assisting with code snippets to entire files and then, ultimately, to analysis and modification of entire repos, with AIs generating pull requests.
Input limitations are key among the reasons why AI is not yet positioned to step into the role of a Drupal developer. And there are others.
2. AI Models are prone to inaccurate guesses.
One facet of leveraging LLMs for development is their tendency to "hallucinate": producing output that seems plausible but is factually wrong.
I’ve had this happen when asking GPT-4 to generate a snippet of code: it produced code that would be valid if certain functions and methods existed in Drupal. But they don’t. When I asked why it made up an invalid solution, it first apologized and then, when pressed, admitted it did not know and that it cannot "introspect." This makes sense, as LLMs simply predict the next most likely word. This behavior illustrates the benefit of techniques such as chain-of-thought prompting, where the LLM is asked to talk through its process first rather than immediately output a solution.
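In practice, chain-of-thought prompting is just a change in how the request is phrased. Here is a hypothetical sketch of the difference; the wording is illustrative, not a guaranteed fix for hallucinations:

```python
def build_direct_prompt(task):
    # Asks for code immediately; the model may invent Drupal APIs.
    return f"Write Drupal 10 code that does the following: {task}"

def build_cot_prompt(task):
    # Asks the model to talk through its process before writing code,
    # giving it a chance to notice functions that don't exist.
    return (
        f"Task: {task}\n"
        "Before writing any code, list the Drupal 10 hooks, services, "
        "and functions you plan to use and explain why each exists. "
        "Then, and only then, write the code."
    )
```

The second prompt costs more tokens, but forcing the model to enumerate the APIs it intends to call often surfaces an invented function before it gets baked into a plausible-looking snippet.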
Likely what’s needed to reduce the frequency of hallucinations is a bigger model trained on more data. GPT-4 is bigger than GPT-3 and seems to be less prone to hallucinations. When GPT-4.5 or GPT-5 is released in the next few months, the occurrence of hallucinations is likely to decrease further.
3. Solutions might not match the current version of Drupal.
Another issue that I’ve encountered with GPT-4 is that solutions are sometimes suggested that would have been appropriate for Drupal 7 or Drupal 8, when I’m doing Drupal 10 work.
This kind of error could be the result of the target Drupal version, mentioned at the beginning of a conversation, being excluded due to compression, or it could be another form of hallucination. To mitigate this, I'll periodically remind the AI that we are talking about Drupal 10 as the conversation progresses.
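That periodic reminder can even be automated. The sketch below uses a hypothetical message-list format (the roles and structure are assumptions, not any specific vendor's API) to inject a version reminder after every few user turns, so the target version survives any compression of older context:

```python
def with_version_reminder(history, every=5,
                          reminder="Reminder: we are working with Drupal 10."):
    # Insert a system-style reminder after every `every` user turns so
    # the target Drupal version keeps reappearing in recent context.
    out = []
    user_turns = 0
    for msg in history:
        out.append(msg)
        if msg["role"] == "user":
            user_turns += 1
            if user_turns % every == 0:
                out.append({"role": "system", "content": reminder})
    return out
```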
Sometimes, a suboptimal solution can be corrected by pointing it out to the AI. The AI might then apologize and generate an alternate version of that code snippet.
4. Judgment might simply be bad.
The fourth risk is bad judgment. The AI model might offer solutions that work, but are not necessarily good solutions that follow best practices. When a developer is leveraging AI as a tool for Drupal development, bad advice is something that can be sifted through and discounted. The picture is very different when you are looking to AI to function as a developer, because it might create code that technically works, but is it future-proof or as scalable as code written by an expert human developer?
On one level, the potential for bad judgment can be explained by the fact that LLMs are loosely modeled on the human brain, which also has the potential for mistakes and bad judgment. That said, GPT-4 shows improvements in judgment over GPT-3.5, and there is every reason to expect that judgment will continue to improve with subsequent versions.
5. And of course, security.
Currently, the most prominent AI solutions are hosted remotely by huge companies: OpenAI, Google, Microsoft, and Amazon. As such, there are inherent questions about the security of software developed by AI – particularly if sensitive data is at stake.
There is no guarantee that future AI models will not be trained on sensitive data that hackers might seek to access. At this point, developers need to ask: “If the work I’m doing were shared publicly, would that be okay?”
Given how useful AI tools are, it's worth finding ways to mitigate the security risks: for example, sanitizing or encoding sensitive data before submitting it to the AI and then decoding its response. Solutions are already being created to play this role.
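A minimal sketch of that sanitize-then-decode pattern follows. Real solutions handle automatic detection of sensitive values and secure key management; this hypothetical round trip just shows the basic idea of swapping secrets for placeholders before the text leaves your machine:

```python
def sanitize(text, secrets):
    # Replace each known secret with a placeholder before sending the
    # text to a remote model; keep a mapping to reverse it later.
    mapping = {}
    for i, secret in enumerate(secrets):
        placeholder = f"__SECRET_{i}__"
        mapping[placeholder] = secret
        text = text.replace(secret, placeholder)
    return text, mapping

def restore(text, mapping):
    # Swap the placeholders back into the model's response.
    for placeholder, secret in mapping.items():
        text = text.replace(placeholder, secret)
    return text
```

The remote model only ever sees the placeholders, and as long as it echoes them back intact, the response can be rehydrated locally.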
Fortunately, for those of us who are Drupal developers, our work is often open source. When I’m leveraging AI to write an open source Drupal module, I’m not worried that others might get access to the code. In fact, that might actually be a good thing, since future LLMs could suggest that module as a solution.
Imagine a dynamic similar to current Search Engine Optimization (SEO) strategies used to help a website come up higher in search rankings. This kind of thinking is still hypothetical, but in the future, a secondary motivation for integrating AI tools into a website might be to ensure future LLM models have information about the website baked into their training so they can suggest it to users.
What’s Next? Future Possibilities for AI Web Development
Now that we’ve covered what I view as the key reasons why we are not there yet, let’s look at how GPT-5 or GPT-6 might act as a developer.
The first limitation mentioned above is the input size. It's very likely that input capacity will only increase, possibly to the point that an AI will be able to hold an entire website “in its mind.” Vast new possibilities would emerge at that point.
Let’s compare an AI-driven world to current development workflows. Current processes for a Drupal support request can be summed up as follows:
- A client requests to make a change to their website.
- A developer provides a level-of-effort estimate in hours, and a Jira ticket is created.
- The project manager calculates costs and coordinates with the client for approval to proceed.
- The project manager schedules the work with the developer.
- The development team completes the work and forwards it to QA.
- The client reviews the work and either requests revisions or okays it for deployment.
- The work is deployed.
- Post-deploy QA testing is done.
- The client confirms the work was deployed correctly.
In the future, as tools get smarter and can handle bigger inputs with better judgment, all of these steps could be handled by AI.
Imagine a Jira ticket with a "chat widget" containing dialogue between an "AI project manager," fine-tuned for managing the project and coordinating the work, an "AI developer," and an "AI QA specialist." They would collaborate like humans would, but much faster.
I've experimented with this kind of interaction, manually simulating a conversation between multiple AIs by copying and pasting prompts from one AI's conversation to another, with each AI given unique information at the beginning of its conversation. Current AI capabilities could actually handle all of the above steps, but the quality and judgment would not be there, and the required troubleshooting and oversight would outweigh any efficiencies.
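That manual copy-and-paste relay can be sketched as a small loop. In this hypothetical illustration, each "agent" is seeded with its own role prompt, and the `respond` callables stand in for real LLM calls:

```python
class Agent:
    def __init__(self, role_prompt, respond):
        # Each agent starts with only its own role-specific context,
        # mirroring the unique setup info given to each conversation.
        self.history = [{"role": "system", "content": role_prompt}]
        self.respond = respond  # stand-in for a real LLM call

    def receive(self, message):
        self.history.append({"role": "user", "content": message})
        reply = self.respond(message)
        self.history.append({"role": "assistant", "content": reply})
        return reply

def relay(agent_a, agent_b, opening, turns=3):
    # Copy each reply into the other agent's conversation, like the
    # manual copy-and-paste workflow, but automatic.
    message, transcript = opening, []
    for _ in range(turns):
        message = agent_a.receive(message)
        transcript.append(("A", message))
        message = agent_b.receive(message)
        transcript.append(("B", message))
    return transcript
```

With real models plugged into `respond`, this is essentially an AI project manager and AI developer talking to each other; today, as noted above, the judgment in those replies is the weak link, not the plumbing.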
Ultimately, this kind of technological capability may raise the level of abstraction so human stakeholders can focus more on the big picture and less on micromanaging smaller details. It will still be possible to "zoom in" on details and refine them, but as AI tools improve, the probability that they will complete tasks correctly the first time will improve as well.
Transitioning to AI-Driven Drupal Development
At this point, an interesting next step in the transition from human developers to AI-driven efficiencies would be incremental automation of each step in the process, until we reach the point where we're confident in AI's judgment and capacity.
Here's how this might look:
- A client submits a request.
- The AI reads it and applies judgment concerning the level of complexity and how to best proceed.
- A human developer would then review the AI's recommendations to determine whether the approach is correct.
It's a good idea to start experimenting with AI-driven development workflows now, even though the technology isn't there yet. By prototyping this kind of activity, performance can be evaluated across a range of criteria as new LLM models are released, and the following questions come into focus:
- How much code can the LLM properly understand at once?
- How often does it hallucinate?
- How often does it give bad advice or have bad judgment?
- How quickly does it generate output?
- How well does it follow directions?
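Those questions can be turned into a lightweight, repeatable harness that gets re-run against each new model release. A hypothetical sketch, where `model_fn` is any callable wrapping an LLM and each check pairs a prompt with a judging function:

```python
import time

def evaluate(model_fn, checks):
    # Run each (name, prompt, judge) check against the model, recording
    # whether the judge accepted the answer and how long it took.
    results = []
    for name, prompt, judge in checks:
        start = time.perf_counter()
        answer = model_fn(prompt)
        elapsed = time.perf_counter() - start
        results.append({
            "check": name,
            "passed": judge(answer),
            "seconds": round(elapsed, 3),
        })
    return results
```

A real suite would include checks per question above: a long-context recall probe, a hallucination probe asking about a function that doesn't exist, a judgment probe comparing the answer against a known best practice, and so on, with the pass rates tracked release over release.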
The transition to AI-driven web development will be a snowball effect. We're now at the point of using AI to make work more efficient. Next steps will be toward the creation of more automation, as we move toward systems that are self-maintaining and self-upgrading.
Moving forward, remotely hosted AI models might not be the only option available. Considering that it wasn't long ago that a room full of mainframes was needed to accomplish what can now be done on a cell phone or watch, it’s realistic to envision advancements in AI technology where self-hosted models are within reach for organizations.
Rather than relying on OpenAI, for example, and having to trust that your data will not be hijacked, it might be just a matter of time before individual companies can run their own data centers and spin up their own LLMs that they can train, customize, and secure.
Security levels of different AI systems might be audited and serve as a competitive differentiator.
Will AI Take Jobs from Web Developers?
Ultimately, AI-driven efficiencies will take on much of the work that web developers are currently responsible for. That’s the downside.
The alternative to becoming obsolete is to lean into AI as Promet is doing. Moving to AI-driven development will be a paradigm shift similar to moving from print media to the Internet; the companies and individuals that embrace AI will prosper.
Interested in continuing the conversation about how AI may impact Drupal development, as well as how it can be leveraged to make your website better, right now? Let’s talk.