Here I am reporting poor procedure in Glide support, based on today’s incident. It is not known exactly when the problem with connections to external services occurred, but the first report about it came at about 6 am CET. It took no less than 4 hours for Glide support to open the ticket.
The ticket was completely mislabeled as “performance delays”, which is why it was graded as a “Minor incident”. In fact, it was an outage for all apps using external connections (PDFMonkey and AI services were reported).
In my discussion with Glide’s AI “support”, it gave completely wrong responses. This is because the AI service in question is not instructed properly.
Please do not respond to this in the wrong way described before (such as “Works as designed!”), but accept it without much ado as a “bug” in the support SOP and fix it ASAP. That will lead to faster recognition and resolution of bugs, which are understandable; I am not blaming Glide for them, we all make mistakes!
If you need hints for improving support procedures, please contact me via inbox.
Now I have seen the report. It is fully correct and demonstrates the great competence of the Glide engineering team.
By contrast, this report pushes my concerns about the competence of the Glide support team into even more doubtful territory! After such a major infrastructure change:
the related glitch was not detected for hours
there is no automatic check of service availability
instead, users have to report problems
after the first such reports, it took no less than 4 hours to create the ticket
the ticket created was completely wrong, i.e. “performance delays” as a minor incident
the responses from AI “support” were completely inadequate (“The problem is in your update quota!”)
only after hours was the incident grading raised to major
All these facts demonstrate bad SOPs in Glide support. So, put the focus on the incident itself aside and concentrate on improving support procedures! It will make the resolution of future incidents more effective!
I don’t understand what your point is HERE. Maybe I wasn’t clear enough: incidents happen, this is inevitable. People are doing their best to fix them. But THIS very post has nothing to do with the incident itself. It deals with poor Glide support. If I stress this issue and all the answers try to explain the engineering issue, I get the impression that nobody takes the support problem seriously. It even seems that people are intentionally diverting attention from support to the incident itself.
This is a kind of evidence that Glide doesn’t follow best practices in support procedures in general. Every report or request I have made was processed primarily by following the rule “Works as designed!”. Someone even brought a new approach: “Design decision!”. When such a “design decision” is wrong, it is a BUG in the design. This applies to the matter of this post as well: someone decided to instruct the AI to respond like this: “The problem is not on Glide’s side, the problem is on the user’s side!”. This is a “support design decision”, and someone will surely answer me: “Design decision, this is the Holy Bible for us, do not dare to question it!”.
In his post, Ryan discusses the technical aspect and the communication aspect of the incident. One could consider support as being included in communication, but also perhaps not.
If you want to draw attention to your comments about Glide support specifically, you could comment below Ryan’s post and link back to this topic here with your comments.
Maybe we should try to avoid confusion: my post is about poor Glide support, of which this incident (unavailability of external connections on Oct 14th) was just one instance.
You commented on this post with Ryan’s explanation of the technical background. His explanation is fully valid and I have no comment on it. Your comment here, on the other hand, I find fully off-topic. Why? Because the subject of my post is SUPPORT and not ENGINEERING or BUG FIXING (which is the substance of Ryan’s explanation).
So, I see no need to address anything to Ryan, because I have no comment on his explanation. If Glide wants to improve their support SOPs, they have enough input here to understand the problem. I categorized my post as a BUG, and they should react accordingly. They can do that by themselves or they can ask for external expertise.
If you think my comments are stirring confusion, my apologies, that’s not at all my intent. In fact I find your comments regarding Glide support during the incident extremely important and I hope they will be acknowledged.
I suggested you comment under Glide’s announcement because that is the official announcement regarding the incident. The topic is called “incident analysis”. In it Ryan broaches two topics: engineering and communication. I think support could have deserved a mention. Ryan’s focus at Glide is engineering, so his natural focus during a technical incident is … engineering. Yet he still mentions communication, which is part of the response to a tech incident. I think support is part of this response too, but perhaps he didn’t think of it, or it is out of his scope, or maybe it’s implied, or maybe, as you believe, support is a different topic altogether.
I do agree with you that support is a separate topic from engineering. I just think in the context of this tech incident, the two are related. The official announcement will be closely monitored by Glide for comments. Hopefully your comments will get the consideration they deserve, despite not appearing in the thread of the official announcement.
Rest assured, we will be reviewing both technical and process-related gaps (which include Support, observability, communication, etc…) as part of this incident’s post-mortem.
"What We’re Doing to Prevent This from Happening Again
We will be actively pursuing the following corrective measures as a result of this incident:
Implementing more rigorous planning and team awareness for high-risk systems work.
Improving our monitoring and alerting to more clearly identify the extent and severity of any serious degradations to the user experience so we can more proactively communicate system status to our customers.
Revamping our incident response protocol to include more clearly defined roles and to better account for public awareness and progress reporting of any incident."