First 40 PCs, then 100, and then 140 machines crashing, spreading fast. Then I stepped in…
The First 40
Around 10 AM, the Help Desk escalated to my Teams channel:
“40+ computers completely locked down—users can’t launch anything, not even Settings or Task Manager.”
My colleagues had been troubleshooting for 3.5 hours. When I heard the problem had spread to over 100 users with no progress, I jumped on the call. Microsoft Tier 2/3 support had just finished their investigation and concluded that the problem “Didn’t seem to be any primary Microsoft services.” They’d had access to a few partially affected machines and had tried everything: adjusting Group Policy, restarting services, updating Endpoint Central policies again. Nothing worked.
Microsoft support ruled out AppLocker since the configuration hadn’t changed in months. But that didn’t make sense, because every on-screen popup error pointed to AppLocker until the machine eventually blue-screened.
We found a few more partially affected users and I jumped on one of those machines with a help desk rep. The behavior looked like AppLocker was blocking everything except whitelisted apps, despite no config changes to AppLocker. BUT Endpoint Central’s Browser Restriction feature had been changed the night before. I restarted the Endpoint Central service, then restarted the computer, hoping to clear the issue. When it came back up, it was completely bricked: nothing would run, and I couldn’t even remote back in.
The scope was alarming: 140+ machines affected, including most of our sales team. Each hour cost roughly $15,000 in lost sales.
The Breakthrough
Something felt wrong about dismissing AppLocker. I asked the Microsoft support engineer:
“Could this be an AppLocker conflict with another application control mechanism?”
He paused.
“Maybe… it just doesn’t make sense for AppLocker since the config looks fine and hasn’t changed.”
I’d noticed in the logs that the only affected machines appeared to be the ones where the new “Browser Restriction” feature had been deployed via our third-party MDM tool, Endpoint Central (EPC). EPC Browser Restriction was supposed to complement AppLocker. Instead, the two systems appeared to be conflicting, both trying to control application execution and blocking everything in the overlap.
I asked: “What registry key disables AppLocker enforcement temporarily so we can test?”
He provided it. After much pushback, I finally got my team to agree to test it, and my supervisor volunteered his own computer since it was affected as well. Over a video chat from his cell phone, he booted the computer into Safe Mode and I guided him through the necessary registry changes. He restarted, and after about 2 minutes EVERYTHING worked perfectly on his PC.
Less than 2 hours after joining the call, I had a solution to a problem that had stumped the experts and several of my colleagues.
The Resolution
We documented the fix and trained the Help Desk team, who spent the next few days recovering all 140 machines. What I had done in under two hours made it possible to rescue every device without reimaging. Reimaging would have cost $20,000+ in Help Desk labor alone, taken well over 2 weeks, and cost the company at least $500,000 in lost sales.
The team learned critical lessons about phased rollouts and comprehensive testing (my colleague had only tested 3 machines for 6 hours the night before).
Technical Explanation
The problematic machines had extra drivers left over from previous Endpoint Central (EPC) MDM connections. These drivers attempted to enforce EPC application control, including EPC Browser Restriction, before AppLocker policies loaded.
EPC appeared to use AppLocker’s AppID service under the hood to determine app hashes. The Browser Restriction rule that EPC generated was formatted incorrectly and was applying its browser-specific restrictions to all services system-wide.
The loop:
- Endpoint Central asks Windows: “What app is running, what is the appID and who made it?”
- Windows looks at the AppID service and replies: “explorer.exe with appID abc123 made by Microsoft”
- Endpoint Central: “If it’s not by Google or Firefox, kill it now”
- Windows: “OK…”
This repeated until the Endpoint Central Agent itself crashed and couldn’t receive updates to disable the policy.
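To make the loop above concrete, here is a toy Python model of a publisher rule that has lost its "browsers only" scope. It is only a sketch of the behavior as I understood it, not Endpoint Central's actual implementation: the `Process` type, the rule functions, the browser list, and the "Mozilla" publisher label are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    """Hypothetical stand-in for what the AppID lookup reports about a process."""
    name: str
    publisher: str


# Assumed values for illustration only; the real rule contents are not known.
ALLOWED_BROWSER_PUBLISHERS = {"Google", "Mozilla"}
BROWSER_NAMES = {"chrome.exe", "firefox.exe", "msedge.exe"}


def intended_rule(proc: Process) -> str:
    """Block only browsers that are not from an allowed publisher."""
    if proc.name not in BROWSER_NAMES:
        return "allow"  # the rule should not apply to non-browsers at all
    return "allow" if proc.publisher in ALLOWED_BROWSER_PUBLISHERS else "block"


def malformed_rule(proc: Process) -> str:
    """Same publisher check, but the 'is this a browser?' scope is missing."""
    return "allow" if proc.publisher in ALLOWED_BROWSER_PUBLISHERS else "block"


def evaluate(rule, procs):
    """Apply one rule to every process and collect the verdicts by name."""
    return {p.name: rule(p) for p in procs}


processes = [
    Process("explorer.exe", "Microsoft"),
    Process("taskmgr.exe", "Microsoft"),
    Process("msedge.exe", "Microsoft"),
    Process("chrome.exe", "Google"),
]

intended = evaluate(intended_rule, processes)
malformed = evaluate(malformed_rule, processes)

for name in intended:
    print(f"{name:14} intended={intended[name]:5} malformed={malformed[name]}")
```

With the scope check missing, explorer.exe and taskmgr.exe get the same verdict as an unapproved browser, which matches what we saw on the affected machines: nothing outside the allowed publishers would launch.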
Our Saving Grace
The AppID service appeared to be shared by both AppLocker and Endpoint Central. When we disabled AppID (effectively disabling AppLocker), Endpoint Central stopped receiving responses:
- Endpoint Central: “What’s the ID of the app you launched?”
- Windows tries to look at the AppID service and then says: “What apps? What AppID?”
- Endpoint Central: “No AppID? Ok, then never mind. Then there is no app to block.”
So EPC stopped issuing kernel-level kill commands.
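The exchange above amounts to a fail-open check: if the identity lookup returns nothing, there is nothing to evaluate, so nothing gets blocked. Below is a minimal Python sketch of that idea. It assumes EPC behaved this way, which is my inference from what we observed rather than anything documented, and every function name and value in it is hypothetical.

```python
from typing import Callable, Optional

# Assumed allow-list for illustration; the real rule contents are not known.
ALLOWED_PUBLISHERS = frozenset({"Google", "Mozilla"})


def appid_lookup_running(name: str) -> Optional[dict]:
    """Hypothetical lookup while the AppID service is running."""
    return {"name": name, "publisher": "Microsoft"}


def appid_lookup_disabled(name: str) -> Optional[dict]:
    """Hypothetical lookup once the AppID service is disabled: no answer."""
    return None


def enforce(name: str, lookup: Callable[[str], Optional[dict]]) -> str:
    """Decide whether a launch is allowed, failing open when identity is missing."""
    identity = lookup(name)
    if identity is None:
        # No AppID response means no app to match against the rule.
        return "allow"
    return "allow" if identity["publisher"] in ALLOWED_PUBLISHERS else "block"


print(enforce("explorer.exe", appid_lookup_running))
print(enforce("explorer.exe", appid_lookup_disabled))
```

The first call models the broken state, where explorer.exe is identified and then blocked. The second models the state after we disabled AppID, where the same launch goes through because the rule never gets an identity to match.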
Final Details
A lot more went into figuring this out than I covered here. For the sake of brevity I left out details such as:
- The Session1_Initialization_Failed BSOD (STOP code 0x0000006D)
- How the BSOD was tied to the process creation callback PsSetCreateProcessNotifyRoutineEx
- How I linked it to Endpoint Central’s system drivers like MEARWFltDriver.sys
Then there was pushback on trying my solution: pushback about disabling AppLocker, about editing the registry, and about Endpoint Central being even remotely involved. Ultimately I was able to convince everyone that disabling AppLocker / AppID would be temporary and would get the Sales team back online in the shortest amount of time.