Investigate double booking in RIA scheduler

Summary

The online scheduler allowed a double booking on inspector Martin (RIA): two bookings 3 minutes apart on the same inspector.

  • 4249 Melinda Ln at 9:26 AM
  • 9665 Pine Thicket at 9:29 AM

Google Chat: https://chat.google.com/room/AAQAfgRDZA4/cH-RJkl9QtA/cH-RJkl9QtA?cls=10

Root cause (investigation)

The booking pipeline has no atomic "is this slot still free?" check. Availability is enforced only when a customer fetches /schedule/optimal-slots, and that endpoint is cached for 5 minutes. POST /inspection/book-job writes a new inspection with no overlap query, no DB transaction, no advisory/Redis lock, and no unique index. Two customers (or the same customer retrying) holding a stale slot list will both succeed.

1. No conflict check at commit time

router.post('/book-job', ...) in attik-backend/src/routes/inspection.ts (lines 622-657) validates body fields and Attik permission, then calls createInspection. Inside attik-backend/src/util/functions/inspection/createInspection.ts (lines 295-362) a new Inspection is built and .save() is called with no overlap query against existing Inspection/Event rows for the chosen _inspectorId over [datetime, endtime).

A search for overlap|conflict|alreadyBooked|isAvailable|checkAvailab inside createInspection.ts returns no matches. There is also no transaction, no advisory lock, no Redis lock, and no unique index across (_companyId, _inspectorId, time-range).

2. Slot endpoint is cached 5 minutes (in-memory, per pod)

attik-backend/src/routes/schedule.ts line 117:

router.get('/optimal-slots', cache(300), async (req, res) => {

attik-backend/src/util/cache/routeCache.ts keys the cache by req.originalUrl + res.locals.company._id with no invalidation API. Two customers issuing the same query (same serviceId, similar coords, same startDate, same company) get the same cache key and the same cached slot list, even after one has already booked. The 3 minutes between the two bookings is well within this 5-minute TTL.
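To make the staleness window concrete, here is a minimal in-memory sketch of a TTL cache keyed the way described above. It is not the code in routeCache.ts: getSlots, the store, the query string, and the slot values are all hypothetical, and the clock is passed in explicitly so the timeline is easy to follow.

```typescript
// Hypothetical stand-in for the route cache: key = originalUrl + companyId, fixed TTL.
const TTL_MS = 300 * 1000; // cache(300) => 5 minutes

type CacheEntry = { body: string[]; expiresAt: number };
const store = new Map<string, CacheEntry>();

function cacheKey(originalUrl: string, companyId: string): string {
  return originalUrl + companyId;
}

// Returns the cached slot list while it is unexpired, otherwise recomputes and stores it.
function getSlots(key: string, nowMs: number, compute: () => string[]): string[] {
  const hit = store.get(key);
  if (hit && hit.expiresAt > nowMs) return hit.body;
  const body = compute();
  store.set(key, { body, expiresAt: nowMs + TTL_MS });
  return body;
}

// Ground truth that the cache is never told about (made-up slots).
const freeSlots = new Set<string>(['09:00', '10:00']);
const compute = (): string[] => Array.from(freeSlots).sort();

// Made-up query; both customers send the same URL for the same company.
const url = '/schedule/optimal-slots?serviceId=svc-1&startDate=2024-01-01';
const key = cacheKey(url, 'company-1');

const seenByA = getSlots(key, 0, compute);              // t = 0: MISS, list is computed
freeSlots.delete('09:00');                              // customer A books 09:00
const seenByB = getSlots(key, 3 * 60 * 1000, compute);  // t = 3 min: HIT, stale list
const afterTtl = getSlots(key, 5 * 60 * 1000, compute); // t = 5 min: expired, recomputed

console.log(seenByA, seenByB, afterTtl);
```

Customer B still sees 09:00 at the 3-minute mark because nothing on the write path touches the cache; only TTL expiry refreshes the list.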

3. The "working hold" is not a lock

SchedulerContext calls manageWorkingHold to POST /event with type: 'Block'. The POST /event route in attik-backend/src/routes/event.ts accepts the Block without overlap validation. Two simultaneous customers can both create a Block on the same inspector/time, and neither sees the other's because of the slot cache.

Suggested fix

Pick one (or combine):

  • A. Server-side overlap check inside a transaction in createInspection.ts immediately before inspection.save(): query Inspection and Event for any overlap on the chosen inspector, return 409 on conflict. Run inside a Mongo session/transaction to avoid TOCTOU between the check and the insert.
  • B. Distributed lock via Redis (e.g. redlock) keyed by ${companyId}:${inspectorId}:${slot-bucket} around the create path.
  • C. Unique partial index on (_companyId, _inspectorId, datetime) excluding cancelled, as a backstop so a second insert simply fails.
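The check in option A could look roughly like the sketch below. It is an in-memory illustration only: the Booking shape, tryBook, and the sample times are hypothetical and do not come from createInspection.ts. Intervals are treated as half-open [datetime, endtime), so back-to-back bookings are allowed.

```typescript
// Hypothetical, simplified shape; the real Inspection/Event documents have more fields.
interface Booking {
  inspectorId: string;
  start: Date; // corresponds to datetime
  end: Date;   // corresponds to endtime
  cancelled: boolean;
}

type BookResult =
  | { status: 201 }
  | { status: 409; conflictsWith: Booking };

// Half-open intervals [start, end) overlap iff each one starts before the other ends.
function overlaps(a: Booking, b: Booking): boolean {
  return a.start.getTime() < b.end.getTime() && b.start.getTime() < a.end.getTime();
}

// Check-then-insert. Here both steps run synchronously in one process; in the real
// service they would need the session/transaction (or a lock) to stay atomic.
function tryBook(existing: Booking[], candidate: Booking): BookResult {
  const conflict = existing.find(
    (b) =>
      !b.cancelled &&
      b.inspectorId === candidate.inspectorId &&
      overlaps(b, candidate),
  );
  if (conflict) return { status: 409, conflictsWith: conflict };
  existing.push(candidate);
  return { status: 201 };
}

// Made-up sample times, all on the same inspector.
const at = (hhmm: string): Date => new Date(`2024-01-01T${hhmm}:00Z`);
const booked: Booking[] = [];

const first = tryBook(booked, { inspectorId: 'insp-1', start: at('09:00'), end: at('11:00'), cancelled: false });
const second = tryBook(booked, { inspectorId: 'insp-1', start: at('10:00'), end: at('12:00'), cancelled: false });
const third = tryBook(booked, { inspectorId: 'insp-1', start: at('11:00'), end: at('13:00'), cancelled: false });

console.log(first.status, second.status, third.status); // 201 409 201
```

The same predicate translates to a query of the form "existing.datetime < candidate.endtime AND existing.endtime > candidate.datetime" when run against the database.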

Recommendation: A + C. A gives a friendly 409 and the right error path; C is cheap insurance.
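A rough sketch of what the index in C might look like, plus a small helper that mimics which documents would collide. The isCancelled field, the index name, and uniqueKeyOf are assumptions for illustration; the real schema may mark cancellation differently. MongoDB partial filter expressions do not accept $ne, so "excluding cancelled" has to be written as a positive condition.

```typescript
// Keys come from the proposal above; everything in indexOptions is an assumption.
const indexKeys = { _companyId: 1, _inspectorId: 1, datetime: 1 } as const;

const indexOptions = {
  unique: true,
  name: 'uniq_company_inspector_datetime_active', // hypothetical name
  // Assumes a boolean isCancelled field; only non-cancelled documents are indexed.
  partialFilterExpression: { isCancelled: false },
};
// Would be passed as: collection.createIndex(indexKeys, indexOptions)

interface InspectionDoc {
  _companyId: string;
  _inspectorId: string;
  datetime: Date;
  isCancelled: boolean;
}

// Mimics the index: documents outside the filter get no key, others collide on equal keys.
function uniqueKeyOf(doc: InspectionDoc): string | null {
  if (doc.isCancelled) return null;
  return [doc._companyId, doc._inspectorId, doc.datetime.toISOString()].join('|');
}

// Made-up sample documents.
const base = { _companyId: 'company-1', _inspectorId: 'insp-1', isCancelled: false };
const a: InspectionDoc = { ...base, datetime: new Date('2024-01-01T09:00:00Z') };
const b: InspectionDoc = { ...base, datetime: new Date('2024-01-01T09:00:00Z') };
const c: InspectionDoc = { ...base, datetime: new Date('2024-01-01T09:30:00Z') };
const d: InspectionDoc = { ...a, isCancelled: true };

console.log(uniqueKeyOf(a) === uniqueKeyOf(b)); // true: second insert would be rejected
console.log(uniqueKeyOf(a) === uniqueKeyOf(c)); // false: different start time is accepted
console.log(uniqueKeyOf(d));                    // null: cancelled docs are not indexed
```

Because the key uses the exact datetime, this index only rejects two active bookings with an identical start time. A booking that starts 30 minutes into an existing one still gets through, which is why C works as a backstop alongside A rather than as a replacement for it.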

Also consider:

  • Drop or invalidate the 5-minute cache on /schedule/optimal-slots. If kept, add a per-company version stamp in the cache key, bumped on inspection/event writes.
  • Reject overlapping Blocks in POST /event so the working-hold becomes a real reservation.
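The version-stamp idea from the first bullet can be sketched as follows. The function names and key format are hypothetical, and the version counter is kept in a process-local Map purely for illustration; since the current cache is in-memory per pod, a real implementation would presumably need the counter in shared storage so that a write handled by one pod invalidates the others.

```typescript
// Hypothetical per-company version counter (process-local here for illustration).
const companyVersion = new Map<string, number>();

function currentVersion(companyId: string): number {
  return companyVersion.get(companyId) ?? 0;
}

// Call on every inspection/event write for the company.
function bumpVersion(companyId: string): number {
  const next = currentVersion(companyId) + 1;
  companyVersion.set(companyId, next);
  return next;
}

// Same inputs as the existing key, plus the version stamp.
function versionedCacheKey(originalUrl: string, companyId: string): string {
  return `${originalUrl}|${companyId}|v${currentVersion(companyId)}`;
}

// Made-up query string and company ids.
const slotsUrl = '/schedule/optimal-slots?serviceId=svc-1';

const keyBefore = versionedCacheKey(slotsUrl, 'company-1');
const otherBefore = versionedCacheKey(slotsUrl, 'company-2');

bumpVersion('company-1'); // a booking was written for company-1

const keyAfter = versionedCacheKey(slotsUrl, 'company-1');
const otherAfter = versionedCacheKey(slotsUrl, 'company-2');

console.log(keyBefore !== keyAfter);     // true: old entry is no longer reachable
console.log(otherBefore === otherAfter); // true: other companies keep their cache
```

Old entries are never deleted explicitly; they simply stop being looked up and age out through the existing TTL.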

Investigation steps

  • Pull both inspections from Mongo and confirm same _inspectorId, overlapping [datetime, endtime), both jobCreationType = 'attik', scheduledTs ~3 min apart.
  • Check Express access logs around the booking time for POST /inspection/book-job count and /schedule/optimal-slots cache HIT/MISS.
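For the first step, a query filter and a small check along these lines may help. The field names are the ones listed above; the assumption that datetime, endtime, and scheduledTs are stored as Dates, the helper names, and all sample values are hypothetical.

```typescript
// Hypothetical subset of the fields pulled from Mongo (assumed to be stored as Dates).
interface PulledInspection {
  _inspectorId: string;
  datetime: Date;
  endtime: Date;
  jobCreationType: string;
  scheduledTs: Date;
}

// Filter for every inspection on one inspector that overlaps [from, to).
function overlapFilter(inspectorId: string, from: Date, to: Date) {
  return {
    _inspectorId: inspectorId,
    datetime: { $lt: to },
    endtime: { $gt: from },
  };
}

const CACHE_TTL_MS = 300 * 1000; // matches cache(300)

// True when two documents fit the stale-cache explanation: same inspector,
// overlapping times, both booked online, and created within one cache TTL.
function fitsStaleCacheHypothesis(a: PulledInspection, b: PulledInspection): boolean {
  const sameInspector = a._inspectorId === b._inspectorId;
  const overlapping =
    a.datetime.getTime() < b.endtime.getTime() &&
    b.datetime.getTime() < a.endtime.getTime();
  const bothOnline = a.jobCreationType === 'attik' && b.jobCreationType === 'attik';
  const gapMs = Math.abs(a.scheduledTs.getTime() - b.scheduledTs.getTime());
  return sameInspector && overlapping && bothOnline && gapMs < CACHE_TTL_MS;
}

// Made-up sample documents, not the real bookings.
const docA: PulledInspection = {
  _inspectorId: 'insp-1',
  datetime: new Date('2024-01-02T15:00:00Z'),
  endtime: new Date('2024-01-02T17:00:00Z'),
  jobCreationType: 'attik',
  scheduledTs: new Date('2024-01-01T14:26:00Z'),
};
const docB: PulledInspection = {
  ...docA,
  datetime: new Date('2024-01-02T16:00:00Z'),
  endtime: new Date('2024-01-02T18:00:00Z'),
  scheduledTs: new Date('2024-01-01T14:29:00Z'),
};

const filter = overlapFilter('insp-1', docA.datetime, docA.endtime);
console.log(JSON.stringify(filter));
console.log(fitsStaleCacheHypothesis(docA, docB)); // true
```

If the real pair does not pass this check (for example the scheduledTs gap is longer than 5 minutes), the cache is probably not the whole story and the working-hold path deserves a closer look.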

Status: Planned
Board: Main App
Date: 26 days ago
Author: Linear
