Python Hijinks: Outlook Safelink *Decoder*
published-date: 21 Jan 2023 23:12 +0700
categories: python-hijinks
tags: python
Synopsis
There were times when my colleague and I were asked to regularly copy links out of Outlook 365 mail and paste them into a certain form. But the URLs we copied were always too long, and the form we were given had a character limit. For some (read: good) reason, these links always redirect through Microsoft Defender for Office 365, which checks links for malicious content.
Here’s a sample of what the link looks like when copied directly from Outlook.
https://your.company.windows.defender.domain.com/?url=https%3A//en.wikipedia.org/wiki/Sergeant_Reckless&data=05%7C01%7Cmy.mail%40domain.com%7C46575da2930c2aa97e550b27b1090595%7Cd0dd9bc880d23c27c6e5821a10edc04e%7C1%7C0%7C844353527029278340%7CUnknown%7Cpegytgkototbwiuhiqkbodlblaacaxulmzsqdwdtgekwfuableebpvucgfhdobzygxpohqqcare%3D%7C3000%7C%7C%7C&sdata=iXZKYIVxymHZqRIuefqinvbeXKHv%2BnRxdXMrYexvfRh%3D&reserved=0
Note: all obfuscated data in this link was manually randomized. There’s no use in decoding anything out of it.
It’s pretty obvious that our target link is embedded, with its special characters percent-escaped, in the url query parameter. On top of that, the query goes through the Defender system alongside an additional payload, presumably information about the recipient and the mail the link was copied from. idk honestly.
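To make the escaping concrete, here is a tiny sketch that round-trips the url value from the sample link with urllib.parse. The variable names are mine, just for illustration.

```python
from urllib import parse

# The escaped value of the `url` parameter, copied from the sample link
escaped = "https%3A//en.wikipedia.org/wiki/Sergeant_Reckless"

# %3A is the percent-encoded form of ":"
original = parse.unquote(escaped)
print(original)
# https://en.wikipedia.org/wiki/Sergeant_Reckless

# Going the other way: quote() leaves "/" alone by default, so only ":" is escaped
print(parse.quote(original))
# https%3A//en.wikipedia.org/wiki/Sergeant_Reckless
```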
Knowing it’s there, I did some quick googling and found an online tool to extract the og link (site: http://www.o365atp.com/, which looks dodgy asf but I tested it and it’s safe 🤞). I decided to write myself a little Python script just in case the site goes offline (it’s just a Python implementation of the website).
The Script Tidbits
You can get the full version in my github repo: Outlook Safelink Decoder, which uses urllib.parse in Python. Although the title itself is a bit misleading, because it doesn’t do any decoding at all.
This is a snippet of the only functional part of the script.
    from urllib import parse

    def safelinkDecode(url):
        # Grab the query string (everything after the '?')
        query = parse.urlparse(url).query
        if not query:
            raise ValueError(f'tried to parse {url}: \n\tno valid query string in the given url')
        # Split each 'key=value' pair on the first '=' only,
        # so a stray '=' inside a value doesn't break the unpacking
        queryfragment = [i.split('=', 1) for i in query.split('&')]
        qkeys, qvals = tuple(zip(*queryfragment))
        # Unquote (percent-decode) every value
        qvals = map(parse.unquote, qvals)
        return dict(zip(qkeys, qvals))
In summary, urllib.parse.urlparse is used to get the query string out of the URL. Because the special characters in it are escaped (quoted), urllib.parse.unquote is used to unquote the quoted quote (try saying that out loud). The other stuff is just a fancy way to unpack the pairs, map the unquote function over the list of values, and wrap the result back into a dictionary.
Passing the previous link into this function gives this result.
    {
        "url": "https://en.wikipedia.org/wiki/Sergeant_Reckless",
        "data": "05|01|my.mail@domain.com|46575da2930c2aa97e550b27b1090595|d0dd9bc880d23c27c6e5821a10edc04e|1|0|844353527029278340|Unknown|pegytgkototbwiuhiqkbodlblaacaxulmzsqdwdtgekwfuableebpvucgfhdobzygxpohqqcare=|3000|||",
        "sdata": "iXZKYIVxymHZqRIuefqinvbeXKHv+nRxdXMrYexvfRh=",
        "reserved": "0"
    }
Aaand presto! The original link can now be accessed from the url key.
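For comparison, the standard library can do the split-and-unquote in one go with urllib.parse.parse_qs. Below is a minimal sketch of that route, not what the script in the repo does: safelink_decode_qs is a name I made up here, and the link is a shortened version of the sample.

```python
from urllib import parse

def safelink_decode_qs(url):
    """Hypothetical alternative to safelinkDecode, built on parse_qs."""
    query = parse.urlparse(url).query
    if not query:
        raise ValueError(f"no query string in {url}")
    # parse_qs returns {key: [value, ...]}; keep the first value of each key
    return {key: values[0] for key, values in parse.parse_qs(query).items()}

# Shortened, made-up link in the same shape as the sample
link = (
    "https://your.company.windows.defender.domain.com/"
    "?url=https%3A//en.wikipedia.org/wiki/Sergeant_Reckless"
    "&data=05%7C01%7Cmy.mail%40domain.com&reserved=0"
)

decoded = safelink_decode_qs(link)
print(decoded["url"])
# https://en.wikipedia.org/wiki/Sergeant_Reckless
```

Two caveats if you go this way: parse_qs turns a literal + in a value into a space, and it drops parameters with blank values unless you pass keep_blank_values=True. The hand-rolled split plus unquote does neither, which may be what you want for the payload fields.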
P.S. It seems the additional payload sent to Defender uses | as a delimiter and includes two 32-character strings (I first guessed base64, but in this sample they only use hexadecimal characters), an email address in plain text, and another 75 characters of lowercase mumbo jumbo.
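If you want to poke at the payload yourself, splitting the decoded data value on | is enough to eyeball the fields. A quick sketch using the (randomized) data string from the result above; what each field actually means is still anyone's guess.

```python
# The decoded `data` value from the result above (randomized, so meaningless)
data = (
    "05|01|my.mail@domain.com|46575da2930c2aa97e550b27b1090595|"
    "d0dd9bc880d23c27c6e5821a10edc04e|1|0|844353527029278340|Unknown|"
    "pegytgkototbwiuhiqkbodlblaacaxulmzsqdwdtgekwfuableebpvucgfhdobzygxpohqqcare=|3000|||"
)

fields = data.split("|")

# Print position, length, and content of every field
for index, field in enumerate(fields):
    print(index, len(field), repr(field))
```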