Yahoo Messenger Archive 1on1
One of the better features of Yahoo Messenger is the archive viewer. Unlike MSN Messenger, Yahoo stores the archive in a structured format. One drawback of the archive viewer is that you have to be online and logged into Yahoo to view your archives. This led me to look at the structure of the Yahoo archive files, which have the “.dat” extension. But before I get into the details of the .dat file structure, let me explain how the archives are arranged and stored.
All the archive files are stored inside the Yahoo!\Messenger\Profiles\<user id> folder, and they are organized by buddy user id. There are 3 main folders here: Messages, Conferences and Mobile Messages. Inside the Messages folder, a subfolder is created for every buddy id with which the user has had an IM (instant messaging) conversation. Each archive file inside such a subfolder has a date stamp as part of its file name, e.g. 20040802-userid.dat.
Now let’s get into the structure of a .dat file. I use a hex viewer to look at the contents; the hex output of an archive file looks something like this:
0000:0000 B6 5B 7B 40 00 00 00 00 01 00 00 00 00 00 00 00
0000:0010 00 00 00 00 B6 5B 7B 40 06 00 00 00 01 00 00 00
0000:0020 03 00 00 00 1E 0C 32 00 00 00 00 D7 66 7B 40 06
0000:0030 00 00 00 00 00 00 00 03 00 00 00 1E 00 17 00 00
0000:0040 00 00 57 69 7B 40 06 00 00 00 00 00 00 00 2C 00
0000:0050 00 00 02 0D 0B 4B 14 30 0A 10 1D 45 1E 04 1D 4B
0000:0060 1A 30 09 10 44 0C 18 11 01 4B 18 2F 14 14 43 16
0000:0070 56 04 0D 08 16 2A 0A 01 44 11 19 01 0F 12 00 00
0000:0080 00 00 64 69 7B 40 06 00 00 00 00 00 00 00 06 00
0000:0090 00 00 44 55 5E 5B 59 7B 00 00 00 00 5E 6B 7B 40
0000:00A0 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
0000:00B0 5E 6B 7B 40 06 00 00 00 01 00 00 00 02 00 00 00
0000:00C0 19 0E 00 00 00 00 5E 6B 7B 40 06 00 00 00 01 00
0000:00D0 00 00 1C 00 00 00 12 0C 0A 4B 0C 7F 10 14 08 0E
0000:00E0 56 11 01 4B 18 2F 14 14 44 17 13 06 0B 05 1C 2B
0000:00F0 08 0C 00 00 00 00 68 6B 7B 40 06 00 00 00 00 00
0000:0100 00 00 19 00 00 00 0F 00 1D 1F 1C 2D 00 14 1D 45
0000:0110 1E 00 4E 08 18 33 08 10 00 45 1B 00 4E 1E 09 00
0000:0120 00 00 00 E6 6B 7B 40 06 00 00 00 00 00 00 00 06
0000:0130 00 00 00 02 0D 0B 19 1C 60 00 00 00 00 C4 6C 7B
The messages in the archive file are not properly encrypted: Yahoo just uses a simple XOR scheme to encode them (so much for security!!). Every .dat file begins with a timestamp, and every message is also preceded by a timestamp. Each archived message starts with a 16-byte header. Take a look at a typical header and message:
0000:0010 00 00 00 00 B6 5B 7B 40 06 00 00 00 01 00 00 00
0000:0020 03 00 00 00 1E 0C 32 00 00 00 00
The header starts at B6 5B 7B 40 and ends at 03 00 00 00.
The first 4 bytes of the header (B6 5B 7B 40) are the timestamp representing the time at which the message was sent or received.
The 5th byte (06 here) is followed by 3 bytes always set to 00. It is 06 for every message in the dump above; its exact purpose isn't covered here.
The 9th byte indicates whether the message was received by the user or sent by the user to the buddy: 01 means the message was received, 00 means it was sent.
This is again followed by 3 reserved bytes always set to 00.
The 13th byte gives the length of the message that follows (03, i.e. 3 bytes, here).
This is again followed by 3 reserved bytes always set to 00.
The message is encoded with XOR, using the user id as the key: each message byte is XORed with the matching character of the user id, starting again from the first character when the message is longer than the id.
In this example the 3-byte message is 1E 0C 32. To get the actual message we need to XOR it back with the user id, which for this message was ‘venky_dude’.
The ASCII codes of the characters of the user id are:
v = 118, e = 101, n = 110, k = 107, y = 121, _ = 95, d = 100, u = 117, d = 100, e = 101
The message bytes in decimal are:
1E = 30, 0C = 12, 32 = 50
XORing each message byte with the corresponding user id character gives:
30 (XOR) 118 = 104
12 (XOR) 101 = 105
50 (XOR) 110 = 92
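This operation is easy to perform in C/C++ with the ^ operator, for example:
#include <stdio.h>
int main(void)
{
    int x = 30;        /* encoded byte 1E */
    int y = 118;       /* key character 'v' */
    int z = x ^ y;     /* 30 XOR 118 = 104 */
    printf("%c\n", z); /* prints h */
    return 0;
}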
Converting these values to characters using the ASCII table gives:
104 – h
105 – i
92 – \
Hence the message that was received was “hi\”.
Finally, each message is followed by 4 bytes always set to 00; in the dump, 00 00 00 00 sits between the message bytes 1E 0C 32 and the next timestamp D7 66 7B 40.
With this layout it’s fairly easy to write a program that reads a .dat file and prints the decoded messages.
If you have any questions about this article, just send me a mail at venkat.mani@gmail.com.