Venky's World

     

Yahoo Messenger Archive 1on1

 

One of the better features of Yahoo Messenger is the archive viewer. Unlike MSN Messenger, Yahoo stores the archive in a structured format. One drawback of the archive viewer is that you have to be online and logged into Yahoo to view your archives. This led me to look at the structure of the Yahoo archive files, which have the “.dat” extension. But before I get into the details of the .dat file, let me explain how the archives are arranged and stored.

 

All the archive files are stored inside the Yahoo!\Messenger\Profiles\user id folder. Archives are organized by buddy user ids. There are 3 main folders in here, for Messages, Conferences and Mobile Messages. Inside the Messages folder, a subfolder is created for every buddy id with which the user has had an IM (instant messaging) conversation. Each archive file inside this subfolder has a date stamp as part of the filename, e.g. 20040802-userid.dat.
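
To make the naming scheme concrete, here is a minimal sketch that splits a filename such as 20040802-userid.dat into its date stamp and user id. The function name parse_archive_name and its parameters are made up for this illustration, and it only checks the shape of the name, not whether the month and day are valid.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: splits a name such as "20040802-userid.dat" into
   its date stamp and user id. Returns 1 on success, 0 if the name does
   not have the form YYYYMMDD-<id>.dat or the id does not fit in user_id. */
int parse_archive_name(const char *name, int *year, int *month, int *day,
                       char *user_id, size_t user_id_size)
{
    assert(name && year && month && day && user_id && user_id_size > 0);

    size_t len = strlen(name);
    /* 8 date digits, '-', at least one id character, then ".dat" */
    if (len < 8 + 1 + 1 + 4 || name[8] != '-' ||
        strcmp(name + len - 4, ".dat") != 0)
        return 0;
    for (size_t i = 0; i < 8; i++)
        if (!isdigit((unsigned char)name[i]))
            return 0;

    size_t id_len = len - 9 - 4;   /* characters between '-' and ".dat" */
    if (id_len >= user_id_size)
        return 0;                  /* caller's buffer is too small */

    if (sscanf(name, "%4d%2d%2d", year, month, day) != 3)
        return 0;
    memcpy(user_id, name + 9, id_len);
    user_id[id_len] = '\0';
    return 1;
}
```

For 20040802-userid.dat this fills in year 2004, month 8, day 2 and the id userid.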

 

Now let’s get into the structure of a .dat file. I use a hex viewer to view the contents of a file; the hex output of an archive file looks something like this:

 

0000:0000  B6 5B 7B 40 00 00 00 00 01 00 00 00 00 00 00 00
0000:0010  00 00 00 00 B6 5B 7B 40 06 00 00 00 01 00 00 00
0000:0020  03 00 00 00 1E 0C 32 00 00 00 00 D7 66 7B 40 06
0000:0030  00 00 00 00 00 00 00 03 00 00 00 1E 00 17 00 00
0000:0040  00 00 57 69 7B 40 06 00 00 00 00 00 00 00 2C 00
0000:0050  00 00 02 0D 0B 4B 14 30 0A 10 1D 45 1E 04 1D 4B
0000:0060  1A 30 09 10 44 0C 18 11 01 4B 18 2F 14 14 43 16
0000:0070  56 04 0D 08 16 2A 0A 01 44 11 19 01 0F 12 00 00
0000:0080  00 00 64 69 7B 40 06 00 00 00 00 00 00 00 06 00
0000:0090  00 00 44 55 5E 5B 59 7B 00 00 00 00 5E 6B 7B 40
0000:00A0  00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
0000:00B0  5E 6B 7B 40 06 00 00 00 01 00 00 00 02 00 00 00
0000:00C0  19 0E 00 00 00 00 5E 6B 7B 40 06 00 00 00 01 00
0000:00D0  00 00 1C 00 00 00 12 0C 0A 4B 0C 7F 10 14 08 0E
0000:00E0  56 11 01 4B 18 2F 14 14 44 17 13 06 0B 05 1C 2B
0000:00F0  08 0C 00 00 00 00 68 6B 7B 40 06 00 00 00 00 00
0000:0100  00 00 19 00 00 00 0F 00 1D 1F 1C 2D 00 14 1D 45
0000:0110  1E 00 4E 08 18 33 08 10 00 45 1B 00 4E 1E 09 00
0000:0120  00 00 00 E6 6B 7B 40 06 00 00 00 00 00 00 00 06
0000:0130  00 00 00 02 0D 0B 19 1C 60 00 00 00 00 C4 6C 7B

 

The messages in the archive file are not encrypted; Yahoo uses a simple XOR algorithm to encode them (so much for security!!). Every .dat file begins with a timestamp, and every message is also preceded by a timestamp. Every archived message has a 16-byte header at the beginning. Take a look at a typical header and message:

 

0000:0010  00 00 00 00 B6 5B 7B 40 06 00 00 00 01 00 00 00

0000:0020  03 00 00 00 1E 0C 32 00 00 00 00

 

The header starts at B6 5B 7B 40 and ends at 03 00 00 00.

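The first 4 bytes of the header are the timestamp representing the time at which the message was sent or received. They are stored least significant byte first, so B6 5B 7B 40 reads as 0x407B5BB6, which as a Unix time (seconds since 1 January 1970) falls in April 2004.

The 5th byte is 06 in every message header in the dump above; its meaning is not clear from the dump. It is followed by 3 reserved bytes always set to 00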

The 9th byte indicates whether the message was received by the user or sent by the user to the buddy. A value of 01 means the message was received, and 00 means it was sent by the user to the buddy.

This is again followed by 3 reserved bytes always set to 00

The 13th byte indicates the length in bytes of the message that is to follow (03 in this example)

This is again followed by 3 reserved bytes always set to 00
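
The header fields listed above can be pulled out of a 16-byte buffer with a few lines of C. This is only a sketch: the names ym_header, ym_parse_header and read_le32 are invented here, and it rests on two assumptions. First, it treats the first 4 bytes as a 32-bit timestamp stored least significant byte first, which is an interpretation of the dump rather than a documented fact. Second, it reads bytes 13 to 16 as one little-endian number, which gives the same value as byte 13 alone as long as the three bytes after it are 00.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct holding the fields of one 16-byte message header. */
struct ym_header {
    uint32_t timestamp;  /* bytes 1-4, least significant byte first (assumed) */
    int      received;   /* byte 9: 1 = received, 0 = sent by the user */
    uint32_t length;     /* bytes 13-16: length of the message that follows */
};

/* Combines 4 bytes, least significant first, into one 32-bit value. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* buf must point at the first of the 16 header bytes. */
struct ym_header ym_parse_header(const unsigned char *buf)
{
    assert(buf != NULL);

    struct ym_header h;
    h.timestamp = read_le32(buf);        /* offset 0  = 1st byte  */
    h.received  = (buf[8] == 1);         /* offset 8  = 9th byte  */
    h.length    = read_le32(buf + 12);   /* offset 12 = 13th byte */
    return h;
}
```

For the sample header B6 5B 7B 40 06 00 00 00 01 00 00 00 03 00 00 00 this gives timestamp 0x407B5BB6 (1081826230 in decimal), received = 1 and length = 3.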

The message is encoded using the XOR algorithm with the user id being one of the keys.

In this example the 3-byte message is 1E 0C 32. To get the actual message we need to XOR it back with the user id. The user id with which this message was encoded was ‘venky_dude’.

 

The ASCII code of each character of the user id is as follows:

v = 118; e = 101; n = 110; k = 107; y = 121; _ = 95; d = 100; u = 117; d = 100; e = 101

The message bytes in decimal form are:

1E = 30; 0C = 12; 32 = 50

XORing each message byte with the character at the same position in the user id gives the following:

30 (XOR) 118 = 104

12 (XOR) 101 = 105

50 (XOR) 110 = 92

This operation can be easily performed in C/C++ using the ^ operator.

e.g.

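#include <stdio.h>

int main(void)
{
    int x = 30;     /* first encoded byte, 0x1E */
    int y = 118;    /* 'v', the first character of the user id */
    int z = x ^ y;  /* 104, the ASCII code of 'h' */
    printf("%c\n", z);
    return 0;
}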

 

Converting the output as per the ASCII code we get the following output

104 – h

105 – i

92 - \

 

Hence the message that was received was “hi\”.
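
The same byte-by-byte XOR can be wrapped in a small function that handles a message of any length. This is a sketch, not Yahoo's own code: the name ym_decode is made up, and it assumes that when the message is longer than the user id, the key simply starts again from the first character of the id. That assumption is consistent with the longer messages in the dump above, which decode to readable text this way.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* XORs each message byte with the character at the same position in the
   user id, wrapping around to the start of the id when it runs out.
   out must have room for len + 1 bytes; the result is NUL-terminated. */
void ym_decode(const unsigned char *msg, size_t len,
               const char *user_id, char *out)
{
    assert(msg && user_id && out);
    size_t key_len = strlen(user_id);
    assert(key_len > 0);   /* an empty user id cannot serve as a key */

    for (size_t i = 0; i < len; i++)
        out[i] = (char)(msg[i] ^ (unsigned char)user_id[i % key_len]);
    out[len] = '\0';
}
```

Called with the bytes 1E 0C 32 and the user id venky_dude it produces hi\, and with 1E 00 17 (the next message in the dump) it produces hey.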

It’s fairly easy to write a program that prints the decoded messages by reading the .dat file.

Finally, each message is followed by 4 padding bytes always set to 00, as the 00 00 00 00 after 1E 0C 32 in the example above shows.
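
Putting the pieces together, one way to walk through a .dat file that has been read into memory is to step from record to record: 16 header bytes, then the message, then the 4 zero bytes that follow each message in the dump. The sketch below makes the same assumptions as before (4-byte little-endian timestamp and length, key repeating over the user id), and the names ym_record, ym_next_record and the YM_ constants are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define YM_HEADER_SIZE  16    /* header in front of every message */
#define YM_PADDING_SIZE 4     /* zero bytes after each message in the dump */
#define YM_MAX_TEXT     1023  /* arbitrary cap chosen for this sketch */

struct ym_record {
    uint32_t timestamp;              /* first 4 header bytes, little-endian */
    int      received;               /* 1 = received, 0 = sent by the user */
    size_t   length;                 /* message length taken from the header */
    char     text[YM_MAX_TEXT + 1];  /* decoded text, cut off at YM_MAX_TEXT */
};

static uint32_t ym_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decodes the record starting at buf[pos] into rec. Returns the offset of
   the next record, or 0 when no complete record is left in the buffer. */
size_t ym_next_record(const unsigned char *buf, size_t size, size_t pos,
                      const char *user_id, struct ym_record *rec)
{
    assert(buf && user_id && rec);
    size_t key_len = strlen(user_id);
    assert(key_len > 0);

    if (pos > size || size - pos < YM_HEADER_SIZE + YM_PADDING_SIZE)
        return 0;

    size_t len = ym_le32(buf + pos + 12);
    if (len > size - pos - YM_HEADER_SIZE - YM_PADDING_SIZE)
        return 0;   /* the stated length runs past the end of the data */

    rec->timestamp = ym_le32(buf + pos);
    rec->received  = (buf[pos + 8] == 1);
    rec->length    = len;

    const unsigned char *msg = buf + pos + YM_HEADER_SIZE;
    size_t n = len < YM_MAX_TEXT ? len : YM_MAX_TEXT;
    for (size_t i = 0; i < n; i++)
        rec->text[i] = (char)(msg[i] ^ (unsigned char)user_id[i % key_len]);
    rec->text[n] = '\0';

    return pos + YM_HEADER_SIZE + len + YM_PADDING_SIZE;
}
```

To print a whole archive, read the file into a buffer (for example with fread), call ym_next_record with pos = 0, and keep passing the returned offset back in until it returns 0, printing rec.text whenever rec.length is not 0. Records with a length of 0, like the one at the very start of the dump, carry only a timestamp.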

If you have any queries regarding this article, just send me a mail at venkat.mani@gmail.com