October 22, 2006

Fun with NTFS Data Streams

The following is a re-issue of an article written in June 2002 for the Diskeeper eLetter:
---

The NTFS file system contains an unheralded feature called streams. Using the stream facility, a single file can contain multiple streams of data, each separately addressable.

What could be done using such a facility? Well, if you remember old DOS-style database systems such as Paradox, you'll remember that each database you made actually created a number of files. For example, if someone created a Paradox database named films, he'd end up with the files:

films.db
films.ix
films.r

If one of these files got accidentally deleted, the whole database was essentially deleted. If one needed to copy the database to another directory (folder), one would have to copy ALL the files to the new directory, or the database wouldn't function correctly.

What if all the files needed by a Paradox database could actually be folded into a single file? That would make accidental deletion of a single file harder, and would make the action of copying the database from one folder to another much easier and more reliable.

Using the stream facility, creating such a database system is much easier.

Using streams, a single file could be made, called:

films.pdx

Within that file could reside the streams

films.db
films.ix
films.r

At the Explorer level, however, all you would see is a single entry:

films.pdx

Let's have a little fun with streams...

(All the below examples were done on Windows XP. Your mileage may vary.)

(To run these examples, it is best to open a command prompt window. DO NOT USE THE RUN SELECTION FROM THE START MENU.)

Let's make a simple text file. Open a command prompt window and enter the command:

notepad testfile.txt

Notepad will, of course, ask if you want to create the file, and you say yes.

Enter the following string into notepad:

Once upon a midnight dreary

Exit notepad, saving the changes.

Back at the command prompt window, type the following command:

type testfile.txt

Of course, you get back the string "Once upon a midnight dreary".

Now enter the following command at the command prompt:

notepad "testfile.txt:hidden"

(The quotes are part of the command.) Notepad will ask if you want to create the file, and you answer yes.

Enter the string into notepad:

While I pondered weak and weary

Exit notepad, saving the changes.

Back at the command prompt, enter the following command:

type testfile.txt

You get the line "Once upon a midnight dreary" again.

What happened to "weak and weary"?

Enter the following command at the command prompt:

dir

Well, there's no entry for "testfile.txt:hidden", is there?

In fact, the directory output says:

06/18/2002 11:46 AM 27 testfile.txt

Hmmm. 27 bytes is exactly the length of "Once upon a midnight dreary".

Enter the following command at the command prompt:

notepad "testfile.txt:hidden"

and lo! "While I pondered weak and weary" is still there!

What's going on?

The line "While I pondered weak and weary" is in a stream whose name is "hidden". Because it has a name, it is called, oddly enough, a named stream. The syntax for getting at a named stream is to take the file name (testfile.txt), append a colon (:), and append the stream name (hidden). The only way to get this past both the command-prompt parsing and Notepad's own parsing is to surround the whole string with double quotes.

That's how we get the command

notepad "testfile.txt:hidden"

What about the "midnight dreary" string? It's contained in what's called the default or unnamed stream. The unnamed stream is what you get if you don't specify a stream name.
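The same file:stream naming can also be used from a program. Here is a minimal Python sketch, assuming it runs on Windows with the files on an NTFS volume. The helper names stream_path, write_stream and read_stream are made up for this illustration and are not part of any library.

```python
def stream_path(file_name, stream_name=""):
    """Build the 'file:stream' form; an empty stream name means the unnamed stream."""
    return f"{file_name}:{stream_name}" if stream_name else file_name


def write_stream(file_name, text, stream_name=""):
    # Assumption: Windows + NTFS. open() passes the 'file:stream' name to the
    # operating system, which opens that particular stream of the file.
    with open(stream_path(file_name, stream_name), "w", encoding="ascii") as f:
        f.write(text)


def read_stream(file_name, stream_name=""):
    # Reading works the same way: no stream name gives the unnamed stream.
    with open(stream_path(file_name, stream_name), "r", encoding="ascii") as f:
        return f.read()
```

Calling write_stream("testfile.txt", "While I pondered weak and weary", "hidden") and then read_stream("testfile.txt", "hidden") would repeat the Notepad experiment. Unlike Notepad, open() does not append an extension, so the stream is named exactly as given.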

If we look at the MFT record for the file testfile.txt, it looks like this:

000000: 46494c45 30000300 4155925e 00000000 FILE0...AU.^....
000010: 0c000100 38000100 a0010000 00040000 ....8...........
000020: 00000000 00000000 04000000 c1620000 .............b..
000030: 0c000000 00000000 10000000 60000000 ............`...
000040: 00000000 00000000 48000000 18000000 ........H.......
000050: d00c70db f316c201 8070ad72 f816c201 ..p......p.r....
000060: 8070ad72 f816c201 8070ad72 f816c201 .p.r.....p.r....
000070: 20000000 00000000 00000000 00000000 ...............
000080: 00000000 82010000 00000000 00000000 ................
000090: 00000000 00000000 30000000 78000000 ........0...x...
0000a0: 00000000 00000200 5a000000 18000100 ........Z.......
0000b0: 05000000 00000500 d00c70db f316c201 ..........p.....
0000c0: d00c70db f316c201 d00c70db f316c201 ..p.......p.....
0000d0: d00c70db f316c201 00000000 00000000 ..p.............
0000e0: 00000000 00000000 20000000 00000000 ........ .......
0000f0: 0c037400 65007300 74006600 69006c00 ..t.e.s.t.f.i.l.
000100: 65002e00 74007800 74000000 00000000 e...t.x.t.......
000110: 80000000 38000000 00001800 00000100 ....8...........
000120: 1b000000 18000000 4f6e6365 2075706f ........Once upo
000130: 6e206120 6d69646e 69676874 20647265 n a midnight dre
000140: 6172792e 00000000 80000000 50000000 ary.........P...
000150: 000a1800 00000300 20000000 30000000 ........ ...0...
000160: 68006900 64006400 65006e00 2e007400 h.i.d.d.e.n...t.
000170: 78007400 00000000 5768696c 65204920 x.t.....While I
000180: 706f6e64 65726564 20776561 6b20616e pondered weak an
000190: 64207765 6172792e ffffffff 82794711 d weary......yG.
0001a0: 00000000 00000000 00000000 00000000 ................
0001b0: 00000000 00000000 00000000 00000000 ................
0001c0: 00000000 00000000 00000000 00000000 ................
0001d0: 00000000 00000000 00000000 00000000 ................
0001e0: 00000000 00000000 00000000 00000000 ................
0001f0: 00000000 00000000 00000000 00000000 ................

If you look at line f0, you can see the Unicode name of the file.

If you look at line 120, you can see the data contained in the unnamed stream.

If you look at line 160, you can see the Unicode name of the named stream. (Yes, it's named hidden.txt, but what's a boy to do? We used notepad after all.)

If you look at line 170, you can see the data contained in the named stream.
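To make the layout concrete, here is a rough Python sketch that walks the two data attributes in the dump above (bytes 0x110 through 0x19f). It assumes the commonly documented layout of a resident attribute header (type, length, non-resident flag, name length, name offset, flags, attribute id, content length, content offset). That layout is not stated in this article, but it is consistent with the bytes shown. The function name resident_data_streams is made up for this illustration.

```python
import struct

# Bytes 0x110-0x19f of the MFT record, copied from the hex dump above.
RECORD_TAIL = bytes.fromhex(
    "80000000380000000000180000000100"
    "1b000000180000004f6e63652075706f"
    "6e2061206d69646e6967687420647265"
    "6172792e000000008000000050000000"
    "000a1800000003002000000030000000"
    "680069006400640065006e002e007400"
    "78007400000000005768696c65204920"
    "706f6e646572656420776561"
    "6b20616e"
    "64207765" "6172792e" "ffffffff" "82794711"
)

DATA_TYPE = 0x80          # attribute type used for data streams
END_MARKER = 0xFFFFFFFF   # marks the end of the attribute list


def resident_data_streams(buf):
    """Return [(stream_name, content)] for each resident data attribute in buf."""
    streams = []
    pos = 0
    while pos + 4 <= len(buf):
        (attr_type,) = struct.unpack_from("<I", buf, pos)
        if attr_type == END_MARKER:
            break
        # Assumed resident header fields, all little-endian, starting at pos + 4.
        (attr_len, non_resident, name_len, name_off, _flags, _attr_id,
         content_len, content_off) = struct.unpack_from("<IBBHHHIH", buf, pos + 4)
        if attr_len == 0:
            break  # malformed record; avoid looping forever
        if attr_type == DATA_TYPE and not non_resident:
            name_start = pos + name_off
            # The name is stored as UTF-16, two bytes per character.
            name = buf[name_start:name_start + 2 * name_len].decode("utf-16-le")
            data_start = pos + content_off
            streams.append((name, buf[data_start:data_start + content_len]))
        pos += attr_len
    return streams


for name, content in resident_data_streams(RECORD_TAIL):
    print(repr(name), len(content), content)
```

For this record the sketch finds an unnamed stream of 27 bytes and a stream named hidden.txt of 32 bytes: 59 bytes of data in total, although the directory listing reports only 27.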

Pretty neat, huh?

So, what problems are there with using data streams?

Well, as we've seen, only the UNNAMED data stream shows up in a DIR command or EXPLORER details. You can't tell how much data is contained in any named streams using standard user-level tools.

In fact, sometimes a file says it's zero bytes, yet there's mega- or gigabytes in the named stream!

The problem with the lack of display data is that a malicious user could hide 12 GB of data in a named stream, thus making your file server volume run out of space, and you'd have no way to figure out who's eating all the space. This would be considered a classic denial-of-service attack.

Another problem, of course, is that streams don't exist on FAT. If you copy a multi-stream file to a FAT volume, something funny will happen. All the streams will disappear except the unnamed stream.

So, we see that once an application begins to use streams, it can only run on NTFS volumes after that!

Posted by Michael at October 22, 2006 11:10 PM
