Parsing Compressed Folders

  • Parsing Compressed Folders

    Hey, hi, how are ya. I wouldn't be Gypsy if I wasn't trying to parse something, so here is my latest parse and some insight on how to think critically. The thing about parsing is, it's always a puzzle. Some puzzles (parsing .pls files) are very, very simple and others can be quite a bit more complicated (parsing JSON). I had an idea, and to make it happen I needed to switch gears and go from parsing delimiters to parsing file directories.

    Let's fill in some blanks first. I have a Zip library that does everything under the sun, except give me an actual directory structure. Instead of:

    folder/
    ..subfolder/
    ....file

    I get:

    "folder/subfolder/file"

    Essentially, instead of giving me a structure it spits out all the files as if their entire path is their name. This simply will not do. I didn't write the Zip library, and I decided it would be much easier to extend the library than to track down and change one of its primary functions.

    This led to a whole slew of problems. The ultimate goal is to have an Object that accurately represents the entire directory/file structure. Below is an example of the results I wanted:

    Code:
    entire_zip:
    {
    	folder1:
    	{
    		file1:contents,
    		file2:contents
    	},
    	folder2:
    	{
    		folder55:
    		{
    			file82:contents,
    			file99:contents		
    		}
    	}
    }
    This way I can get files like this in my code:

    entire_zip.folder2.folder55.file82

    Now I will officially explain how I solved this puzzle.

    The first thing to consider is whether or not something already exists. folder55 is a good example. Technically there are 2 paths that will have folder55 and it's imperative that we create it if it doesn't exist and skip it if it does. This is when I had the idea to make a reverse comparison stack (just made that up).

    Essentially what I do is take the full path that the zip provides me with and chop it into an array on every "/". So "folder1/file1" becomes array("folder1","file1"). I then loop backwards through the array, creating every stage of possibilities. Let's make this our example path:

    "folder1/folder2/folder3/file"
    I chop that into
    array(folder1,folder2,folder3,file)

    I then use another array to store results

    storage[0] = array[3] //file contents
    storage[1] = Object()
    storage[1][array[2]] = storage[0]; //folder object and file contents
    storage[2] = Object()
    storage[2][array[1]] = storage[1] //both folders and file contents
    storage[3] = Object()
    storage[3][array[0]] = storage[2] //all folders and file contents

    You see? Each index of the storage array is a completed portion, from its position clean out to the file contents. That's half of the trick. Now we have to compare our final object to each array position until we get one that doesn't already exist.

    That's easier said than done though. Every time a comparison finds that a level already exists, we are also obligated to step into that existing nest before we can check the next level.

    For instance:
    Does Object[someName] exist, where someName = array[2]? For this example, say it does.
    Now we have to become Object[someName] so we can check from there if the next array index is a match. If we did this

    Object = Object[someName] we would actually be truncating our main object... saying "now you only equal yourself from this point". This is where reference is key.

    temp = @Object

    Now temp points to Object, and when we want to go a nest deeper we just tell temp to act as a pointer to that new nest. Voilà, our object gets populated by proxy.
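
    The two halves described above (the reverse stack, then the walk by reference) can be sketched roughly as follows. This is an illustration written in TypeScript rather than AS3, chosen because objects are passed by reference in both; the names Tree and addPath are hypothetical and this is not the actual script.

```typescript
// A folder is a plain object keyed by name; a leaf holds the file contents.
type Tree = { [name: string]: Tree | string };

// Hypothetical helper: add one "a/b/c" style path to the shared tree.
function addPath(root: Tree, path: string, contents: string): void {
  const parts = path.split("/");

  // Reverse stack: the first entry pairs the file name with its contents,
  // each later entry pairs a folder name with everything beneath it.
  const storage: Array<{ name: string; value: Tree | string }> = [];
  let built: Tree | string = contents;
  for (const name of [...parts].reverse()) {
    storage.push({ name, value: built });
    built = { [name]: built };
  }

  // Walk down from the root by reference. temp only ever points at a nest
  // inside root, so re-pointing temp never truncates root itself.
  let temp: Tree = root;
  while (storage.length > 0) {
    const current = storage.pop()!; // outermost folder comes off first
    const existing = temp[current.name];
    if (existing === undefined) {
      // First missing level: attach everything from here down in one go.
      temp[current.name] = current.value;
      return;
    }
    if (typeof existing === "string") {
      return; // a file already sits at this name; leave it alone
    }
    temp = existing; // this level exists, so step into it
  }
}
```

    Calling addPath once per file path from the zip fills one shared object, and a folder such as folder55 is created the first time it is seen and reused afterwards.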

    This was very hard to try and explain simply. I know I didn't do the best job. It is not clear at all, by any means that a reverse array stack comparison assignment by reference loop was the solution to my problem. As a matter of fact, before yesterday I had never even conceived of such a thing.

    My success in achieving my goal and concocting such a system was derived from NOT relying on my knowledge. If I would have relied on my knowledge who knows how many damn loops and crazy shit I would have made to finally get my results. My current script is only 134 lines, easily 50 of those are loading the zip and package declarations/imports/etc. So, 84 lines (approx) to PERFECTLY parse files and their directory structure into an object.

    When I first set out to solve this problem I was thinking it was going to be hundreds upon hundreds of lines, and I even began writing hundreds upon hundreds of lines. A little coffee, some secret agent music and about 30 minutes of staring at the wall - gave me ideas that shortened the script substantially.
    Last edited by MadGypsy; 04-23-2014, 12:17 PM.
    http://www.nextgenquake.com

  • #2
    I don't feel like I explained this very well, so here is the code for the primary parsing function. It runs one time for every file the zip loaded. The whole problem was that the zip library refuses to spit out directories, so this script literally only runs for an actual file, all of them.

    You may also notice that my temp variable does not refer to structure by reference (@, syntactically). Any time you make a variable equal another of type Object (in AS3), the reference is automatic. To get a unique copy I would have to clone it explicitly (AS3's Object has no built-in clone method).

    Code:
    //nodes will be like ["folder1","folder2","filename"]
    private function heirarchy(nodes:Array, content:*):void
    {
    	var heirs:Array = new Array();	
    	var heirname:String, index:int;
    			
    	while (nodes.length)
    	{
    		heirname = nodes.pop();	
    		index = heirs.length;	
    		if (!index)
    		{	//store the filename and contents
    			heirname = heirname.replace(/\./g, "$");
    			heirs[index] = new Object();
    			heirs[index].name = heirname
    			heirs[index].data = new Object();
    					
    			// the extension is the LAST "$" segment, so "my.file.xml" still reads as xml
    			switch(heirname.split("$").pop())
    			{
    				case "xml":
    					heirs[index].data[heirname] = content as XML;
    					break;
    				case "png":
    				case "jpg":
    				case "jpeg":
    				case "bmp":
    					// images arrive as a Loader
    					heirs[index].data[heirname] = content as Loader;
    					break;
    				default:
    					// everything else is treated as text
    					heirs[index].data[heirname] = content as String;
    					break;
    			}
    		}
    		else
    		{	// Ex: heirs[ file, sub-folder/file, root/sub-folder/file ] - where "/" represents a nesting of objects
    			heirs[index] = new Object();
    			heirs[index].name = heirname;
    			heirs[index].data = new Object();
    			heirs[index].data[heirname] = new Object();
    			heirs[index].data[heirname] = heirs[index - 1].data;
    		}
    	}
    			
    	var temp:Object = structure;
    	var current:Object = new Object();
    			
    	if (heirs)
    	{
    		while (heirs.length)
    		{
    			current = heirs.pop();	
    			if(!temp[current.name])			
    			{	
    				temp[current.name] = current.data[current.name];	
    				return;
    			} else temp = temp[current.name];	
    		}
    	}
    }
    Edit: Oh, right, I never put all of this into context. I'm making a datagrid that populates itself based on directory structure. So, let's say that this is a convention.

    Code:
    nameOfDatagrid/
    	data/
    	.	category1/
    	.	.	subCat1/
    	.	.	.	file1
    	.	category2/
    	.	.	file1
    from the above structure example:

    nameOfDatagrid - would be a folder named the same thing as the element on the stage.

    data - the folder that my script expects all datagrid data to come from

    category(num) - these folder names would populate a dropdown list that initiates the display of the selected folder's data

    subCat - IF subCat, subCat's name gets loaded into a button that toggles a submenu containing "file1" (all files in this dir).
    If NOT subCat, the name of the file gets loaded into a button that initiates the display of the file data when clicked.
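
    As a rough illustration of how that convention could drive the UI once the zip is a nested object, here is a small sketch in TypeScript (not AS3, and not the real datagrid code). Tree, MenuEntry, categories and menuFor are hypothetical names, and file contents are assumed to be plain strings.

```typescript
// A folder is a plain object keyed by name; a leaf holds the file contents.
type Tree = { [name: string]: Tree | string };

interface MenuEntry {
  label: string;             // folder or file name shown on the button
  kind: "submenu" | "file";  // sub-category folder vs. plain file
  children: string[];        // names inside a sub-category, empty for a file
}

// Dropdown entries: every folder that sits directly under data/.
function categories(data: Tree): string[] {
  return Object.keys(data).filter((name) => typeof data[name] !== "string");
}

// Buttons for one selected category: folders toggle a submenu,
// files display their contents when clicked.
function menuFor(category: Tree): MenuEntry[] {
  return Object.keys(category).map((name): MenuEntry => {
    const entry = category[name];
    if (typeof entry === "string") {
      return { label: name, kind: "file", children: [] };
    }
    return { label: name, kind: "submenu", children: Object.keys(entry) };
  });
}
```

    With the structure shown above, categories would return category1 and category2, and menuFor on category1 would return a single submenu button for subCat1 listing file1.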

    So, ya see, I had to parse out the zip directories, in order for the names of the folders to have the ability to become buttons and menus.

    Object["full/path/to/file"] = file contents

    wasn't gonna get the job done. Now it's

    Object["full"]["path"]["to"]["file"] = file contents

    This may lead you to one other question: why use a zip? Why not just get the directories raw from the file system? Well, you can't do that in Flash. And for good reason. Imagine if you were playing (ex) a flash video game that was on some website somewhere and it decided to start fucking with your directories. It's a security issue. I could ask you one file at a time if you want to load files that I can only assume are on your system... lol, that ain't gonna work.

    So, I can't read your local filesystem (without permission) but I can read the contents of a zip. Actually, the Zip library I have allows me to read and write directly to the zip. Technically I could include a totally empty zip with a flash app and treat it like a local filesystem. I don't have to ask your permission to read or write to it either. This makes flash very powerful as a web UI. Try doing anything even similar to this in javascript. You could maybe save a little string of text in a cookie but you aren't going to parse and populate a zip file on the user's system.

    I could also use this system to read/write a zip on a server, as long as the flash app resides on the same domain or the remote domain it accesses has a policy file granting it permission.

    AIR ("super flash") allows you full filesystem access. I could see myself moving my project to AIR as well, in the future. It would take zero effort. I just go project/properties/compile type [select AIR]

    lol, it's all AS3. The difference is whether it's meant to run in a browser (flash player) or a standalone shell (AIR player). Ironically, flash player can be run standalone as well but, it's treated with the same restrictions as if it were running in a browser.
    Last edited by MadGypsy; 04-23-2014, 01:43 PM.
    http://www.nextgenquake.com

    • #3
      Originally posted by MadGypsy
      Try doing anything even similar to this in javascript. You could maybe save a little string of text in a cookie but you aren't going to parse and populate a zip file on the user's system.
      You can store stuff in local storage and there are zip libraries/tools for Javascript.
      Quake 1 Singleplayer Maps and Mods

      • #4
        Really? They work without prompting the user?

        Meh, it doesn't matter. There is a whole lot that javascript can't do that I will be doing with this feature later on.
        http://www.nextgenquake.com

        • #5
          Teh Awesome

          Oh man I'm stoked. I realized yesterday how I could bring my previous concepts with parsing delimiters to the table for this zip shit. Let's do a back-up so you can see how it's all the same stuff.


          parsing delimiters = converting object notation (in string format) into a qualified Object that flash can recognize. Then save that object as bytes in AVM format so the next time flash needs the object it doesn't have to parse delimiters. It just opens the byte file and :magic: it is already a readable object.

          parsing compressed files = this wasn't even supposed to be necessary but due to a lack of a clear directory structure being returned, I basically just parse this entire thing into an object that accurately represents the zip directories and files

          how all that comes together:

          It struck me that in parsing delimiters I am saving an Object to bytes. In parsing compressed folders I am making an Object, so reason would imply that I could also save that Object to bytes and eliminate the need to parse the zip into an object.

          There is a little bit of a "fuck" with this. I can only save base types (object, array, number, string, etc) to a byte object. This isn't because that's all that is possible, it's because going into more involved types means you also have to save everything they inherit from. This would make saving all this to bytes pretty useless if the file is bloated with a bunch of class data that can be accessed a better way.

          That's ok though, I can work with the base types through byte files and all the extra stuff (images, music, etc) can just stay in the zip, as is.

          ---

          Side info: I made a pls parser that parses all pls data to an object. .pls files are playlist files for almost every respected media player. The reason I just threw this info out will make sense in a second.

          ---

          My results:

          I had a zip that contained 70 .pls files. The filesize for the zip was 22.9kb. I ran the zip through my zip parser and saved the resultant object as bytes. The new byte file was 18.6kb. So right there I have saved over 4kb. That doesn't seem like much but this is about to get even better.

          The pls files in the zip are just one aspect of everything I need to put in the zip to make my app work. That being said, the current conversion of the pls files to byte object still needs to be stored in the zip. Are you ready for this? My zipped pls byte object is 5kb. That's less than 25% the original size and it is the exact same data AND reading the byte file is an instantaneous Object that eliminates having to parse all the zip directories for that thing.

          ----

          Imma say this is a win/win/win situation. As long as I craftily assemble my zips so I can get the right balance of byte object vs other data, I can knock lots of parsing time and file size off of the completed zip. After all, before I had to traverse 3 or 4 directories and 70 files. Now I get one (even smaller) file and it's instantly the results.

          ---

          Appendage:

          It's the crack of dawn (looks out the window) literally and I don't think anyone has even read this yet so instead of making a new post imma just add stuff here.

          I took everything above to another level. It's not apparent in my results. It's more apparent in my implementation. I wrote the super load anything and everything script. An hour or so ago, I had my load zip script, my load anything that is not a zip or image script and my load image script. 3 separate scripts that were fairly specific as to what they load. This is because it takes different classes to load these things. For instance "someFile.txt" can be loaded with URLLoader class, whereas "someFile.jpg" needs to be loaded with Loader class. I homogenized all of that shit and then took it all to another level and had my loader parse and/or type all contents before it returns the results. So now I just do this
          Code:
          with(new FileServer("some url"))
          {
          	addEventListener(Event.COMPLETE, handler);
          	addEventListener(ErrorEvent.ERROR, handler);
          }
          and when the callback is made to handler I either get error results (if there was any error at any point) or I get completely ready results. These results can come from anywhere and be anything. In other words, we aren't talking about only zips anymore. If you supply a valid url it will go get it no matter what it is, and as long as there are no security sandbox issues it will go get it no matter where it is.
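
          The "different classes for different files" decision described above boils down to a lookup on the file extension. Below is a minimal sketch of that one step, in TypeScript rather than AS3. extensionOf and pickLoader are hypothetical names, the returned strings only label which AS3 class would do the loading, and the image list is taken from the extensions mentioned earlier in the thread; this is not how FileServer is actually written.

```typescript
type LoaderKind = "zip" | "Loader" | "URLLoader";

// Extension = text after the last dot of the file name, lower-cased.
// Query strings and fragments are dropped first.
function extensionOf(url: string): string {
  const path = url.replace(/[?#].*$/, "");
  const file = path.substring(path.lastIndexOf("/") + 1);
  const dot = file.lastIndexOf(".");
  return dot < 0 ? "" : file.substring(dot + 1).toLowerCase();
}

// Decide which kind of loader a url needs.
function pickLoader(url: string): LoaderKind {
  const ext = extensionOf(url);
  if (ext === "zip") {
    return "zip"; // handed to the zip library
  }
  if (["png", "jpg", "jpeg", "bmp"].indexOf(ext) >= 0) {
    return "Loader"; // display content such as images
  }
  return "URLLoader"; // text and anything else
}
```

          Taking the extension from the last dot also keeps a name like "my.file.xml" from being misread.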

          How convenient, right? 4 lines of code to instantiate the loading and preparing of any file you can conceive. I have made many tests of the results and the types are correct every time. So far, everything I expect to happen and how I expect everything to be returned to me is perfect.

          I'll tell ya'. Being able to feed a function nothing but a URL and get back all data, completely ready, no matter what that data is or where it came from is one gigantic breath of fresh air. The only fuqup I can make at this point is supplying an incorrect url or trying to access a url which is outside of the security sandbox. For the former, we are talking about only one line of code in one spot, which makes tracking down the error effortless (change the url). In the case of the latter my Error handler will catch it and there is nothing I can do about accessing a url outside of the security sandbox other than let the user know that the url is a SecurityError.

          In short, there is no way to fuck up and there isn't some stack or chain of classes to hunt through for errors.

          ---

          This is all starting to come together into a tight knit unit. I need to work more on my save and addToZip features. Getting files is done. Saving them however, is not. I think I'm going to try and do the same thing for the save script that I did for the load except instead of a url it will receive the content as an argument (maybe a filename too).

          Having a completed save/load script of this versatility is going to go a long way in adding power to my app. However, that is not the final straw on data/file gathering. The next step is to write an ExternalInterface which can communicate with php (and hence SQL). So, not only will I be able to retrieve file data from any source, I will also be able to retrieve database data. I believe that at that point my core data system will be complete. Oh, almost, I still need to write a script that will import a remote SWF, strip it of all of its classes and add them as assets to my app at runtime. This actually needs to be added to the load script that I said was finished... (dang - hmmm 6:39 got 2 hours before work....Imma go do that now.)
          Last edited by MadGypsy; 04-25-2014, 05:40 AM.
          http://www.nextgenquake.com

          • #6
            FUCK TMOBILE! Fuckin post eating, never working, graaaaaaaaaaaaaaawwwwwwwwwwwggggggggggg

            Please somebody run a cable to my house so I can get rid of this shitty internet.
            ---
            Anyway, I finally got my SWF class ripper to work. It was a nightmare. Even the AS3 language and components reference that is supplied BY ADOBE, THIS YEAR got it wrong.

            Numerous posts from like 2005 to 2007 got me working in the right direction but their stuff was wrong too. However, their stuff was probably correct back then. In the end it took a whopping TWO lines. TWO! Almost 4 HOURS for TWO lines! I can at least say that the lines aren't in the same class. I can also say that if I followed Adobe's example it would have taken 2 whole classes.

            Here's my awesome version of how to do it


            Right before you run loader.load(url, context)

            var context:LoaderContext = new LoaderContext();
            context.applicationDomain = ApplicationDomain.currentDomain;

            and then when the content is returned

            var myClass:Class = theData.loaderInfo.applicationDomain.getDefinition('some.namespace.myClass') as Class;

            That's it! 4 hours for that. I know what you are thinking too. "Damn Gypsy, you learn math in Common Core or summin? That is THREE lines." No it's not.

            var context:LoaderContext = new LoaderContext();
            context.applicationDomain = ApplicationDomain.currentDomain;

            is cleaner than

            var context:LoaderContext = new LoaderContext(false/*check for policy file*/, ApplicationDomain.currentDomain);

            Ironically though, if I was asked in Common Core what 1 + 1 is and I answered 3 but then gave them the reverse of my example, I would get marked as correct. Those people are whack-jobs. 1 + 1 is 2 and that's how many lines I added. I could break my "3" lines into 10 lines and it would just be more drawn out. It would still be 2 lines.
            http://www.nextgenquake.com

            • #7
              is there actually any need for splitting everything into separate directories? full paths means less hash tables.
              you might also want to check that zip files larger than 4gb work properly (yay for zip64 support).
              and that unicode works, for the luls (yay utf-8).
              Some Game Thing

              • #8
                @need for directories


                Yes, actually I already explained it, but I know it's buried in a wall of text so I'll recap. The directory structure and contents directly populates the data grid. So, top level folders become categories, sub folders become sub-categories and files populate menus.

                181FM
                ....Pop
                ........Awesome 80's
                ........90's Pop
                ....Country
                ........Kickin Country

                You get the picture, I'm sure. However, I don't need to parse the directories in release version. That's what the AVM Byte Object is for. Actually it would be easiest (and accurate) to say the byte object is the stored results of the parse. Solo it is smaller than the zip and has the exact same information, already ready to go. Once I zip that Object it gets SUPER small. It goes from 18.6 to 5kb, whereas the same information pre-parsed and zipped is 22.9. So, I'm pretty set on using a combination of regular zip and AVM Objects that are the results of other zips. Really I just can't move past your standard base types for the AVM Object. So the only thing that should be in the zip aside from the AVMObj are audible and visual data (flv,f4v,mp4,mp3,wav,png,jpg,gif,swf)

                @zip files larger than 4gb

                pbbbbbbbbbbbbbbbt, I haven't made it to 4 megs, much less 4 gigs. Actually, I haven't even made it to 40k. Assuming I ever finish this never-ending idea, anyone that wanted to package 4 gigs of stuff with it needs to be shot.

                This is simply not meant for that. I'm shooting for a really powerful back-end for gathering data. I even added stuff simply because it was possible. Like my SWF class ripper. I'm not gonna use that shit. I'm writing the whole program. Why would I avoid compiling the classes into my app? That's just some external method for OTHER PEOPLE to extend my work. That's the focus of this app though. Data gathering. I'd even gather your DNA if flash had a way for me to do it. Of course where there is data gathering there is data displaying so that is the other end that I intend to focus on very heavily.

                If you have 4 gigs of stuff to put in an app, you don't need a zip, you need a cloud server.

                @unicode works

                What do you mean? If you mean works in the zip - all I can say is, every single thing I have put in the zip works and I have put everything I can think of that would be used in flash. If you mean something else I don't understand the question.
                http://www.nextgenquake.com

                • #9
                  I was reading the AS3LCR (language and components reference). I like to do that sometimes cause there is a whole lot of stuff in this language and I like discovering things I have either forgotten or never knew.

                  Today turned out to be one of those 'forgotten' days and I stumbled on a little gem.

                  label - it works like this

                  Code:
                  someLabelName:
                  loop
                  {
                  	loop
                  	{
                  		break someLabelName;
                  	}
                  }
                  If we got rid of "someLabelName" and ran that concept, it would break out of the inner loop and continue in the wrapper loop. By adding the label, the break exits the labeled wrapper loop as well, so both loops end and execution carries on with whatever follows them.
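
                  To make the difference concrete, here is a small runnable sketch. It is written in TypeScript, where labeled break behaves the same way as described here for AS3; countVisits and the 3 by 3 loop bounds are made up for the example.

```typescript
// Counts how many inner-loop iterations run before the loops stop.
function countVisits(useLabel: boolean): number {
  let visits = 0;
  outer:
  for (let row = 0; row < 3; row++) {
    for (let col = 0; col < 3; col++) {
      visits++;
      if (col === 1) {
        if (useLabel) {
          break outer; // leaves both loops at once
        }
        break; // leaves only the inner loop, the outer loop keeps going
      }
    }
  }
  return visits;
}

console.log(countVisits(false)); // 6: two visits in each of the three rows
console.log(countVisits(true));  // 2: stops for good in the first row
```

                  To jump to the next pass of the wrapper loop instead of leaving it, the statement to use is continue with the same label.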

                  That can be really handy for some of the stuff I do. I live in a damn loop (I mean that in every perceivable way).

                  I have to hand it to Adobe. Their LCR is very well thought out and constructed. Every last possible thing is explained in sufficient detail and the more complex classes come with a fully working example. I don't think there is a spot in the document which mentions other libraries that isn't clickable. Like when a class has properties of a return type which is not a member of that class. The return type is displayed and is clickable. You would have to be completely illiterate to get lost in the AS3LCR. The examples they give are the bare bones minimum to completely accomplish the goal so, there is plenty of room for the user to elaborate.

                  I've read lots and lots of docs, AS3 is my favorite and PHP is next. SQL docs make me want to cry. Python docs are no fun either. JQuery docs are pretty good. I think that's all the docs for the languages I know. The rest of the languages don't really have docs. There are probably some Javascript docs but I prefer to just roll with JQuery. Really you would have to be a masochist not to use some javascript framework. Why spend all that time writing everything 2 and 3 different ways to satisfy all the browsers? Not to mention that you would also have to write a hell of a lot more to accomplish anything in raw javascript.

                  What were we talking about? LOL!





                  I did it on purpose
                  Last edited by MadGypsy; 04-26-2014, 12:58 PM.
                  http://www.nextgenquake.com

                  • #10
                    I said that I put every file I could think of in a zip and got the results that I expected every time. Well, half of that sentence is the absolute truth and the first half is not true.

                    Right, right before I came back to working on this display engine thing, I had just made a stream player. When I started parsing the zip I guess I totally took it for granted that my stream player just needed to be massaged into shape to play mp3s from a zip. Boy was I wrong. It is incredibly complicated to get a sound to play from a zip. Especially when adobe didn't add the ability to load bytes into a sound until version 11 and I was apparently using 10.6 (even though I have up to 11.8 :/ whatever.). That makes it incredibly complicated (impossible) to load bytes from the zip into the sound class. However, even after realizing that, it was still a pain in the ass.

                    I got it to work though. I made damn sure it really works too. I have a zip file with 5 mp3s in it. I parse them out into an object and run a controlled loop on the object that is governed by SOUND_COMPLETE events. I even set it so that if I click the stage the music jumps forward 1 whole minute (but never exceeds song length), even after skipping through a song it will still play the next song. Let me tell ya, for the longest time, it wouldn't. For the longest time it wouldn't do anything, except mock me with proper bytesTotal values.

                    Now! I have put every file I can think of that would ever be used with what I am making into a zip and Objectified in flash with the results I expect. FUNC! except MP4.....grooooaaannn, it never ends. Actually, Imma make it even harder. I will also make it possible to gather any visual or audio asset from an swf. Right now an swf can be loaded on the stage just like you would in a browser or it can be used as a remote package of runtime classes. I'm going to wipe the board clean, if it's something you can put in an swf, it will now also be something you can get out of an swf.

                    swc's are zip files that contain an xml descriptor and an swf full of classes. Imma parse that too. That will pretty much cover every file possibility imaginable, real versatile. Then comes the php/sql connections. I already have a whole database management system and a database too. That was my project from 2 months ago. One day, the twain projects shall meet.
                    http://www.nextgenquake.com
