Archive for December 11th, 2011
So… Since my dedup program has… kinda stalled while I try to figure out how to implement the actual identifying and displaying of duplicate files, I’m looking at other things to do. I finally got back to dA recently, and discovered they opened a new upload/file storage thing called sta.sh. Interestingly enough, the announcement had a developers section, and as I thought, it’s got a public API.
Now, for my pictures, I tend to (well, as of this post, always) upload my pictures to dA. So a publish thing in Lightroom would be nifty, and might have some use for other people. Though I’m a bit iffy about usefulness, because you can also upload via FTP to sta.sh. Which would be a whole lot easier than custom plugins and the like, so I don’t think anything will come of it.
That said, resources I found:
www.deviantart.com/developers/stash – the Sta.sh API docs
www.deviantart.com/developers/oauth2 – dA’s OAuth2 API docs
www.adobe.com/devnet/photoshoplightroom.html – Lightroom SDK
github.com/ignacio/LuaOAuth – OAuth in Lua. Despite no mention of OAuth2, there’s an interestingly named ‘OAuth2.lua’ file.
regex.info/blog/lua/json – JSON in Lua
w3.impa.br/~diego/software/luasocket/ – LuaSocket – HTTP requests and whatnot
www.inf.puc-rio.br/~brunoos/luasec/reference.html – LuaSec – adds HTTPS support
Though, the Lightroom API reference says “Sends or retrieves data using HTTP or HTTPS POST.”, so hopefully the last two libraries above (LuaSocket and LuaSec) can be ignored.
In theory, that’s all that’s needed. Rip out the guts of one of the development plugins (most likely the FTP plugin), sprinkle the OAuth2 stuff here and there as necessary, with JSON for garnish, and it should work.
Emphasis on should.
Nothing ever works as expected.
Work on this project has… kinda stalled while I try to figure out a few issues. I’m trying to write them out to get a better idea of what I’m actually trying to do.
Identifying the files is fairly simple – I have the content hashes of the files which I can compare. Problem is, the SQL query to get the list of duplicate files just returns the list of duplicate IDs – a direct result of GROUP BY content_hash, so I can’t extract the files unless I do a separate query for each of the duplicate IDs. And there’s not an option to not use GROUP BY, since I’m actually doing SELECT *, COUNT(*) GROUP BY content_hash HAVING COUNT(*) > 1, and the query fails without the GROUP BY.
At this point, I’m fairly certain doing a SELECT path WHERE content_hash = xyz is the best bet, though I’m not too certain how that will scale up into the thousands of files. If each query takes 0.01sec, 1000 duplicates means the program will take 10 secs to just get the list of files – hopefully displaying will be pretty fast.
I’ll probably be implementing this next for testing.
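For reference, one way around the query-per-duplicate problem is to join the table against the grouped hashes, so a single query returns every path that shares a hash. A minimal sketch in Python's sqlite3, assuming a hypothetical table files(id, path, content_hash); the real schema isn't shown here, so the table and column names are placeholders.

```python
import sqlite3
from collections import defaultdict


def find_duplicates(conn):
    """Return {content_hash: [paths]} for every hash that occurs more than once."""
    # The inner query finds the duplicated hashes; the join pulls back
    # every row carrying one of those hashes, so no per-hash query is needed.
    query = """
        SELECT f.content_hash, f.path
        FROM files AS f
        JOIN (
            SELECT content_hash
            FROM files
            GROUP BY content_hash
            HAVING COUNT(*) > 1
        ) AS d ON d.content_hash = f.content_hash
        ORDER BY f.content_hash, f.path
    """
    groups = defaultdict(list)
    for content_hash, path in conn.execute(query):
        groups[content_hash].append(path)
    return dict(groups)


def demo():
    # Hypothetical schema and sample rows, purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT, content_hash TEXT)"
    )
    conn.executemany(
        "INSERT INTO files (path, content_hash) VALUES (?, ?)",
        [("/a/1.jpg", "aaa"), ("/b/1.jpg", "aaa"), ("/a/2.jpg", "bbb")],
    )
    return find_duplicates(conn)
```

An index on content_hash would probably matter more than the exact query shape once the table gets into the thousands of rows, but that's a guess until it's measured.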
But that’s a good lead in to my next point – displaying the duplicate files.
I was thinking a tree would be easiest. Then I could navigate the tree looking for duplicate files. But, wait. What if you want to see the location of the other duplicates? Have a second pane showing another tree with the folder highlighted? My use case for this is simple – I have files consolidated into a few folders, but if I’m removing duplicates, I’d want to remove the duplicates that haven’t been consolidated.
And, for that matter, how should I handle selecting the duplicate files? Manually is straight-forward if I have the tree – go and check each file. But automatically? Have a right-click and select ‘Check all duplicate files in this folder and sub-folders’ button? How do I make sure that whatever I delete, I’ve still got one copy?
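The "still got one copy" worry can at least be written down as a rule: only select copies under the chosen folder, and if every copy of a file lives under that folder, spare one of them. A rough sketch in Python, assuming the duplicates are already grouped as {hash: [paths]}; the function name and the "spare the first path alphabetically" choice are placeholders, not a decision.

```python
from pathlib import PurePosixPath


def select_for_deletion(groups, folder):
    """Pick duplicate paths under `folder` (and sub-folders) for deletion.

    groups maps a content hash to the list of paths sharing it.
    At least one copy of every hash is always left unselected.
    """
    folder = PurePosixPath(folder)
    selected = set()
    for paths in groups.values():
        # A path is "inside" if the folder is one of its ancestors.
        inside = [p for p in paths if folder in PurePosixPath(p).parents]
        outside = [p for p in paths if p not in inside]
        if not outside:
            # Every copy lives under the folder: spare one so a copy survives.
            inside = sorted(inside)[1:]
        selected.update(inside)
    return selected
```

With one copy in a consolidated folder and one elsewhere, selecting the "elsewhere" folder marks only the unconsolidated copy, which is the use case above.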
And what happens if I’ve got 2 duplicate files in the same folder? How do I specify which file to select for deletion? Preserve the shortest filename? If one ends in a number and the other doesn’t, choose the numberless name in the assumption that the other was a copy+paste that Windows just renamed to file (2).ext? (GNOME I believe just does Copy of file.name, so that’s easy. Actually, I think Windows does the same if you copy & paste in the same directory, so, huh.)
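The naming heuristics could be tried out as a sort key: prefer names that don't look like a file-manager copy, then prefer the shorter name. A minimal sketch in Python; the copy patterns ("file (2)", "Copy of file", "file - Copy") are my assumptions about what the file managers produce, and a plain trailing number like img_0001 is deliberately not treated as a copy.

```python
import re
from pathlib import PurePath

# Assumed copy markers: "file (2)", "Copy of file", "file - Copy", "file - Copy (2)".
COPY_MARKERS = re.compile(
    r"(^copy of )|( \(\d+\)$)|( - copy( \(\d+\))?$)", re.IGNORECASE
)


def keep_rank(path):
    """Lower rank = better candidate to keep."""
    stem = PurePath(path).stem  # filename without the extension
    looks_like_copy = bool(COPY_MARKERS.search(stem))
    # False sorts before True, so original-looking names win first,
    # then the shortest name, then alphabetical order as a tie-break.
    return (looks_like_copy, len(stem), stem)


def pick_keeper(paths):
    """Return the one path to keep; the rest are candidates for deletion."""
    return min(paths, key=keep_rank)
```

For example, pick_keeper(["photo (2).jpg", "photo.jpg"]) keeps photo.jpg, and everything else in the list becomes a deletion candidate.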
SO MANY DECISIONS. D:
So… I managed to get annoyed enough with my desktop slowing down that I decided to just give up and (re)install Windows 7 to an extra old hard drive. (And then muck around with partitions to get the new improved system back on the faster drive. In retrospect, should have done the move first. Must remember that for next time.)
Anyway. Steps, and what I installed: