Alt+Ctrl+Backspace: March 2009

2009/03/25

How to use sprintf/wsprintf with std::string/std::wstring

Explanation

One of the few problems with the std::string (and std::wstring) class in C++ is the lack of a sprintf-style function that returns a std::string as its result. A sprintf-styled constructor would also be nice to have. Of course, there are many alternatives to sprintf - the Boost Format library, the string streams from sstream, and the QString class in Qt 4 - but so far plain old sprintf is the most compact and the easiest to use of them. A string stream requires several statements (lines) to make a simple formatted string; unless you really like it, it is overkill for a small formatted string. The Boost Format library requires Boost, or at least a part of it. And the Qt 4 QString class requires an entire Qt installation and acceptance of one of the few available licenses, and it still won't be as compact as sprintf.

Solution

The first thing that comes to mind is to create a temporary buffer, sprintf into it, and then assign the buffer to a std::string. As your program grows, you'll eventually get sick of the many temporary buffers, and each use still takes a few lines of code. Here is a better way to do it:
Str.h:


#ifndef STR_H
#define STR_H
#include <string>
#include <cstdarg>

typedef std::string Str;
typedef std::wstring WStr;

WStr swprintf(const wchar_t* format, ...);
WStr vswprintf(const wchar_t* format, va_list args);
WStr swprintf(const WStr& format, ...);
WStr vswprintf(const WStr& format, va_list args);

Str sprintf(const char* format, ...);
Str vsprintf(const char* format, va_list args);
Str sprintf(const Str& format, ...);
Str vsprintf(const Str& format, va_list args);

#endif

Str.cpp:


#include "Str.h"
#include <cstdio>
#include <cwchar>

WStr swprintf(const wchar_t* format, ...){
 va_list args;
 va_start(args, format);
 WStr result = vswprintf(format, args);
 va_end(args);
 return result;
}

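WStr vswprintf(const wchar_t* format, va_list args){
 const int bufferSize = 16384;
 wchar_t buffer[bufferSize];
 // the standard vswprintf returns a negative value on error,
 // including the case when the result does not fit into the buffer
 if (vswprintf(buffer, bufferSize, format, args) < 0)
  return WStr();
 return WStr(buffer);
}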

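// NOTE: calling va_start on a reference parameter is undefined behavior
// in standard C++; this overload relies on the compiler tolerating it.
WStr swprintf(const WStr& format, ...){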
 va_list args;
 va_start(args, format);
 WStr result = vswprintf(format, args);
 va_end(args);
 return result;
}

WStr vswprintf(const WStr& format, va_list args){
 return vswprintf(format.c_str(), args);
}

Str sprintf(const char* format, ...){
 va_list args;
 va_start(args, format);
 Str result = vsprintf(format, args);
 va_end(args);
 return result;
}

Str vsprintf(const char* format, va_list args){
 const int bufferSize = 16384;
 char buffer[bufferSize];
 vsnprintf(buffer, bufferSize, format, args);
 return Str(buffer);
}

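// NOTE: calling va_start on a reference parameter is undefined behavior
// in standard C++; this overload relies on the compiler tolerating it.
Str sprintf(const Str& format, ...){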
 va_list args;
 va_start(args, format);
 Str result = vsprintf(format, args);
 va_end(args);
 return result;
}

Str vsprintf(const Str& format, va_list args){
 return vsprintf(format.c_str(), args);
}

This allows you to quickly create a formatted string in a single function call.

Problems:

This code works with gcc on a Linux system, but it might require some tweaking on other compilers. In general, wchar_t-based printf functions tend to cause problems in cross-platform applications. For example, the default version of vswprintf in the MinGW compiler has a different number of arguments: it has no "buffer size" argument, so it is vulnerable to buffer overruns (the Linux version of the function doesn't have this problem) and will need to be replaced by another function. In the other direction, MSVC provides _snwprintf, which is missing in gcc on Linux.
Yet another problem is that the Linux and Windows versions of the wprintf-related functions might handle %s and %S differently. As I remember, in the MinGW compiler %S in swprintf does the same thing as %s does in the Linux version, and vice versa. These problems can be partially fixed with a few hacks, but people making portable applications with swprintf should be aware of them.

Another problem is the hard-coded size limit for created strings (it can be changed, but it still doesn't look "nice" when used with C++ string classes). This limit can be bypassed for std::string: vsnprintf returns the number of characters that would have been written had the buffer been large enough, so you can allocate a buffer of the right size dynamically, format into it, and then assign it to a std::string. It cannot be bypassed the same way for std::wstring: the standard vswprintf only returns a negative value when the buffer is too small, and a size-reporting function in the style of MSVC's _vsnwprintf is not available in gcc on Linux and doesn't look like standard.

2009/03/07

How to find identical files on Linux using a Python script

If you have two directories (A and B) with a lot of files, and some files in directory B are the same as some files in directory A, you can use this Python script to find the identical files.

find_identical.py:

#!/usr/bin/env python
import os
import sys
import commands
#import string

if len(sys.argv) != 3:
    print """wrong number of arguments!
usage: find_identical.py source destination"""
    sys.exit(1)

src=sys.argv[1]
dest=sys.argv[2]

#print "src: %s, dest: %s" % (src, dest)
print "#!/bin/sh"

srcText = commands.getoutput("md5sum %s/*" % src)
destText = commands.getoutput("md5sum %s/*" % dest)

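def dictFromString(val):
    # maps md5 checksum -> file name;
    # skips blank lines and md5sum error messages (e.g. for subdirectories)
    result = {}
    for line in val.split("\n"):
        parts = line.split(None, 1)
        if len(parts) == 2 and len(parts[0]) == 32:
            result[parts[0]] = parts[1]
    return result

def listFromString(val):
    # list of [checksum, file name] pairs, in md5sum output order
    result = []
    for line in val.split("\n"):
        parts = line.split(None, 1)
        if len(parts) == 2 and len(parts[0]) == 32:
            result.append(parts)
    return result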

srcDict = dictFromString(srcText)
destList = listFromString(destText)
filesFound = False
for file in destList:
    if srcDict.has_key(file[0]):
        print "rm \"%s\" #identical to %s" % (file[1], srcDict[file[0]])
        filesFound = True;
 
if not filesFound:
    print "#no identical files found"
    exit(1)

The script needs two arguments - the names of the first and the second directory. By default it prints (to standard output) a shell script that would remove from directory B the files that are also present in directory A.
So, to remove the files from B that are already present in A, you'll need to run:

find_identical.py A B >applychanges.sh
bash applychanges.sh

It is done this way so that you can examine the list of files that will be removed before actually removing them. This behavior can easily be changed by modifying the line:

 print "rm \"%s\" #identical to %s" % (file[1], srcDict[file[0]])

to something you need.

The script needs md5sum to work.
Notice that there is already Linux software available that does the same thing (finds identical files). So this script is mostly useful for learning Python, or as a base for making another script.
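As a rough sketch of such a base, the same idea can be written without calling md5sum at all, using Python 3 and the standard hashlib module. The function names below (md5_of_file, checksums, find_identical) are made up for this illustration and are not part of the script above; like the script, the sketch only looks at files directly inside the two directories.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums(directory):
    """Return (checksum, path) pairs for the regular files directly inside `directory`."""
    pairs = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            pairs.append((md5_of_file(path), path))
    return pairs


def find_identical(src, dest):
    """Return (dest_path, src_path) pairs for files that have the same MD5 checksum."""
    by_checksum = {}
    for checksum, path in checksums(src):
        # if several source files share a checksum, the last one wins
        by_checksum[checksum] = path
    return [(path, by_checksum[checksum])
            for checksum, path in checksums(dest)
            if checksum in by_checksum]
```

Calling find_identical("A", "B") and printing an "rm" line for the first element of each returned pair would give roughly the same output as the script above.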

How to find largest file in directory on Linux system

If you run out of disk space and would like to know which files take up the most of it, use one of the following commands.

In a terminal, "cd" into any folder (for example, your home folder) and type:

du -sm *|sort -nr|less

du -sm *|sort -nr|head

Explanation:
"du -sm *" reports how many megabytes each entry takes - for all files and directories within the current directory.
"sort -nr" sorts the list numerically by the first column (which is the size in megabytes) and puts the largest entries first.
"less" is used to scroll the output using the cursor keys.
"head" is used if you want to print only the first few (ten by default) largest entries.
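If you would rather do the same from a script, here is a minimal Python 3 sketch of the idea. The names entry_size and largest_entries are invented for this example. Note two differences from the pipeline above: it sums apparent file sizes in bytes rather than disk usage in megabytes, and it also lists hidden entries, which the "*" glob skips.

```python
import os


def entry_size(path):
    """Size in bytes of a file, or the total size of all files below a directory."""
    if os.path.islink(path) or not os.path.isdir(path):
        return os.lstat(path).st_size
    total = 0
    # os.walk does not follow symbolic links to directories by default
    for root, dirs, files in os.walk(path):
        for name in files:
            total += os.lstat(os.path.join(root, name)).st_size
    return total


def largest_entries(directory=".", count=10):
    """Return up to `count` (size, name) pairs from `directory`, largest first."""
    sizes = [(entry_size(os.path.join(directory, name)), name)
             for name in os.listdir(directory)]
    sizes.sort(reverse=True)
    return sizes[:count]
```

For example, "for size, name in largest_entries(): print(size, name)" prints the ten largest entries of the current directory.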

C++ utf8 to wstring conversion routine

Some time ago I was looking for a C++ conversion routine that could convert a UTF-8 string (stored in std::string) into std::wstring and vice versa. It is certainly possible to do that using setlocale and some C functions, but I wanted something done in "pure C++". I did some research, didn't find anything, and in the end wrote the conversion routine myself (using the information on Wikipedia).

Here is the code:

Str.h:

#ifndef STR_H
#define STR_H
#include <string>
#include <iostream>

typedef std::string Str;
typedef std::wstring WStr;

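std::ostream& operator<<(std::ostream& f, const WStr& s);
std::istream& operator>>(std::istream& f, WStr& s);
void utf8toWStr(WStr& dest, const Str& src);
void wstrToUtf8(Str& dest, const WStr& src);
// convenience overloads that return the result (defined in Str.cpp)
WStr utf8toWStr(const Str& str);
Str wstrToUtf8(const WStr& str);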

#endif

Str.cpp:

/*
Copyright (c) 2009 SegFault aka "ErV" (altctrlbackspace.blogspot.com)

Redistribution and use of this source code, with or without modification, is
permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright
 notice, this list of conditions and the following disclaimer.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO
EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "Str.h"
#ifdef UTF8TEST
#include <stdio.h>
#endif

void utf8toWStr(WStr& dest, const Str& src){
 dest.clear();
 wchar_t w = 0;
 int bytes = 0;
 wchar_t err = L'�';
 for (size_t i = 0; i < src.size(); i++){
  unsigned char c = (unsigned char)src[i];
  if (c <= 0x7f){//first byte
   if (bytes){
    dest.push_back(err);
    bytes = 0;
   }
   dest.push_back((wchar_t)c);
  }
  else if (c <= 0xbf){//second/third/etc byte
   if (bytes){
    w = ((w << 6)|(c & 0x3f));
    bytes--;
    if (bytes == 0)
     dest.push_back(w);
   }
   else
    dest.push_back(err);
  }
  else if (c <= 0xdf){//2byte sequence start
   bytes = 1;
   w = c & 0x1f;
  }
  else if (c <= 0xef){//3byte sequence start
   bytes = 2;
   w = c & 0x0f;
  }
  else if (c <= 0xf7){//4byte sequence start
   bytes = 3;
   w = c & 0x07;
  }
  else{
   dest.push_back(err);
   bytes = 0;
  }
 }
 if (bytes)
  dest.push_back(err);
}

void wstrToUtf8(Str& dest, const WStr& src){
 dest.clear();
 for (size_t i = 0; i < src.size(); i++){
  wchar_t w = src[i];
  if (w <= 0x7f)
   dest.push_back((char)w);
  else if (w <= 0x7ff){
   dest.push_back(0xc0 | ((w >> 6)& 0x1f));
   dest.push_back(0x80| (w & 0x3f));
  }
  else if (w <= 0xffff){
   dest.push_back(0xe0 | ((w >> 12)& 0x0f));
   dest.push_back(0x80| ((w >> 6) & 0x3f));
   dest.push_back(0x80| (w & 0x3f));
  }
  else if (w <= 0x10ffff){
   dest.push_back(0xf0 | ((w >> 18)& 0x07));
   dest.push_back(0x80| ((w >> 12) & 0x3f));
   dest.push_back(0x80| ((w >> 6) & 0x3f));
   dest.push_back(0x80| (w & 0x3f));
  }
  else
   dest.push_back('?');
 }
}

Str wstrToUtf8(const WStr& str){
 Str result;
 wstrToUtf8(result, str);
 return result;
}

WStr utf8toWStr(const Str& str){
 WStr result;
 utf8toWStr(result, str);
 return result;
}

std::ostream& operator<<(std::ostream& f, const WStr& s){
 Str s1;
 wstrToUtf8(s1, s);
 f << s1;
 return f;
}

std::istream& operator>>(std::istream& f, WStr& s){
 Str s1;
 f >> s1;
 utf8toWStr(s, s1);
 return f;
}

#ifdef UTF8TEST
bool utf8test(){
 WStr w1;
 //for (wchar_t c = 1; c <= 0x10ffff; c++){
 for (wchar_t c = 0x100000; c <= 0x100002; c++){
  w1 += c; 
 }
 Str s = wstrToUtf8(w1);
 WStr w2 = utf8toWStr(s);
 bool result = true;
 if (w1.length() != w2.length()){
  printf("length differs\n");
  //std::cout << "length differs" << std::endl;
  result = false;
 }
 
 printf("w1: %S\ns: %s\nw2: %S\n", w1.c_str(), s.c_str(), w2.c_str());
 
 for (size_t i = 0; i < w1.size(); i++)
  if (w1[i] != w2[i]){
   result = false;
   printf("character at pos %x differs (expected %.8x got %.8x)\n", (unsigned)i, (unsigned)w1[i], (unsigned)w2[i]);
   //std::cout << "character at pos " << i  << " differs" << std::endl;
   break;
  }
  
 if (!result){
  printf("utf8 dump: \n");
  for (size_t i = 0; i < s.size(); i++)
   printf("%2x ", (unsigned char)s[i]);
 }
 
 return result;
}

int main(int argc, char** argv){
 std::wstring ws(L"фыва");
 std::string s("фыва");
 std::cout << ws << s << std::endl;
 std::cout << wstrToUtf8(utf8toWStr("фыва")) << std::endl;
 if (utf8test())
  std::cout << "utf8Test successful" << std::endl;
 else
  std::cout << "utf8Test failed" << std::endl;
 return 0;
}
#endif

The code was successfully tested on a 32-bit Linux system (see the "utf8test()" routine) and seems to work. It should work on a 32-bit Windows platform as well, but keep in mind that wchar_t in MSVC is 2 bytes wide, so on Windows the routine won't handle Unicode characters above 0xffff (the range 0x10000..0x10ffff).

If you need a routine like this, feel free to use it, just don't claim you wrote it.

The code is available under a modified BSD license.
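For cross-checking the byte layout, the encoding direction can also be sketched in a few lines of Python 3 and compared against Python's built-in encoder. This is only an illustration of the same bit arithmetic as in wstrToUtf8; the name encode_utf8 is made up here, and the input is assumed to be a sequence of non-negative integer code points.

```python
def encode_utf8(code_points):
    """Encode an iterable of integer code points into UTF-8 bytes."""
    out = bytearray()
    for cp in code_points:
        if cp <= 0x7F:  # 1 byte: 0xxxxxxx
            out.append(cp)
        elif cp <= 0x7FF:  # 2 bytes: 110xxxxx 10xxxxxx
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp <= 0xFFFF:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        elif cp <= 0x10FFFF:  # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out.append(0xF0 | (cp >> 18))
            out.append(0x80 | ((cp >> 12) & 0x3F))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:
            out.append(ord("?"))  # outside the Unicode range, same fallback as the C++ code
    return bytes(out)
```

A quick check is to compare encode_utf8(map(ord, text)) with text.encode("utf-8") for a few sample strings.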

How to generate password on linux system

Every user will eventually need to create a secure password (alphanumeric, not a common word, etc).
On Linux, you can easily generate one without additional software.

To generate an eight-character alphanumeric password, use the following line:


tr -dc '0-9a-zA-Z' </dev/urandom |head -c 8;echo

You can put this into a shell script if you are going to need it often.

In case you are a newbie and don't understand what this line does, an explanation is below.

Explanation

/dev/urandom is one of a few special files on Linux/Unix systems (others are /dev/zero, /dev/null and /dev/random).
/dev/urandom provides an endless stream of pseudo-random data, but even if you paste data from that file into a text file, you won't be able to use it as a password, because it will contain a lot of bytes that don't fit into the ASCII charset.
That's why the "tr" command is used. "tr -dc '0-9a-zA-Z'" reads data from /dev/urandom and prints to its output only the characters that belong to the given character set (which is '0-9a-zA-Z'). See "man tr" for more details.
This means that if you want to use another set of characters in the password, you need to change the character set to something else.
For example, '0-9' means only numeric characters, '0-9a-z' means numeric characters or lowercase Latin letters, and '0-9a-zA-Z!@#$%^&*()_+-' includes extra characters to make your password more secure.

Because /dev/urandom is "endless", "head -c 8" is used to copy only the first 8 characters (see "man head" for an explanation).
If you need more or fewer characters, change the number accordingly.

The "echo" command without arguments simply prints a newline. It is only useful if you want to print the password to the terminal (because without "echo" the next shell prompt will be printed right after the password, on the same line).

Possible problems
On a system with a UTF-8 locale, you won't be able to generate a password with non-ASCII letters using this method. This is because tr handles only one character ("byte") at a time, and a UTF-8-encoded non-ASCII character uses more than one byte.
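If Python 3.6 or newer is at hand, a similar generator can be sketched with the standard secrets module, which draws from the operating system's random source. The function name generate_password and its defaults are chosen for this example only. Because Python strings consist of characters rather than bytes, the alphabet here may also contain non-ASCII letters.

```python
import secrets
import string


def generate_password(length=8, alphabet=string.digits + string.ascii_letters):
    """Return `length` characters picked at random from `alphabet`."""
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

For example, print(generate_password()) prints an eight-character alphanumeric password, and generate_password(12, "0123456789") gives twelve digits.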

2009/03/06

Making slackware packages from source without slackbuild

Originally posted on linuxquestions.org by me.

This article describes how to build software packages for Slackware Linux from source code.
It is not the easiest method, but it is useful for quick package creation.
If you simply want to install software and do not want to compile it, then either download precompiled packages from www.linuxpackages.net, or get a slackbuild from www.slackbuilds.org. Another option is to use the src2pkg utility or checkinstall (but checkinstall has been mostly broken since Slackware 11.0).

I'm posting this because some people have problems with making packages from source code, and the howto from which I learned how to do it isn't available anymore.

WARNING!

This howto is pretty short and doesn't dive into details too much (although I initially thought about including an overview of commonly used build systems).
This doesn't work for all packages, only for most of them. The method is not universal.
This is written for people who don't like slackbuilds, or who want to compile software without a slackbuild. If you simply want to install a program as fast as possible, search for a compiled package on www.linuxpackages.net, or for a slackbuild on www.slackbuilds.org.

Requirements
This method will work only if the following conditions are met:

The program's source package contains a ./configure script.
The Makefile produced by ./configure contains the word DESTDIR (case-sensitive).

This seems pretty restrictive, but, fortunately, it covers 95% of all available Linux software.

If there is no ./configure, or the Makefile doesn't have "DESTDIR" inside, this will not work. (There are exceptions to that rule, of course, but explaining them would take too much space.)
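The two requirements can be checked mechanically. Below is a small Python 3 sketch of such a check; the function name meets_requirements is invented for this example, and it assumes that ./configure has already been run, so that a file named "Makefile" exists in the top-level source directory.

```python
import os


def meets_requirements(source_dir):
    """Return True if `source_dir` has a configure script and a Makefile mentioning DESTDIR."""
    if not os.path.isfile(os.path.join(source_dir, "configure")):
        return False
    makefile = os.path.join(source_dir, "Makefile")
    if not os.path.isfile(makefile):
        # ./configure has not been run yet, or it did not produce a Makefile here
        return False
    with open(makefile, errors="replace") as f:
        return "DESTDIR" in f.read()  # the search is case-sensitive
```

Running meets_requirements(".") from the source directory after ./configure tells whether the sequence described in this article has a chance to work.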

Short explanation
In this example I assume that we have extracted our package and are now inside the source code directory, where the ./configure script is located. In the example our program has the name "programname" and the version "programversion". When compiling your own package, replace "programname" and "programversion" with the values you need.

Most software can be turned into a Slackware package using this sequence:

./configure
make
mkdir pkg
make DESTDIR=`pwd`/pkg install
cd pkg
su
makepkg programname-programversion-1xyz.tgz
chown me.users ./programname-programversion-1xyz.tgz (instead of "me" use your user name)
mv ./programname-programversion-1xyz.tgz ..
cd ..
rm -r pkg
Ctrl+D (exits "su")

If the package meets the requirements mentioned before, this sequence will create a Slackware package for you.

How it works

"./configure" - configures the package according to our system.
"make" - compiles the package.
"mkdir pkg" - creates a directory called "pkg" within the same directory where ./configure is located. This "pkg" directory will be used to store the files that will be put into the package.
"make DESTDIR=`pwd`/pkg install":
This tells the "make" command to install the package as if the "pkg" subdirectory (that we created in the current directory) were the root "/" folder.
"`pwd`" inserts the output of the "pwd" command into the current command line. DESTDIR is a variable used inside the makefile. DESTDIR specifies the root of the installation, and because GNU make allows variables to be overridden on the command line, we can change it and say that the "pkg" subdirectory is the root of the installation.
"cd pkg" - we are going into the "pkg" subdirectory.
"su" (enter password here) - changes privileges to root, because makepkg (which is used to create packages) won't work with other privileges.
"makepkg programname-programversion-1xyz.tgz" - this creates a Slackware package without a description. Answer "yes" to all questions (well, unless it doesn't suit your needs). Also, notice the "1xyz" part of the filename instead of a plain number. When you build custom packages, you need to be able to distinguish them from stock packages. To do that, instead of the standard numbers use a custom signature, something like "1abc", "2abc", etc. Use whatever combination you want (I use a number + my nickname), as long as you can distinguish your package from packages made by other people.
"chown me.users ./programname-programversion-1xyz.tgz" - the created package has the owner "root.root". If we want to be able to move it somewhere later, we need to change the owner to our user ID.
"mv ./programname-programversion-1xyz.tgz .." - we are moving our package into the source directory, i.e. one level up.
"cd .." - going one level up (back into the directory with "configure").
"rm -r pkg" - we are cleaning up and removing our pkg subdirectory.
"Ctrl+D" - this is equivalent to the "exit" command, which ends the "su" session.

Done
Now you have your Slackware package.
If something doesn't work, or you don't understand something, ask about it on the Slackware forums.

A Few Notes
A package created this way will normally install into "/usr/local" instead of "/usr". This is because "/usr/local" is the default location for user-built packages. "/usr/local/" is fine if you are creating packages for yourself, but, for example, linuxpackages.net rejects packages that are not in "/usr". So if you want to distribute your packages, or submit them to linuxpackages.net, change the installation location from "/usr/local" to "/usr" using the --prefix configure option.

How to answer questions without wasting too much time

This text was originally posted on www.linuxquestions.org. It was created because there is already the famous "How to ask questions the right way", but I don't remember any document for those answering questions.
The information should be useful for people who hang out on various forums/newsgroups answering questions and solving other people's problems (it is mostly useful for Linux users).

Okay, I've just got through another flamewar, so I decided to write some info about answering questions without wasting too much time in the process. The content is based on personal experience and my own point of view, and isn't supposed to be the absolute truth or something. I also don't expect anyone to agree with me. The recommendations are written in random order, and are supposed to prevent you from wasting too much time typing replies, or breaking your keyboard too quickly.
Beware! Some people might find this offensive.

The recommendations are based on the following assumptions:

You want to help other people to solve their problem.
You don't want to spend many hours per day doing that.
You want to get satisfaction from giving out info - i.e. people should be grateful, or discussion should be interesting.
You do not want to live on the forum, just post answers to some threads.
You don't want to have a bad mood after helping someone.

Here are the recommendations:

Always remember that you can ignore other people instead of trying to reason with them. By "ignoring" I mean the ignore list, which linuxquestions provides and which is often present on other forums. The ignore list is a very handy feature.
Never participate in threads about religion. There have been many of those, and when there is a clash between believers and atheists (or simply followers of different religions), no matter which side you take, you'll never prove to "them" that you are right. Also, your opponents will never prove to you that you are wrong. The thread will eventually degrade into a pillow fight, and someone will close it.
Never post in "Linux vs Windows" threads. Yes, this is tempting, but it should be avoided. Such threads either never end or eventually degrade into a flamewar. Which side "wins" in case of a flamewar depends on the forum, but in most cases the moderators win. Providing unbiased info in such threads is difficult, and sometimes leads to disappointment - especially when you discover that someone has asked the question you already answered in detail one year ago in the same thread.
If you are feeling too emotional while typing a reply, do not type a reply. If you were offended, report the incident to the moderators. If you are angry, go outside and take a walk. If you don't like someone, ignore him or her (by adding them to the ignore list). Being furious or simply emotional causes too much typing. And too much typing means too much time wasted.
When someone posts something that doesn't fit into your system of beliefs, and it enrages you, do not try to explain something to that guy/girl, ignore him/her. If he posted something gross, then report it to the moderators. In my experience, trying to explain something to someone who doesn't want to listen is the #1 cause of wasted time on the internet. This applies to the choice of distribution, operating system, religion, some questions about women, and so on. The same principle applies to trolls or any person that pisses you off.
Do not reply in threads started by spammers. They don't read them.
When replying in the "newbie" section, try to keep your replies extremely short, but informative. It is tempting to overwhelm a newbie with the amount of things you know, but this guy might not need it. He might simply be interested in how to watch a DVD, and not in the mood for a Linux history lesson or a detailed comparison of all available DVD players.
When you realize that you have started to type extremely long, detailed instructions about "how to do something in Linux", stop, put the keyboard away, take a deep breath, relax, and then try to find a howto on the subject you were writing about. There is a high chance that someone has already written what you were going to write right now, so you would be reinventing the wheel. If there is a howto on the subject, post a link to it or quote it (but still provide the link). This will make the howto more widespread and will save time for other people.
When you have found a good howto or manual and want to put a link to it in a reply, consider quoting it. Websites sometimes disappear, and seeing a dead link 3 years later (and answering the same thing again) isn't much fun. If the source of the article doesn't look "stable" enough, quote a good portion of it (but still provide the link). If the website is unlikely to disappear, then a link should be enough.
If you really pissed someone off, and were unable to pacify that person within a small number of replies (1..3), consider ignoring that person. This is because when someone thinks that you are an "evil person that hates all people" (no matter what your intentions were, what you posted, or how friendly you were, people might think that about you), you can waste a considerable amount of time reasoning with that person and explaining your position. If you have started such a "clarification conversation", observe the person's reaction, and if the person is unlikely to change attitude quickly, don't waste your time. Some people can be reasoned with, some can't.
When you see a newbie poster (one with 1 message on the forum) with a short question (literally short - one statement with poor punctuation, a small amount of info, etc), and you want to write a really long answer, stop, and think about it again. The poster might be a "fly-by homework question author" - i.e. he will ask the question and then mysteriously disappear, while other people waste their time explaining things to him and writing huge replies for a few weeks. Not all newbies magically disappear, but you should consider that possibility.
Don't do other people's homework; they normally won't truly appreciate it, and it will spoil your mood and take your time. I.e. if the guy asked how to make some script, do not engage in 3 hours of googling, researching and testing just to make that script for him, because in the end his "thanks" might not satisfy you. In my experience, answers that were written with little effort (a link to a howto or an article) generate more positive emotional feedback (for you - i.e. you'll feel happier) when people say "thanks" than answers where you spent hours finding a solution. Doing other people's homework makes sense only if the problem is extremely interesting to you and you really like to solve complicated problems. Notice that even if you have solved the problem, you can still choose not to provide the info to the OP, if you think he is too lazy or didn't do his homework.
Do not live on the forums. If you are checking forums for new messages every 3 minutes, looking for _any_ discussion to participate in, and occasionally attempting to derail existing threads, then you should probably turn off the computer and go jogging, watch a movie, take the dog for a walk, or do anything that is fun, takes at least an hour and doesn't involve computers. Helping people is fine and forums are good, but, unfortunately, for some people participating in such discussions is a bit addictive. And when you start spending too much time on the forums, the quality of your answers decreases.

That is it. I hope this information will be useful to someone.
