Alt+Ctrl+Backspace: snippet

2009/03/25

How to use sprintf/wsprintf with std::string/std::wstring

Explanation

One of the few problems with the std::string (and std::wstring) class in C++ is the lack of an sprintf-like function that returns a std::string. An sprintf-style constructor would also be nice to have. Of course, there are many alternatives to sprintf, such as the Boost Format library, std::stringstream, and the QString class in Qt 4, but so far plain old sprintf is the most compact and the easiest to use of them. A string stream needs several statements (lines) to build a simple formatted string, which is overkill for small strings unless you really like streams. Boost Format requires Boost, or at least a part of it. QString requires an entire Qt installation and acceptance of one of the few available licenses, and it still won't be as compact as sprintf.

Solution

The first thing that comes to mind is to create a temporary buffer, sprintf into it, and then assign the buffer to a std::string. As your program grows, you'll eventually get sick of all the temporary buffers, and each use still takes a few lines of code. Here is a better way to do it:
Str.h:


#ifndef STR_H
#define STR_H
#include <string>
#include <cstdarg>

typedef std::string Str;
typedef std::wstring WStr;

WStr swprintf(const wchar_t* format, ...);
WStr vswprintf(const wchar_t* format, va_list args);
WStr swprintf(const WStr& format, ...);
WStr vswprintf(const WStr& format, va_list args);

Str sprintf(const char* format, ...);
Str vsprintf(const char* format, va_list args);
Str sprintf(const Str& format, ...);
Str vsprintf(const Str& format, va_list args);

#endif

Str.cpp:


#include "Str.h"
#include <cstdio>
#include <cwchar>

WStr swprintf(const wchar_t* format, ...){
 va_list args;
 va_start(args, format);
 WStr result = vswprintf(format, args);
 va_end(args);
 return result;
}

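WStr vswprintf(const wchar_t* format, va_list args){
 const int bufferSize = 16384;
 wchar_t buffer[bufferSize];
 // The standard vswprintf returns a negative value on error or when the
 // output does not fit, and the buffer contents are then unusable.
 if (vswprintf(buffer, bufferSize, format, args) < 0)
  return WStr();
 return WStr(buffer);
}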

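// Caution: the C++ standard leaves va_start undefined when the last named
// parameter is a reference, as it is here. It may work with a given compiler,
// but the const wchar_t* overload called with format.c_str() is the safe choice.
WStr swprintf(const WStr& format, ...){
 va_list args;
 va_start(args, format);
 WStr result = vswprintf(format, args);
 va_end(args);
 return result;
}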

WStr vswprintf(const WStr& format, va_list args){
 return vswprintf(format.c_str(), args);
}

Str sprintf(const char* format, ...){
 va_list args;
 va_start(args, format);
 Str result = vsprintf(format, args);
 va_end(args);
 return result;
}

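Str vsprintf(const char* format, va_list args){
 const int bufferSize = 16384;
 char buffer[bufferSize];
 // A C99-conforming vsnprintf truncates output that does not fit and still
 // null-terminates it; a negative return value signals an error.
 if (vsnprintf(buffer, bufferSize, format, args) < 0)
  return Str();
 return Str(buffer);
}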

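// Caution: as with the WStr overload, va_start on a reference parameter is
// undefined behavior according to the C++ standard.
Str sprintf(const Str& format, ...){
 va_list args;
 va_start(args, format);
 Str result = vsprintf(format, args);
 va_end(args);
 return result;
}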

Str vsprintf(const Str& format, va_list args){
 return vsprintf(format.c_str(), args);
}

This lets you create a formatted string quickly, in a single function call.

Problems:

This code works with gcc on Linux, but it might need some tweaking for other compilers. In general, wchar_t-based printf functions can cause problems in cross-platform applications. For example, the default vswprintf in MinGW takes a different number of arguments: it has no "buffer size" argument, so it is vulnerable to buffer overruns (the Linux version does not have this problem) and has to be replaced by another function. In the other direction, the MSVC runtime offers a size-limited _snwprintf, which is missing in gcc.
Yet another problem is that the Linux and Windows versions of the wprintf-related functions might handle %s and %S differently. As far as I remember, %S in MinGW's swprintf does what %s does in the Linux version of printf, and vice versa. These problems can be partially worked around with a few hacks, but anyone writing portable applications with swprintf should be aware of them.
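One possible workaround for the %s/%S mismatch is to avoid both specifiers for wide strings and write %ls instead. The C standard defines %ls as a wide string in the printf and wprintf families alike, and the Microsoft runtime documents the l prefix the same way. The sketch below assumes the standard swprintf signature with a buffer size argument (which, as noted above, the default MinGW version may lack); the function name greet is made up for the example.

```cpp
#include <cwchar>
#include <string>

// Hypothetical example function: formats a wide string argument with %ls,
// which means "wide string" regardless of how the platform treats %s and %S.
std::wstring greet(const std::wstring& name) {
    wchar_t buffer[128];
    // Assumes the standard swprintf(buffer, size, format, ...) signature.
    int written = std::swprintf(buffer, 128, L"hello, %ls", name.c_str());
    // A negative result means an error or that the output did not fit.
    return written < 0 ? std::wstring() : std::wstring(buffer);
}
```

Whether this covers every compiler you care about still has to be checked on that compiler.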

Another problem is the hard-coded size limit for the created strings. It can be changed, but it still doesn't look "nice" next to C++ string classes. The limit can be bypassed for std::string: vsnprintf returns the number of characters the complete output needs (not counting the terminating null), so you can allocate a buffer of that size dynamically, format into it, and then assign it to the std::string. This does not work for std::wstring: vswnprintf is not available on gcc and doesn't look standard, and the standard vswprintf only returns a negative value when the buffer is too small.
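The dynamic allocation idea for std::string can be sketched roughly as follows. This is only a minimal sketch, assuming a C++11 compiler (for va_copy) and a C99-conforming vsnprintf; the names formatString and vformatString are invented here so that they do not clash with the wrappers above.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: formats into a buffer sized from vsnprintf's result.
std::string vformatString(const char* format, va_list args) {
    // The first pass consumes a copy of the argument list and only measures.
    va_list measureArgs;
    va_copy(measureArgs, args);
    int needed = std::vsnprintf(nullptr, 0, format, measureArgs);
    va_end(measureArgs);
    if (needed < 0)
        return std::string(); // encoding or format error

    // The second pass writes into a buffer with room for the terminating null.
    std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
    std::vsnprintf(buffer.data(), buffer.size(), format, args);
    return std::string(buffer.data(), static_cast<std::size_t>(needed));
}

// Hypothetical variadic front end, mirroring the sprintf wrapper above.
std::string formatString(const char* format, ...) {
    va_list args;
    va_start(args, format);
    std::string result = vformatString(format, args);
    va_end(args);
    return result;
}
```

With this approach a call such as formatString("%d items", 3) has no fixed upper limit on the result, at the cost of formatting the string twice.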

2009/03/07

How to find identical files on Linux using a Python script

If you have two directories (A and B) with a lot of files, and some files in B are the same as some files in A, you can use this Python script to find the identical files.

find_identical.py:

#!/usr/bin/env python
# Python 2 script: it relies on the "commands" module and on print statements.
import sys
import commands

if len(sys.argv) != 3:
    print """wrong number of arguments!
usage: find_identical.py source destination"""
    sys.exit(1)

src = sys.argv[1]
dest = sys.argv[2]

# The output is itself a shell script, so it starts with a shebang line.
print "#!/bin/sh"

# md5sum prints one "<hash>  <filename>" line per file.
srcText = commands.getoutput("md5sum %s/*" % src)
destText = commands.getoutput("md5sum %s/*" % dest)

def listFromString(val):
    # Return [hash, filename] pairs. Lines that are not checksum lines
    # (for example md5sum error messages, which getoutput() also captures)
    # are skipped.
    result = []
    for line in val.split("\n"):
        fields = line.split(None, 1)  # split once: filenames may contain spaces
        if len(fields) == 2 and len(fields[0]) == 32:
            result.append(fields)
    return result

def dictFromString(val):
    # Map each hash to one of the files that has it.
    result = {}
    for fields in listFromString(val):
        result[fields[0]] = fields[1]
    return result

srcDict = dictFromString(srcText)
destList = listFromString(destText)
filesFound = False
for file in destList:
    if file[0] in srcDict:
        print "rm \"%s\" #identical to %s" % (file[1], srcDict[file[0]])
        filesFound = True

if not filesFound:
    print "#no identical files found"
    sys.exit(1)

The script takes two arguments: the names of the first and the second directory. By default it prints (to standard output) a shell script that removes the files in directory B that are also present in directory A.
So, to remove the files from B that are already present in A, run:

find_identical.py A B >applychanges.sh
bash applychanges.sh

It is done this way so that you can examine the list of files before they are removed. This behavior can easily be changed by modifying the line:

 print "rm \"%s\" #identical to %s" % (file[1], srcDict[file[0]])

to whatever you need.

The script needs md5sum to work.
Note that Linux software that does the same thing (finds identical files) already exists, so this script is mostly useful for learning Python or as a base for another script.

C++ utf8 to wstring conversion routine

Some time ago I was looking for a C++ routine that could convert a UTF-8 string (stored in a std::string) into a std::wstring and vice versa. It is certainly possible to do that with setlocale and some C functions, but I wanted something done in "pure C++". I did some research, didn't find anything, and in the end wrote the conversion routine myself (using information from Wikipedia).

Here is the code:

Str.h:

#ifndef STR_H
#define STR_H
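#include <string>
#include <iostream>

typedef std::string Str;
typedef std::wstring WStr;

std::ostream& operator<<(std::ostream& f, const WStr& s);
std::istream& operator>>(std::istream& f, WStr& s);
void utf8toWStr(WStr& dest, const Str& src);
void wstrToUtf8(Str& dest, const WStr& src);
// Convenience overloads that return the result; both are defined in Str.cpp.
Str wstrToUtf8(const WStr& str);
WStr utf8toWStr(const Str& str);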

#endif

Str.cpp:

/*
Copyright (c) 2009 SegFault aka "ErV" (altctrlbackspace.blogspot.com)

Redistribution and use of this source code, with or without modification, is
permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright
 notice, this list of conditions and the following disclaimer.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO
EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "Str.h"
#ifdef UTF8TEST
#include <cstdio>
#endif

void utf8toWStr(WStr& dest, const Str& src){
 dest.clear();
 wchar_t w = 0;
 int bytes = 0;
 wchar_t err = 0xfffd;//U+FFFD REPLACEMENT CHARACTER, emitted for invalid input
 for (size_t i = 0; i < src.size(); i++){
  unsigned char c = (unsigned char)src[i];
  if (c <= 0x7f){//first byte
   if (bytes){
    dest.push_back(err);
    bytes = 0;
   }
   dest.push_back((wchar_t)c);
  }
  else if (c <= 0xbf){//second/third/etc byte
   if (bytes){
    w = ((w << 6)|(c & 0x3f));
    bytes--;
    if (bytes == 0)
     dest.push_back(w);
   }
   else
    dest.push_back(err);
  }
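  else if (c <= 0xdf){//2byte sequence start
   if (bytes)//previous sequence was cut short
    dest.push_back(err);
   bytes = 1;
   w = c & 0x1f;
  }
  else if (c <= 0xef){//3byte sequence start
   if (bytes)
    dest.push_back(err);
   bytes = 2;
   w = c & 0x0f;
  }
  else if (c <= 0xf7){//4byte sequence start
   if (bytes)
    dest.push_back(err);
   bytes = 3;
   w = c & 0x07;
  }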
  else{
   dest.push_back(err);
   bytes = 0;
  }
 }
 if (bytes)
  dest.push_back(err);
}

void wstrToUtf8(Str& dest, const WStr& src){
 dest.clear();
 for (size_t i = 0; i < src.size(); i++){
  wchar_t w = src[i];
  if (w <= 0x7f)
   dest.push_back((char)w);
  else if (w <= 0x7ff){
   dest.push_back(0xc0 | ((w >> 6)& 0x1f));
   dest.push_back(0x80| (w & 0x3f));
  }
  else if (w <= 0xffff){
   dest.push_back(0xe0 | ((w >> 12)& 0x0f));
   dest.push_back(0x80| ((w >> 6) & 0x3f));
   dest.push_back(0x80| (w & 0x3f));
  }
  else if (w <= 0x10ffff){
   dest.push_back(0xf0 | ((w >> 18)& 0x07));
   dest.push_back(0x80| ((w >> 12) & 0x3f));
   dest.push_back(0x80| ((w >> 6) & 0x3f));
   dest.push_back(0x80| (w & 0x3f));
  }
  else
   dest.push_back('?');
 }
}

Str wstrToUtf8(const WStr& str){
 Str result;
 wstrToUtf8(result, str);
 return result;
}

WStr utf8toWStr(const Str& str){
 WStr result;
 utf8toWStr(result, str);
 return result;
}

std::ostream& operator<<(std::ostream& f, const WStr& s){
 Str s1;
 wstrToUtf8(s1, s);
 f << s1;
 return f;
}

std::istream& operator>>(std::istream& f, WStr& s){
 Str s1;
 f >> s1;
 utf8toWStr(s, s1);
 return f;
}

#ifdef UTF8TEST
bool utf8test(){
 WStr w1;
 //for (wchar_t c = 1; c <= 0x10ffff; c++){
 for (wchar_t c = 0x100000; c <= 0x100002; c++){
  w1 += c; 
 }
 Str s = wstrToUtf8(w1);
 WStr w2 = utf8toWStr(s);
 bool result = true;
 if (w1.length() != w2.length()){
  printf("length differs\n");
  //std::cout << "length differs" << std::endl;
  result = false;
 }
 
 printf("w1: %ls\ns: %s\nw2: %ls\n", w1.c_str(), s.c_str(), w2.c_str());//%ls is the standard wide-string specifier
 
 for (size_t i = 0; i < w1.size(); i++)
  if (w1[i] != w2[i]){
   result = false;
   printf("character at pos %lx differs (expected %.8lx got %.8lx)\n", (unsigned long)i, (unsigned long)w1[i], (unsigned long)w2[i]);
   //std::cout << "character at pos " << i  << " differs" << std::endl;
   break;
  }
  
 if (!result){
  printf("utf8 dump: \n");
  for (size_t i = 0; i < s.size(); i++)
   printf("%2x ", (unsigned char)s[i]);
 }
 
 return result;
}

int main(int argc, char** argv){
 std::wstring ws(L"фыва");
 std::string s("фыва");
 std::cout << ws << s << std::endl;
 std::cout << wstrToUtf8(utf8toWStr("фыва")) << std::endl;
 if (utf8test())
  std::cout << "utf8Test successful" << std::endl;
 else
  std::cout << "utf8Test failed" << std::endl;
 return 0;
}
#endif

The code was successfully tested on a 32-bit Linux system (see the "utf8test()" routine) and seems to work. It should work on 32-bit Windows as well, but keep in mind that wchar_t in MSVC is 2 bytes wide, so on Windows the routine won't handle Unicode characters in the range 0x10000..0x10ffff.
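Where wchar_t is 2 bytes wide, characters above 0xffff are normally stored as UTF-16 surrogate pairs, which the routine above does not recognize. The following is a minimal sketch of how such a pair could be combined into a single code point and then encoded with the same bit layout as the 4-byte branch of wstrToUtf8. All function names are hypothetical and not part of the original code.

```cpp
#include <string>

// Hypothetical helpers: classify a 16-bit UTF-16 code unit.
bool isHighSurrogate(unsigned int unit) { return unit >= 0xd800 && unit <= 0xdbff; }
bool isLowSurrogate(unsigned int unit) { return unit >= 0xdc00 && unit <= 0xdfff; }

// Combine a high/low surrogate pair into a code point in 0x10000..0x10ffff.
// The caller is expected to check both units with the helpers above first.
unsigned long combineSurrogates(unsigned int high, unsigned int low) {
    return 0x10000UL
        + ((static_cast<unsigned long>(high) - 0xd800UL) << 10)
        + (static_cast<unsigned long>(low) - 0xdc00UL);
}

// Encode a code point in 0x10000..0x10ffff as four UTF-8 bytes.
std::string encodeSupplementary(unsigned long cp) {
    std::string out;
    out.push_back(static_cast<char>(0xf0 | ((cp >> 18) & 0x07)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3f)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3f)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3f)));
    return out;
}
```

A Windows-aware version of the encoder could check isHighSurrogate on the current character and isLowSurrogate on the next one, and emit a replacement for any unpaired surrogate.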

If you need a routine like this, feel free to use it; just don't claim you wrote it.

The code is available under a modified BSD license.
