Archive for July, 2008

API design lessons

Tuesday, July 29th, 2008

At work today I was investigating a bug where the system shuts down in random code when you quit. There was no information as to why it crashed, and it crashed past the point where the debugger could catch it.

After a day of binary searching and logging, I found the culprit. The API provides a very slow asynchronous function. If you stop the relevant library before the function completes, it causes the shutdown crash noted above. It also causes the library to fail to reload after you shut it down. I actually was waiting for the asynchronous function, but my code to prevent blocking forever wasn’t waiting long enough:

DoAsynchFunction();
while (IsCompleted() && ElapsedTime() < 500)
Block();

This is like the 3rd time I’ve run into similar issues with this API. These kinds of design flaws (originally with DirectPlay) are what inspired me to write RakNet – a library that exposes functionality at the level users care about, rather than at the level the computer cares about.

Lessons that could be learned here

  1. Libraries need to clean up after themselves. If an asynchronous function is still processing, memory is still allocated, a dialog is visible, or whatever, just clean it up. It’s not acceptable to just crash instead, especially not at some random pointer later on.
  2. If an asynch function is going to take a longer than one would expect for what it does, it needs to be documented and the timeout controllable.
  3. If a library fails to load or reload, the error code needs to be specific and indicate how to solve the problem. This is about the 5th time with this API that I’ve gotten error messages such as “BUSY” or “INVALID STATE”
  4. If a function or library is going to fail, it should fail consistently. Failing half the time depending on irrelevant factors such as calls to other functions makes it very difficult to find the root of the problem.
  5. Functions should take care of their own dependencies when possible.

Examples of what RakNet does

  1. If you call Shutdown() at the same time you have an asynchronous send, disconnect, or connect pending, RakNet will block for the amount of time you specify, and then shut down, cleaning up these asynchronous functions for you. If you call CloseConnection(), you have the option to leave the connection open until pending messages are sent, and can also cancel at any time. RakNet has a specific test that calls every function at random for as long as you want, and it will run without crashing.
  2. Every asynch function in RakNet is fast – connecting takes on average 4 X your ping, disconnection takes from no time to 2 X your ping. The time to disconnect on no data sent is controllable, and you can cancel every asynch function.
  3. Most RakNet functions that can fail never do so in practice. This because RakNet will take care of dependencies for you, because it follows the principle that even if it’s 10X harder for RakNet to take care of than the user, it’s still worthwhile. Functions that can fail return specific codes – for example if you call an RPC with the wrong number of parameters, it will return this exact error, as well as the function called that caused this error.
  4. All RakNet functions that can fail do so consistently. RakNet does not make state assumptions about preconditions, which if untrue, cause inconsistent behavior
  5. If you call Connect() to a system you are already connected to, it will take over the existing connection. If you call it at the same time as the other system, both will complete. If the other system starts up later than your system, it will still complete. RakNet does everything it can to honor your request.

RakNet isn’t perfect and it would be easy to pick holes in cases where it doesn’t do these things. But I think the principle stands. It’s worthwhile for an API author to spend a week implementing a feature that would take the user 5 minutes to work around. The API author only has to spend the week once, but several hundred users may have to spend 5 minutes each.

New RPC system up

Sunday, July 27th, 2008

You can call functions on remote systems, with automatically serialized parameters, with a large range of supported types.

Video

RakNet brochures at Chinajoy

Thursday, July 24th, 2008

Housing bailout

Thursday, July 24th, 2008

Congress voted and passed a bill today in the House to bail out Freddie and Fannie May up to 800 billion dollars. So in other words, you and I are paying for greedy and/or stupid people who bought houses they couldn’t afford. And the stupid and/or greedy lenders that loaned to them. Responsible people, such as myself, who passed on insane house prices are currently living in apartments and end up paying for those that were irresponsible.

To make things worse, you’re not even bailing out the irresponsible home buyers. You’re bailing out the fat bankers that leveraged themselves that put themselves in this mess.

Why is it the goverment doesn’t step in when home prices go up 25% every year for 8 years? Yet, when the bubble bursts and prices go down 15% in one year it’s suddenly a crisis?

As usual, Ron Paul is the voice of reason.

http://www.house.gov/paul/index.shtml

boost::bind and boost::function for RPC

Friday, July 18th, 2008

Based on forum suggestions I’m looking into boost::bind and boost::function to do RPC calls rather than my current assembly code based system.

I may be misunderstanding what it does, but I believe that boost::function is a way to store function pointers (regular or object member). And boost::bind allows you to bind parameters to function pointers. Using templates, you can pass complex object types on the stack. My existing assembly implementation can only pass simple structures.

If I’m correct, I can do something like this:

void DoSomething(MyObject myObject) {…}
Call(“DoSomething”, myObject);
(On the remote system)
boost::function ptrToDoSomething;
(use boost::bind to bind a stack based instance of MyObject to DoSomething)
(Somehow call DoSomething)

I’m a bit vague on the details now but if it works it will compile out to what is essentially a native function call.

Another problem with my existing implementation of RPC is I am using memcpy for the parameter list. This won’t work with complex types that have custom serialization – even including strings. But by overloading < < and >> I can add support for this.

Normal types get a general template.

template <class templateType>
BitStream& operator<<(BitStream& out, const templateType& c)
{
out.Write(c);
return out;
}
template <class templateType>
BitStream& operator>>(BitStream& in, templateType& c)
{
bool success = in.Read(c);
assert(success);
return in;
}

The user can then overload types for a particular class, just replace tempateType with the name of the relevant class.

If this all works out I will finally have the ultimate implementation of RPC calls.

1. Works with C functions and object member functions, including those that use multiple inheritance, with an arbitrary inheritance order
2. Can pass types that use complex serialization (such as my string compression).
3. Can call functions that take complex types in the parameter list, types that cannot fit in a single register.
4. I think I can even do automatic object pointer lookups within parameter lists

In other words, the ability to call any authorized function on a remote system, using any type of parameter, including referenced pointers, with any type of data. Pretty exciting 🙂

Lobby code redesign

Wednesday, July 2nd, 2008

My current Lobby system for RakNet separated out the database calls from the C++ calls as follows:

1. Lobby operation is via a generalized operation, such as changing the status variable of a row. There is a unit test in a separate project for each lobby operation.
2. C++ operation is higher level, and uses the lobby operation in a more specific manner, such as setting your status to be a clan member.
3. Client code serializes the operation, does some local checks, and sends it to the server
4. Server code deserializes the operation, performs as many C++ checks as possible (returning error codes on failure) and performs the lobby operation in a thread
5. When the lobby operation completes, server serializes the result back to the calling client, updates internal state variables (such as creating a room), and sends notifications to other clients if needed.
6. Lastly, the client gets the serialized result, updates its own state variables (such as now in a room), and calls a callback that the user registered.

It’s a lot of work to even type all that, and every operation takes a HUGE time investment. I mean like half an hour of solid nonstop typing just for one operation, and there’s over 50 operations. That doesn’t even account for time spent debugging and documenting.

The problems I’ve been having is that there is too much code, tons of copy/paste, the system is somewhat buggy due to so much code, the same data can be duplicated as many as 3 times (database, server, and client), and it’s very hard and painful to change or write anything new. The system IS quite efficient though – many operations can be performed in C++ without ever accessing the database.

I am thinking of changing this as to use a stored procedure for each specific operation. ALL commands (login, create room, etc) are encapsulated into structures. The same structures are used for both input and output data. The structure is a functor, meaning it can be operated on in a thread. Therefore, all the client has to do is allocate a command, fill in the input details, and send it to the client interface. The client interface does nothing but calls the serialize function, which sends it to the server, which automatically adds it to a database processing thread. The thread will query and get results from a stored procedure on the database, serialize the results, and send it back to the client in the same structure. Lastly, a callback is called of the same name, and generated automatically through macros.

I think this would cut down on the total amount of code by 1/2. The lobby client and lobby server now do little more than call stored procedures. I would no longer have problems with complex database operations getting out of synch. And the system becomes scalable for free, because database operations are scalable to begin with. All I’d really need to handle on the server is network connectivity.