Entries Tagged 'scalability' ↓

The Evil Query

Here is the use case: I want to search all of my bookmarks by multiple tags with an “and” operation. That means I want to see bookmarks that are tagged with Java AND JavaEE, not just Java and not just JavaEE. I want to see bookmarks tagged with C# AND 3.5 not just C# and not just 3.5. For most of the project I’m using Hibernate and the Criteria API with a lot of success. In this case I was not able to massage Hibernate into creating the correct query and that’s fine. Hibernate has its uses and it is acceptable to write custom SQL when the need arises. I wrote a method to build up the query and execute it. Here is part of the method that builds up the query:

StringBuilder sqlString =
new StringBuilder(
“select distinct(b.id)
from bookmarks b “);
int counter = 0;
for(Tag tag : tags) {
String alias = “j” + Integer.toString(counter);
String joinFragment = ” inner join bookmarks_to_tags {alias}
on b.id = {alias}.bookmark_id
and {alias}.tag_id = ? “;
joinFragment = joinFragment.replace(“{alias}”, alias);
sqlString.append(joinFragment);
counter++;
}

Right away it looks like this query will get very expensive for large tag sets. For smaller sets the query performs in a reasonable amount of time but more than 12 tags makes the query take quite a while. This is exacerbated by the fact that this query will be executed multiple times if there is a sufficiently large result set (default is over 25) to give the user a paginated view. This totally flies in the face of being nice to the database. Hammering it with a complex query over and over won’t do the users any favors.

What are my options here? I can work on making this query less complex (something that I will certainly do), but there is still the problem that, no matter how nice this query becomes, the query will be executed over and over as a user pages through the results. The complexity of the query and its repeated execution makes the result set a good candidate for caching. Caching the result set in memory with a key tied to the user or their session will allow the expensive query to be executed once with the bookmarks stored in a cache. When the user pages through the results the cache will be consulted first where the result set will be stored. I’m in the process of selecting a caching package for this project so I will post my selection and integration process soon.

Until then, I’d be happy to hear any suggestions on making this query a little nicer while preserving its “andness”.

Spring Web MVC and the benefits of pagination

Lately I’ve been working on a project that keeps my web bookmarks sync’d across multiple computers and different web browsers. The project is implemented in Java using JBoss 4.2.2, MySQL 5.0 and Spring 2.0. It’s not a very big project right now, only about 40 class files and some small number of configuration files, but the amount of data the application deals with can get fairly large.
Continue reading →

Boston Scalability User Group

I’ve been digging around lately for a Boston area user group dedicated to architectural scalability and I haven’t been able to find one. Other user groups that I regularly attend have meetings centered around scalability once in a while, but I’m looking for something with a schedule dedicated to the topic. It’s a hot topic in the industry right now with a lot happening on different fronts and there should be some ongoing professional discussion dedicated what goes into growing an application.

Here are some of the topics I want to talk about:

  • Data growth and access - What kind of DBMS topologies allow maximum scalability without breaking the budget? What if the budget wasn’t a problem? How do you migrate your data model from hundreds of users to millions of users? Should you partition user data across shards or keep it in a central database? How do you profile data access paterns? Is your application mostly-read or mostly-write? When should you use an LDAP directory for data storage? When should you use MySQL and when should you use Oracle?
  • Application server scaling - app server clustering, web server integration, load balancing, application session management, data caching
  • Web Tier - load balancing reverse proxies, data caching, working with HTTP, considerations for exposing functionality via REST
  • Language conisderations and platform choices - Dynamic languages vs. Static languages, Linux vs. Windows, Solaris vs. Linux, RedHat vs. Oracle, IBM vs. Oracle - How do you make these choices? What evaluation critera are most important? Which ones are misleading?
  • Framework scalability - How can you scale if your framework can’t? Does Rails really scale better than Spring Web MVC? Where do PHP frameworks fit in? What are the alternatives? This is where we will investigate what you gain and lose by binding yourself to a particular framework, how to keep the coupling to a minimum and how to use your framework as a solid foundation instead viewing it as a cage.
  • Emerging technology - Should you become familiar with Map Reduce and Hadoop? What kind of impact do object databases have on your application? Should you buy your own Sun or Dell boxes or use Amazon’s EC2 and S3?
  • Application architecture - How do you write an application that scales? What does your application look like as it grows from servicing hundreds to millions of users? How does it handle session management? How does it access datastores? Are there any design patterns that are helpful? What are the anti-patterns to be aware of?

Like I said, I haven’t found a group dedicated to discussing these topics. If you know of one in the Boston area please let me know. Assuming there isn’t one I am willing to start one. I’d like to start off small and meet at coffee shops around Burlington or Lowell. I have no delusions that this is going to start off or even become as big as NEJUG is now. If it starts off as a few people getting together to talk about scalability trends, cool caching solutions and specific products then I’d call it a good start.

The meeting location is still TBD and will be based on how many people are interested in attending. It will probably be somewhere in Burlington, MA. There is no planned presentation at this time. Instead we will have a meet and greet and then discuss scalability trends and news, what approaches to scalability are commonly used now and what are the plans for the future.

If you are interested in coming or in finding out more information please email me at <my first name>@<my domain name>.<my tld> or leave a comment below. Please make sure to include your email address when you fill out the form so that I can get in touch with you - your email address will not be displayed on my site.