Archive for the ‘Code’ Category

HipChat and the little connection that could

Three weeks ago, we introduced HipChat’s brand new, badass web client. It’s fast, beautiful and built to change how people connect. Needless to say, we’re incredibly proud of it. But, as much as we wanted a perfect launch, we weren’t so lucky: if you tried to use the client in the first week or two, you might have noticed a few hiccups.

Sorry about that.

In the spirit of Open Company – No Bullshit, we wanted to keep our users informed about the recent outages and what we did to fix them. Some of those outages degraded other areas of HipChat, slowing our main website and message delivery. We’ve made moves to strengthen our web client’s stability so these issues never happen again.

How connecting to the HipChat web client works, at 10,000 feet

  1. You log into www.hipchat.com, creating a session with HipChat’s web layer.
  2. After logging in, you click Launch the web app which, in the web layer, creates a session with our BOSH Server.
  3. Once connected, our BOSH server in turn creates a session with our XMPP server.

In this chain, our BOSH Server is the weakest link. It wasn’t standing up to the popularity of the new client. And unfortunately, it’s coupled to our main web tier in a really bad way.

As our BOSH server came under pressure, it triggered a large number of sessions to reconnect. This, coupled with other issues, would cause hipchat.com to degrade. This is what happened the last week of March.

The little connection that could

With the new web client, the goal was to improve client reconnection, allowing HipChat to maintain resiliency toward network changes, roaming, outages, etc.

Previously, HipChat’s web client attempted reconnection every 10 – 30 seconds following a disconnection. This time around, we wanted a better experience: reconnecting as “automatically” as possible, hoping users never noticed a thing.

To do this, we decreased the connection retry interval from 10-30 seconds down to 2 seconds. This drastically shortened interval, combined with a surge of new users, strained our system. When we rewrote the hipchat-js-client, we tried to ensure reasonable polling rates, with exponential back-off and an eventual timeout.

Here’s what the new reconnect model looked like:

[Figure: web client reconnect model]

The initial reconnection attempts were too aggressive for the amount of traffic we saw. So, our first action was to quickly update the back-off rate and initial poll time to be more reasonable.

The problem with exponential back-off

As always, things get complicated when we consider this at scale (webscale). Let’s say a large number of clients become disconnected at once due to a BOSH node failure. With our current reconnection model, we saw the following traffic pattern:

[Graph: reconnection traffic over time with exponential back-off]

(Above example from AWS Blog, not actually pulled from HipChat, but you get the idea.)

Well, that’s not that much more awesome.

We’ve effectively just bunched all the reconnection requests into a series of incredibly high-load windows where all of the clients compete with each other. What we really want is more randomness, so we implemented a heavily jittered back-off algorithm. This gives us the fewest competing clients at any one moment, and still encourages the clients to back off over time.

waitTime = min(MAX_WAIT, random_integer_between(MIN_WAIT, lastComputedWaitTime * BACKOFF_RATE))
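The formula above can be sketched in a few lines of Python. This is an illustration only, not the hipchat-js-client code: the constant values and the helper names `next_wait` and `reconnect_schedule` are assumptions made for the example.

```python
import random

# Assumed values for illustration; the post does not state the real ones.
MIN_WAIT = 2        # seconds
MAX_WAIT = 300      # seconds
BACKOFF_RATE = 3    # growth factor applied to the previous wait


def next_wait(last_wait, rng=random):
    """Pick the next wait: a random integer between MIN_WAIT and
    last_wait * BACKOFF_RATE, capped at MAX_WAIT."""
    upper = max(MIN_WAIT, last_wait * BACKOFF_RATE)
    return min(MAX_WAIT, rng.randint(MIN_WAIT, upper))


def reconnect_schedule(attempts, seed=None):
    """Return the waits a single client would use for successive retries."""
    rng = random.Random(seed)
    wait = MIN_WAIT
    waits = []
    for _ in range(attempts):
        wait = next_wait(wait, rng)
        waits.append(wait)
    return waits
```

Because every client draws its own random wait, two clients that drop at the same instant are unlikely to retry at the same instant, while the growing upper bound still spreads retries further apart over time.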

[Graph: reconnection traffic over time with jittered back-off]

(Again, this example from AWS Blog. They have prettier graphs.)

This model has had a huge impact, and made the service much more resilient.

Untangling the Gordian knot

As mentioned, our BOSH server and our web tier are unfortunately coupled. Currently, it’s the web tier’s job to attach a pre-authed BOSH session to new clients. We do a lot of nginx hackery to ensure that your web session and your BOSH session are live, and are routed to the same box. This means anytime a web client reconnects, it hammers on its corresponding web box making both unstable.  This also makes scaling our BOSH server really tricky. And worse, it prevents service isolation since we shared a lot of resources between our web site and HipChat’s web client.

As of March 26th, we’ve deployed changes that allow our web sessions and BOSH sessions to be uncoupled. In fact, all of our new web client users are already using this new auth method. This means we can scale our main website and our web client independently. We’ve already set up isolated worker pools for each. Together, these changes should ensure a misbehaving web client doesn’t cause a dead hipchat.com.

Double the trouble, double the fun

Since we knew session acquisition was our biggest pain point, we combed through our connection code, looking for ways to make it less expensive. We noticed that it was double-hitting Redis in some cases. A fix was quickly deployed, and the results?

[Graph: Redis double-query fix]

They speak for themselves.

How’s it looking?

Since we made these changes, distribution of load on our system has been much improved. In the graphs below, the white lines show the start of Friday 3/27.

[Graph: last 4 days of traffic]

Four days of traffic prior to change (Tue – Fri)


[Graph: last 14 days of traffic]

Preceding two weeks of traffic (Mon – Fri, Mon – Fri), notice/compare Fridays (end user platform use level is approximately the same).

Many thanks

We’ve got a long list of stability and performance fixes in the pipeline to keep up with amazing growth in demand for HipChat. Thanks for your patience and support. (heart) (hipchat).

Announcing the new HipChat Web beta!

5 months ago Josh Devenny 11 Comments

We launched the new HipChat Mac app, then followed that with the new HipChat iOS app. And now, it’s the web’s turn!

We’re proud to introduce the new HipChat Web beta. It’s a brand new client, re-built from the ground up to be much faster. It’s beautiful, speedy and packed with even more features than before.

[Image: HipChat Web beta]

A redesign

The Atlassian Design Guidelines are the guidelines (naturally) we follow when designing Atlassian products, services and add-ons. The emphasis is on a lighter, cleaner design that focuses on conversations and files you share with your team.

We followed the spirit of these guidelines in the new Mac app, have done the same in the iOS app and now the web app. It will definitely feel familiar if you currently use existing Atlassian products.

A revamp

We haven’t just given it a coat of paint, though. We’ve added a few more features that will make you smile! Some of the things we’ve added:

  • A new right hand sidebar, which breaks out People, Links and Files – this gives you easy access to everything that has been shared in the chat.
  • A new left hand sidebar, which splits your Room chats from your 1-1 chats
  • A new header in rooms, which shows you the title and topic, as well as a new header in 1-1 chats, which shows you timezone and personal details.
  • Emoticon autocomplete – yes, we complete you. Just type ( and we will take it from there.
  • Beautiful file previews – on upload, in the chat view, and in the side bar. See your files in full detail.
  • Invite your team – we added a link in the header which allows you to invite email addresses, or mailing lists – making it easier than ever before to add your team to HipChat.
  • YouTube videos play inline – just press play.

A rebuild

It’s more than just a redesign and a revamp – the new HipChat Web is a brand new application. We set out to build the fastest, most reliable web app we could, and that took a new set of tools. We built the new HipChat Web in React.js, which has proven to be a fantastic experience. You can read about the rebuild in much more detail.

Try it out!

We shipped the opt-in beta to a subset of our users a few weeks ago and they have provided us with loads of feedback!

We’ve been working hard to make sure that the new web client is awesome – there are still a few bugs that we need to fix, and a few features we need to add, but we think it’s ready for everyone to use. So today we’re turning the opt-in dialog on for all of our users. You can also click the button below to be opted in.

A note on some things that are still missing:

  • Video and voice call support – you’ll have to use a native client or the old web client for calls, for the time being.
  • IE support – it will support IE10+, however there are lots of IE bugs, so it’s best if you use Chrome or Firefox for now.
  • Some slash commands – /code, /me and /quote are the only supported commands right now.

We’re working hard to make the new client better than ever, and will be switching it on to 100% of users in the near future. Until then, you will be able to opt-out. Remember that you can click the “Feedback” button to let us know what you think.

Try the beta!

HipChat has achieved the ultimate in awesomeness. We give you: The Caffeinator.

Coffee is the center of the tech geek universe; the bean is our sun to which we gravitate.

 

With HipChat’s explosive growth, what used to be an easy coffee run became a trek around the building with folks handing over cards, cash and various forms of wampum. Couple that with people wanting their coffee black, with skim milk, almond milk, a Slim Jim hanging off the top – the ordering process got hectic.

 

This is coffee, not a Bloody Mary!

We started with a handwritten list. Soon after, we realized, “wait, we’re a tech company. Let’s solve this.”

Next was a list application. It worked well enough, but there were issues with getting folks on board. The app was static and without any back end or data persistence – it was just people and coffees.

So, the list application was 86ed.

Someone needed to up the ante: someone needed to create…a HipChat Bot.

Tired of lists, Leo Balan created the Caffeinator, a HipChat Connect add-on. Over a weekend, Leo cobbled some time together during his daughter’s naps. His desire for easy coffee ordering was that strong.

Leo assumed that writing a HipChat Bot, testing it and deploying it into the Product Growth room would be hard. Turns out, it was pretty easy.

Coffee was on its way

Following a comprehensive and easy to follow guide, Leo created The Caffeinator. That informative guide became Leo’s bible.

Leo set up his dev environment and the results looked promising. He was pretty sure he pulled it off. The next step was creating a HipChat account with a test room.

The basic breakdown is pretty simple:

Create a “coffee run group”, and then add everyone you want to include. People will input their default coffee order.

Every morning The Caffeinator will reset, and all members have to do is opt-in. Your default order will be saved – a simple command will earn you your daily brew.

 

Alphabetical by last name, you can see the next three people whose turn it is to buy. Once someone has bought, they can either “/done” or someone can do it for them. The buying order automatically updates.
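The flow described above (default orders, a daily opt-in reset, and a buying order that rotates alphabetically by last name) can be modeled roughly like this. It is a toy sketch in Python, not The Caffeinator's actual code; the `CoffeeRun` class and all of its method names are made up for illustration.

```python
from collections import deque


class CoffeeRun:
    """Toy model of a coffee run group with a rotating buyer."""

    def __init__(self, members):
        # members: iterable of (first_name, last_name) tuples.
        # Buying order is alphabetical by last name.
        ordered = sorted(members, key=lambda m: (m[1].lower(), m[0].lower()))
        self._buyers = deque(ordered)
        self._defaults = {}      # member -> default coffee order
        self._opted_in = set()   # members who want coffee today

    def set_default(self, member, order):
        self._defaults[member] = order

    def opt_in(self, member):
        self._opted_in.add(member)

    def reset(self):
        # Daily reset: everyone has to opt in again; defaults are kept.
        self._opted_in.clear()

    def todays_orders(self):
        return {m: self._defaults.get(m, "no default set") for m in self._opted_in}

    def next_buyers(self, count=3):
        # The next people whose turn it is to buy.
        return list(self._buyers)[:count]

    def done(self):
        # The current buyer has bought: move them to the back of the line.
        self._buyers.rotate(-1)
```

Calling `done()` is the equivalent of the “/done” command: the person at the front moves to the back, and `next_buyers()` shows the updated order.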

 

So now, coffee orders around the HipChat offices are pretty easy.

I need this for my team. Can I use it?

The Caffeinator came together quickly, so there’s room for improvement. We’re sharing the code with our users because we want to challenge our community. What’s next for The Caffeinator?

HipChat Connect add-ons are brimming with untapped potential; if we’re creating applications that can tackle an office coffee run, what’s the next innovation of tomorrow?

Get involved, check out the code, make improvements, offer suggestions.

Coffee today, Mars tomorrow?

 

Bring your code into the conversation with the HipChat Bitbucket integration

9 months ago Jeff Park 7 Comments

If you’re a Bitbucket user, we’ve got some great news for you. We’ve worked with the Bitbucket team to create a new integration to provide more useful information about your repositories within your HipChat rooms. 

The new integration includes notifications for the following actions:

  • Commits – when a new commit is pushed, or commented on
  • Pull requests – when a pull request is created, commented on, merged, or declined
  • Issues – when an issue is created, updated, or commented on

By piping notifications into your chat room, you can hold a conversation about the latest commits, pull requests, and issues with your developers right away. HipChat lets you host lightweight code reviews right within your chat room and ship code faster.

Bitbucket notifications in HipChat

This integration is pre-installed on all HipChat groups. You can find instructions on how to configure the new integration in our documentation.

If you’re new to Bitbucket, be sure to check it out – unlimited private Git or Mercurial repositories. Host your code in the cloud, for free.

Embedding HipChat

2 years ago Don Brown 6 Comments

Ever wanted to embed a little chat panel into your page? Well, now you can with the jQuery HipChat Plugin:

  1. Write this:
    $(function() {
      $('.any-div-class').hipChatPanel({
        url: "YOUR_GUEST_ACCESS_URL",
        timezone: "PST"
      });
    });
  2. To embed this:

Digging Deeper

There are actually several things going on here.

First, we added a “minimal mode”, which renders the web client without chrome whether you are a guest or normal user. Second, we added an “anonymous mode” that allows a guest to instantly connect to a room with minimal hassle. For more information on these and other new parameters, see the embedding HipChat knowledge base article.

Under the covers, the jQuery plugin is basically creating an iframe with both minimal and anonymous mode enabled. By default, the plugin will stick a “Chat” button on an area of the page. When clicked, the button will open up a new anonymous chat session. However, if you don’t use jQuery or want full control, you can use the new query parameters in the guest access URL.

This effort is but one of several new features we’ll be announcing soon to make integrating and even building on HipChat easy and accessible. If you are a developer and have ideas on how the HipChat integration experience could be improved, please don’t hesitate to let us know in the comments or, better yet, the suggestions forum!

Performance Tuning iOS – Making Mobile Fast

3 years ago Chris Rivers 0 Comments

Disclaimer: The following post is intended for a somewhat technical audience. If you come across any unfamiliar words or phrases, please refer to your local software developer.

The problem

Our iOS app has suffered from pretty abysmal performance issues for any groups with more than a couple dozen users and rooms. Since joining Atlassian, we’ve experienced the issue first hand even more acutely (the Atlassian HipChat group has hundreds of users and rooms). We had to do something about it.

The analysis

Before any good performance update comes the part where you close your eyes and get the raw numbers about how slow your current app is. Some of our findings:

  1. Avoid blocking HTTP requests. Before we could even try to connect to our chat servers, we had to hit an HTTP API to get user and server information. This request alone would take ~1.5 seconds to complete on WiFi and a whopping 3s on average for a 3G connection.
  2. Minimize round trips to the server. XMPP has a well-defined spec for securing the chat connection. Unfortunately, it involves no fewer than 3 back-and-forths between the client and server. Add authentication and session setup onto that and you end up with a good 6 back-and-forths.
  3. Avoid updating the UI when getting lots of data. Apart from basic network latency, the other thing that makes the app slow is doing the actual work of reading XMPP and setting up all the stuff you actually see. This turned out to be costing a ton of resources since each time we received a user’s presence (the data that tells us whether they’re available/away/dnd, etc), we would update the Lobby (assuming you were viewing the Lobby, which was quite likely since that’s where the app opens to). Calculated out, it turned out that we were spending 12-15 seconds just handling everyone’s presence for a big group like Atlassian. Definitely unacceptable.
  4. Stringprep –  Y U SO SLOW. Most XMPP libraries use a process called “stringprep” when creating JID objects (JIDs are the unique user identifying strings used in XMPP). Stringprep ensures that your JIDs are valid and conform to the spec. Unfortunately, we found that stringprep was responsible for nearly 50% of our CPU usage on the phone (we create lots of JIDs). Fortunately, we run a closed system where we can validate JIDs beforehand. This became the easiest fix of all: remove stringprep :)
  5. iOS offers sophisticated local storage for a reason. Before this update, we made minimal use of local storage on the phone. The problem was that we were just clearing it out completely when you reconnected. Not exactly the best use of the Core Data framework.

The changes

  1. Previously on connection, something like this happened (times are for a 3G connection in San Francisco):
    1. PHONE: HTTP Request to get info about which chat server to connect to (~1-3 seconds)
    2. WEB SERVER: “Here’s the information you requested – use it to connect to SERVER”
    3. PHONE: “Hey SERVER, I’m gonna start a session” (0.3 seconds)
    4. SERVER: “Ok – but first we gotta make sure nobody can snoop on our conversation. Let’s use something called TLS”
    5. PHONE: “Oh, ok, I know TLS! I officially request we secure using TLS” (0.3 seconds)
    6. SERVER: “You may proceed” … <at this point, the TLS handshake happens and all data sent and received is encrypted> (0.4 – 0.6 seconds)
    7. PHONE: “Phew – ok, let’s start that connection again” (0.3 seconds)
    8. SERVER: “Connection started, but first, you need to authenticate”
    9. PHONE: “Got it – here’s the username/password info my user provided” (0.3 seconds)
    10. SERVER: “Glorious success! Your credentials were correct.”
    11. PHONE: “Great, let’s really start a chat session now” (0.3 seconds)
    12. SERVER: “Sure thing – but wait, I need an identifier for this session so I can tell it apart from that connection you made from your desktop”
    13. PHONE: “Oh, ok – how about we call it the ‘iphone’ session?” (0.3 seconds)
    14. SERVER: “That is acceptable – here is your full identifying string…”
    15. PHONE: “Cool, now I officially am starting a session…” <after this we actually start getting chat data>
    Total time: 3.2s – 5.4s, just to start getting actual chat data from the server
  2. On a 3G connection, when each request to the server can take up to several hundred milliseconds, this is pretty terrible. So we set out to reduce the number of required requests as much as possible. This is what we came up with:
    1. <we already know the server to hit, no need to ask WEB SERVER anymore> <the SSL handshake happens as soon as we open the connection to SERVER> (0.4 seconds) PHONE:  “Hey SERVER, let’s start a session” (0.3 seconds)
    2. SERVER: “Sure thing – connection has been opened. Please choose an auth method: <includes a special HipChat-only auth>”
    3. PHONE: “Here’s the authentication data, including the identifier for my session” (0.3 seconds)
    4. SERVER: “Excellent – authentication validated. Here is your full identifying string. You may now request data”
    Total time: 1 second
  3. Finally, the other major app slowdown happened when we received the flood of initial presences right after logging in. These triggered a cascading update of displayed data in the Lobby. And as anyone who has done UI performance tuning knows, having lots of display drawing to do can cripple a user experience. Our solution: have the server let us know when the flood of presences is done, then update the UI. The result: 80% fewer cycles spent during the connection process.
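The presence batching described in the last item can be sketched as follows. This is a language-neutral illustration written in Python, not the iOS app's code: the `Lobby` and `PresenceHandler` classes are stand-ins, and `on_initial_presence_done` is a hypothetical name for the server's “flood is done” signal.

```python
class Lobby:
    """Stand-in for the UI: stores presence and counts redraws."""

    def __init__(self):
        self.presence = {}
        self.redraws = 0

    def redraw(self):
        self.redraws += 1


class PresenceHandler:
    """Buffers the initial presence flood and redraws the Lobby once."""

    def __init__(self, lobby):
        self.lobby = lobby
        self.batching = True   # true until the server says the flood is over
        self._pending = {}

    def on_presence(self, jid, status):
        if self.batching:
            # Hold the update; do not touch the UI during the flood.
            self._pending[jid] = status
            return
        # After the flood, updates are rare, so redraw immediately.
        self.lobby.presence[jid] = status
        self.lobby.redraw()

    def on_initial_presence_done(self):
        # Apply everything that was buffered with a single redraw.
        self.lobby.presence.update(self._pending)
        self._pending.clear()
        self.batching = False
        self.lobby.redraw()
```

With 500 initial presences, this version redraws the Lobby once instead of 500 times; later presence changes still update the UI as they arrive.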

For the future…

Unfortunately, when we began writing the iOS app, it was on the cusp of the release of iOS 4 (which plagued some devices with performance problems), so we were hesitant to use any of the brand-new features. We still have work to do to move our app over to ARC. We still don’t make nearly enough use of Grand Central Dispatch (mostly because we don’t use the most current version of XMPPFramework). We also have some server improvements to make that will let us further speed up the app for really big teams. XMPP offers a spec for roster versioning which could replace the full roster refreshes we do now. This is just a first step to making the app more usable for larger teams (and part of how we’re making sure that our Native OSX App is going to be lightning fast). The update is available on the App Store today. Download it and let us know what you think!

Capistrano notifications in HipChat

Our friends at Mojo Tech recently released a simple Ruby wrapper for the HipChat API which has special support for Capistrano, a popular deployment framework. After adding a few lines to your Capistrano scripts you’ll receive room messages during deployments, rollbacks, and migrations.

To install the gem, run:

$ gem install hipchat

Then add the following to your Capistrano script:

require 'hipchat/capistrano'

set :hipchat_token, "your token"
set :hipchat_room_name, "your room"
set :hipchat_announce, false # notify users?

Pretty easy! Check out the project page for more details.

By the way, we have integrations with other services such as GitHub, Heroku, and MailChimp as well as API libraries in other languages. Let us know if you’ve done something cool with our API that you’d like to share.

XCode tips for TextMate users

Switching from TextMate to XCode to work on the upcoming HipChat iPhone app has been a little painful. TextMate has some incredibly helpful features that I feel very unproductive without. Luckily some of them can be enabled or emulated. Here’s what we’ve found:

Go to File (Command-T)

TextMate's Go to File popup - so awesome!

The Go to File popup is probably TextMate’s most beloved feature. After using it for a while it seems amazing that it’s not part of every IDE out there. XCode’s “Open quickly” (Command-Shift-D) doesn’t cut it because it doesn’t do partial matching on the filename.

There are two options (both paid tools) for adding this functionality to XCode:

  1. PeepOpen — $12, beta software, great potential but not actively maintained
  2. Code Pilot — $30, more mature and full of other nice features

If you want to bind either tool to Command-T you’ll need to unbind XCode’s default key binding for the “Show Fonts” dialog first. To do that, open your XCode preferences and clear out the option shown here.

Key Bindings

The “Delete line” (Control-Shift-K) and “Duplicate line” (Control-Shift-D) shortcuts can be added system wide by placing the following in ~/Library/KeyBindings/PBKeyBinding.dict:

{
    "^$K" = (
        "selectLine:",
        "cut:"
    );
    "^$D" = (
        "selectLine:",
        "copy:",
        "moveToEndOfLine:",
        "insertNewline:",
        "paste:"
    );
}

Full details and other options are available in this Stack Overflow post.

Themes

TextMate's Twilight theme in XCode

Using the same theme in all your IDEs can make it a lot easier to jump between them seamlessly. Here’s a tool you can use to convert your existing TextMate theme to work in XCode: XThemes.

There’s also a downloadable version of the Twilight theme for XCode here.

Tabs

Unfortunately there’s no way to get a tab-based display in XCode, so you’ll have to get good at switching between files using your Command-T replacement or via one of the suggestions here.

We hope these tips can make you a little more productive in XCode. If you’re looking for more, make sure you check out this Stack Overflow thread. Oh, and keep an eye out for the HipChat iPhone app.

GitHub is making me lazy but I like it

5 years ago Garret Heaton 1 Comment

Open source projects depend on community cooperation. Successful projects have a healthy group of individuals and companies submitting code, writing documentation, and testing new features. Unfortunately it’s not always easy to contribute because different projects will use different bug trackers, version control systems, and approval processes. Package maintainers also have a hard time handling all the incoming patches in a timely manner which frustrates the contributors.

In 2005 Linus Torvalds created the Git version control system in order to solve problems he was having dealing with patches to the Linux kernel. A few years later GitHub came along with a nice web interface on top of Git, making it trivially easy to fork, patch, and contribute to projects hosted there. The standardized wiki and issue tracker features mean that many projects are set up in the same way. Once you learn how to contribute to one project on GitHub you know how to contribute to all of them.

Unfortunately GitHub makes it so easy that I’ve found myself becoming lazy. It feels a lot harder to contribute to non-GitHub projects because it often requires signing up for their custom bug tracker, learning the patch process, and waiting longer before the patch is accepted. That extra friction is sometimes enough to prevent me from submitting a fix, and that’s not good for the project.

Ease of contribution is clearly an important factor for open source and other community-driven projects (just look at Wikipedia). As GitHub continues to grow, are more projects going to feel pressure to switch? I think they will, and I’m looking forward to it. Better software is good for everyone.

5 tips for running a company blog using WordPress

WordPress is an incredibly popular blogging platform for all types of blogs. It’s easy to set up and maintain, looks nice, and has thousands of well-maintained plugins to choose from (including ours). When we were setting up this blog, choosing WordPress was a no-brainer. But we soon realized that running a company blog was a little different than a personal one. We wanted it to fit into our existing workflow, tools, and infrastructure and we weren’t sure if WordPress was going to get in the way. It turns out WordPress is very flexible and didn’t cause any trouble. But we still learned a lot and wanted to share some tips:

1. Store your WordPress install in a repository

You store your other code in source control, so why not your blog? Probably because the standard WordPress install instructions tell you to download a zip, install it on a standalone server, and manage everything through the web admin. This is fine for your personal site, but is not the best setup for a company blog. Storing it in a repository will make it easier to test changes locally (see #3), share the code between multiple people, have a record of changes, and manage deployments using a tool like Capistrano.

Luckily WordPress will happily live in a repository (a git repo on GitHub, in our case). Keep in mind that most of the config changes you make in the web admin will be stored in the database, not files you can check in to your repo. We maintain a WordPress database on our production systems as well as one in our dev environment. Any changes to plugin configuration, users, page content, etc need to be made in each environment independently.

Note: You’ll probably want to add wp-content/uploads/ to your .gitignore or svn:ignore since that content is environment-specific.

2. Run it on multiple servers

Hopefully your site is already running on multiple servers behind a load balancer so that it’s more redundant. Your blog should get the same treatment! It turns out there’s only one part of WordPress that doesn’t scale horizontally – file uploads. Since they get saved to the local disk, they’ll only appear on one of the servers in your cluster. If a visitor requests the image from one of the other servers they’ll see a broken image. Our solution was to use the Amazon S3 for WordPress plugin so that all our uploads are stored on S3 instead. The media gallery features of the web admin aren’t 100% compatible with this plugin (you’ll see some broken images), but we were OK with that. Another option would be to upload all your files to another part of your site or an external service like Flickr.

Note: The S3 plugin says it’s only compatible up to WordPress 2.7 but it’s working for us on 3.0.1. There’s also a new S3 plugin that looks promising, but we haven’t tested it.

3. Test upgrades in a dev environment first

We suggest installing your blog on a local server and using it to test all WordPress core, plugin, and theme upgrades before rolling them out to your live blog. Plugins and themes are easily upgraded using the links in the WordPress admin. Just verify that things are still working after the upgrade, check in the updated files, and release.

WordPress core upgrades are a little more complicated, or so we thought. We knew that these upgrades often modify the database during the upgrade process and we weren’t sure how we’d run those upgrades on our production systems. It turns out that WordPress is smart about not making breaking changes to the database so we’re able to follow the same process we do for plugins and themes. We just deploy newer versions of the code (like 3.0) to our production systems running an older version of the database (like 2.9) and everything works. The first person to go to the WordPress admin UI will be prompted with a ‘Database upgrade required’ page and WordPress will take care of updating things. Very cool!

4. Make the easy scalability improvements

If you’re not already running PHP on your production systems you may not have PHP’s APC module installed. Lucky you! Just install it, restart Apache, and your blog should be loading noticeably faster. If you’re interested in the reasons behind this, check out the Wikipedia article on PHP accelerators.

Second, install a cache plugin like wp-super-cache or batcache if you’re expecting serious bursts in traffic. You don’t want to have a popular post end up on the front page of Reddit generating unnecessary load on your servers.

5. Create your own theme

We got a little lazy making our HipChat theme originally and had just edited the ‘default’ theme. It turns out that WordPress will overwrite your changes as soon as you perform a core upgrade. Of course you’re using source control, so that’s not a big deal, right? :) Instead, just read the theme docs and learn how to make a theme for real. It’s almost as simple as copying another theme’s contents into a new directory and making your changes there.

Hopefully we’ve given you some ideas of how to make your company blog more reliable and easier to maintain. Please leave a comment if you have something to add or would like us to elaborate.