Archive for the ‘How HipChat Works’ Category


(goodnews) New and improved emoticons

By Matt McDaniel | 2 weeks ago | 15 Comments

Here on the HipChat team, we take emoticons seriously. It’s really weird to spend a healthy portion of your work day discussing the finer points of tiny pictures from the internet. But when we see how happy those tiny pictures make everyone, it’s all worth it. Today, we want to tell you about a few ways we’ve improved emoticons in HipChat.

Emoticons for (allthescreens)

All of our global emoticons are now built to support hi-res displays, like Apple Retina displays and the higher end of the Android pixel-density spectrum. This change resolved the much-voted-on UserVoice ticket that we’ve had on our list for a while.

For every new image, we created hi-res sizes up to 4x (because you never know what the future will bring), so they’ll look good on both your fancy high-res displays and that old monitor your work won’t upgrade for you.  We’ve been rolling out these Retina emoticons over the last few weeks, and we’ll continue to tweak them every so often. Check out the big changelog below for details.

Sharing our favorites

Included in this latest release are a few of our favorite emoticons. Many, like (whoa) Keanu, were added because of your feedback. Others, like (disappear) and (salute), are ones we use so much around Atlassian that it didn’t feel right to keep them for ourselves.

  

Don’t forget that you can upload your very own custom emoticons just for your team. So next time, you don’t have to wait for us to add something like (whoa) to the global set; you can BYOK (Bring Your Own Keanu).

One size fits most

In addition to Retina-fying our global set, we improved our emoticon uploader. Now, admins can upload new, hi-res emoticons. And even better, we’ll automatically scale them for you. We suggest starting with an image around 120px (the new maximum). After the image goes through some emoticon uploader magic, you’ll end up with an effective 30px image after scaling.

We debated how to handle retina images. We chose this auto-scaling method because it’s far easier to upload new emoticons, or update an existing one, when you only have to create and maintain one size.

Protip: Keep in mind that whatever you upload is going to be scaled down by a factor of 4, so a 1px stroke is going to be hard to see on a low-res monitor. We learned the hard way how much that can really affect line-drawing emoticons. That’s why you’ll see “beefed up stroke width” a lot in the changelog below. It also helps if your image starts with dimensions that are multiples of 4 on both sides. For example: 116px wide is great (116/4 = 29), but 115px wide isn’t (115/4 = 28.75).
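If you’re curious what that scaling looks like in practice, here’s a minimal sketch of a 4x downscale using Pillow. This is not our actual uploader code; the function name, the hard 120px cap, and the warning behavior are illustrative assumptions.

```python
# A rough sketch of the 4x downscale described above; not HipChat's real uploader.
from PIL import Image

SCALE = 4          # a 120px upload becomes an effective 30px emoticon
MAX_UPLOAD = 120   # the new maximum upload dimension

def scale_custom_emoticon(src_path, dest_path):
    img = Image.open(src_path)
    w, h = img.size
    if max(w, h) > MAX_UPLOAD:
        raise ValueError("emoticon uploads are capped at %dpx" % MAX_UPLOAD)
    if w % SCALE or h % SCALE:
        # e.g. 115px wide scales to 28.75px and comes out soft
        print("warning: dimensions are not multiples of 4, edges may blur")
    img.resize((w // SCALE, h // SCALE), Image.LANCZOS).save(dest_path)
```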

And now that you can’t stand to look at your old custom emoticons on your Retina monitor, you can re-upload any of your existing custom emoticons by deleting the old ones and uploading new, larger versions. We’ll continue to refine and add to our global emoticon set.

The other “emoticons”

We also updated our set of little faces. Similar to the situation with the Retina emoticons above, we needed higher resolution images than the current assets allowed. We also wanted a new set of our own.

We started looking at emoji sets for inspiration and pulled in little touches from our HipChat brand, like using the logo smile shape as often as possible, and we ended up with a set that we really like. As with everything else, we’ll be iterating on them here and there, and someday may bring full emoji support.

Oh, and the icy blue corpse thumb thing we did is dead and buried. Dig it out if you want: (corpsethumb). We’re replacing the re-colored Apple emoji thumbs-up and thumbs-down with a new pair of icons that don’t look at all like Facebook. (badpokerface)

Check out hipchat.com/emoticons to see all the new emoticons and try them out with your team! And keep the feedback coming at help.hipchat.com.

Changelog

Changes made to global emoticons

  • (awyeah): beefed up stroke width
  • (badass): beefed up stroke width
  • (badjokeeel): added contrast and adjusted size
  • (bumble): added contrast
  • (caruso): adjusted size
  • (challengeaccepted): beefed up stroke width
  • (chewie): added contrast, brightened
  • (chucknorris): brightened up to bring out the Chuck Norris
  • (clarence): added contrast, brightened
  • (derp): beefed up stroke width
  • (dumb): enlarged to show face better
  • (facepalm): added contrast, tightened crop to improve chest to face and palm ratio
  • (fonzie): added contrast
  • (freddie): beefed up stroke width
  • (gangnamstyle): cleaned up gif
  • (gates): adjusted a little to match (jobs)
  • (goodnews): scaled up a little
  • (haveaseat): added contrast
  • (heart): updated to match smilies, adjusted crop
  • (ilied): lightened up to bring out details
  • (indeed): cropped to just face and highlighted monocle
  • (itsatrap): added contrast, brightened up to bring out detail
  • (jackie): beefed up stroke width
  • (jobs): adjusted to match (gates), added contrast and brightened
  • (joffrey): new image, fewer spoilers, more like the others (not shown above)
  • (kennypowers): added contrast and brightened
  • (krang): beefed up stroke width, reduced contrast between colors
  • (lincoln): brightened to blow out noise
  • (lolwut): added contrast and brightened, scaled up
  • (notbad): beefed up stroke width
  • (notsureif): replaced with (fry) to match
  • (philosoraptor): brightened to bring out detail
  • (present): added contrast and brightened
  • (reddit): removed white fill from antenna shape
  • (romney): added contrast and brightened
  • (sadpanda): using a sadder, more forward-facing panda
  • (samuel): added contrast and levels adjusted to stand out more on white
  • (skyrim): added contrast and brightened
  • (sweetjesus): crammed into (iseewhatyoudidthere)’s head shape to resolve weird corners
  • (taft): cropped closer to face, mustache volume increased, mustache/face contrast increased
  • (twss): added contrast and brightened
  • (wtf): enlarged to show face better
  • (yodawg): added contrast and brightened
  • (yuno): resized because he was too small
  • (zoidberg): resized because he was too small

New global emoticons

  • (awesome)
  • (aww)
  • (awwyiss)
  • (badtime)
  • (bicepleft)
  • (bicepright)
  • (borat)
  • (carl)
  • (catchemall)
  • (chef)
  • (cookie)
  • (corpsethumb)
  • (disappear)
  • (doh)
  • (donotwant)
  • (downvote)
  • (drool)
  • (evilburns)
  • (excellent)
  • (feelsbadman)
  • (feelsgoodman)
  • (finn)
  • (ftfy)
  • (giggity)
  • (goldstar)
  • (haha)
  • (huehue)
  • (hugefan)
  • (jake)
  • (meh)
  • (motherofgod)
  • (nice)
  • (noidea)
  • (notit)
  • (ohmy)
  • (paddlin)
  • (rockon)
  • (salute)
  • (sap)
  • (standup)
  • (taco)
  • (tayne)
  • (thatthing)
  • (theyregreat)
  • (toodamnhigh)
  • (unacceptable)
  • (upvote)
  • (waiting)
  • (whoa)
  • (yeah)
  • (youdontsay)

Elasticsearch at HipChat: 10x faster queries

By zuhaib | 6 months ago | 6 Comments

Last fall we discussed our journey to 1 billion chat messages stored and how we used Elasticsearch to get there. By April we’d already surpassed 2 billion messages and our growth rate only continues to increase. Unfortunately all this growth has highlighted flaws in our initial Elasticsearch setup.

When we first migrated to Elasticsearch we were under time pressure from a dying CouchDB architecture and did not have the time to evaluate as many design options as we would have liked. In the end we chose a model that was easy to roll out but did not have great performance. In the graph below you can see that requests to load uncached history could take many seconds:

Average response times between 500ms-1000ms with spikes as high as 6000ms!

Identifying our problem

Obviously taking this long to fetch data is not acceptable, so we started investigating.

Hipchat Y U SO SLOW

What we found was a simple problem that had been compounded by the sheer data size we were now working with. With CouchDB, we had stored our datetime field as a string and built views around it to do efficient range queries, something CouchDB did very well and with little memory usage.

So why did this cause such a performance problem for Elasticsearch?

Well, an old and incorrect design decision resulted in us storing datetime values in a way that was close to ISO 8601, but not entirely the same. This custom format posed no problem for CouchDB as it treated it as any other sortable string.

On the other hand, Elasticsearch keeps as much of your data in memory as possible, including the field you sort by. Since we were using these long datetime strings, it needed a lot of memory to store them: up to 18GB across our 16 nodes.

In addition, all of our in-app history queries use a range filter so we can request history between two datetimes. For Elasticsearch to answer this query, it had to load all the datetime fields from disk into memory, compute the range, and then throw away the data it didn’t need.

As you can imagine, this resulted in high disk usage and CPU I/O wait.

But as we mentioned earlier, Elasticsearch stores this datetime field in memory, so why can’t it use that data (known as field data) instead of going to disk? It turns out that it can, but only if the field is indexed as a numeric type that a numeric range can run against, and we were using these custom datetime strings.
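To make that concrete, here is roughly what a field-data-backed range filter looks like once the field is a real date type. This is a sketch with made-up index and field names and dates, not HipChat’s actual query:

```python
# Sketch only: fetch a week of history via a range filter answered from field data.
from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

history = es.search(
    index="messages",                 # placeholder index name
    body={
        "query": {"filtered": {
            "filter": {"range": {
                "date": {"gte": "2014-06-01", "lt": "2014-06-08"},
                # Answer the range from the in-memory field data instead of
                # walking the on-disk index.
                "execution": "fielddata",
            }}
        }},
        "sort": [{"date": "desc"}],   # cheap once "date" is a real date type
        "size": 75,
    },
)
```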

Kick off the reindexing!

Once we identified this problem, we tweaked our index mapping so it would store our datetime field as a datetime type (with our custom format), so all new data would get stored correctly. We leveraged Elasticsearch’s ability to store a multi-field, which meant we were able to keep our old string datetimes around for backwards compatibility. But what about the old data? Since Elasticsearch does not support changing the mapping of an existing index, we’d need to reindex all of our old data into a new set of indices and create aliases for them. And since our cluster was under so much IO load during normal usage, we needed to do this reindexing on nights and weekends when resources were available. There were around 100 indices to rebuild, and the larger ones took 12+ hours each.
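Here is a hedged sketch of what such a multi-field mapping can look like through the Python client; the index name, type name, and date format below are illustrative, not our actual schema:

```python
# Sketch of an ES 1.x multi-field mapping: a real date type for fast sorting
# and range filters, with the old string format kept alongside it.
from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

es.indices.put_mapping(
    index="messages-v2",              # placeholder names
    doc_type="message",
    body={
        "message": {
            "properties": {
                "date": {
                    "type": "multi_field",
                    "fields": {
                        "date": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
                        "raw": {"type": "string", "index": "not_analyzed"},
                    },
                }
            }
        }
    },
)
```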

Elasticsearch helped this process along by providing helper methods in its client library for reindexing. We built a custom script around the Python client to automate the process and ensure we caused no downtime or lost data. We hope to share this script in the future.
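We haven’t published that script, but a minimal version of the idea, built on the scan and bulk helpers in the Python client, might look like this (index and alias names are placeholders):

```python
# Sketch: stream documents from the old index into the new one, then flip the
# alias so readers switch over without downtime.
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan, bulk

es = Elasticsearch(["localhost:9200"])

def reindex(old_index, new_index):
    docs = (
        {
            "_index": new_index,
            "_type": hit["_type"],
            "_id": hit["_id"],
            "_source": hit["_source"],
        }
        for hit in scan(es, index=old_index, query={"query": {"match_all": {}}})
    )
    bulk(es, docs)

reindex("messages-old-042", "messages-new-042")

es.indices.update_aliases(body={"actions": [
    {"remove": {"index": "messages-old-042", "alias": "messages-042"}},
    {"add": {"index": "messages-new-042", "alias": "messages-042"}},
]})
```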

The fruits of our labor

Once we finished reindexing we switched our query to use numeric_ranges and the results were well worth the work:

Going from 1-5s to sub-200ms queries (and data transfer)

So the big takeaway from this experience for us was that while Elasticsearch’s dynamic mapping is great for getting you started quickly, it can handcuff you as you scale. All of our new projects with Elasticsearch use explicit mapping templates so we know our data structure and can write queries that take advantage of it. We expect to see far more consistent and predictable performance as we race towards 10 billion messages stored.

We’d love to be able to make another order-of-magnitude performance improvement to our Elasticsearch setup and ditch our intermediate Redis cache entirely. Sound fun to you too? We’re hiring! https://www.hipchat.com/jobs


HipChat + Elasticsearch guest list expanded

By zuhaib | 1 year ago | 1 Comment

You want it – You got it

75 more slots to attend SF Elasticsearch’s Meetup on November 18th

Capacity reached! But we will be recording the talk and sharing it with the Elasticsearch community.

At HipChat, we’re big fans of Elasticsearch. It’s helped us scale our infrastructure. The title of this talk will be “Heavy Lifting: How HipChat Scaled to 1 Billion Messages.”

Originally, we thought 75 people might register for the talk. But, the Bay Area Elasticsearch community is bigger and more passionate than we anticipated. So we’re doubling our guest count to 150 people. RSVP now to save your spot!

Note: We also changed the time of the Meetup. The main talk will begin at 6:30pm, not 7pm.

What you’ll learn

One of the keys to our success has been building a scalable backend. Elasticsearch has played a big part in this.

We plan to talk about how we scaled to sending over 1 billion messages and how Elasticsearch allows us to index them all and makes near-realtime search possible across all 1 billion messages. We will also discuss our future with Elasticsearch: using it for more than just search and logs. We’ll share some tips and things we learned (and are still learning) about our transition to Elasticsearch.

Why you should attend

  • Free pizza, beer and sodas
  • A chance to talk with our engineering team, including HipChat Founders
  • You’ll get some of HipChat’s popular meme stickers
  • Chance to win limited-edition HipChat t-shirts
  • Learn something cool

Did we mention we’re hiring?

Full disclosure: we know the Elasticsearch community is packed with incredible engineering talent. We’d love to talk with you about current and future opportunities to build the best damn group chat application for teams. We’ll have one of our talent coordinators on-site in case you have questions about the company, our values or the hiring process.

Can’t make it? You can always submit a resume to jobs@hipchat.com. We (heart) smart people.


How HipChat scales to 1 Billion Messages

By zuhaib | 1 year ago | 11 Comments

When Atlassian acquired HipChat, we had sent about 110 million messages. Today that number has grown tenfold, and it’s still growing at a record pace. Scaling to meet these demands has not been easy, but the HipChat Ops team is up to the task. We thought it’d be cool to shine some light on what it took, infrastructure-wise, for those who are curious about this kind of stuff. In this post, we’ll highlight how we use CouchDB, Elasticsearch, and Redis to handle our load and make sure we provide as reliable a service for our users as possible.

Road to 1 billion messages

Getting off the Couch to scale chat history and search

Originally, HipChat had a single m2.4xlarge EC2 instance running CouchDB as the datastore for chat history and couchdb-lucene for search, a fine setup for a small application. However, once we started to grow, we began to hit the limits of CouchDB and the AWS instance size, and we’d run out of memory daily. We kicked off a project to look at other data stores and indexers to solve this problem, and we concluded that the first step was upgrading our search indexer. So we kicked Lucene to the curb in favor of Elasticsearch.

Heeding the advice of the Loggly team, we set up 7 Elasticsearch index servers and 3 dedicated master nodes to help prevent split brain. Elasticsearch lets us add more nodes to our cluster when we need more capacity, so we can handle extra load while concurrently serving requests. Moreover, the ability to have our shards replicated across the cluster means if we ever lose an instance, we can still continue serving requests, reducing the amount of time HipChat Search is offline.
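For the curious, the split-brain guard in this kind of topology comes down to a few lines of elasticsearch.yml. The snippet below is a sketch of that setup, not our exact production config:

```yaml
# elasticsearch.yml sketch of dedicated masters plus data nodes (illustrative values).

# On each of the 3 dedicated master nodes:
node.master: true
node.data: false

# On each of the 7 index (data) nodes:
# node.master: false
# node.data: true

# With 3 master-eligible nodes, require a quorum of 2 before electing a master,
# so a partitioned minority can never elect its own master (split brain).
discovery.zen.minimum_master_nodes: 2
```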

For chat history, we still use CouchDB as our datastore, but we are beginning to hit limits with AWS trying to fit everything into a single instance. Just prior to hitting a billion messages, we noticed that during compaction, the EBS volume storing our CouchDB files was running out of disk space. AWS limits EBS volumes to 1TB, so as a stopgap solution, we decided to try RAID over EBS. We at HipChat don’t believe in one-off solutions, so we used a slightly hacked version of the Opscode aws Chef cookbook to automate the process of creating, mounting, and formatting our RAID arrays. Our hack can even rebuild the RAID using EBS snapshots. True webscale stuff.

Currently, we pull data from CouchDB using a custom Ruby import script, but since Elasticsearch has treated us so well, we are looking to replace CouchDB with just Elasticsearch. If you want to hear more about this, we plan on giving a talk about Elasticsearch at a meetup here at Atlassian.

Caching in on Redis

We at HipChat use Redis a lot, caching everything from XMPP session info to up to 2 weeks of chat history. Originally we started with two Redis servers, one caching stats and the other caching everything else, but we soon realized that we’d need more help. Today, we shard our data over 3 Redis servers, with each server having its own slave. We continue to dedicate one of these servers to hosting our stats, while leaving the other two to cache everything else.
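As a rough illustration (this is not our code), client-side sharding along those lines can be as simple as hashing the key, with stats pinned to their own server. Hostnames and key prefixes below are made up:

```python
# Sketch of client-side sharding: stats on a dedicated server, everything
# else hashed across the other two (each backed by its own slave).
import zlib
import redis

STATS = redis.StrictRedis(host="redis-stats")           # placeholder hostnames
CACHES = [redis.StrictRedis(host="redis-cache-1"),
          redis.StrictRedis(host="redis-cache-2")]

def redis_for(key):
    if key.startswith("stats:"):
        return STATS
    # Stable hash so a given key always lands on the same cache server.
    return CACHES[zlib.crc32(key.encode("utf-8")) % len(CACHES)]

redis_for("history:room:42").lpush("history:room:42", "hello world")
```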

However, even with these changes, we found that we had to upgrade our Redis history instance size as we were running out of memory close to our billion-message milestone. We will continue to improve the scalability of this area of the HipChat architecture, so we can handle load and wean off our dependence on Redis clustering to mitigate single points of failure.

Future

This is just a highlight of some parts of the HipChat infrastructure we needed to tweak to help us reach 1 billion messages. We still have a long way to go to scale HipChat for our growing enterprise needs, improving our Redis architecture for example. A more robust system, increasing the performance of our code, and mitigating or removing single points of failure are large objectives that our Ops team looks forward to tackling in the coming months.

If you want to learn more or think you can help us scale HipChat better, I suggest you come by our meetup. If you can’t make it to that, feel free to submit your resume here. Our team is growing fast, and we would love to have you on board.


HipChat search now powered by Elasticsearch

By zuhaib | 1 year ago | 8 Comments

We recently announced new search improvements in HipChat that support advanced searching of chat histories. At the same time, to support the scale at which HipChat is growing we needed to rethink our search architecture. The result? We switched to Elasticsearch.

Previous Setup

Originally, HipChat search was powered by a single AWS instance running couchdb-lucene. This was acceptable in the early days, but it left us with a single point of failure for our search system.

As HipChat grew, we needed a bigger and bigger AWS instance, to the point that we were using the second-largest memory instance AWS had. Even then we experienced periods of search outages that prevented our users from searching, all because we had a single instance with no redundancy.

Say Hello to Elasticsearch

We determined our previous setup was not sustainable so we kicked off a project to find a new search engine. After kicking the tires on a few solutions we landed on Elasticsearch.

Why Elasticsearch?

  • It is built on top of Apache Lucene, so it is familiar to us
  • It supports distributed nodes, allowing us to run multiple nodes across different AWS availability zones
  • It supports robust plugins, including one for AWS node discovery
  • We could roll it out with as little impact on users as possible

You can read more about all the great Elasticsearch features here.
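As a taste of the AWS node discovery plugin mentioned above, EC2-based discovery comes down to a few lines of elasticsearch.yml. The keys and security group name here are placeholders, not our real config:

```yaml
# Sketch: node discovery via the elasticsearch-cloud-aws plugin.
cloud:
  aws:
    access_key: AKIA_EXAMPLE_KEY
    secret_key: EXAMPLE_SECRET
discovery:
  type: ec2
  ec2:
    groups: my-elasticsearch-nodes   # only discover nodes in this security group
```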

Deploying Elasticsearch at HipChat

Search error percentage before and after Elasticsearch

Rolling out Elasticsearch at HipChat with as little impact on users as possible was a key goal of ours. During the evaluation phase, we wrote a script that duplicated our production search queries and ran them against the search engines we were testing, logging any differences in the responses as well as response times to statsd/graphite. This allowed us to figure out which service could handle the load we generated.
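That script isn’t public, but the idea is straightforward. Here is a hedged sketch of it; the endpoints, statsd prefix, and response-shape comparison are all assumptions, not the real thing:

```python
# Sketch of shadow-testing: replay a production query against both engines,
# time each, push the timings to statsd/graphite, and flag any differences.
import time
import requests
import statsd

OLD_SEARCH_URL = "http://search-old.internal:5984/messages/_search"   # placeholder
NEW_SEARCH_URL = "http://search-new.internal:9200/messages/_search"   # placeholder

stats = statsd.StatsClient("localhost", 8125, prefix="search_shadow")

def timed_search(name, url, query):
    # Run one query, record its response time in milliseconds, return the body.
    start = time.time()
    body = requests.post(url, json=query).json()
    stats.timing(name, int((time.time() - start) * 1000))
    return body

def shadow(query):
    old = timed_search("couchdb_lucene", OLD_SEARCH_URL, query)
    new = timed_search("elasticsearch", NEW_SEARCH_URL, query)
    if old.get("total_rows") != new.get("hits", {}).get("total"):
        print("result count mismatch for query:", query)
```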

Working with the Elasticsearch consulting team, we determined that the standard couchdb-river would not work for us, so we built a custom Ruby importer to support the kind of performance we needed. We hope someday to open source this script, but currently it’s very tailored to our needs.

At HipChat, we leverage feature toggling for many of our features, so when we roll out something new we can enable it for only certain groups. This allows us to test at scale without causing disruption to all of our customers. We used this capability to roll out Elasticsearch slowly to a small subset of customers (thanks to all the customers who helped us beta test Elasticsearch!). Once we got comfortable with Elasticsearch and saw it was beating out couchdb-lucene, we decided to roll it out to all of our customers with minimal impact on end users.

The Ops team at HipChat is still working on other ways to scale HipChat to make it as reliable as possible!

Happy Searching all!