I had a few scary moments the other day – I went to ls ~/Desktop and received a permission denied error!!!!
Ultimately, this wasn’t a “real” problem, just that my terminal program had lost permission to access that directory. But before I figured that out, I went down the rabbit hole of weird Apple file system attributes. (I always forget that the command is xattr and not chattr.)
Things got really confusing when I noticed that the com.apple.macl extended attribute on ~/Desktop had a value of “-1”! And that wasn’t changeable from the terminal! Not even via sudo. Shades of SELinux!
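For anyone else poking at this, the attribute itself is easy to inspect (though, as noted above, not to change) with the stock macOS xattr tool; the commands below are just for reference:
xattr -l ~/Desktop                        # list all extended attributes and their values
xattr -p com.apple.macl ~/Desktop         # print just the com.apple.macl value
sudo xattr -d com.apple.macl ~/Desktop    # the deletion attempt that fails, even as root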
Eventually, the rabbit warren led me to a blog post, which quoted a hacker news article about a side effect of pasting a Finder link into a terminal window. Ta da! Copy/paste and the problem was solved. (Of course, I have no idea how the corruption happened, which is a different issue.)
I love using WSL – most of my daily work is done there. Almost all the rest is done with cloud-based tools, so the only thing I need to back up is WSL.
The problem is, my company’s backup software of choice will only handle “real” Windows files. It gets quite unhappy if you ask it to back up the WSL virtual drive.
My solution: bup. While not the “latest hotness”, it was trivial to install and run. I ended up writing a wrapper script to add a “--backup” option and set a default destination.
My approach:
#!/usr/bin/env bash
# wrap bup with my default location
# support my default usage via a "--backup" option
# use the WSL location of the Windows directory
export BUP_DIR="${BUP_DIR:-/c/Users/hwine/bup}"

# find the real bup by skipping this wrapper on $PATH
real_bup=$(type -ap bup | tail -n +2 | head -1)

# treat "--backup" as "index and save my home directory"
do_backup=false
if [[ "${1:-}" == "--backup" ]]; then
    do_backup=true
    shift
fi

if $do_backup; then
    time "${real_bup}" index "${HOME}"
    time "${real_bup}" save -n "${HOME##*/}" "${HOME}"
else
    "${real_bup}" "$@"
fi
Note that the “bup index” operation is the long pole on any backup. After a typical day’s work, the index takes about 5 minutes, and the actual backup is less than 10 seconds.
While there are a number of instructions on the web for installing pre-commit globally, I didn’t find one with all the extras needed to convince my colleagues. This is that:
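(The full write-up isn’t reproduced in this excerpt. For flavor, one widely used way to get pre-commit hooks into every repository is a git template directory – a sketch of that approach, not necessarily the exact recipe from this post:)
# make every freshly cloned/initialized repo pick up the pre-commit hook
git config --global init.templateDir ~/.git-template
pre-commit init-templatedir ~/.git-template
# existing clones still need a one-time `pre-commit install` (or a re-clone)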
Recently, I’ve been on the Windows Insider fast ring, where updates come twice a week. At times, updates would not succeed on the first try, but would on the second. This gradually got worse and worse, and I eventually found a workaround.
Starting a new tag for various WSL (Windows Subsystem for Linux) tips. These will likely get less relevant over time. (I am _so_ looking forward to the 2019 Fall update.)
These tips are what got me started, and consist of both WSL-specific practices and ways of maintaining a similar approach when working in other operating systems (Windows and macOS being the top two).
I just found, and read, Clément Delafargue’s post “Why Auto Increment Is A Terrible Idea” (via @CoreRamiro). I agree that an opaque primary key is very nice and clean from an information architecture viewpoint.
However, in practice, a serial (or monotonically increasing) key can be handy to have around. I was reminded of this during a recent situation where we (app developers & ops) needed to be highly confident that a replica was consistent before performing a failover. (None of us had access to the back end to see what the DB thought the replication lag was.)
I often find that I want to email around a doc I’ve put together with sphinx (I often use the *diag or graphviz extensions). Sadly, the world hasn’t embraced the obvious way of supporting this via ePub [1] readers everywhere. What I want is plain html output, with nothing fancy. There’s probably a style out there, but I just add the following target to the Makefile generated by sphinx-quickstart:
mailhtml:
$(SPHINXBUILD) -b singlehtml -D html_theme=epub $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
@echo
@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
[1] My favorite reader is the Firefox extension “EPUBReader”.
PyBay held their first local Python conference this last weekend (Friday, August 19 through Sunday, August 21). What a great event! I just wanted to get down some first impressions - I hope to do more after the slides and videos are up.
tl;dr: No need to panic - modern vcs-sync will continue to support the gecko-dev & gecko-projects repositories.
Today’s the day to celebrate! No more bash scripts running in screen sessions providing dvcs conversion experiences. Woot!!!
I’ll do a historical retrospective in a bit. Right now, it’s time to PARTY!!!!!
tl;dr: We’ll be shutting down the Firefox mirrors on Bitbucket.
A long time ago we started an experiment to see if there was any support for developing Mozilla products on social coding sites. Well, the community-at-large has spoken, with the results many predicted:
Someone just accused me of writing Enterprise Software!!!!!
Well, the “someone” is Mahmoud Hashemi from PayPal, and I heard him on the Talk Python To Me podcast (episode 54). That whole episode is quite interesting - go listen to it.
In my (apparently) continuing list of tiny hassles with PyEnv, I finally figured out how to “fix” the PyEnv notion of a virtualenv. This may apply only to my setup: my main python version is managed by Homebrew.
As of IPython 4, the procedure for generating kernels in venvs has changed a bit. After some research, the following works for me:
. path/to/venv/bin/activate # or whatever
pip install ipykernel
python -m ipykernel install --user \
--name myenv --display-name "Python (myenv)"
If you’re running the jupyter notebook, do a full page reload to get the new kernel name displayed in the menu.
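To double-check that the kernel actually registered, listing the kernelspecs works (assuming the jupyter command from the Jupyter/IPython 4 split is on your PATH):
jupyter kernelspec list   # "myenv" should show up here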
One of the challenges of maintaining a legacy system is deciding how much effort should be invested in improvements. Since modern vcs-sync is “right around the corner”, I have been avoiding looking at improvements to legacy (which is still the production version for all build farm use cases).
While adding another gaia branch, I noticed that the conversion path for active branches was both highly variable and frustratingly long. It usually took 40 minutes for a commit to an active branch to trigger a build farm build. And worse, that time could easily be 60 minutes if the stars didn’t align properly. (Actually, that’s the conversion time for git -> hg. There’s an additional 5-7 minutes, worst case, for b2g_bumper to generate the trigger.)
The full details are in bug 1226805, but a simple rearrangement of the jobs removed the 50% variability in the times and cut the average time by 50% as well. That’s a savings of 20-40 minutes per gaia push!
Moral: don’t take your eye off the legacy systems – there still can be some gold waiting to be found!
I was fortunate enough to be able to attend Dev Ops Days Silicon Valley this year. One of the main talks was given by Jason Hand, and he made some great points. I wanted to highlight two of them in this post:
- Post Mortems are really learning events, so you should hold them when things go right, right? RIGHT!! (Seriously, why wouldn’t you want to spot your best ideas and repeat them?)
- Systems are hard – if you’re pushing the envelope, you’re teetering on the line between complexity and chaos. And we’re all pushing the envelope these days - either by getting fancy or getting lean.
Our industry has talked a lot about “Blameless Post Mortems”, and techniques for holding them. Well, we can call them “blameless” all we want, but if we only hold them when things go wrong, folks will get the message loud and clear.
If they are truly blameless learning events, then you would also hold them when things go right. And go meh. Radical idea? Not really - why else would sports teams study game films when they win? (This point was also made in a great Ignite by Katie Rose: GridIronOps - go read her slides.)
My $0.02 is - this would also give us a chance to celebrate success. That is something we do not do enough, and we all know the dedication and hard work it takes to not have things go sideways.
And, by the way, terminology matters during the learning event. The person who is accountable for an operation is just that: capable of giving an account of the operation. Accountability is not responsibility.
Part way through Jason’s talk, he has this awesome slide about how system complexity relates to monitoring which relates to problem resolution. Go look at slide 19 - here’s some of what I find amazing in that slide:
- It is not a straight line with a destination. Your most stable system can suddenly display inexplicable behavior due to any number of environmental reasons. And you’re back in the chaotic world with all that implies.
- Systems can progress out of chaos, but that is an uphill battle. Knowing which stage a system is in (roughly) informs the approach to problem resolution.
- Note the wording choices: “known” vs “unknowable” – for all but the “obvious” case, it will be confusing. That is a property of the system, not a matter of staff competency.
While not in his slide, Jason spoke to how each level really has different expectations. Or should have, but often the appropriate expectation is not set. Here’s how he related each level to industry terms.
- Best Practices:
The only level with enough certainty to be able to expect the “best” is the known and familiar one. This is the “obvious” one, because we’ve all done exactly this before over a long enough time period to fully characterize the system, its boundaries, and abnormal behavior.
Here, cause and effect are tightly linked. Automation (in real time) is possible.
- Good Practices:
Once we back away from such certainty, it is only realistic to have less certainty in our responses. With the increased uncertainty, the linkage of cause and effect is more tenuous.
Even if we have all the event history and logs in front of us, more analysis is needed before appropriate corrective action can be determined. Even with automation, there is a latency to the response.
- Emergent Practices:
Okay, now we are pushing the envelope. The system is complex, and we are still learning. We may not have all the data at hand, and may need to poke the system to see what parts are stuck.
Cause and effect should be related, but how will not be visible until afterwards. There is much to learn.
- Novel Practices:
For chaotic systems, everything is new. A lot is truly unknowable because that situation has never occurred before. Many parts of the system are effectively black boxes. Thus resolution will often be a process of trying something, waiting to see the results, and responding to the new conditions.
There is so much more in that diagram I want to explore. The connecting of problem resolution behavior to complexity level feels very powerful.
<hand_waving caffeine_level="deprived">
My experience tells me that many of these subjective terms are highly context sensitive, and in no way absolute. Problem resolution at 0300 local with a bad case of the flu just has a way of making “obvious” systems appear quite complex or even chaotic.
By observing the behavior of someone trying to resolve a problem, you may be able to get a sense of how that person views that system at that time. If that isn’t the consensus view, then there is a gap. And gaps can be bridged with training or documentation or experience.
</hand_waving>
The Duo application is nice if you have a supported mobile device, and it’s usable via TOTP even when you have no cell connection. However, getting Viscosity to allow both choices took some work for me.
For various reasons, I don’t want to always use the Duo application, so I would like Viscosity to always prompt for a password. (I had already saved a password - a fresh install likely would not have that issue.) That took a bit of work, and some web searches.
Disable any saved passwords for Viscosity. On a Mac, this means opening up the “Keychain Access” application, searching for “Viscosity”, and deleting any associated entries.
Ask Viscosity to save the “user name” field (optional). I really don’t need this, as my setup uses a certificate to identify me. So it doesn’t matter what I type in the field. But, I like hints, so I told Viscosity to save just the user name field:
defaults write com.viscosityvpn.Viscosity RememberUsername -bool true
With the above, you’ll be prompted every time. You have to put “something” in the user name field, so I chose to put “push or TOTP” to remind me of the valid values. You can put anything there, just do not check the “Remember details in my Keychain” toggle.
Password Store (aka “pass”) is a very handy wrapper for dealing with pgp encrypted secrets. It greatly simplifies securely working with multiple secrets. This is still true even if you happen to keep your encrypted secrets in non-password-store managed repositories, although that setup isn’t covered in the docs. I’ll show my setup here. (See the Password Store page for usage: “pass show -c <spam>” & “pass search <eggs>” are among my favorites.)
Have gpg installed on your machine.
Install Password Store on your machine. There are OS specific instructions. Be sure to enable tab completion for your shell!
Setup a local password store. Scroll down in the usage section to “Setting it up” for instructions.
Clone your secrets repositories to your normal location. Do not clone inside of ~/.password-store/.
Set up symlinks inside of ~/.password-store/ to directories inside your clone of the secrets repository. I did:
ln -s ~/path/to/secrets-git/passwords rePasswords
ln -s ~/path/to/secrets-git/keys reKeys
Enjoy command line search and retrieval of all your secrets. (Use the regular method for your separate secrets repository to add and update secrets.)
Rationale:
Notes:
tl;dr: You might find this gist handy if you enable HashKnownHosts
Modern ssh comes with the option to obfuscate the hosts it can connect to, by enabling the HashKnownHosts option. Modern server installs have that as a default. This is a good thing.
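For reference, turning it on client-side is just a couple of lines in ~/.ssh/config:
# ~/.ssh/config
Host *
    HashKnownHosts yes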
The obfuscation occurs by hashing the first field of the known_hosts file - this field contains the hostname, port, and IP address used to connect to a host. Presumably, there is a private ssh key on the host used to make the connection, so this process makes it harder for an attacker to utilize those private keys if the server is ever compromised.
Super! Nifty! Now how do I audit those files? Some services have multiple IP addresses that serve a host, so some updates and changes are legitimate. But which ones? It’s a one-way hash, so you can’t decode it.
Well, if you had an unhashed copy of the file, you could match host keys and determine the host name & IP. [1] You might just have such a file on your laptop (at least I don’t hash keys locally). [2] (Or build a special file by connecting to the hosts you expect with the options “-o HashKnownHosts=no -o UserKnownHostsFile=/path/to/new_master”.)
I threw together a quick python script to do the matching, and it’s at this gist. I hope it’s useful - as I find bugs, I’ll keep it updated.
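For the curious, the core of the matching idea is small enough to sketch here (this is an illustration of the approach, not the gist itself): entries are matched on the public key, which is identical whether or not the hostname field is hashed.
#!/usr/bin/env python3
"""Match hashed known_hosts entries against an unhashed reference copy."""
import sys

def load_keys(path):
    """Map (keytype, key) -> list of hostname fields from a known_hosts file."""
    keys = {}
    with open(path) as fh:
        for line in fh:
            parts = line.split()
            if len(parts) < 3 or parts[0].startswith("#"):
                continue
            hosts, keytype, key = parts[0], parts[1], parts[2]
            keys.setdefault((keytype, key), []).append(hosts)
    return keys

def main(hashed_file, reference_file):
    reference = load_keys(reference_file)
    for (keytype, key), hosts in load_keys(hashed_file).items():
        known = reference.get((keytype, key))
        label = ", ".join(known) if known else "<not in reference file>"
        print("{}... ({}) -> {}".format(hosts[0][:20], keytype, label))

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])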
Bonus tip: https://github.com/defunkt/gist is a very nice way to manage gists from the command line.
Footnotes
[1] A lie - you’ll only get the host names and IPs that you have connected to while building your reference known_hosts file.
[2] I use other measures to keep my local private keys unusable.
As much as GMail’s search syntax makes me long for PCRE, there are some unobvious gems lying around.
For example, I get tons of mail about releases. Occasionally, I need to monitor a given release, paying attention to not only the automated progress, but also human generated emails as well. Here’s my current setup:
That’s pretty standard. The productivity boost comes when I use the “multi-inbox” feature in the web UI. I set the top one to be just the unread ones with the special label from today:
newer_than:1d label:SPECIAL_LABEL is:unread
With positioning of the “extra panels” to the right side, I get a very focused view of any issues I need to look at!
Messages:
No Messages:
I love seeing that “(no messages)” text!
Tonight I attended the San Francisco Dev Ops meetup at Vungle. The topic was one we often discuss at Mozilla - how to simplify a developer’s life. In this case, the solution they have migrated to is one based on Docker, although I guess the title already gave that away.
Long (but interesting - I’ll update with a link to the video when it becomes available) story short, they are having much more success using DevOps-managed Docker containers for development than their previous setup of VirtualBox images built & maintained with Vagrant and Chef.
Sigh. That’s nice. When you come back from PTO, just re-run the script to get the latest updates - it won’t take nearly as long, since only the container deltas need to come down. Presto - back to work!
A couple of other highlights – I hope to do a more detailed post later.
- They follow the ‘each container has a single purpose’ approach.
- They use “helper containers” to hold recent (production) data.
- Devs have a choice in front end development: inside the container (limited tooling) or in the local filesystem (dev’s choice of IDE, etc.). [2]
- Currently, Docker containers are only being used in development. They are looking down the road to deploying containers in production, but it’s not a major focus at this time.
Footnotes
[1] Thanks to BFG for clarifying that docker-foo is kept in a separate repository from source code. The docker.sh script is in the main source code repository. [Updated 2015-03-11]
[2] More on this later. There are some definite tradeoffs.
On Jan 29, I treated myself to a seminar on Successful Lean Teams, with an emphasis on Kanban & Kaizen techniques. I’d read about both, but found the presentation useful. Many of the other attendees were from the Health Care industry and their perspectives were very enlightening!
Hearing how successful they were in such a high-risk, multi-disciplinary, bureaucratic, and highly regulated environment is inspiring. I’m inclined to believe that it would also be achievable in the simple-by-comparison, low-risk environment of software development. ;)
What these hospitals are using is a light weight, self managed process which:
- ensures visibility of changes to all impacted folks
- outlines the expected benefits
- includes a “trial” to ensure the change has the desired impact
- has a built in feedback system
That sounds achievable. In several of the settings, the traditional paper and bulletin board approach was used, with 4 columns labeled “New Ideas”, “To Do”, “Doing”, and “Done”. (Not a true Kanban board for several reasons, but Trello would be a reasonable visual approximation; CAB uses spreadsheets.)
Cards move left to right, and could cycle back to “New Ideas” if iteration is needed. “New Ideas” is where things start, and they transition from there (I paraphrase a lot in the following):
For me, I’m drawn to the 2nd and 3rd steps. That seems to be the change from current practice in teams I work on. We already have a gazillion bugs filed (1st step). We also can test changes in staging (4th step) and update production (5th step). Well, okay, sometimes we skip the staging run. Occasionally that *really* bites us. (Foot guns, foot guns – get your foot guns here!)
The 2nd and 3rd steps help focus on changes. And make the set of changes happening “nowish” more visible. Other stakeholders then have a small set of items to comment upon. Net result - more changes “stick” with less overall friction.
Painting with a broad brush, this Kaizen approach is essentially the CAB process that Mozilla IT implemented successfully. I have watched the CAB reduce the amount of stress, surprises, and self-inflicted damage both inside and outside of IT. Over time, the velocity of changes has increased and backlogs have been reduced. In short, it is a “Good Thing(tm)”.
So, I’m going to see if there is a way to “right size” this process for the smaller teams I’m on now. Stay tuned….
I fought this for quite a few days on a background project. I finally found the answer, and want to ensure I don’t forget it.
tl;dr:
Activate all the python versions you need before running tox.
After I upgraded my laptop to OSX 10.10, I also switched to using pyenv for installing non-system python versions. Things went well (afaict) until they didn’t. All of a sudden, I could not get both my code tests to pass and my doc build to succeed.
The error message was especially confusing:
pyenv: python2.7: command not found
The `python2.7' command exists in these Python versions:
2.7.5
Searching the web didn’t really shed any light. I’d find other folks who had the problem. I wasn’t alone. But they all disappeared from the bug traffic over a year ago (example). And with no sign of resolution.
Finally, I tried different search terms, and landed on this post. The secret – you can have multiple pyenv instances “active”. The first listed is the one that a bare python will invoke. The others are available as python<major>.<minor> (e.g. “python3.2”) and python<major> (e.g. “python3”).
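In practice, the tl;dr above boils down to something like this (the version numbers are illustrative, and assume they were already installed with pyenv install):
pyenv local 2.7.9 3.4.2   # first version answers a bare `python`;
                          # the rest appear as python2.7, python3.4, etc.
tox                       # every interpreter tox asks for is now findable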
This last Wednesday, I went to a meetup on ChatOps organized by SF DevOps, hosted by Geekdom (who also made recordings available), and sponsored by TrueAbility.
I had two primary goals in attending: I wanted to understand what made ChatOps special, and I wanted to see how much was applicable to my current work at Mozilla. The two presentations helped me accomplish the first. I’m still mulling over the second. (Ironically, I had to shift focus during the event to clean up a deployment-gone-wrong that was very close to one of the success stories mentioned by Dan Chuparkoff.)
My takeaway on why chatops works is that it is less about the tooling (although modern web services make it a lot easier), and more about the process. Like a number of techniques, it appears to be more successful when teams fully embrace their vision of ChatOps, and make implementation a top priority. Success is enhanced when the tooling supports the vision, and that appears to be what all the recent buzz is about – lots of new tools, examples, and lessons learned make it easier to follow the pioneers.
Heck, many teams use irc for operational coordination. There are scripts which automate steps (some workflows can be invoked from the web even). We’ve got automated configuration, logging, dashboards, and wikis – are we doing ChatOps?
Well, no, we aren’t.
What do you get for giving up all those options and flexibility? Here were the “ah ha!” concepts for me:
- Each ChatOps room is a “shared console” everyone can see and operate. No more screen sharing over video, or “refresh now” coordination!
- There is a bot which provides the “facts” about the world. One view accessible by all.
- The bot is also the primary way folks interact and modify the system. And it is consistent in usage across all commands. (The bot extensions perform the mapping to whatever the backend needs. The code adapts, not the human!)
- The bot knows all and does all:
- Where’s the documentation?
- How do I do X?
- Do X!
- What is the status of system Y?
- The bot is “fail safe” - you can’t bypass the rules. (If you code in a bypass, well, you loaded that foot gun!)
Thus everything is consistent and familiar for users, which helps during those 03:00 forays into a system you aren’t as familiar with. Nirvana ensues (remember, everyone did agree to drink the koolaid above).
The speaker selection was great – Dan was able to speak to the benefits of committing to ChatOps early in a startup’s life. James Fryman (from StackStorm) showed a path for migrating existing operations to a ChatOps model. That pretty much brackets the range, so yeah, it’s doable.
The main hurdle, imo, would be getting the agreement to a total commitment! There are some tensions in deploying such a system at a highly open operation like Mozilla: ideally chat ops is open to everyone, and business rules ensure you can’t do or see anything improper. That means the bot has (somewhere) the credentials to do some very powerful operations. (Dan hopes to get their company to the “no one uses ssh, ever” point.)
My next steps? Still thinking about it a bit – I may load Err onto my laptop and try doing all my local automation via that.
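(If I do, a first plugin would probably look something like this minimal sketch - the plugin and command names are mine, not anything from the talks, and it assumes errbot is installed:)
from errbot import BotPlugin, botcmd

class LocalOps(BotPlugin):
    """Tiny ChatOps-style bot commands for local automation."""

    @botcmd
    def status(self, msg, args):
        """Report the 'facts' about a (pretend) system."""
        return "all systems nominal"

    @botcmd
    def deploy(self, msg, args):
        """Kick off a (pretend) deployment; args names the target."""
        target = args or "staging"
        return "deploying to {} ...".format(target)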
With the new developer services components, I find myself once again updating my Bugzilla Quick Search search plugin. This time, I’ll document it. :)
Here are the steps:
[edit: here’s my current file as a sample]
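In case the sample file link ever rots, here is roughly what such a search plugin looks like - an illustrative OpenSearch description, not my actual file; the ShortName and quicksearch URL details are assumptions:
<?xml version="1.0" encoding="UTF-8"?>
<!-- illustrative OpenSearch description; names and URL are assumptions -->
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>BMO Quick Search</ShortName>
  <Description>Bugzilla Quick Search on bugzilla.mozilla.org</Description>
  <InputEncoding>UTF-8</InputEncoding>
  <Url type="text/html" method="GET"
       template="https://bugzilla.mozilla.org/buglist.cgi?quicksearch={searchTerms}"/>
</OpenSearchDescription>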
Just a quick note to let folks know that the Developer Services team continues to make improvements on Mozilla’s Mercurial server. We’ve set up a status page to make it easier to check on current status.
As we continue to improve monitoring and status displays, you’ll always find the “latest and greatest” on this page. And we’ll keep the page updated with recent improvements to the system. We hope this page will become your first stop whenever you have questions about our Mercurial server.
Chatting with Aki the other day, I realized that word of all the wonderful improvements to the try server has not been publicized. A lot of folks have done a lot of work to make things better - here’s a brief summary of the good news.
The biggest remaining slowdown is caused by rebuilding the cache. The cache is only invalidated if the push is interrupted. If you can avoid causing a disconnect until your push is complete, that helps everyone! So, please, no Ctrl-C during the push! The other changes should address the long wait times you used to see.
There has long been a belief that many of our hg problems, especially on try, came from the fact that we had r/w NFS mounts of the repositories across multiple machines (both hgssh servers & hgweb servers). For various historical reasons, a large part of this was due to the way pushlog was implemented.
Ben did a lot of work to get sqlite off NFS, and much of the work to synchronize the repositories without NFS has been completed.
All along, folks have been discussing our try server performance issues with the hg developers. A key confusing issue was that we saw processes “hang” for VERY long times (45 min or more) without making a system call. Kendall managed to observe an hg process in such an infinite-looking-loop-that-eventually-terminated a few times. A stack trace would show it was looking up an hg ancestor without making system calls or library accesses. In discussions, this confused the hg team, as they did not know of any reason that ancestor code should be invoked during a push.
Thanks to lots of debugging help from glandium one evening, we found and disabled a local hook that invoked the ancestor function on every commit to try. \o/ team work!
With the ancestor-invoking-hook disabled, we still saw some longish periods of time where we couldn’t explain why pushes to try appeared hung. Granted it was a much shorter time, and always self corrected, but it was still puzzling.
A number of our old theories, such as “too many heads”, were discounted by hg developers as both (a) we didn’t have that many heads, and (b) lots of heads shouldn’t be a significant issue – hg wants to support even more heads than we have on try.
Greg did a wonderful bit of sleuthing to find the impact of ^C during push. Our current belief is once the caching is fixed upstream, we’ll be in a pretty good spot. (Especially with the inclusion of some performance optimizations also possible with the new cache-fixed version.)
To take advantage of all the good stuff upstream Hg versions have, including the bug fixes we want, we’re going to be moving towards removing roadblocks to staying closer to the tip. Historically, we had some issues due to http header sizes and load balancers; ancient python or hg client versions; and similar. The client issues have been addressed, and a proper testing/staging environment is on the horizon.
There are a few competing priorities, so I’m not going to predict a completion date. But I’m positive the future is coming. I hope you have a glimpse into that as well.
One handy feature of CVS was the presence of the Attic directory. The primary purpose of the Attic directory was to simplify trunk checkouts, while providing space for both removed and added-only-on-branch files.
As a consequence of this, it was relatively easy to browse all such file names. I often would use this as my “memory” of scripts I had written for specific purposes, but were no longer needed. Often these would form the basis for a future special purpose script.
This isn’t a very commonly needed use case, but I have found myself being a bit reluctant to delete files using DVCS systems, as I wasn’t quite sure how to find things easily in the future.
Well, I finally scratched the itch – here are the tricks I’ve added to my toolkit.
A simplistic version, which just shows when file names were deleted, is to add the alias to ~/.hgrc:
[alias]
attic=log --template '{rev}:{file_dels}\n'
Very similar for git:
git config --global alias.attic 'log --diff-filter=D --summary'
(Not actually ideal, as not a one liner, but good enough for how often I use this.)
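Typical usage is then just grepping the output for a half-remembered name (the name below is made up):
hg attic | grep -i cleanup
git attic | grep -i cleanup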
Pro tip - if you have a Fitbit or other small BLE device, go get a “bluetooth finder” app for your smartphone or tablet. NOW. No thanks needed.
I ended up spending far too long looking for my misplaced black fitbit One last weekend. Turned out the black fitbit was behind a black sock on a shelf in a dark closet. (Next time, I’ll get a fuchsia colored one – I don’t have too many pairs of fuchsia socks.)
After several trips through the house looking, I thought I’d turn to technology. By seeing where in the house I could still sync with my phone, I could confirm it was in the house. I tried setting alarms on the fitbit, but I couldn’t hear them go off. (Likely, the vibrations were completely muffled by the sock. Socks - I should just get rid of them.)
Then I had the bright idea of asking the interwebs for help. Surely, I couldn’t be the first person in this predicament. I was rewarded with this FAQ on the fitbit site, but I’d already followed those suggestions.
Finally, I just searched for “finding bluetooth”, and discovered the tv ads were right: there is an app for that! Since I was on my android tablet at the time, I ended up with Bluetooth Finder, and found my Fitbit within 5 minutes. (I also found a similar app for my iPhone, but I don’t find it as easy to use. Displaying the signal strength on a meter is more natural for me than watching dB numbers.)
During this last RelEng workweek, I thought I’d try a new VIM plugin for reST: RIV. While that didn’t work out great (yet), it did get me to start using Vundle. Vundle is a quite nice vim plugin manager, and is easier for me to understand than Pathogen.
However, the Vundle docs didn’t cover two cases I care about:
- converting Pathogen modules for Vundle usage
- using with bundles not managed by either Pathogen or Vundle. (While running Vundle won’t interfere with unmanaged bundles, the :BundleClean command will claim they are unused and offer to delete them. That’s just too risky for me.)
The two cases appear to have the same solution:
- ensure all directories in the bundle location (typically ~/.vim/bundles/) are managed by Vundle.
- use a file:// URI for any bundle you don’t want Vundle to update.
For example, I installed the ctrlp bundle a while back, from the Bitbucket (hg) repository. (Yes, there (now?) is a github repository, but why spoil my fun.) Since the hg checkout already lived in ~/.vim/bundle, I only needed to add the following line to my vimrc file:
Bundle 'file:///~/.vim/bundle/ctrlp.vim/'
Vundle no longer offers to delete that repository when BundleClean is run.
I suspect I’ll get errors if I ever asked Vundle to update that repo, but that isn’t in my plans. I believe my major use case for Vundle will be to trial install plugins, and then BundleClean will clean things up safely.
I don’t have MySQL installed globally, so I need to do this dance every time I add it to a new virtualenv (a sketch follows the list):
- Install the bindings in the virtual env. The package name is MySQL-python.
- Symlink libmysqlclient.18.dylib from the /usr/local/mysql/lib tree into site-packages of the virtualenv
- Add the following to the virtual env’s activate script:
  export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:/path/to/venv/site-packages
- optionally add /usr/local/mysql/bin to PATH as well.
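Spelled out as commands, the dance looks roughly like this (paths are illustrative: a MySQL install under /usr/local/mysql and a virtualenv in $VENV, already activated):
pip install MySQL-python
ln -s /usr/local/mysql/lib/libmysqlclient.18.dylib \
      "$VENV/lib/python2.7/site-packages/"
echo "export DYLD_LIBRARY_PATH=\$DYLD_LIBRARY_PATH:$VENV/lib/python2.7/site-packages" \
      >> "$VENV/bin/activate"
# optionally:
echo "export PATH=\$PATH:/usr/local/mysql/bin" >> "$VENV/bin/activate"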
[This is an experiment in publishing a doc piece by piece as blog entries. Please refer to the main page for additional context.]
Mozilla, like most operations, has the Repositories of Record (RoR) set to only allow “fast forward” updates when new code is landed. In order to fast forward merge, the tip of the destination repository (RoR) must be an ancestor of the commit being pushed from the source repository. In the discussion below, it will be useful to say if a repository is “ahead”, “behind”, or “equal” to another. These states are defined as:
- If the tips of the two repositories are the same reference, then the two repositories are said to be equal (’e’ in table below)
- Else if the tip of the upstream repository is an ancestor of the tip of the other repository, the upstream is defined to be behind (’B’ in table below) the other repository
- Otherwise, the upstream repository is ahead (’A’ in table below) of the other repository.
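As a quick aside, the relation between any two clones can be checked mechanically; here is a git-flavored sketch (the remote and branch names are illustrative):
if [ "$(git rev-parse upstream/master)" = "$(git rev-parse master)" ]; then
    echo "equal (e)"
elif git merge-base --is-ancestor upstream/master master; then
    echo "upstream is Behind (B) the local repository"
else
    echo "upstream is Ahead (A) of the local repository"
fi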
In the normal case (2 repositories: the RoR and the lander’s repository), landing a change is logically (assuming no network issues):
1. Make sure the lander’s repository is equivalent to the RoR (start with equality)
2. Apply the changes (the RoR is now “Behind” the local repository)
3. Push the changes to the RoR
   a. if the push succeeds, then stop (equality restored)
   b. if the push fails, simultaneous landings were being attempted, and you lost the race.
When simultaneous landings are attempted, only one will succeed, and the others will need to repeat the landing attempt. The RoR is now “Ahead” of the local repository, and the new upstream changes will need to be incorporated, logically as:
- Remove the local changes (“patch -R”, “git stash”, “hg import”, etc.).
- Pull the changes from RoR (will apply cleanly, equality restored)
- Continue from step 2 above
When an authorized committer wants to land a change set on an hg RoR from git, there are three repositories involved. These are the RoR, the git repository the lander is working in, and an internal hggit repository used for translation. The sections below describe how this affects the normal case above.
On the happy path (no commit collisions, no network issues), the steps are identical to the normal path above. The git commands executed by the lander are set by the tool chain to perform any additional operations needed.
Occasionally, multiple people will try to land commits simultaneously, and a commit collision will occur (steps 3a, 3b, & 3c above). As long as the collision is noticed and dealt with before additional changes are committed to the git repository, the tooling will unapply the change to the internal hggit repository.
In real life, network connections fail, power outages occur, and other gremlins create the need to deal with “sad paths”. The following sections are only needed when we’re neither on the happy path nor experiencing a normal commit collision.
Because these cases cover every possible case of disaster recovery, it can appear more complex than it is. While there are multiple (6) different sad paths, only one will be in play for a given repository. And the maximum number of operations to recover is only three (3). The relationship between each pair of repositories determines the correct actions to take to restore the repositories to a known, consistent state. The static case is simply:
Simplistic Recovery State Diagram
Note
In reality, it is impractical to guarantee the RoR is static during recovery steps. That can be dealt with by applying the process described in the flowchart to restore equality and using the tables below to locate the actions.
The primary goal is to ensure correctness based on the RoR. The secondary goal is to make the interim repository as invisible as possible.
Key | RoR <-> hggit | hggit <-> git | Interpretation | Next Step to Equality |
---|---|---|---|---|
Ae | Ahead | equal | someone else landed | pull from RoR |
AA | Ahead | Ahead | someone else landed [1] | pull from RoR |
AB | Ahead | Behind | someone else landed [1] | back out local changes (3a above) |
ee | equal | equal | equal | nothing to do |
eA | equal | Ahead | someone else landed [2] | pull to git |
eB | equal | Behind | ready to land | push from git |
Be | Behind | equal | ready to land [2] | push to RoR |
BA | Behind | Ahead | prior landing not finished, lost from git [3] | corrupted setup, see note |
BB | Behind | Behind | prior landing not finished, next started [4] | back out local changes (3a above) from 2nd landing |
Table Notes
[1] This is the common situation of needing to update (and possibly re-merge local changes) prior to landing the change.
[2] If the automation is working correctly, this is only a transitory stage, and no manual action is needed. IRL, stuff happens, so an explicit recovery path is needed.
[3] This “shouldn’t happen”, as it implies the git repository has been restored from a backup and the “pending landing” in the hggit repository is no longer a part of the git history. If there isn’t a clear understanding of why this occurred, client side repository setup should be considered suspect, and replaced.
[4] Lander shot themselves in the foot - they have 2 incomplete landings in progress. If they are extremely lucky, they can recover by completing the first landing (“hg push RoR” -> “eB”), and proceed from there.
The deterministic approach, which must also be used if the landing of the first change set fails, is to back out the second landing from hggit and git, then back out the first landing from hggit and git. Then equality can be restored, and each landing redone separately.
Next Step | Active Repository | Command |
---|---|---|
pull from RoR | hggit | hg pull |
pull to git | git | git pull RoR |
push from git | git | git push RoR |
push to RoR | hggit | hg push |
Note
If any of the above actions fail, it simply means that we’ve lost another race with someone else’s commit. The recovery path is simply to re-evaluate the current state and proceed as indicated (as shown in diagram 1).
Flowchart to Restore Equality
Speaker Notes
Following are the slides I presented at the RELENG 2013 workshop on May 20th, 2013. Paragraphs formatted like this were not part of the presented slides - they are very rough speaker notes.
If you prefer, you may view a PDF version.
Issues and solutions encountered in maintaining a single code base under active development in both hg & git formats.
Mozilla Corporation operates an extensive build farm that is mostly used to build binary products installed by the end user. Mozilla has been using Mercurial repositories for this since converting from CVS in 2007. We currently use a 6 week “Rapid Release” cycle for most products.
Speaker Notes
We currently have upwards of 4,000 hosts involved in the continuous integration and testing of Mozilla products. These hosts do approximately 140 hours of work on each commit.
Firefox Operating System is a new product that ships source to be incorporated by various partners in the mobile phone industry. These partners, experienced with the Android build process, require source be delivered via git repositories. This is close to a “Continuous Release” process.
Speaker Notes
A large part of the FxOS product is code used in the browser products. That is in Mercurial and needs to be converted to git. Most new code modules for FxOS are developed on github, and need to be converted to Mercurial for use in our CI & build systems.
Speaker Notes
The ideal was to allow developers to make their DVCS as personal a choice as their editor.
Speaker Notes
These social coding sites, such as github and bitbucket, make it much easier for new community members to contribute.
Speaker Notes
For most use cases, the approximately 20 minute average we’re achieving is acceptable.
Speaker Notes
A common use case here is a developer wanting to start a self serve build. If the commit was to git, the self serve build won’t be successful until that commit is converted to hg.
We are continuing work on this. It is closely tied to determining which commit broke the build, when multiple repositories are involved.
Android never wants history to appear to change. Downstream servers allow only fast forward changesets and deny deletions.
Speaker Notes
Either approach is self-consistent. It is when the two need to interact that challenges arise.
Speaker Notes
This requires in-house resources to respond urgently to patch the conversion machinery. Without conversion, there are no builds.
I’m aware of two commercial alternatives. Both of these use a centralized RoR which supports git and/or hg interfaces for developer interaction.
Speaker Notes
And at least one explicitly does not have a git back end.
You can leave it to developers to scratch their own itch independently. Given the diversity of workflows, this may be more cost-effective than obtaining consensus.
Areas of particular interest for further study include:
What is the set of enforceable assertions which would ensure the tooling can maintain lossless conversion between DVCS?
What minimum conditions must be maintained in conversions to preclude downstream conflicts?
What workflows can be supported to minimize issues?
Are there best practice incident management protocols for addressing problem commits?
Speaker Notes
The common example is a commit that contains sensitive material it should not. There are cases where limiting the scope of distribution can have significant business value.
[This is an experiment in publishing a doc piece by piece as blog entries. Please refer to the main page for additional context.]
In the old days, before DVCS, “commit” had only one real purpose. It was how you published your work to the rest of the world (or your project’s world at least). With DVCS, you are likely committing quite often, but still only occasionally publishing.
[This is an experiment in publishing a doc piece by piece as blog entries. Please refer to the main page for additional context.]
With all the changes to support git, how will that affect a committer’s workflow? (For developer impact, see this post.)
The primary goal is to work within the existing Mozilla commit policy [1]. Working within that constraint, the idea is “as little as possible”, and this post will try to describe how big “as little” is.
Remember: all existing ways of working with hg will continue to work! These are just going to be some additional options for folks who prefer to use github & bitbucket.
[Refer to the main page for additional context.]
With all the changes to support git, how will that affect a developer’s workflow? (The committer’s workflow will be covered in a future post.)
The idea is “not much at all”, and this post will try to define “not much”.
Remember: all existing ways of working with hg will continue to work! These are just going to be some additional options for folks who prefer to use github & bitbucket.
A long time ago (December of 2011), I sent out a brief survey on DVCS usage to Mozilla folks (and asked them to spread it wider). While there were only 42 responses, there were some interesting patterns.
Disclaimer!
I am neither a statistician nor a psychometrician.
I believe you can see the raw summary via this link. What follows are the informal inferences I drew from the results (remember the disclaimer).
Wowza! I found the killer feature in git - you can have your cake and eat it, too!
Every time I’ve had to move to a new VCS, there’s never been enough time available to move the complete history correctly. Linux had this problem in spades when they moved off BitKeeper onto git in a very short time.
The solution? Take your time to convert the history correctly (or not, you can correct later), then allow developers who want it to prepend it on their machines, without making their repo operate any differently from the latest one.
Read on for more about the replace/graft feature.
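For the impatient, the prepend trick looks roughly like this with a modern git - a sketch under assumptions: a single root commit, an already-converted old-history repo, and `git replace --graft`, which postdates this post:
git remote add old-history /path/to/converted-old-history
git fetch old-history
# pretend the current repo's single root commit has the old tip as its parent
root=$(git rev-list --max-parents=0 HEAD)
git replace --graft "$root" old-history/master
git log --oneline | tail    # history now appears to flow back into the old repo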
[Refer to the main page for additional context.]
The purpose of this post is to present a very high level picture of the current Firefox build & release process as a set of requirements. Some of these services are provided or supported by groups outside of releng (particularly IT & webdev). This diagram will be useful in understanding the impact of changes.
This is the first in a series of posts about the “support git in releng” project. The goal of this project, as stated in bug 713782, is:
… The idea here is to see if we can support git in Mozilla’s RelEng infrastructure, to at least the same standard (or better) as we already currently support hg.
My hope is that blog posts will be a better forum for discussion than the tracking bug 713782, or a wiki page, at this stage.
These posts will highlight the various issues, so that the vague definitions above become clear, as do the intermediate steps needed to achieve completion.
[Refer to the main page for additional context.]
Based on discussions to date, everyone seems to have similar ideas about what “supporting git for releng” means. Later posts will highlight the work needed to ensure the ideal can be achieved, and how to arrive there.
For this post, I intend to limit the viewpoint and scope to that of the developer impact. Release notions (such as “system of record”) and scaling issues won’t be mentioned here. (N.B. Those concerns will be a key part of the path to verifying feasibility, but do not change the goal.)
As a reminder, I’m just talking about repositories that are used to produce products. [1]
One of the things that excited me about the opportunity to work at Mozilla was the chance to change perspectives. After working in many closed environments, I knew the open source world of Mozilla would be different. And that would lead to a re-examination of basic questions, such as:
Q: Are there any significant differences in the role a VCS plays at Mozilla than at j-random-private-enterprise?
A: At the scale of Mozilla Products [1], I don’t believe there are.
But the question is important to ask! (And I hope to ask more of them.)