Code52 Show and Tell: My pet project CopyCat - Lessons learned


Figure: CopyCat – A big fat cat walking and waltzing the internetz
 

I am a pretty normal geek. At least I think I am both normal and a geek. I see geeks as people (yes, really) who want to play with things, discover new things and build things. When geeks play with things, they quite often discover something new.
(A similar story, heard and “adapted” from Scott Hanselman.)
After I've discovered something, I want to scream it out to someone:

“Look at this dude!! Isn’t that awesome???”

A normal reaction that you get from a non-geek:

“Wow… Is this showing the weather in the form of colors that you could just get by looking out the window?”

This is the pattern of the reaction that you normally get:
“Wow… Is this <something> that does <somethingUselessAtBeginning> that you could just get by doing <somethingElse>?”

As you may have realized, I am such a geek, so I play with different things in my spare time and build silly little things.
Deep inside me there is still this hope (#1) of building something randomly awesome, getting rich and buying myself a mountain or an island.
Two other reasons for doing so are:

#2 I want to play with the new shiny toys that I can’t play with during my 9-5 job.

#3 It is so easy to develop something new these days. We have so many Lego blocks available that you can just plug together.

 

For those 3 reasons I created CopyCat.

I have a couple of “awesome” (or maybe silly?) ideas in my "List with Ideas to rule the world", and this February I finally got around to building “CopyCat”.

 

 

What is CopyCat?
CopyCat solves 2 problems:

  1. You create some great content and want to make sure that no one steals it.
    CopyCat looks around the web for your content and tells you if someone copied it.
  2. You create some content with the help of Google and want to make sure it is still distinct enough from other content,
    so that no one can say that you copied from others, as happened to German Defense Minister Guttenberg.

--------------------------

   

With CopyCat I really wanted to get the minimum viable product, a so-called MVP, online ASAP.

“If your beta product doesn't embarrass you, you are not a startup”
Eric Ries (I think said this somewhere)

After playing around with AppHarbor and Bitbucket (git) I figured that putting an "idea" online is easy.

The Lego pieces that I used
From “File > New MVC App” to going live, there are only a couple of steps. See AppHarbor Support: Deploying your first application using Git.

These are the pieces:

  • ASP.NET MVC
  • jQuery
  • ELMAH: Error logging
  • Cassette (minify and bundle CSS and JS)
  • Bitbucket: Source control with Git
  • msysGit (no GitHub Windows client yet)
  • AppHarbor: Hosting (with automated continuous deployment)
  • NUnit: Test framework
  • MongoDB: NoSQL repository
  • C# MongoDB driver
  • NuGet: Package management to get all the Lego pieces
  • TDD: No production code without a failing test (almost).
    I am not test driving my MVC controllers, because I found that too cumbersome.
  • Trello: Task board for keeping track of features and work

You might ask: Why is this code not open source on Github or somewhere?
Mainly because I couldn't figure out how to hide my API keys from curious readers.
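One common way around this is to keep the keys out of the repository and read them from the environment (or whatever configuration mechanism the host offers) at runtime. The following is only a minimal sketch of that idea, not what CopyCat does: the `ApiKeys` class and the variable name `COPYCAT_GOOGLE_API_KEY` are made up for illustration.

```csharp
using System;

public static class ApiKeys
{
    // Hypothetical variable name, not taken from CopyCat itself.
    public const string GoogleSearchKeyVariable = "COPYCAT_GOOGLE_API_KEY";

    // Reads a key from the process environment and fails loudly if it is missing,
    // so a misconfigured deployment is noticed immediately.
    public static string Read(string variableName)
    {
        string value = Environment.GetEnvironmentVariable(variableName);
        if (string.IsNullOrEmpty(value))
        {
            throw new InvalidOperationException("Missing environment variable: " + variableName);
        }
        return value;
    }
}
```

With something like this in place, the source code only ever contains the name of the variable, never the key itself.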

 

 

Lessons learned

Code maintenance
  • Maintaining tests takes time, but it is definitely worth the effort
     
  • Treat tests the same as production code
    Figure: Solution layout of CopyCat

    Figure: Test layout of EndToEnd Tests, Integration Tests and Unit Tests. Makes your dev life easier
Tests
  • Rate-limited web APIs are a #pain with integration tests

    I am using the Google Search API, which is limited to a certain number of calls per day. Tests that hit that API reach the limit pretty quickly, and then they fail because of the quota, not because of the code.
    Or even worse, you have to pay because you used the API too much.
    My solution: put a façade in front of the external API, and keep only a few integration tests that talk to the real API.
     
      
  • TDD doesn't help you write an algorithm
    I tried and I failed.
    I am just much better at thinking on paper. Uncle Bob has an interesting approach called the Transformation Priority Premise.
    This is my biggest learning point over the last months… Watching the Bowling Kata.

    Figure: Uncle Bob’s suggestion TDDing an algorithm. From https://twitter.com/peitor/status/200836812384641025
     
     
     
  • TDD against different Search APIs is #pain
    For example
         BingSearch! 'My blog with my daily problems'    returned: 2
         BlekkoSearch! 'My blog with my daily problems'    returned: 1
         DaumSearch! 'My blog with my daily problems'    returned: 146538
         GigaBlastSearch! 'My blog with my daily problems'    returned: 1
         GoogleSearch! 'My blog with my daily problems'    returned: 1
         YahooSearch! 'My blog with my daily problems'    returned: 8
    Figure: Search for 'My blog with my daily problems'. Result: Daum doesn't seem to support searching for whole sentences


         BingSearch! 'That’s the reason why you should do small checkins'    returned: 0
         BlekkoSearch! 'That’s the reason why you should do small checkins'    returned: 0
         DaumSearch! 'That’s the reason why you should do small checkins'    returned: 84
         GigaBlastSearch! 'That’s the reason why you should do small checkins'    returned: 0
         GoogleSearch! 'That’s the reason why you should do small checkins'    returned: 0
         YahooSearch! 'That’s the reason why you should do small checkins'    returned: 1
    Figure: Another search. Result: Yahoo's index is the most up to date

    Search APIs: fake them all.
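To make the façade idea concrete, here is a rough sketch of how a search API could be hidden behind an interface and replaced by a fake in unit tests. None of these names (`ISearchEngine`, `FakeSearchEngine`, `CopyDetector`) come from the CopyCat code base; they are assumptions made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Facade over any external search API (illustrative name).
public interface ISearchEngine
{
    int CountResults(string phrase);
}

// In-memory fake for unit tests: no network calls, no quota used.
public class FakeSearchEngine : ISearchEngine
{
    private readonly Dictionary<string, int> cannedCounts = new Dictionary<string, int>();

    // Number of times the fake was queried, handy for assertions.
    public int Calls { get; private set; }

    // Registers a canned result count for a phrase.
    public FakeSearchEngine Returning(string phrase, int count)
    {
        cannedCounts[phrase] = count;
        return this;
    }

    // Unknown phrases simply report zero hits.
    public int CountResults(string phrase)
    {
        Calls++;
        int count;
        return cannedCounts.TryGetValue(phrase, out count) ? count : 0;
    }
}

// Example consumer that depends only on the facade, never on a concrete API.
public class CopyDetector
{
    private readonly ISearchEngine searchEngine;

    public CopyDetector(ISearchEngine searchEngine)
    {
        if (searchEngine == null) throw new ArgumentNullException("searchEngine");
        this.searchEngine = searchEngine;
    }

    public bool IsFoundOnTheWeb(string phrase)
    {
        return searchEngine.CountResults(phrase) > 0;
    }
}
```

The real Bing, Google or Yahoo wrappers would implement the same interface, and only a handful of integration tests would ever touch them.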

-----------------------------------


Talk is cheap, show me the code
Linus Torvalds

Here are some tests for your pleasure. They specify the method that splits text into smaller chunks.
using NUnit.Framework;

[TestFixture]
public class TextSplitterSpecs
{
    // System under test. OnPosition(...) and Should(...) are assertion
    // helpers that are not shown in this post.
    private readonly TextSplitter testSplitter = new TextSplitter();

    [Test]
    public void ReturnSameString_IfWordsParamIsGreater()
    {
        const string twoWords = "Peter Gfader";
        var result = testSplitter.Split(twoWords, maxWords: 3);

        result.OnPosition(0).Should(Be.EqualTo(twoWords));
    }

    [Test]
    public void SplitString_IfWordsAreMoreThanParam()
    {
        const string twoWords = "Peter Gfader";
        var result = testSplitter.Split(twoWords, maxWords: 1);

        result.OnPosition(0).Should(Be.EqualTo("Peter"));
        result.OnPosition(1).Should(Be.EqualTo("Gfader"));
    }

    [Test]
    public void IgnoreWhiteSpace_BetweenSplitWords()
    {
        const string twoWords = "Peter                                    Gfader";
        var result = testSplitter.Split(twoWords, maxWords: 1);

        result.OnPosition(0).Should(Be.EqualTo("Peter"));
        result.OnPosition(1).Should(Be.EqualTo("Gfader"));
    }
}

 

 

 

The codez

TDD really helped me drive out the core business logic, which is the TextSplitter class.
That class splits text into smaller chunks, which I then submit to the search engines.

using System.Collections.Generic;
using System.Collections.ObjectModel;

public class TextSplitter
{

    public ICollection<string> Split(string stringToSplit, int maxWords)
    {
        if (string.IsNullOrEmpty(stringToSplit))
        {
            return new Collection<string>();
        }

        stringToSplit = ReplaceSpecialCharacters(stringToSplit);
        
        IEnumerable<string> sentences = Split(stringToSplit, 1, '.');
        
        ICollection<string> sentenceParts = Split(sentences, 1, ',');

        return Split(sentenceParts, maxWords, ' ');
        
    }

    private static string ReplaceSpecialCharacters(string stringToSplit)
    {
        stringToSplit = stringToSplit.Replace("\r\n", ".");
        stringToSplit = stringToSplit.Replace("\n", ".");
        stringToSplit = stringToSplit.Replace("\r", ".");
        stringToSplit = stringToSplit.Replace('?', '.');
        stringToSplit = stringToSplit.Replace('"', ' ');
        return stringToSplit;
    }

    private ICollection<string> Split(IEnumerable<string> sentenceParts, int maxWords, char separator)
    {
        ICollection<string> finalResult = new Collection<string>();
    
        foreach (var sentence in sentenceParts)
        {
            var temp = Split(sentence, maxWords, separator);

            foreach (var word in temp)
            {
                finalResult.Add(word);
            }
        }
        return finalResult;
    }


    private IEnumerable<string> Split(string stringToSplit, int maxWords, char separator)
    {
        if (string.IsNullOrEmpty(stringToSplit))
        {
            return new Collection<string>();
        }

        stringToSplit = RemoveDuplicateSpaces(stringToSplit);

        string[] arrayOfStrings = stringToSplit.Split(separator);
        if (arrayOfStrings.Length > maxWords)
        {
            var result = SplitByMaxWords(arrayOfStrings, maxWords);
            return result;
        }


        return new Collection<string> { stringToSplit };
    }

    private static IEnumerable<string> SplitByMaxWords(IEnumerable<string> arrayOfStrings, int maxWords)
    {
        var result = new Collection<string>();
        int i = 1;
        string word = string.Empty;
        foreach (var singleWord in arrayOfStrings)
        {
            word = word + " " + singleWord;
            if (i >= maxWords)
            {
                result.Add(word.Trim());
                word = string.Empty;
                i = 0;
            }
            i++;
        }

        // Add the last, possibly shorter, chunk.
        if (word.Length > 0)
        {
            result.Add(word.Trim());
        }
        return result;
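    }

    // Not shown in the original post: an assumed implementation that
    // collapses runs of spaces into a single space.
    private static string RemoveDuplicateSpaces(string text)
    {
        return System.Text.RegularExpressions.Regex.Replace(text, " {2,}", " ");
    }

}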

Let me know if you smell any smells.
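For comparison, the word-grouping step could probably also be sketched with LINQ. This is only an illustration of the same chunking idea (groups of at most maxWords words, extra whitespace ignored), not the code CopyCat runs; `WordChunker` and `ChunkWords` are made-up names, and the sentence and comma splitting is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordChunker
{
    // Splits text on spaces and regroups the words into chunks of at most maxWords.
    public static IEnumerable<string> ChunkWords(string text, int maxWords)
    {
        if (maxWords < 1) throw new ArgumentOutOfRangeException("maxWords");
        if (string.IsNullOrEmpty(text)) return Enumerable.Empty<string>();

        // RemoveEmptyEntries drops the empty strings produced by repeated spaces.
        string[] words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Number of chunks, rounded up so a shorter last chunk is kept.
        int chunkCount = (words.Length + maxWords - 1) / maxWords;

        return Enumerable.Range(0, chunkCount)
            .Select(i => string.Join(" ", words.Skip(i * maxWords).Take(maxWords)));
    }
}
```

It trades the explicit counter for Skip and Take, which reads shorter but walks the array once per chunk; for the short texts sent to a search engine that should not matter.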

 

VS2012RC
  • The UI of VS2012 RC is not as bad as I expected from all those blog posts
  • The VS2012 Solution Explorer needs more contrast I think
    Figure: VS2012 RC Solution Explorer needs more contrast!

 

Overall Lesson learned
  • Coding is fun

 

My next steps?

  • More features for CopyCat

Yours?

2 comments:

Unknown said...

Nice post. Btw, Haacked (since working at Github) released a GitHub client for Windows. However, you can use that for non-github repos as well:
http://haacked.com/archive/2012/05/30/using-github-for-windows-with-non-github-repositories.aspx

Peter Gfader said...

I have used it and it is quite nice... The startup of the GitHub shell is slow...

Which git client do you use?
