Mums with suffix tree

Discussion in 'Computer Science & Culture' started by ANGELA2, May 24, 2003.

Thread Status:
Not open for further replies.
  1. ANGELA2 Registered Member

    Messages:
    5
    Hello everybody.

    I have a project to explain how i can find mums in 2 strings or more. i read some articles and i understood how i can find 2 matching but i didnt understand how to know what is the maximal macthing.
    if someone here know "suffix tree" and can help me iwill be grateful.
     
  2. Google AdSense Guest Advertisement



    to hide all adverts.
  3. SG-N Registered Senior Member

    Messages:
    1,051
    I did it last year, so I will have a look on my docs. However could you explain what you want ? I don't understand "mums", and what "maximal" do you want ? (longest common string... ?)
     
  4. Google AdSense Guest Advertisement



    to hide all adverts.
  5. ANGELA2 Registered Member

    Messages:
    5
    HEY


    this is what i need:
    • given genomes A and B
    – find all maximal, unique, matching
    subsequences (MUMs)
    – extract the longest possible set of matches that
    occur in the same order in both genomes
    – output the alignment

    In generaly i have many strings and i need to know what is the maximal matching that occur only 1 time each string.
    for example:
    A- abcart
    B- dfcat
    ca & t are the Mums.

    I nees to know how the algorthm works(with suffix tree).
     
  6. Google AdSense Guest Advertisement



    to hide all adverts.
  7. SG-N Registered Senior Member

    Messages:
    1,051
    According that you already know how to make a generalized suffix tree (a suffix tree that contains all the suffix of your sequences - use Ukkonen if you don't know how ro build it) you can find the longest common part. You will have to use the consensus nodes (a node is called a consensus node if we can find all the sequences numbers in the sub-tree's leaves) and more precisely the maximal consensus nodes (no other consensus nodes in the sub-tree).
    OK, so you need to print these maximal consensus nodes. To do it, you will use a Depth First Search algorithm (DFS) to see all the nodes :
    Code:
    [B]procedure dfs([/B]s : node[B])[/B]
         num := num + 1;
         visited(s) := num;
         for each son of s (called w)
              if visited(w) = 0 then dfs(w);
    
    You will initialize num to 0 and s is the current node inicialized to the root of your tree. Thus visited(node) is the rank of this node.
    Here is the bad news : we didn't finished this exercice... However we had the main idea : you can get on the tree (with dfs) and for each node, find a recursive way to check if it is a maximal consensus node. If it is, you put it in a list. Then select the node(s) with the biggest rank...
    A possibility : use a structure for your nodes (that could keep its rank).
     
  8. SG-N Registered Senior Member

    Messages:
    1,051
    If you've got any questions...

    Please Register or Log in to view the hidden image!

     
  9. ANGELA2 Registered Member

    Messages:
    5
    Thanks alot!!!!
    I'm now going to try build it.
    buy the way, i have one more question: do you know a way to find the biggest nodes online ? cause with DFS i need to do the procedure only after the suffix tree was built. if i will know which nodes are the biggest in the same time that the tree is built up i will benfit "Run time".
    and again, thanks alot!

    Please Register or Log in to view the hidden image!

     
  10. SG-N Registered Senior Member

    Messages:
    1,051
    Building a tree is not really hard if you use a structure or a class (I hope that you'll use C or C++...). The methode I gave you is using a tree and I don't know any other methodes. In fact I don't know if you can get faster but I don't think so.
    Just try to do it this way and if you've got a problem I will check if I've got a program that does something like that : I made a project in which I had to use a tree, so I may had an optimised dfs (I don't remember). Anyway, try it alone at first... (I will not be able to give you the algorithme before monday).
    Have a nice week !
     
  11. ANGELA2 Registered Member

    Messages:
    5
    OK
    I will try it.
    Thanks alot.
     
Thread Status:
Not open for further replies.

Share This Page