@@ -357,6 +357,142 @@ takes a list of columns to sort by.
357357 tips = tips.sort_values(['sex', 'total_bill'])
358358 tips.head()
359359
360+
361+ String Processing
362+ -----------------
363+
364+ Length
365+ ~~~~~~
366+
367+ SAS determines the length of a character string with the ``LENGTHN``
368+ and ``LENGTHC`` functions. ``LENGTHN`` excludes trailing blanks and
369+ ``LENGTHC`` includes trailing blanks.
370+
371+ .. code-block:: none
372+
373+    data _null_;
374+    set tips;
375+    put(LENGTHN(time));
376+    put(LENGTHC(time));
377+    run;
378+
379+ Python determines the length of a character string with the ``len`` function.
380+ ``len`` includes trailing blanks. Use ``len`` and ``rstrip`` to exclude
381+ trailing blanks.
382+
383+ .. code-block:: none
384+
385+    tips['time'].str.len()
386+    tips['time'].str.rstrip().str.len()
387+
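As a plain-Python illustration of the same idea, the sketch below applies ``len`` and ``rstrip`` to a single string. The padded value is made up for the example and is not taken from the ``tips`` data; the ``.str.len()`` and ``.str.rstrip()`` calls above apply the same operations to every element of a column.

```python
# Hypothetical value with two trailing blanks (not from the tips dataset).
value = "Dinner  "

# len counts every character, including the trailing blanks.
length_with_blanks = len(value)

# Stripping trailing whitespace first gives the length of the visible text.
length_without_blanks = len(value.rstrip())

print(length_with_blanks, length_without_blanks)
```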
388+
389+ Find
390+ ~~~~
391+
392+ SAS determines the position of a character in a string with the
393+ ``FINDW`` function. ``FINDW`` takes the string defined by
394+ the first argument and searches for the first position of the substring
395+ you supply as the second argument.
396+
397+ .. code-block:: none
398+
399+    data _null_;
400+    set tips;
401+    put(FINDW(sex,'ALE'));
402+    run;
403+
404+ Python determines the position of a character in a string with the
405+ ``find`` function. ``find`` searches for the first position of the
406+ substring. If the substring is found, the function returns its
407+ position. Keep in mind that Python indexes are zero-based and
408+ the function will return -1 if it fails to find the substring.
409+
410+ .. code-block:: none
411+
412+    tips['sex'].str.find("ALE")
413+
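The zero-based result and the -1 sentinel are easy to trip over when porting SAS code, so here is a minimal plain-Python sketch. The helper ``sas_style_position`` is hypothetical, and it assumes the SAS convention of one-based positions with 0 meaning "not found". Note also that ``find`` is case-sensitive, so searching mixed-case values such as ``"Male"`` for ``"ALE"`` finds nothing.

```python
def sas_style_position(text: str, substring: str) -> int:
    """Hypothetical helper: one-based position, 0 when the substring is absent."""
    # str.find is zero-based and returns -1 when nothing is found,
    # so adding 1 maps -1 -> 0 and 0 -> 1 in a single step.
    return text.find(substring) + 1


zero_based = "Female".find("ale")                  # position counted from 0
not_found = "Male".find("ALE")                     # case-sensitive search fails
one_based = sas_style_position("Female", "ale")    # position counted from 1
missing = sas_style_position("Male", "ALE")        # 0 signals "not found"

print(zero_based, not_found, one_based, missing)
```

If a case-insensitive search is wanted, one option is to normalize the case first, for example with ``text.upper().find("ALE")``.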
414+
415+ Substring
416+ ~~~~~~~~~
417+
418+ SAS extracts a substring from a string based on its position
419+ with the ``SUBSTR`` function.
420+
421+ .. code-block:: none
422+
423+    data _null_;
424+    set tips;
425+    put(substr(sex,1,1));
426+    run;
427+
428+ In Python, you can use ``[]`` notation to extract a substring
429+ from a string by position. Keep in mind that Python indexes
430+ are zero-based and that the end of a slice is exclusive.
431+
432+ .. code-block:: none
433+
434+    tips['sex'].str[0:1]
435+
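To make the index translation explicit, the following sketch maps a SAS-style ``(start, length)`` pair onto a Python slice. The helper ``sas_substr`` is hypothetical and assumes ``start`` is at least 1; it is only meant to show the arithmetic, not to reproduce every edge case of ``SUBSTR``.

```python
def sas_substr(text: str, start: int, length: int) -> str:
    """Hypothetical helper: substring from a one-based start and a length."""
    begin = start - 1                    # shift to Python's zero-based index
    return text[begin:begin + length]    # slice end is exclusive


first_letter = sas_substr("Female", 1, 1)   # same as "Female"[0:1]
middle = sas_substr("Female", 3, 3)         # same as "Female"[2:5]

print(first_letter, middle)
```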
436+
437+ Scan
438+ ~~~~
439+
440+ The SAS ``SCAN`` function returns the nth word from a string.
441+ The first argument is the string you want to parse and the
442+ second argument specifies which word you want to extract.
443+
444+ .. code-block:: none
445+
446+    data firstlast;
447+    input String $60.;
448+    First_Name = scan(string, 1);
449+    Last_Name = scan(string, -1);
450+    datalines;
451+    John Smith
452+    Jane Cook
453+    ;
454+    run;
455+
456+ Python extracts words from a string by splitting it on a delimiter,
457+ here a single space, with ``split`` and ``rsplit``. More powerful approaches
458+ exist, such as regular expressions, but this shows a simple one.
459+
460+ .. code-block:: none
461+
462+    firstlast = pd.DataFrame({'String': ['John Smith', 'Jane Cook']})
463+    firstlast['First_Name'] = firstlast['String'].str.split(" ", expand=True)[0]
464+    firstlast['Last_Name'] = firstlast['String'].str.rsplit(" ", expand=True)[1]
465+
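For a single string, the "nth word" idea can be sketched in plain Python as below. The ``scan`` helper is hypothetical: it assumes words are separated by whitespace only (the SAS function accepts a wider set of delimiters), counts from 1, lets a negative ``n`` count from the end, and returns an empty string when the word does not exist.

```python
def scan(text: str, n: int) -> str:
    """Hypothetical helper: nth whitespace-delimited word, one-based.

    A negative n counts from the end; out-of-range requests return "".
    """
    words = text.split()
    if n == 0 or abs(n) > len(words):
        return ""
    # Positive n needs the one-based shift; negative n already matches
    # Python's from-the-end indexing.
    return words[n - 1] if n > 0 else words[n]


first_name = scan("John Smith", 1)
last_name = scan("John Smith", -1)
no_such_word = scan("John Smith", 3)

print(first_name, last_name, repr(no_such_word))
```

Indexing from the end with ``-1`` keeps working when a name has more than two words, which a fixed column index does not.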
466+
467+ Upcase, Lowcase, and Propcase
468+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
469+
470+ The SAS ``UPCASE``, ``LOWCASE``, and ``PROPCASE`` functions change
471+ the case of the argument.
472+
473+ .. code-block:: none
474+
475+    data firstlast;
476+    input String $60.;
477+    string_up = UPCASE(string);
478+    string_low = LOWCASE(string);
479+    string_prop = PROPCASE(string);
480+    datalines;
481+    John Smith
482+    Jane Cook
483+    ;
484+    run;
485+
486+ The equivalent Python functions are ``upper``, ``lower``, and ``title``.
487+
488+ .. code-block:: none
489+
490+    firstlast = pd.DataFrame({'String': ['John Smith', 'Jane Cook']})
491+    firstlast['string_up'] = firstlast['String'].str.upper()
492+    firstlast['string_low'] = firstlast['String'].str.lower()
493+    firstlast['string_prop'] = firstlast['String'].str.title()
494+
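The same three methods exist on plain Python strings, which makes their behavior easy to check on one value. The sample strings below are made up for the illustration. One detail worth knowing is that ``title`` starts a new word after any non-letter character, so apostrophes produce capital letters you may not want.

```python
# Made-up sample values, not taken from the datasets above.
upper_name = "Jane Cook".upper()
lower_name = "Jane Cook".lower()
title_name = "jane cook".title()

# title() capitalizes the letter that follows an apostrophe as well.
apostrophe_title = "they're here".title()

print(upper_name, lower_name, title_name, apostrophe_title)
```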
495+
360496 Merging
361497 -------
362498